Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for over-long filepaths via GNU extension and fix hardlinks #77

Merged
merged 26 commits into from
Nov 19, 2023

Conversation

Bodigrim
Copy link
Contributor

This is #50 rebased, all credit due to @hasufell.

@hasufell
Copy link
Member

@Bodigrim do you want a review for this? It's quite a long time ago I wrote this.

@Bodigrim Bodigrim force-pushed the gnu-tar-long-filenames branch 3 times, most recently from 11d6191 to 33ddcda Compare November 17, 2023 23:59
@Bodigrim
Copy link
Contributor Author

@hasufell do you remember how you tested this branch long ago?

@hasufell
Copy link
Member

@hasufell do you remember how you tested this branch long ago?

Pretty sure I used actual tar archives. I can try to give this a go maybe.

@hasufell
Copy link
Member

Quite interestingly htar hangs on all operations indefinitely for me. I sprinkled traces all over the place but could not detect a loop. The codebase abuses laziness heavily, so it's hard to find.

@hasufell
Copy link
Member

So, to test long filepaths, I changed the arbitrary instance like so:

diff --git a/test/Codec/Archive/Tar/Types/Tests.hs b/test/Codec/Archive/Tar/Types/Tests.hs
index d2acc43..912bebb 100644
--- a/test/Codec/Archive/Tar/Types/Tests.hs
+++ b/test/Codec/Archive/Tar/Types/Tests.hs
@@ -48,7 +48,7 @@ instance Arbitrary TarPath where
   arbitrary = either error id
             . toTarPath False
             . FilePath.Posix.joinPath
-          <$> listOf1ToN (255 `div` 5)
+          <$> listOf1ToN 500
                          (elements (map (replicate 4) "abcd"))

   shrink = map (either error id . toTarPath False)

And then discovered that commit 0af50df seems to break the code. If I revert it, the test suite passes.

@hasufell
Copy link
Member

And then discovered that commit 0af50df seems to break the code. If I revert it, the test suite passes

I was able to fix it this way:

diff --git a/test/Codec/Archive/Tar/Types/Tests.hs b/test/Codec/Archive/Tar/Types/Tests.hs
index d2acc43..134b141 100644
--- a/test/Codec/Archive/Tar/Types/Tests.hs
+++ b/test/Codec/Archive/Tar/Types/Tests.hs
@@ -45,19 +45,26 @@ instance Arbitrary Entry where
       | perms' <- shrinkIntegral perms ]

 instance Arbitrary TarPath where
-  arbitrary = either error id
-            . toTarPath False
+  arbitrary = fromTarPathResult
+            . toTarPath' False
             . FilePath.Posix.joinPath
-          <$> listOf1ToN (255 `div` 5)
+          <$> listOf1ToN 500
                          (elements (map (replicate 4) "abcd"))

-  shrink = map (either error id . toTarPath False)
+  shrink = map (fromTarPathResult . toTarPath' False)
          . map FilePath.Posix.joinPath
          . filter (not . null)
          . shrinkList shrinkNothing
          . FilePath.Posix.splitPath
          . fromTarPathToPosixPath

+-- errors only on FileNameEmpty, because long filenames are supported
+fromTarPathResult :: ToTarPathResult -> TarPath
+fromTarPathResult FileNameEmpty = error "File name empty"
+fromTarPathResult (FileNameOK tp) = tp
+fromTarPathResult (FileNameTooLong tp) = tp
+
+
 instance Arbitrary LinkTarget where
   arbitrary = maybe (error "link target too large") id
             . toLinkTarget

@hasufell
Copy link
Member

hasufell commented Nov 18, 2023

This actually makes me question the code. See the tests here:

prop_write_read_ustar :: [Entry] -> Bool
prop_write_read_ustar entries =
foldr Next Done entries' == read (write entries')
where
entries' = [ e { entryFormat = UstarFormat } | e <- entries ]
prop_write_read_gnu :: [Entry] -> Bool
prop_write_read_gnu entries =
foldr Next Done entries' == read (write entries')
where
entries' = [ e { entryFormat = GnuFormat } | e <- entries ]
prop_write_read_v7 :: [Entry] -> Bool
prop_write_read_v7 entries =
foldr Next Done entries' == read (write entries')
where
entries' = [ limitToV7FormatCompat e { entryFormat = V7Format }
| e <- entries ]

I'm not sure we would expect long filenames to work under non-GNU formats?

@hasufell
Copy link
Member

ustar and v7 formats don't support it: https://www.gnu.org/software/tar/manual/html_chapter/Formats.html

My tests show that tar cli utility is permissive when it comes to unpacking/listing, but rejects writing such archives when you specify --format=ustar for example.

It doesn't seem we allow great control to the users over the format, but the following patch might be an improvement in strictness anyway (and will also make ustar and v7 property tests fail with long filenames, which I would expect):

diff --git a/Codec/Archive/Tar/Write.hs b/Codec/Archive/Tar/Write.hs
index 0d76e14..f3dcf5e 100644
--- a/Codec/Archive/Tar/Write.hs
+++ b/Codec/Archive/Tar/Write.hs
@@ -38,6 +38,8 @@ write es = LBS.concat $ map putEntry es ++ [LBS.replicate (512*2) 0]
 putEntry :: Entry -> LBS.ByteString
 putEntry entry = case entryContent entry of
   NormalFile       content size -> LBS.concat [ header, content, padding size ]
+  OtherEntryType 'L' _ _
+    | entryFormat entry /= GnuFormat -> error "Long filename support is a GNU extension"
   OtherEntryType _ content size -> LBS.concat [ header, content, padding size ]
   _                             -> header
   where

@hasufell
Copy link
Member

And sorry to bother for more, but I think we should also support

/* Identifies the next file on the tape as having a long linkname. */
#define GNUTYPE_LONGLINK 'K'

https://www.gnu.org/software/tar/manual/html_node/Standard.html

Otherwise it seems rather inconsistent. I can write a patch for that.

@Bodigrim
Copy link
Contributor Author

Otherwise it seems rather inconsistent. I can write a patch for that.

Sure, that would be most appreciated.

Shall we merge this then? I tested it on examples from #49, seems doing well.

@hasufell
Copy link
Member

Shall we merge this then?

I'd rather have it cleaned up a bit more and add tests with adjusted arbitrary instance? There are also a few other things I'm starting to notice. E.g. handling of unknown filetypes isn't entirely correct. From the spec:

The typeflag field specifies the type of file archived. If a particular implementation does not recognize or permit the specified type, the file will be extracted as if it were a regular file. As this action occurs, tar issues a warning to the standard error.

@Bodigrim Bodigrim force-pushed the gnu-tar-long-filenames branch 2 times, most recently from a0c8b4a to 4c0c755 Compare November 19, 2023 00:21
@Bodigrim
Copy link
Contributor Author

I've added a test to check that pack/unpack roundtrips with long file names + restricted long names to GnuFormat. It would be nice to test symlinks, but I'd rather do it in a separate PR later.

@hasufell
Copy link
Member

It would be nice to test symlinks, but I'd rather do it in a separate PR later

I'm already done, just adding tests. Should I push to this PR or create another one?

@hasufell
Copy link
Member

hasufell commented Nov 19, 2023

@Bodigrim
Copy link
Contributor Author

Hmm, something feels off: both

cabal run htar -- --list test/data/long.tar

and even

tar --list test/data/long.tar

just freeze on my machine. And yet the test suite succeeds. How is it possible?..

@hasufell
Copy link
Member

htar needs an explicit --file= parameter, otherwise it expects stdin.

Not sure why tar wouldn't be able to read it. I can check.

@Bodigrim
Copy link
Contributor Author

I'm on MacOS, so it's bsdtar.

$ tar --version
bsdtar 3.5.3 - libarchive 3.5.3 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8

It might not support GNU extensions well indeed...

@Bodigrim
Copy link
Contributor Author

Nevermind, tar needs f as well. All good.

@hasufell
Copy link
Member

Btw, now I remember when I debug printed filepaths, that they had a trailing NUL char in unpack.

Most filepath operations rely on withCString to implicitly truncate, but that may no longer be true when using OsPath at some point.

It may make sense to sanitize them after extraction from the headers.

@Bodigrim Bodigrim force-pushed the gnu-tar-long-filenames branch 2 times, most recently from ff975c0 to ffba5e9 Compare November 19, 2023 15:23
@Bodigrim Bodigrim merged commit 3075222 into master Nov 19, 2023
30 checks passed
@Bodigrim
Copy link
Contributor Author

Let's go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants