Bug 836558 - MidnightCommander: long file-names in tar archives truncated
Status: RESOLVED UPSTREAM
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Other
Version: 13.1 Milestone 4
Hardware: Other Other
Priority: P5 - None, Severity: Normal
Assigned To: Jochen Keil
Reported: 2013-08-24 18:57 UTC by Dirk Weber
Modified: 2021-07-25 09:11 UTC (History)


Attachments
a sample extfs for mc which correctly lists and copies out long file names (1.80 KB, text/plain)
2013-08-24 18:57 UTC, Dirk Weber
sed script (544 bytes, text/plain)
2013-09-28 12:55 UTC, Tomas Cech
improved untar extfs (GNU tar only, polished awk part); related to http://lists.opensuse.org/opensuse-factory/2013-08/msg00486.html; does not pretend to support anything but GNU tar (1.96 KB, text/plain)
2013-10-12 18:59 UTC, Dirk Weber
improved untar extfs which handles file names which happen to match permissions, time, date, size (2.08 KB, text/plain)
2013-11-12 19:22 UTC, Dirk Weber
improved untar extfs which handles copy out of archive members beginning with ./ (4.94 KB, application/x-shellscript)
2017-08-13 07:30 UTC, Dirk Weber
improved untar extfs with support for star (5.72 KB, text/plain)
2018-06-04 17:25 UTC, Dirk Weber
improved untar extfs with optional workaround for mc-4.8.25 (8.89 KB, application/x-shellscript)
2020-10-22 17:37 UTC, Dirk Weber
updated untar extfs (for mc-4.8.26 and below) (9.89 KB, application/x-shellscript)
2021-07-25 09:11 UTC, Dirk Weber

Description Dirk Weber 2013-08-24 18:57:35 UTC
Created attachment 554071 [details]
a sample extfs for mc which correctly lists and copies out long file names

User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0

tar archive support in MidnightCommander:
File names are truncated in tar archives that contain long file names, or a long path-name + file-name, e.g. resulting from nested sub-directories.

It seems the length of the path-name + file-name is limited to 100
characters; longer names are truncated.
They are displayed truncated in the mc panel, and when the file (or its
parent directory in the archive) is copied out of the archive with F5 it
gets the truncated name.
Therefore deeply nested tar archives cannot safely be un-tarred with
mc.

There exist different formats of tar archives; "newer" formats (e.g. posix which is now the default) allow long file names (unlimited length).

The problem happens in openSUSE 12.3 and also 13.1M4.


Reproducible: Always

Steps to Reproduce:
mkdir testdir
echo test > "testdir/testname_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789.txt"
tar cJf testdir.tar.xz testdir

start mc, enter testdir.tar.xz
Actual Results:  
the displayed file name is limited to 
testname_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123

copy out (F5) will create the file with the truncated name.

The tar archive itself is correct and tar xJf correctly restores the
full file-names.
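
The truncated name shown above is what remains when the stored path (directory plus file name) is cut after 100 characters, which fits the suspected limit. A minimal sketch to check the arithmetic; the variable names are illustrative only:

```shell
# Path as stored in the archive by the steps to reproduce.
path="testdir/testname_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789.txt"

# Keep only the first 100 characters (plain ASCII here, so bytes == characters).
truncated=$(printf '%.100s' "$path")

echo "full length:      ${#path}"
echo "truncated length: ${#truncated}"
echo "truncated name:   ${truncated#testdir/}"
```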


Expected Results:  
the displayed file names or the file names created during copy out (F5) should be the correct file names recorded in the tar archive

The attached VFS "untar" uses the installed tar (or star or gtar) to browse tar archives with mc and to copy out files, thereby avoiding the limitation of the mc internal tar. Status:
list and copyout: working
rm, mkdir, rmdir: partly working (only uncompressed tar archives)
copyin: partly working (only uncompressed tar archives, archive rootdir)
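
The approach described above can be sketched roughly as follows. This is a hypothetical, stripped-down helper and not the attached script. It assumes GNU tar and relies on the way mc invokes an extfs helper ("list ARCHIVE" and "copyout ARCHIVE MEMBER DESTINATION"); the function name untar_sketch and the exact listing layout are assumptions made for illustration.

```shell
# Hypothetical, minimal extfs-style helper (not the attached "untar" script).
# Assumes GNU tar, whose "-tv" lines look like:
#   -rw-r--r-- owner/group  888 1997-03-28 03:27 path/to/file
untar_sketch() {
    cmd=$1
    archive=$2
    case "$cmd" in
    list)
        # Reformat each line into an "ls -l" style listing:
        # perms links owner group size MM-DD-YYYY hh:mm name
        tar -tvf "$archive" | awk '{
            split($2, og, "/")          # owner/group
            split($4, d, "-")           # YYYY-MM-DD
            key = " " $4 " " $5 " "     # the name follows date and time
            name = substr($0, index($0, key) + length(key))
            printf "%s 1 %s %s %s %s-%s-%s %s %s\n",
                   $1, og[1], og[2], $3, d[2], d[3], d[1], $5, name
        }'
        ;;
    copyout)
        # Let the installed tar write the member to stdout, so the name is
        # never handled by the internal tar code of mc.
        tar -xOf "$archive" "$3" > "$4"
        ;;
    *)
        return 1
        ;;
    esac
}
```

A real helper would end with a call such as `untar_sketch "$@"` and would also have to deal with member names starting with "-", hard links, device nodes and compressed archives, none of which this sketch handles.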
Comment 1 Tomas Cech 2013-09-27 20:42:46 UTC
Reproduced. It seems that the file name size limit is hardcoded in src/vfs/tar/tar.c
Comment 2 David Haller 2013-09-28 09:28:09 UTC
Thomas: yes. I said so some time ago ... *dig* in ah, yes:
http://lists.opensuse.org/opensuse-factory/2013-08/msg00464.html and esp. http://lists.opensuse.org/opensuse-factory/2013-08/msg00466.html
http://lists.opensuse.org/opensuse-factory/2013-08/msg00468.html
See also https://www.midnight-commander.org/ticket/2201


Thanks Dirk, for the start of a utar script! I'll have to check it out
though[1] to even put it in with a req on GNU tar. And the check for an
empty "$TAR" should be moved up to the assignment.

Your quoting of variables is fine, which is often neglected, but your awk
needs some serious polishing. Parsing the 'tar tvaf' output via char-positions
is just a no-no. Use awk's field-splitting as it was intended. But also: POSIX
tar (at least star) does not support 'tar tvaf' and -a is incompatible anyway:

        -a,-atime       reset access time after storing file

and the format of '-t -v' output of gtar/star differs as well (and star lists
files on stderr!)  [tar is gtar here]:

$ star -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 2>/dev/null | wc -l
0
$ tar -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 2>/dev/null | wc -l
1724
$ tar -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 >/dev/null | wc -l
0

$ star -t -v -j -f xemacs-21.4.22.tar.bz2 2>&1 | head -2
      0 drwxr-xr-x  acs/acs Dec 31 03:55 2008 xemacs-21.4.22/
    888 -rw-r--r--  acs/acs Mar 28 03:27 1997 xemacs-21.4.22/BUGS
$ tar -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 2>&1 | head -2
drwxr-xr-x acs/acs           0 2008-12-31 03:55 xemacs-21.4.22/
-rw-r--r-- acs/acs         888 1997-03-28 03:27 xemacs-21.4.22/BUGS
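
The field-splitting approach suggested above could look roughly like this for the two layouts shown. It is a sketch only: the function name is made up, and it assumes member names without whitespace, which real archives do not guarantee.

```shell
# Sketch: normalise GNU tar and star "-t -v" lines with awk field splitting.
# Output: permissions, size and name, separated by tabs.
# Assumption: member names contain no whitespace.
normalise_listing() {
    awk '
    $1 ~ /^[0-9]+$/ {
        # star layout: size perms owner/group Mon DD hh:mm YYYY name
        printf "%s\t%s\t%s\n", $2, $1, $8
        next
    }
    {
        # GNU tar layout: perms owner/group size YYYY-MM-DD hh:mm name
        printf "%s\t%s\t%s\n", $1, $3, $6
    }'
}
```

Since star is reported above to print the listing on stderr, its output would have to be redirected first, e.g. `star -t -v -j -f archive.tar.bz2 2>&1 | normalise_listing`.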

So, I strongly think I'll scrap the shell-script idea and use perl and
Archive::Tar. It is POSIX compatible, allows access to file attributes via a
hash with 'stat'-like fields, and supports gzip and bzip2 compressed files.

Perl has support for e.g. Lzma, Lzop, Xz, and Zip... On the other hand, that'd
get us into dependency hell (requiring lzma, lzop, xz, etc.).

This needs more work. mc should be able to handle the (de)compression via
extfs/archive.sh or lib/util.* and then "extfs/utar" would have to handle only
uncompressed archives. That should work, cf. 'deb*' and 'uar' ...

Anyway: the problem isn't fixed by just adding e.g. Dirk's "utar" script.
Read the whole thread on os-factory mentioned above ...

Upstream hasn't come up with an extfs-based-on-gnu-tar (or whatever) in
years ... It's not easy. I moved it up on my mental priority list and if
I feel like it, I might improve your script to explicitly and only use
gnu-tar (version > x) to implement a workaround.

-dnh

[1] I've got a dir-hierarchy with "sick filenames" (including everything
    I could come up with that is not ASCII-NUL or '/' so e.g. linebreaks,
    *, ', ", \r, ... ;)
Comment 3 Tomas Cech 2013-09-28 12:55:37 UTC
Created attachment 560684 [details]
sed script

(In reply to comment #2)
> Thomas: yes. I said so some time ago ... *dig* in ah, yes:
> http://lists.opensuse.org/opensuse-factory/2013-08/msg00464.html and esp.
> http://lists.opensuse.org/opensuse-factory/2013-08/msg00466.html
> http://lists.opensuse.org/opensuse-factory/2013-08/msg00468.html
> See also https://www.midnight-commander.org/ticket/2201

I have already read that. I found it after writing my comment and looking into the code a bit.

> Thanks Dirk, for the start of a utar script! I'll have to check it out
> though[1] to even put it in with a req on GNU tar. And the check for an
> empty "$TAR" should be moved up to the assignment.

Yes, good work indeed.

> Your quoting of variables is fine, which is often neglected, but your awk
> needs some serious polishing. Parsing the 'tar tvaf' output via char-positions
> is just a no-no. Use awk's field-splitting as it was intended.

I've attached an equivalent of the awk code that uses sed with a regexp. It is not nice, but I believe it should work.

> But also: POSIX
> tar (at least star) does not support 'tar tvaf' and -a is incompatible anyway:
> 
>         -a,-atime       reset access time after storing file
> 
> and the format of '-t -v' output of gtar/star differs as well (and star lists
> files on stderr!)  [tar is gtar here]:
> 
> $ star -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 2>/dev/null | wc -l
> 0
> $ tar -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 2>/dev/null | wc -l
> 1724
> $ tar -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 >/dev/null | wc -l
> 0
> 
> $ star -t -v -j -f xemacs-21.4.22.tar.bz2 2>&1 | head -2
>       0 drwxr-xr-x  acs/acs Dec 31 03:55 2008 xemacs-21.4.22/
>     888 -rw-r--r--  acs/acs Mar 28 03:27 1997 xemacs-21.4.22/BUGS
> $ tar -t -v -j -f /newsw/xemacs-21.4.22.tar.bz2 2>&1 | head -2
> drwxr-xr-x acs/acs           0 2008-12-31 03:55 xemacs-21.4.22/
> -rw-r--r-- acs/acs         888 1997-03-28 03:27 xemacs-21.4.22/BUGS

Is it really necessary to support all tar flavours? This will make things too complicated. I have never met an archive that couldn't be opened by GNU tar. I'm not saying such archives don't exist, only that in my whole Linux history I haven't seen one.

> So, I strongly think I'll scrap the shell-script idea and use perl and
> Archive::Tar. It is POSIX compatible, allows access to file attributes via a
> hash with 'stat'-like fields, and supports gzip and bzip2 compressed files.

Please no, do not bring in more dependencies. And besides, less perl code could make the world a happier place :b

> Perl has support for e.g. Lzma, Lzop, Xz, and Zip... On the other hand, that'd
> get us into dependency hell (requiring lzma, lzop, xz, etc.).
> 
> This needs more work. mc should be able to handle the (de)compression via
> extfs/archive.sh or lib/util.* and then "extfs/utar" would have to handle only
> uncompressed archives. That should work, c.f. 'deb*' and 'uar' ...
> 
> Anyway: the problem isn't fixed by just adding e.g. Dirk's "utar" script.
> Read the whole thread on os-factory mentioned above ...

I read it all but I can't see why...

> Upstream hasn't come up with an extfs-based-on-gnu-tar (or whatever) in
> years ... It's not easy. I moved it up on my mental priority list and if
> I feel like it, I might improve your script to explicitly and only use
> gnu-tar (version > x) to implement a workaround.

Let's use this first script to fix this bug and see whether there are issues with it, and whether anyone even misses support for other tar flavours. So yes, I'd say it's the way to go.

> [1] I've got a dir-hierarchy with "sick filenames" (including everything
>     I could come up with that is not ASCII-NUL or '/' so e.g. linebreaks,
>     *, ', ", \r, ... ;)

This sounds like a good test.
Comment 4 Dirk Weber 2013-10-12 18:59:07 UTC
Created attachment 563198 [details]
improved untar extfs (GNU tar only, polished awk part)

Related to
http://lists.opensuse.org/opensuse-factory/2013-08/msg00486.html
This version does not pretend to support anything but GNU tar.
Comment 5 Dirk Weber 2013-10-16 18:34:52 UTC
(In reply to comment #4)
> Created an attachment (id=563198) [details]
> improved untar extfs (GNU tar only, polished awk part)

For openSUSE with GNU tar as default this is probably sufficient.
For upstream and different tar implementations, the best solution would probably be to have separate branches for each (common) tar implementation, each with its specific parameters and a reformatting of its specific output.
Comment 6 Dirk Weber 2013-11-12 19:22:10 UTC
Created attachment 567156 [details]
improved untar extfs which handles file names which happen to match permissions, time, date, size

While checking bnc#849082 I noticed a weakness of the previously provided untar extfs: it located the file name by matching the "file name" $6 against the line.
The new version uses the time $5 as hook. This pattern should not appear on the line before the correct occurrence.
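
The idea of using the time field as hook can be illustrated with a small awk sketch. It is not the code from the attachment; the function name is made up and GNU tar's "-tv" layout (time in field $5) is assumed.

```shell
# Sketch: extract the member name from GNU tar "-tv" lines by searching for
# the time field ($5) rather than for the name itself.
# Illustrative only, not taken from the attached script.
member_name() {
    awk '{
        hook = " " $5 " "        # e.g. " 19:22 "
        # index() returns the first match, which is the real time field,
        # so everything after it is the member name, blanks included.
        print substr($0, index($0, hook) + length(hook))
    }'
}
```

Because the first occurrence is used, a member name that itself looks like permissions, a size, a date or a time is still returned unchanged.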
Comment 7 Chenzi Cao 2015-05-04 10:03:23 UTC
Hi Vladimir, would you please help to have a look at this issue? If it is fixed, would you please help to close it? Thank you!
Comment 8 Dirk Weber 2015-05-05 08:21:34 UTC
(In reply to Chenzi Cao from comment #7)
> Hi Vladimir, would you please help to have a look at this issue? If it is
> fixed, would you please help to close it? Thank you!

The issue still exists with mc-4.8.14-1.1.x86_64 from current Tumbleweed.
As I am using my untar extfs I am not bothered by it.
Comment 9 Jochen Keil 2015-06-02 12:53:57 UTC
After digging a bit through the source I've decided to reopen #2201.

https://www.midnight-commander.org/ticket/2201#comment:12
Comment 10 Jochen Keil 2015-06-03 12:39:29 UTC
This is an upstream issue which will be dealt with during the internal tar rewrite. Please refer to the upstream ticket posted previously. In the meantime, affected users can use the proposed extfs workaround.
Comment 11 Dirk Weber 2017-08-13 07:30:30 UTC
Created attachment 736387 [details]
improved untar extfs which handles copy out of archive members beginning with ./

The previous version of the "untar" extfs could not copy out members from tar archives with path/filename starting with "./".
Such names can be the result of "find" generated file lists as input for the creation of a tar archive.

Besides, this version contains an experimental implementation to support tar versions other than GNU tar. GNU tar is the default.
Comment 12 Dirk Weber 2018-06-04 17:25:25 UTC
Created attachment 772390 [details]
improved untar extfs with support for star

In connection with the upgrade to Leap 15.0 I reviewed the untar extfs and made some improvements. "star" support is now mainly working (list and copyout). I also added comments to document the status and possible problems. The default is GNU tar, and star is tried as a fallback if GNU tar is missing. As a third but least polished option, bsdtar is used.

To use this extfs for a user instead of the mc builtin handling do the following:

1) create the directory ~/.local/share/mc/extfs.d if it does not exist.

2) copy the attached "untar" script to this directory (~/.local/share/mc/extfs.d/untar) and make it executable.

3) in the directory ~/.config/mc edit the file mc.ext (if the file does not exist yet, copy /etc/mc/mc.ext to ~/.config/mc/mc.ext). In this file replace all occurrences of "utar" (the built-in mc tarfile handling) with "untar". From now on mc will use the untar script to enter tar archives.

The changes can be undone any time if you experience problems with the script.
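
The three steps above can be sketched as a small shell function. The function name and its parameters are illustrative; the paths passed in the usage line are the ones named in the steps.

```shell
# Sketch of the three steps above. All parameters are illustrative:
#   $1  path of the downloaded "untar" script
#   $2  user data directory   (normally ~/.local/share/mc)
#   $3  user config directory (normally ~/.config/mc)
#   $4  system-wide mc.ext    (normally /etc/mc/mc.ext)
install_untar() {
    script=$1 data_dir=$2 config_dir=$3 system_ext=$4

    # 1) create the extfs.d directory if it does not exist
    mkdir -p "$data_dir/extfs.d" "$config_dir" || return 1

    # 2) copy the script there and make it executable
    cp "$script" "$data_dir/extfs.d/untar" || return 1
    chmod +x "$data_dir/extfs.d/untar" || return 1

    # 3) start from the system mc.ext if there is no user copy yet,
    #    then replace every "utar" with "untar"
    [ -f "$config_dir/mc.ext" ] || cp "$system_ext" "$config_dir/mc.ext" || return 1
    sed 's/utar/untar/g' "$config_dir/mc.ext" > "$config_dir/mc.ext.new" &&
        mv "$config_dir/mc.ext.new" "$config_dir/mc.ext"
}
```

Usage would be along the lines of `install_untar ./untar "$HOME/.local/share/mc" "$HOME/.config/mc" /etc/mc/mc.ext`. Running it twice is harmless, because "untar" does not contain the string "utar".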
Comment 13 Dirk Weber 2020-10-22 17:37:55 UTC
Created attachment 842943 [details]
improved untar extfs with optional workaround for mc-4.8.25

another improvement cycle...

I noticed the last version stopped working with
mc-4.8.25 in Tumbleweed.
It turns out this version of mc does not
accept file names containing "./", but this is
added as a filename delimiter by the untar extfs module
to allow archive members beginning with e.g. a blank.

So this version contains an alternative line to print
the list output filename without ./ delimiter at the beginning.
This works with mc-4.8.25 - see related comment in the code -
but breaks handling of archive member names beginning with
blank or other problematic characters and therefore
the standard behavior is to use the ./ delimiter.
Switch the comment on lines 90 / 92 for mc-4.8.25 support.
You will notice that you need it when you enter an archive in mc that
contains files, but the panel stays empty.

Besides, for the GNU tar mode this new version uses the --quoting-style
option for better filename delimitation.
Comment 14 Dirk Weber 2021-07-25 09:11:28 UTC
Created attachment 851211 [details]
updated untar extfs (for mc-4.8.26 and below)

The original issue still exists with mc-4.8.26.

But mc-4.8.26 as in openSUSE Leap 15.3 can again work with archive members starting with ./ if it gets them from an extfs module. 
Therefore add ./ to the beginning of filenames when needed, e.g. when the filename starts with  [:blank:] or "-".

This version also works with mc versions < 4.8.26, but with mc-4.8.25 archive members starting with [:blank:] or "-" will be skipped.
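
The rule for adding ./ could be sketched as follows. The function name and the exact set of tested characters are assumptions for illustration; the attached script may decide differently.

```shell
# Sketch: prefix "./" to names that start with a blank (space or tab) or "-".
# Function name and exact rule are assumptions, not the code of the attachment.
protect_name() {
    tab=$(printf '\t')
    case $1 in
        " "*|"$tab"*|-*) printf './%s\n' "$1" ;;
        *)               printf '%s\n' "$1" ;;
    esac
}
```

For example, `protect_name "-rf"` prints `./-rf`, while an ordinary name such as `testdir/file.txt` is printed unchanged.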

Improved escaping and improved usage of tar implementations other than GNU tar, i.e. star and bsdtar.