Bug 540598

Summary: Encoding of cyrillic filenames in zip archive, created under Windows, is incorrect
Product: [openSUSE] openSUSE 11.3 Reporter: Pavel Baranchikov <pavel>
Component: Basesystem Assignee: E-mail List <bnc-team-screening>
Status: VERIFIED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P3 - Medium CC: aj, anixx, bruno, dvaleev, forgotten_2tZuqUHVIE, lazy.kent, meissner, msvec, pth
Version: Final   
Target Milestone: Final   
Hardware: i686   
OS: SUSE Other   
Whiteboard: maint:running:33540:low maint:released:11.2:33674
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: An archive file with cyrillic file names included
file with problem
screenshot of file roller
the same file opened in Ark/KDE3
what I see in the embedded viewer

Description Pavel Baranchikov 2009-09-21 05:38:29 UTC
Created attachment 319015 [details]
An archive file with cyrillic file names included

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.0.13) Gecko/2009080200 SUSE/3.0.13-0.1.2 Firefox/3.0.13

There are several discussions about the problem of Cyrillic filenames in zip archives and the unzip package. Unzip out of the box (compiled from sources) does not choose the filename encoding correctly.

The Ark developers told me that the error lies entirely in the Info-ZIP project (https://bugs.kde.org/show_bug.cgi?id=204984).

There are some patches to Info-ZIP's unzip package that make unzip extract filenames with the correct encoding, but the Info-ZIP maintainers rejected them (http://www.info-zip.org/board/board.pl?m-1248086794).

It would be nice to include the patched package in the main openSUSE distribution.
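The following minimal Python sketch shows what "choosing the encoding" means in practice. It is not the natspec or librcc code. It assumes the archive was made by a Windows tool that stored names in the DOS/OEM code page (typically CP866 on Russian systems) and did not set the ZIP UTF-8 flag (general-purpose bit 11). Python's `zipfile` decodes such names as CP437. Re-encoding them as CP437 therefore recovers the original bytes, which can then be decoded as CP866. The helper name `recover_name` is hypothetical.

```python
import zipfile

# Bit 11 of the ZIP general-purpose flag ("language encoding flag"):
# when set, the stored file name is UTF-8.
UTF8_FLAG = 0x800


def recover_name(info: zipfile.ZipInfo, legacy_encoding: str = "cp866") -> str:
    """Return a member name decoded with a legacy OEM code page.

    Assumption (hypothetical helper): the archive was opened without
    metadata_encoding, so zipfile decoded non-UTF-8 names as cp437.
    """
    if info.flag_bits & UTF8_FLAG:
        # The archiver marked the name as UTF-8; zipfile already decoded it.
        return info.filename
    # Undo zipfile's cp437 decoding to get the raw bytes back...
    raw = info.filename.encode("cp437")
    # ...and decode them with the code page the Windows tool actually used.
    return raw.decode(legacy_encoding)
```

To print readable names for an archive like ReportPacket_DBV90821CJ.zip, call `recover_name(info)` for each entry of `zipfile.ZipFile(path).infolist()`. This works only if the CP866 assumption holds. On Python 3.11 and later, opening the archive with `zipfile.ZipFile(path, metadata_encoding="cp866")` does the same thing directly. In that case, do not apply the CP437 round trip on top.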

Reproducible: Always

Steps to Reproduce:
1. Under Windows, create a zip archive containing files with Cyrillic names.
2. Try to open it with unzip under SUSE.
Actual Results:  
Filename encoding is incorrect. Example:

pavel@pavel:~/tmp> unzip ReportPacket_DBV90821CJ.zip
Archive:  ReportPacket_DBV90821CJ.zip
  inflating: ???????? ????? (????????).pdf
  inflating: ???????? ????? (??????????).pdf


Expected Results:  
Results produced with the natspec patch from Sisyphus:

pavel@rzn-sepak-bpa:~/temp> unzip ReportPacket_DBV90821CJ.zip
Archive:  ReportPacket_DBV90821CJ.zip
  inflating: ????????? ????? (??????????).pdf
  inflating: ????????? ????? (??????????).pdf
Comment 1 Dinar Valeev 2010-03-18 11:18:21 UTC
We have found a solution, but it requires additional libraries to convert file names on the fly.

The library is librcc, created specifically for handling file names that are not UTF-8 encoded.

How can we proceed? The RPM packages are built on OBS and tested. Should we create a submit request?

librcc and the patched unzip are here:
http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_11.2/
Comment 2 Philipp Thomas 2010-03-18 13:45:55 UTC
*** Bug 575715 has been marked as a duplicate of this bug. ***
Comment 3 Alexander Naumov 2010-04-01 14:20:43 UTC
Submit request:

https://build.opensuse.org/stage/request/diff/34833
Comment 4 Forgotten User 2tZuqUHVIE 2010-04-07 12:27:17 UTC
This is also a problem with the letters 'æ ø å' used in some Scandinavian alphabets.

It is also an issue for tar archives created by 7-Zip on Windows.

Unzip 6.0 and the packages from home:/Lazy_Kent/openSUSE_11.2/ mentioned in comment 1 don't change anything on my system (11.2 x86_64).
Comment 5 Kyrill Detinov 2010-05-03 17:28:57 UTC
I made a submit request to Factory:
https://build.opensuse.org/request/diff/39326

Confirmed, it works at least with Russian, Czech and Slovak.
http://lizards.opensuse.org/2010/04/07/call-for-testing-unzip-feature/

% LANG=cs_CZ.utf8 unzip -l test-cz.zip                 
Archive:  test-cz.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
      117  03-18-10 15:24   aábcčdďeéěfghchiíjklmnňoópqrřsštťuúůvwxyýzžAÁBCČDĎEÉĚFGHCHIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ.txt
 --------                   -------
      117                   1 file
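The `LANG=cs_CZ.utf8` example suggests that the patched unzip derives the legacy code page from the user's locale. Below is a rough Python sketch of that idea. The table and the function names (`OEM_CODEPAGES`, `legacy_codepage_for_locale`, `decode_for_locale`) are hypothetical. The table is far smaller than whatever natspec or librcc actually use, and it is not their data.

```python
# Hypothetical, simplified table: locale language -> DOS/OEM code page that
# Windows archivers typically used for file names on such systems.
# Illustration only; this is NOT the natspec or librcc mapping.
OEM_CODEPAGES = {
    "ru": "cp866",  # Russian
    "cs": "cp852",  # Czech
    "sk": "cp852",  # Slovak
    "pl": "cp852",  # Polish
    "hu": "cp852",  # Hungarian
}


def legacy_codepage_for_locale(locale_name: str, default: str = "cp437") -> str:
    """Guess the OEM code page from a POSIX locale such as 'cs_CZ.utf8'."""
    # Drop the encoding (".utf8") and modifier ("@euro"), keep the language.
    language = locale_name.split(".", 1)[0].split("@", 1)[0].split("_", 1)[0]
    return OEM_CODEPAGES.get(language.lower(), default)


def decode_for_locale(raw_name: bytes, locale_name: str) -> str:
    """Decode a raw ZIP file name (UTF-8 flag unset) using the locale's guess."""
    return raw_name.decode(legacy_codepage_for_locale(locale_name))
```

With this table, `legacy_codepage_for_locale("cs_CZ.utf8")` picks CP852 and `"ru_RU.UTF-8"` picks CP866. Unknown locales such as `C` fall back to CP437, which matches plain unzip's behaviour.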
Comment 6 Philipp Thomas 2010-05-04 17:25:46 UTC
I won't accept the patch for openSUSE because upstream doesn't accept it, and openSUSE would have to maintain this patch indefinitely. If this or a similar patch gets accepted upstream, I'll help backport it.
Comment 7 Ilya Chernykh 2010-05-04 17:38:04 UTC
What about switching the package to Sisyphus' patched version? If openSUSE cannot maintain it, let the ALT Linux team do the maintenance and regard them as the upstream of a forked version.
Comment 8 Ilya Chernykh 2010-05-04 17:41:08 UTC
It is really annoying: nobody can open archives made under Windows. Business people say Linux is buggy because it cannot even open archives properly. Government officials say the same.
Comment 9 Dinar Valeev 2010-05-09 23:25:33 UTC
@Philipp
The chances of getting this patch accepted upstream are very small, if not zero. Other distributions have tried without success.

The upstream position is that the trend in IT is to use UTF-8.
That's why the patch is not accepted.
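For context, the ZIP format already has a way to mark UTF-8 names: bit 11 of the general-purpose flag field. Archives from older Windows tools that store names in the OEM code page leave that bit unset, which is what forces unzip to guess. The minimal Python sketch below (the helper `utf8_flags` is hypothetical) shows that Python's own `zipfile` writes pure-ASCII names without the flag and sets it for any other name. A UTF-8-aware writer produces archives that need no guessing.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: file name is UTF-8


def utf8_flags(names):
    """Write empty members to an in-memory zip and report each name's UTF-8 flag."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            # zipfile stores ASCII names as-is and other names as UTF-8 + bit 11.
            zf.writestr(name, b"")
    buffer.seek(0)  # BytesIO stays open: ZipFile does not close passed-in files
    with zipfile.ZipFile(buffer) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}
```

Calling `utf8_flags(["report.txt", "Отчет.txt"])` reports the flag as unset for the ASCII name and set for the Cyrillic one.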

Then why can't we accept this patch as an openSUSE-specific patch, close this annoying bug, and maintain the patch until better times come? openSUSE maintains a number of distribution-specific patches for rpm and OpenOffice.org.

If you won't maintain the patch, please let the community do it.
The patch is small. It introduces a new header and changes a few lines of the main code.

We have tested the patched unzip for two to three months and it just works. We have also received positive feedback for Czech and Slovak in addition to Russian.
It also applies well to the latest unzip version, 6.0.
Comment 10 Philipp Thomas 2010-05-10 17:29:43 UTC
OK, after thinking about this I have added the patch to our unzip. I will keep it at least as long as the package builds and the patch doesn't need extra work. Kyrill, would you be willing to act as co-maintainer? Or, to ask more broadly, would any of you be willing to co-maintain zip/unzip?
I'll also try to get an update for 11.2 out of the door.
Comment 11 Kyrill Detinov 2010-05-10 18:13:48 UTC
Philipp, I made sr#39767.

At the moment librcc0 is only in Factory, so we can build the patched unzip only against Factory.
I wrapped all the changes in %if 0%{?suse_version} > 1120.

> would you be willing to act as co-maintainer?

Yes, I'd like to take this role.
Comment 12 Marcus Meissner 2010-05-20 15:24:26 UTC
Do we really want to take two new libraries for 11.2? Not sure.
Comment 13 Bruno Friedmann 2010-05-20 16:46:27 UTC
In reply to comment 12:
More and more customers are receiving zip files in different encodings, and it's really a pain to tell them "oh, this zip should be unzipped under Windows to get the right encoding". We look like clowns.

Since 11.2 has a long life ahead of it, yes, I vote for including this as fast as possible. The bug started under 11.2, so I feel it's better to close it on 11.2 and make sure the fix is integrated in 11.3.

Or (I'm only seeing my part of the world :-) ) there are more complicated implications; if so, they should be explained.
Comment 14 Christian Dengler 2010-05-25 13:43:06 UTC
I'm not happy about adding two new libraries to a released product, but in this case I think it should be fine if someone will maintain them. (+1)
Comment 15 Marcus Meissner 2010-05-25 13:49:35 UTC
So let's do it. :)
Comment 16 Swamp Workflow Management 2010-05-25 13:56:21 UTC
The SWAMPID for this issue is 33540.
This issue was rated as low.
Please submit fixed packages as soon as possible.
Also create a patchinfo file using this link:
https://swamp.suse.de/webswamp/wf/33540
Comment 17 Christian Dengler 2010-05-25 14:00:53 UTC
Update process started ... be so kind and submit fixed sources and a patchinfo.
Comment 18 Philipp Thomas 2010-05-26 09:24:54 UTC
@Marcus: which is the second new library? unzip only needs librcc0.
Comment 19 Marcus Meissner 2010-05-26 09:27:04 UTC
librcc however requires librcd
Comment 20 Swamp Workflow Management 2010-06-17 12:22:32 UTC
Update released for: librcc-devel, librcc0, librcc0-debuginfo, librcc0-debugsource, librcd-devel, librcd0, librcd0-debuginfo, librcd0-debugsource, rcc-runtime, rcc-runtime-debuginfo, unzip, unzip-debuginfo, unzip-debugsource
Products:
openSUSE 11.2 (debug, i586, x86_64)
Comment 21 Christian Dengler 2010-06-17 12:24:15 UTC
Update released after a long testing phase in the test update channel.

Closing.
Comment 22 Ilya Chernykh 2010-09-22 19:37:56 UTC
Still does not work in File Roller under openSUSE 11.3.
Comment 23 Ilya Chernykh 2010-09-22 19:39:57 UTC
Created attachment 391041 [details]
file with problem
Comment 24 Ilya Chernykh 2010-09-22 19:40:25 UTC
Created attachment 391042 [details]
screenshot of file roller
Comment 25 Ilya Chernykh 2010-09-22 19:40:53 UTC
The same file (bug.zip) opens well with Ark from KDE3.
Comment 26 Ilya Chernykh 2010-09-22 19:42:18 UTC
Created attachment 391043 [details]
the same file opened in Ark/KDE3
Comment 27 Kyrill Detinov 2010-09-23 04:10:46 UTC
Works OK.

% unzip -l bug-540598_bug.zip
Archive:  bug-540598_bug.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
    72704  09-20-10 23:11   Коммерческое предложение..doc
   388608  09-20-10 23:11   прайс на палатки и снаряжение14.09.2010.xls
 --------                   -------
   461312                   2 files

Open a bug against File Roller. No problem with unzip.
Comment 28 Ilya Chernykh 2010-09-23 04:38:51 UTC
Does File Roller use unzip in this case?
Comment 29 Kyrill Detinov 2010-09-23 14:36:25 UTC
It should use unzip. But I found an interesting bugreport:
https://bugzilla.gnome.org/show_bug.cgi?id=611257
Comment 30 Ilya Chernykh 2010-09-23 15:03:16 UTC
Removed p7zip. Now everything is fine in File Roller, but the built-in archive viewer in KDE3 still shows garbage (everything is fine in Ark).
Comment 31 Ilya Chernykh 2010-09-23 15:05:08 UTC
Removed p7zip. Now everything is OK in File Roller, but the embedded viewer in KDE3 still shows garbage (everything is OK in Ark).
Comment 32 Ilya Chernykh 2010-09-23 15:05:32 UTC
Created attachment 391237 [details]
what I see in the embedded viewer
Comment 33 Kyrill Detinov 2010-09-23 16:43:58 UTC
Same here. Krusader 1.90.0 shows the file names correctly.
As you know, nobody is interested in fixing KDE3 bugs.
Comment 34 Ilya Chernykh 2010-09-23 17:36:49 UTC
Maybe this bug is fixed in Trinity. If not, it is possible to make a bugreport.
Comment 35 Bernhard Wiedemann 2016-04-15 09:54:11 UTC
This is an autogenerated message for OBS integration:
This bug (540598) was mentioned in
https://build.opensuse.org/request/show/39794 Factory / unzip
https://build.opensuse.org/request/show/40783 11.2 / librcd0
https://build.opensuse.org/request/show/40784 11.2 / librcc0
https://build.opensuse.org/request/show/40785 11.2 / unzip
https://build.opensuse.org/request/show/40799 11.2:Test / unzip