Bug 135497

Summary: perl decode_utf8 function broken?
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Bjoern Jacke <bjacke>
Component: Basesystem Assignee: Michael Schröder <mls>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: andreas.hanke, lmuelle, mls, torgen
Version: Final   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: patch to make convmv work with Perl 5.8.7

Description Bjoern Jacke 2005-11-24 17:37:33 UTC
convmv on SuSE 10.0 (perl 5.8.7) is broken in its UTF-8 detection routines:

in a directory with a latin1-encoded file named "für" do this:

# convmv -f latin1 -t utf8 -r für
Skipping, already UTF-8: ./für

though "für" is not UTF-8 encoded. This does not look like a convmv bug to me but like a malfunction of Perl's decode_utf8 which is used in convmv's looks_like_utf8 function. This used to work with all perl versions having the necessary unicode support (5.8.0 - 5.8.6).
decode_utf8 of a non-utf-8 string does not seem to return "false".
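
The behaviour described above can be sketched as follows. This is a minimal illustration, assuming a reasonably current Encode and the default CHECK value; the variable names are made up for the example and are not taken from convmv. It suggests why the truth value of the result is a poor validity signal in either direction: malformed input can still yield a true value, and valid input can yield a false one.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# "für" encoded as Latin-1: the byte 0xFC is not valid UTF-8 here.
my $latin1_bytes = "f\xFCr";
my $decoded      = decode_utf8($latin1_bytes);

# With the default CHECK value the malformed byte is substituted rather
# than rejected, so the result is a non-empty (true) string.
print "malformed input still yields a true value\n" if $decoded;
print "result contains U+FFFD\n" if $decoded =~ /\x{FFFD}/;

# Conversely, perfectly valid input can decode to a false value.
print "empty string decodes to a false value\n" unless decode_utf8("");
print "'0' decodes to a false value\n"          unless decode_utf8("0");
```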
Comment 1 Michael Schröder 2005-11-29 10:42:05 UTC
This was on p5porters yesterday, they said that it is a bug to assume that decode_utf8 returns false on error. Not a perl bug.
Reassigned to convmv maintainer.
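
One way to avoid relying on the return value is to ask Encode to die on malformed input and trap the error. The sketch below assumes that approach; `is_valid_utf8` is a hypothetical helper name, not convmv's actual looks_like_utf8 and not the content of the attached patch.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # Work on a copy: with a CHECK value set, decode may modify its argument.
    my $copy = $bytes;
    # FB_CROAK makes decode die on malformed data; the trailing 1 makes the
    # eval return true only if no exception was thrown.
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Latin-1 "für", UTF-8 "für", and the empty string.
for my $name ("f\xFCr", "f\xC3\xBCr", "") {
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $name;
    printf "[%s] %s\n", $hex, is_valid_utf8($name) ? 'valid' : 'invalid';
}
```

Because the verdict comes from whether an exception was thrown, the empty string and "0" are reported as valid even though they decode to false values.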
Comment 2 Bjoern Jacke 2005-11-29 11:17:04 UTC
from man perluniintro:

       o   How Do I Detect Data That's Not Valid In a Particular Encoding?

           Use the "Encode" package to try converting it.  For example,

               use Encode 'decode_utf8';
               if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
                   # valid
               } else {
                   # invalid
               }

If the Perl (::Encode) hackers change the behaviour of their functions whenever they like and do not even change the documentation, which explicitly says that the function should behave this (old) way ... cool. Please drop Perl ;-)
Comment 3 Michael Schröder 2005-11-29 11:22:21 UTC
Good point. I'll ask them.
Comment 4 Bjoern Jacke 2005-12-08 19:31:57 UTC
reassign to perl maintainer, whatever you want to do :-)
For Mike - as a workaround, please try the attached diff (the 2 hunks might have some offsets, sorry) to make convmv work with the broken decode_utf8
Comment 5 Bjoern Jacke 2005-12-08 19:33:37 UTC
Created attachment 60134 [details]
patch to make convmv work with Perl 5.8.7
Comment 6 Bjoern Jacke 2005-12-09 17:13:16 UTC
Having extended the regression test, I found that another small fix in the workaround was needed. 1.09 is out right now, fixing exactly this issue.
But what should be fixed in the end is the Perl bug...
Comment 7 Mike Fabian 2005-12-12 13:43:04 UTC
NEEDINFO → Andreas Jaeger <aj@suse.de>.

Andreas, can we make a YOU update from convmv 1.08 to convmv 1.09 for
SuSE Linux 10.0?

Currently convmv doesn't work at all on SuSE Linux 10.0, with
Björn's update it works again.

Comment 8 Andreas Jaeger 2005-12-12 13:57:48 UTC
Let's fix this for 10.1, only.
Comment 9 Mike Fabian 2005-12-12 14:14:44 UTC
I already submitted convmv 1.09 to STABLE, i.e. this is already fixed
for 10.1. And I uploaded updated packages of convmv to

    ftp://ftp.suse.com/pub/projects/m17n/10.0/

But why not fix it for 10.0? It's quite easy, isn't it?
And currently convmv is useless on 10.0.

Comment 11 Andreas Jaeger 2005-12-14 08:17:25 UTC
Mike, where is convmv used?  I don't see this as a critical component.  Our policy is to ship security fixes and critical bug fixes only; we cannot fix every bug that's out there and was not caught during the beta phase.
Comment 12 Andreas Jaeger 2005-12-16 14:40:11 UTC
Update approved after some offline discussion - swamp ID is Maintenance-Tracker-3207
Comment 13 Mike Fabian 2005-12-16 16:08:49 UTC
Thank you very much!

Reassign to me to do the update of convmv.

Comment 14 Mike Fabian 2005-12-20 11:53:48 UTC
Update with patchinfo submitted to 10.0.
Comment 15 Mike Fabian 2005-12-20 11:54:34 UTC
Reassign to Michael Schröder <mls@suse.de> because the bug
in perl still remains.
Comment 16 Anja Stock 2006-01-09 09:30:20 UTC
released
Comment 17 Bjoern Jacke 2006-01-09 10:39:05 UTC
reopen: I don't think a fix for this bug (Perl's decode_utf8 function) has been released.
Comment 18 Michael Schröder 2006-01-09 10:41:38 UTC
You have to convince the perl gurus first that this is a perl bug.
Comment 19 Anja Stock 2006-02-24 10:21:37 UTC
are they convinced now? :-)
Comment 20 Bjoern Jacke 2006-02-24 16:39:13 UTC
they obviously have plenty of time and don't care so much about breaking applications.
Comment 21 Bjoern Jacke 2006-10-16 16:05:39 UTC
this change has been reverted upstream in the meantime. It's up to you now whether you provide fixed RPMs or not.
Comment 22 Michael Schröder 2006-10-16 16:22:47 UTC
Dan has reverted it? I didn't see this on p5p. Is this in blead or in the maintenance tree?
Comment 23 Bjoern Jacke 2006-10-16 19:25:17 UTC
http://rt.cpan.org/Public/Bug/Display.html?id=16698
points to RT #14559, which says it's fixed in Encode 2.13. However, I didn't cross-check whether that's really true.
Comment 24 Stephan Kulow 2007-09-29 08:39:44 UTC
Closing old bugs. >=10.1 is fixed from what I understand