Bugzilla – Bug 135497
perl decode_utf8 function broken?
Last modified: 2007-09-29 08:39:44 UTC
Using convmv on SuSE 10.0 (perl 5.8.7) is broken because of its UTF-8 detection routines. In a directory with a latin1-encoded file named "für", do this:

    # convmv -f latin1 -t utf8 -r für
    Skipping, already UTF-8: ./für

though "für" is not UTF-8 encoded. This does not look like a convmv bug to me but like a malfunction of Perl's decode_utf8, which is used in convmv's looks_like_utf8 function. This used to work with all perl versions having the necessary unicode support (5.8.0 - 5.8.6). decode_utf8 of a non-UTF-8 string does not seem to return "false".
This was on p5porters yesterday, they said that it is a bug to assume that decode_utf8 returns false on error. Not a perl bug. Reassigned to convmv maintainer.
From man perluniintro:

    o How Do I Detect Data That's Not Valid In a Particular Encoding?

      Use the "Encode" package to try converting it. For example,

          use Encode 'decode_utf8';
          if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
              # valid
          } else {
              # invalid
          }

If the Perl (::Encode) hackers change the behaviour of their functions whenever they like and do not even change the documentation, which explicitly says that the function should behave this (old) way ... cool. Please drop Perl ;-)
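For reference, a check that does not depend on the truthiness of decode_utf8's return value could look roughly like the sketch below. It asks Encode to die on malformed input (Encode::FB_CROAK) and catches that with eval. The helper name is_valid_utf8 is hypothetical and this is not convmv's actual code or the attached patch, just one way to do it under those assumptions. Note that the truthiness test from perluniintro also misclassifies valid input such as "" or "0", since those decode to false values.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not taken from convmv): returns 1 if $bytes is
# well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode() with a CHECK value may modify its input, so work on a copy.
    my $copy = $bytes;
    # FB_CROAK makes decode() die on malformed data; eval turns that into false.
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# "für" as latin1 bytes and as UTF-8 bytes
for my $name ("f\xFCr", "f\xC3\xBCr") {
    printf "%vX: %s\n", $name, is_valid_utf8($name) ? "valid" : "invalid";
}
```

With a check along these lines the result depends only on whether decoding fails, not on what the decoded string happens to evaluate to.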
Good point. I'll ask them.
reassign to perl maintainer, whatever you want to do :-) For Mike - as a workaround, please try the attached diff (the 2 hunks might have some offsets, sorry) to make convmv work with the broken decode_utf8
Created attachment 60134 [details] patch to make convmv work with Perl 5.8.7
Having extended the regression test, I found that another small fix in the workaround was needed. 1.09 is out right now, fixing exactly this issue. But what should be fixed in the end is the Perl bug...
NEEDINFO → Andreas Jaeger <aj@suse.de>. Andreas, can we make a YOU update from convmv 1.08 to convmv 1.09 for SuSE Linux 10.0? Currently convmv doesn't work at all on SuSE Linux 10.0, with Björn's update it works again.
Let's fix this for 10.1, only.
I already submitted convmv 1.09 to STABLE, i.e. this is already fixed for 10.1. And I uploaded updated packages of convmv to ftp://ftp.suse.com/pub/projects/m17n/10.0/ But why not fix it for 10.0? It's quite easy, isn't it? And currently convmv is useless on 10.0.
Mike, where is convmv used? I don't see this as a critical component. Our policy is to fix security issues and critical bugs only; we cannot fix every bug that's out there and was not caught during the beta phase.
Update approved after some offline discussion - swamp ID is Maintenance-Tracker-3207
Thank you very much! Reassign to me to do the update of convmv.
Update with patchinfo submitted to 10.0.
Reassign to Michael Schröder <mls@suse.de> because the bug in perl still remains.
released
Reopen: I guess no fix for this bug (Perl decode_utf8 function) has been released.
You have to convince the perl gurus first that this is a perl bug.
are they convinced now? :-)
they obviously have plenty of time and don't care so much about breaking applications.
this change has been reverted upstream in the meantime. It's up to you now whether you provide fixed RPMs or not.
Dan has reverted it? I didn't see this on p5p. Is this in blead or in the maintenance tree?
http://rt.cpan.org/Public/Bug/Display.html?id=16698 points to RT #14559 which says it's fixed in Encode 2.13. However I didn't cross check that's really true.
Closing old bugs. >=10.1 is fixed from what I understand