Bug 100116

Summary: encoding bug
Product: [SUSE Tools] TaskJuggler Reporter: Maxime Delorme <mdelorme>
Component: HTML-Reports Assignee: Chris Schlaeger <cs>
Status: RESOLVED FIXED QA Contact: Chris Schlaeger <cs>
Severity: Normal    
Priority: P5 - None    
Version: TaskJuggler   
Target Milestone: ---   
Hardware: i686   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: My tjp file
the right one

Description Maxime Delorme 2005-08-02 08:59:10 UTC
I'm on Mandriva 10.2, using an ISO-8859-15 locale.
When my tjp file is encoded in UTF-8
and I write a task name in French, such as "Amélioration du contrôle",
I get "Amélioration du contrÎle" in the HTML report.
However, in the rawhead section I can write accented characters and
everything is OK.

When my tjp file is encoded in ISO-8859-15
and I write a task name in French, such as "Amélioration du contrôle",
I get "Amélioration du contrôle" in the HTML report,
but accented characters written in the rawhead section come out garbled.

So I can't have both the task names and the rawhead in the right format at the same time.
Comment 1 Chris Schlaeger 2005-08-05 01:55:08 UTC
TaskJuggler does not try to detect the encoding of your .tjp files. It assumes 
that the file is encoded in the same encoding as your locale. So if your file 
is UTF-8, you need to set your locale to UTF-8 as well. Then the encoding 
problems should be gone. 
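
To illustrate the mismatch described above, here is a minimal Python sketch (not TaskJuggler code; the variable names are made up for the example). It assumes the reader simply decodes the file bytes with the locale's encoding, which is how the comment above describes the behaviour.

```python
# Hypothetical illustration: the same task name, saved as UTF-8,
# then decoded under two different assumed encodings.
task_name = "Amélioration du contrôle"
utf8_bytes = task_name.encode("utf-8")

# The reader assumes the locale encoding (ISO-8859-15) although the file is UTF-8:
# every two-byte UTF-8 sequence turns into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-15")

# The reader assumes UTF-8, matching the file: the text survives unchanged.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)
```

The first line printed is garbled, the second is the original name, which is why the locale and the file encoding have to agree.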
Comment 2 Maxime Delorme 2005-08-05 07:42:13 UTC
When I encode my .tjp file in ISO-8859-15 I still have a bug.
I can't write
rawhead
    '<table border="1">
    <tr>
        <td><a href="Tasks-Overview.html">Tâches hebdo</a></td>
...
'
but instead have to write
rawhead
    '<table border="1">
    <tr>
        <td><a href="Tasks-Overview.html">T&acirc;ches hebdo</a></td>
...'

Otherwise the word "Tâches" appears as "T�ches hebdo", which is what the
browser displays when using UTF-8. The browser is forced to use this encoding
because you write the following at the top of the HTML reports:

<head>
<title>Task Report</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
But the task name is converted correctly: you translate "contrôle" into
"contr&#x00f4;le", which displays correctly in UTF-8 or any other encoding,
for my task:
task CT "contrôle" {

}
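
The difference between the two cases can be reproduced outside TaskJuggler. The following Python sketch is only an illustration under assumptions: `to_char_refs` is a hypothetical helper that mimics the hex escaping seen in the report output, and the "browser" is modelled as a plain UTF-8 decode that substitutes U+FFFD for invalid bytes.

```python
import html

raw_head = "Tâches hebdo"   # text copied verbatim from rawhead
task_name = "contrôle"      # text that the report escapes

# rawhead is written as raw ISO-8859-15 bytes, but the page declares charset=utf-8.
latin9_bytes = raw_head.encode("iso-8859-15")
# A UTF-8 decoder meets the lone byte 0xE2 and replaces it with U+FFFD.
as_seen = latin9_bytes.decode("utf-8", errors="replace")


def to_char_refs(text):
    """Replace every non-ASCII character with a hex numeric character reference."""
    return "".join(ch if ord(ch) < 128 else "&#x%04x;" % ord(ch) for ch in text)


# The escaped form is pure ASCII, so the declared charset no longer matters.
escaped = to_char_refs(task_name)

print(as_seen)
print(escaped)
print(html.unescape(escaped))
```

Applying the same escaping to the rawhead text would sidestep the problem, at the cost of having to leave the markup characters themselves untouched.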
Comment 3 Chris Schlaeger 2005-08-14 02:24:21 UTC
Can you send me the output of 'locate' and 'file <yourproject.tjp>'? 
Comment 4 Maxime Delorme 2005-08-16 08:03:53 UTC
Created attachment 46126 [details]
My tjp file
Comment 5 Maxime Delorme 2005-08-16 08:05:28 UTC
Well, I suppose you meant "locale" and not "locate",
so:
[Max]$ locale
LANG=fr_FR
LC_CTYPE=fr_FR
LC_NUMERIC=fr_FR
LC_TIME=fr_FR
LC_COLLATE=fr_FR
LC_MONETARY=fr_FR
LC_MESSAGES=fr_FR
LC_PAPER=fr_FR
LC_NAME=fr_FR
LC_ADDRESS=fr_FR
LC_TELEPHONE=fr_FR
LC_MEASUREMENT=fr_FR
LC_IDENTIFICATION=fr_FR
LC_ALL=
Comment 6 Chris Schlaeger 2005-08-18 17:42:24 UTC
That all looks fine and I can't reproduce the problem here on my SUSE Linux   
9.3. When I set LC_ALL to fr_FR and process the ISO-8859-15 file (attachment id 
46126) the HTML reports look fine. I could not find an accented character that 
was wrong. 
 
When I try the UTF-8 file and set LC_ALL to fr_FR.UTF-8 again the HTML reports 
look fine. 
 
So as long as your locale matches the file encoding you should be good. 
Comment 7 Maxime Delorme 2005-08-18 18:15:16 UTC
Created attachment 46590 [details]
the right one

I removed the HTML entities from the rawhead section.
Comment 8 Sergey Kogan 2005-11-16 11:52:26 UTC
I think I've run into this problem when using a Russian (koi8-r) encoding for my project.

This problem is caused by using latin-1 output transformations at least in macros and HTML report code. Try "grep -i latin1 *.cpp" to see what is going on.

I've hacked around this problem by replacing

- s.setEncoding(QTextStream::Latin1);                                        

with
+ s.setEncoding(QTextStream::UnicodeUTF8);

all around in reports/report elements.

This is not a clean solution (some HTML post-processing is needed), but it lets me use Cyrillic in tjp files and get readable reports.
Comment 9 Chris Schlaeger 2005-11-18 15:18:30 UTC
The HTML files are always latin1 since non-ASCII characters are hex encoded. If you want to use non-ASCII characters you have to use a UTF8 locale and encode your files in UTF8. Other encodings will not work properly.
Comment 10 Maxime Delorme 2005-11-18 15:39:36 UTC
This bug is not closed!
You are mistaken; have you read comment #2?
If you produce HTML reports with <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>,
you must make sure that the user's data follows this charset. But on a machine with an ISO-8859-15 locale, the user's data is in that charset, so you must provide a conversion.

Otherwise your software is broken, I'm afraid.
Of course there are ways to work around it, but that's not the right way.
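
As a rough sketch of the kind of conversion meant here (written in Python for brevity, not taken from TaskJuggler; `transcode_to_utf8` is a hypothetical helper and the source encoding is passed in explicitly rather than read from the locale):

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the user's encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# A rawhead fragment as it would be stored in an ISO-8859-15 .tjp file.
raw_head_latin9 = (
    '<td><a href="Tasks-Overview.html">Tâches hebdo</a></td>'
).encode("iso-8859-15")

# After conversion the bytes agree with the charset=utf-8 declaration.
raw_head_utf8 = transcode_to_utf8(raw_head_latin9, "iso-8859-15")

print(raw_head_utf8.decode("utf-8"))
```

With such a step applied to rawhead before it is written out, the declared charset and the actual bytes would match regardless of the locale the file was authored in.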