Bug 144726

Summary: bad codepages for russian man pages
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Aleksandr Shubnik <a.shubnik>
Component: ConsoleApps Assignee: Mike Fabian <mfabian>
Status: VERIFIED FIXED QA Contact: Karl Eichwalder <ke>
Severity: Normal    
Priority: P5 - None CC: coolo, ke
Version: Final   
Target Milestone: ---   
Hardware: Other   
OS: SuSE Linux 10.0   
Whiteboard:
Found By: Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: utf8.patch

Description Aleksandr Shubnik 2006-01-23 08:09:19 UTC
I use openSUSE 10.0 and like the system very much, but I found
one bug in the man subsystem of your distribution. The /usr/share/man/ru/
directory contains Russian versions of some man pages, but these
files are in various codepages rather than all in KOI8-R. Only
those Russian man pages that are in KOI8-R are readable on the screen
(even though the system uses UTF-8 encoding). So if all of your Russian
man pages are converted to KOI8-R, everything should be fine. I hope you
can fix this little bug in the 10.1 release.
   Aleksandr Shubnik <alshu@tut.by>
Comment 1 Karl Eichwalder 2006-01-23 09:04:11 UTC
Mike, this might be a groff limitation?
Comment 2 Mike Fabian 2006-01-23 11:08:19 UTC
Yes, groff still doesn't support UTF-8 input. Therefore, the
sources of the man-pages have to be in legacy encodings
(ISO-8859-15 for German, KOI8-R for Russian, EUC-JP for Japanese, ...).

/usr/bin/nroff is patched to call groff in the appropriate legacy
locale and convert the result to UTF-8. But of course the sources of
the man-pages need to be encoded in the "right" legacy encoding then.
For Russian this should be KOI8-R.
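
For illustration only, the idea can be sketched in Python as follows. This
is not the actual /usr/bin/nroff wrapper. The table only lists the three
encodings named above; the function names and the ISO-8859-1 fallback are
assumptions made for this sketch.

```python
# Hypothetical sketch: pick the legacy encoding for a language and
# convert formatter output from that encoding to UTF-8.

# Only the encodings mentioned in this report; a real table would be longer.
LEGACY_ENCODINGS = {
    "de": "iso8859-15",
    "ru": "koi8-r",
    "ja": "euc-jp",
}


def legacy_encoding_for(lang: str, default: str = "iso8859-1") -> str:
    """Return the legacy encoding for a locale name such as 'ru_RU.UTF-8'.

    The default is an assumption for this sketch, not taken from the patch.
    """
    language = lang.split("_")[0].lower()
    return LEGACY_ENCODINGS.get(language, default)


def legacy_output_to_utf8(raw: bytes, lang: str) -> bytes:
    """Re-encode formatter output from the legacy encoding to UTF-8."""
    text = raw.decode(legacy_encoding_for(lang), errors="replace")
    return text.encode("utf-8")
```

The conversion only gives readable output if the source really was in the
expected legacy encoding, which is why the encoding of the sources matters.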

To fix the problem for Russian now, we should make sure that all
sources for Russian man-pages are KOI8-R encoded. 

As soon as groff supports UTF-8 input, we will convert the sources of
all man-pages to UTF-8. But right now this is not possible. 
Comment 3 Mike Fabian 2006-01-23 11:36:39 UTC
I made a new bugreport (bug #144766) for usr/share/man/ru/man1/artsmessage.1.gz
because this file seems to be completely unusable for groff.
Comment 4 Mike Fabian 2006-01-23 17:39:19 UTC
I submitted the groff package with a workaround to allow
man-page sources to be in UTF-8.

-------------------------------------------------------------------
Mon Jan 23 18:31:45 CET 2006 - mfabian@suse.de

- Bugzilla #144726: add workaround to allow UTF-8 encoded sources
  of man-pages. Some packages already contain man-pages with
  UTF-8 encoded man-page sources, for example "mc". Hopefully
  one day groff will really support this. Until then a workaround
  is better than nothing.

-------------------------------------------------------------------
Comment 5 Mike Fabian 2006-01-23 17:40:21 UTC
Created attachment 64580
utf8.patch

extended utf8.patch to allow the source of man-pages to be UTF-8 encoded.
Comment 6 Mike Fabian 2006-01-23 17:42:09 UTC
The new part of this patch checks whether the input received by
/usr/bin/nroff is already UTF-8 encoded. If yes, it converts the input
back to the appropriate legacy encoding before feeding it into groff.

Of course this will discard characters which cannot be converted back
to the legacy encoding, but I think it is better than nothing.
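
A rough Python sketch of that check, for illustration only. It is not the
code in utf8.patch; the function name and the KOI8-R default are
assumptions. Input that decodes cleanly as UTF-8 is converted back to the
legacy encoding, and anything else is passed through unchanged.

```python
# Hypothetical sketch of the "is the input already UTF-8?" check.


def to_legacy(raw: bytes, legacy: str = "koi8-r") -> bytes:
    """Return man-page source bytes in the legacy encoding.

    If the input is valid UTF-8 it is re-encoded to the legacy encoding,
    silently dropping characters that have no legacy equivalent.
    Otherwise it is assumed to be in the legacy encoding already.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: treat it as legacy-encoded and leave it alone.
        return raw
    # errors="ignore" discards unconvertible characters.
    return text.encode(legacy, errors="ignore")
```

Pure ASCII input is valid UTF-8 as well, but converting it is harmless
because the bytes come out unchanged.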


Comment 7 Mike Fabian 2006-01-23 17:43:11 UTC
Closing as FIXED.