Bugzilla – Bug 144726
bad codepages for russian man pages
Last modified: 2006-02-08 09:43:44 UTC
I use OpenSuSE 10.0 and I like this system very much, but I found a bug in the man subsystem of your distribution. The directory /usr/share/man/ru/ contains Russian versions of some man pages, but these files are in various codepages rather than KOI8-R. Only the Russian man pages encoded in KOI8-R display correctly on screen, even though the system uses UTF-8. Therefore, if all of your Russian man pages were converted to KOI8-R, everything should work. I hope you can fix this little bug in the 10.1 release. Aleksandr Shubnik <alshu@tut.by>
Mike, this might be a groff limitation?
Yes, groff still doesn't support UTF-8 input. Therefore, the sources of the man-pages have to be in legacy encodings (ISO-8859-15 for German, KOI8-R for Russian, EUC-JP for Japanese, ...). /usr/bin/nroff is patched to call groff in the appropriate legacy locale and convert the result to UTF-8. But of course the sources of the man-pages need to be encoded in the "right" legacy encoding then. For Russian this should be KOI8-R. To fix the problem for Russian now, we should make sure that all sources for Russian man-pages are KOI8-R encoded. As soon as groff supports UTF-8 input, we will convert the sources of all man-pages to UTF-8. But right now this is not possible.
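The wrapper behaviour and the required re-encoding are described only in prose here, so the following is a minimal, hypothetical Python sketch and not the actual patched /usr/bin/nroff. It maps a man-page language directory to the legacy encodings named above, converts groff's legacy-encoded output to UTF-8, and re-encodes a mis-encoded Russian source to KOI8-R. The names `LEGACY_ENCODING`, `legacy_encoding_for`, `recode_source` and `to_utf8` are illustrative assumptions, and `recode_source` assumes the original codepage of a broken page is already known (for example CP1251, which the report does not actually confirm).

```python
from typing import Optional

# Legacy encodings mentioned in this report, keyed by man-page language
# directory. Illustrative only; a real table would cover more languages.
LEGACY_ENCODING = {
    "de": "iso-8859-15",
    "ru": "koi8-r",
    "ja": "euc-jp",
}


def legacy_encoding_for(man_path: str) -> Optional[str]:
    """Guess the legacy encoding from a path like .../man/<lang>/manN/page.

    Returns None for untranslated pages such as /usr/share/man/man1/ls.1.gz.
    """
    parts = man_path.strip("/").split("/")
    for i, part in enumerate(parts[:-2]):
        if part == "man":
            return LEGACY_ENCODING.get(parts[i + 1])
    return None


def recode_source(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode a man-page source, e.g. from an assumed cp1251 to koi8-r."""
    return data.decode(source_encoding).encode(target_encoding)


def to_utf8(groff_output: bytes, legacy_encoding: str) -> bytes:
    """Convert output produced by groff in a legacy locale to UTF-8."""
    return groff_output.decode(legacy_encoding).encode("utf-8")
```

Note that `recode_source` only works if the source encoding is guessed correctly. Any byte sequence decodes as a single-byte codepage, so a wrong guess yields garbage rather than an error.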
I filed a new bug report (bug #144766) for usr/share/man/ru/man1/artsmessage.1.gz because this file seems to be completely unusable with groff.
I submitted the groff package with a workaround to allow man-page sources to be in UTF-8:

-------------------------------------------------------------------
Mon Jan 23 18:31:45 CET 2006 - mfabian@suse.de

- Bugzilla #144726: add workaround to allow UTF-8 encoded sources
  of man-pages. Some packages already contain man-pages with UTF-8
  encoded man-page sources, for example "mc". Hopefully one day
  groff will really support this. Until then a workaround is better
  than nothing.

-------------------------------------------------------------------
Created attachment 64580: utf8.patch

Extended utf8.patch to allow the sources of man-pages to be UTF-8 encoded.
The new part of this patch checks whether the input received by /usr/bin/nroff is already UTF-8 encoded. If so, it converts the input back to the appropriate legacy encoding before feeding it into groff. Of course, this discards characters that cannot be represented in the legacy encoding, but I think it is better than nothing.
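The input-side check can be sketched as follows. This is a hypothetical Python illustration of the logic described above, not the code in utf8.patch. The function name `prepare_groff_input` is an assumption. It treats "decodes cleanly as UTF-8" as the test for UTF-8 input, which is a heuristic: legacy-encoded Cyrillic text is rarely valid UTF-8, but this is not guaranteed.

```python
def prepare_groff_input(data: bytes, legacy_encoding: str) -> bytes:
    """Hypothetical sketch of the nroff input workaround.

    If the input is valid UTF-8, convert it back to the legacy encoding,
    silently dropping characters the legacy encoding cannot represent.
    Otherwise assume it is already in the legacy encoding and pass it through.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the source is already legacy-encoded.
        return data
    # errors="ignore" mirrors the "discard unconvertible characters" behaviour.
    return text.encode(legacy_encoding, errors="ignore")
```

Pure ASCII input is valid UTF-8 and round-trips unchanged, so plain English man pages are unaffected by this check.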
Closing as FIXED.