Bug 144726 - bad codepages for russian man pages
Summary: bad codepages for russian man pages
Status: VERIFIED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: ConsoleApps
Version: Final
Hardware: Other SuSE Linux 10.0
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Mike Fabian
QA Contact: Karl Eichwalder
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-23 08:09 UTC by Aleksandr Shubnik
Modified: 2006-02-08 09:43 UTC

See Also:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
utf8.patch (3.71 KB, patch)
2006-01-23 17:40 UTC, Mike Fabian

Description Aleksandr Shubnik 2006-01-23 08:09:19 UTC
I use openSUSE 10.0 and I like this system very much. But I found
one bug in the man subsystem of your distribution. The /usr/share/man/ru/
directory contains Russian versions of some man pages, but these files
are in various codepages rather than all in KOI8-R. At the same time,
only the Russian man pages that are in KOI8-R are readable on the screen
(even though the system uses UTF-8 encoding). So if all of your Russian
man pages were converted to KOI8-R, everything should be fine. I hope
you can fix this little bug in the 10.1 release.
   Aleksandr Shubnik <alshu@tut.by>
Comment 1 Karl Eichwalder 2006-01-23 09:04:11 UTC
Mike, this might be a groff limitation?
Comment 2 Mike Fabian 2006-01-23 11:08:19 UTC
Yes, groff still doesn't support UTF-8 input. Therefore, the
sources of the man-pages have to be in legacy encodings
(ISO-8859-15 for German, KOI8-R for Russian, EUC-JP for Japanese, ...).

/usr/bin/nroff is patched to call groff in the appropriate legacy
locale and convert the result to UTF-8. But of course the sources of
the man-pages need to be encoded in the "right" legacy encoding then.
For Russian this should be KOI8-R.
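
To illustrate the last step of that wrapper, here is a minimal Python sketch of mapping a locale to its legacy encoding and converting formatter output to UTF-8. It is not the actual /usr/bin/nroff patch (which is not shown in this report); the names `LEGACY_ENCODINGS`, `legacy_encoding_for` and `groff_output_to_utf8` are hypothetical, the table only lists the three encodings named above, and the ISO-8859-1 fallback is an assumption.

```python
# Legacy encodings named in the comment above. Other languages would need
# their own entries; this table is illustrative, not the real nroff patch.
LEGACY_ENCODINGS = {
    "de": "iso8859-15",
    "ru": "koi8-r",
    "ja": "euc-jp",
}


def legacy_encoding_for(locale_name: str, default: str = "iso8859-1") -> str:
    """Return the legacy encoding for a locale such as 'ru_RU.UTF-8'.

    The default is an assumption made for this sketch only.
    """
    # Keep only the language part: 'ru_RU.UTF-8' -> 'ru', 'ru.UTF-8' -> 'ru'.
    lang = locale_name.split("_")[0].split(".")[0].lower()
    return LEGACY_ENCODINGS.get(lang, default)


def groff_output_to_utf8(raw: bytes, locale_name: str) -> bytes:
    """Re-encode formatter output from the legacy encoding to UTF-8."""
    return raw.decode(legacy_encoding_for(locale_name)).encode("utf-8")
```

With this sketch, output produced in a KOI8-R locale would be passed through `groff_output_to_utf8(raw, "ru_RU.UTF-8")` before being shown on a UTF-8 terminal.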

To fix the problem for Russian now, we should make sure that all
sources for Russian man-pages are KOI8-R encoded. 
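
As a rough sketch of what such a conversion could look like for one gzip-compressed page, assuming the current encoding of the file is already known (this report does not say which codepages the affected files use): the function name `recode_man_page` is hypothetical, and the conversion is deliberately strict so that a wrong guess about the source encoding fails loudly instead of silently damaging the page.

```python
import gzip


def recode_man_page(gz_data: bytes, source_encoding: str,
                    target_encoding: str = "koi8-r") -> bytes:
    """Re-encode the contents of a gzip-compressed man page.

    source_encoding must be supplied by the caller; nothing here tries to
    guess it. Strict error handling raises UnicodeError on characters
    that cannot be converted.
    """
    text = gzip.decompress(gz_data).decode(source_encoding)
    return gzip.compress(text.encode(target_encoding))
```

A packaging script could read a file under /usr/share/man/ru/, pass its bytes through `recode_man_page(data, "cp1251")` (CP1251 being only an example source encoding), and write the result back.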

As soon as groff supports UTF-8 input, we will convert the sources of
all man-pages to UTF-8. But right now this is not possible. 
Comment 3 Mike Fabian 2006-01-23 11:36:39 UTC
I made a new bug report (bug #144766) for /usr/share/man/ru/man1/artsmessage.1.gz
because this file seems to be completely unusable for groff.
Comment 4 Mike Fabian 2006-01-23 17:39:19 UTC
I submitted the groff package with a workaround to allow
man-page sources to be in UTF-8.

-------------------------------------------------------------------
Mon Jan 23 18:31:45 CET 2006 - mfabian@suse.de

- Bugzilla #144726: add workaround to allow UTF-8 encoded sources
  of man-pages. Some packages already contain man-pages with
  UTF-8 encoded man-page sources, for example "mc". Hopefully
  one day groff will really support this. Until then a workaround
  is better than nothing.

-------------------------------------------------------------------
Comment 5 Mike Fabian 2006-01-23 17:40:21 UTC
Created attachment 64580
utf8.patch

Extended utf8.patch to allow the source of man-pages to be UTF-8 encoded.
Comment 6 Mike Fabian 2006-01-23 17:42:09 UTC
The new part of this patch checks whether the input received by
/usr/bin/nroff is already UTF-8 encoded. If yes, it converts the input
back to the appropriate legacy encoding before feeding it into groff.

Of course this will discard characters which cannot be converted back
to the legacy encoding, but I think it is better than nothing.
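
The check described in this comment can be sketched in Python roughly as follows. This is an illustration of the idea only, not the contents of utf8.patch; the names `is_utf8` and `prepare_for_groff` are hypothetical, and treating "decodes cleanly as UTF-8" as the detection test is an assumption of this sketch.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8.

    Pure ASCII also passes, which is harmless because ASCII is encoded
    identically in the legacy encodings discussed here.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def prepare_for_groff(data: bytes, legacy_encoding: str) -> bytes:
    """Convert UTF-8 input back to the legacy encoding for the formatter.

    Input that is not valid UTF-8 is assumed to be in the legacy encoding
    already and is passed through unchanged. Characters that have no
    representation in the legacy encoding are silently discarded.
    """
    if not is_utf8(data):
        return data
    return data.decode("utf-8").encode(legacy_encoding, errors="ignore")
```

For example, a UTF-8 page containing a euro sign would lose that character when converted with `prepare_for_groff(page, "koi8-r")`, since KOI8-R has no euro sign, while a page already in KOI8-R would normally fail the UTF-8 check and pass through untouched.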


Comment 7 Mike Fabian 2006-01-23 17:43:11 UTC
Closing as FIXED.