Bugzilla – Bug 446710
nroff formats documentation with nonprintable characters
Last modified: 2008-11-20 17:04:41 UTC
Logged in as root on a text console, not using any personal settings:

    perldoc perlrun
    "/tmp/Q0ozIHSDX" may be a binary file.  See it anyway? y

All minus characters '-' in the manual page are replaced by an inverted question mark. The UTF-8 bytes of this glyph are hex e2 88 92, i.e. code point U+2212, the mathematical minus sign.

Tested and first seen with perldoc, but nroff -man has the same issue, whereas nroff -mandocdb looks good.
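For anyone who wants to double-check the byte sequence quoted above, here is a minimal Python sketch. It only decodes the three reported bytes and looks up the Unicode name; the variable names are made up for illustration.

```python
import unicodedata

# The three bytes quoted in the report.
raw = bytes.fromhex("e28892")
char = raw.decode("utf-8")

# Print the code point and its official Unicode name.
print(f"U+{ord(char):04X}", unicodedata.name(char))

# Compare with the ASCII hyphen-minus (U+002D) that a shell expects.
print(char == "-")
```

The comparison prints False, which is why a command line pasted from the rendered page would not be understood by a shell.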
It’s U+2212 MINUS SIGN. It’s only displayed as an inverted question mark because your terminal settings and/or locale are wrong.
What are your locale settings?
    locale
    LANG=POSIX
    LC_CTYPE=en_US.UTF-8
    LC_NUMERIC="POSIX"
    LC_TIME="POSIX"
    LC_COLLATE="POSIX"
    LC_MONETARY="POSIX"
    LC_MESSAGES="POSIX"
    LC_PAPER="POSIX"
    LC_NAME="POSIX"
    LC_ADDRESS="POSIX"
    LC_TELEPHONE="POSIX"
    LC_MEASUREMENT="POSIX"
    LC_IDENTIFICATION="POSIX"
    LC_ALL=

These are the default settings directly after installation. /etc/sysconfig/console has:

    CONSOLE_FONT="lat9w-16.psfu"
    CONSOLE_UNICODEMAP=""
    CONSOLE_SCREENMAP="trivial"
    CONSOLE_ENCODING="UTF-8"
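The settings above mix LANG=POSIX with LC_CTYPE=en_US.UTF-8. A rough Python sketch of the usual POSIX precedence (LC_ALL first, then the category variable, then LANG) shows which value wins for each category. The function name effective_locale is hypothetical, and whether nroff picks its utf8 output device from exactly this lookup is an assumption, not something confirmed in this report.

```python
def effective_locale(env, category):
    """Return the locale a POSIX program would use for `category`.

    Precedence: LC_ALL, then the category variable, then LANG.
    Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "POSIX"


# Only the explicitly set variables from the report; the quoted values
# in the locale output are derived from LANG.
reported = {"LANG": "POSIX", "LC_CTYPE": "en_US.UTF-8", "LC_ALL": ""}

print(effective_locale(reported, "LC_CTYPE"))     # character encoding category
print(effective_locale(reported, "LC_MESSAGES"))  # falls back to LANG
```

So the character encoding is UTF-8 even though every other category is POSIX, which is consistent with the rendered page containing UTF-8 encoded glyphs.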
> Tested and first seen with perldoc, but nroff -man has the same issue, whereas
> nroff -mandocdb looks good.

I think nobody runs nroff -man. If you use "man perlrun", nroff is called with the option "-mandocdb" (you can see that in the output of "man -d perlrun"). With the option "-mandocdb", the file /usr/share/groff/site-tmac/tmac.andocdb is used, which contains:

    root@magellan:~# cat /usr/share/groff/site-tmac/tmac.andocdb
    .\" tmac.(m)andocdb
    .\"
    .\" This is part of the man(db) package of SuSE Linux
    .\" Author: Werner Fink
    .\"
    .\" Just like tmac.andoc but
    .\" load either tmac.andb or tmac.doc
    .\"
    .if !\n(.g .ab These macros require groff.
    .de Dd
    .rm Dd
    .do mso tmac.doc
    \\*(Dd\\
    ..
    .de TH
    .rm TH
    .do mso tmac.andb
    \\*(TH\\
    ..
    .if '\*[.T]'utf8' \{\
    .    char \- \N'45'
    .    char  - \N'45'
    .    char  ' \N'39'
    .    char  ` \N'96'
    .\}
    ..
    root@magellan:~#

I.e. this file contains workarounds that keep some ASCII characters commonly used in source code or command lines, instead of replacing them with fancy Unicode characters, which cause problems when the source code is pasted or when one searches in it.

If perldoc doesn’t use the option -mandocdb, the above workaround is not applied. Why does perldoc not use -mandocdb?
The file /usr/share/groff/site-tmac/tmac.andocdb is SuSE-specific, as it was never accepted upstream in man-db. Nevertheless, it would be ideal if perldoc used mandocdb when available.
'nroff -man' is standard, how about fixing it?
Note: man uses nroff; it never formats anything itself.
Submitted to Factory:

-------------------------------------------------------------------
Thu Nov 20 17:48:24 CET 2008 - mfabian@suse.de

- bnc#446710: add the workarounds from
  /usr/share/groff/site-tmac/tmac.andocdb (man package) directly
  to groff. These workarounds are to avoid rendering
  - as U+2010 (HYPHEN), \- as U+2212 (MINUS SIGN),
  ` as U+2018 (LEFT SINGLE QUOTATION MARK), and
  ' as U+2019 (RIGHT SINGLE QUOTATION MARK).
  Using these non-ASCII characters for rendering man pages with
  program examples and command line options is confusing and
  makes cut and paste of code examples impossible.
-------------------------------------------------------------------
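To make the four replacements in the changelog entry concrete, here is a small Python sketch that applies the same mapping to text that has already been rendered. This is not how the patch works (the patch changes groff's own character definitions); it is only an illustration, and the names ASCII_FALLBACK and to_ascii are made up.

```python
# Mapping taken from the changelog entry: rendered glyph -> ASCII character.
ASCII_FALLBACK = str.maketrans({
    "\u2010": "-",   # HYPHEN                      (from -)
    "\u2212": "-",   # MINUS SIGN                  (from \-)
    "\u2018": "`",   # LEFT SINGLE QUOTATION MARK  (from `)
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (from ')
})


def to_ascii(text):
    """Replace the four non-ASCII glyphs with their ASCII source characters."""
    return text.translate(ASCII_FALLBACK)


# A made-up line as it might appear in a rendered man page.
rendered = "perl \u2212e \u2018print 1\u2019"
print(to_ascii(rendered))
```

After the translation, the line contains only ASCII characters and can be pasted or searched for as typed in the man page source.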
Created attachment 254022: bnc446710.patch

Patch used.
Closing as FIXED.
Now the workaround from the man package (/usr/share/groff/site-tmac/tmac.andocdb) is not needed anymore, but it doesn’t do any harm either.