Bug 446710 - nroff formats documentation with nonprintable characters
Summary: nroff formats documentation with nonprintable characters
Status: RESOLVED FIXED
Alias: None
Product: openSUSE 11.1
Classification: openSUSE
Component: Basesystem
Version: Beta 5
Hardware: Other Other
Importance: P3 - Medium, Normal
Target Milestone: ---
Assignee: Mike Fabian
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-19 17:33 UTC by Juergen Weigert
Modified: 2008-11-20 17:04 UTC
CC List: 3 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
bnc446710.patch (570 bytes, patch)
2008-11-20 17:01 UTC, Mike Fabian

Description Juergen Weigert 2008-11-19 17:33:07 UTC
Logged in as root on a text console, not using any personal settings:

perldoc perlrun
"/tmp/Q0ozIHSDX" may be a binary file.  See it anyway? y

All minus characters '-' in the manual page are replaced by an inverted question mark. The UTF-8 bytes emitted in their place are hex e2 88 92, i.e. codepoint U+2212, the mathematical minus sign.

Tested and first seen with perldoc, but nroff -man has the same issue, whereas nroff -mandocdb looks good.
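To double-check the byte sequence quoted above, a short Python snippet (Python is used here purely for illustration) can encode U+2212 and compare it with the ASCII hyphen-minus a reader would expect to see:

```python
# U+2212 MINUS SIGN, the character reported in the formatted output
minus = "\u2212"

# Its UTF-8 encoding is the three-byte sequence seen on the console
encoded = minus.encode("utf-8")
print(encoded.hex(" "))  # e2 88 92

# The ASCII hyphen-minus, by contrast, is a single byte
print("-".encode("utf-8").hex(" "))  # 2d
```

A console font or locale that cannot display U+2212 shows some replacement glyph for these three bytes, which matches the inverted question mark described in the report.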
Comment 1 Mike Fabian 2008-11-19 18:08:07 UTC
It’s U+2212 MINUS SIGN. It’s only displayed as an inverted question
mark because your terminal settings and/or locale are wrong.

Comment 2 Mike Fabian 2008-11-19 18:09:38 UTC
What are your locale settings?
Comment 3 Juergen Weigert 2008-11-19 19:51:16 UTC
locale
LANG=POSIX
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

These are the default settings directly after installation.
/etc/sysconfig/console has
CONSOLE_FONT="lat9w-16.psfu"
CONSOLE_UNICODEMAP=""
CONSOLE_SCREENMAP="trivial"
CONSOLE_ENCODING="UTF-8"
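The locale output above mixes two values: LANG is POSIX, but LC_CTYPE is explicitly set to en_US.UTF-8 (the quoted values are only inherited from LANG). Under the POSIX precedence rules the character encoding is therefore UTF-8, which is presumably why a UTF-8 output device is selected for formatting. The Python sketch below models only that lookup order; `effective_locale` is a hypothetical helper, not part of any locale library.

```python
def effective_locale(category, env):
    """Return the locale name in effect for one category.

    POSIX precedence: a non-empty LC_ALL wins, then the category's own
    variable (e.g. LC_CTYPE), then LANG; empty values are ignored.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "POSIX"  # assumed default when none of the variables is set

# Variables from comment 3: only LANG and LC_CTYPE are actually set,
# and LC_ALL is empty.
env = {"LANG": "POSIX", "LC_CTYPE": "en_US.UTF-8", "LC_ALL": ""}

print(effective_locale("LC_CTYPE", env))     # en_US.UTF-8
print(effective_locale("LC_MESSAGES", env))  # POSIX
```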
Comment 4 Mike Fabian 2008-11-20 15:22:53 UTC
> Tested and first seen with perldoc, but nroff -man has the same issue, whereas
> nroff -mandocdb looks good.

I think nobody runs nroff -man directly. If you use

    man  perlrun

nroff is called with the option "-mandocdb" (You can see that in the
output of "man -d perlrun"). With the option "-mandocdb" the file

/usr/share/groff/site-tmac/tmac.andocdb

is used, which contains:

root@magellan:~# cat /usr/share/groff/site-tmac/tmac.andocdb
.\" tmac.(m)andocdb
.\"
.\" This is part of the man(db) package of SuSE Linux
.\" Author: Werner Fink
.\"
.\" Just like tmac.andoc but
.\" load either tmac.andb or tmac.doc
.\"
.if !\n(.g .ab These macros require groff.
.de Dd
.rm Dd
.do mso tmac.doc
\\*(Dd\\
..
.de TH
.rm TH
.do mso tmac.andb
\\*(TH\\
..
.if '\*[.T]'utf8' \{\
.  char \- \N'45'
.  char  - \N'45'
.  char  ' \N'39'
.  char  ` \N'96'
.\}
..
root@magellan:~#

I.e. this file contains workarounds that keep some ASCII characters
commonly used in source code and command lines, instead of replacing
them with fancy Unicode characters that cause problems when the
source code is pasted or when one searches in it.

If perldoc doesn’t use the option -mandocdb, the workaround above
is not applied.
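The effect of the .char requests can be illustrated with a hypothetical post-processing step that maps the typographic characters back to the ASCII characters the workaround preserves. This is not how groff implements it (the tmac file redefines the characters before formatting); the names `TO_ASCII` and `asciify` are made up for this sketch.

```python
# Typographic characters a utf8 rendering may contain, mapped back to the
# ASCII characters that the .char requests in tmac.andocdb keep.
TO_ASCII = str.maketrans({
    "\u2010": "-",   # HYPHEN             -> hyphen-minus (\N'45')
    "\u2212": "-",   # MINUS SIGN         -> hyphen-minus (\N'45')
    "\u2018": "`",   # LEFT SINGLE QUOTE  -> grave accent (\N'96')
    "\u2019": "'",   # RIGHT SINGLE QUOTE -> apostrophe   (\N'39')
})

def asciify(text):
    """Hypothetical helper: undo the typographic substitutions."""
    return text.translate(TO_ASCII)

rendered = "perl \u2212e \u2018print 1\u2019"
print(asciify(rendered))  # perl -e `print 1'
```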

Why does perldoc not use -mandocdb?

Comment 5 Dr. Werner Fink 2008-11-20 15:39:14 UTC
The file /usr/share/groff/site-tmac/tmac.andocdb is SuSE specific, as it
was never accepted upstream in man-db. Nevertheless it would be perfect
if perldoc used mandocdb when available.
Comment 6 Michael Schröder 2008-11-20 15:46:39 UTC
'nroff -man' is standard, how about fixing it?
Comment 7 Dr. Werner Fink 2008-11-20 16:00:29 UTC
Note: man uses nroff; it never formats anything itself.
Comment 8 Mike Fabian 2008-11-20 17:00:59 UTC
Submitted to Factory:
-------------------------------------------------------------------
Thu Nov 20 17:48:24 CET 2008 - mfabian@suse.de

- bnc#446710: add the workarounds from
  /usr/share/groff/site-tmac/tmac.andocdb (man package) directly
  to groff. These workarounds avoid rendering - as
  U+2010 (HYPHEN), \- as U+2212 (MINUS SIGN), ` as U+2018
  (LEFT SINGLE QUOTATION MARK), and ' as U+2019 (RIGHT SINGLE
  QUOTATION MARK). Using these non-ASCII characters for rendering
  man pages with program examples and command line options is
  confusing and makes cut and paste of code examples
  impossible.

-------------------------------------------------------------------
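Why pasting breaks can be shown with Python's argparse, used here only as a stand-in for any command-line parser; the program name `demo` and its options are hypothetical. An option pasted from a page rendered with U+2010 HYPHEN no longer starts with "-", so the parser does not recognize it as an option.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-v", action="store_true", help="verbose")
parser.add_argument("rest", nargs="*")

# Option typed with an ASCII hyphen-minus: recognized as the -v flag.
ok = parser.parse_args(["-v"])
print(ok.v, ok.rest)  # True []

# Same option pasted with U+2010 HYPHEN: treated as a positional argument,
# so the flag stays False and the token ends up in `rest`.
bad = parser.parse_args(["\u2010v"])
print(bad.v, len(bad.rest))  # False 1
```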
Comment 9 Mike Fabian 2008-11-20 17:01:57 UTC
Created attachment 254022 [details]
bnc446710.patch

Patch used.
Comment 10 Mike Fabian 2008-11-20 17:02:38 UTC
Closing as FIXED.
Comment 11 Mike Fabian 2008-11-20 17:04:41 UTC
Now the workaround from the man package
(/usr/share/groff/site-tmac/tmac.andocdb) is not needed anymore,
but it doesn’t do any harm either.