Bugzilla – Bug 683857
man: new Unicode characters in use
Last modified: 2011-06-16 07:36:02 UTC
Starting with openSUSE 11.4, /usr/bin/man outputs the character U+2010 when it breaks a word, where it previously used U+002D. Since many fonts do not have the U+2010 character (including terminus on xterm, and especially the text console), a replacement graphic such as a rectangle is displayed instead. The soft hyphen at U+00AD could be used instead, or man could switch back to plain ASCII hyphens.
man uses groff for character mapping and less for output on the terminal
That seems to be a regression caused by the dropped bnc446710.patch - see bug 446710. However, it seems that fonts/devutf8/R is not the place for it anymore. With

    u2010 24 0 0x002D

in that file I get

    echo "\[u2010]" | nroff -mandoc -Tutf8 | head -n 1 | od -x
    0000000 80e2 0a90
    0000004

which is the hyphen in UTF-8.

Only ascii seems to produce the proper replacement

    echo "\[u2010]" | nroff -mandoc -Tascii | head -n 1 | od -x
    0000000 0a2d
    0000004

even though I was not able to find out which .tmac file contains this mapping. There's no big difference in the loaded tmac files between devascii and devutf8. Only in the latter case are unicode.tmac and latin.tmac called after tty.tmac.

The only solution I'm aware of is to reverse the logic of unicode.tmac - instead of the current mapping of 0x2d to 0x2010 et al.

    .\" unicode.tmac
    .\"
    .char - \[hy]
    .char ` \[oq]
    .char ' \[cq]
    .\" EOF

use

    .\" unicode.tmac
    .\"
    .char \[hy] -
    .char \[oq] `
    .char \[cq] '
    .\" EOF

but that might cause unwanted side effects if someone else uses non-tty output. So maybe we can name it deunicode.tmac and call it in tty.tmac instead of the unicode one.

Werner: what do you think?
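A note on reading the dumps above: od -x prints 16-bit words in host byte order, so on little-endian x86 the output "80e2 0a90" corresponds to the byte stream e2 80 90 0a, that is U+2010 in UTF-8 followed by a newline. The sketch below prints one byte per column instead, which avoids the swap. The helper name dump_bytes is made up for illustration, and the input is generated with printf rather than nroff so that it runs without groff installed.

```shell
#!/bin/sh
# dump_bytes (hypothetical helper): print stdin as space-separated hex
# bytes in stream order. od -tx1 works byte by byte, so the result does
# not depend on the machine's byte order the way od -x does.
dump_bytes() {
    hex=$(od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ *//; s/ *$//')
    printf '%s\n' "$hex"
}

# U+2010 HYPHEN encoded in UTF-8, followed by a newline.
printf '\342\200\220\n' | dump_bytes    # prints: e2 80 90 0a

# U+002D HYPHEN-MINUS (octal 055), followed by a newline.
printf '\055\n' | dump_bytes            # prints: 2d 0a
```

On a system with groff installed, the same helper could presumably replace od -x at the end of the nroff pipelines above.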
Uh, forget that - I patched tty.tmac so that it no longer includes unicode.tmac, which is what changes 0x2d to 0x2010. I don't think we need to change it back. I'm going to send a fix to M17N soon.
The problem has been fixed in M17N [1] groff by commit 12 [2]. tty.tmac no longer includes unicode.tmac, so ASCII chars will not be replaced. Feel free to test it before I submit it to Factory from the M17N repository [1].

[1] http://download.opensuse.org/repositories/M17N/openSUSE_11.4/
[2] https://build.opensuse.org/package/rdiff?commit=12&linkrev=base&package=groff&project=M17N
I have updated to the package, but still see U+2010 used for wordbreaks.
Can you give me an example? Which man page, and under which conditions? Thanks.
Created attachment 427556 [details]
Test manpage

groff-1.20.1-183.1.x86_64.rpm from M17N/openSUSE_11.4.

    $ locale
    LANG=en_US.UTF-8
    LC_CTYPE=de_DE.UTF-8
    LC_NUMERIC=POSIX
    LC_TIME=POSIX
    LC_COLLATE=POSIX
    LC_MONETARY=POSIX
    LC_MESSAGES=nb_NO.UTF-8
    LC_PAPER=de_DE.UTF-8
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

Running inside xterm-268:

    $ man -l test.1 | pcregrep -o '[^\w]+' | sort -u
    ...
    ?

When adding | hexdump -C, this produces "e2 80 90", which is the UTF-8 encoding of U+2010.
The updated patch adds deunicode.tmac, which turns this unicodization off on tty. hexdump -C then returns

    00000000  2d 0a                                             |-.|
    00000002

Committed as revision 13 to M17N/groff.
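For anyone who wants to repeat this check without reading hex dumps, here is a minimal sketch that looks for the bytes e2 80 90 directly. The function name has_u2010 and the sample input are assumptions made for illustration; on a real system the input would come from something like man -l test.1.

```shell
#!/bin/sh
# has_u2010 (hypothetical helper): exit status 0 if stdin contains the
# UTF-8 byte sequence of U+2010 (e2 80 90), non-zero otherwise.
# LC_ALL=C makes grep compare raw bytes regardless of the user's locale,
# and -F treats the pattern as a fixed string.
has_u2010() {
    LC_ALL=C grep -qF "$(printf '\342\200\220')"
}

# Made-up sample: a word broken with U+2010 at the end of a line.
if printf 'hy\342\200\220\nphen\n' | has_u2010; then
    echo "U+2010 found"
else
    echo "only ASCII hyphens"
fi
```

With the sample input this prints "U+2010 found". Feeding it text that uses only U+002D takes the other branch.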
Submitted into openSUSE:Factory by request 72760 - I assume you can use the version from M17N, so no maintenance update is requested, thus closing.
This is an autogenerated message for OBS integration: This bug (683857) was mentioned in https://build.opensuse.org/request/show/72760 Factory / groff
(In reply to comment #2)
> That seems to be a regression caused by the dropped bnc446710.patch - see bug 446710.
> However, it seems that fonts/devutf8/R is not the place for it anymore. With
>
> u2010 24 0 0x002D
>
> in that file I get
>
> echo "\[u2010]" | nroff -mandoc -Tutf8 | head -n 1 | od -x
> 0000000 80e2 0a90
> 0000004
>
> which is the hyphen in UTF-8.
>
> Only ascii seems to produce the proper replacement
>
> echo "\[u2010]" | nroff -mandoc -Tascii | head -n 1 | od -x
> 0000000 0a2d
> 0000004
>
> even though I was not able to find out which .tmac file contains this mapping.
> There's no big difference in the loaded tmac files between devascii and devutf8.
> Only in the latter case are unicode.tmac and latin.tmac called after tty.tmac.
>
> The only solution I'm aware of is to reverse the logic of unicode.tmac - instead of
> the current mapping of 0x2d to 0x2010 et al.
>
> .\" unicode.tmac
> .\"
> .char - \[hy]
> .char ` \[oq]
> .char ' \[cq]
> .\" EOF
>
> use
>
> .\" unicode.tmac
> .\"
> .char \[hy] -
> .char \[oq] `
> .char \[cq] '
> .\" EOF
>
> but that might cause unwanted side effects if someone else uses non-tty
> output. So maybe we can name it deunicode.tmac and call it in tty.tmac
> instead of the unicode one.
>
> Werner: what do you think?

I came upon this bug while googling deunicode.tmac because of a new rpmlint warning for a few packages' man pages.
This is from lilv, a package I'm preparing for Factory:

    lilv.x86_64: W: manual-page-warning /usr/share/man/man1/lv2jack.1.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man1/serdi.1.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man3/lilv.3.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man3/SerdURI.3.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man3/SerdNode.3.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man1/sordi.1.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man3/serd.3.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man3/SerdChunk.3.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man3/sord.3.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man1/lv2ls.1.gz 69: can't find macro file `deunicode.tmac'
    lilv.x86_64: W: manual-page-warning /usr/share/man/man1/lv2info.1.gz 69: can't find macro file `deunicode.tmac'
    This man page may contain problems that can cause it not to be formatted as intended.

Is there a package that provides deunicode.tmac?
As of

    * Mon Jun 06 2011 mvyskocil@suse.cz -
    - fix bnc#682913: device X100 is missing
      * create new groff-devx package containing all devX devices, as they
        need X for build
    - fix bnc#683857: Unicode characters in use
      * groff-1.20.1-deunicode.patch adds deunicode.tmac to tty.tmac
        removes all unecessary unicode characters in tty output

I still get 0x2010 as a dash separator.
Sorry, I accidentally tested the groff from 11.3. However, deunicode.tmac is not the proper solution. The working one is simple: change the soft-hyphenation char to "-". That is what the new version does.

    # To be sure I'm testing the right version!
    $ rpm -q --changelog groff | head -n 4
    * Wed Jun 08 2011 mvyskocil@suse.cz
    - fix bnc#683857: Unicode characters in use properly
      * change the soft hyphenation char to - in tty.tmac

    $ man -l test.1 | pcregrep -o '[^\w]+' | sort -u | grep -- '-' | hexdump -C
    00000000  2d 0a                                             |-.|
    00000002

Committed as revision 17 to M17N/groff.
Now does what was wanted.
This is an autogenerated message for OBS integration: This bug (683857) was mentioned in https://build.opensuse.org/request/show/73067 11.4 / groff https://build.opensuse.org/request/show/73070 Factory / groff
Update released for: groff, groff-debuginfo, groff-doc Products: openSUSE 11.4 (debug, i586, x86_64)