Bugzilla – Bug 162501
wrong "locale -a" output for ja_JP.SHIFT_JISX0213 and hy_AM.armscii-8
Last modified: 2017-08-10 09:50:24 UTC
The ja_JP.SHIFT_JISX0213 locale exists:

    mfabian@magellan:/tmp$ LANG=ja_JP.SHIFT_JISX0213 locale charmap
    SHIFT_JISX0213
    mfabian@magellan:/tmp$

But in the form listed by "locale -a":

    mfabian@magellan:/tmp$ locale -a | grep -i shift
    ja_JP.shiftjisx0213
    mfabian@magellan:/tmp$

it is not accepted by glibc:

    mfabian@magellan:/tmp$ LANG=ja_JP.shiftjisx0213 locale charmap
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    ANSI_X3.4-1968
    mfabian@magellan:/tmp$
The same problem exists for the locale hy_AM.ARMSCII-8:

    mfabian@magellan:~/bin$ locale -a | grep -i armscii
    hy_AM.armscii8
    mfabian@magellan:~/bin$ LC_ALL=hy_AM.armscii8 locale charmap
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    ANSI_X3.4-1968
    mfabian@magellan:~/bin$ LC_ALL=hy_AM.armscii-8 locale charmap
    ARMSCII-8
    mfabian@magellan:~/bin$ LC_ALL=hy_AM.ARMSCII-8 locale charmap
    ARMSCII-8
    mfabian@magellan:~/bin$
Reassigned to glibc maintainer Petr Baudis <pbaudis@novell.com>.
hy_AM.armscii8 and ja_JP.shiftjisx0213 are the only two locales which suffer from this problem.
Created attachment 113680: bugzilla-162501-locale-test.sh

Test script to find the locales suffering from this problem. Output:

    mfabian@magellan:~/bin$ ./bugzilla-162501-locale-test.sh
    hy_AM.armscii8
    ja_JP.shiftjisx0213
    mfabian@magellan:~/bin$
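The attached script itself is not reproduced in this report. As a rough sketch of the same idea, assuming only that `locale` prints its "Cannot set LC_..." warnings on stderr (as in the transcripts above), the following hypothetical shell function lists every name from "locale -a" that glibc then refuses to load. The function name `find_rejected_locales` is made up for illustration:

```sh
# Hypothetical sketch, not the attached script: print every locale name
# listed by "locale -a" that is rejected when used verbatim.
find_rejected_locales() {
  locale -a 2>/dev/null | while IFS= read -r loc; do
    # A rejected locale makes "locale" print "Cannot set LC_..." on stderr.
    # Send stderr into the pipe and discard the charmap printed on stdout.
    if LC_ALL=$loc locale charmap 2>&1 >/dev/null | grep -q .; then
      printf '%s\n' "$loc"
    fi
  done
}
```

On the system in this report, the attachment's output above suggests this would print hy_AM.armscii8 and ja_JP.shiftjisx0213; on other systems the result depends on which locales are installed.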
By the way, the ja_JP.SHIFT_JISX0213 locale is not yet supported by X, even when the spellings accepted by glibc are used:

    mfabian@magellan:~/c$ LC_ALL=ja_JP.shift_jisx0213 XSupportsLocale
    False.
    mfabian@magellan:~/c$ LC_ALL=ja_JP.SHIFT_JISX0213 XSupportsLocale
    False.
    mfabian@magellan:~/c$

Adding X maintainer Stefan Dirsch <sndirsch@novell.com> to Cc.
The locale names listed by glibc in the “locale -a” output are “normalized”, i.e. characters such as ‘-’ and ‘_’ are removed from the encoding part of the locale name and the rest of the encoding part is converted to lowercase. But for most locales glibc accepts not only the normalized spelling but also the more common spellings as input; for example, both the normalized spelling de_DE.utf8 and the standard spelling de_DE.UTF-8 work:

    mfabian@magellan:~/c$ LANG=de_DE.UTF-8 locale charmap
    UTF-8
    mfabian@magellan:~/c$ LANG=de_DE.utf8 locale charmap
    UTF-8
    mfabian@magellan:~/c$

Several SuSE Linux releases ago, glibc was even more liberal in which spellings it accepted; even something like de_DE.u-T_f-_8 worked. This is not the case any more:

    mfabian@magellan:~/c$ LANG=de_DE.u-T_f-_8 locale charmap
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    ANSI_X3.4-1968
    mfabian@magellan:~/c$

It is probably OK if glibc refuses to accept such weird non-standard spellings. But at least glibc should accept the spellings which glibc itself lists with “locale -a”. Anyone who wants to find out which locales exist will use “locale -a” or look in /usr/lib/locale, and will of course expect the encoding to work in the spelling found there. It is not reasonable to expect users to guess where in the encoding name ‘-’ or ‘_’ characters have to be added to make the locale work.
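The normalization rule can be sketched in shell. This is an approximation, assuming glibc's codeset normalization keeps only ASCII letters and digits, lowercases the letters, and prefixes “iso” when only digits remain; the function name `normalize_codeset` is hypothetical:

```sh
# Hypothetical sketch of codeset normalization as seen in "locale -a" output.
normalize_codeset() {
  # Keep only letters and digits, then lowercase the letters.
  norm=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
  case $norm in
    # Contains a non-digit (or is empty): use as is.
    *[!0-9]*|'') printf '%s\n' "$norm" ;;
    # Digits only, e.g. "8859-1": prefix with "iso".
    *) printf 'iso%s\n' "$norm" ;;
  esac
}
```

For example, `normalize_codeset ARMSCII-8` yields armscii8 and `normalize_codeset SHIFT_JISX0213` yields shiftjisx0213, exactly the spellings “locale -a” shows for the two problem locales.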
I talked to Mike and we agreed that adding Xlocale support for ja_JP.SHIFT_JISX0213 would be a non-trivial task, and we won't do it as long as we don't get a request to do so.
Yes. But it would be nice if at least the spelling problem in glibc could be fixed.
Werner just stumbled across this problem again while investigating bug #341594. This is a really annoying problem and it should be fixed.
Change product to openSUSE 11.0.
The encoding SHIFT_JISX0213 is not ASCII compatible. SHIFT_JISX0213 maps

    0x5C = U+00A5
    0x7E = U+203E

The way glibc implements locales, such a locale cannot be ISO C99 compliant (see <http://www.sourceware.org/ml/libc-locales/2006-q3/msg00054.html> and <http://www.sourceware.org/ml/libc-alpha/2000-10/msg00311.html>). It is also very likely that many programs malfunction in these locales. I would suggest to declare this locale unsupported. This locale is in fact not contained in glibc/localedata/SUPPORTED; probably no one uses it.
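Whether an encoding is ASCII compatible in this sense can be checked with iconv. A minimal sketch, assuming an `iconv` command that knows the charset name given; the helper `is_ascii_compatible` is hypothetical. It decodes every printable ASCII byte (0x20 to 0x7E) from the given charset to UTF-8 and checks that the result is unchanged:

```sh
# Hypothetical helper: succeed if the printable ASCII bytes 0x20..0x7E
# decode from CHARSET to the same characters in UTF-8.
is_ascii_compatible() {
  charset=$1
  # Build the string of all printable ASCII characters.
  ascii=$(awk 'BEGIN { for (i = 32; i < 127; i++) printf "%c", i }')
  # Fail if iconv does not know the charset or cannot convert.
  converted=$(printf '%s' "$ascii" | iconv -f "$charset" -t UTF-8 2>/dev/null) || return 1
  [ "$converted" = "$ascii" ]
}
```

Given the mappings quoted above (0x5C to U+00A5, 0x7E to U+203E), `is_ascii_compatible SHIFT_JISX0213` is expected to fail with glibc's iconv, while UTF-8 passes.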
It would be nice if ja_JP.SHIFT_JISX0213 were not required, but it is a matter of fact that ja_JP.SJIS in glibc does not cover the full set of the Shift JIS glyphs (see bug #189239).
I cannot view bug #189239. But from my experience, when people say that they are wondering why some characters/glyphs are "missing" in Shift_JIS, what they want is CP932, not Shift_JISX0213. Why? Because CP932 is the Japanese Windows encoding, used by maybe 95% of the computers in that country, whereas nearly no one is using Shift_JISX0213. Try creating a ja_JP.CP932 locale...
Bruno Haible> The encoding SHIFT_JISX0213 is not ASCII compatible:
Bruno Haible>
Bruno Haible> SHIFT_JISX0213 maps
Bruno Haible> 0x5C = U+00A5
Bruno Haible> 0x7E = U+203E

The SHIFT_JIS encoding has the same problem, i.e. it is not ASCII compatible either.

Bruno Haible> The way glibc implements locales, such a locale cannot be ISO C99 compliant.
Bruno Haible> (See <http://www.sourceware.org/ml/libc-locales/2006-q3/msg00054.html>
Bruno Haible> and <http://www.sourceware.org/ml/libc-alpha/2000-10/msg00311.html>).
Bruno Haible> It is also very likely that many programs malfunction in these locales.
Bruno Haible> I would suggest to declare this locale unsupported. This locale is in fact
Bruno Haible> not contained in glibc/localedata/SUPPORTED; probably no one uses it.

And the ja_JP.SJIS locale is not in glibc/localedata/SUPPORTED either.
→ Both ja_JP.SJIS *and* ja_JP.SHIFT_JISX0213 are not ISO C99 compliant and should be avoided.
Bruno Haible> Try creating a ja_JP.CP932 locale...

Might be useful. Although it would probably be nicer if everybody switched to using ja_JP.UTF-8 instead of such legacy locales.
What about the hy_AM.armscii-8 locale? Does anybody really need that? Any reasons why one cannot use hy_AM.UTF-8?

Anyway, if the locales ja_JP.SHIFT_JISX0213 and hy_AM.ARMSCII-8 are available in glibc and listed by the “locale -a” command, shouldn’t they be accepted in the spelling in which “locale -a” lists them?

“locale -a” lists the UTF-8 locales like ja_JP.utf8. That doesn’t look so nice to me because UTF-8 is the preferred spelling, i.e. it would be nicer if it were listed as ja_JP.UTF-8 by the “locale -a” command. But as ja_JP.utf8 and ja_JP.UTF-8 are treated as identical by glibc, it doesn’t cause problems if users look at the output of “locale -a” and then set LANG=ja_JP.utf8, because this works as well.

Shouldn’t the weird legacy locales like ja_JP.SHIFT_JISX0213 and hy_AM.ARMSCII-8 also be accepted in the form listed by “locale -a”? Certainly they should be avoided, but unfortunately they are there. Not accepting them in the form listed by “locale -a” just adds to the confusion.
> What about the hy_AM.armscii-8 locale?

It is supported by glibc. Therefore I have no idea why it causes problems with "locale -a". Did someone try to debug it?

> Does anybody really need that?

For a positive answer, you may need to ask Pablo Saratxaga. I have two facts:
- There are only 2 Armenian PO files in TP, KDE, GNOME, and they are all encoded in UTF-8.
  <http://translationproject.org/team/hy.html>
  <http://i18n.kde.org/team-infos.php?teamcode=hy>
  <http://www.gnome.org/i18n/>
- ARMSCII-8 has not been among the encodings supported by GNU gettext and coreutils for 6 years now. This has not been reported as a bug.

> Any reasons why one cannot use hy_AM.UTF-8?

I guess that people are using hy_AM.UTF-8 and that it works well.

> “locale -a” lists the UTF-8 locales like ja_JP.utf8. That doesn’t look
> so nice to me because UTF-8 is the preferred spelling, i.e. it would
> be nicer if it were listed as ja_JP.UTF-8 by the “locale -a” command.

Yes, I agree with you. But try to convince Ulrich Drepper.
I think the best long-term solution is to automatically create normalized aliases for module names in gconv code, I will implement that later.
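One way such aliases could look is as extra lines in glibc's gconv-modules configuration, whose alias lines have the form `alias FROM// TO//`. The following hypothetical helper prints such a line mapping the normalized spelling (uppercased, matching the style of the existing entries) to the canonical name. This is only an illustration of the idea; the actual fix may be implemented differently, and the all-digits “iso” prefix case is ignored here:

```sh
# Hypothetical helper: emit a gconv-modules style alias line that maps the
# normalized spelling of a charset name to its canonical name.
gconv_normalized_alias() {
  canonical=$1
  # Drop everything except letters and digits, then uppercase.
  norm=$(printf '%s' "$canonical" | tr -cd '[:alnum:]' | tr '[:lower:]' '[:upper:]')
  # Nothing to add if the canonical name is already in normalized form.
  if [ "$norm" = "$canonical" ]; then
    return 0
  fi
  printf 'alias\t%s//\t%s//\n' "$norm" "$canonical"
}
```

For ARMSCII-8 this prints an alias line from ARMSCII8// to ARMSCII-8//, and for SHIFT_JISX0213 one from SHIFTJISX0213// to SHIFT_JISX0213//.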
The reproducer still works on Tumbleweed. @Andreas: could you please review/close it?