Bug 162501 - wrong "locale -a" output for ja_JP.SHIFT_JISX0213 and hy_AM.armscii-8
Status: CONFIRMED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other Linux
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Andreas Schwab
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-31 13:35 UTC by Mike Fabian
Modified: 2017-08-10 09:50 UTC
CC List: 7 users

See Also:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
bugzilla-162501-locale-test.sh (166 bytes, text/plain)
2007-01-18 16:12 UTC, Mike Fabian

Description Mike Fabian 2006-03-31 13:35:38 UTC
The ja_JP.SHIFT_JISX0213 locale exists:

mfabian@magellan:/tmp$ LANG=ja_JP.SHIFT_JISX0213 locale charmap
SHIFT_JISX0213
mfabian@magellan:/tmp$ 

But in the form listed by "locale -a":

mfabian@magellan:/tmp$ locale -a | grep -i shift
ja_JP.shiftjisx0213
mfabian@magellan:/tmp$

it is not accepted by glibc:

mfabian@magellan:/tmp$ LANG=ja_JP.shiftjisx0213 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
mfabian@magellan:/tmp$
Comment 2 Mike Fabian 2007-01-18 16:06:03 UTC
The same problem exists for the locale hy_AM.ARMSCII-8:

mfabian@magellan:~/bin$ locale -a | grep -i armscii
hy_AM.armscii8
mfabian@magellan:~/bin$ LC_ALL=hy_AM.armscii8 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
mfabian@magellan:~/bin$ LC_ALL=hy_AM.armscii-8 locale charmap
ARMSCII-8
mfabian@magellan:~/bin$ LC_ALL=hy_AM.ARMSCII-8 locale charmap
ARMSCII-8
mfabian@magellan:~/bin$
Comment 3 Mike Fabian 2007-01-18 16:09:32 UTC
Reassigned to glibc maintainer Petr Baudis <pbaudis@novell.com>.
Comment 4 Mike Fabian 2007-01-18 16:11:07 UTC
hy_AM.armscii8 and ja_JP.shiftjisx0213 are the only two locales
which suffer from this problem.

Comment 5 Mike Fabian 2007-01-18 16:12:55 UTC
Created attachment 113680
bugzilla-162501-locale-test.sh

Test script to find the locales suffering from this problem.

Output:

mfabian@magellan:~/bin$ ./bugzilla-162501-locale-test.sh
hy_AM.armscii8
ja_JP.shiftjisx0213
mfabian@magellan:~/bin$
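
The contents of the attached script are not shown in this report. As a rough sketch of how such a check could work (the function name list_rejected_locales is made up for illustration, and this is not the attachment itself), one can feed every name to "locale charmap" and look for the "Cannot set" message seen in the transcripts above:

```shell
#!/bin/sh
# Hypothetical helper, NOT the attached bugzilla-162501-locale-test.sh.
# Reads locale names on stdin, one per line, and prints those for which
# "locale charmap" reports "Cannot set LC_..." (the error shown in this report).
list_rejected_locales() {
    while IFS= read -r name; do
        # Assumption: a rejected name produces a "Cannot set" message on stderr.
        if LC_ALL="$name" locale charmap 2>&1 </dev/null | grep -q 'Cannot set'; then
            printf '%s\n' "$name"
        fi
    done
}
```

To test the names exactly as glibc lists them, one would run "locale -a | list_rejected_locales".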
Comment 6 Mike Fabian 2007-01-18 16:17:17 UTC
By the way, the ja_JP.SHIFT_JISX0213 locale is not yet supported by X
even when the spellings accepted by glibc are used:

mfabian@magellan:~/c$ LC_ALL=ja_JP.shift_jisx0213 XSupportsLocale
False.
mfabian@magellan:~/c$ LC_ALL=ja_JP.SHIFT_JISX0213 XSupportsLocale
False.
mfabian@magellan:~/c$

Adding X maintainer Stefan Dirsch <sndirsch@novell.com> to CC:.

Comment 7 Mike Fabian 2007-01-18 16:29:24 UTC
The locale names listed by glibc in the “locale -a” output are
“normalized”, i.e. ‘-’ and ‘_’ characters in the encoding part of the
locale name are removed and everything in the encoding part is converted
to lowercase.
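
The normalization can be sketched as a small shell function. This is only an illustration derived from the names shown in this report (ja_JP.shiftjisx0213, hy_AM.armscii8, de_DE.utf8), not glibc's actual code; the function names are made up, and modifiers such as "@euro" are not treated specially:

```shell
#!/bin/sh
# Illustrative only: drop '-' and '_' from the encoding part and lowercase it.
normalize_codeset() {
    printf '%s\n' "$1" | tr -d '_-' | tr '[:upper:]' '[:lower:]'
}

# Split a locale name at the first '.' and normalize only the encoding part.
normalize_locale_name() {
    case "$1" in
        *.*) printf '%s.%s\n' "${1%%.*}" "$(normalize_codeset "${1#*.}")" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

normalize_locale_name ja_JP.SHIFT_JISX0213   # prints ja_JP.shiftjisx0213
normalize_locale_name hy_AM.ARMSCII-8        # prints hy_AM.armscii8
normalize_locale_name de_DE.UTF-8            # prints de_DE.utf8
```

The first two results are exactly the spellings that "locale -a" lists but that are rejected as input.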

But for most locales glibc accepts not only the normalized spelling but also
the more common spellings as input, i.e. both the normalized spelling de_DE.utf8
and the spelling according to the standard de_DE.UTF-8 work:

mfabian@magellan:~/c$ LANG=de_DE.UTF-8 locale charmap
UTF-8
mfabian@magellan:~/c$ LANG=de_DE.utf8 locale charmap
UTF-8
mfabian@magellan:~/c$

Several SuSE Linux releases ago, glibc was even more liberal in which spellings
were accepted, even something like de_DE.u-T_f-_8 was accepted.
This is not the case any more:

mfabian@magellan:~/c$ LANG=de_DE.u-T_f-_8 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
mfabian@magellan:~/c$

It is probably OK if glibc refuses to accept such weird non-standard spellings.

But at least glibc should accept the spellings which glibc itself lists
with “locale -a”.

Anyone who wants to find out which locales exist will use “locale -a” or
look in /usr/lib/locale, and will of course expect the locale name to
work in the spelling found there.
It is not nice to expect the user to guess at which places in the encoding
‘-’ or ‘_’ characters have to be added to make the locale work.

Comment 8 Stefan Dirsch 2007-04-02 20:27:37 UTC
I talked to Mike and we agreed that it would be a non-trivial task to add Xlocale support for ja_JP.SHIFT_JISX0213, and we won't do it as long
as we don't get a request to do so.
Comment 9 Mike Fabian 2007-04-03 15:05:38 UTC
Yes.

But it would be nice if at least the spelling problem in glibc could be fixed.
Comment 10 Mike Fabian 2007-11-15 15:57:45 UTC
Werner just stumbled over this problem again while investigating bug
#341594. I.e. this is really an annoying problem and it really should
be fixed.

Comment 11 Mike Fabian 2007-11-15 16:06:00 UTC
Change product to openSUSE 11.0.
Comment 12 Bruno Haible 2008-03-24 19:00:14 UTC
The encoding SHIFT_JISX0213 is not ASCII compatible:

SHIFT_JISX0213 maps
  0x5C = U+00A5
  0x7E = U+203E

The way glibc implements locales, such a locale cannot be ISO C99 compliant.
(See <http://www.sourceware.org/ml/libc-locales/2006-q3/msg00054.html>
and <http://www.sourceware.org/ml/libc-alpha/2000-10/msg00311.html>).
It is also very likely that many programs malfunction in these locales.
I would suggest to declare this locale unsupported. This locale is in fact
not contained in glibc/localedata/SUPPORTED; probably no one uses it.
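
The mapping can be checked with iconv. The following is a minimal sketch, assuming an iconv that provides a SHIFT_JISX0213 converter (the function name is made up for illustration). It converts a single byte, given in octal, and prints the resulting UTF-8 bytes in hex:

```shell
#!/bin/sh
# Hypothetical helper: convert one SHIFT_JISX0213 byte (octal argument)
# to UTF-8 and print the result as hex digits.
# Assumption: the local iconv knows the SHIFT_JISX0213 converter.
sjisx0213_byte_to_utf8_hex() {
    printf "\\$1" | iconv -f SHIFT_JISX0213 -t UTF-8 | od -An -tx1 | tr -d ' \n'
    echo
}

sjisx0213_byte_to_utf8_hex 134   # 0x5C; c2a5 (U+00A5) if the mapping above applies
sjisx0213_byte_to_utf8_hex 176   # 0x7E; e280be (U+203E) if the mapping above applies
```

In an ASCII-compatible encoding the same two bytes would come out unchanged as 5c and 7e.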
Comment 13 Dr. Werner Fink 2008-03-25 10:30:44 UTC
It would be nice if ja_JP.SHIFT_JISX0213 were not required, but the fact is
that glibc's ja_JP.SJIS does not cover the full set of
the SHIFT JIS glyphs (see bug #189239).
Comment 14 Bruno Haible 2008-03-25 11:37:42 UTC
I cannot view bug #189239. But from my experience, when people say that they
are wondering why some characters/glyphs are "missing" in Shift_JIS, what they
want is CP932, not Shift_JISX0213. Why? Because CP932 is the Japanese Windows
encoding, used by maybe 95% of the computers in that country, whereas nearly no
one is using Shift_JISX0213.

Try creating a ja_JP.CP932 locale...
Comment 15 Mike Fabian 2008-03-25 21:36:46 UTC
Bruno Haible> The encoding SHIFT_JISX0213 is not ASCII compatible:
Bruno Haible> 
Bruno Haible> SHIFT_JISX0213 maps
Bruno Haible>   0x5C = U+00A5
Bruno Haible>   0x7E = U+203E

The SHIFT_JIS encoding has the same problem, i.e. it is not ASCII
compatible either. 

Bruno Haible> The way glibc implements locales, such a locale cannot be ISO C99 compliant.
Bruno Haible> (See <http://www.sourceware.org/ml/libc-locales/2006-q3/msg00054.html>
Bruno Haible> and <http://www.sourceware.org/ml/libc-alpha/2000-10/msg00311.html>).
Bruno Haible> It is also very likely that many programs malfunction in these locales.
Bruno Haible> I would suggest to declare this locale unsupported. This locale is in fact
Bruno Haible> not contained in glibc/localedata/SUPPORTED; probably no one uses it.

And the ja_JP.SJIS locale is not in glibc/localedata/SUPPORTED
either.

→ both ja_JP.SJIS *and* ja_JP.SHIFT_JISX0213 are not ISO C99
compliant and should be avoided.
Comment 16 Mike Fabian 2008-03-25 21:40:45 UTC
Bruno Haible> Try creating a ja_JP.CP932 locale...

Might be useful. Although it would probably be nicer if everybody
switched to using ja_JP.UTF-8 instead of such legacy locales.


Comment 17 Mike Fabian 2008-03-25 21:55:51 UTC
What about the hy_AM.armscii-8 locale? Does anybody really need that?
Any reasons why one cannot use hy_AM.UTF-8?

Anyway, if the locales ja_JP.SHIFT_JISX0213 and hy_AM.ARMSCII-8 are
available in glibc and listed by the “locale -a” command, shouldn’t
they be accepted in the spelling as they are listed by the “locale -a”
command?

“locale -a” lists the UTF-8 locales like ja_JP.utf8. That doesn’t look
so nice to me because UTF-8 is the preferred spelling, i.e. it would
be nicer if it were listed as ja_JP.UTF-8 by the “locale -a” command.

But ja_JP.utf8 and ja_JP.UTF-8 are treated as identical by glibc,
so it doesn’t cause problems if users look at the output of “locale
-a” and then set LANG=ja_JP.utf8, because this works as well.

Shouldn’t the weird legacy locales like ja_JP.SHIFT_JISX0213 and
hy_AM.ARMSCII-8 also be accepted in the way they are listed by “locale
-a”? Certainly they should be avoided, but unfortunately they are
there. Not accepting them in the way they are listed by “locale -a” just
adds to the confusion.
Comment 18 Bruno Haible 2008-03-25 23:39:33 UTC
> What about the hy_AM.armscii-8 locale?

It is supported by glibc. Therefore I have no idea why it causes problems with
"locale -a". Did someone try to debug it?

> Does anybody really need that?

For a positive answer, you may need to ask Pablo Saratxaga.
I have two facts:
  - There are only 2 Armenian PO files in TP, KDE, GNOME, and they are all
    encoded in UTF-8.
    <http://translationproject.org/team/hy.html>
    <http://i18n.kde.org/team-infos.php?teamcode=hy>
    <http://www.gnome.org/i18n/>
  - ARMSCII-8 is not among the encodings supported by GNU gettext and coreutils
    for 6 years now. It has not been reported as a bug.

> Any reasons why one cannot use hy_AM.UTF-8?

I guess that people are using hy_AM.UTF-8 and that it works well.

> “locale -a” lists the UTF-8 locales like ja_JP.utf8. That doesn’t look
> so nice to me because UTF-8 is the preferred spelling, i.e. it would
> be nicer if it were listed as ja_JP.UTF-8 by the “locale -a” command.

Yes, I agree with you. But try to convince Ulrich Drepper.
Comment 19 Petr Baudis 2008-12-05 01:06:53 UTC
I think the best long-term solution is to automatically create normalized aliases for module names in gconv code; I will implement that later.
Comment 20 Tomáš Chvátal 2017-08-10 09:50:24 UTC
The reproducer still works on Tumbleweed.

@Andreas: could you please review/close it?