Bugzilla – Full Text Bug Listing
I see Japanese and some Latin characters for numbers and device names, so those are definitely in the font. That raises the question of what it is that we are not seeing. I suspect that this might be a problem in the translation files. I just saw in the translation file that you are the translator. :-)
Created attachment 870513 [details]
Japanese translations for the "storage" textdomain
Created by downloading the latest yast2-trans-ja package yast2-trans-ja-84.87.20230516.e4ba802a-150500.3.3.1.noarch.rpm (zypper download yast2-trans-ja), unpacking it with "unrpm", and then converting it back from .mo format to .po format with
> msgunfmt ./usr/share/YaST2/locale/ja/LC_MESSAGES/storage.mo >storage.ja.po
At first glance (and not knowing any Japanese), the translations look good to me. Maybe some parts of that storage proposal are sent through the translation functions twice in the YaST code.
AFAICS the "?????" texts all come from libstorage-ng: things like "Create partition" and "Create partition table", i.e. actions from the action graph. So far, I can't find them in any of the .mo files in this translations package. I found them in libstorage-ng-lang, in the file /usr/share/locale/ja/LC_MESSAGES/libstorage-ng.mo .
Created attachment 870514 [details]
Japanese translations for the "libstorage" textdomain
From libstorage-ng-lang-4.5.155-lp155.1.1.noarch, converted back from .mo to .po format with
msgunfmt /usr/share/locale/ja/LC_MESSAGES/libstorage-ng.mo >libstorage.jp.po
This is where the text "BIOS Boot Partition" comes from: https://github.com/openSUSE/libstorage-ng/blob/master/storage/Devices/PartitionImpl.cc#L1732-L1734 and then it must be used somewhere here: https://github.com/openSUSE/libstorage-ng/tree/master/storage/CompoundAction/Formatter (probably in https://github.com/openSUSE/libstorage-ng/blob/master/storage/CompoundAction/Formatter/Partition.cc), but where exactly? And is any translated text translated once again, or recoded again? I don't see it.
Arvin, please take a look.
First of all, I can reproduce the problem with German during the installation from DVD. And it has been broken for about 2 months! I cannot reproduce the problem on my Tumbleweed system. So maybe something is missing on the DVD? Some files needed to convert encodings? I will add some debugging to libstorage-ng to see if I can find a problem there.
YaST does not set up the environment correctly: the codeset is "ANSI_X3.4-1968" (that is logged by libstorage-ng). With a wrong codeset, the dgettext function cannot represent the result. So it looks as if a call to setlocale is missing. Ruby might use another method for translations that works without setlocale. Josef, Martin, was there a change in that area recently? (About two months or so ago.)
I see nothing on the ruby-bindings side. It is done at https://github.com/yast/yast-ruby-bindings/blob/master/src/ruby/yast/i18n.rb ; it has not changed for 4 years, and the code is pretty straightforward. So maybe it is more related to the startup script? During init we initialize the locale, but I expect it needs to be switched by the installer https://github.com/yast/yast-ruby-bindings/blob/master/src/ruby/yast/i18n.rb , as this happens only when ruby-bindings is initialized. During installation, the language switching should be done by this code: https://github.com/yast/yast-country/blob/master/language/src/modules/Language.rb#L728 , but its last change is also 1 year old.
And at the really low level, there is the WFM handling of the language that calls setlocale; it has not been touched for 7 years, so it is probably not the reason either: https://github.com/yast/yast-core/blob/master/wfm/src/Y2WFMComponent.cc#L434
Steffen, was there a change in the inst-sys about 2-3 months ago that might have that effect? Some locale data that is no longer on the image, maybe?
I just checked yast-installation/startup/, and the last change there was in mid-May (and completely unrelated).
*** Bug 1217446 has been marked as a duplicate of this bug. ***
Firstly, as Arvin noted in comment 10, the locale is just not set correctly. It starts out with the correct value in install.inf, but when you check the environment of the yast process, you see that ANSI_X3.4-1968 is set. Not sure where this happens.
There is in fact this change: https://bugzilla.suse.com/show_bug.cgi?id=1216448 that looks related. Not so much the inst-sys package change, which just removes an unwanted dependency (the termcap package was never used), but the underlying termcap/ncurses package reorganization.
I can reproduce the problem with a recent Tumbleweed (openSUSE-Tumbleweed-DVD-x86_64-Snapshot20231218-Media.iso) for both German and Japanese: the special characters, but only in messages from libstorage-ng, are replaced by '?', as reported here:
> - Partition /dev/sda2 l?schen
> - Partition /dev/sda3 unter swap einh?ngen
> ...
But the rest of the UI (buttons, workflow steps in the left side panel) was OK, with regular 'äöüÄÖÜ' umlaut characters.
It works without a problem with Leap 15.5, though: openSUSE-Leap-15.5-DVD-x86_64-Build491.1-Media.iso
I checked the environment variables and the output of the 'locale' command in all these scenarios:
- Booting with startshell=1 and checking in that shell: OK
- Starting the Qt installation and starting an 'xterm' from there: OK
- Checking /proc/<yast-pid>/environ and doing some command line magic to make it readable (the individual environment variables are separated by a 0 byte):
  tr '\000' '\n' </proc/<yast-pid>/environ >/tmp/env.txt
  OK, in particular no 'ANSI_X3.4-1968'
In all cases, it was some variation of 'en_US.UTF-8' or 'de_DE.UTF8' or 'ja_JP.UTF8'.
BUT:
> 1:install:/var/log/YaST2 # grep 'ANSI' y2log
> 2023-12-20 14:34:09 <1> install(4685) [Ruby] modules/Console.rb(SelectFont):123 Language en_US -> Console encoding ANSI_X3.4-1968
> 2023-12-20 09:34:09 <1> install(4685) [libstorage] EnvironmentImpl.cc(extra_log):194 codeset ANSI_X3.4-1968
even in the language / keyboard / license workflow step, without touching anything.
Maybe a stupid question, but here we see Encoding.console set to WFM.SetLanguage(@language): https://github.com/yast/yast-country/blob/master/console/src/modules/Console.rb#L97-L121 ... and it is logged in y2log as "ANSI_X3.4-1968".
But https://github.com/yast/yast-core/blob/master/wfm/src/Y2WFMComponent.cc#L433-L502 returns proposedEncoding, which according to y2log is "ANSI_X3.4-1968", yet it also logs this line: https://github.com/yast/yast-core/blob/master/wfm/src/Y2WFMComponent.cc#L460 ("Y2WFMComponent.cc(SetLanguage):460 SET encoding to: UTF-8" in y2log).
BUT: if line 460 in the code is reached, then proposedEncoding is never set (that is the other branch of the 'if'). Or am I missing something?
Right now I am totally confused about where that "ANSI_X3.4-1968" might come from.
According to Stack Overflow, it's a prehistoric version of ASCII from the time when dinosaurs roamed the earth: https://stackoverflow.com/questions/48743106/whats-ansi-x3-4-1968-encoding
Q:
> What is "ANSI_X3.4-1968" encoding?
A:
> This is another name for USAS X3.4-1968, a revision of ASCII
> that is distinguished by being:
>
> - the first revision to allow a linefeed (LF) to occur on its
> own (i.e. not preceded by or followed by a carriage return
> (CR)).
>
> - the revision that introduced the common name of (US-)ASCII.
>
> This is basically ASCII as we think of it, although there were
> two minor revisions that followed it.
And then there are some web articles about problems with Python where somebody did something wrong when generating locale data; they meant to use UTF-8, just like us, but they got that "ANSI_X3.4-1968" instead. I wonder if something similar happened to us. I already checked glibc-locale in the inst-sys, and it looked good.
Starting yast.ssh again after clearing the logs, then just selecting the German keyboard:
> 1:install:/var/log/YaST2 # grep 'GET encoding for' y2log
> .
> 11:25:54 <0> install(25690) [wfm] Y2WFMComponent.cc(SetLanguage):481 GET encoding for en_US.UTF-8: UTF-8
> 11:25:58 <0> install(25690) [wfm] Y2WFMComponent.cc(SetLanguage):481 GET encoding for en_US: ANSI_X3.4-1968
> 11:26:01 <0> install(25791) [wfm] Y2WFMComponent.cc(SetLanguage):481 GET encoding for en_US: UTF-8
> 11:26:02 <0> install(25792) [wfm] Y2WFMComponent.cc(SetLanguage):481 GET encoding for en_US: UTF-8
eh - WHAT?
From now on, every time I switch the language, I always get this:
> Y2WFMComponent.cc(SetLanguage):481
> GET encoding for de_DE: ANSI_X3.4-1968
> GET encoding for de_DE: ANSI_X3.4-1968
> GET encoding for it_IT: ANSI_X3.4-1968
> GET encoding for ja_JP: ANSI_X3.4-1968
> GET encoding for ru_RU: ANSI_X3.4-1968
> GET encoding for en_US: ANSI_X3.4-1968
https://github.com/yast/yast-core/blob/master/wfm/src/Y2WFMComponent.cc#L471
> proposedEncoding = nl_langinfo (CODESET);
> ...
> y2debug ( "GET encoding for %s: %s", currentLanguage.c_str(), proposedEncoding.c_str() );
Did the behavior of nl_langinfo() change? Are we missing some file that it needs in the inst-sys? AFAICS the YaST code in all those areas hasn't changed at all for many years.
Created attachment 871507 [details]
Test program for nl_langinfo()
// Simple test program for nl_langinfo()
//
// Build (even without a Makefile) with:
// make nl_langinfo_test
//
// Usage:
// nl_langinfo_test [<locale>]
//
// Examples:
// nl_langinfo_test de_DE.UTF-8
// nl_langinfo_test de_DE
#include <stdio.h> // printf()
#include <locale.h> // setlocale()
#include <langinfo.h> // nl_langinfo()
int main( int argc, char *argv[] )
{
    char * old_locale = NULL;
    char * new_locale = NULL;
    char * encoding = NULL;
    if ( argc == 2 )
        new_locale = argv[1];
    // Query the current LC_CTYPE locale without changing it; print it right
    // away, since a later setlocale() call may overwrite the returned string
    old_locale = setlocale( LC_CTYPE, NULL );
    printf( "Old locale: %s\n", old_locale );
    if ( new_locale )
        setlocale( LC_CTYPE, new_locale );
    else
        new_locale = old_locale;
    encoding = nl_langinfo( CODESET );
    printf( "Encoding: %s for language %s\n",
            encoding, new_locale );
    return 0;
}
On my Leap 15.5:
> [sh @ balrog] ...~/src/nl_langinfo_test 42 % make nl_langinfo_test
> cc nl_langinfo_test.c -o nl_langinfo_test
> [sh @ balrog] ...~/src/nl_langinfo_test 43 % nl_langinfo_test
> Old locale: C
> Encoding: ANSI_X3.4-1968 for language C
> [sh @ balrog] ...~/src/nl_langinfo_test 44 % nl_langinfo_test de_DE.UTF8
> Old locale: C
> Encoding: UTF-8 for language de_DE.UTF8
> [sh @ balrog] ...~/src/nl_langinfo_test 45 % nl_langinfo_test de_DE
> Old locale: C
> Encoding: ISO-8859-1 for language de_DE
> [sh @ balrog] ...~/src/nl_langinfo_test 46 % nl_langinfo_test de
> Old locale: C
> Encoding: ANSI_X3.4-1968 for language de
===========================
So ANSI_X3.4-1968 is indeed a fallback if no encoding can be determined from the .UTF-8 suffix or from the language and territory part (de_DE) alone, or if nothing is set (locale "C").
Created attachment 871510 [details]
Test program for nl_langinfo() V0.2
// Simple test program for nl_langinfo()
//
// Build (even without a Makefile) with:
// make nl_langinfo_test
//
// Usage:
// nl_langinfo_test [<locale>]
//
// Examples:
// nl_langinfo_test de_DE.UTF-8
// nl_langinfo_test de_DE
#include <stdio.h> // printf()
#include <stdlib.h> // getenv()
#include <locale.h> // setlocale()
#include <langinfo.h> // nl_langinfo()
int main( int argc, char *argv[] )
{
    char empty[] = "";
    char * old_locale = NULL;
    char * new_locale = empty; // "" means: take the locale from the environment
    char * encoding = NULL;
    char * lc_ctype = getenv( "LC_CTYPE" );
    if ( argc == 2 )
        new_locale = argv[1];
    if ( lc_ctype )
        printf( "LC_CTYPE: '%s'\n", lc_ctype );
    // Print the old locale before setlocale() can overwrite the returned string
    old_locale = setlocale( LC_CTYPE, NULL );
    printf( "Old locale: '%s'\n", old_locale );
    setlocale( LC_CTYPE, new_locale );
    encoding = nl_langinfo( CODESET );
    printf( "Encoding: %s for locale '%s'\n",
            encoding, new_locale );
    return 0;
}
Now using an empty string instead of NULL, so that setlocale() takes the locale from the environment (LC_ALL, then LC_CTYPE, then LANG).
> [sh @ balrog] ...~/src/nl_langinfo_test 64 % LC_CTYPE=cs_CZ.UTF-8 nl_langinfo_test
> LC_CTYPE: 'cs_CZ.UTF-8'
> Old locale: 'C'
> Encoding: UTF-8 for locale ''
> [sh @ balrog] ...~/src/nl_langinfo_test 65 % LC_CTYPE=cs_CZ nl_langinfo_test
> LC_CTYPE: 'cs_CZ'
> Old locale: 'C'
> Encoding: ISO-8859-2 for locale ''
> [sh @ balrog] ...~/src/nl_langinfo_test 66 % LC_CTYPE=cs nl_langinfo_test
> LC_CTYPE: 'cs'
> Old locale: 'C'
> Encoding: ANSI_X3.4-1968 for locale ''
So, as long as a country (_DE, _CZ) is specified, it falls back to the old-style encodings: ISO-8859-1 (Latin1) for de_DE, ISO-8859-2 (Latin2) for cs_CZ.
If only the language is specified, no country, it falls back to ANSI_X3.4-1968.
Only if the encoding is explicitly specified (de_DE.UTF-8, cs_CZ.UTF-8) does it use that one.
Since that was already the behavior on Leap 15.5, something must have changed on TW; probably something in our languages map.
I just did a
> git diff -r upstream/SLE-15-SP5..master
in yast-country: https://github.com/yast/yast-country/pull/320/files which led to this change that also mentions 'ANSI_X3.4-1968': https://github.com/yast/yast-country/pull/311/commits/11bb0a205cc0719747ef590e6c7f390e64d9d007
This change was necessary while working on https://github.com/yast/yast-country/pull/311 because the CI test build had failed on TW back then: where it had always gotten 'UTF-8' before, it now got 'ANSI_X3.4-1968'. Back then, in August 2023, we weren't aware that this was an indicator of something bigger that would affect us in other areas as well: changes somewhere in the locale handling in TW's glibc. Martin, does this ring any bell?
IMHO we should take a pragmatic approach here. That prehistoric 'ANSI_X3.4-1968' is not useful for anybody in any locale these days. It's not even (7-bit) ASCII, which is already outdated and creates a ton of problems. So, if there isn't any more sophisticated specialized encoding like the (relatively) modern Japanese, Korean or Chinese ones or the slightly outdated ISO-8859-1, -2, -9, we should simply fall back to UTF-8.
First of all, ANSI_X3.4-1968 is simply ASCII. Glibc apparently prefers a specific name, and 1968 is simply the oldest revision that matches what glibc wants. (If you're curious, the 1967 revision did not allow an LF without a CR, so Linux obviously cannot have that, and 1965 even had '@' in a different place. 1963 had no lowercase letters. https://www.aivosto.com/articles/charsets-7bit.html , linked from https://en.wikipedia.org/wiki/ASCII , tells way more about that.)
(In reply to Stefan Hundhammer from comment #26)
> % LC_CTYPE=cs nl_langinfo_test
> > LC_CTYPE: 'cs'
> > Old locale: 'C'
> > Encoding: ANSI_X3.4-1968 for locale ''
"cs" mapping to ASCII is just a case of not checking for errors, as there is no "cs" locale.
I just did some experiments with the latest TW NET ISO with debugging:
> [Ruby] modules/Language.rb(GetLocaleString):534 locale de_DE.UTF-8
> [wfm] Y2WFMComponent.cc(SetLanguage):481 GET encoding for de_DE: ANSI_X3.4-1968
> [wfm] Y2WFMComponent.cc(SetLanguage):499 WFM SetLanguage("de_DE"), Encoding("ANSI_X3.4-1968")
> [wfm] Y2WFMComponent.cc(SetLanguage):460 SET encoding to: UTF-8
> [wfm] Y2WFMComponent.cc(SetLanguage):499 WFM SetLanguage("en_US"), Encoding("UTF-8")
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):351 Dynamic Proxy: [UI::SetConsoleFont] with [9] params
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):360 Namespace created from UI
> [ui] YUINamespace.cc(createFunctionCall):1045 overloaded SetConsoleFont, 1@4
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):395 Call SetConsoleFont
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):401 Append parameter ""
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):401 Append parameter "eurlatgr.psfu"
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):401 Append parameter ""
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):401 Append parameter ""
> [Ruby] binary/Yast.cc(ycp_module_call_ycp_function):401 Append parameter "de_DE"
> [ui] YUINamespace.cc(finishParameters):925 Actual type: <unspec> (string, string, string, string, string)
> [ui] YUINamespace.cc(finishParameters):942 Candidate: void SetConsoleFont (string, string, string, string, string) MATCH: 0
> [Ruby] modules/Console.rb(SelectFont):123 Language de_DE -> Console encoding ANSI_X3.4-1968
> [Ruby] modules/Language.rb(WfmSetLanguage):755 Setting the current language
> [Ruby] modules/Language.rb(WfmSetGivenLanguage):732 Language changed from de_DE to de_DE encoding: UTF-8 use_utf8: true
I.e. initially, the encoding is set up correctly to UTF-8, but then in Console.rb SelectFont(), it is messed up again, falling back to ANSI_X3.4-1968. So the problem does not seem to be in the generic Y2WFMComponent.cc part, but in Console.rb.
I don't know WHY it uses ANSI_X3.4-1968 here, but I know for a fact that we have translations for higher-level encodings, mostly UTF-8, and UTF-8 is a superset of ASCII (ANSI_X3.4-1968). So I am going to try catching this particular case where it tries to use ANSI_X3.4-1968, and I am beyond caring why it even does that; if it does, we'll simply override it with UTF-8.
The first iteration of this did not help: https://github.com/yast/yast-country/pull/322
I can see the new message in the y2log, but the special characters are still broken: I used German (de_DE.UTF-8), and the "löschen" message is still "l?schen".
Also, when I examine /proc/self/environ of the running YaST process, I get:
> LANG=en_US.UTF-8
> Locale=en_US
(irrelevant other environment variables omitted), despite the y2log telling me that it is now using de_DE.UTF-8. So it did do something, but obviously not enough.
The 'locale' command in the xterm started with Ctrl-Shift-Alt-X from the running YaST process gave me
> LANG=de_DE.UTF-8
> LC_CTYPE="de_DE.UTF-8"
> LC_NUMERIC=en_US.utf8
> LC_TIME=de_DE.utf8
> LC_COLLATE=de_DE.utf8
> LC_MONETARY=de_DE.utf8
> LC_MESSAGES=en_US.utf8
> LC_PAPER=de_DE.utf8
> LC_NAME=de_DE.utf8
> LC_ADDRESS=de_DE.utf8
> LC_TELEPHONE=de_DE.utf8
> LC_MEASUREMENT=de_DE.utf8
> LC_IDENTIFICATION=de_DE.utf8
> LC_ALL=
as expected after switching to German (de_DE.UTF-8).
When I go back to the language selection workflow step and select Czech, I get all messages in Czech, and the special characters look alright, except for those from libstorage-ng in the storage proposal, where I also just see question marks.
An xterm opened from there gives me 'locale' output of
> LANG=cs_CZ.UTF-8
> LC_CTYPE="cs_CZ.UTF-8"
> LC_NUMERIC=en_US.utf8
> LC_TIME=de_DE.utf8
> LC_COLLATE=de_DE.utf8
> LC_MONETARY=de_DE.utf8
> LC_MESSAGES=en_US.utf8
> LC_PAPER=de_DE.utf8
> LC_NAME=de_DE.utf8
> LC_ADDRESS=de_DE.utf8
> LC_TELEPHONE=de_DE.utf8
> LC_MEASUREMENT=de_DE.utf8
> LC_IDENTIFICATION=de_DE.utf8
> LC_ALL=
> An xterm opened from there gives me 'locale' output of
>
> LANG=cs_CZ.UTF-8
> LC_CTYPE="cs_CZ.UTF-8"
Correction: all values are cs_CZ.UTF-8, and in the German case all are de_DE.UTF-8.
Created attachment 872716 [details]
Screenshot: storage proposal in Czech
Notice the broken special characters, but only in the storage proposal text that comes from libstorage-ng. The ones in the left side bar and on the buttons are correct.
For completeness: it's not just the displayed storage proposal, it's also the saved actions text in storage-inst:
> 2024-02-13 16:20:55 +0100
> .
> Partition /dev/sda2 (37.99 GiB) l?schen
> Partition /dev/sda3 (2.00 GiB) unter swap einh?ngen
> Partition /dev/sda2 (37.99 GiB) f?r / mit btrfs erstellen
> .
> Subvolume @ auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/var auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/usr/local auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/srv auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/root auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/opt auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/home auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/boot/grub2/x86_64-efi auf /dev/sda2 (37.99 GiB) erstellen
> Subvolume @/boot/grub2/i386-pc auf /dev/sda2 (37.99 GiB) erstellen
> 2024-02-13 16:21:51 +0100
> .
> Odstranit odd?l /dev/sda2 (37.99 GiB)
> P?ipojit odd?l /dev/sda3 (2.00 GiB) na p??pojn?m bodu swap
> Vytvo?it odd?l /dev/sda2 (37.99 GiB) pro p??pojn? bod / se syst?mem soubor? btrfs
> .
> Vytvo?it podsvazek @ na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/var na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/usr/local na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/srv na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/root na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/opt na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/home na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/boot/grub2/x86_64-efi na /dev/sda2 (37.99 GiB)
> Vytvo?it podsvazek @/boot/grub2/i386-pc na /dev/sda2 (37.99 GiB)
In the YaST context, we have several C++ libraries using GNU gettext that should also be affected if this is really a general problem:
- libyui with libyui-qt and libyui-ncurses
- libyui-qt-pkg (the Qt package selector) and libyui-ncurses-pkg (its NCurses counterpart)
- libzypp
- libstorage-ng
For the most part, libyui gets its user messages from the Ruby code, which uses a Ruby gem to emulate the functionality of GNU gettext, so that part largely uses Ruby mechanisms for anything related to locales and encodings. But both libyui-qt-pkg and libyui-ncurses-pkg are pure C++ code, independent of the Ruby part.
----------------------------------------------------------------
libstorage-ng uses dgettext() and dngettext() from GNU gettext: https://github.com/openSUSE/libstorage-ng/blob/master/storage/Utils/Text.cc#L54-L65
> Text
> _(const char* msgid)
> {
> return Text(msgid, dgettext("libstorage-ng", msgid));
> }
> .
> .
> Text
> _(const char* msgid, const char* msgid_plural, unsigned long int n)
> {
> return Text(n == 1 ? msgid : msgid_plural, dngettext("libstorage-ng", msgid, msgid_plural, n));
> }
libstorage-ng logs its initial (!) encoding as:
> [libstorage] EnvironmentImpl.cc(extra_log):194
> codeset ANSI_X3.4-1968
based on this line: https://github.com/openSUSE/libstorage-ng/blob/master/storage/EnvironmentImpl.cc#L194
> y2mil("codeset " << nl_langinfo(CODESET));
Despite the broken German umlaut characters in the storage proposal, I get correct ones in the Qt package selector, in the menus as well as in texts in lists like the package classification. The package and pattern descriptions are still in English, though, but that might be a byproduct of the TW NET ISO I used; I'm not sure if it downloads the package metadata again when switching languages. But the encoding problem that we see with libstorage-ng does not exist for libyui-qt-pkg.
Created attachment 872734 [details]
Screenshot: Qt package selector in the same environment
Notice the menu "Abhängigkeiten", the list entries "Nicht benötigte Pakete" and "Zurückgeholte Pakete", and the "Abhängigkeiten" tab in the lower half of the window.
No broken German characters here.
An xterm opened from the running YaST process has $LANG set to de_DE.UTF-8 (at program start that was en_US.UTF-8). No LC_xxx environment variable is set.
When I now go back to the storage proposal and force a new one by going through the guided proposal and selecting ext4 instead of Btrfs, I get another one with broken German umlaut characters; so whatever the UI did to get correct ones did not affect libstorage-ng at all.
I am pretty sure that the codeset ANSI_X3.4-1968 logged during libstorage-ng's initialization is a red herring. Something else must have changed. If it works for libyui-qt-pkg in the same environment, why does it not work for libstorage-ng?
Created attachment 872735 [details]
Screenshot: Qt package selector in Czech
Going back again and selecting Czech in the language workflow step and then allowing the available online update repos, I get Czech in libzypp as well; not everything is translated, there are many English package descriptions left over, but many are translated, and they have correct Czech special characters.
The menus and buttons in the Qt package selector are also not missing any Czech special characters; no question mark placeholders visible anywhere.
Created attachment 872736 [details]
Screenshot: Qt package selector in Japanese
Sadly, no translated package descriptions, but the UI with its menus, buttons and tabs has Japanese characters and no question mark placeholders.
Created attachment 872737 [details]
Screenshot: NCurses storage proposal in German
Notice broken German umlaut characters in the libstorage messages as well, but not anywhere else in the UI (e.g. "Geführtes Setup").
Created attachment 872738 [details]
Screenshot: NCurses package selection in German
Notice the correct German umlaut characters in this pure C++ libyui-ncurses-pkg code as well.
I don't know where else to look. AFAICS only libstorage-ng is affected. If it were a general problem with std::string in C++, I would expect libzypp and at least libyui-ncurses-pkg to be affected as well, but they are not.
-> Arvin
The locales in the inst-sys look strange to me: in /usr/lib/locale, de_DE.utf8 is a link to en_US.utf8. Can someone explain that?
So I have written a tiny C program that just calls setlocale and nl_langinfo. This tiny program works fine in the inst-sys (it reports that the codeset is UTF-8). And when I copy barrel into the inst-sys, it displays the messages from libstorage-ng correctly, with German umlauts.
> The locales in the inst-sys look strange to me: In /usr/lib/locale
> de_DE.utf8 is a link to en_US.utf8. Can someone explain that?
Intentionally, to save space.
In my logs I see this message:
> [Ruby] modules/Console.rb(SelectFont):123 Language de_DE -> Console encoding ANSI_X3.4-1968
What is YaST doing here?
That was what I tried to fix with this: https://github.com/yast/yast-country/pull/322/files
...which had absolutely zero effect. You can easily try it by copying that file into the inst-sys.
Even a setlocale call (that does not fail) right before querying the codeset in libstorage-ng does not set the codeset to UTF-8, not even if I explicitly do setlocale(LC_ALL, "de_DE.UTF-8"). I have no idea how that is possible.
Created attachment 872771 [details]
y2log with extra codeset debug logging
I have extended yast2-core to log the codeset at several places
and it changes from UTF-8 to ANSI_X3.4-1968 as can be seen in the
logs. Maybe the Timezone or YaST Perl stuff is at fault.
If that were the case, why is only libstorage-ng affected? As I showed with all those screenshots, it works for libyui-qt-pkg, libyui-ncurses-pkg and (to the extent that we have translations for the package metadata) libzypp. I also tried to set LANG and LC_CTYPE to en_US.UTF-8 or de_DE.UTF-8 before starting YaST. It didn't change anything.
(In reply to Stefan Hundhammer from comment #54)
> If that were the case, why is only libstorage-ng affected?
>
> As I showed with all those screenshots, it works for libyui-qt-pkg,
> libyui-ncurses-pkg and (to the extent that we have translations for the
> package metadata) libzypp.
Well, I have tested with older Tumbleweed snapshots where the messages are still correct, and there the codeset is UTF-8.
> I also tried to set LANG and LC_CTYPE to en_US.UTF-8 or de_DE.UTF-8 before
> starting YaST. It didn't change anything.
I also do not understand why the stuff is so messed up. AFAIS, after a setlocale(LC_ALL, "de_DE.UTF-8") the codeset must be UTF-8 (see comment #52).
FWIW, I checked my collection of old TW media, and the change happened between
- Snapshot20230717: ok
- Snapshot20230813: bad
That is libstorage-ng version 4.5.123 vs. 4.5.136, which doesn't really have any suspicious changes. Other differences:
- glibc 2.37 vs. glibc 2.38
- libstdc++6 13.1.1 vs. libstdc++6 13.2.1
I couldn't spot a difference on the yast and ruby side.
BTW, the same happens even with en_US.UTF-8. And it's not limited to the installation system: I've checked on an installed Tumbleweed, and it has the same issue. The codeset sticks to ANSI_X3.4-1968.
Hm, that was not true. It does work in the installed system.
Created attachment 872798 [details]
screenshot of another broken umlaut
During an installation test I noticed another message about a bad
password with a broken umlaut (so not from libstorage-ng). That
message was fine with older Tumbleweed snapshots.
(In reply to Stefan Hundhammer from comment #54)
> If that were the case, why is only libstorage-ng affected?
The others use bind_textdomain_codeset. YaST could try that for libstorage-ng. But it will not help with the password message (comment #62), and maybe with even more.
To rule out some missing locale stuff, I've added the full glibc locale packages to the installation system and run the tests again. The result was the same: old TW works, new TW does not.
Looking closer, I found that the yast process in the installation system maps *both*:
/usr/lib/locale/en_US/LC_CTYPE
/usr/lib/locale/en_US.utf8/LC_CTYPE
When you switch the language to German, it *additionally* maps
/usr/lib/locale/de_DE/LC_CTYPE
The codeset is wrong with new TW (and OK with old TW).
If I run 'yast storage' directly in the installation system, it maps (as expected) only /usr/lib/locale/de_DE.utf8/LC_CTYPE, and you get the correct codeset. But strangely, this is the same behavior in old and new TWs.
At this point I would suspect that the locale handling in YaST is broken. Maybe it has always been, and it just shows now.
/usr/lib/locale/en_US/LC_CTYPE is read in /usr/lib/YaST2/bin/y2start. And the locale is UTF-8 when y2start is started.
And, finally, starting the installer (that is, /usr/lib/YaST2/startup/YaST2.First-Stage) in a normal Tumbleweed system also gets the codeset wrong. So I think I can rule out anything related to the installation system setup.
I don't see any change between SLE-15-SP5 and master in yast-country/language.
> src/yast/yast-country/language % git diff upstream/SLE-15-SP5..master .
> src/yast/yast-country/language %
Similarly, I don't see any change that might be even remotely relevant between SLE-15-SP5 and master in yast-ruby-bindings.
The YaST Ruby part uses the 'fast_gettext' Ruby gem. Does that make any C calls to any locale-related function? I don't know.
What I also don't see is any call to bindtextdomain() in the libstorage-ng code. How does that work? Does it rely on some code from the outside to do that for its textdomain?
Also, since ALL our C++ libs use this sequence
    bindtextdomain( "mytextdomain", LOCALEDIR );
    bind_textdomain_codeset( "mytextdomain", "utf8" );
why not do the same in libstorage-ng?
I don't think that any of our code or our translations work well with a non-UTF-8 locale. That ship sailed some 20 years ago. IMHO we are doing this techno-voodoo dance around the encoding in vain, and we have been nailing the encoding down to UTF-8 in many areas for decades anyway. We might as well do it one more time if that's what it takes to make this work.
(In reply to Stefan Hundhammer from comment #69)
> What I also don't see is any call to bindtextdomain() in the libstorage-ng
> code. How does that work? Does it rely on some code from the outside to do
> that for its textdomain?
The default location for textdomains works fine for libstorage-ng.
> Also, since ALL our C++ libs use this sequence
>
> bindtextdomain( "mytextdomain", LOCALEDIR );
> bind_textdomain_codeset( "mytextdomain", "utf8" );
>
> why not do the same in libstorage-ng?
If YaST always wants UTF-8, then YaST can force the encoding. But the problem from comment #62 shows that there are still more problems.
The message in comment #62 has one broken and one correct umlaut ("Möchten"). One text comes from this Perl module: https://github.com/yast/yast-users/blob/master/src/modules/UsersSimple.pm#L321
And the translation is correct in /usr/share/YaST2/locale/de/LC_MESSAGES/users.mo; I checked:
> % msgunfmt /usr/share/YaST2/locale/de/LC_MESSAGES/users.mo | grep -B 1 -A 0 'Kleinbuchstaben'
> msgid "You have used only lowercase letters for the password."
> msgstr "Sie haben nur Kleinbuchstaben für das Passwort verwendet."
The other text, also from users.mo, comes from Ruby code:
> msgid "Really use this password?"
> msgstr "Möchten Sie dieses Passwort wirklich verwenden?"
Either here: https://github.com/yast/yast-users/blob/master/src/lib/y2users/password_helper.rb#L56 or here: https://github.com/yast/yast-users/blob/master/src/include/users/dialogs.rb#L1191
So obviously something doesn't work well with Perl either. Martin, any good idea?
(In reply to Steffen Winterfeldt from comment #60)
> And, it's not limited to the installation system.
>
> I've checked on an installed Tumbleweed and it has the same issue. Codeset
> sticks to ANSI_X3.4-1968.
I can confirm this. And using some extra debugging, I see that calling perl_construct in YPerl.cc changes the codeset from UTF-8 to ANSI_X3.4-1968. The update to perl 5.38 happened between the two snapshot dates in comment #56.
Created attachment 872862 [details]
demo program showing the codeset change
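The symptom the demo program reports can be checked with two standard calls. The following is a minimal sketch, not Arvin's actual attachment (its code is not reproduced in this listing): it prints the global locale name as returned by setlocale() and the codeset as returned by nl_langinfo(CODESET). The helper names `global_locale_name`, `current_codeset` and `describe` are hypothetical.

```cpp
#include <clocale>     // std::setlocale, LC_ALL
#include <langinfo.h>  // nl_langinfo, CODESET (POSIX)
#include <string>

// Name of the process-wide (global) locale, as reported by setlocale().
// Passing nullptr queries the locale without changing it.
std::string global_locale_name()
{
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "(null)";
}

// Codeset of the calling thread's current locale. With glibc, gettext
// converts translations to this codeset unless bind_textdomain_codeset()
// overrides it for the textdomain.
std::string current_codeset()
{
    return nl_langinfo(CODESET);
}

// One line in the same "locale ... codeset ..." style as the output
// quoted in this bug.
std::string describe()
{
    return "locale " + global_locale_name() + " codeset " + current_codeset();
}
```

In the broken case described here, a line like this would still report the locale as de_DE.UTF-8 while the codeset has dropped to ANSI_X3.4-1968, which is exactly the mismatch seen in the YaST logs.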
Confirmed with Arvin's test program from comment #75:

Leap 15.5 with perl-5.26.1:

locale de_DE.UTF-8
codeset UTF-8
codeset UTF-8
locale de_DE.UTF-8
codeset UTF-8

Tumbleweed with perl-5.38.2:

locale de_DE.UTF-8
codeset UTF-8
codeset ANSI_X3.4-1968
locale de_DE.UTF-8
codeset ANSI_X3.4-1968

So, Arvin, Steffen, do you think that a change in Perl is messing up the encoding of our YaST process? Or is that only showing a problem deeper down, for example in Glibc or in the Glibc locales?

AFAIS both perl and glibc were updated in that time. Anyway, I think starting the embedded perl interpreter should not modify the locale in the first place.

Arvin, Stefan, Steffen: Thanks a lot for the debugging!

I just fetched the source tarballs for the two Perl versions, perl-5.26.1 and perl-5.38.2, and there are MASSIVE differences in the one file that seems to do most of the locale handling: locale.c. It looks like they did major refactoring in that area. That does not necessarily mean that this is actually where the problem comes from, but IMHO it's a very strong hint.

Let's ask the perl maintainer. Michael, any change in perl 5.38 that explains comment 76?

I just experimented with Arvin's test program to restore the encoding to the initial value after calling perl_alloc(), but no matter what I do, it remains at ANSI_X3.4-1968. I tried setting LC_CTYPE and LC_ALL; it doesn't make a difference. Is there a new LC_xxx that overrides this?

Created attachment 872867 [details]
Arvin's test program hacked up to save and restore the encoding
% LC_ALL=de_DE.UTF-8 LD_LIBRARY_PATH=$PERL_LIB ./arvin-yperl-tester-hacked-01
initial locale: de_DE.UTF-8
initial codeset: UTF-8
codeset after perl_alloc: ANSI_X3.4-1968
setting locale to de_DE.UTF-8
setlocale returned de_DE.UTF-8
locale after setting de_DE.UTF-8: de_DE.UTF-8
codeset after setting de_DE.UTF-8: ANSI_X3.4-1968
% echo $PERL_LIB
/usr/lib/perl5/5.38.2/x86_64-linux-thread-multi/CORE

AFAICS one call to

  uselocale( LC_GLOBAL_LOCALE );

after perl_alloc() did the trick, even without needing to save and later restore LC_CTYPE.

It makes sense: In the Perl documentation, they write about more thread safety, and they are now using a lot of calls to newlocale(); in perl-5.26.1, there was only one. It seems likely that they now create a new locale and work on that one, while we use the old-style simple calls that don't take a locale_t argument. Are they working on the global one, or the current one? That's hard to say.

Now the question is what will break if we add that uselocale( LC_GLOBAL_LOCALE ) in our embedded Perl interpreter YPerl.cc just after the perl_alloc() call.

Created attachment 872869 [details]
Arvin's test program again in a changed version
Now no longer restoring the initial locale, just switching to the global one after initializing the embedded Perl interpreter.
Notice that parts are commented out with #if 0.
Output:
initial locale: de_DE.UTF-8
initial codeset: UTF-8
codeset after perl_alloc: ANSI_X3.4-1968
locale:de_DE.UTF-8
codeset: UTF-8
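The effect can be reproduced without Perl. The following is a minimal sketch under an assumption suggested by the findings above but not verified against the Perl sources: that the 5.38 interpreter installs a per-thread locale via newlocale()/uselocale(). The helper names are hypothetical. It shows that setlocale() keeps changing the global locale while nl_langinfo(CODESET) follows the calling thread's current locale, and that uselocale(LC_GLOBAL_LOCALE) reconnects the thread to the global locale.

```cpp
#include <clocale>     // std::setlocale
#include <langinfo.h>  // nl_langinfo, CODESET
#include <locale.h>    // newlocale, uselocale, freelocale, locale_t (POSIX 2008)
#include <string>

// Hypothetical stand-in for what the embedded Perl interpreter appears to do:
// create a separate locale object and make it the calling thread's current
// locale. Returns (locale_t)0 on failure. The caller must switch away from
// the object before passing it to freelocale().
locale_t install_thread_locale(const char* name)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (loc != (locale_t)0)
        uselocale(loc);
    return loc;
}

// The workaround discussed in this bug: make the calling thread use the
// global locale again, i.e. the one that setlocale() sets and queries.
void use_global_locale()
{
    uselocale(LC_GLOBAL_LOCALE);
}

// Codeset of the calling thread's current locale.
std::string current_codeset()
{
    return nl_langinfo(CODESET);
}
```

While the per-thread "C" locale is active, a later setlocale(LC_ALL, ...) to a UTF-8 locale succeeds and setlocale() reports the new name, yet nl_langinfo(CODESET) still returns the ASCII codeset. This matches the hacked test program's output before the uselocale() call was added.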
Actually, the damage is done after perl_construct(), not after perl_alloc().

(In reply to Stefan Hundhammer from comment #85)
> AFAICS one call to
>
> uselocale( LC_GLOBAL_LOCALE );
>
> after perl_alloc() did the trick, even without needing to save and later
> restore LC_CTYPE.

So that was the missing puzzle piece to understand why setlocale did not work anymore. Good finding. Inserting it in YPerl does indeed fix the two problems of this bug. Also, after switching the language in YaST, special characters are correct for me.

Setting the locale only for one thread could also be used in YaST itself, e.g. in IniParser in the TemporaryLocale class. Sometimes switching the locale can be avoided completely by using functions that take a locale as an argument, e.g. nl_langinfo_l.

OK, thanks for confirming that this really helps. So, AFAICS we have a problem on several levels here:

(1) An embedded Perl interpreter messing up the locale environment for the process that uses that embedded Perl. This is the root cause, and we may only see the tip of the iceberg here.

(2) We should work around this by explicitly restoring the locale to the global one after initializing the embedded Perl interpreter. That would get our locale environment back into a sane state, but who knows how that affects the embedded Perl, i.e. whether the locale is then wrong for any Perl code it evaluates.

(3) We should, if reasonably possible, use nl_langinfo_l() with the global locale instead of assuming that nobody messed up our locale environment from the outside, like from embedded Perl.

(4) Possibly nl_langinfo() has become buggy. It appears to use the CURRENT locale, whatever was chosen last. But setlocale() appears to use the GLOBAL locale. That is inconsistent. Both should use the same thing.

man 3 nl_langinfo:

"nl_langinfo() returns a string which is the value corresponding to item in the program's current global locale."

Hm... "current global locale"?
I would interpret that as meaning "global", so the behavior of using the current locale seems to be wrong IMHO.

man 3 setlocale:

"The setlocale() function is used to set or query the program's current locale."

This is explicitly about the CURRENT locale, not the GLOBAL one.

man 3 nl_langinfo also says:

"NOTES
The behavior of nl_langinfo_l() is undefined if locale is the special locale object LC_GLOBAL_LOCALE or is not a valid locale object handle."

Which means that (3) of comment #89 does not work.

Yes, but that's why there's duplocale(). Frankly, I wonder who came up with this locale handling stuff.

PR with the fix: https://github.com/yast/yast-perl-bindings/pull/31
This will arrive in Factory / TW as yast2-perl-bindings-5.0.1.

Merged the PR. Manually submitted to OBS with sudo rake osc:sr
https://build.opensuse.org/request/show/1148801

Split off bug #1220195 for the Perl part.

Our workaround https://github.com/yast/yast-perl-bindings/pull/31 appears to cause bug #1220375 (yast2 users crashing with an idle-looping 'Zypp-main' process with 100% CPU usage). We'll have to revert it or at least find a less aggressive approach.

For the time being, we have to revert the workaround / fix https://github.com/yast/yast-perl-bindings/pull/31 to avoid crashes in Perl-based YaST modules like 'yast users' (bug #1220375).

PR to revert the last change: https://github.com/yast/yast-perl-bindings/pull/32

Reopening this bug.

The Perl part turned out to be an upstream bug. mls has applied a fix to SUSE Perl in https://bugzilla.suse.com/show_bug.cgi?id=1220195#c10, and that fixes this bug.
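Point (3) of comment #89 can still be implemented along the lines of the duplocale() remark above: nl_langinfo_l() is undefined for LC_GLOBAL_LOCALE itself, but duplocale(LC_GLOBAL_LOCALE) returns an ordinary locale object holding a copy of the global locale. A minimal sketch; the function name `global_codeset` is hypothetical, and this is not taken from the YaST or libstorage-ng code.

```cpp
#include <langinfo.h>  // nl_langinfo_l, CODESET
#include <locale.h>    // duplocale, freelocale, LC_GLOBAL_LOCALE
#include <string>

// Codeset of the process-wide global locale, independent of whatever
// per-thread locale uselocale() may have installed for the calling thread.
// Returns an empty string if the copy cannot be created.
std::string global_codeset()
{
    // nl_langinfo_l() must not be called with LC_GLOBAL_LOCALE itself,
    // so work on a private copy of the global locale instead.
    locale_t copy = duplocale(LC_GLOBAL_LOCALE);
    if (copy == (locale_t)0)
        return std::string();

    // Copy the result into a std::string before freeing the locale object
    // that the returned pointer may refer to.
    std::string codeset = nl_langinfo_l(CODESET, copy);
    freelocale(copy);
    return codeset;
}
```

The cost is one allocation per call. Code that queries the codeset often could cache the copy and refresh it whenever setlocale() is called.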
Created attachment 870506 [details]
Partitioning proposal screen in Japanese (for example)

As shown in the attached images, some screens/dialogs cannot show non-ASCII characters; they are replaced with "?" instead. I haven't tried it in all languages, but it seems to occur in multiple languages, including CJK.
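The "?" replacement is consistent with the diagnosis reached in this bug: glibc's gettext converts a translation to the codeset of the current locale, and with ANSI_X3.4-1968 (ASCII) Japanese characters or umlauts cannot be represented. The textdomain setup proposed in comment #69 pins a domain's output to UTF-8 regardless of the locale codeset. A minimal sketch; "mytextdomain" and the directory value are placeholders, not libstorage-ng's actual configuration, and `init_textdomain` / `tr` are hypothetical helper names.

```cpp
#include <libintl.h>  // bindtextdomain, bind_textdomain_codeset, dgettext (glibc)
#include <string>

// Placeholder values for illustration only.
static const char* const kTextDomain = "mytextdomain";
static const char* const kLocaleDir  = "/usr/share/locale";

// Bind the textdomain to its catalog directory and force UTF-8 output, so
// dgettext() returns UTF-8 even when the current locale's codeset is ASCII.
// Returns the codeset now in effect for the domain, or "" on failure.
std::string init_textdomain()
{
    bindtextdomain(kTextDomain, kLocaleDir);
    const char* codeset = bind_textdomain_codeset(kTextDomain, "UTF-8");
    return codeset ? codeset : "";
}

// Translate a message from this domain; without a matching catalog,
// dgettext() returns the msgid unchanged.
std::string tr(const char* msgid)
{
    return dgettext(kTextDomain, msgid);
}
```

This only fixes the encoding of the returned strings. Code that relies on nl_langinfo() or other locale-dependent functions is still affected by a per-thread locale with the wrong codeset.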