Bugzilla – Bug 367801
strange non ascii characters in text console after national keyboard loaded
Last modified: 2008-09-24 12:39:07 UTC
Created attachment 199193: saved file with incorrect chars

When I change the keyboard layout in the text console (that is, after init 3) to the one for my region

> loadkeys sk-qwerty

the console shows strange characters. When I type +?š??žýáíé, it displays +?????áíé. I saved the output to a file, but the charset doesn't seem to be correct. Attaching the file.
Simply cat-ing UTF-8 text on the console doesn't seem to work either, and browsing HTML pages with non-ASCII characters gives the same results.
ping
Hello. Is anybody going to pick up this bug? I would really appreciate a fix for this issue.
I've seen the same behavior with the Czech layout (cz-lat2-us). However, calling "yast keyboard set layout=slovak-qwerty" (or czech-qwerty) TWICE seems to "fix" the problem...
Jeff, can anybody take a look at this issue, please? It has been untouched for quite a long time.
I don't see how this is a kernel bug, but 2.6.25.4 did have a fix in it for UTF-8 characters on the console, so this might be fixed now. Can you try Beta 3 and let us know?
It's still the same in Beta3 but it has kernel 2.6.25.3. I will try to upgrade the kernel.
Can't boot kernel 2.6.25.4: BUG: Int 14: CR2 1007c039. Shall I submit a new bug?
Yes, that's not good at all, .4 or newer is what we want to ship with... Please create a new bug.
It's filed as bug #392113.
(In reply to comment #6 from Greg Kroah-Hartman)
> I don't see how this is a kernel bug, but 2.6.25.4 did have a fix in it for
> UTF-8 characters on the console, so this might be fixed now.

I got kernel 2.6.25.4 to work. The behavior changes slightly: after "loadkeys sk-qwerty" it writes wrongly encoded characters instead of question marks. It looks like latin2 encoding instead of UTF-8.
(In reply to comment #4 from Jiri Suchomel)
> I've seen the same behavior for czech layout (cz-lat2-us).
> However, calling "yast keyboard set layout=slovak-qwerty" (or czech-qwerty)
> TWICE seems to "fix" the problem...

This stopped working with kernel 2.6.25.4. It always sets the English layout. Any idea, Jiri? I think this will break the YaST country module in the next release.
I have no idea; the command above worked just by chance. The YaST country module just uses the loadkeys command, so the problem is elsewhere.
What is the value in /sys/module/vt/parameters/default_utf8 please? Does altering it change behavior in any way?
The file /sys/module/vt/parameters/default_utf8 contains "1". I can't change the file.
*** Bug 303957 has been marked as a duplicate of this bug. ***
It seems that there is more than one problem involved:

- Commit 04c71976 in the mainline kernel broke the remapping from the 8-bit keymap values (the lower byte of KT_LETTER keysyms) to Unicode. I just sent an e-mail to the author of the patch and to LKML. The following patch fixes it for me:

--- a/drivers/char/keyboard.c
+++ b/drivers/char/keyboard.c
@@ -678,10 +678,7 @@ static void k_deadunicode(struct vc_data *vc, unsigned int value, char up_flag)
 static void k_self(struct vc_data *vc, unsigned char value, char up_flag)
 {
 	unsigned int uni;
-	if (kbd->kbdmode == VC_UNICODE)
-		uni = value;
-	else
-		uni = conv_8bit_to_uni(value);
+	uni = conv_8bit_to_uni(value);
 	k_unicode(vc, uni, up_flag);
 }

- There is a problem with the way YaST sets the console map (CONSOLE_SCREENMAP in /etc/sysconfig/console) for Czech. It uses the "trivial" mapping, which cannot work for anything other than latin1 characters. We need to set this to 8859-2, because that's the encoding the Czech keyboard map uses to represent the Czech characters. The kernel uses this map to convert the keymap-supplied values to Unicode.

- In /etc/init.d/kbd we first call loadkeys and only then call setfont. This needs to be done the other way round, because the kernel uses the console map to convert the compose table (used for dead keys) to Unicode. The order in which we do these two things causes the dead keys not to work correctly until rckbd is run for the second time (when loadkeys is finally called with the correct console map loaded).

So, let's fix the kernel first - I'll report back when I get a reasonable response to the mail I just sent to LKML.
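As a rough illustration of the first two points in the comment above, here is a minimal Python sketch (not kernel code) of what the conversion step does conceptually. The function name conv_8bit_to_uni is borrowed from the patch, but its second parameter and the use of Python codecs to stand in for the console map are assumptions made purely for illustration: "latin-1" plays the role of the "trivial" map and "iso8859-2" the role of the 8859-2 map.

```python
# Illustrative model only: Python codecs stand in for the console screen map.
# "latin-1" models the "trivial" map (byte N -> U+00NN);
# "iso8859-2" models CONSOLE_SCREENMAP="8859-2".

def conv_8bit_to_uni(value: int, screen_map: str) -> str:
    """Map an 8-bit keymap value to a Unicode character via the given map."""
    return bytes([value]).decode(screen_map)

SCARON_LATIN2 = 0xB9  # the ISO 8859-2 code for 'š', as stored in the keymap

trivial = conv_8bit_to_uni(SCARON_LATIN2, "latin-1")
latin2 = conv_8bit_to_uni(SCARON_LATIN2, "iso8859-2")

print(f"trivial map: U+{ord(trivial):04X}")  # U+00B9, SUPERSCRIPT ONE
print(f"8859-2 map:  U+{ord(latin2):04X}")   # U+0161, LATIN SMALL LETTER S WITH CARON
```

With the trivial map the keymap byte is passed through as a code point, which is only right for latin1; with the 8859-2 map the same byte yields the character the Czech and Slovak keymaps intend.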
The kernel patch got accepted upstream; it will appear in the kernel cvs's HEAD with 2.6.26-rc7. Let's wait until it's in our KOTDs to make it easier for the YaST folks to test things. There is quite a lot to fix.

I have just noticed another thing in the /etc/sysconfig/kbd script that breaks things. The following line:

dumpkeys | loadkeys -C "$KBD_TTY" --unicode

can never work for anything but latin1. Dumpkeys gets the keymap from the kernel and does not know (the kernel does not know either) which 8-bit encoding it is in. It defaults to latin1, and when the output is piped to loadkeys, a completely bogus keymap gets loaded. Simply put, "loadkeys foo; dumpkeys" produces a fundamentally different keymap than foo. We could pass the correct codepage to dumpkeys, but now that the kernel is fixed again, this whole line seems totally unnecessary.

I'll re-assign this bug to the kbd maintainer when the fixed kernel appears in STABLE.
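The round-trip problem described above can be sketched in Python. This is only a model under stated assumptions: the keymap is represented as a byte string of latin2 values, and the latin1 default attributed to dumpkeys in the comment is modeled by decoding those bytes as "latin-1". No kbd tool is actually invoked.

```python
# Model: the kernel keymap holds raw 8-bit values with no record of their charset.
keymap_bytes = "ěščřžýáíé".encode("iso8859-2")

# Without a codepage argument, the values are interpreted as latin1 ...
assumed_latin1 = keymap_bytes.decode("latin-1")
# ... whereas the keymap author meant latin2.
intended = keymap_bytes.decode("iso8859-2")

for wrong, right in zip(assumed_latin1, intended):
    marker = "ok" if wrong == right else "MISMATCH"
    print(f"U+{ord(wrong):04X} vs U+{ord(right):04X}  {marker}")
```

Only the characters that happen to share a code between latin1 and latin2 (ý, á, í, é) survive the round trip; the others come back as different characters.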
Still not in STABLE, but the fixed kernel is already in KOTD ( ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/ ).

Juergen, can you please try to fix the kbd package now? I am convinced that the "dumpkeys | loadkeys -C "$KBD_TTY" --unicode" line in the initscript is not necessary at all. If you think we need it, dumpkeys definitely needs the codepage argument, otherwise it cannot work (see comment #18). Also, setfont needs to be called before loadkeys (see comment #17).

Steps to check whether it works:
- set the following in /etc/sysconfig/console:
  CONSOLE_FONT="lat2-16.psfu"
  CONSOLE_SCREENMAP="8859-2"
  CONSOLE_ENCODING="UTF-8"
- set KEYTABLE="cz-us-qwertz.map.gz" in /etc/sysconfig/keyboard
- reboot (this is important, do not only restart kbd)
- in the console, press the "234567890" keys on a US keyboard. You should see the following characters: "ěščřžýáíé" (this should be working once you remove the "dumpkeys | loadkeys")
- press "Shift" + the "= / +" key (next to backspace), followed by the "e" key. You should get "ě". This should start working once you call loadkeys after setfont. (This is why the reboot is important: the console map stays loaded, so if you do "rckbd restart" without fixing the order of setfont/loadkeys, it seems to be working even though it's in fact broken.)
Thanks for comment #18. I never understood how this was supposed to work, but I left it in, as it helped me with latin1. I currently have no test machines available. It will take a while before I can attend to this.
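Before rebooting, the three console settings from the steps above could be sanity-checked with something like the following Python sketch. The helper names parse_sysconfig and mismatches and the sample text are hypothetical; the sketch parses an in-memory string instead of reading /etc/sysconfig/console, and it assumes the file consists of simple KEY="value" lines.

```python
import shlex

# Expected values are the ones listed in the test steps above.
EXPECTED = {
    "CONSOLE_FONT": "lat2-16.psfu",
    "CONSOLE_SCREENMAP": "8859-2",
    "CONSOLE_ENCODING": "UTF-8",
}

def parse_sysconfig(text: str) -> dict:
    """Parse simple KEY="value" lines, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips the surrounding quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings

def mismatches(settings: dict) -> dict:
    """Return {key: (found, expected)} for every setting that differs."""
    return {k: (settings.get(k), v) for k, v in EXPECTED.items()
            if settings.get(k) != v}

# Hypothetical excerpt of /etc/sysconfig/console with the old "trivial" map.
sample = '''
CONSOLE_FONT="lat2-16.psfu"
CONSOLE_SCREENMAP="trivial"
CONSOLE_ENCODING="UTF-8"
'''
print(mismatches(parse_sysconfig(sample)))
```

For the sample shown, only CONSOLE_SCREENMAP is reported, since it still carries the "trivial" value.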
So, what is the current status of this bug? And which changes in YaST are required?
Ping. Please let's try to fix this old issue for Code 11...
mmarek submitted kbd-1.14 to STABLE. Is the issue still reproducible with the new version?
(In reply to comment #23 from Juergen Weigert)
> mmarek submitted kbd-1.14 to STABLE.
> Is the issue still reproducible with the new version?

Sadly, it is. In https://build.opensuse.org/package/show?package=kbd&project=home%3Amichal-m%3Atest there's a kbd with the fixes suggested by jbohac. I'll clean it up and submit it to factory tomorrow.

(In reply to comment #21 from Jiri Suchomel)
> So, what is the current status of this bug? And which changes in YaST are
> required?

Yes, as Jiri wrote, /etc/sysconfig/console:CONSOLE_SCREENMAP needs to be 8859-2 for the Czech (and presumably Slovak) keyboard to work.
(In reply to comment #24 from Michal Marek)
> (In reply to comment #21 from Jiri Suchomel)
> > So, what is the current status of this bug? And which changes in YaST are
> > required?
>
> Yes, as Jiri wrote, /etc/sysconfig/console:CONSOLE_SCREENMAP needs to be
> 8859-2 for the Czech (and Slovak presumably) keyboard to work.

And what about other languages? YaST uses the table from /usr/share/YaST2/data/consolefonts.ycp, and currently each keyboard layout has only the "none" or "trivial" (for UTF-8) value there...
I don't know. pl, cs, sk and hu all use the lat2-16.psfu font, so far only sk and cz keyboards were reported as broken. I can try typing something in Hungarian, let's see...
All of pl, cs, sk and hu display black-on-white question marks instead of some characters when the trivial map is used and work with -m 8859-2. jbohac: does this mean that for every non-latin1 language YaST has to choose a special map? What has changed in the kernel that it doesn't work without a map anymore?
I am not aware of any change in the kernel. I think it simply never worked since we use Unicode for the console. I did not test, but I think that when the console is in an 8-bit encoding (e.g. 8859-2 for cs_CZ), it will work well with the trivial map. Or are you aware of any distro we shipped where it worked well in UTF-8 mode? So yes, we need to specify a screen map for each non-latin1 language in UTF-8 mode. I think the correct mapping should be easy to decode from the corresponding font's name...
We are using these fonts in YaST:

lat9w-16.psfu
iso07u-16.psfu
lat2-16.psfu
Cyr_a8x16.psfu
iso09.f16n

Languages with lat2 should use 8859-2 as CONSOLE_SCREENMAP. The ones (most of them, including en_US) with lat9w should probably go with 8859-9. I'm not sure about iso07u and iso09: is it correct to use 8859-7 and 8859-9?

And I'm even more confused by Cyrillic. Currently we have this mapping in YaST:

// LANG            font              unicode map  screen map  magic
"ru_RU.KOI8-R" : [ "Cyr_a8x16.psfu", "",          "koi2alt",  "(K" ],
"ru"           : [ "Cyr_a8x16.psfu", "",          "koi2alt",  "(K" ],
"ru_RU.UTF-8"  : [ "Cyr_a8x16.psfu", "",          "trivial",  "(K" ],

Now, what should be the screen map for ru_RU.UTF-8?
(In reply to comment #28 from Jiri Bohac)
> I am not aware of any change in the kernel. I think it simply never worked
> since we use Unicode for the console.
...
> Or are you aware of any distro we shipped where it worked well in UTF-8 mode?

You are right, it never worked. I tried with a 10.2 kernel and it *seems* to work there: the 'š' (U+0161 LATIN SMALL LETTER S WITH CARON) key displays a 'š' on the console, but the application gets 0xc2 0xb9 (the latin2 code for 'š' encoded to UTF-8) instead of the correct 0xc5 0xa1 code.

(In reply to comment #29 from Jiri Suchomel)
> I'm not sure about iso07u and iso09: is it correct to use 8859-7 and 8859-9?
>
> And even more I'm confused with cyrillic - currently we have this mapping in
> yast:
>
> // LANG font unicode map screen map, magic
> "ru_RU.KOI8-R" : [ "Cyr_a8x16.psfu", "", "koi2alt", "(K" ],
> "ru" : [ "Cyr_a8x16.psfu", "", "koi2alt", "(K" ],
> "ru_RU.UTF-8" : [ "Cyr_a8x16.psfu", "", "trivial", "(K" ],
>
> Now, what should be the screen map for ru_RU.UTF-8?

I really don't know (I don't know Russian). When I tried the other day, it seemed to work (it displayed some Russian characters) with the trivial map, but I might have fallen into a similar trap as above.
This is my, possibly wrong (!!) idea for determining the screen map for individual languages:

The screen map should correspond to the encoding in which loadkeys loads the keyboard map. For Czech, iso-8859-2 is specified in the keymap (using the charset keyword), which instructs loadkeys to convert the symbolic names (e.g. ecaron) to the correct KT_LATIN keycode with the 8859-2 value (instead of, for example, the cp1250 code which would be used if the keymap was set to cp1250). So, if there is a "charset" keyword in the keyboard map we use by default for the language, it will almost definitely be what we need the screen map to be.

Some maps needn't use the "charset" keyword, however, because loadkeys will automatically use the code from the charset that it finds first in its list of charsets (see ksymtocode() in src/ksyms.c). For these maps, it should probably work to simply find a couple of the non-latin1 symbols defined in the map and look up which table in ksyms.c they are first defined in.

Some maps may be written entirely in Unicode, either by specifying "charset unicode" or by specifying the codes directly with the U+xxxx notation (e.g. bg_bds-utf8.map.gz). These should be OK with the trivial screen map, I *think* ;-)
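The two byte sequences mentioned above can be reproduced with a few lines of Python. This is only a model of the faulty path (latin2 byte treated as a code point, then UTF-8 encoded), using standard Python codecs rather than anything from the kernel.

```python
s_caron = "\u0161"  # 'š', LATIN SMALL LETTER S WITH CARON

# Correct: encode the Unicode character itself as UTF-8.
correct = s_caron.encode("utf-8")

# Faulty path: take the latin2 byte, treat it as a code point, then UTF-8 encode it.
latin2_byte = s_caron.encode("iso8859-2")  # b'\xb9'
broken = latin2_byte.decode("latin-1").encode("utf-8")

print(correct.hex())  # c5a1
print(broken.hex())   # c2b9
```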
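The first and last cases of the idea above could be sketched roughly as follows in Python. The function names keymap_charset and guess_screen_map are hypothetical, the sample keymap text is a made-up excerpt, and the rule that turns a charset "iso-8859-N" into a screen map named "8859-N" is an assumption extrapolated from the Czech case in this report. Maps without a charset keyword are not handled; they would need the ksyms.c lookup described above.

```python
import re
from typing import Optional

# Matches a line such as:  charset "iso-8859-2"
CHARSET_RE = re.compile(r'^\s*charset\s+"([^"]+)"', re.MULTILINE)

def keymap_charset(keymap_text: str) -> Optional[str]:
    """Return the charset named by the keymap's charset keyword, if any."""
    match = CHARSET_RE.search(keymap_text)
    return match.group(1) if match else None

def guess_screen_map(keymap_text: str) -> Optional[str]:
    """Guess a CONSOLE_SCREENMAP value from the keymap (assumed naming rule)."""
    charset = keymap_charset(keymap_text)
    if charset is None:
        return None  # undecided: would need the ksyms.c table lookup
    charset = charset.lower()
    if charset == "unicode":
        return "trivial"
    iso = re.fullmatch(r"iso-(8859-\d+)", charset)
    return iso.group(1) if iso else None

# Hypothetical keymap excerpt, for illustration only.
sample = 'charset "iso-8859-2"\nkeycode 3 = plus two ecaron\n'
print(guess_screen_map(sample))
```

Returning None for anything the sketch cannot decide keeps the uncertain cases visible instead of silently falling back to the trivial map.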
I adapted the screen map values for latin2 languages and Lithuanian in yast2-country-2.17.12. For the rest, I don't know the correct values, so it will be fixed when someone raises the specific issue.