Bug 302010

Summary: /etc/init.d/kbd: setfont breaks first Xserver start
Product: [openSUSE] openSUSE 11.0
Reporter: Stefan Dirsch <sndirsch>
Component: Kernel
Assignee: Juergen Weigert <jw>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: aj, coolo, cybernerd94, dmueller, eich, jeffm, nvbugs, rodw, roger.larsson, sreeves, sshaw, werner
Version: Alpha 2
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: kbd-1.12-132.i586.rpm
kbd-1.12-132.x86_64.rpm
patch 001
patch 002
patch 003
new patch for 002

Description Stefan Dirsch 2007-08-21 00:35:19 UTC
Because kbd is no longer in the Required-Start of earlyxdm, the

  /bin/setfont -C $tty $CONSOLE_FONT $UMAP $SMAP

line in /etc/init.d/kbd breaks the first Xserver start completely. This only
happens *if* framebuffer is disabled. According to Egbert the handling of vga and fb consoles is quite different. The vga text console code path gets 
tested much less often.

I don't know if this issue can be fixed in setfont. Maybe it's even a kernel problem. I suggest adding kbd back to the Required-Start of earlyxdm.
Comment 1 Stefan Dirsch 2007-08-21 00:36:07 UTC
*** Bug 299623 has been marked as a duplicate of this bug. ***
Comment 2 Stefan Dirsch 2007-08-21 00:44:14 UTC
*** Bug 297441 has been marked as a duplicate of this bug. ***
Comment 3 Egbert Eich 2007-08-21 04:31:52 UTC
The problem only occurs when the vga text console is active, which is a much less frequently tested code path since most people today use vesafb.
Apparently setfont performs some hardware access despite the fact that the console is in KD_GRAPHICS mode.
The described fix is really a workaround that avoids the real problem. This issue should be looked at.
Please reassign it to me with RESOLVED LATER when done.
Comment 4 Stefan Dirsch 2007-08-21 04:47:33 UTC
*** Bug 298799 has been marked as a duplicate of this bug. ***
Comment 5 Stefan Dirsch 2007-08-21 05:38:52 UTC
Reassigning to Coolo. earlyxdm is part of preload package.
Comment 6 Christoph Thiel 2007-08-21 06:38:11 UTC
Dirk, can you please take care of this bug?
Comment 7 Dirk Mueller 2007-08-21 14:05:27 UTC
hmm. I've looked a bit at the setfont code and the kernel code. I believe that dropping the kbd patch "be-nice-to-kdm" for setfont and replacing it with a return -1 in line 269 would fix the issue. 


one part I don't understand in the kernel is that the PIO_FONT calls don't always work on the vty that was passed in, but most of the time on the current foreground vty. 

however, sometimes they operate on the one that was passed in, and I guess here a check is missing if this is the one in foreground, and otherwise not trash the graphics card. 

to me, it's a kernel bug.
Comment 8 Dirk Mueller 2007-08-21 14:20:01 UTC
this was already noticed in 1998:

http://www.ussg.iu.edu/hypermail/linux/kernel/9808.0/0279.html

so not calling the "compatibility crap" from setfont might be a good idea. 
Comment 9 Juergen Weigert 2007-08-21 18:04:07 UTC
Compatibility crap removed from setfont.
No more fighting against xdm with delay loops.
We are faster now, so earlykbd is now again the way to go.

Submitted to stable.
Comment 10 Juergen Weigert 2007-08-21 18:04:56 UTC
Stefan, please test and close.
Comment 11 Stefan Dirsch 2007-08-21 19:06:34 UTC
Created attachment 158818 [details]
kbd-1.12-132.i586.rpm
Comment 12 Stefan Dirsch 2007-08-21 19:07:25 UTC
Created attachment 158819 [details]
kbd-1.12-132.x86_64.rpm
Comment 13 Stefan Dirsch 2007-08-21 19:18:26 UTC
Magnus, please test again with disabled framebuffer and with the updated kbd RPM installed. Also add kbd to Required-Start of earlyxdm.
Comment 14 Dirk Mueller 2007-08-21 20:04:09 UTC
that's not gonna fix it. I was just able to reproduce this issue on my ATI based laptop.

and given some patience, I even figured out the kernel code being responsible for it. 

Comment 15 Magnus Boman 2007-08-22 08:08:56 UTC
Clearing needinfo. Awaiting further information as per Comment#14
Comment 16 Dirk Mueller 2007-08-22 09:43:15 UTC
Created attachment 158945 [details]
patch 001
Comment 17 Dirk Mueller 2007-08-22 09:43:37 UTC
Created attachment 158947 [details]
patch 002
Comment 18 Dirk Mueller 2007-08-22 09:44:00 UTC
Created attachment 158948 [details]
patch 003
Comment 19 Egbert Eich 2007-08-24 10:31:12 UTC
I'm not really sure if patch 2 and 3 are correct. On fbdev consoles the fonts can be set per console and the font data is not kept in hardware. So there is no reason to restrict font setting to the visible console.
In fact the patch may break our setfont stuff as it will only set the correct font on the visible console.
That the font data is kept in hardware (and only there) and can only be set for all consoles at once is a property of the VGA console.
Thus the problem should be fixed there, which means the VGA console should bail out if someone tries to write a font while the console is in KD_GRAPHICS mode.
According to the other comments here I assume (but have not checked) that only the console that is opened by the Xserver is actually put into KD_GRAPHICS.
Therefore we need to check all consoles in vgacon to see whether one of them is in this mode, and bail if this is the case.
This tiny (completely untested) patch should do the job:

--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ -1032,6 +1032,11 @@ static int vgacon_do_font_op(struct vgas
        unsigned short video_port_status = vga_video_port_reg + 6;
        int font_select = 0x00, beg, i;
        char *charmap;
+
+       for (i = 0; i < MAX_NR_CONSOLES; i++) {
+               if (vc_cons[i].d && vc_cons[i].d->vc_mode == KD_GRAPHICS)
+                       return -EINVAL;
+       }
        
        if (vga_video_type != VIDEO_TYPE_EGAM) {
                charmap = (char *) VGA_MAP_MEM(colourmap, 0);

This also means that we need to make rcxdm run *after* setfonts() has run.


(The only way to save/restore fonts on vga console is to read them out to user space and restore them from there).
Comment 20 Dirk Mueller 2007-08-24 11:21:57 UTC
Created attachment 159682 [details]
new patch for 002
Comment 21 Dirk Mueller 2007-08-24 11:24:21 UTC
I agree, patch 3 is not really correct. I've updated patch 2 to use vga_is_gfx instead, which should fix the bug. 
Comment 22 Jeff Mahoney 2007-08-25 15:30:44 UTC
Dirk, to clarify, your latest attachment is the fix?
Comment 23 Dirk Mueller 2007-08-27 07:51:04 UTC
the latest attachment is the minimally necessary fix. patch 001 is useful as well, though on factory we patched the setfont utility to no longer activate this code path. 

Note that there is another duplicate of this bug tracking the very same bug for SLE10.
Comment 24 Egbert Eich 2007-08-27 09:09:40 UTC
The second patch prevents setfont from accessing the vga registers on the card while the console is in graphics mode (KD_GRAPHICS). In that mode we assume that someone else (i.e. the Xserver) is in charge of the hardware, so accessing the vga registers may at best have no effect (not even the desired one) or at worst interfere with settings the graphics driver has made.
Comment 25 Stephan Kulow 2007-09-03 12:50:48 UTC
I'm going to revert the workaround now that we have a fix. So dear kernel maintainers: please take care.
Comment 26 Stephan Kulow 2007-09-04 12:38:04 UTC
For the worried: we decided not to rush it into beta3 (if beta3 is delayed we'll put it in, otherwise not) but to ship it as an online update instead. That also lets us test kernel updates.
Comment 27 Hannes Reinecke 2007-09-04 14:05:54 UTC
Patch has been added to our kernel CVS for STABLE.
Comment 28 Stefan Dirsch 2007-09-07 21:31:34 UTC
*** Bug 308769 has been marked as a duplicate of this bug. ***
Comment 29 Stefan Dirsch 2007-09-07 21:35:19 UTC
http://en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3_dev

* setfont breaks first Xserver start (Bug #302010) -> online update pending
Comment 30 Stephan Kulow 2007-09-08 05:45:02 UTC
no longer pending - it's released.
Comment 31 Stefan Dirsch 2007-09-08 05:57:04 UTC
Thanks. Scott, please test if the kernel update really fixes the problem.
Comment 32 Stefan Dirsch 2007-09-09 07:04:01 UTC
*** Bug 309015 has been marked as a duplicate of this bug. ***
Comment 33 Ladislav Michnovic 2007-09-11 08:57:09 UTC
I have applied the kernel update. It seems to be fixed. But: have you tried switching to init 3 and then back to init 5? The graphical output gets distorted again then.
Comment 34 Stephan Kulow 2007-09-11 09:20:33 UTC
don't hijack bugs. init5 -> init 3 -> init 5 is hardly the first X server start
Comment 35 Stefan Dirsch 2007-09-11 09:38:00 UTC
I'm not sure, Coolo. It could be the same issue. Ladislav, is the issue still reproducible after adding kbd as Required start to earlyxdm or enabling the kernel framebuffer (I'm assuming it is disabled)? Is the graphical output fixed after pressing Ctrl-Alt-BS (starts a new Xserver)?
Comment 36 Ladislav Michnovic 2007-09-11 09:59:26 UTC
(In reply to comment #35 from Stefan Dirsch)
> Ladislav, is the issue still reproducible after adding kbd as Required start to earlyxdm

After that X server is broken right after booting. 

> or enabling the
> kernel framebuffer (I'm assuming it is disabled)?
How can I enable framebuffer on old PCI Matrox card? 
Maybe someone should try to reproduce this on a different machine with a more recent graphics card.

> Is the graphical output fixed
> after pressing Ctrl-Alt-BS (starts a new Xserver)?
Yes. 

Comment 37 Stefan Dirsch 2007-09-11 10:15:51 UTC
> How can I enable framebuffer on old PCI Matrox card? 
Generic kernel framebuffer! Add vga=0x317 to each "kernel ..." line in /boot/grub/menu.lst. Even an old Matrox PCI card supports generic kernel framebuffer.
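
The menu.lst edit described above could be sketched roughly as follows. This is only an illustration: add_vga_mode is a made-up helper name, the sketch assumes GRUB legacy syntax where each boot entry has a line starting with "kernel", and it prints the result instead of modifying the file so the output can be reviewed first.

```shell
#!/bin/bash
# Sketch only: append a vga= mode to every "kernel ..." line of a GRUB
# legacy menu.lst that does not already carry a vga= option.
# add_vga_mode is a hypothetical helper, not part of any SUSE tool.
# The input file is left untouched; the result goes to stdout.
add_vga_mode() {
    local menu=$1 mode=${2:-0x317}
    # For lines starting with "kernel": if no vga= option is present,
    # replace trailing whitespace (possibly empty) with " vga=<mode>".
    sed -E '/^[[:space:]]*kernel[[:space:]]/{/[[:space:]]vga=/!s/[[:space:]]*$/ vga='"$mode"'/;}' "$menu"
}

# Example (review the output before replacing the real file):
#   add_vga_mode /boot/grub/menu.lst > /tmp/menu.lst.new
```

Lines that already specify a vga= mode are left alone, so running the helper on its own output changes nothing.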
Comment 38 Dirk Mueller 2007-09-11 11:46:49 UTC
I think this problem has nothing to do with this particular bug. please open a new bugreport. 
Comment 39 Ladislav Michnovic 2007-09-11 11:47:18 UTC
After a reboot everything is O.K., also with framebuffer. I can't reproduce it even on my workstation. I think it's fixed. Closing. Sorry for the noise.
Comment 40 Ladislav Michnovic 2007-09-11 11:52:55 UTC
*** Bug 298795 has been marked as a duplicate of this bug. ***
Comment 41 Stefan Dirsch 2007-09-11 13:03:08 UTC
> also with framebuffer
The issue does not occur with framebuffer. If it went away by enabling the framebuffer it is rather likely that the issue still exists. :-(
Comment 42 Ladislav Michnovic 2007-09-11 13:07:22 UTC
(In reply to comment #41 from Stefan Dirsch)
> > also with framebuffer
> The issue does not occur with framebuffer. If it went away by enabling the
> framebuffer it is rather likely that the issue still exists. :-(
> 

It stopped occurring altogether. It's probably a hw problem.
I have more strange problems on that machine so I think this needs proper testing on more computers with various hw constellations.
Comment 43 Stefan Dirsch 2007-09-11 13:11:24 UTC
Ok. So it's more a coincidence.
Comment 44 Rodney Wilder 2007-09-28 02:13:23 UTC
I've been having issues with X on my 10.3rc1 64 bit machine and all bugs I find matching my issues refer to this bug.  Was this supposed to have been fixed in rc1 or is this still a work in progress?
Comment 45 Stefan Dirsch 2007-10-15 08:14:52 UTC
*** Bug 331624 has been marked as a duplicate of this bug. ***
Comment 46 Stefan Dirsch 2007-10-28 14:53:42 UTC
Given the number of bugs we got for this issue since releasing 10.3 I no longer think this Bug can be considered fixed. :-(

Workaround to fix race condition between kbd and earlyxdm
--------------------------------------------------------- 

1) add kbd to "Should-Start:" line of /etc/init.d/earlyxdm
2) run insserv
3) reboot the machine

If this still doesn't help, add 'sleep <seconds>' right before 
'exec /etc/init.d/xdm ${1+"$@"}' in /etc/init.d/earlyxdm. Reboot.
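
Step 1 of the workaround could be scripted roughly like this. It is a sketch under assumptions: add_should_start is a made-up helper name, and the init script is assumed to carry its LSB header as a single "# Should-Start:" comment line. The function only prints the modified script; installing it, running insserv and rebooting remain manual steps.

```shell
#!/bin/bash
# Sketch only: print an init script with SERVICE appended to its LSB
# "# Should-Start:" line, unless SERVICE is already listed there.
# add_should_start is a hypothetical helper name.
add_should_start() {
    local script=$1 svc=$2
    awk -v svc="$svc" '
        /^#[ \t]*Should-Start:/ {
            listed = 0
            for (i = 1; i <= NF; i++)
                if ($i == svc) listed = 1
            if (!listed) $0 = $0 " " svc
        }
        { print }
    ' "$script"
}

# Example (as root, after making a backup of the original script):
#   add_should_start /etc/init.d/earlyxdm kbd > /tmp/earlyxdm.new
```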

IMHO we should make 1) the default for 11.0 and even provide an updated preload RPM for 10.3. I think that making sure that our customers get a proper keyboard after booting is more important than gaining maybe one second in the boot process.
Comment 47 Stefan Dirsch 2007-10-28 15:04:34 UTC
*** Bug 309484 has been marked as a duplicate of this bug. ***
Comment 48 Stefan Dirsch 2007-10-28 15:14:48 UTC
*** Bug 331528 has been marked as a duplicate of this bug. ***
Comment 50 Stephan Kulow 2007-10-29 09:06:01 UTC
Stefan, is there a way to find out from the system configuration if the problem will hit? I did not read all comments in all duplicates, but it seems it's frame buffer specific and we can disable earlyxdm only on those for 10.3
Comment 52 Stefan Dirsch 2007-10-29 10:06:32 UTC
(In reply to comment #50 from Stephan Kulow)
> Stefan, is there a way to find out from the system configuration if the
> problem will hit? I did not read all comments in all duplicates, but it seems
> it's frame buffer specific and we can disable earlyxdm only on those for 10.3

I'm not sure if it only happens with enabled framebuffer. I'm not aware of any confirmation from the affected persons in this bugreport that the issue has been fixed with the kernel patch when framebuffer is disabled. Magnus, Stephen (Shaw), Ladislav, Scott, Julian?

I'm not aware of any way to find out from the system configuration if the
problem will hit. I'm only aware of a way to prevent this issue. ;-)




Comment 53 Henryk Hecht 2007-10-29 18:54:04 UTC
In the case of bug #331528, the discriminating factor appears to be whether KBD_TTY in /etc/sysconfig/keyboard contains "tty1...tty6" or "tty1...tty20".  The latter is problematic, while the former isn't.  I can't speak for the other duplicates, but from the similarity of the reports, I think it is very likely that the same applies.
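
If that observation holds, a configuration check along the following lines might flag affected systems. This is a sketch, not a tested diagnostic: extra_kbd_ttys is a made-up name, and it only looks at the KBD_TTY value, not at whether the framebuffer is enabled.

```shell
#!/bin/bash
# Sketch only: print every entry of a KBD_TTY-style list that names a
# tty numbered above 6. Entries not of the form ttyN are ignored.
# extra_kbd_ttys is a hypothetical helper name.
extra_kbd_ttys() {
    local tty n
    for tty in $1; do
        case $tty in
            tty[0-9]*) n=${tty#tty} ;;
            *) continue ;;
        esac
        case $n in
            *[!0-9]*) continue ;;   # trailing junk after the number
        esac
        [ "$n" -gt 6 ] && echo "$tty"
    done
    return 0
}

# Example:
#   . /etc/sysconfig/keyboard
#   extra_kbd_ttys "$KBD_TTY"
```

Empty output would mean the list stays within tty1 to tty6.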
Comment 54 Stephan Kulow 2007-10-30 13:25:31 UTC
Stefan: if you add kbd to earlyxdm you pretty much erase the difference between earlyxdm and xdm, because kbd is started very late in the process (which is also the reason for the trouble :).

How can tty20 end up in KBD_TTY?

From init.d/kbd:
if test -z "$KBD_TTY"; then
        # >=tty7 left out intentionaly
        KBD_TTY="tty1 tty2 tty3 tty4 tty5 tty6"
fi

So I guess the fix is to filter out tty7 even if it's in sysconfig. JW, that would be your workaround then :)
Comment 55 Ladislav Michnovic 2007-10-30 13:39:27 UTC
Regarding bug #309484: I did the kernel update with YOU, I don't have framebuffer, and I still can't see the special Czech characters displayed correctly. Instead of "?" I see a strange font. But no other problems occur.
Comment 56 Stefan Dirsch 2007-10-30 14:22:51 UTC
Well, I just don't believe that this issue only occurs on systems with tty7 and higher in KBD_TTY.
Comment 57 Stefan Dirsch 2007-10-30 14:23:46 UTC
Ladislav, what does your KBD_TTY line look like?
Comment 58 Ladislav Michnovic 2007-10-30 14:32:25 UTC
file /etc/init.d/kbd; from line 85:
# Calculate KBD_TTY array only once
# Caution: Keep in sync with earlykbd.init
#
if test -z "$KBD_TTY"; then
        # >=tty7 left out intentionaly
        KBD_TTY="tty1 tty2 tty3 tty4 tty5 tty6"
fi

newkbd=""
for tty in $KBD_TTY; do
    test -w /dev/$tty           || continue
    test -c /dev/$tty           || continue
    > /dev/$tty &> /dev/null    || continue
    newkbd="${newkbd:+$newkbd }/dev/$tty"
done
KBD_TTY="$newkbd"
unset newkbd
Comment 59 Stefan Dirsch 2007-10-30 14:59:03 UTC
So checking for tty7 wouldn't help at all.
Comment 60 Henryk Hecht 2007-10-30 22:58:32 UTC
Regarding comment #54: ttys 7-20 ended up in /etc/sysconfig/keyboard because suse put them there at some point.  Just as with the other reporter who had the extra ttys, this system has been upgraded many times, so I've no idea at what point that was.  They stay in KBD_TTY because nothing in /etc/init.d/kbd removes them!

Regarding comments #58-59: KBD_TTY is only set in /etc/init.d/kbd if it is empty or unset at that point, after sourcing /etc/sysconfig/keyboard.  In other words, it just provides a default.  Certainly, in the case that it's tty{1..20} the conditional is dead code and the filter does nothing.  If setfont should really not be called on ttys > 6, then /etc/init.d/kbd should actually filter the list based on device name, not just on whether it is a writable character device.
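
A name-based filter of the kind suggested here might look like the sketch below. It is an assumption about how such a filter could be written, not the change that was eventually shipped; limit_to_text_consoles is a made-up name.

```shell
#!/bin/bash
# Sketch only: reduce a KBD_TTY-style list to the entries tty1..tty6,
# regardless of what /etc/sysconfig/keyboard put into it.
# limit_to_text_consoles is a hypothetical helper name.
limit_to_text_consoles() {
    local tty kept=""
    for tty in $1; do
        case $tty in
            tty[1-6]) kept="${kept:+$kept }$tty" ;;   # exactly tty1..tty6
        esac
    done
    echo "$kept"
}

# Possible use ahead of the existing writable/character-device loop:
#   KBD_TTY=$(limit_to_text_consoles "$KBD_TTY")
```

Because the pattern matches the whole name, tty10 through tty20 are dropped along with tty7 to tty9.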

If I understand the comments above, Mr. Michnovic no longer has any problems logging in, but only with display of Czech characters; this sounds like a very different bug from the setfont/kdm login bugs that have been marked as duplicates of this one...whatever the case, my own setfont problems were strictly limited to the extra ttys being setfont'd after earlyxdm, and this was the case for at least some other submitters.
Comment 61 Stefan Dirsch 2007-10-31 18:12:43 UTC
Ok. Let's start by making sure setfont is no longer used for tty7 and higher.
Comment 62 Stefan Dirsch 2008-01-08 14:28:36 UTC
Mike is another victim.

magellan:/etc/sysconfig/keyboard

KBD_TTY="tty1 tty2 tty3 tty4 tty5 tty6 tty7 tty8 tty9 tty10 tty12 tty13 tty14 tty15 tty16 tty17 tty18 tty19 tty20"

Mike, could you remove anything higher than tty6? This should help in the future.
Comment 63 Mike Fabian 2008-01-08 15:11:55 UTC
Thank you, I've removed the entries higher than tty6 now.
Comment 64 Stefan Dirsch 2008-01-30 17:46:21 UTC
SUSE LINUX 10.1 / SLES10 was already limited to the range tty1-tty6. So the default changed somewhere between SLES9 (SUSE LINUX 9.1) and SLES10, from tty1-tty20 to tty1-tty6. Only users who have been updating since SuSE Linux 10.0 or SLES9 (or earlier) can be affected by this issue.

... -> 10.0 -> 10.1 -> 10.2 -> 10.3 -> ...
... -> SLES9 -> SLES10 -> SLES11 -> ...

I'm mostly concerned about SLES customers here.
Comment 66 Stefan Dirsch 2008-01-30 20:31:15 UTC
This one is much easier. :-) Thanks, Jürgen!

  lsof /dev/tty[0-9]* | grep X | sed -e 's@.* @@'
Comment 67 Dr. Werner Fink 2008-01-31 11:02:58 UTC
With this /usr/bin/lsof should become /bin/lsof ..
and lsof can be used within other programs with e.g.

   lsof -nF cpn -- /dev/tty[0-9]*

which lists, with field identifiers, the process command name, the
process ID, and the file name.

Another option would be to use fuser

    fuser -n file  /dev/tty[0-9] /dev/tty[0-9][0-9]  2>&1

but note that the pids are on stdout whereas the names will
be printed on stderr (therefore I've used the redirect).
The pid then has to be checked with ps or by some other means.

Last but not least the normal ps command could be used:

    . /etc/sysconfig/keyboard
    ps -o tty=,comm= t ${KBD_TTY// /,}
    tty4     mingetty
    tty5     mingetty
    tty6     mingetty
    tty2     mingetty
    tty3     mingetty
    tty7     X
    tty1     mingetty


with an explicit tty list it looks like

    ps -o tty=,comm= t tty1,tty2,tty3,tty4,tty5,tty6,tty7
    tty4     mingetty
    tty5     mingetty
    tty6     mingetty
    tty2     mingetty
    tty3     mingetty
    tty7     X
    tty1     mingetty
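
To pick out the tty used by X from output like the above, a small filter could be added. This is a sketch: x_ttys is a made-up name, and it assumes the two-column "tty command" layout shown here with the X server's command name being exactly "X". Note that the result is only a snapshot of the moment ps ran.

```shell
#!/bin/bash
# Sketch only: read "tty command" pairs on stdin, in the layout printed
# by `ps -o tty=,comm=`, and print the ttys whose command is X.
# x_ttys is a hypothetical helper name.
x_ttys() {
    awk '$2 == "X" { print $1 }'
}

# Example:
#   . /etc/sysconfig/keyboard
#   ps -o tty=,comm= t ${KBD_TTY// /,} | x_ttys
```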
Comment 68 Stefan Dirsch 2008-02-04 08:31:25 UTC
Well, thinking about it again. It's still a race. So checking for X using some tty won't help us in the end. :-( Better just not using any tty > 6.
Comment 69 Stefan Dirsch 2008-02-10 01:28:12 UTC
*** Bug 360373 has been marked as a duplicate of this bug. ***
Comment 70 Stefan Dirsch 2008-03-21 17:49:11 UTC
(In reply to comment #68 from Stefan Dirsch)
> Well, thinking about it again. It's still a race. So checking for X using
> some tty won't help us in the end. :-( Better just not using any tty > 6.

Fixed for STABLE/Factory and openSUSE 11.0 Beta1.