Bug 152101

Summary: Switching to runlevel 1 confuses X
Product: [openSUSE] openSUSE 10.2
Reporter: Jon Nelson <jnelson-suse>
Component: X.Org
Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED NORESPONSE
QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Normal
Priority: P5 - None
CC: dkukawka, suse-beta, werner
Version: RC 1
Target Milestone: ---
Hardware: i686
OS: Linux
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: /var/log/messages
Xorg.0.log
xorg.conf
'startx' run in a 'script' session
hwinfo all
xorg.conf from 10.3
Xorg.0.log from 10.3

Description Jon Nelson 2006-02-19 16:49:55 UTC
Did a 10.1beta4 install using gnome.
After booting, I can log in (as root), type 'init 1', and the machine will hard lock.

From /var/log/messages:

Feb 19 10:33:16 linux init: Switching to runlevel: 1
Feb 19 10:33:18 linux gconfd (root-3435): Received signal 15, shutting down cleanly
Feb 19 10:33:18 linux gconfd (root-3435): Exiting
Feb 19 10:33:19 linux gdm[3004]: Error reinitilizing server
Feb 19 10:33:24 linux auditd[2334]: The audit daemon is exiting.
Feb 19 10:33:24 linux kernel: audit(1140366804.628:6): audit_pid=0 old=2334 by auid=4294967295
Feb 19 10:33:25 linux sshd[2806]: Received signal 15; terminating.
Feb 19 10:33:25 linux zmd-bin: ShutdownManager (WARN): Shutting down daemon...
Feb 19 10:33:26 linux kernel: Kernel logging (proc) stopped.
Feb 19 10:33:26 linux kernel: Kernel log daemon terminating.
Feb 19 10:33:27 linux syslog-ng[2167]: syslog-ng version 1.6.8 going down

The first time I did this, X didn't even exit. The second time, X looks like it started to exit, but then failed - the screen display is corrupted.  I looked through /all/ of the logs and found nothing other than what you see above.

This is 100% reproducible.

I can NOT reproduce if I issue the 'init 1' *from the console*, regardless of whether or not X is running. This only seems to happen if I issue the 'init 1' /while logged in graphically/.
Comment 1 Christian Boltz 2006-02-19 21:55:09 UTC
How hard is the machine locked really?
- can you ping it?
- can you login using ssh?
- does ctrl-alt-F1 work?

Besides that, it would be interesting to run   sleep 10 ; init 1   and switch to tty10 (ctrl-alt-F10) quickly. Are there more error messages?
Comment 2 Jon Nelson 2006-02-19 22:32:09 UTC
It locks almost instantly sometimes (more on that in a moment).  It does not respond to pings or ssh, and the keyboard is dead.  It's dead, Jim.

I notice that when I type 'init 1' and then hit enter, it's almost as if I hold the enter key down. If I actually hit enter (again), the machine locks that much sooner.

There are no more error messages.

I'll give your sleep 10; init 1 test a try.
Comment 3 Jon Nelson 2006-02-19 22:40:36 UTC
Gave it a try - sleep 10; init 1 in a logged-in (graphical) session does *not* hang the machine.
Comment 4 Michael Gross 2006-02-20 11:13:48 UTC
OK then... Jon: Attach the last 500 lines of your syslog here.
Comment 5 Jon Nelson 2006-02-20 14:55:45 UTC
All of the messages which might be relevant are already attached.
I did not remove any lines.
The line that follows 'syslog-ng version 1.6.8 going down' is syslog-ng coming back up several minutes later due to my resetting the box.

I ran a recursive grep in /var/log for 'Feb 19 10:33' and that's all that is there.  Can you reproduce this?
Comment 6 Michael Gross 2006-02-20 16:30:34 UTC
The problem might be located somewhere you wouldn't expect, or could be the result of another problem or condition. So please attach the complete log here.
Comment 7 Jon Nelson 2006-02-21 02:16:13 UTC
Created attachment 69411 [details]
/var/log/messages
Comment 8 Michael Gross 2006-02-21 15:02:17 UTC
Jon: Before we reassign this, have you tried booting with ACPI=off or a combination of the options for the safe settings? Does the machine still lock up then?
Comment 9 Jon Nelson 2006-02-21 15:20:34 UTC
I have not tried with ACPI=off. I'll try to give that a try tonight (it'll be about 11 hours from now).
Comment 10 Michael Gross 2006-02-21 17:20:42 UTC
Also check whether the lockup occurs when entering other runlevels (0 and 6 especially).
Comment 11 Jon Nelson 2006-02-21 18:10:39 UTC
I'm just taking a guess, but is it possible that everything else is working just fine *except* that X is not being killed?

Feb 19 10:33:19 linux gdm[3004]: Error reinitilizing server

And the display also does not change from my graphical login.  I posit that everything else worked. This seems like a gdm/X/gnome problem.

I'll try all of the above tonight (ACPI=off, try runlevels 0 and 6 and others).
Comment 12 Jon Nelson 2006-02-22 02:36:10 UTC
Starting from an xterm, *none* of init 2, 3, or 4 seem to make /any/ difference. The disk flashes a bit, another xterm with tail -F /var/log/messages shows nothing special, but X does /not/ exit.  I can continue to type and do things in my xterm.

init 6 does reboot the machine - X does *not* exit until the "reboot" actually takes place, however my console stops taking my input almost immediately - what I see is that I type 'init 6', press enter, and then even though I've let go of the enter key, it *behaves* as though I've held it down.  If I hit the enter key even just once, the keyboard stops taking input altogether.

init 0 also shuts the machine off with much the same behavior as above.

ACPI=off had no effect whatsoever.

As an added test, I replaced gdm with xdm in /etc/sysconfig/displaymanager

Upon init 1, X immediately exits. However, it returns me to the login screen for xdm!  Then the display is corrupted and the keyboard doesn't work.

For some reason, X (xdm/gdm/whatever) is being *restarted* instead of *stopped*.

Comment 13 Michael Gross 2006-02-22 12:05:25 UTC
> what I see is that I type 'init 6', press enter, and then even though I've let
> go of the enter key, it *behaves* as though I've held it down.

I already filed a report about this and I am beginning to suspect that these problems might be connected. Have a look at bug #152522 - although I must mention that on my system I did not observe such behaviour.

This might nonetheless be a problem with sysvinit itself.

Steffen: Can you provide a comment here?
Comment 14 Stefan Dirsch 2006-02-22 12:11:46 UTC
Steffen --> snwint
Comment 15 Michael Gross 2006-02-22 12:31:05 UTC
Sorry, Stefan, I meant you; this might be an X-related problem. Is anything known about this? It could also be connected to the logout problem somehow. Maybe X does not handle SIGTERM correctly?
Comment 16 Stefan Dirsch 2006-02-22 15:13:29 UTC
I propose to switch to runlevel 3 first. If this crashes your machine we can talk about an X11 related issue.
Comment 17 Jon Nelson 2006-02-22 15:26:41 UTC
As I said in comment #12, I can type 'init 3' in an xterm and X does NOT exit, and in fact it's almost as if nothing happened.  I can still use the xterm to do things, but it is clearly in runlevel 3. 
Comment 18 Michael Gross 2006-02-22 15:33:27 UTC
Jon... please create a backtrace of the init-call - this might give us the required clue here.
Comment 19 Jon Nelson 2006-02-22 15:38:32 UTC
To be strictly clear, do you mean an strace?  The only way I know how to generate a backtrace is with gdb.

How about something like 'strace -o foo -tt -f -s 256 init 1'
Comment 20 Michael Gross 2006-02-22 15:53:45 UTC
It should do fine.
Comment 21 Jon Nelson 2006-02-22 20:14:19 UTC
init:

3281  12:19:18.228284 execve("/sbin/init", ["init", "1"], [/* 79 vars */]) = 0
3281  12:19:18.229417 uname({sys="Linux", node="linux", ...}) = 0
3281  12:19:18.229796 brk(0)            = 0x80c4000
3281  12:19:18.229970 brk(0x80c4c90)    = 0x80c4c90
3281  12:19:18.230145 set_thread_area({entry_number:-1 -> 6, base_addr:0x80c4830, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
3281  12:19:18.230502 brk(0x80e5c90)    = 0x80e5c90
3281  12:19:18.230684 brk(0x80e6000)    = 0x80e6000
3281  12:19:18.230882 umask(022)        = 022
3281  12:19:18.231071 geteuid32()       = 0
3281  12:19:18.231234 getpid()          = 3281
3281  12:19:18.231415 rt_sigaction(SIGSTOP, {SIG_IGN}, NULL, 8) = -1 EINVAL (Invalid argument)
3281  12:19:18.231637 rt_sigaction(SIGTERM, {SIG_IGN}, NULL, 8) = 0
3281  12:19:18.231817 rt_sigaction(SIGALRM, {0x8048250, [], 0}, NULL, 8) = 0
3281  12:19:18.232037 alarm(3)          = 0
3281  12:19:18.232217 open("/dev/initctl", O_WRONLY|O_LARGEFILE) = 3
3281  12:19:18.232455 write(3, "i\31\t\3\1\0\0\0001\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
3281  12:19:18.270479 close(3)          = 0
3281  12:19:18.270704 alarm(0)          = 3
3281  12:19:18.270895 exit_group(0)     = ?

and for good measure (lot of good it'll do me), xdm:

2887  12:19:03.579575 rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)

This shows that xdm never gets any signal or anything.
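
For readers trying to interpret the write() to /dev/initctl in the init trace above: the first 16 bytes appear to match the request header that sysvinit's initreq.h describes (magic, command, runlevel, sleep time, each a 32-bit integer). The following is a minimal sketch that decodes just those 16 bytes as strace printed them. The field layout, the little-endian byte order (i686) and the constant names are assumptions taken from that header, not something stated in this report, and the function name is purely illustrative.

```python
import struct

# Constants as assumed from sysvinit's initreq.h.
INIT_MAGIC = 0x03091969   # magic number identifying an init request
INIT_CMD_RUNLVL = 1       # command: change runlevel

# First 16 bytes of the write(), transcribed from the strace output
# ("i\31\t\3" "\1\0\0\0" "1\0\0\0" "\5\0\0\0"); the rest is not decoded here.
REQUEST_HEADER = b"i\x19\t\x03" b"\x01\x00\x00\x00" b"1\x00\x00\x00" b"\x05\x00\x00\x00"


def decode_init_request_header(raw):
    """Decode the four leading 32-bit little-endian fields of an init request."""
    magic, cmd, runlevel, sleeptime = struct.unpack_from("<iiii", raw, 0)
    return {
        "magic": magic,
        "cmd": cmd,
        "runlevel": chr(runlevel),   # the runlevel is carried as a character code
        "sleeptime": sleeptime,
    }


if __name__ == "__main__":
    print(decode_init_request_header(REQUEST_HEADER))
```

If that reading is right, the trace shows a well-formed "switch to runlevel 1" request being handed to the running init, which would be consistent with the init call itself behaving normally.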
Comment 22 Michael Gross 2006-02-23 10:32:44 UTC
You mean that xdm does not follow/handle the signal, or that it does not get one? I cannot tell, as you left out the argument to rt_sigsuspend.
Comment 23 Jon Nelson 2006-02-23 20:06:58 UTC
(In reply to comment #22)
> You mean that xdm does not follow/handle the signal or that is does not get
> one, cannot tell as you left out the argument to rt_sigsuspend.

That is the entire output from 

strace -o xdm.trace -f -s 256 -tt -p PID_OF_XDM

"I" did not leave the argument out, strace did. ;-)
Comment 24 Michael Gross 2006-02-24 17:06:44 UTC
This is either a problem with init or with X11, but I think it's X11 as init works well for other processes. I'm adding Werner into CC for a comment.
Comment 25 Stefan Dirsch 2006-02-24 17:37:50 UTC
Boot into runlevel 3. Use startx to start your Xsession, finish your Xsession, and tell us what exactly happened. Attach /etc/X11/xorg.conf and /var/log/Xorg.0.log*.
Comment 26 Jon Nelson 2006-02-24 18:30:38 UTC
Created attachment 70239 [details]
Xorg.0.log
Comment 27 Jon Nelson 2006-02-24 18:31:04 UTC
Created attachment 70240 [details]
xorg.conf
Comment 28 Jon Nelson 2006-02-24 18:31:34 UTC
Created attachment 70241 [details]
'startx' run in a 'script' session
Comment 29 Stefan Dirsch 2006-02-24 18:44:10 UTC
And now tell us what happened exactly.
Comment 30 Jon Nelson 2006-02-24 21:03:06 UTC
D'oh!

I started X which started a gnome session.
Then I logged out of the session.
That's it.
I got back to my console, and except for all of the messages on the screen (which I provided in attachment 70241 [details]), nothing out of the ordinary happened.
Comment 31 Stefan Dirsch 2006-02-24 21:24:07 UTC
Thanks. This means that the Xserver is *not* responsible for the lockup.
Comment 32 Jon Nelson 2006-02-26 01:46:23 UTC
I just wiped that machine clean and did a beta5 install.
This time I chose KDE as the desktop.
I am not experiencing the problem with kdm and a kde desktop.
I am installing the full gnome system as well to try various combinations of xdm, kdm, and gdm combined with fvwm, kde, and gnome desktops.

I have a strong suspicion that it's not the display managers but the desktop environment.
Comment 33 Jon Nelson 2006-02-26 02:36:39 UTC
More data:

kdm + kde - no observed problems
kdm + gnome - no observed problems
kdm + fvwm2 - no observed problems
gdm + kde - no observed problems
gdm + gnome - no observed problems
gdm + fvwm2 - no observed problems
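
Comment 32 mentions three display managers (xdm, kdm, gdm) and three desktops, while the list above reports six pairings. As a rough sketch of how the full test matrix could be enumerated and compared against what has been reported so far, assuming the names below are just labels and not package names:

```python
from itertools import product

DISPLAY_MANAGERS = ["xdm", "kdm", "gdm"]
DESKTOPS = ["kde", "gnome", "fvwm2"]

# Pairings reported in the list above, all with "no observed problems".
REPORTED = {
    ("kdm", "kde"), ("kdm", "gnome"), ("kdm", "fvwm2"),
    ("gdm", "kde"), ("gdm", "gnome"), ("gdm", "fvwm2"),
}


def untested_combinations():
    """Return every display manager / desktop pairing with no reported result."""
    return [pair for pair in product(DISPLAY_MANAGERS, DESKTOPS)
            if pair not in REPORTED]


if __name__ == "__main__":
    for dm, desktop in untested_combinations():
        print(f"{dm} + {desktop} - not reported")
```

With the data above this leaves only the xdm pairings without a reported result.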

Additional note: the behavior that I see when I type 'init 1', which is what appears to be a held-down enter key, continues. Furthermore, if I hit enter twice, once to enter the command and once extra, I get the weird hanging behavior again - this time however (beta5) it looks as though X is incompletely shut down - I get a corrupted display and my 'console' takes up the upper left 1/4 of the screen, and it's green.  I can still type but only for a few more seconds and then it's hung as before.

One other weird thing:

the normal sysvinit shutdown messages and all that are on tty7 - the same terminal that X runs on normally.

Comment 34 Michael Gross 2006-02-27 16:31:42 UTC
So then it must be a problem with sysvinit. Werner: Please have a look at this.
Comment 35 Dr. Werner Fink 2006-02-27 16:50:50 UTC
This has nothing to do with sysvinit.  The only thing that happens
when calling `init 1' is that all running jobs and services are
removed, and after this the last thing that happens is an `init s'
from the boot script /etc/init.d/single .  And this works for me: after
`init 1' the system is awaiting the root password for maintenance
on /dev/console.
Comment 36 Michael Gross 2006-02-27 17:04:16 UTC
This seems to be a problem with your hardware. Please provide more information about your machine (`hwinfo'). Can you boot into single user mode (cold boot, append S to your boot parameters)? Have you tried this on other machines, or does it fail only on this one?
Comment 37 Dr. Werner Fink 2006-02-27 17:24:25 UTC
I'd also like to know if there are any problems with hal, dbus, and/or
hotplug on this system.  And please choose an appropriate CPU type from
the list below.  Currently this is other.
Comment 38 Jon Nelson 2006-02-28 03:46:26 UTC
Created attachment 70563 [details]
hwinfo all
Comment 39 Jon Nelson 2006-02-28 04:28:19 UTC
Werner Fink: hal, dbus, etc... all seems to be just fine. I have a strong suspicion now that it has /something/ to do with the video driver - since beta3 I've had weird problems and all seem to be centering around the video.  In beta3 I had to remove the vga= option in grub to get the machine to boot non-glacially, and I had lots of video problems with beta3 and beta4 both.

Sadly, for now I can only try things on this one machine.  Booting into any user mode works great.

I tried an experiment: I removed the vga= parameter from the grub menu and rebooted. I now have no problems with init 1 or init pick-your-FAVORITE-runlevel. 


The vga parameter was 0x317
I suspect this to be an evil interaction with the framebuffer and X.
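
For reference, the vga= value can be read as a VESA mode number plus an offset of 0x200, which is how the kernel's video mode documentation describes it. The following minimal sketch decodes a few common 16-bit modes on that assumption; the table is deliberately partial and the function name is illustrative only.

```python
# Offset the Linux kernel adds to VESA mode numbers for the vga= parameter.
VESA_OFFSET = 0x200

# Partial table of standard VESA modes: number -> (width, height, bits per pixel).
VESA_MODES = {
    0x111: (640, 480, 16),
    0x114: (800, 600, 16),
    0x117: (1024, 768, 16),
    0x11A: (1280, 1024, 16),
}


def decode_vga_param(value):
    """Map a vga= boot parameter to (width, height, bpp), or None if not in the table."""
    return VESA_MODES.get(value - VESA_OFFSET)


if __name__ == "__main__":
    print(decode_vga_param(0x317))
```

Under that reading, vga=0x317 requests a 1024x768 VESA framebuffer console at 16 bits per pixel, which is the console mode in play while X runs on top of it.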
Comment 40 Dr. Werner Fink 2006-02-28 10:25:43 UTC
OK this looks like a hardware problem on this system with a Mobile
INTEL CPU.  Maybe the kernel people know more about such systems.
Comment 41 Olaf Kirch 2006-03-01 11:42:32 UTC
This sounds like the splash screen at shutdown steps on the X server's feet.
Note that the video card is a GeForce, and X11 uses the nv driver not
the framebuffer.

It seems that /etc/rc.d/rc1.d/S12splash should wait for X to release the
device.

Jon, in order to test this hypothesis, could you please re-enable the
vga=0x317 parameter in your grub config and uninstall the bootsplash
package instead?
Comment 42 Jon Nelson 2006-03-03 03:28:28 UTC
I have done a complete wipe and reinstall of beta 5. This is what I am experiencing:

if I type 'init 1' and DO NOT touch the keyboard again, everything works as expected.  If I hit the keyboard, it appears as though it leaves X because I see the gdm login screen again, shortly before the display is corrupted.

Furthermore, I performed a small experiment - it turns out that the machine is not actually hung - it is not network accessible, but control-alt-delete does work. Additionally, although the display is completely garbled, I assumed it was in fact in runlevel 1, waiting for me to enter root's password - I did so. I typed reboot and the machine did reboot.  Therefore, I must conclude that the machine is in fact entering runlevel 1 but X is somehow incompletely shutting down, or perhaps being incompletely started up and then killed off.

At your suggestion, I removed the bootsplash package, rebooted, and tried again.

There was no improvement. :-(
Comment 43 Olaf Kirch 2006-03-03 09:31:58 UTC
Alright, so the machine isn't hung at all, it's just the display that
gets badly garbled.

IOW this sounds more like an X11 than a kernel problem. Assigning back
to sndirsch
Comment 44 Stefan Dirsch 2006-03-03 09:45:13 UTC
Check if this also happens with fbdev. Reconfigure with SaX2.

1) reboot into runlevel 3
2) "sax2 -r -m 0=fbdev -a"
3) Try to reproduce this problem.
Comment 45 Jon Nelson 2006-03-05 02:36:18 UTC
I get:

SaX: initializing please wait...
SaX: your current configuration will not be read in

SaX: wrong module syntax...
SaX: syntax: -m CardNr=CardModule[,...]

(and yes I used zero not 'oh')

Comment 46 Stefan Dirsch 2006-03-05 09:40:24 UTC
Marcus? What's wrong with "sax2 -r -m 0=fbdev -a"?
Comment 47 Marcus Schaefer 2006-03-06 15:02:03 UTC
sax2 -r -m 0=fbdev -a
SaX: initializing please wait...
SaX: your current configuration will not be read in

SaX: access to your display has been granted
SaX: startup

SaX: creating config file please wait...

SaX: Automatic configuration is done
SaX: The file /etc/X11/xorg.conf has been written

---

works for me. would suggest to use latest beta
Comment 48 Stefan Dirsch 2006-03-06 15:38:47 UTC
Yes, I think you should try Beta6. Maybe then you can set up a fbdev X11 configuration so we know if it's related to the native video driver.
Comment 49 Jon Nelson 2006-03-09 03:17:40 UTC
I tried beta6 (wipe and install). Same problem. For the remainder of the tests, I used fbdev after setting it up with the above command (which worked this time).

The *first* time it worked - X was killed and I got back to the gdm login screen, which itself was killed a few seconds later. While in runlevel 1 I ran 'init 5' and logged back in via gdm.  This time it did /not/ exit X and I got the "frozen machine" thing (which I am no longer convinced is /frozen/, per se, but the keyboard is unresponsive and of course the network is down, so it's got much the same behavior).

I repeated the test after a reboot. After invoking 'init 1' and pressing enter again in rapid succession, this time X exited quickly and stayed that way. 

How X behaves seems to depend at least partly on how quickly after I enter 'init 1' I hit enter again.  Has anybody else been able to reproduce the behavior?  Perhaps hitting 'enter' twice in rapid succession, once to enter the 'init 1' command line and once again, would do the trick?
Comment 50 Stefan Dirsch 2006-03-18 22:14:58 UTC
IMHO we won't be able to fix/investigate this issue in time for 10.1 release. Therefore I propose to test/investigate (if required) this with SuSE > 10.1 Beta again. ==> LATER
Comment 51 Stefan Dirsch 2006-07-21 15:53:03 UTC
This should be retested with SUSE 10.2 Alpha3.
Comment 52 Stefan Dirsch 2006-07-21 15:53:46 UTC
NEEDINFO.
Comment 53 Stefan Dirsch 2006-10-27 15:26:33 UTC
openSUSE 10.2 Beta1 has just been released.
Comment 54 Jon Nelson 2006-11-28 02:27:01 UTC
This is still a problem for openSUSE 10.2 RC1.

Comment 55 Stefan Dirsch 2006-12-21 14:26:34 UTC
Ok. This issue seems to be a very strange one (init 1 + immediately pressing the Return key crashes the machine !?!). I suggest to check the latest
driver from time to time by updating xorg-x11-server/xorg-x11-driver-video
packages from the xorg73 project.

  http://software.opensuse.org/download/xorg73/openSUSE_10.2/

Closing as LATER.
Comment 56 Stefan Dirsch 2007-08-09 19:27:15 UTC
Reopen.
Comment 57 Stefan Dirsch 2007-08-09 19:43:01 UTC
Any improvements with openSUSE 10.3 Beta1?
Comment 58 Stephan Kulow 2007-10-01 14:41:11 UTC
If you plan to test, please say so. Otherwise we'll have to close the bug.
Comment 59 Jon Nelson 2007-10-01 15:12:11 UTC
I shall try to get to the test tonight - I was going to do it last night (really!) but I simply ran out of time.
Comment 60 Jon Nelson 2007-10-09 22:55:25 UTC
To clarify - make sure to read the comments - it does not crash the machine, only the display.  Also, it still happens with openSUSE 10.3 GM.

This particular test machine (although it happens/happened to several) has an nvidia "go" chipset - I am *not* using the NVIDIA-supplied binary driver, just the free (xorg) driver.
Comment 61 Stefan Dirsch 2007-10-10 18:14:32 UTC
Could you attach /etc/X11/xorg.conf and /var/log/Xorg.0.log of openSUSE 10.3?
Thanks.
Comment 62 Jon Nelson 2007-10-11 00:08:18 UTC
Created attachment 177547 [details]
xorg.conf from 10.3
Comment 63 Jon Nelson 2007-10-11 00:08:48 UTC
Created attachment 177548 [details]
Xorg.0.log from 10.3
Comment 64 Stefan Dirsch 2007-10-11 00:55:41 UTC
I would like to know if this issue is related to nv driver at all. Could you create a fbdev configuration by running "sax2 -r -m 0=fbdev" and try again?
Comment 65 Jon Nelson 2007-10-11 01:13:51 UTC
Give me 10 minutes...
Comment 66 Jon Nelson 2007-10-11 01:38:00 UTC
After I switched to using fbdev:

I tried things twice.

first time, I used 'su -' to become root in a konsole.
I typed: 'init 1' and the konsole behaved as though I had held down the enter key.
I pressed the enter key *once*.
Shortly thereafter X exited and, apparently, tried to log me back in (X was *killed* and kdm restarted!!)
Eventually, I got the expected login prompt (console)
I logged in at runlevel 1. No display corruption!

Then I ran 'init 5' to bring X back up, which happened fine.

Then I tried again, but this time I did *not* hit the enter key a second time.
I eventually saw a login prompt, but the keyboard was unresponsive. Caps lock, num lock, etc. failed to light up when pressed. Pressing the power button caused no disk activity that would indicate a shutdown, and inserting and removing PCMCIA cards also did nothing (this was expected).

I had to kill the power and reboot it.

And now I'm here!
Comment 67 Stefan Dirsch 2007-10-11 01:50:50 UTC
Ok. So the issue is not related to a special driver.
Comment 68 Stefan Dirsch 2007-10-11 02:11:04 UTC
Which laptop is this? This issue is so strange. I think it can only be fixed if we have such a laptop here for testing.
Comment 69 Stefan Dirsch 2007-10-16 07:33:20 UTC
Not sure when we'll see a response here. Please reopen once you can provide
the requested information.