|
Bugzilla – Full Text Bug Listing |
| Summary: | Switching to runlevel 1 confuses X | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Jon Nelson <jnelson-suse> |
| Component: | X.Org | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED NORESPONSE | QA Contact: | E-mail List <xorg-maintainer-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | dkukawka, suse-beta, werner |
| Version: | RC 1 | ||
| Target Milestone: | --- | ||
| Hardware: | i686 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | /var/log/messages; Xorg.0.log; xorg.conf; 'startx' run in a 'script' session; hwinfo all; xorg.conf from 10.3; Xorg.0.log from 10.3 |
||
|
Description
Jon Nelson
2006-02-19 16:49:55 UTC
How hard is the machine locked, really?
- Can you ping it?
- Can you log in using ssh?
- Does ctrl-alt-F1 work?
Besides that, it would be interesting to run sleep 10 ; init 1 and switch to tty10 (ctrl-alt-F10) quickly. Are there more error messages?

It locks almost instantly sometimes (more on that in a moment). It does not respond to pings or ssh, and the keyboard is dead. It's dead, Jim. I notice that when I type 'init 1' and then hit enter, it's almost as if I hold the enter key down. If I actually hit enter (again), the machine locks that much sooner. There are no more error messages. I'll give your sleep 10; init 1 test a try.

Gave it a try - sleep 10; init 1 in a logged-in (graphical) session does *not* hang the machine.

OK then... Jon: Attach the last 500 lines of your syslog here.

All of the messages which might be relevant are already attached. I did not remove any lines. The line that follows 'syslog-ng version 1.6.8 going down' is syslog-ng coming back up several minutes later due to my resetting the box. I ran a recursive grep in /var/log for 'Feb 19 10:33' and that's all that is there.

Can you reproduce this? The problem might be located somewhere you wouldn't expect, or could be the result of another problem or condition. So please attach the complete log here.

Created attachment 69411 [details]
/var/log/messages
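The reachability questions asked at the start of this report (ping, ssh) can also be checked from a second machine with a small script. This is only a sketch: the function name `tcp_port_open`, the default port 22, and the timeout are assumptions for illustration, and a successful TCP connect shows that the network stack and sshd still answer, not that a login would work.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the remote side does not accept the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (the address is a placeholder, not the reporter's machine):
#   tcp_port_open("192.0.2.10", 22)
```

Running this in a loop while 'init 1' is issued on the test machine would show when, if at all, the box drops off the network.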
Jon: Before we reassign this. Have you tried booting with ACPI=off or a combination of the options for the safe settings? Does the machine still lock then?

I have not tried with ACPI=off. I'll try to give that a try tonight (it'll be about 11 hours from now).

Also try if the lockup occurs when entering other runlevels (0 and 6 especially).

I'm just taking a guess, but is it possible that everything else is working just fine *except* that X is not being killed?
Feb 19 10:33:19 linux gdm[3004]: Error reinitilizing server
And the display also does not change from my graphical login. I posit that everything else worked. This seems like a gdm/X/gnome problem. I'll try all of the above tonight (ACPI=off, try runlevels 0 and 6 and others).

Starting from an xterm, *none* of init 2, 3, or 4 seem to make /any/ difference. The disk flashes a bit, another xterm with tail -F /var/log/messages shows nothing special, but X does /not/ exit. I can continue to type and do things in my xterm.

init 6 does reboot the machine - X does *not* exit until the "reboot" actually takes place, however my console stops taking my input almost immediately - what I see is that I type 'init 6', press enter, and then even though I've let go of the enter key, it *behaves* as though I've held it down. If I hit the enter key even just once, the keyboard stops taking input altogether.

init 0 also shuts the machine off with much the same behavior as above.

ACPI=off had no effect whatsoever.

As an added test, I replaced gdm with xdm in /etc/sysconfig/displaymanager. Upon init 1, X immediately exits. However, it returns me to the login screen for xdm! Then the display is corrupted and the keyboard doesn't work. For some reason, X (xdm/gdm/whatever) is being *restarted* instead of *stopped*.

> what I see is that I type 'init 6', press enter, and then even though I've let
> go of the enter key, it *behaves* as though I've held it down.
I already filed a report about this and I am beginning to suspect that these problems might be connected. Have a look at bug #152522 - although I must mention that on my system I did not observe such behaviour. This might nonetheless be a problem with sysvinit itself. Steffen: Can you provide a comment here?

Steffen --> snwint

Sorry, Stefan, I meant you; this might be an X-related problem. Is there something known? This could also be connected to the logout problem somehow. Maybe X does not handle SIGTERM correctly?

I propose to switch to runlevel 3 first. If this crashes your machine we can talk about an X11-related issue.

As I said in comment #12, I can type 'init 3' in an xterm and X does NOT exit, and in fact it's almost as if nothing happened. I can still use the xterm to do things, but it is clearly in runlevel 3.

Jon... please create a backtrace of the init call - this might give us the required clue here.

To be strictly clear, do you mean an strace? The only way I know how to generate a backtrace is with gdb. How about something like 'strace -o foo -tt -f -s 256 init 1'

It should do fine.

init:
3281 12:19:18.228284 execve("/sbin/init", ["init", "1"], [/* 79 vars */]) = 0
3281 12:19:18.229417 uname({sys="Linux", node="linux", ...}) = 0
3281 12:19:18.229796 brk(0) = 0x80c4000
3281 12:19:18.229970 brk(0x80c4c90) = 0x80c4c90
3281 12:19:18.230145 set_thread_area({entry_number:-1 -> 6, base_addr:0x80c4830, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
3281 12:19:18.230502 brk(0x80e5c90) = 0x80e5c90
3281 12:19:18.230684 brk(0x80e6000) = 0x80e6000
3281 12:19:18.230882 umask(022) = 022
3281 12:19:18.231071 geteuid32() = 0
3281 12:19:18.231234 getpid() = 3281
3281 12:19:18.231415 rt_sigaction(SIGSTOP, {SIG_IGN}, NULL, 8) = -1 EINVAL (Invalid argument)
3281 12:19:18.231637 rt_sigaction(SIGTERM, {SIG_IGN}, NULL, 8) = 0
3281 12:19:18.231817 rt_sigaction(SIGALRM, {0x8048250, [], 0}, NULL, 8) = 0
3281 12:19:18.232037 alarm(3) = 0
3281 12:19:18.232217 open("/dev/initctl", O_WRONLY|O_LARGEFILE) = 3
3281 12:19:18.232455 write(3, "i\31\t\3\1\0\0\0001\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
3281 12:19:18.270479 close(3) = 0
3281 12:19:18.270704 alarm(0) = 3
3281 12:19:18.270895 exit_group(0) = ?
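The 384-byte write to /dev/initctl in the trace above is the request that asks the running init to change runlevel. As a minimal sketch, assuming the `struct init_request` layout from sysvinit's initreq.h (four ints: magic, cmd, runlevel, sleeptime, followed by a payload area) and a little-endian host, the header can be decoded as follows. The function name `decode_init_request` is made up for illustration, and the bytes beyond what strace printed are assumed to be zero.

```python
import struct

# Values assumed from sysvinit's initreq.h.
INIT_MAGIC = 0x03091969
INIT_CMD_RUNLVL = 1
HEADER = struct.Struct("<iiii")   # magic, cmd, runlevel, sleeptime (little-endian)
REQUEST_SIZE = 384                # matches the size of the write() in the trace


def decode_init_request(raw: bytes) -> dict:
    """Decode the fixed header of a request written to /dev/initctl."""
    if len(raw) != REQUEST_SIZE:
        raise ValueError(f"expected {REQUEST_SIZE} bytes, got {len(raw)}")
    magic, cmd, runlevel, sleeptime = HEADER.unpack_from(raw)
    if magic != INIT_MAGIC:
        raise ValueError(f"bad magic 0x{magic:08x}")
    return {"cmd": cmd, "runlevel": chr(runlevel), "sleeptime": sleeptime}


# First 16 bytes as shown by strace ("i\31\t\3", "\1\0\0\0", "1\0\0\0", "\5\0\0\0");
# the remainder is filled with zeros, which is an assumption past the printed part.
header = b"i\x19\t\x03" + b"\x01\x00\x00\x00" + b"1\x00\x00\x00" + b"\x05\x00\x00\x00"
request = header + bytes(REQUEST_SIZE - len(header))

print(decode_init_request(request))
```

Under those assumptions the header decodes to command 1 (change runlevel), runlevel '1', and a sleep time of 5, which would suggest that the request itself was written as expected.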
and for good measure (lot of good it'll do me), xdm:
2887 12:19:03.579575 rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
This shows that xdm never gets any signal or anything.
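Whether a traced process was actually sent a signal can be checked mechanically, because strace logs each delivery on its own line beginning with `--- SIG...`. The sketch below scans the lines of a `strace -f -tt -o FILE` log for such entries. The helper name `delivered_signals` is an assumption for illustration, and the second sample line is hypothetical, added only to show what a delivery would look like; it does not come from this report.

```python
import re

# With -f -tt -o FILE, every line starts with "PID HH:MM:SS.usec ";
# a delivered signal then appears as "--- SIGNAME ... ---".
SIGNAL_LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<time>[\d:.]+)\s+---\s+(?P<sig>SIG[A-Z0-9]+)\b"
)


def delivered_signals(trace_lines, pid):
    """Return (time, signal) pairs that strace logged as delivered to pid."""
    hits = []
    for line in trace_lines:
        match = SIGNAL_LINE.match(line)
        if match and int(match.group("pid")) == pid:
            hits.append((match.group("time"), match.group("sig")))
    return hits


# The single line from the xdm trace quoted above: no delivery line is present.
xdm_trace = [
    "2887 12:19:03.579575 rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)",
]
print(delivered_signals(xdm_trace, 2887))

# Hypothetical line, NOT from the report, illustrating a SIGTERM delivery.
hypothetical = xdm_trace + [
    "2887 12:19:18.300000 --- SIGTERM (Terminated) @ 0 (0) ---",
]
print(delivered_signals(hypothetical, 2887))
```

An empty result for the real trace only says that strace recorded no delivery before the log ended; it cannot tell whether a signal was never sent or the trace simply stopped too early.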
You mean that xdm does not follow/handle the signal or that it does not get one? I cannot tell, as you left out the argument to rt_sigsuspend.

(In reply to comment #22)
> You mean that xdm does not follow/handle the signal or that it does not get
> one, cannot tell as you left out the argument to rt_sigsuspend.
That is the entire output from strace -o xdm.trace -f -s 256 -tt -p PID_OF_XDM
"I" did not leave the argument out, strace did. ;-)

This is either a problem with init or with X11, but I think it's X11 as init works well for other processes. I'm adding Werner into CC for a comment.

Boot into runlevel 3. Use startx to start your Xsession, finish your Xsession, and tell us what exactly happened. Attach /etc/X11/xorg.conf and /var/log/Xorg.0.log*.

Created attachment 70239 [details]
Xorg.0.log
Created attachment 70240 [details]
xorg.conf
Created attachment 70241 [details]
'startx' run in a 'script' session
And now tell us what happened exactly.

D'oh!
I started X which started a gnome session.
Then I logged out of the session.
That's it.
I got back to my console, and except for all of the messages on the screen (which I provided in attachment 70241 [details]), nothing out of the ordinary happened.
Thanks. This means that the Xserver is *not* responsible for the lockup.

I just wiped that machine clean and did a beta5 install. This time I chose KDE as the desktop. I am not experiencing the problem with kdm and a kde desktop. I am installing the full gnome system as well to try various combinations of xdm, kdm, and gdm combined with fvwm, kde, and gnome desktops. I have a strong suspicion that it's not the display managers but the desktop environment.

More data:
kdm + kde - no observed problems
kdm + gnome - no observed problems
kdm + fvwm2 - no observed problems
gdm + kde - no observed problems
gdm + gnome - no observed problems
gdm + fvwm2 - no observed problems
Additional note: the behavior that I see when I type 'init 1', which is what appears to be a held-down enter key, continues. Furthermore, if I hit enter twice, once to enter the command and once extra, I get the weird hanging behavior again - this time however (beta5) it looks as though X is incompletely shut down - I get a corrupted display and my 'console' takes up the upper left 1/4 of the screen, and it's green. I can still type but only for a few more seconds and then it's hung as before. One other weird thing: the normal sysvinit shutdown messages and all that are on tty7 - the same terminal that X runs on normally.

So then it must be a problem with sysvinit. Werner: Please have a look at this.

This has nothing to do with sysvinit. The only thing that happens when calling `init 1' is that all running jobs and services are removed, and after this the last thing that happens is an `init s' from the boot script /etc/init.d/single. And this works for me: after `init 1' the system is awaiting the root password for maintenance on /dev/console. This seems to be a problem with your hardware. Please provide more information about your machine (`hwinfo'). Can you boot into single user mode (cold boot, append S to your boot parameters)? Have you tried this on other machines or does it fail only on this one?
I'd also like to know if there are any problems with hal, dbus, and/or hotplug on this system. And please choose an appropriate CPU type from the list below. Currently this is other.

Created attachment 70563 [details]
hwinfo all
Werner Fink: hal, dbus, etc... all seems to be just fine. I have a strong suspicion now that it has /something/ to do with the video driver - since beta3 I've had weird problems and all seem to be centering around the video. In beta3 I had to remove the vga= option in grub to get the machine to boot non-glacially, and I had lots of video problems with beta3 and beta4 both. Sadly, for now I can only try things on this one machine. Booting into any user mode works great.

I tried an experiment: I removed the vga= parameter from the grub menu and rebooted. I now have no problems with init 1 or init pick-your-FAVORITE-runlevel. The vga parameter was 0x317. I suspect this to be an evil interaction with the framebuffer and X.

OK, this looks like a hardware problem on this system with a Mobile INTEL CPU. Maybe the kernel people know more about such systems.

This sounds like the splash screen at shutdown steps on the X server's feet. Note that the video card is a GeForce, and X11 uses the nv driver, not the framebuffer. It seems that /etc/rc.d/rc1.d/S12splash should wait for X to release the device. Jon, in order to test this hypothesis, could you please re-enable the vga=0x317 parameter in your grub config and uninstall the bootsplash package instead?

I have done a complete wipe and reinstall of beta 5. This is what I am experiencing: if I type 'init 1' and DO NOT touch the keyboard again, everything works as expected. If I hit the keyboard, it appears as though it leaves X because I see the gdm login screen again, shortly before the display is corrupted. Furthermore, I performed a small experiment - it turns out that the machine is not actually hung - it is not network accessible, but control-alt-delete does work. Additionally, although the display is completely garbled, I assumed it was in fact in runlevel 1, waiting for me to enter root's password - I did so. I typed reboot and the machine did reboot.
Therefore, I must conclude that the machine is in fact entering runlevel 1 but X is somehow incompletely shutting down, or perhaps being incompletely started up and then killed off.

At your suggestion I removed the bootsplash package, rebooted, and tried again. There was no improvement. :-(

Alright, so the machine isn't hung at all, it's just the display that gets badly garbled. IOW this sounds more like an X11 than a kernel problem. Assigning back to sndirsch.

Check if this also happens with fbdev. Reconfigure with SaX2.
1) reboot into runlevel 3
2) "sax2 -r -m 0=fbdev -a"
3) Try to reproduce this problem.

I get:
SaX: initializing please wait...
SaX: your current configuration will not be read in
SaX: wrong module syntax...
SaX: syntax: -m CardNr=CardModule[,...]
(and yes I used zero not 'oh')

Marcus? What's wrong with "sax2 -r -m 0=fbdev -a"?

sax2 -r -m 0=fbdev -a
SaX: initializing please wait...
SaX: your current configuration will not be read in
SaX: access to your display has been granted
SaX: startup
SaX: creating config file please wait...
SaX: Automatic configuration is done
SaX: The file /etc/X11/xorg.conf has been written
--- works for me. I would suggest using the latest beta.

Yes, I think you should try Beta6. Maybe then you can set up an fbdev X11 configuration so we know if it's related to the native video driver.

I tried beta6 (wipe and install). Same problem. For the remainder of the tests, I used fbdev after setting it up with the above command (which worked this time). The *first* time it worked - X was killed and I got back to the gdm login screen, which itself was killed a few seconds later. While in runlevel 1 I ran 'init 5' and logged back in via gdm. This time it did /not/ exit X and I got the "frozen machine" thing (which I am no longer convinced is /frozen/, per se, but the keyboard is unresponsive and of course the network is down, so it's got much the same behavior). I repeated the test after a reboot.
After invoking 'init 1' and pressing enter again in rapid succession, this time X exited quickly and stayed that way. How X behaves seems to depend at least partly on how quickly I hit enter again after entering 'init 1'.

Has anybody else been able to reproduce the behavior? Perhaps hitting 'enter' twice in rapid succession, once to enter the 'init 1' command line and once again, would do the trick?

IMHO we won't be able to fix/investigate this issue in time for the 10.1 release. Therefore I propose to test/investigate (if required) this with SuSE > 10.1 Beta again. ==> LATER

This should be retested with SUSE 10.2 Alpha3.

NEEDINFO. openSUSE 10.2 Beta1 has just been released.

This is still a problem for OpenSUSE 10.2 RC1.

Ok. This issue seems to be a very strange one (init 1 + immediately pressing the Return key crashes the machine !?!). I suggest checking the latest driver from time to time by updating the xorg-x11-server/xorg-x11-driver-video packages from the xorg73 project.
http://software.opensuse.org/download/xorg73/openSUSE_10.2/
Closing as LATER.

Reopen. Any improvements with openSUSE 10.3 Beta1?

If you plan to test, please say so. Otherwise we'll have to close the bug.

I shall try to get to the test tonight - I was going to do it last night (really!) but I simply ran out of time.

Clarify - make sure to read the comments - it does not crash the machine, only the display. Also, it still happens with opensuse 10.3 GM. This particular test machine (although it happens/happened to several) has an nvidia "go" chipset - I am *not* using the NVIDIA-supplied binary driver, just the free (xorg) driver.

Could you attach /etc/X11/xorg.conf and /var/log/Xorg.0.log of openSUSE 10.3? Thanks.

Created attachment 177547 [details]
xorg.conf from 10.3
Created attachment 177548 [details]
Xorg.0.log from 10.3
I would like to know if this issue is related to the nv driver at all. Could you create a fbdev configuration by running "sax2 -r -m 0=fbdev" and try again?

Give me 10 minutes...

After I switched to using fbdev: I tried things twice. The first time, I used 'su -' to become root in a konsole. I typed 'init 1' and the konsole behaved as though I had held down the enter key. I pressed the enter key *once*. Shortly thereafter X exited and, apparently, tried to log me back in (X was *killed* and kdm restarted!!). Eventually, I got the expected login prompt (console). I logged in at runlevel 1. No display corruption! Then I ran 'init 5' to bring X back up, which happened fine. Then I tried again, but this time I did *not* hit the enter key a second time. I eventually saw a login prompt, but the keyboard was unresponsive. Caps lock, num lock, etc... failed to light up when pressed, and so on. No disk activity on power button indicating possible shutdown, and inserting and removing PCMCIA cards also did nothing (this was expected). I had to kill the power and reboot it. And now I'm here!

Ok. So the issue is not related to a specific driver. Which laptop is this? This issue is so strange. I think it can only be fixed if we have such a laptop here for testing.

Not sure when we'll see a response here. Please reopen once you can provide the requested information. |