Bugzilla – Bug 116338
X crashes machine on stop
Last modified: 2005-10-11 12:24:10 UTC
When I stop X by logging out, or by issuing init 3 in a console, the machine freezes. We have seen several variants:
- Immediate reboot after logout (with autologin configured)
- Black screen with a non-blinking cursor in the upper left after logging out (without autologin configured, but I'm not sure this is related)
- After switching to console 1 and issuing init 3, the system freezes (it still answers pings, but ssh'ing in is not possible)
This happens on both machines we have for the openSUSE and SUSE Linux counters at BrainShare. As far as I can see they are similar; the graphics chipset used is Intel 915 G.
What else do you need to debug this?
1) Install the xorg-x11-server package from STABLE (RC2). An Xserver crash which could happen during a VT switch has been fixed after RC1.
2) After a freeze, boot into runlevel 3 and attach /var/log/Xorg.0.log
3) Attach /etc/X11/xorg.conf
4) Attach the lines of /var/log/messages from when the freeze happens (if there are any)
For debugging I need access to this machine. Probably this won't be possible.
1) didn't help. Everything that follows is with xorg-x11-server-6.8.2-99 installed.
2)-4) I'll attach now.
Created attachment 49535 [details] Xorg.0.log
Created attachment 49536 [details] /var/log/messages
Created attachment 49537 [details] xorg.conf
/var/log/messages:
------------------
Kernel oops related to agp use? Don't know why, I'm not a kernel expert. Try to disable the intel-agp kernel module for now (add it to /etc/hotplug/blacklist) and reboot. There's no way to unload it since intel-agp and agpgart need each other (IMHO another bug). The Xserver should also start without agpgart support. DRI won't work, but you didn't enable it anyway. Let me know if my assumptions are wrong. :-)

/etc/X11/xorg.conf:
--------------------
An empty Modes section? Strange, but I don't think this is the cause of the problems we see here. Remove the 'UseModes "Modes[0]"' line in the Monitor section to make sure the internal X modeline pool is used.

/var/log/Xorg.0.log:
--------------------
I couldn't find anything obviously wrong, but Matthias is our Intel expert.
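The blacklist step suggested above could be scripted roughly as follows. This is only a minimal sketch: the helper name `blacklist_module` is hypothetical, and it assumes /etc/hotplug/blacklist holds one module name per line with `#` starting a comment. It must run as root to modify the real file, and a reboot is still needed afterwards.

```python
from pathlib import Path


def blacklist_module(module: str, blacklist_path: str = "/etc/hotplug/blacklist") -> bool:
    """Append `module` to the blacklist file unless it is already listed.

    Returns True if the file was changed, False if the module was already present.
    Assumption: one module name per line, '#' introduces a comment.
    """
    path = Path(blacklist_path)
    text = path.read_text() if path.exists() else ""

    # Ignore blank lines and comment lines when looking for an existing entry.
    listed = {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    if module in listed:
        return False

    # Make sure the new entry starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + module + "\n")
    return True
```

Calling `blacklist_module("intel-agp")` a second time leaves the file untouched, so the sketch is safe to re-run.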
Disabling the intel-agp module helps, thanks!
Could you please attach the new logfile?
Created attachment 49538 [details] /proc/cpuinfo of x86_64 box
Created attachment 49539 [details] output of hwinfo --all
Sorry, I meant /var/log/Xorg.0.log. But it's interesting that it's an x86_64 SMP machine. I was not aware of any x86_64 machines with a 915G gfx chip. I thought 945G would be the first one.
Created attachment 49541 [details] Xorg.0.log with intel-agp blacklisted
Ok. The HW cursor is disabled when using this workaround. Apart from this, the output looks ok. Can we lower the severity?
Setting to major. This problem should be debugged by the kernel team first (after 10.0 release) since it's related to agp. Therefore I assign this to the kernel component now.
I have informed Intel. They will try to reproduce.
Is this something for the release notes?
No, I don't think so. 915G SMP machines seem to be not that common.
SMP just means hyperthreading. Are you sure it is not so common?
Sorry, my fault. I meant that Intel 915 EM64T machines are not that common. At least I'm not aware of any such machine here in Nürnberg. Some of our kernel developers in the Labs might have such a beast. Up to now I thought such machines simply don't exist ...
Dangerous assumption - note that Intel is shipping EM64T Celerons now. If the i915 is still shipping - and I think it is - you'll see a lot of EM64T capable boxes with that soon.
Looking at the whole bug: can someone connect a serial console and see if there is an oops or similar?
The oopses in /var/log/messages (already attached) don't reveal anything? I can try to hunt down a null-modem cable here but can't promise anything.
Ah, missed those. No, they're a reasonable start. Let's see.
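For picking the oops-related lines out of a large /var/log/messages, a rough filter like the one below may help. It is a sketch under assumptions: the function name `find_oops_lines` is hypothetical, and the marker strings are common kernel message prefixes whose exact wording varies between kernel versions, so the list may need adjusting.

```python
import re
from typing import List, Tuple

# Assumed markers for kernel trouble; wording differs between kernel versions.
OOPS_MARKERS = re.compile(
    r"Oops:|Unable to handle kernel|kernel BUG at|Kernel panic"
)


def find_oops_lines(log_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs for log lines matching a marker."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if OOPS_MARKERS.search(line):
            hits.append((lineno, line))
    return hits
```

Reading the file with `open("/var/log/messages").read()` and passing the text in gives the line numbers to quote in a bug comment. The call trace that follows each hit still has to be copied by hand.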
Intel has been able to reproduce, but no luck finding the root cause. What is the status from our side?
Scott, Intel reproduced it on SL 10.0 or on SLES9 + their agp patch?
on SL10.0
Intel has not seen this issue on SLES9 + agp patch during their testing.
If this testing by Intel was done on the same hardware I would strongly vote for adding the agp patch for SLES9 now.
According to the test reports included with the source in bugzilla #114942, they tested 845G, 865G, 915G, 915GM, and 945G chipsets. So yes, they did test this same chipset along with others.
It's not specified whether they tested on 915G/i386 or 915G/x86_64. The problem occurred only on 915G/x86_64.
It appears they only tested i386. They are testing x86-64 now, and will test with our betas. The developer of the intel agp driver has looked at this bug and is not convinced that this is a graphics driver issue; he suspects some other kernel problem instead.
Sure, it's an agp issue. I'm not surprised that Intel did not test on x86_64 at all, since two additional patches (Bug #114942, comments #15+17) were required to get agp and DRI running on 945G/x86_64. When testing the betas, Intel should know that the agp patches (Bug #114942, comment #17+21) are not applied yet. So this needs to be done manually by Intel.
Update: We have found that an updated version of the platform BIOS will eliminate the panic on our test platforms. My team tested Suse10 x86_64 and the backport on NLD9SP2 x86_64 on 915/945 em64t (915GAG/945GNT). After updating the BIOS to the latest version, the kernel panic disappeared, and 2D/3D functions worked with no problem. Novell can get the latest BIOS from: http://developer.intel.com/design/motherbd/genbios.htm. For 945GNT, the latest and workable version is NT94510J.86A.2104, and for 915GAG it is EV91510A.86A.0469.
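To check whether a given board already runs at least the BIOS versions named above, something like the following could be used. This is only a sketch: `parse_bios_id` and `bios_at_least` are hypothetical helpers, and they assume the BIOS ID has the dot-separated form seen above (board ID, OEM ID, numeric build number, optionally followed by further fields).

```python
from typing import Tuple


def parse_bios_id(bios_id: str) -> Tuple[str, str, int]:
    """Split an ID such as 'EV91510A.86A.0469' into (board, oem, build).

    Assumption: the third dot-separated field is a numeric build number.
    """
    parts = bios_id.strip().split(".")
    if len(parts) < 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected BIOS ID format: {bios_id!r}")
    return parts[0], parts[1], int(parts[2])


def bios_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` has a build number >= that of `required`."""
    inst = parse_bios_id(installed)
    req = parse_bios_id(required)
    # Comparing build numbers only makes sense for the same board and OEM ID.
    if inst[:2] != req[:2]:
        raise ValueError("BIOS IDs belong to different boards")
    return inst[2] >= req[2]
```

The installed ID would have to come from the machine itself, for example from the output of `dmidecode -s bios-version`. Whether that output matches the assumed format on these boards has not been verified here.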
Sonja, do we still have access to the original systems in this report to test the BIOS update?
No, they were sent back to wherever after BrainShare.
Ok. Let's close this one as FIXED, assuming the BIOS update will resolve the issue, since the machine is no longer available for testing anyway and it's unknown where it has gone.
*** Bug 121875 has been marked as a duplicate of this bug. ***