Bug 116338 - X crashes machine on stop
Summary: X crashes machine on stop
Status: RESOLVED FIXED
Duplicates: 121875
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: RC 1
Hardware: Other All
Importance: P5 - None : Major
Target Milestone: ---
Assignee: Andreas Kleen
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 114942
 
Reported: 2005-09-10 17:15 UTC by Sonja Krause-Harder
Modified: 2005-10-11 12:24 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Xorg.0.log (38.10 KB, text/plain)
2005-09-11 10:10 UTC, Sonja Krause-Harder
/var/log/messages (189.75 KB, text/plain)
2005-09-11 10:11 UTC, Sonja Krause-Harder
xorg.conf (4.11 KB, text/plain)
2005-09-11 10:13 UTC, Sonja Krause-Harder
/proc/cpuinfo of x86_64 box (1.18 KB, text/plain)
2005-09-11 11:06 UTC, Sonja Krause-Harder
output of hwinfo --all (35.84 KB, text/plain)
2005-09-11 11:08 UTC, Sonja Krause-Harder
Xorg.0.log with intel-agp blacklisted (34.20 KB, text/plain)
2005-09-11 11:37 UTC, Sonja Krause-Harder

Description Sonja Krause-Harder 2005-09-10 17:15:07 UTC
When I stop X by logging out, or by issuing init 3 in a console, the machine
freezes. We have seen several variants:
 
- Immediate reboot after logout (with autologin configured) 
- black screen with non-blinking cursor in the upper left after logging out 
(without autologin configured, but I'm not sure this is related) 
- after switching to console 1 and issuing init 3, the system freezes
(it still answers pings, but ssh'ing in is not possible)
 
This happens on both machines we have for the openSUSE and SUSE Linux counters
at BrainShare. As far as I can see they are similar; the graphics chipset used
is Intel 915G.
 
What else do you need to debug this?
Comment 1 Stefan Dirsch 2005-09-10 17:31:37 UTC
1) install the xorg-x11-server package from STABLE (RC2). An X server crash
which could happen during a VT switch has been fixed after RC1.
2) after a freeze boot into runlevel 3 and attach /var/log/Xorg.0.log 
3) attach /etc/X11/xorg.conf 
4) attach the lines of /var/log/messages from the time of the freeze (if there
   are any)
 
For debugging I need access to this machine. Probably this won't be possible. 
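
For step 4, the relevant lines can be pulled out of a large /var/log/messages
with a small filter. The following is only a sketch: the marker strings in
OOPS_PATTERN and the context window sizes are assumptions, not something taken
from the attached log, and real kernel messages vary between versions.

```python
import re

# Assumed markers for kernel trouble; adjust to what the log actually contains.
OOPS_PATTERN = re.compile(
    r"Oops|Unable to handle kernel|Kernel panic|general protection fault",
    re.IGNORECASE,
)


def extract_oops_context(lines, before=5, after=30):
    """Return the lines around each marker match, in original order.

    Overlapping context windows are merged so no line is repeated.
    """
    keep = set()
    for i, line in enumerate(lines):
        if OOPS_PATTERN.search(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            keep.update(range(start, end))
    return [lines[i] for i in sorted(keep)]
```

To use it, read the log with open("/var/log/messages", errors="replace"),
pass read().splitlines() to extract_oops_context(), and attach the result.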
Comment 2 Sonja Krause-Harder 2005-09-11 10:09:34 UTC
1) didn't help   
   
Everything that follows is with xorg-x11-server-6.8.2-99 installed.  
 
2)-4) I'll attach now. 
Comment 3 Sonja Krause-Harder 2005-09-11 10:10:23 UTC
Created attachment 49535
Xorg.0.log
Comment 4 Sonja Krause-Harder 2005-09-11 10:11:08 UTC
Created attachment 49536
/var/log/messages
Comment 5 Sonja Krause-Harder 2005-09-11 10:13:23 UTC
Created attachment 49537
xorg.conf
Comment 7 Stefan Dirsch 2005-09-11 10:34:29 UTC
/var/log/messages:    
------------------   
Kernel oops related to agp use? I don't know why, I'm not a kernel expert. Try
disabling the intel-agp kernel module for now (add it to /etc/hotplug/blacklist)
and reboot. There's no way to unload it since intel-agp and agpgart need each
other (IMHO another bug). The X server should also start without agpgart support.
DRI won't work, but you didn't enable it anyway. Let me know if my
assumptions are wrong. :-)
   
/etc/X11/xorg.conf:   
--------------------   
An empty Modes section? Strange, but I don't think this is the cause of the
problems we see here. Remove the 'UseModes "Modes[0]"' line in the Monitor
section to make sure the internal X modeline pool is used.
  
/var/log/Xorg.0.log:  
--------------------  
I couldn't find anything obviously wrong, but Matthias is our Intel expert.
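
The xorg.conf suggestion above (dropping the UseModes line) could be scripted
roughly as follows. This is a sketch under assumptions: it comments the line
out with '#' instead of deleting it, so the change is easy to revert, and it
touches every line that starts with UseModes, not only those in the Monitor
section. The function name is made up for illustration.

```python
def comment_out_usemodes(conf_text):
    """Return xorg.conf text with every UseModes line commented out.

    Keywords in xorg.conf are case-insensitive, so the check is too.
    Indentation and all other lines are left untouched.
    """
    result = []
    for line in conf_text.split("\n"):
        stripped = line.lstrip()
        if stripped.lower().startswith("usemodes"):
            indent = line[: len(line) - len(stripped)]
            result.append(indent + "#" + stripped)
        else:
            result.append(line)
    return "\n".join(result)
```

Keep a backup of the original /etc/X11/xorg.conf before writing the result back.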
Comment 8 Sonja Krause-Harder 2005-09-11 10:54:22 UTC
Disabling the intel-agp module helps, thanks! 
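
The workaround that helped here (adding intel-agp to /etc/hotplug/blacklist,
as suggested in comment 7) can be sketched like this. It assumes the blacklist
file holds one module name per line with '#' starting a comment; the function
name is hypothetical, and the module name is taken as written in comment 7.

```python
def add_to_blacklist(text, module):
    """Return blacklist text with `module` appended unless already listed.

    Assumes one module name per line; anything after '#' is ignored.
    """
    listed = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            listed.add(entry)
    if module in listed:
        return text  # already blacklisted, nothing to change
    if text and not text.endswith("\n"):
        text += "\n"
    return text + module + "\n"
```

Read /etc/hotplug/blacklist as root, pass its contents and "intel-agp" to
add_to_blacklist(), write the result back, then reboot as described above.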
Comment 9 Stefan Dirsch 2005-09-11 10:58:30 UTC
Could you please attach the new logfile? 
Comment 10 Sonja Krause-Harder 2005-09-11 11:06:34 UTC
Created attachment 49538
/proc/cpuinfo of x86_64 box
Comment 11 Sonja Krause-Harder 2005-09-11 11:08:54 UTC
Created attachment 49539
output of hwinfo --all
Comment 12 Stefan Dirsch 2005-09-11 11:11:31 UTC
Sorry, I meant /var/log/Xorg.0.log. But it's interesting that it's an x86_64
SMP machine. I was not aware of any x86_64 machines with a 915G graphics chip.
I thought 945G would be the first one.
Comment 13 Sonja Krause-Harder 2005-09-11 11:37:02 UTC
Created attachment 49541
Xorg.0.log with intel-agp blacklisted
Comment 14 Stefan Dirsch 2005-09-11 12:13:44 UTC
Ok. The HW cursor is disabled when using this workaround. Apart from this, the
output looks ok. Can we lower the severity?
Comment 15 Stefan Dirsch 2005-09-12 10:42:36 UTC
Setting to major. This problem should be debugged by the kernel team first
(after 10.0 release) since it's related to agp. Therefore I assign this to the
kernel component now.
Comment 16 Scott Bahling 2005-09-12 18:04:16 UTC
I have informed Intel. They will try to reproduce.
Comment 17 Andreas Jaeger 2005-09-13 11:44:47 UTC
Is this something for the release notes?
Comment 18 Stefan Dirsch 2005-09-13 12:29:22 UTC
No, I don't think so. 915G SMP machines do not seem to be that common.
Comment 19 Scott Bahling 2005-09-13 14:04:53 UTC
SMP just means hyperthreading. Are you sure it is not so common?
Comment 20 Stefan Dirsch 2005-09-13 14:13:16 UTC
Sorry, my fault. I meant that Intel 915 EM64T machines are not that common. At
least I'm not aware of any such machine here in Nürnberg. Some of our kernel
developers in the Labs might have such a beast. Up to now I thought such
machines simply don't exist ...
Comment 21 Andreas Kleen 2005-09-13 14:19:14 UTC
Dangerous assumption - note that Intel is shipping EM64T Celerons now.
If the i915 is still shipping - and I think it is - you'll see a lot of EM64T
capable boxes with that soon.

Comment 22 Andreas Kleen 2005-09-13 14:20:55 UTC
Looking at the whole bug 

Can someone connect a serial console and see if there is an oops or similar?
Comment 23 Sonja Krause-Harder 2005-09-13 17:05:40 UTC
The oopses in /var/log/messages (already attached) don't reveal anything? I can
try to hunt down a null-modem cable here but can't promise anything.
Comment 24 Andreas Kleen 2005-09-14 07:33:51 UTC
Ah, I missed those. No, they're a reasonable start. Let's see.
Comment 25 Scott Bahling 2005-09-23 09:59:07 UTC
Intel has been able to reproduce, but no luck finding the root cause. What is
the status from our side?
Comment 26 Stefan Dirsch 2005-09-24 18:07:59 UTC
Scott, did Intel reproduce it on SL 10.0 or on SLES9 + their agp patch?
Comment 27 Scott Bahling 2005-09-26 06:50:06 UTC
on SL10.0
Comment 28 Scott Bahling 2005-09-26 06:51:43 UTC
Intel has not seen this issue on SLES9 + agp patch during their testing.
Comment 29 Stefan Dirsch 2005-09-26 07:02:32 UTC
If this testing by Intel was done on the same hardware I would strongly vote 
for adding the agp patch for SLES9 now. 
Comment 30 Scott Bahling 2005-09-26 12:31:29 UTC
According to the test reports included with the source in bugzilla #114942, they
tested 845G, 865G, 915G, 915GM, and 945G chipsets. So yes, they did test this
same chipset along with others.
Comment 31 Stefan Dirsch 2005-09-26 13:11:18 UTC
It's not specified whether they tested on 915G/i386 or 915G/x86_64. The problem
occurred only on 915G/x86_64.
Comment 32 Scott Bahling 2005-09-27 08:39:35 UTC
It appears they only tested i386. They are testing x86-64 now, and will test
with our betas. 

The developer of the intel agp driver has looked at this bug and is not
convinced that this is a graphics driver issue; he suspects some other kernel
problem instead.
Comment 33 Stefan Dirsch 2005-09-27 09:39:25 UTC
Sure, it's an agp issue. I'm not surprised that Intel did not test on x86_64 at
all, since two additional patches (Bug #114942, comments #15+17) were required
to get agp and DRI running on 945G/x86_64. When testing the betas, Intel should
know that the agp patches (Bug #114942, comment #17+21) are not applied yet. So
this needs to be done manually by Intel.
Comment 34 Scott Bahling 2005-09-28 15:46:25 UTC
Update: We have found that an updated version of the platform BIOS eliminates
the panic on our test platforms.

My team tested Suse10 x86_64 and the backport on NLD9SP2 x86_64 on 915/945
EM64T (915GAG/945GNT). After updating the BIOS to the latest version, the
kernel panic disappeared, and 2D/3D functions worked with no problems.

Novell can get the latest BIOS from:
http://developer.intel.com/design/motherbd/genbios.htm. For 945GNT, the latest
working version is NT94510J.86A.2104, and for 915GAG it is EV91510A.86A.0469.
Comment 35 Scott Bahling 2005-09-28 15:48:17 UTC
Sonja, do we still have access to the original systems in this report to test the
BIOS update?
Comment 36 Sonja Krause-Harder 2005-09-29 08:18:19 UTC
No, they were sent back to wherever after BrainShare.
Comment 37 Stefan Dirsch 2005-09-29 15:57:48 UTC
Ok. Let's close this one as FIXED, assuming the BIOS update will resolve the
issue, since the machine is no longer available for testing anyway and it's
unknown where the machine has gone.
Comment 38 Stefan Dirsch 2005-10-11 12:24:10 UTC
*** Bug 121875 has been marked as a duplicate of this bug. ***