Bug 73092

Summary: machine hard lockup when starting X
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Andreas Klein <asklein>
Component: Kernel Assignee: Stefan Fent <stefan.fent>
Status: RESOLVED INVALID QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: eich, sndirsch, vetter
Version: RC 1   
Target Milestone: ---   
Hardware: x86-64   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: lspci
hwinfo
serial console log of startup and crash
Xorg.0.log
xorg start-up of Radeon X600 PCIe on Asus nforce4 board.
64bit version of CVS radeon driver

Description Andreas Klein 2005-03-16 17:56:45 UTC
The machine works fine in runlevel 3.
Start X and the machine locks up hard immediately.
I logged the complete startup and crash via serial console.
The crash output isn't really readable; maybe the memory gets corrupted too fast.
I will also attach the complete hwinfo output.
Comment 1 Andreas Klein 2005-03-16 17:57:14 UTC
Created attachment 31874 [details]
lspci
Comment 2 Andreas Klein 2005-03-16 17:57:32 UTC
Created attachment 31875 [details]
hwinfo
Comment 3 Andreas Klein 2005-03-16 17:58:19 UTC
Created attachment 31876 [details]
serial console log of startup and crash
Comment 4 Stefan Fent 2005-03-21 12:49:22 UTC
Can you try with pci=nommconf as a kernel parameter?
Comment 5 Andreas Vetter 2005-03-21 15:10:34 UTC
Does not help.
Comment 6 Andreas Vetter 2005-03-21 15:49:41 UTC
I added
  Option "XaaNoScreenToScreenCopy"
to xorg.conf (cf. bug 66744), and the system does not crash, but the screen goes
blank and nothing happens. X does not work with that option either.
Comment 7 Stefan Dirsch 2005-03-21 15:54:21 UTC
Ok. Try "noaccel" instead, although I don't think that this will help ...
Comment 8 Andreas Vetter 2005-03-21 16:29:11 UTC
"noaccel" gives the same as "XaaNoScreenToScreenCopy".
Output switches from digital to analogue and the monitor behaves as if the
frequency is out of sync range. Here is the Modes section:
Section "Modes"
  Identifier   "Modes[0]"
  Modeline      "1600x1200" 140.00 1600 1632 2024 2052 1200 1200 1208 1216
  Modeline      "1600x1200" 173.38 1600 1712 1888 2176 1200 1201 1204 1245
  Modeline      "1280x1024" 134.72 1280 1368 1504 1728 1024 1025 1028 1068
  Modeline      "1024x768" 79.52 1024 1080 1192 1360 768 769 772 801
  Modeline      "800x600" 47.53 800 840 920 1040 600 601 604 626
  Modeline      "640x480" 29.84 640 664 728 816 480 481 484 501
EndSection
The same section works with an ATI Radeon 9000 AGP card in another AMD64 machine.
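For reference, the sync rates implied by a Modeline can be checked by hand: the horizontal frequency is the pixel clock divided by the horizontal total, and the vertical refresh is that value divided by the vertical total. A minimal Python sketch, assuming plain Modelines without Interlace or DoubleScan flags; the function names are illustrative and not part of any X.Org tool:

```python
def parse_modeline(line):
    """Split an xorg.conf Modeline into (name, pixel clock in MHz, 8 timings)."""
    parts = line.split()
    if len(parts) < 11 or parts[0].lower() != "modeline":
        raise ValueError("not a complete Modeline: %r" % line)
    name = parts[1].strip('"')
    clock_mhz = float(parts[2])
    # hdisp hsyncstart hsyncend htotal vdisp vsyncstart vsyncend vtotal
    timings = [int(p) for p in parts[3:11]]
    return name, clock_mhz, timings


def sync_rates(clock_mhz, timings):
    """Return (horizontal sync in kHz, vertical refresh in Hz).

    Assumption: no Interlace/DoubleScan flags, which would change
    the effective vertical rate.
    """
    htotal = timings[3]
    vtotal = timings[7]
    hsync_khz = clock_mhz * 1000.0 / htotal
    vrefresh_hz = hsync_khz * 1000.0 / vtotal
    return hsync_khz, vrefresh_hz


if __name__ == "__main__":
    modes = [
        'Modeline "1600x1200" 140.00 1600 1632 2024 2052 1200 1200 1208 1216',
        'Modeline "1600x1200" 173.38 1600 1712 1888 2176 1200 1201 1204 1245',
    ]
    for mode in modes:
        name, clock, timings = parse_modeline(mode)
        hsync, vrefresh = sync_rates(clock, timings)
        print("%s: hsync %.2f kHz, vrefresh %.2f Hz" % (name, hsync, vrefresh))
```

The results can be compared against the HorizSync and VertRefresh ranges in the Monitor section of xorg.conf to see whether a mode falls outside what the monitor accepts.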
Comment 9 Andreas Vetter 2005-03-21 16:48:07 UTC
Created attachment 32475 [details]
Xorg.0.log

Is this the reason for the problem?

(WW) RADEON(0): Failed to set up write-combining range (0xd0000000,0x8000000)
Comment 10 Stefan Dirsch 2005-03-21 16:55:29 UTC
Unlikely.
Comment 11 Egbert Eich 2005-03-22 15:22:12 UTC
This needs to be passed to ATi. Without a card I will be unable to reproduce this.
This is PCIe.
Comment 12 Stefan Dirsch 2005-03-23 11:51:57 UTC
This might be an x86_64-specific problem. We don't have this problem with our
PCIe X600 boards on i915 (IA32).
Comment 13 Stefan Dirsch 2005-03-26 10:48:38 UTC
Stefan F., I can give you my PCIe X600 board for testing on your AMD64 nForce4 
SLI board. Just let me know. 
Comment 14 Stefan Fent 2005-03-29 15:45:55 UTC
Thanks, Stefan D.
The X600 works on an ASUS A8N SLI board.
Comment 15 Stefan Dirsch 2005-03-30 10:31:38 UTC
Stefan F./Philipp, could you please attach the X.Org logfile, so we have this
issue at least somewhat documented? Thanks.
Comment 16 Philipp Thomas 2005-03-30 11:00:04 UTC
Here's the log. 
Comment 17 Stefan Dirsch 2005-03-30 11:02:21 UTC
Hmm ... where is it?
Comment 18 Philipp Thomas 2005-03-30 11:05:08 UTC
Attached to this bug? 
 
Comment 19 Philipp Thomas 2005-03-30 11:07:44 UTC
Something went wrong, so I'm trying again ... 
 
Comment 20 Philipp Thomas 2005-03-30 11:09:01 UTC
Created attachment 32919 [details]
xorg start-up of Radeon X600 PCIe on Asus nforce4 board.
Comment 21 Stefan Dirsch 2005-03-30 12:24:38 UTC
The first difference one can see is that the reporter uses an SMP kernel whereas
we use a UP kernel. Philipp, could you please try again with the SMP kernel? I'll
give you another X600 board with exactly the same device ID before you do this.
Comment 22 Stefan Dirsch 2005-03-30 13:20:08 UTC
The problem is not reproducible with the SMP kernel and the new X600 board. BTW,
I mixed up the device IDs of the chips. The new X600 has the same device ID as the 
one we tested before (3E50). :-(

Probably we need to test this on an EM64T machine (SMP) with PCIe (16x) slots. If
we find one ...
Comment 23 Stefan Dirsch 2005-04-15 17:05:27 UTC
> Probably we need to test this on a EM64T machine (SMP) with PCIe (16x)   
> slots. If we find one ...  
  
This matches the HP xw8200 I have here for testing, and I cannot reproduce this
problem on this machine either ==> WORKSFORME.
Comment 24 Andreas Vetter 2005-09-08 19:47:33 UTC
Still the same problem in 10.0 RC1.
The machine locks up when using the radeon driver suggested by YaST.
I installed fglrx 8.16.20, and it works without 3D, because the kernel module
does not compile.
Any chance to get this investigated again?
Comment 25 Andreas Vetter 2005-09-08 20:34:18 UTC
The card also does not work with radeon in an Intel D915PBL mainboard, whereas an
X300SE works in the D915PBL.
Comment 26 Matthias Hopf 2005-09-13 16:19:49 UTC
The current CVS driver from bug #115283 has several issues fixed. You can
download it in compressed form from:

https://bugzilla.novell.com/attachment.cgi?id=49796

Please test this driver (replace the one in /usr/X11R6/lib/modules/drivers/
after backing it up).
Comment 27 Andreas Vetter 2005-09-15 07:26:39 UTC
Can I have a 64-bit version, please?

vetter@beder:~> file radeon_drv.o
radeon_drv.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
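
The 32-bit/64-bit distinction that `file` reports here comes down to one byte in the ELF header: offset 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. A minimal Python sketch of the same check, where `elf_class` is an illustrative helper name and not an existing tool:

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte within e_ident
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return "32-bit" or "64-bit" for an ELF file, "unknown" for other class values."""
    with open(path, "rb") as f:
        ident = f.read(16)  # e_ident is 16 bytes long
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        raise ValueError("%s is not an ELF file" % path)
    return ELF_CLASSES.get(ident[EI_CLASS], "unknown")


if __name__ == "__main__":
    import sys

    # Usage: python elfclass.py radeon_drv.o
    for arg in sys.argv[1:]:
        print("%s: %s" % (arg, elf_class(arg)))
```

A driver module whose class does not match the X server's own architecture is not usable by it, which is why the 32-bit build could not be tested on this machine.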
Comment 28 Matthias Hopf 2005-09-15 13:21:58 UTC
Created attachment 50017 [details]
64bit version of CVS radeon driver

Err.... yes, of course.
Here it is.
Comment 29 Matthias Hopf 2005-09-15 13:25:21 UTC
Of course you will have to replace the driver in
/usr/X11R6/lib64/modules/drivers/ (not /usr/X11R6/lib/).
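
The backup-then-replace step can be sketched in Python as follows. This is a sketch under assumptions, not a tested procedure for this system: `replace_with_backup` is a hypothetical helper, the downloaded driver is assumed to be already decompressed, and the backup location is left to the caller.

```python
import shutil
from pathlib import Path


def replace_with_backup(new_driver, installed_driver, backup_path):
    """Copy installed_driver to backup_path, then overwrite it with new_driver.

    An existing backup is never overwritten, so running this twice
    keeps the original file rather than a copy of the test driver.
    """
    installed = Path(installed_driver)
    backup = Path(backup_path)
    if installed.exists() and not backup.exists():
        shutil.copy2(installed, backup)  # copy2 also preserves timestamps and mode
    shutil.copy2(new_driver, installed)
    return backup
```

On this machine the installed file would presumably be /usr/X11R6/lib64/modules/drivers/radeon_drv.o (directory from the comment above, file name from comment 27), and writing there requires root privileges. Copying the backup over the installed file restores the original driver.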
Comment 30 Andreas Vetter 2005-09-21 09:31:50 UTC
Sorry, the machine has some problems with memory and/or CPU, which are currently
being investigated by the dealer. Maybe fixing those resolves the lockup.

The behaviour was better in 10.0 beta and RC1: the machine does not lock up
immediately, but first shows a black screen with a blinking cursor. So it
might be the same problem as in bug 115283.
Comment 31 Matthias Hopf 2005-09-30 12:34:51 UTC
For the moment I am closing this as INVALID (due to broken RAM or CPU).

If that can be ruled out, feel free to reopen the bug.

The black screen with the blinking cursor is actually just virtual terminal
#7, to which the X server switches before initializing the graphics mode.