Bug 129724

Summary: SuSE 10 install intermittent hang on servers with ATI Radeon 7000 video.
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Rod Macdonald <rod.macdonald>
Component: X.Org Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED FIXED QA Contact: Stefan Dirsch <sndirsch>
Severity: Normal    
Priority: P5 - None CC: eich, mtippett, snwint
Version: unspecified   
Target Milestone: ---   
Hardware: x86   
OS: SuSE Linux 10.0   
Whiteboard:
Found By: Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: .pdf advisory from ATI describing this hardware hang

Description Rod Macdonald 2005-10-20 14:22:03 UTC
Two customers have reported a problem to ATI where SuSE 10 (and possibly 9.2)
will hang during install as soon as the "Welcome" screen is displayed.  ATI has
determined from PCI bus trace data that this hang is due to a hardware problem
in the video processor.  

This report is being opened to address the possibility of changing the install
software so as not to cause the hardware hang.  The problem impacts servers from
a variety of manufacturers which use the Radeon 7000 video processor.  This
includes more than 30% of x86 servers being manufactured today.

A product advisory has been published by ATI describing the problem. 
Essentially the R7000 can hang if all of the following conditions exist:

1. The Radeon 7000 is on a PCI bus.

2. The software application uses VESA mode, or a 132 column VGA mode, i.e.:
   - Video Register CRTC_EXT_CNTL, field VGA_ATI_LINEAR = 1
   - or Register CRTC_EXT_CNTL, field VGA_Text_132 = 1

3. Other traffic on the PCI bus appears before a specific sequence of memory reads.

4. Video memory is accessed on byte boundaries (as opposed to word or dword
boundaries). The reads do not have to be 1 byte in length; they only have to
occur on byte boundaries that are not dword aligned.

For example, suppose the PCI bus traffic for video is as follows:
  
  Byte read from address A30D0
  Any size read for another device on the same bus at any address. 
  Byte read from address A30D1
  Byte read from address A30D2
  Byte read from address A30D3
  Byte read from address AFFF0 

The final read (to an address that is not the next in sequence) will hang the
Radeon 7000 and stop communication on the PCI bus leading to system failure.

The problem can occur in other patterns as well.  There are 12 possibilities.  

In the list below "xxx1" means a byte read from an address ending in "1" and
"jump" means a read from an address not the next in sequence:

1. xxx1, xxx2, xxx3, xxx4, xxx5, xxx6, xxx7, xxx8, xxx9, xxxa, xxxb, jump
2. xxx1, xxx2, xxx3, xxx4, xxx5, xxx6, xxx7, jump
3. xxx1, xxx2, xxx3, jump
4. xxx3, xxx4, xxx5, xxx6, xxx7, xxx8, xxx9, xxxa, xxxb, jump
5. xxx3, xxx4, xxx5, xxx6, xxx7, jump
6. xxx3, jump
7. xxx5, xxx6, xxx7, xxx8, xxx9, xxxa, xxxb, jump
8. xxx5, xxx6, xxx7, jump
9. xxx7, xxx8, xxx9, xxxa, xxxb, jump
10. xxx7, jump
11. xxx9, xxxa, xxxb, jump
12. xxxb, jump
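
The 12 patterns above appear to follow a simple rule: a run of sequential byte reads that starts at an odd offset (1, 3, 5, 7, 9 or b) and ends on the last byte of a dword (3, 7 or b), followed by a jump. That rule is inferred from the list itself, not quoted from the ATI advisory, and the function names below are hypothetical. A minimal Python sketch that regenerates the list under that assumption:

```python
# Sketch only: the rule below is inferred from the 12 listed patterns,
# not quoted from the ATI advisory.

def hang_patterns():
    """Return the byte-read runs from the list, as lists of low address nibbles."""
    patterns = []
    for start in range(0x1, 0xC, 2):      # odd start offsets: 1, 3, 5, 7, 9, b
        for end in (0xB, 0x7, 0x3):       # last byte of a dword: b, 7 or 3
            if end >= start:
                patterns.append(list(range(start, end + 1)))
    return patterns


def format_pattern(run):
    """Render a run in the report's notation, e.g. 'xxx1, xxx2, xxx3, jump'."""
    return ", ".join(f"xxx{nibble:x}" for nibble in run) + ", jump"


PATTERNS = hang_patterns()
for number, run in enumerate(PATTERNS, start=1):
    print(f"{number}. {format_pattern(run)}")
```

Under this reading, the example trace earlier in the description (A30D1, A30D2, A30D3, then a jump to AFFF0) corresponds to pattern 3.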
Comment 1 Ladislav Slezák 2005-10-21 10:15:44 UTC
Changed component to X.Org
Comment 2 Rod Macdonald 2005-10-26 16:38:38 UTC
Created attachment 55586 [details]
.pdf advisory from ATI describing this hardware hang
Comment 3 Stefan Dirsch 2005-10-26 18:30:15 UTC
Good questions by Jon Chaplick (ATI developer):

Which OEM(s) are impacted by the aforementioned bug?

You mentioned in the bug report that the hang occurs during install.
Have you been able to get the Xorg.0.log or xorg.conf files? If not, do
we know if this is the VESA driver or the RADEON driver experiencing
this hang (I can't seem to tell from the bug report)?
Comment 4 Stefan Dirsch 2005-10-26 18:35:48 UTC
IIRC the "Welcome" screen is the first screen you can see during the installation (graphical grub?). No Xserver would be involved here. Not even the kernel framebuffer? Steffen?
Comment 5 Steffen Winterfeldt 2005-10-27 09:12:46 UTC
The problem is known (see bug 81046 comment 20). A workaround has been
implemented in 10.1 & sles9 sp3 in the boot loader (doing only aligned
dword reads).
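
The boot loader change itself is not shown in this report. As an illustration of the idea only, the following Python sketch fetches a single byte by issuing one aligned dword read and extracting the byte, so that no byte-granular read reaches the bus. The names `make_dword_reader` and `read_byte_via_dword` and the simulated memory are hypothetical; little-endian byte order (as on x86) is assumed.

```python
import struct


def make_dword_reader(memory):
    """Hypothetical stand-in for the VGA aperture: only aligned 32-bit reads allowed."""
    def read_dword(addr):
        if addr % 4 != 0:
            raise ValueError("unaligned dword read")
        # "<I" = little-endian unsigned 32-bit value
        return struct.unpack_from("<I", memory, addr)[0]
    return read_dword


def read_byte_via_dword(read_dword, addr):
    """Return the byte at addr using a single aligned dword read."""
    aligned = addr & ~0x3             # round down to the containing dword
    shift = (addr & 0x3) * 8          # little-endian: byte 0 is least significant
    return (read_dword(aligned) >> shift) & 0xFF


# Simulated 16-byte region whose byte at offset n holds the value n.
memory = bytes(range(16))
read_dword = make_dword_reader(memory)
values = [read_byte_via_dword(read_dword, addr) for addr in range(16)]
```

Every access that reaches `read_dword` is dword aligned, so none of the byte-boundary sequences from the description can be produced by this access path.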
Comment 6 Stefan Dirsch 2005-10-27 21:27:53 UTC
But this needs to be fixed in the radeon driver as well, right?
Comment 7 Rod Macdonald 2005-10-28 15:58:16 UTC
Although the radeon driver can set VGA_ATI_LINEAR=1, it doesn't really
read/write from/to the system VGA aperture (0xAxxxx). So I don't think
this bug actually affects the radeon driver.  Also, we have not had a report of this failure occurring when the Radeon driver is loaded.

The generic VESA driver might be a problem if it uses byte accesses.
Comment 8 Stefan Dirsch 2005-10-28 16:11:06 UTC
Ok. With "generic VESA driver" you mean "vesa" or "fbdev" driver?
Comment 9 Stefan Dirsch 2005-11-10 15:40:53 UTC
Rod, could you try to answer my question above, please?
Comment 10 Rod Macdonald 2005-11-10 22:13:54 UTC
Stefan, is the fbdev driver the Radeon frame buffer driver?  I'm afraid I don't know if that driver uses VESA modes.  If it does, by setting field VGA_ATI_LINEAR = 1 or field VGA_Text_132 = 1, then there is the possibility that a server running SuSE Linux will hang.  Likewise for the "vesa" driver.  When I referred to the generic driver I meant the one that loads when SuSE cannot definitively detect the video card type, such as when an ES1000 card is used with SuSE 8.

We have confirmed that the Welcome Screen hang does not occur in 10.1 & sles9 sp3.  Many thanks.
Comment 11 Stefan Dirsch 2005-11-10 22:50:50 UTC
fbdev driver is the generic framebuffer driver, which SUSE uses also for installation and also later when no native driver like radeon is available.
Comment 12 Stefan Dirsch 2005-11-14 11:40:07 UTC
> We have confirmed that the Welcome Screen hang does not occur in 10.1 &
> sles9 sp3.  Many thanks.

Ok. If you can verify that installation (which uses fbdev) works as well, we can be pretty sure that fbdev is not affected either.
Comment 13 Rod Macdonald 2005-11-14 19:09:15 UTC
We do not see other hangs during SLES 9.3/10.1 install or when running X, but that does not necessarily indicate that the potential for the hang is not present - the problem is intermittent and depends on a particular sequence of PCI activity which might not occur every time.  I've learned that the Radeonfb maintainer is Benjamin Herrenschmidt [benh@kernel.crashing.org]; however, I have yet to contact him.  We know the open source Radeon X server driver will not have this problem, but I have no information about the default VESA driver.

A hang potential in the generic VESA driver could be serious given the high percentage of servers in the market that use the ATI Radeon 7000.
Comment 14 Stefan Dirsch 2005-11-14 21:38:08 UTC
Just to make sure, since I have found references to CRTC_EXT_CNTL and VGA_ATI_LINEAR in /usr/src/linux/drivers/video/radeonfb.c: SuSE does *not* use any chipset specific framebuffer driver. We only use the generic kernel framebuffer.
Comment 16 Stefan Dirsch 2005-11-15 20:05:27 UTC
I think we can set this to NORMAL meanwhile.
Comment 17 Egbert Eich 2005-11-24 13:06:59 UTC
I don't think the Xserver is affected here as we don't do bytewise or unaligned accesses (which would have never worked on AXP). Also the fbdev driver doesn't use the VGA aperture while the vesa driver might - depending on the BIOS. Also the generic VGA driver does.
X therefore seems to be the wrong component for this.

Regardless of this, I would like to use this opportunity to ask if I can have a Radeon 7000 for my lab for testing, as none of the developers seems to have such an old card any more and we do receive bug reports for it.
Comment 19 Stefan Dirsch 2005-11-29 14:54:39 UTC
Since only the vesa and vga drivers can be affected by this problem, and they are never in use - neither during installation nor afterwards - I am closing this now as fixed.
Comment 22 Dave Keck 2006-03-01 21:48:37 UTC
Our guys at AMD need the following:
===========================
Thanks David,
Not sure if this should get reopened, but we need to know if there is a workaround that can be applied to the system to overcome this issue.

Thanks Richard
===========================

Do you have any workarounds for this?

Thanks,  Dave