Bugzilla – Bug 129724
SuSE 10 install intermittent hang on servers with ATI Radeon 7000 video.
Last modified: 2006-03-01 21:48:37 UTC
Two customers have reported a problem to ATI where SuSE 10 (and possibly 9.2) will hang during install as soon as the "Welcome" screen is displayed. ATI has determined from PCI bus trace data that this hang is due to a hardware problem in the video processor. This report is being opened to address the possibility of changing the install software so as not to cause the hardware hang.

The problem impacts servers from a variety of manufacturers which use the Radeon 7000 video processor. This includes more than 30% of x86 servers being manufactured today. A product advisory has been published by ATI describing the problem.

Essentially the R7000 can hang if all of the following conditions exist:

1. The Radeon 7000 is on a PCI bus.
2. The software application uses VESA mode, or a 132 column VGA mode, i.e.:
   - Video Register CRTC_EXT_CNTL, field VGA_ATI_LINEAR = 1
   - or Register CRTC_EXT_CNTL, field VGA_Text_132 = 1
3. Other traffic on the PCI bus appears before a specific sequence of memory reads.
4. Video memory is accessed on byte boundaries (as opposed to word or dword boundaries). The reads do not have to be 1 byte in length; they just have to occur on the boundaries shown above which are not dword aligned.

Example, if the PCI bus traffic for video was as follows:

   Byte read from address A30D0
   Any size read for another device on the same bus at any address.
   Byte read from address A30D1
   Byte read from address A30D2
   Byte read from address A30D3
   Byte read from address AFFF0

The final read (to an address that is not the next in sequence) will hang the Radeon 7000 and stop communication on the PCI bus, leading to system failure.

The problem can occur in other patterns as well. There are 12 possibilities. In the list below “xxx1” means a byte read from an address ending in "1" and "jump" means a read from an address not the next in sequence:

1. xxx1, xxx2, xxx3, xxx4, xxx5, xxx6, xxx7, xxx8, xxx9, xxxa, xxxb, jump
2. xxx1, xxx2, xxx3, xxx4, xxx5, xxx6, xxx7, jump
3. xxx1, xxx2, xxx3, jump
4. xxx3, xxx4, xxx5, xxx6, xxx7, xxx8, xxx9, xxxa, xxxb, jump
5. xxx3, xxx4, xxx5, xxx6, xxx7, jump
6. xxx3, jump
7. xxx5, xxx6, xxx7, xxx8, xxx9, xxxa, xxxb, jump
8. xxx5, xxx6, xxx7, jump
9. xxx7, xxx8, xxx9, xxxa, xxxb, jump
10. xxx7, jump
11. xxx9, xxxa, xxxb, jump
12. xxxb, jump
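
For anyone checking a PCI trace against the list above, here is a minimal sketch of a pattern matcher. It rests on one reading of the 12 patterns that the advisory text does not state explicitly: each run of consecutive byte reads starts at an address whose last hex digit is odd (1 through b), ends at one whose last hex digit is 3, 7 or b, and is followed by a jump. The function and constant names are hypothetical, and the input is assumed to be the video byte reads seen after the intervening traffic from another device.

```python
# Assumption (not from the advisory): the 12 listed patterns are exactly the
# runs that start on an odd low hex digit (1..b) and end on 3, 7 or b.
START_NIBBLES = (0x1, 0x3, 0x5, 0x7, 0x9, 0xB)
END_NIBBLES = (0x3, 0x7, 0xB)


def hang_patterns():
    """Return (start, end) low-hex-digit pairs, one per listed pattern."""
    return [(s, e) for s in START_NIBBLES for e in END_NIBBLES if e >= s]


def matches_hang_pattern(reads):
    """Check a list of byte-read addresses against the listed patterns.

    `reads` holds the addresses of consecutive byte reads followed by one
    final read, which is the candidate "jump".
    """
    if len(reads) < 2:
        return False
    run, jump = reads[:-1], reads[-1]
    # Every read in the run must be to the next address in sequence.
    if any(b != a + 1 for a, b in zip(run, run[1:])):
        return False
    # The listed patterns never cross a 16-byte boundary.
    if run[0] >> 4 != run[-1] >> 4:
        return False
    # The final read must NOT be the next address in sequence.
    if jump == run[-1] + 1:
        return False
    return (run[0] & 0xF, run[-1] & 0xF) in hang_patterns()


if __name__ == "__main__":
    # The example from this report: A30D1, A30D2, A30D3, then AFFF0.
    print(matches_hang_pattern([0xA30D1, 0xA30D2, 0xA30D3, 0xAFFF0]))
```

A match only says the read sequence has the shape described; per the conditions above, the hang also depends on the video mode and on other bus traffic appearing first.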
Changed component to X.Org
Created attachment 55586: .pdf advisory from ATI describing this hardware hang
Good questions by Jon Chaplick (ATI developer): Which OEM(s) are impacted by the aforementioned bug? You mentioned in the bug report that the hang occurs during install. Have you been able to get the Xorg.0.log or xorg.conf files? If not, do we know if this is the VESA driver or the RADEON driver experiencing this hang? (I can't seem to tell from the bug report.)
IIRC the "Welcome" screen is the first screen you can see during the installation (graphical grub?). No Xserver would be involved here. Not even the kernel framebuffer? Steffen?
The problem is known (see bug 81046 comment 20). A workaround has been implemented in 10.1 & sles9 sp3 in the boot loader (doing only aligned dword reads).
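
The boot loader change itself is not shown in this report. As an illustration only, the idea of "doing only aligned dword reads" can be sketched as follows: to get a single byte, read the enclosing 4-byte-aligned dword and extract the byte in software, so no access ever lands on a non-dword boundary. The function name is hypothetical and `mem` is a plain byte buffer standing in for the VGA aperture; little-endian (x86) byte order is assumed.

```python
import struct


def read_byte_via_dword(mem, addr):
    """Return the byte at `addr` using one aligned 32-bit read.

    `mem` is a bytes-like buffer standing in for video memory (assumption),
    and its length is expected to be a multiple of 4.
    """
    base = addr & ~0x3                       # round down to a dword boundary
    (dword,) = struct.unpack_from("<I", mem, base)  # one aligned dword read
    shift = 8 * (addr & 0x3)                 # byte position inside the dword
    return (dword >> shift) & 0xFF


if __name__ == "__main__":
    buf = bytes(range(16))
    print(read_byte_via_dword(buf, 5))
```

The cost is that neighbouring bytes are fetched and discarded, which is harmless for plain memory reads of the kind described here.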
But this needs to be fixed in the radeon driver as well, right?
Although the radeon driver can set VGA_ATI_LINEAR=1, it doesn't really read/write from/to the system VGA aperture (0xAxxxx). So I don't think this bug actually affects the radeon driver. Also, we have not had a report of this failure occurring when the Radeon driver is loaded. The generic VESA driver might be a problem if it uses byte accesses.
Ok. By "generic VESA driver", do you mean the "vesa" or the "fbdev" driver?
Rod, could you try to answer my question above, please?
Stefan, is the fbdev driver the Radeon frame buffer driver? I'm afraid I don't know if that driver uses VESA modes. If it does, by setting field VGA_ATI_LINEAR = 1 or field VGA_Text_132 = 1, then there is the possibility a server running SuSE Linux will hang. Likewise for the "vesa" driver. When I referred to the generic driver I meant the one that loads when SuSE cannot definitively detect the video card type, such as when an ES1000 card is used with SuSE 8. We have confirmed that the Welcome Screen hang does not occur in 10.1 & sles9 sp3. Many thanks.
The fbdev driver is the generic framebuffer driver, which SUSE uses for installation and also later when no native driver like radeon is available.
> We have confirmed that the Welcome Screen hang does not occur in 10.1 &
> sles9 sp3. Many thanks.

Ok. If you can verify that installation (which uses fbdev) works as well, we can be pretty sure that fbdev is not affected either.
We do not see other hangs during SLSE 9.3/10.1 install or when running X, but that does not necessarily indicate that the potential for the hang is not present: the problem is intermittent and depends on a particular sequence of PCI activity which might not occur every time. I've learned that the Radeonfb maintainer is Benjamin Herrenschmidt [benh@kernel.crashing.org]. However, I have yet to contact him. We know the open source Radeon X Server driver will not have this problem; however, I have no information about the default VESA driver. A hang potential in the generic VESA driver could be serious given the high percentage of servers in the market that use the ATI Radeon 7000.
Just to make sure, since I have found references to CRTC_EXT_CNTL and VGA_ATI_LINEAR in /usr/src/linux/drivers/video/radeonfb.c: SuSE does *not* use any chipset-specific framebuffer driver. We only use the generic kernel framebuffer.
I think we can set this to NORMAL meanwhile.
I don't think the Xserver is affected here, as we don't do bytewise or unaligned accesses (which would never have worked on AXP). Also, the fbdev driver doesn't use the VGA aperture, while the vesa driver might, depending on the BIOS. The generic VGA driver does as well. X therefore seems to be the wrong component for this. Regardless of this, I would like to use this opportunity to ask if I can have a Radeon 7000 for my lab for testing, as none of the developers seems to have such an old card any more and we do receive bug reports for it.
Since only the vesa and vga drivers can be affected by this problem, and they are never in use (neither during installation nor afterwards), I close this now as fixed.
Our guys at AMD are needing the following:

===========================
Thanks David,

Not sure if this should get reopened, but we need to know if there is a workaround that can be applied to the system to overcome this issue.

Thanks
Richard
===========================

Do you have any workarounds for this?

Thanks,
Dave