Bug 113203 - Xorg freezes during boot with nv driver
Status: VERIFIED FIXED
Duplicates: 66744
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: X.Org
Version: Beta 3
Hardware: i386 SUSE Other
Priority: P2 - High, Severity: Normal
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: Stefan Dirsch
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-26 02:12 UTC by Ryan Fitzgerald
Modified: 2005-10-10 14:13 UTC
CC List: 4 users

See Also:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Xorg log file after x freeze (37.70 KB, text/x-log)
2005-08-26 08:45 UTC, Ryan Fitzgerald
Xorg configuration file (6.53 KB, application/octet-stream)
2005-08-26 08:45 UTC, Ryan Fitzgerald
Updated Xorg log (38.16 KB, text/x-log)
2005-08-26 09:32 UTC, Ryan Fitzgerald
Patch from CVS mentioned above (7.31 KB, patch)
2005-09-15 20:20 UTC, Stefan Dirsch
Patch from CVS mentioned above (2.24 KB, patch)
2005-09-23 20:38 UTC, Stefan Dirsch
NVIDIA finally resolved the problem :-) (863 bytes, patch)
2005-09-28 12:52 UTC, Stefan Dirsch
prevent_endless_loop (556 bytes, patch)
2005-09-30 07:52 UTC, Stefan Dirsch

Description Ryan Fitzgerald 2005-08-26 02:12:06 UTC
After a successful install of openSUSE beta 3, the system crashes while trying to
boot into GNOME.  My hardware is an NVIDIA 6800 LE attached to a Dell 2005FPW
using DVI. I booted into rescue mode using the CD and changed the driver from
"nv" to "vesa", and now everything boots up fine.
Comment 1 Stefan Dirsch 2005-08-26 08:13:01 UTC
Please let the system crash again and reboot into runlevel 3 afterwards. Then 
attach current /etc/X11/xorg.conf and /var/log/Xorg.0.log. Thanks. 
Comment 2 Ryan Fitzgerald 2005-08-26 08:45:01 UTC
Created attachment 47720
Xorg log file after x freeze
Comment 3 Ryan Fitzgerald 2005-08-26 08:45:43 UTC
Created attachment 47721
Xorg configuration file
Comment 4 Ryan Fitzgerald 2005-08-26 08:50:19 UTC
You can see in the Xorg log file that the only error reported was about the
glx extension, so I removed the glx module from xorg.conf and started X again;
however, it froze once again.  I checked the Xorg log file and this time no
error was reported, but the same warnings still occurred. If you want me to
upload that Xorg log file, just let me know.
Comment 5 Stefan Dirsch 2005-08-26 09:18:16 UTC
You're mixing nv and nvidia driver configurations. Please try uninstalling the
nvidia driver first:

  nvidia-installer --uninstall
Comment 6 Ryan Fitzgerald 2005-08-26 09:32:26 UTC
Created attachment 47730
Updated Xorg log
Comment 7 Ryan Fitzgerald 2005-08-26 09:34:31 UTC
I ran the nvidia-installer --uninstall and everything went fine. I tried using
the nv driver and again X froze. I ran a diff between my xorg.conf files and
they were the same before and after the nvidia uninstall, however my Xorg.0.log
did change so I uploaded that.
Comment 8 Stefan Dirsch 2005-08-26 09:43:08 UTC
The logfile looks OK now. The nvidia driver has been uninstalled cleanly.
Please add
  
  Option "XaaNoScreenToScreenCopy"

to 'Section "Device"' of your /etc/X11/xorg.conf and try again. If this doesn't
help, try 

  Option "noaccel"

instead. If "XaaNoScreenToScreenCopy" does help, also try
  
 Option  "XaaNoPixmapCache"
 Option  "XaaNoOffScreenPixmaps"

Please report the results. Thanks.
Comment 9 Ryan Fitzgerald 2005-08-26 16:55:43 UTC
Using Option "XaaNoScreenToScreenCopy" worked and so did using Option "noaccel".
The only problem is that scrolling is unbearably slow when using either of these
options.  I also added options "XaaNoPixmapCache" and "XaaNoOffScreenPixmaps"
when using "XaaNoScreenToScreenCopy", but scrolling was still extremely slow.  I
did not notice any problems other than that.
Comment 10 Stefan Dirsch 2005-08-26 17:03:22 UTC
Please try only 

 Option  "XaaNoPixmapCache"
 Option  "XaaNoOffScreenPixmaps"

If this also works it should be usable again.
Comment 11 Ryan Fitzgerald 2005-08-26 17:29:37 UTC
When using only "XaaNoPixmapCache" and "XaaNoOffScreenPixmaps" the system starts
to load up gnome and gets as far as displaying the desktop, but then it freezes.  
Comment 12 Stefan Dirsch 2005-08-26 17:32:36 UTC
Ouch. Probably we've hit the livelock problem again. :-(
Comment 13 Lars Knoll 2005-08-29 11:23:40 UTC
I have experienced the same problem a couple of times with the NV driver. It 
seems to be a problem with using 24bit colordepth. When using 16bit the driver 
runs perfectly stable (even without disabling any Xaa stuff) for me. 
 
Comment 14 Matthias Hopf 2005-08-29 12:54:16 UTC
(In reply to comment #13)
> I have experienced the same problem a couple of times with the NV driver. It 

What do you mean by 'a couple of times'? With different graphics cards, or
with different SuSE versions? Do you remember when you first hit it?

BTW - one way to circumvent this is to use the binary NVidia drivers, they
usually work. You will want to use them anyway (otherwise a 6800 doesn't really
make sense ;)
Comment 15 Lars Knoll 2005-08-29 13:05:47 UTC
I don't remember exactly when I first saw it, as SuSE 9.3 defaults to 16bit 
colordepth and the first thing I did afterwards was to install the commercial 
driver. 
 
But the nv driver is unstable in 24 bit mode at least with my GeForce 6600. No 
idea how well or badly it works with other HW. I experienced the hang quite 
often as I'm at the moment switching back and forth between nv (to add EXA 
support there) and the nvidia driver (for other work). 
 
And the binary driver doesn't help you at all as long as you don't ship it 
together with openSuSE (which you obviously can't) and your system hangs after 
the default install because of this. I know how to get around it, but Joe User 
probably won't. 
 
Comment 16 Matthias Hopf 2005-08-29 13:33:43 UTC
I know.
Unfortunately, this is a bug in the nv driver that we thought was present only
in some 6200 cards, but it seems to be more widespread.

Could both of you send us the vendor/device ids, i.e. the output of
  hwinfo --gfxcard
Then we will add XaaNoScreenToScreenCopy to these cards.

I'm sorry that there is no better solution ATM, I'm already bugging NVidia for
driver improvement here, but their focus is on the binary driver, of course.

Ryan, can you try running your X server at 16bit as well? Does it work without
XaaNoScreenToScreenCopy then? If so, this would be a viable solution.
Comment 17 Lars Knoll 2005-08-29 13:46:07 UTC
dhcp234:~ # hwinfo --gfxcard 
23: PCI 100.0: 0300 VGA compatible controller (VGA) 
  [Created at pci.277] 
  UDI: /org/freedesktop/Hal/devices/pci_10de_140 
  Unique ID: VCu0._BdnBIAclaC 
  Parent ID: vSkL.akG_2l700s2 
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0 
  SysFS BusID: 0000:01:00.0 
  Hardware Class: graphics card 
  Model: "nVidia GeForce 6600 GT" 
  Vendor: pci 0x10de "nVidia Corporation" 
  Device: pci 0x0140 "GeForce 6600 GT" 
  Revision: 0xa2 
  Driver: "nvidiafb" 
  Memory Range: 0xd0000000-0xd3ffffff (rw,non-prefetchable) 
  Memory Range: 0xc8000000-0xcfffffff (rw,prefetchable) 
  Memory Range: 0xd4000000-0xd4ffffff (rw,non-prefetchable) 
  Memory Range: 0x40000000-0x4001ffff (ro,prefetchable,disabled) 
  IRQ: 137 (371025 events) 
  I/O Ports: 0x3c0-0x3df (rw) 
  Module Alias: "pci:v000010DEd00000140sv00000000sd00000000bc03sc00i00" 
  Driver Info #0: 
    XFree86 v4 Server Module: nv 
    XF86Config Entry: Option  "XaaNoPixmapCache"\nOption  "XaaNoOffScreenPixmaps" 
  Driver Info #1: 
    XFree86 v4 Server Module: nvidia 
    3D Support: yes 
  Config Status: cfg=new, avail=yes, need=no, active=unknown 
  Attached to: #10 (PCI bridge) 
 
Primary display adapter: #23 
 
Cheers, 
Lars 
 
Comment 18 Matthias Hopf 2005-08-29 14:16:46 UTC
16bit doesn't work for the card we have for reproduction. :-(

This seems like a hardware race condition to me, which is obviously more often
triggered when using 24bit.
Comment 19 Ryan Fitzgerald 2005-08-29 19:33:39 UTC
23: PCI 100.0: 0300 VGA compatible controller (VGA)
  [Created at pci.277]
  UDI: /org/freedesktop/Hal/devices/pci_10de_42
  Unique ID: VCu0.HI5+P2cWJE8
  Parent ID: vSkL.K3WJKbXW3V7
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "LeadTek GeForce 6800 LE"
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x0042 "GeForce 6800 LE"
  SubVendor: pci 0x107d "LeadTek Research Inc."
  SubDevice: pci 0x299b
  Revision: 0xa1
  Memory Range: 0xd5000000-0xd5ffffff (rw,non-prefetchable)
  Memory Range: 0xd8000000-0xdfffffff (rw,prefetchable)
  Memory Range: 0xd4000000-0xd4ffffff (rw,non-prefetchable)
  Memory Range: 0xd7f00000-0xd7f1ffff (ro,prefetchable,disabled)
  IRQ: 185 (4 events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v000010DEd00000042sv0000107Dsd0000299Bbc03sc00i00"
  Driver Info #0:
    XFree86 v4 Server Module: nv
  Driver Info #1:
    XFree86 v4 Server Module: nvidia
    3D Support: yes
  Config Status: cfg=yes, avail=yes, need=yes, active=unknown
  Attached to: #9 (PCI bridge)

Primary display adapter: #23
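
The vendor/device IDs requested in comment 16 can be pulled out of this kind of output mechanically. The following is only a minimal sketch, assuming the "Vendor: pci 0x..." and "Device: pci 0x..." line layout seen in the two hwinfo listings in this report; the function name pci_ids and the SAMPLE constant are hypothetical, and other hwinfo versions may format these lines differently.

```python
import re

# Lines copied from the hwinfo output in comment 19 of this report.
SAMPLE = '''\
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x0042 "GeForce 6800 LE"
  SubVendor: pci 0x107d "LeadTek Research Inc."
'''

# Matches e.g.:   Device: pci 0x0042 "GeForce 6800 LE"
# Only spaces/tabs may precede the field name, so "SubVendor" and
# "SubDevice" lines are not picked up.
_ID_LINE = re.compile(
    r'^[ \t]*(Vendor|Device): pci (0x[0-9a-fA-F]+) "([^"]*)"',
    re.MULTILINE,
)

def pci_ids(hwinfo_text):
    """Return {'Vendor': (id, name), 'Device': (id, name)} for the first card listed."""
    ids = {}
    for field, hex_id, name in _ID_LINE.findall(hwinfo_text):
        # setdefault keeps the first occurrence if several cards are listed.
        ids.setdefault(field, (int(hex_id, 16), name))
    return ids

if __name__ == "__main__":
    for field, (num, name) in pci_ids(SAMPLE).items():
        print(f"{field}: 0x{num:04x} ({name})")
```

The same pair of IDs also appears in the "Module Alias" line (v000010DE, d00000042), so either line could serve as the key for a per-card option database.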
Comment 20 Stefan Dirsch 2005-08-29 21:11:06 UTC
I've added XaaNoScreenToScreenCopy now to both gfx boards. I'm afraid that the 
Open Source nv driver is getting more and more unusable. At least for 6x00 
boards. :-( 
Comment 21 Lars Knoll 2005-08-30 06:28:44 UTC
Sorry, but without ScreenToScreen copy the driver is more or less unusable, as 
reads from the framebuffer on NV hardware are way too slow (I get max 2 MB/sec 
on my hardware). So, e.g., moving windows is unbearable without an accelerated 
copy. It would be a lot better to use the ShadowFB option for the HW that 
causes problems. 
 
Comment 22 Matthias Hopf 2005-08-30 09:31:43 UTC
That actually works for the broken card we have here as well!
Thanks, Lars.

Yet another option I haven't thought of WRT this bug...

Stefan, please change database entries to
Option "ShadowFB" "on"
Comment 23 Egbert Eich 2005-08-30 09:46:29 UTC
Shadow FB is completely unaccelerated. So it is conceivable that it works.
Why don't you just send me a card with which I can reproduce this?
Comment 24 Matthias Hopf 2005-08-30 10:03:44 UTC
But it is not clear that it is *that* much faster than just disabling
ScreenToScreenCopy.
We needed the card to reproduce several issues, I think Stefan can now send the
card to you.
Comment 25 Stefan Dirsch 2005-08-30 10:51:08 UTC
I'll send this card to Egbert today. Since it's unclear whether we'll get a fix
in time I'll change XaaNoScreenToScreenCopy to ShadowFB for now.
Comment 26 Lars Knoll 2005-08-30 10:53:17 UTC
Trust me, as long as you're not using a Pentium 100 it's a lot faster. When  
you get max 2.5 MB/sec in framebuffer read speed a simple calculation shows  
that moving a 200x200 pixel window takes about 60ms. If you have a 500x500  
window, you're up at 400ms. That's unusable. Using a shadowFB, the blit is in 
main memory using memcpy which has a bandwidth of more than 1GB/s on my 
hardware. After that you need to write the damaged region to the framebuffer 
(these writes are rather fast). Altogether that probably takes 5ms to complete 
for the 500x500 window.  
  
I tried using shadowFB on my HW and with the current state the driver is in  
it's by far the fastest option if you have a halfway modern CPU. On a P4 or  
similar you can even run a composition manager on top and keep a usable  
desktop.   
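
The arithmetic above can be reproduced with a short sketch. It assumes 4 bytes per pixel (24-bit depth stored as 32 bits) and decimal megabytes, which is what makes the quoted 60ms and 400ms figures come out; the function name copy_time_ms is hypothetical. The write-back of the damaged region to the framebuffer is not modelled, because no write bandwidth is given in this report.

```python
BYTES_PER_PIXEL = 4   # assumption: 24-bit colour stored as 32 bits per pixel

FB_READ = 2.5e6       # 2.5 MB/s framebuffer read speed, as quoted in the comment
MEMCPY = 1e9          # 1 GB/s main-memory copy, as quoted in the comment

def copy_time_ms(width, height, bandwidth_bytes_per_s,
                 bytes_per_pixel=BYTES_PER_PIXEL):
    """Milliseconds needed to move width x height pixels at the given bandwidth."""
    return width * height * bytes_per_pixel / bandwidth_bytes_per_s * 1000.0

if __name__ == "__main__":
    for side in (200, 500):
        fb = copy_time_ms(side, side, FB_READ)
        mem = copy_time_ms(side, side, MEMCPY)
        print(f"{side}x{side}: framebuffer read {fb:.0f} ms, memcpy {mem:.2f} ms")
```

Under these assumptions a 200x200 window costs about 64 ms of framebuffer reads and a 500x500 window about 400 ms, while the in-memory copy of the 500x500 window takes about 1 ms, leaving the rest of the estimated 5 ms for the framebuffer write.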
 
Comment 27 Matthias Hopf 2005-08-30 11:05:01 UTC
That's strange, because I got 7 MB/s framebuffer read speed in userspace on a
TNT2 over AGP... Back in those old days...
I thought that readback is much faster in PCIe than in AGP.
Comment 28 Lars Knoll 2005-08-30 12:27:06 UTC
The 2.5 MB/s was measured with an AGP card. But even with my PCIe card I don't 
get more than 7.5 MB/s. Maybe the kernel support for PCIe is still lacking 
something? 
 
Comment 29 Stefan Dirsch 2005-08-30 14:33:34 UTC
Egbert will investigate this.
Comment 30 Stefan Dirsch 2005-09-06 21:09:39 UTC
This will definitely be investigated by nvidia. The workaround for now was to 
set "ShadowFB" for the affected chipsets we could test. The problem is also 
mentioned in the Release Notes. Setting to Normal. 
Comment 31 Matthias Hopf 2005-09-07 13:31:53 UTC
Tracked on developer.nvidia.com now as #187822.
Comment 32 Matthias Hopf 2005-09-07 13:33:56 UTC
*** Bug 66744 has been marked as a duplicate of this bug. ***
Comment 33 Stefan Dirsch 2005-09-14 03:42:23 UTC
Looks related. 
 
Date: Tue, 13 Sep 2005 19:28:04 -0700 (PDT) 
From: Mark Vojkovich <mvojkovi@XFree86.Org> 
To: cvs-commit@xfree86.org 
Subject: CVS Update: xc (branch: trunk) 
 
CVSROOT:        /home/x-cvs 
Module name:    xc 
Changes by:     mvojkovi@public.xfree86.org.    05/09/13 19:28:03 
 
Log message: 
  Fix a potential problem with pixmap cache corruption on GeForce 6xxx 
  and 7xxx parts. 
 
Modified files: 
      xc/programs/Xserver/hw/xfree86/drivers/nv/: 
        nv_driver.c nv_hw.c nv_setup.c 
 
  Revision      Changes    Path 
  1.137         +19 -7   xc/programs/Xserver/hw/xfree86/drivers/nv/nv_driver.c 
  1.16          +39 -12  xc/programs/Xserver/hw/xfree86/drivers/nv/nv_hw.c 
  1.48          +1 -2    xc/programs/Xserver/hw/xfree86/drivers/nv/nv_setup.c 
 
Comment 34 Stefan Dirsch 2005-09-15 14:52:53 UTC
eich > It will at least fix the pixmap cache problem. 
eich > The lockup problem still remains to be looked at. 
 
Comment 35 Stefan Dirsch 2005-09-15 20:20:02 UTC
Created attachment 50076
Patch from CVS mentioned above
Comment 36 Stefan Dirsch 2005-09-23 20:33:30 UTC
This one looks interesting: 
 
Date: Thu, 22 Sep 2005 13:34:42 -0700 (PDT) 
From: Mark Vojkovich <mvojkovi@XFree86.Org> 
To: cvs-commit@xfree86.org 
Subject: CVS Update: xc (branch: trunk) 
 
CVSROOT:        /home/x-cvs 
Module name:    xc 
Changes by:     mvojkovi@public.xfree86.org.    05/09/22 13:34:42 
 
Log message: 
    Fix possible cause of some acceleration instability on some GeForce6xxx 
  parts. 
 
Modified files: 
      xc/programs/Xserver/hw/xfree86/drivers/nv/: 
        nv_hw.c 
 
  Revision      Changes    Path 
  1.17          +14 -4     xc/programs/Xserver/hw/xfree86/drivers/nv/nv_hw.c 
 
Comment 37 Stefan Dirsch 2005-09-23 20:38:48 UTC
Created attachment 50764
Patch from CVS mentioned above
Comment 38 Stefan Dirsch 2005-09-27 13:08:10 UTC
WRT comment #36: Unfortunately it's unrelated. :-(

aritger: "We found this change while debugging the hang problem, but this fix 
          does not solve the problem in our tests."
Comment 39 Stefan Dirsch 2005-09-28 12:52:18 UTC
Created attachment 51051
NVIDIA finally resolved the problem :-)
Comment 40 Stefan Dirsch 2005-09-28 13:37:57 UTC
xorg-x11 package with all patches applied submitted to STABLE and 10.0. I'll
make a YOU update after testing.
Comment 41 Stefan Dirsch 2005-09-28 17:44:07 UTC
See also: 
 
--> https://bugs.freedesktop.org/show_bug.cgi?id=3333 
 
BTW, a new NVIDIA developer, who's working on the Open Source driver? 
Comment 42 Stefan Dirsch 2005-09-30 07:52:23 UTC
Created attachment 51219
prevent_endless_loop

From X.Org CVS, committed by Aaron Plattner:

* Don't hang if j is zero.  This should never happen, but it's better to be
safe than sorry.
Comment 43 Stefan Dirsch 2005-09-30 07:52:54 UTC
above patch applied for 10.0.
Comment 44 Stefan Dirsch 2005-09-30 12:35:55 UTC
Reopening to acquire a SWAMP ID.
Comment 45 Stefan Dirsch 2005-09-30 12:36:35 UTC
Andreas, could you create a SWAMP entry for this?

Description: 
- fixes "nv" video driver freeze and acceleration problems (#113203)
Comment 46 Andreas Jaeger 2005-09-30 14:04:07 UTC
Please fix this together with Bug# 114490.
Comment 47 Stefan Dirsch 2005-09-30 14:12:52 UTC
Ok.
Comment 48 Stefan Dirsch 2005-09-30 14:36:55 UTC
me> I wanted to do the YOU nv driver update for 10.0 ASAP. But I would
me> like to have it tested first. We just noticed that you now have
me> all the 6200 cards on which these two problems (a) livelock,
me> b) corrupted pixmap cache) occur. So either you would have to test
me> it, or you send the cards back to us.

egbert> I had already tested the pixmap patch some time ago with a card
egbert> on which the problem occurred, and it worked.
egbert> Today I tested again with Josephine's card. The new patch
egbert> solved the problem.
egbert> I think that should be enough testing.
Comment 49 Anja Stock 2005-10-10 14:13:27 UTC
released