Bug 159962

Summary: i810/intel: I830WaitLpRing crash
Product: [openSUSE] openSUSE 10.3 Reporter: Christian Deckelmann <christian.deckelmann>
Component: X.Org Assignee: Stefan Dirsch <sndirsch>
Status: RESOLVED WONTFIX QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Major    
Priority: P3 - Medium CC: eich, felix, forgotten_a4cKfOE_HD, forgotten_Drfk9mafMw, gernot, jfunk, kai, kent.liu, quanxian.wang, zhenyu.z.wang
Version: Beta 1   
Target Milestone: ---   
Hardware: 32bit   
OS: Other   
Whiteboard:
Found By: IS&T Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Xorg logfile
config file
xorg.conf for D620
output of hwinfo for D620
log of server during crash
Xorg7 logfile when it crashed
a part from messages related to the X failure
X log file from one of the crashes
Xorg log file with the i810 driver
Latest crash on 945G: scrolling in Konqueror
xorg.log
This is the log of todays endless restarting and crashing XServer on a MacBook
Xorg log

Description Christian Deckelmann 2006-03-22 08:51:09 UTC
X randomly gets into a crash loop.
X crashes, the screen goes black, and the clock mouse pointer is shown for a few seconds. Then the screen goes black again and the clock mouse pointer reappears, and so on.

I will attach the Xorg.log.
Comment 1 Christian Deckelmann 2006-03-22 08:52:10 UTC
Created attachment 74359 [details]
Xorg logfile
Comment 2 Stefan Dirsch 2006-03-22 09:38:28 UTC
Could you attach the config file as well? Thanks.
Comment 3 Christian Deckelmann 2006-03-22 09:52:12 UTC
Created attachment 74379 [details]
config file
Comment 4 Stefan Dirsch 2006-03-22 10:41:22 UTC
Intel 855 hardware. Still widely in use I'm afraid ...
Comment 5 Matthias Hopf 2006-03-22 11:03:32 UTC
I'm having these crashes since 10.0 after Suspend-To-Disk when opening a XVideo surface (ashamed to admit I didn't test 10.1b8 yet). After that, the Xserver crashes over and over again.
Is this the same scenario for you?
Comment 6 Christian Deckelmann 2006-03-22 11:21:06 UTC
I have also seen these crashes on 10.0. I don't use Suspend-To-Disk, but I use Suspend-To-Ram daily. On 10.0 I saw these crashes at various times, including when I started mplayer (which is using XVideo, right?). On 10.1b8 I have had this crash only a few times so far. In no case was mplayer in use. It crashed when surfing with Konqueror.


Comment 7 Stefan Dirsch 2006-03-22 11:30:11 UTC
So, do you need to use STR/STD on 10.1 to reproduce this problem? We will need to know if this is STR/STD related.
Comment 8 Christian Deckelmann 2006-03-22 12:24:22 UTC
Well, I can stop using STR and see if the crash happens again.
Shall I try that?

Comment 9 Matthias Hopf 2006-03-22 13:00:22 UTC
Yes, please!
This would help reduce the problem to a suspend issue - or show that this is a general intel driver issue.

(In reply to comment #6)
> started mplayer (which is using XVideo,right?). On 10.1b8 I had this crash only

Typically, yes.
Comment 10 Stefan Dirsch 2006-04-01 22:22:34 UTC
Any new results when no longer using STR?
Comment 11 Christian Deckelmann 2006-04-02 11:32:04 UTC
No crash since I stopped using STR.
Comment 12 Stefan Dirsch 2006-04-02 12:52:23 UTC
Thanks for testing!
Comment 13 Stefan Dirsch 2006-04-03 14:22:44 UTC
This should be tested again after switching to the new i810 driver (SUSE 10.2 Alpha). ==> LATER
Comment 14 Stefan Dirsch 2006-05-30 10:19:52 UTC
reopen
Comment 15 Stefan Dirsch 2006-05-30 10:20:19 UTC

*** This bug has been marked as a duplicate of 179773 ***
Comment 16 Petr Baudis 2006-08-09 22:32:13 UTC
This happens to me on my Thinkpad X41 every time I try to create an xv surface (yes, it's mplayer), even right after I boot up, no suspending involved. I have tried with the alternate driver of bug 179773 and it crashes as well, exactly the same way.

This is what gets dumped into the log:

Error in I830WaitLpRing(), now is 826668, start is 824667
pgetbl_ctl: 0x3ffc0001 pgetbl_err: 0x0
ipeir: 0 iphdr: 1810000
LP ring tail: 71b0 head: 7184 len: 1f801 start 0
eir: 0 esr: 0 emr: ffff
instdone: ffc0 instpm: 0
memmode: 108 instps: f0000
hwstam: ffff ier: 0 imr: ffff iir: 0
space: 131020 wanted 131064

Fatal server error:
lockup
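
Note on reading such a dump: the "space" value is consistent with the free-space arithmetic head - (tail + 8), with the ring size added when the result is negative, for a 128 KiB (131072-byte) ring. "wanted 131064" is then the ring size minus 8, which suggests the server was waiting for the ring to drain completely. The sketch below is an assumption inferred from the numbers in this report, not code copied from the driver; ring_space and its default ring size are hypothetical names and values used for illustration.

```shell
# ring_space TAIL HEAD [SIZE]
# TAIL and HEAD are the hex values printed after "LP ring tail:" and "head:".
# SIZE is the ring size in bytes; the default of 131072 is an assumption
# that matches "wanted 131064" (size - 8) in the dump.
ring_space() {
  rs_tail=$((0x$1))
  rs_head=$((0x$2))
  rs_size=${3:-131072}
  rs_space=$(( rs_head - (rs_tail + 8) ))
  # A negative value means the free region wraps around the end of the ring.
  if [ "$rs_space" -lt 0 ]; then
    rs_space=$(( rs_space + rs_size ))
  fi
  echo "$rs_space"
}

ring_space 71b0 7184   # tail and head from the dump; prints 131020
```

The printed value matches the "space: 131020" line, 44 bytes short of what was wanted, so the wait timed out with the ring almost, but not entirely, empty.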
Comment 17 Petr Baudis 2006-08-09 22:51:25 UTC
Note that it is not 100% reproducible, only in 3/4 of the cases or so - sometimes it works fine.

Also it seems that I forgot to mention it in the previous message - of course I'm using SLES10.
Comment 18 Stefan Dirsch 2006-08-10 01:58:43 UTC
Could you test this with openSUSE Alpha3?
Comment 19 Petr Baudis 2006-08-10 12:04:05 UTC
It would be great if I could avoid reinstalling the system - can I just get the RPMs with the newer xorg version from somewhere and install them?
Comment 21 Petr Baudis 2006-08-10 14:19:17 UTC
The repository seems rather empty - I've tried to add it to yast as an installation source but when I filter by installation sources, no packages are shown.
Comment 22 Stefan Dirsch 2006-08-10 14:40:03 UTC
Indeed. We currently have some buildservice problems. Packages should be available again soon. Try again early next week.
Comment 23 Adam Spiers 2006-10-06 15:56:51 UTC
What's the latest on this?  I am also seeing this with xorg-x11-server-6.9.0-50.24 on an x86_64 install of SLES10 GMC.  The machine is a new Dell Latitude D620 with Intel Core 2 Duo.

lspci tells me:

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)

Here is the error in /var/log/Xorg.0.log.old:

Error in I830WaitLpRing(), now is 1954125, start is 1951905
pgetbl_ctl: 0x7ffc0001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 15370 head: 15370 len: 1f001 start 0
eir: 0 esr: 1 emr: ffff
instdone: ffc0 instpm: 0
memmode: 306 instps: f0000
hwstam: fffe ier: 82 imr: 0 iir: 20
space: 131060 wanted 131064
(II) I810(0): [drm] removed 1 reserved context for kernel
(II) I810(0): [drm] unmapping 8192 bytes of SAREA 0x10009000 at 0x2add7fb35000

Fatal server error:
lockup


Please consult the The X.Org Foundation support
         at http://wiki.X.Org
 for help.
Please also check the log file at "/var/log/Xorg.0.log" for additional information.

(WW) I810(0): Successfully set original devices
(WW) I810(0): Setting the original video mode instead of restoring
        the saved state
(WW) I810(0): Extended BIOS function 0x5f05 failed.
(II) I810(0): BIOS call 0x5f05 not supported, setting refresh with VBE 3 method.
(II) I810(0): xf86UnbindGARTMemory: unbind key 7
(II) I810(0): xf86UnbindGARTMemory: unbind key 0
(II) I810(0): xf86UnbindGARTMemory: unbind key 1
(II) I810(0): xf86UnbindGARTMemory: unbind key 3
(II) I810(0): xf86UnbindGARTMemory: unbind key 2
(II) I810(0): xf86UnbindGARTMemory: unbind key 4
(II) I810(0): xf86UnbindGARTMemory: unbind key 5
(II) I810(0): xf86UnbindGARTMemory: unbind key 6
(WW) I810(0): Successfully set original devices (2)
Comment 24 Adam Spiers 2006-10-14 23:09:20 UTC
Created attachment 101504 [details]
xorg.conf for D620
Comment 25 Adam Spiers 2006-10-14 23:10:39 UTC
Created attachment 101505 [details]
output of hwinfo for D620
Comment 26 Adam Spiers 2006-10-14 23:13:32 UTC
Created attachment 101506 [details]
log of server during crash
Comment 27 Adam Spiers 2006-10-14 23:14:06 UTC
Please note that this crash happened without doing any STR.
Comment 28 Stefan Dirsch 2006-10-16 02:38:18 UTC
Still waiting for feedback by reporter and/or Petr when using X.Org 7 with latest i810 driver. Unfortunately this bugreport has been hijacked by another user meanwhile. :-(
Comment 29 Christian Deckelmann 2006-10-16 05:10:07 UTC
Oops. Seems I missed that. Sorry.
I just updated my SLED10 to X.Org 7 from buildservice.
Worked without problems.
As this crash happens only occasionally, others might want to try updating too.
Will report back if it is crashing again.
Comment 30 Christian Deckelmann 2006-10-18 20:56:23 UTC
It crashed again.
BTW: I am again using STR.
Comment 31 Christian Deckelmann 2006-10-18 20:57:58 UTC
Created attachment 101980 [details]
Xorg7 logfile when it crashed
Comment 32 Stefan Dirsch 2006-10-19 05:45:10 UTC
Thanks for verifying!
Comment 33 Adam Spiers 2006-10-27 11:45:14 UTC
How is this hijacking?  I am seeing exactly the same bug, so thought it would be helpful to provide you with more datapoints.  Would you prefer that I submit a separate bug and you resolve it as DUPLICATE?
Comment 34 Stefan Dirsch 2006-10-27 11:54:45 UTC
It's completely different hardware. The reporter used 855 whereas you're using 945GM.
Comment 35 Forgotten User ZhJd0F0L3x 2006-11-09 10:10:53 UTC
JFTR: i have now (stable ~oS10.2beta1) X server crashes with a "death loop" without having suspended. I don't have the logs anymore, but it looked similar, some i830 errors and then "Fatal: ... lockup"
This is an i855:
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

I will try the i810beta driver soon :-)
Comment 36 Forgotten User ZhJd0F0L3x 2006-12-06 16:06:40 UTC
the i810beta driver does not even give me an image on the LCD, so i cannot use it (or somebody needs to help me get it configured).

Yesterday, the machine went into the death loop again, this time when trying to start an openGL app (crack-attack), with this in syslog:

Dec  5 20:22:41 strolchi kernel: [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 18952666 emitted: 18952671
Comment 37 Forgotten User ZhJd0F0L3x 2006-12-06 18:23:44 UTC
...and again, this time while playing crack-attack:
Could not init font path element unix/:7100, removing from list!
Error in I830WaitLpRing(), now is 80958538, start is 80956537
pgetbl_ctl: 0x3fee0001 pgetbl_err: 0x0
ipeir: 0 iphdr: 7f000029
LP ring tail: 160 head: 0 len: 1f001 start 0
eir: 0 esr: 1 emr: ffff
instdone: ffc1 instpm: 0
memmode: 108 instps: 20
hwstam: ffff ier: 22 imr: 9 iir: 0
space: 130712 wanted 131064
(II) I810(0): [drm] removed 1 reserved context for kernel
(II) I810(0): [drm] unmapping 8192 bytes of SAREA 0xf8e95000 at 0xb7a86000

Fatal server error:
lockup

(II) AIGLX: Suspending AIGLX clients for VT switch
Error in I830WaitLpRing(), now is 80960544, start is 80958543
pgetbl_ctl: 0x3fee0001 pgetbl_err: 0x0
ipeir: 0 iphdr: 7f000029
LP ring tail: 168 head: 0 len: 1f001 start 0
eir: 0 esr: 1 emr: ffff
instdone: ffc1 instpm: 0
memmode: 108 instps: 20
hwstam: ffff ier: 22 imr: 9 iir: 0
space: 130704 wanted 131064

FatalError re-entered, aborting
lockup

This always means data loss, and the machine can not cleanly shut down (the console is completely messed up), so raising severity.
Comment 38 Stefan Dirsch 2006-12-21 14:31:34 UTC
Ok. This issue needs to get fixed upstream. I suggest to check the latest
driver from time to time by updating xorg-x11-server/xorg-x11-driver-video
packages from the xorg73 project.

  http://software.opensuse.org/download/xorg73/openSUSE_10.2/

Closing as LATER.
Comment 39 Stefan Dirsch 2007-01-09 17:19:18 UTC
Date: Tue, 9 Jan 2007 12:18:47 +0100
From: Matthias Hopf <mhopf@suse.de>
To: Stefan Dirsch <sndirsch@suse.de>
Subject: RE: X crashes when parsec runs on linux with drm/i915 and Mesa/i965_dri (fwd)

... might (!) be interesting ...

Matthias

--
Matthias Hopf <mhopf@suse.de>,  SuSE R&D,  Zimmer 3.2.06,  Tel. 74053-715

Subject: RE: X crashes when parsec runs on linux with drm/i915 and
        Mesa/i965_dri
Date: Tue, 9 Jan 2007 14:30:38 +0800
From: "Xiang, Haihao" <haihao.xiang@intel.com>
To: airlied@linux.ie
Cc: dri-devel@lists.sourceforge.net

I worked out a patch to make it wait longer under the worst situation. With this patch applied, parsec works fine for me.

Could anyone take a look at this patch?

Thanks
Haihao

________________________________

From: dri-devel-bounces@lists.sourceforge.net [mailto:dri-devel-bounces@lists.sourceforge.net] On Behalf Of Xiang, Haihao
Sent: 4 January 2007 14:55
To: airlied@linux.ie
Cc: dri-devel@lists.sourceforge.net
Subject: X crashes when parsec runs on linux with drm/i915 and Mesa/i965_dri

Hi,

I run parsec with the mode set to 1024x768 (see http://www.parsec.org) on Linux with the latest Mesa/i965_dri and drm/i915, but i965_dri always gets "intelWaitIrq:drmI830IrqWait: -16", then X crashes with the following error:

Error in I830WaitLpRing(), now is 6864091, start is 6862090
pgetbl_ctl: 0x7ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: cc80 head: ec98 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: cffe ier: 22 imr: 0 iir: 1080
space: 8208 wanted 131064

Fatal server error:
Lockup

I looked into it and noticed that drm/i915 uses a magic number, 10000, as the iteration limit of the for loop (see i915_wait_ring in i915_dma.c), and that the driver doesn't check the return value in the macro BEGIN_LP_RING (in i915_drv.h). If i915_wait_ring returns an error, BEGIN_LP_RING will destroy the entire ring buffer.

I guess 10000 is too small in the following case: there is a Batch Buffer command in the ring buffer, the Batch Buffer (or Batch Buffer chain) is long enough, and the CPU is fast enough. So I tried 1000000 and it works well. Then I noticed that drm/i830 uses an OS timer (see i830_wait_ring in i830_dma.c); I used the same method in drm/i915 and it works well too.

So I think the current method in drm/i915 carries some risk: it is hard to control the waiting time, and EBUSY can occur even though the ring buffer has not stalled. But I don't have a good idea for fixing this issue.

Could anyone give some comments?

Thanks in advance
Haihao

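
Note: the distinction drawn in the forwarded mail, a wait bounded by a fixed iteration count versus a wait bounded by elapsed time, can be illustrated outside the kernel. The following shell sketch is only an analogy under stated assumptions: wait_by_count, wait_by_time and ready_after are hypothetical helpers, and nothing here is taken from i915_dma.c or i830_dma.c. It shows why a fixed count can give up early when each poll is cheap, while a deadline does not depend on how fast the loop spins.

```shell
# wait_by_count LIMIT CMD...: poll CMD at most LIMIT times.
wait_by_count() {
  wc_left=$1; shift
  while [ "$wc_left" -gt 0 ]; do
    "$@" && return 0
    wc_left=$((wc_left - 1))
  done
  return 1   # gave up after LIMIT polls (the analogue of EBUSY)
}

# wait_by_time SECONDS CMD...: poll CMD until it succeeds or SECONDS elapse.
wait_by_time() {
  wt_deadline=$(( $(date +%s) + $1 )); shift
  while :; do
    "$@" && return 0
    [ "$(date +%s)" -ge "$wt_deadline" ] && return 1
  done
}

# ready_after N FILE: hypothetical stand-in for "the ring has enough space".
# It succeeds once it has been polled more than N times; the poll counter
# is kept in FILE.
ready_after() {
  ra_n=$(cat "$2" 2>/dev/null)
  ra_n=$(( ${ra_n:-0} + 1 ))
  echo "$ra_n" >"$2"
  [ "$ra_n" -gt "$1" ]
}
```

With a condition that needs 51 polls, `wait_by_count 10 ready_after 50 FILE` gives up although the condition would soon have become true, whereas `wait_by_time 2 ready_after 50 FILE` would normally succeed because the limit is expressed in seconds rather than loop iterations.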
Comment 41 Stefan Dirsch 2007-03-05 10:09:39 UTC
*** Bug 251166 has been marked as a duplicate of this bug. ***
Comment 42 Stefan Dirsch 2007-03-21 21:44:49 UTC
I don't expect this still to be fixed in "i810" driver, so better switch to new "intel" driver in xorg.conf for testing.
Comment 43 Stefan Dirsch 2007-04-18 03:25:09 UTC
*** Bug 265617 has been marked as a duplicate of this bug. ***
Comment 44 James Oakley 2007-05-28 22:32:10 UTC
What's the status of this? I'm on 10.3a4 with the "intel" driver and the crashes occur frequently enough to make it basically unusable. I'm only running konsole, konqueror, and akregator on this machine and it's a desktop, so there's no STD/STR involved.
Comment 45 Stefan Dirsch 2007-05-29 05:51:59 UTC
Which Intel chipset are you using?
Comment 46 James Oakley 2007-05-29 12:14:12 UTC
The chipset is a 945G on an Asus P5LD2-VM motherboard.
Comment 47 Wilken Gottwalt 2007-06-01 08:59:46 UTC
I've the same problem with a GMA3000 (965Q). I attach some log files.
Comment 48 Wilken Gottwalt 2007-06-01 09:01:09 UTC
Created attachment 143459 [details]
a part from messages related to the X failure
Comment 49 Wilken Gottwalt 2007-06-01 09:01:48 UTC
Created attachment 143460 [details]
X log file from one of the crashes
Comment 50 Wilken Gottwalt 2007-06-11 10:57:46 UTC
The bug can be reproduced by starting Blender, render some stuff to open the rendering window and moving around this rendering window while having Blender behind that window. Blender will show some artifacts and after some seconds (about 3-5) of moving around the window, everything is going to hang and the X server crashes and keeps crashing. The bug can be reproduced with the i810 and intel driver. The i810 driver needs some more time until the crash. It doesn't matter if you use a 16 or 32 bit screen. It also doesn't matter if AIGLX is active or not.
Comment 51 Wilken Gottwalt 2007-06-11 10:59:53 UTC
Created attachment 145295 [details]
Xorg log file with the i810 driver
Comment 52 Kent Liu 2007-06-14 13:01:37 UTC
(In reply to comment #50)
> The bug can be reproduced by starting Blender, render some stuff to open the
> rendering window and moving around this rendering window while having Blender
> behind that window. Blender will show some artifacts and after some seconds
> (about 3-5) of moving around the window, everything is going to hang and the X
> server crashes and keeps crashing. The bug can be reproduced with the i810 and
> intel driver. The i810 driver needs some more time until the crash. It doesn't
> matter if you use a 16 or 32 bit screen. It also doesn't matter if AIGLX is
> active or not.
> 

I am using a Thinkpad X60 (945GM) with SLE10SP1 and tried to reproduce this bug. I followed the Blender steps you described for about half an hour but failed to reproduce it.

Is there a more direct way to reproduce it? Otherwise I may have to try a different laptop.
Comment 53 Wilken Gottwalt 2007-06-14 13:11:30 UTC
Well, I have a totally different video card, a GMA3000 (965Q). Maybe it can't be reproduced with older video hardware that easily. I used an openSUSE 10.2 with a Xorg 7.2/7.3 I update on a nearly daily basis.
Comment 54 James Oakley 2007-06-15 02:46:59 UTC
It certainly crashes easily on my older 945G. I'm not doing anything special, either. It always crashes for me when browsing in Konqueror. In fact, I just made it crash three times in a row by simply mousewheel scrolling the Most Annoying Bugs page.

I'll attach the log from the latest crash.
Comment 55 James Oakley 2007-06-15 02:48:05 UTC
Created attachment 146425 [details]
Latest crash on 945G: scrolling in Konqueror
Comment 56 Stefan Dirsch 2007-07-19 17:24:55 UTC
*** Bug 293093 has been marked as a duplicate of this bug. ***
Comment 57 James Oakley 2007-07-19 17:44:09 UTC
Since upgrading to alpha5, I haven't had a single crash.

It looks like my problem has been fixed, at least.
Comment 58 Matthias Hopf 2007-07-19 19:22:06 UTC
Intel had some issues with pre-915 hardware, because they didn't have any test machines in house any longer. Seems they have acquired some, again.

Wilken, Adam, Petr, can anybody verify this with alpha5 or alpha6 as well? Then we can close this as fixed.
Comment 59 Felix Möller 2007-07-20 16:46:59 UTC
(In reply to comment #57 from James Oakley)
> Since upgrading to alpha5, I haven't had a single crash.
> 
> It looks like my problem has been fixed, at least.
It is at least not fixed for me. I had another crash today when resuming. My system is on current factory.

My Card is an "Intel(R) 945GM" (MacBook of early 2006). Do you want my logs?
Comment 60 Felix Möller 2007-08-04 10:32:01 UTC
This problem seems to be connected somehow with the length of time the system has been suspended.

It nearly never happens when I try to reproduce it and suspend every 5 minutes or so. But it happens way more often in the mornings when the system has been suspended for the whole night.
Comment 61 Stefan Dirsch 2007-08-09 19:27:09 UTC
Reopen.
Comment 62 Stefan Dirsch 2007-08-09 19:32:00 UTC
Any improvements with openSUSE 10.3?
Comment 63 Christian Deckelmann 2007-08-10 10:07:44 UTC
Sorry, can't provide the info. I haven't seen these crashes in the last few months.
I have also been using 10.2 instead of SLED for the last months, and I can't remember seeing a crash there. I haven't used 10.3 yet.
Maybe Seife has seen these crashes lately.
Comment 64 Felix Möller 2007-08-10 22:29:47 UTC
I have seen it more than once within the last week. Just running "openSUSE 10.3 (i586) Beta1".
Comment 65 Stefan Dirsch 2007-08-11 04:09:51 UTC
Ok.
Comment 66 Stefan Dirsch 2007-08-11 10:59:38 UTC
The most promising thing seems to be investigating this issue on Wilken's machine (comment #50).
Comment 67 Stefan Dirsch 2007-08-11 12:21:57 UTC
Date: Sat, 11 Aug 2007 14:17:48 +0200
From: Christian Henz <chrhenz@gmx.de>
To: xorg@lists.freedesktop.org
Subject: Intel 965 lockup bugs...

Hi.

Ever since I first got my 965 motherboard in november of last year, I
have been waiting for 3D to work reliably. It seems that every other 3D
application I am running results in I830WaitLpRing timeout followed by
"Fatal server error: lockup". Bug reports for these lockups are all over
bugzilla (including some of my own), but they never seem to go
anywhere.

https://bugs.freedesktop.org/show_bug.cgi?id=9415
(Opened 2006-12!)
https://bugs.freedesktop.org/show_bug.cgi?id=10506
https://bugs.freedesktop.org/show_bug.cgi?id=11269
https://bugs.freedesktop.org/show_bug.cgi?id=11319
https://bugs.freedesktop.org/show_bug.cgi?id=11847

This is really frustrating to me, because nine months down the road
(and a year after 965 support was announced), I
practically cannot use 3D.

I would really like to press these issues on the list, because lately
all the attention seems to be on display and video related stuff.

cheers,
Christian Henz
Comment 68 Stefan Dirsch 2007-08-16 19:33:42 UTC
I wonder if the proposal in comment #39 could help. 

Question to the affected persons. Is the i915 kernel module loaded at all when it happens (check with "lsmod")? 

Otherwise such a patch wouldn't help anyway.
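
The check asked for above can be scripted. A minimal sketch, assuming the usual lsmod / /proc/modules layout in which the module name is the first field of each line; module_loaded is a hypothetical helper name:

```shell
# module_loaded NAME [FILE]: succeed if NAME is the first field of some
# line in FILE (default: /proc/modules, the file that lsmod formats).
module_loaded() {
  awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }' \
    "${2:-/proc/modules}"
}

if module_loaded i915; then
  echo "i915 is loaded"
else
  echo "i915 is not loaded"
fi
```

Comparing the whole first field avoids false positives from a plain substring search, for example a module whose name merely contains "i915".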
Comment 69 Stefan Dirsch 2007-08-16 20:13:21 UTC
Needs to be fixed upstream. LATER.
Comment 70 Felix Möller 2007-08-16 20:16:32 UTC
For the record I do not have the i915 module loaded.
Comment 71 Stefan Dirsch 2007-08-16 20:19:07 UTC
Ok. So a patch based on the proposal would be useless for you. Thanks for the quick feedback!
Comment 72 Stefan Dirsch 2007-08-27 12:47:24 UTC
*** Bug 304811 has been marked as a duplicate of this bug. ***
Comment 73 Kai Zimmer 2007-08-28 10:02:31 UTC
Zhenyu Wang published a patch which might solve this problem on the linux kernel mailing list:
http://lkml.org/lkml/2007/6/11/382
Maybe this could be integrated into the OpenSuSE 10.3 kernel?
Comment 74 Stefan Dirsch 2007-08-28 10:20:03 UTC
Interesting. So this patch does fix your issues?
Comment 75 Kai Zimmer 2007-08-28 15:23:25 UTC
Difficult to tell because i can't get hands on the machine which produces the error - on the other hand my problem isn't STR/STD related. Maybe somebody with STR/STD related problems could test this?
Comment 76 Stefan Dirsch 2007-08-29 19:55:26 UTC
*** Bug 305629 has been marked as a duplicate of this bug. ***
Comment 77 Felix Möller 2007-09-25 08:08:10 UTC
Created attachment 174529 [details]
xorg.log

On my MacBook I think I hit this bug *every* time when suspending for the night. It happened at least 7 days in a row. Somewhere around 7 hours seem to be enough. I attached the log for the record. Anything I could test?
Comment 78 Felix Möller 2007-09-28 10:13:05 UTC
I applied the patch from comment #73 (Kai Zimmer) three days ago and I was able to resume two mornings in a row. Yesterday I was not able to suspend at all, as I was hit several times in a row by bug #301101.

So I think the patch of comment #73 might help. Did anyone else apply and test it?
Comment 79 Stefan Dirsch 2007-10-05 20:44:27 UTC
Wang, is your patch already pushed upstream? In which kernel version?
Comment 80 Stefan Dirsch 2007-10-06 08:51:00 UTC
reopen.
Comment 81 Stefan Dirsch 2007-10-06 08:52:56 UTC
Wang, is your patch mentioned in comment #73 already pushed upstream? In which kernel version?
Comment 82 Forgotten User Drfk9mafMw 2007-10-06 11:07:40 UTC
Just wanted to inform you that apparently *I* no longer have that problem. Due to a longer vacation I skipped the RC phase of openSUSE 10.3 and have now installed GM.

So *please*: whatever you do, don't break anything with a patch :)

I have the i855 onboard graphics chip:

00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

dionysos:~ # lsmod |grep intel
snd_intel8x0           36636  1
snd_ac97_codec         97060  1 snd_intel8x0
snd_pcm                82564  3 snd_pcm_oss,snd_intel8x0,snd_ac97_codec
snd                    58164  10 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
intel_agp              27156  1
snd_page_alloc         13960  2 snd_intel8x0,snd_pcm
agpgart                35764  3 drm,intel_agp
Comment 83 Zhenyu Wang 2007-10-08 02:19:16 UTC
(In reply to comment #81 from Stefan Dirsch)
> Wang, is your patch mentioned in comment #73 already pushed upstream? In which
> kernel version?
> 

No, it's not upstream yet, and I haven't worked on the GART suspend/resume issue lately. We agreed not to save/restore the whole block, but to look into a specific save/restore method for each chipset. As a workaround on some machines, you may try to save/restore more of the PCI config space via sysfs.

This bug report seems to mix different problems on different chipsets, so I can't suggest much. The X hangs we have seen were mostly caused by 3D bugs in the Mesa DRI driver, so trying a more recent Mesa is recommended.
Comment 84 Stefan Dirsch 2007-10-10 21:24:51 UTC
Ok. Later again.
Comment 85 Felix Möller 2007-10-12 10:02:02 UTC
FYI:
I got online updated to kernel-default-2.6.22.9-0.4 (i.e. without the patch from comment #73) and hit the bug on the first suspend overnight again:
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0x00000000) and PRB0_TAIL (0x00000010) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.
Error in I830WaitLpRing(), timeout for 2 seconds
...

lspci:
00:02.0 VGA compatible controller [Class 0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)
00:02.1 Display controller [Class 0380]: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller [8086:27a6] (rev 03)

Zhenyu Wang:
Do I understand you correctly that saving and restoring /sys/bus/pci/devices/0000\:00\:02.[10]/config would be worth a try?
I have Mesa-7.0.1-18 so this is fairly up-to-date.
Comment 86 Felix Möller 2007-10-13 23:16:46 UTC
I added a script to the suspend-framework to try whether restoring the config space of the graphics card solves anything.

# cat /usr/lib/pm-utils/sleep.d/02vga-config
#!/bin/bash
# Save the PCI config space of VGA devices before sleeping and write it
# back on wakeup.
case "$1" in
  hibernate|suspend)
    for x in /sys/bus/pci/devices/*; do
      # class 0x030000 = VGA compatible controller
      if [ "$(cat "$x/class")" = "0x030000" ]; then
        cat "$x/config" >"/var/run/vga-pci-$(basename "$x")"
      fi
    done
  ;;
  thaw|resume)
    for x in /sys/bus/pci/devices/*; do
      saved="/var/run/vga-pci-$(basename "$x")"
      if [ -f "$saved" ]; then
        cat "$saved" >"$x/config"
      fi
    done
  ;;
esac

But still got the usual crash:
(WW) intel(0): ESR is 0x00000001, instruction error
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0x00000000) and PRB0_TAIL (0x00000010) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.
Error in I830WaitLpRing(), timeout for 2 seconds
[...]
Ring end
space: 131056 wanted 131064

Fatal server error:
lockup
Comment 87 Zhenyu Wang 2007-10-15 02:37:14 UTC
Could you try to save/restore config space on device 02.1 too?




Comment 88 Felix Möller 2007-10-17 10:09:20 UTC
I updated my suspend script and this seems to make the crashes less frequent, but they are not gone.

With today's factory the look of the crash changed. My X server restarted today in an "endless loop". ;) Will attach the log.

#!/bin/bash
# Save the PCI config space of the graphics devices before sleeping and
# write it back on wakeup.

case "$1" in
  hibernate|suspend)
    for x in /sys/bus/pci/devices/*; do
      class=$(cat "$x/class")
      # 0x030000 = VGA compatible controller, 0x038000 = display controller
      if [ "$class" = "0x030000" ] || [ "$class" = "0x038000" ]; then
        echo "saving the state of device $x with class $class"
        cat "$x/config" >"/var/run/vga-pci-$(basename "$x")"
      fi
    done
  ;;

  thaw|resume)
    for x in /sys/bus/pci/devices/*; do
      saved="/var/run/vga-pci-$(basename "$x")"
      if [ -f "$saved" ]; then
        echo "restoring the state of device $x"
        cat "$saved" >"$x/config"
      fi
    done
  ;;
esac

exit 0
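
Note: the two class values tested in the script, 0x030000 (VGA compatible controller) and 0x038000 (other display controller), both belong to PCI base class 0x03. A hedged variant of the test that matches any display-class device by prefix might look like this; is_display_class is a hypothetical helper, and matching the whole base class is a broadening that the script in this comment does not do.

```shell
# is_display_class CLASS: succeed if CLASS, as read from
# /sys/bus/pci/devices/*/class (e.g. "0x030000"), has base class 0x03.
is_display_class() {
  case "$1" in
    0x03????) return 0 ;;
    *)        return 1 ;;
  esac
}

# List the devices whose config space such a hook would save.
for dev in /sys/bus/pci/devices/*; do
  [ -r "$dev/class" ] || continue
  if is_display_class "$(cat "$dev/class")"; then
    echo "display device: $(basename "$dev")"
  fi
done
```

On the MacBook described in this report, such a loop would be expected to list both 0000:00:02.0 and 0000:00:02.1, the two functions shown in the lspci output of comment #85.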
Comment 89 Felix Möller 2007-10-17 10:10:40 UTC
Created attachment 178967 [details]
This is the log of todays endless restarting and crashing XServer on a MacBook
Comment 90 Felix Möller 2007-12-27 09:42:48 UTC
Created attachment 188783 [details]
Xorg log

I hit the bug several times in the last two days while trying out the new flatscreen I got for Christmas.

(WW) intel(0): ESR is 0x00000010, page table error
(WW) intel(0): PGTBL_ER is 0x00000112, host pte data, display A pte, display B pte
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0xe2215b6c) and PRB0_TAIL (0x00016668) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x4ffc0001 pgetbl_err: 0x112
ipeir: 0 iphdr: 54000004
LP ring tail: 16670 head: 15b6c len: 1f001 start 0
eir: 0 esr: 10 emr: ffff
instdone: f8c1 instpm: 0
memmode: 306 instps: 800f00c4
hwstam: ffff ier: 0 imr: ffff iir: 0
Ring at virtual 0xa79de000 head 0x15b6c tail 0x16670 count 705
        00015aec: 19180000

Resume does not work at all with current factory. These crashes happen now while just browsing the web and moving windows. :-(
Comment 91 Kent Liu 2007-12-28 01:30:15 UTC
(In reply to comment #90 from Felix Möller)
> Resume does not work at all with current factory. These crashes happen now
> while just browsing the web and moving windows. :-(

As an upstream Intel Linux graphics driver, you can go to https://bugzilla.freedesktop.org/ to report your issue directly. Intel developers are monitoring that bugzilla list and will investigate the issue.

Comment 92 Kent Liu 2007-12-28 01:30:45 UTC
(In reply to comment #90 from Felix Möller)
> Resume does not work at all with current factory. These crashes happen now
> while just browsing the web and moving windows. :-(

As an upstream Intel Linux graphics driver bug, you can go to https://bugzilla.freedesktop.org/ to report your issue directly. Intel developers are monitoring that bugzilla list and will investigate the issue.

Comment 93 Stefan Dirsch 2007-12-28 09:01:10 UTC
Kent, this is an issue for which we have been getting reports ever since Intel released its first graphics hardware (i810) and Linux driver many years ago. I would greatly appreciate it if Intel would investigate this issue, but I'm afraid Intel already tried this and failed. Maybe it's even a bug in the hardware, which cannot be fixed. It's definitely a known issue to Intel. Just search for "Error in I830WaitLpRing" as comment on https://bugzilla.freedesktop.org/.
Comment 94 Kent Liu 2007-12-28 09:05:39 UTC
> Just search for "Error in
> I830WaitLpRing" as comment on https://bugzilla.freedesktop.org/.

The I830WaitLpRing problem is not a single problem. It can be caused by different situations, and the fixes differ as well. I am afraid that Möller is triggering a new issue here, which needs the attention of the upstream developers.

Comment 95 Stefan Dirsch 2007-12-28 09:18:00 UTC
Ok. So we need to track this issue in separate bug reports on https://bugzilla.freedesktop.org/.
To all who are still affected by this issue: please open a bug report upstream, but do not refer to this bug report here (nobody wants to scan 100 more or less confusing comments by different people for different issues). Instead add the required information to the upstream bug report. Thanks.

Closing as WONTFIX.