Bug 148249 - oops, "EIP is at ext3_clear_inode+0x1a/0x80 [ext3]"
Summary: oops, "EIP is at ext3_clear_inode+0x1a/0x80 [ext3]"
Status: RESOLVED INVALID
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: Other Other
Priority: P5 - None; Severity: Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-04 19:32 UTC by Forgotten User ZhJd0F0L3x
Modified: 2006-02-24 23:19 UTC (History)

See Also:
Found By: Component Test
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
oops (3.17 KB, text/plain)
2006-02-04 19:33 UTC, Forgotten User ZhJd0F0L3x
oops-collection (15.68 KB, text/plain)
2006-02-18 08:32 UTC, Forgotten User ZhJd0F0L3x
oops and sysrq-t snippet. (14.96 KB, text/plain)
2006-02-19 22:25 UTC, Forgotten User ZhJd0F0L3x

Description Forgotten User ZhJd0F0L3x 2006-02-04 19:32:45 UTC
I was burning a video DVD with:
growisofs -dvd-compat -Z /dev/dvd -dvd-video .
After about 20 seconds, the machine oopsed. I will attach the oops.

The machine has ext3 as root, but the filesystem the video DVD was burnt from is reiserfs.
Comment 1 Forgotten User ZhJd0F0L3x 2006-02-04 19:33:32 UTC
Created attachment 66474
oops
Comment 2 Chris L Mason 2006-02-06 14:42:37 UTC
Do you have quotas on? Looks like good old-fashioned memory corruption on the free inode list.
Comment 3 Forgotten User ZhJd0F0L3x 2006-02-06 15:16:51 UTC
(In reply to comment #2)
> Do you have quotas on?

No. Standard 10.0 installation; the reiserfs partition is left over from an older installation.

The mainboard is an ASUS A7V Athlon mainboard (KT133) from 2000. Might that be a hardware glitch? The VIA crap always seems a bit shaky, although I am not using the Promise controller that is also on the board :-)
Comment 4 Chris L Mason 2006-02-07 12:47:47 UTC
Can you try to narrow down what triggers it? Generic usage, just the CD, etc.
Comment 5 Greg Kroah-Hartman 2006-02-09 22:59:26 UTC
Is this fixed in the 10.1 beta 3 kernel release?
Comment 6 Forgotten User ZhJd0F0L3x 2006-02-09 23:12:34 UTC
I have no idea. This is a 10.0 installation and will probably die horribly if I install a newer kernel. It is also the family TV, so I am not willing to experiment on it.
I will burn DVDs again this weekend, but I have done so before without hitting that bug.
I have only seen this once, so it might well be a passing gravitational wave, but since I am QA, I report all bugs I see ;-)
Comment 7 Forgotten User ZhJd0F0L3x 2006-02-18 08:32:04 UTC
Created attachment 69165
oops-collection

Yesterday evening, the machine hung "partially": the display was locked and ssh login etc. was not possible, but there was regular disk activity (LED blinking, maybe hal). Shutdown via power button => powersaved was not possible. SysRq worked, but unfortunately only a fragment of the sysrq-t output made it into the logs :-(

I examined the logs and found an oops from yesterday morning, then I looked further and found 4 oopsen since January 1st, including the one from comment #1.
3 of them are in ext3_dquot_drop, the first was in prune_dcache. I'll attach them.

I included some context from the logs. Judging from the "su to nobody" entries, the oopsen often happened around the time the cron.daily scripts run; I have locate installed, so updatedb also runs at that time.

If this all looks very fishy: I would not totally rule out a hardware problem.
This is an Athlon board with the infamous KT133 chipset, and there is always mplayer running, showing live TV from a saa7134 card, so I would not be too surprised if there was some stray DMA all over my memory :-)

I will now disable the (unused, but still loaded) pdc202xx_old driver and force an ext3 filesystem check, just to make sure.
Comment 8 Forgotten User ZhJd0F0L3x 2006-02-19 22:25:36 UTC
Created attachment 69217
oops and sysrq-t snippet.

The machine locked up again; this time it was playing an MPEG stream via NFS.

I tried sysrq-t, but only a small part of the output made it into syslog, probably because the filesystem crashed?
Comment 11 Jan Kara 2006-02-21 15:22:16 UTC
Hmm.. The oopses in ext3_clear_inode() really look like a single-bit error. i_acl or i_acl_default should be -1, but they have one bit cleared. ext3_dquot_drop() is just garbage left on the stack from a previous call (but it's strange that it was ever called if quotas are disabled). What also puzzles me is that we got an oops three times at the same place but nowhere else (if it's flaky memory or some other HW problem). Anyway, memtest is definitely a good place to start.
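
The single-bit hypothesis can be checked mechanically: XOR the observed pointer value with the expected -1 sentinel and list the bit positions that differ. The following is only a minimal sketch, assuming a 32-bit machine where -1 is 0xFFFFFFFF; the helper name `flipped_bits` is hypothetical and the observed value is made up for illustration, not taken from the attached oops.

```python
def flipped_bits(observed, expected=0xFFFFFFFF, width=32):
    """Return the bit positions where observed differs from expected."""
    mask = (1 << width) - 1
    diff = (observed ^ expected) & mask
    return [i for i in range(width) if (diff >> i) & 1]


# Illustrative value only, NOT taken from the attached oops.
observed = 0xFFFFFFEF
bits = flipped_bits(observed)
if len(bits) == 1:
    print("single-bit error at bit", bits[0])
else:
    print("not a single-bit error, differing bits:", bits)
```

Exactly one differing bit is consistent with a flipped bit in RAM; several differing bits would point more towards a software overwrite.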
Comment 12 Forgotten User ZhJd0F0L3x 2006-02-24 23:19:06 UTC
Memtest hit on the first run, at location 96 MB (the machine has 3x128 MB).
And it seems the first of the 3 memory banks has a problem, since every module I plug in there sooner or later gets a bit error, so it looks like the machine will have to live with 256 MB and I'll leave the third slot vacant :-)
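
As a rough cross-check of which bank a memtest address belongs to, the failing location can be divided by the module size. This sketch assumes the three 128 MB modules are mapped linearly in slot order without interleaving, which the report does not confirm; `module_for_address` is a hypothetical helper.

```python
MODULE_SIZE_MB = 128  # from the report: 3 x 128 MB
MODULE_COUNT = 3


def module_for_address(addr_mb):
    """Zero-based module index, assuming linear, non-interleaved mapping."""
    if not 0 <= addr_mb < MODULE_SIZE_MB * MODULE_COUNT:
        raise ValueError("address outside installed memory")
    return addr_mb // MODULE_SIZE_MB


# The memtest hit at 96 MB falls into index 0, i.e. the first bank.
print(module_for_address(96))
```

Under that assumption, an error at 96 MB lies inside the first 128 MB, which matches the observation that the first bank is the faulty one.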

Anyway, this makes the bug clearly invalid.

I should have thought of memtest myself... sorry for the noise.