Bug 146513 - firewire/ieee1394/ohci1394/sbp2: slab error in cache_free_debugcheck(): cache `size-512(DMA)': double free, or memory outside object was overwritten
Status: RESOLVED FIXED
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 2
Hardware: All SuSE Linux 10.1
Priority: P5 - None, Severity: Critical
Target Milestone: Beta 6
Assignee: Bernhard Kaindl
QA Contact: E-mail List
URL: http://bugzilla.kernel.org/show_bug.c...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-29 15:43 UTC by Joachim Reichelt
Modified: 2006-02-27 16:38 UTC

See Also:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
boot.msg (35.54 KB, text/plain)
2006-01-29 15:44 UTC, Joachim Reichelt
output from hwinfo (302.95 KB, text/plain)
2006-01-30 20:02 UTC, Joachim Reichelt

Description Joachim Reichelt 2006-01-29 15:43:27 UTC
Changing the USB/Firewire configuration or accessing it causes a deadlock.
Nothing shows up in messages etc.
Keyboard and mouse are dead.
Found only one hint in /var/log/boot.msg:
double free...
Will attach file in next step.
First lockup: switched on the second Firewire (IEEE 1394) disk.
Rebooted (next day).
Started digikam to access a Logitech USB webcam.
Comment 1 Joachim Reichelt 2006-01-29 15:44:33 UTC
Created attachment 65556
boot.msg
Comment 3 Olaf Kirch 2006-01-30 12:22:07 UTC
The backtrace implicates the SCSI stack.
Comment 5 Jens Axboe 2006-01-30 12:30:14 UTC
I wonder if this is a dupe of 145459, they certainly look related.
Comment 7 Joachim Reichelt 2006-01-30 20:02:28 UTC
Created attachment 65752
output from hwinfo

This is a SCSI-Only system, upgraded from 10.0 to
10.1 beta2 (CD-Version) in one step.
Comment 9 Bernhard Kaindl 2006-02-14 18:21:44 UTC
Bug #145459, which was mentioned as related, resulted in a fix that was added to 2.6.16-rc2-git2 (or so). Our current kernels are based on 2.6.16-rc3,
so they should contain the fix from bug #145459.

I suggest trying the kernel of Beta3 or of Beta4, which will be released soon.

If Beta3/4 is not feasible or available, a test with a plain 2.6.16-rc3 kernel would also be interesting; it should work. I have used the rc3-based Beta3 kernel with USB (digicam) without problems.

So far I have not had problems this critical with USB and SBP2 disks,
but I can also do some testing in this regard.

Summary: It should work with the latest code. I will of course test USB and SBP2 disks and raw1394; no doubt about that.
Comment 10 Joachim Reichelt 2006-02-16 19:01:37 UTC
No more disk-related problems.
Comment 11 Bernhard Kaindl 2006-02-22 14:37:11 UTC
I reproduced the slab error contained in the boot.msg from
attachment 65556 (comment #1). It is a SCSI/Firewire (SBP2)
problem which has been the target of some patches by Stefan Richter,
who maintains Firewire and the SBP2 driver. The fixes he
produced may indeed already have fixed Firewire disks, but the
error is only seen with CONFIG_DEBUG_SLAB=y, which our current
testing kernels set in order to catch such errors.
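
For readers unfamiliar with this kind of check, the following is a minimal userspace sketch of the general red-zone idea behind debug allocators: guard words are placed around each object and verified when it is freed. It is not the kernel's slab code. The names dbg_alloc, dbg_free and REDZONE_MAGIC are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker value for this sketch; every byte is nonzero. */
#define REDZONE_MAGIC UINT64_C(0x5A2CF0715A2CF071)

/* Allocate `size` bytes framed by one guard word before and one after. */
static void *dbg_alloc(size_t size)
{
    uint64_t magic = REDZONE_MAGIC;
    unsigned char *raw = malloc(size + 2 * sizeof magic);

    if (raw == NULL)
        return NULL;
    memcpy(raw, &magic, sizeof magic);                       /* front guard */
    memcpy(raw + sizeof magic + size, &magic, sizeof magic); /* back guard */
    return raw + sizeof magic;
}

/*
 * Verify both guards, then release the memory.
 * Returns 0 if the guards are intact, -1 if something wrote outside
 * the object (the situation a debug allocator would report).
 */
static int dbg_free(void *obj, size_t size)
{
    unsigned char *raw = (unsigned char *)obj - sizeof(uint64_t);
    uint64_t front, back;

    memcpy(&front, raw, sizeof front);
    memcpy(&back, raw + sizeof front + size, sizeof back);
    free(raw);
    return (front == REDZONE_MAGIC && back == REDZONE_MAGIC) ? 0 : -1;
}
```

In this sketch, a caller that writes 513 bytes into a 512-byte object damages the back guard, and dbg_free() reports it. Without such guards the overwrite would go unnoticed at free time, which is consistent with the error only being visible on debug-enabled kernels.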

The correct fix, which addresses all possible causes of this problem
even if it occurs with a USB-attached disk, fixed my disk too:

http://sourceforge.net/mailarchive/message.php?msg_id=14879016

It's documented in full detail (in the references) in this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=6114

I hope that the patch from Al makes it into mainline; at least
the critical part will make it for sure, and I'll check whether we
can add it to the next beta.
Comment 12 Bernhard Kaindl 2006-02-27 15:00:15 UTC
The crucial part of the patch from Al Viro has been added to mainline.
Here is the git commit:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=489708007785389941a89fa06aedc5ec53303c96

See also the kernel.org bug for further information:
http://bugzilla.kernel.org/show_bug.cgi?id=6114

For even more detail, there is a long thread on the lkml
http://marc.theaimsgroup.com/?t=114065708500001
which started with: [PATCH 1/2] sd: fix memory corruption by sd_read_cache_type
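
As a rough illustration of the class of bug named in that thread title, here is a minimal sketch in C. It assumes (this is an assumption, not something stated in this report) that the corruption comes from trusting a length reported by the device that is larger than the 512-byte buffer. The names clamp_len, copy_response and SKETCH_BUF_SIZE are hypothetical; this is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size, chosen to mirror the size-512 cache in the report. */
#define SKETCH_BUF_SIZE 512

/* Never trust a device-reported length beyond what the buffer can hold. */
static size_t clamp_len(size_t reported_len, size_t buf_size)
{
    return reported_len > buf_size ? buf_size : reported_len;
}

/*
 * Copy a device response into dst, limiting the copy to dst_size bytes
 * regardless of what the device claims. Returns the number of bytes copied.
 */
static size_t copy_response(unsigned char *dst, size_t dst_size,
                            const unsigned char *src, size_t reported_len)
{
    size_t n = clamp_len(reported_len, dst_size);

    memcpy(dst, src, n);
    return n;
}
```

With such a check, a device claiming a 600-byte response would have only the first 512 bytes copied, so nothing is written past the end of the buffer.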

The patch is part of 2.6.16-rc-git10, which is in our kernel CVS now.
I compiled, installed and tested it in the configuration in which
I could reproduce the bug, so I can confirm that it is fixed.

Assuming that our current kernel CVS will be checked-in for Beta5 today,
this fix will be part of Beta5.
Comment 13 Bernhard Kaindl 2006-02-27 16:38:50 UTC
s/Beta5/Beta6/g