Bug 153008

Summary: The kernel doesn't like my i2o devices
Product: [openSUSE] SUSE Linux 10.1 Reporter: Glen Kaukola <glen>
Component: Kernel Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: admin
Version: Beta 4   
Target Milestone: ---   
Hardware: i686   
OS: SuSE Linux 10.1   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Bugfixes from 2.6.17

Description Glen Kaukola 2006-02-23 02:12:56 UTC
I get messages like the following over and over when I run dmesg:

slab error in cache_free_debugcheck(): cache `i2o_iop0_msg_inpool': double free, or memory outside object was overwritten
 [<c0158fd3>] cache_free_debugcheck+0xc8/0x1b3
 [<c0144763>] mempool_free+0x60/0x64
 [<c015a79b>] kmem_cache_free+0x2a/0x5c
 [<c0144763>] mempool_free+0x60/0x64
 [<f882cac9>] i2o_block_request_fn+0x408/0x538 [i2o_block]
 [<c029436f>] _spin_lock_irqsave+0x9/0xd
 [<c01b07a5>] __generic_unplug_device+0x1d/0x1f
 [<c01b1f9c>] __make_request+0x2e9/0x32b
 [<c01b017d>] generic_make_request+0x177/0x187
 [<c0147f81>] release_pages+0x58/0x147
 [<c0159460>] kmem_cache_alloc+0x7f/0x89
 [<c01bb21f>] radix_tree_node_alloc+0x10/0x4b
 [<c01b112d>] submit_bio+0xa6/0xad
 [<c01481b6>] __pagevec_lru_add+0x94/0x9f
 [<f9689131>] reiserfs_get_block+0x0/0x112d [reiserfs]
 [<c017ade4>] mpage_bio_submit+0x18/0x1b
 [<c017bd19>] mpage_readpages+0xda/0xe5
 [<f968824e>] reiserfs_readpages+0x0/0x15 [reiserfs]
 [<c01479ea>] __do_page_cache_readahead+0x139/0x1f7
 [<f9689131>] reiserfs_get_block+0x0/0x112d [reiserfs]
 [<c0141f94>] __lock_page+0x60/0x67
 [<c01444ea>] filemap_nopage+0x14f/0x315
 [<c014cb2a>] __handle_mm_fault+0x28f/0x7ca
 [<c02950c3>] do_page_fault+0x17a/0x540
 [<c0294f49>] do_page_fault+0x0/0x540
 [<c0104e3f>] error_code+0x4f/0x60
f7db4d00: redzone 1: 0x170fc2a5, redzone 2: 0xa9f7000.
slab error in cache_free_debugcheck(): cache `i2o_iop0_msg_inpool': double free, or memory outside object was overwritten
 [<c0158fd3>] cache_free_debugcheck+0xc8/0x1b3
 [<c0144763>] mempool_free+0x60/0x64
 [<c015a79b>] kmem_cache_free+0x2a/0x5c
 [<c0144763>] mempool_free+0x60/0x64
 [<f882cac9>] i2o_block_request_fn+0x408/0x538 [i2o_block]
 [<c0141f9b>] sync_page+0x0/0x3c
 [<c029436f>] _spin_lock_irqsave+0x9/0xd
 [<c0141f9b>] sync_page+0x0/0x3c
 [<c01b07a5>] __generic_unplug_device+0x1d/0x1f
 [<c01b083e>] generic_unplug_device+0x15/0x21
 [<c01aed7a>] blk_backing_dev_unplug+0xc/0xd
 [<c015df86>] block_sync_page+0x32/0x35
 [<c0141fcf>] sync_page+0x34/0x3c
 [<c0293938>] __wait_on_bit_lock+0x2a/0x52
 [<c0141f94>] __lock_page+0x60/0x67
 [<c01311b0>] wake_bit_function+0x0/0x3c
 [<c01445c8>] filemap_nopage+0x22d/0x315
 [<c014cb2a>] __handle_mm_fault+0x28f/0x7ca
 [<c01338a2>] ktime_get_ts+0x17/0x46
 [<c02950c3>] do_page_fault+0x17a/0x540
 [<c0294f49>] do_page_fault+0x0/0x540
 [<c0104e3f>] error_code+0x4f/0x60
f7451244: redzone 1: 0x170fc2a5, redzone 2: 0x1e18b000.
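
For triage, reports like the two above can be tallied with a short script. The following is a minimal sketch, not part of the original report: the function name summarize_slab_errors is made up for illustration, and the RED_ACTIVE constant is an assumption taken from mm/slab.c in 2.6.16-era kernels. It counts slab errors per cache and checks whether each redzone still holds the "active object" marker.

```python
import re
from collections import Counter

# Assumption: value of RED_ACTIVE in mm/slab.c of 2.6.16-era kernels.
RED_ACTIVE = 0x170FC2A5

# Matches: slab error in <func>(): cache `<name>': <message>
SLAB_ERR = re.compile(r"slab error in (\w+)\(\): cache `([^']+)': (.+)")
# Matches: <addr>: redzone 1: 0x..., redzone 2: 0x...
REDZONE = re.compile(r"redzone 1: (0x[0-9a-fA-F]+), redzone 2: (0x[0-9a-fA-F]+)")


def summarize_slab_errors(log_text):
    """Return (errors per cache, list of (redzone1_intact, redzone2_intact))."""
    errors = Counter()
    redzones = []
    for line in log_text.splitlines():
        m = SLAB_ERR.search(line)
        if m:
            errors[m.group(2)] += 1
            continue
        m = REDZONE.search(line)
        if m:
            rz1 = int(m.group(1), 16)
            rz2 = int(m.group(2), 16)
            redzones.append((rz1 == RED_ACTIVE, rz2 == RED_ACTIVE))
    return errors, redzones
```

Fed the dmesg output above, this reports two errors for the i2o_iop0_msg_inpool cache. In both, redzone 1 is intact and redzone 2 is not, which would point towards a write past the end of the object rather than a plain double free, assuming the constant above is right for this kernel.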
Comment 1 Andreas Kleen 2006-02-23 09:52:48 UTC
Well, first you would need to specify what i2o device you have.
Comment 2 Glen Kaukola 2006-02-23 14:45:41 UTC
Adaptec 2005S, zero channel RAID card.  The motherboard is a Supermicro P4DC6+.
Comment 3 Hannes Reinecke 2006-02-23 15:15:24 UTC
And this actually works with i2o?
Or do you rather need the aacraid driver?

Or (heaven forbid) the dpt_i2o?
Comment 4 Glen Kaukola 2006-02-28 16:14:50 UTC
Well, under Mandriva I use the i2o driver, and it works great.  I believe Suse was different in that the dpt_i2o driver was loaded.  Sorry, I can't be sure, though; my Suse install is gone.
Comment 6 Hannes Reinecke 2006-03-13 11:01:22 UTC
Actually, this should already be fixed with the latest updates from -rc. There have been initialisation errors with dpt_i2o which might have caused this.
Please re-test with Beta8 and re-open if the problem persists.
Comment 7 pokey templar 2006-05-29 21:57:29 UTC
I am running the same Adaptec 2005S ZCR card in an Adaptec x5DA8 motherboard with dual 3 GHz processors and 4 GB of ECC memory.  I am currently trying to run opensuse.org 10.1 Final, because I wasted my money purchasing 10.0 and could not get it to run on my system.
Comment 8 Achim Mildenberger 2006-07-11 07:35:21 UTC
Hi,

I think there is a general problem with the 2.6.16 kernel that ships with SuSE 10.1 and i2o devices. Maybe more important than my own kernel oopses and crashes are the following references for the problem:
http://bugzilla.kernel.org/show_bug.cgi?id=6561
(see also 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570 )
Apparently some changes from 2.6.15 to 2.6.16 made the i2o drivers
unstable. There is a patch available for kernel 2.6.16 from
the authors of the i2o subsystem
  http://i2o.shadowconnect.com/download.php
  http://i2o.shadowconnect.com/changes.php
(The patch has been incorporated into mainline 2.6.17.)

I also tried to use Adaptec's (old) dpt_i2o module (instead of i2o_block).
This can be done by installing it manually and removing the i2o subsystem
drivers.
But this also leads to kernel failures. (Setup: Adaptec 2010S RAID controller,
SuSE 10.1, Xeon 2.8 GHz). If you are interested in the log files, please tell me.

So, it seems there is no other way than building a custom kernel if one
wants to run SuSE 10.1 and use this Adaptec RAID controller.
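
Since the thread turns on which driver is actually bound to the card (the i2o subsystem modules or dpt_i2o), it can help to attach the list of loaded i2o-related modules to a report. A minimal sketch, assuming the usual /proc/modules layout in which the module name is the first whitespace-separated field; the helper name loaded_i2o_modules is hypothetical:

```python
def loaded_i2o_modules(proc_modules_text):
    """Return names of loaded modules whose name contains 'i2o'.

    Expects the contents of /proc/modules, one module per line,
    with the module name as the first field.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "i2o" in fields[0]:
            names.append(fields[0])
    return names


# Typical use on a running system:
#   with open("/proc/modules") as f:
#       print(loaded_i2o_modules(f.read()))
```

A result containing dpt_i2o but not i2o_block (or the reverse) shows which of the two driver stacks was in use when the errors appeared.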
Comment 9 Jeff Mahoney 2006-08-25 15:02:09 UTC
Created attachment 97157 [details]
Bugfixes from 2.6.17

From: Markus Lidel <Markus.Lidel@shadowconnect.com>

- Fixed locking of struct i2o_exec_wait in Executive-OSM

- Removed LCT Notify in i2o_exec_probe() which caused freeing memory and
  accessing freed memory during first enumeration of I2O devices

- Added missing locking in i2o_exec_lct_notify()

- Removed put_device() of I2O controller in i2o_iop_remove() which caused
  the controller structure to be freed too early

- Fixed size of mempool in i2o_iop_alloc()

- Fixed access to freed memory in i2o_msg_get()

See http://bugzilla.kernel.org/show_bug.cgi?id=6561
Comment 10 Jeff Mahoney 2006-08-25 15:11:56 UTC
Glen -

The above patch is already in our kernel. You'll get it automatically in an update kernel if you haven't already. Closing as FIXED.