Bug 145459

Summary: memory corruption with modprobe loop max_loops=42 in initrd
Product: [openSUSE] SUSE Linux 10.1 Reporter: Olaf Hering <ohering>
Component: KernelAssignee: Jens Axboe <axboe>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: forgotten_t1uVJ6IbUR
Version: Beta 2   
Target Milestone: ---   
Hardware: PowerPC-64   
OS: Linux   
Whiteboard:
Found By: Development Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Only init online CPUs
Poison invalid cpus
Fix percpu init of invalid cpus

Description Olaf Hering 2006-01-25 12:33:03 UTC
adding loop.ko to initrd with max_loop=42 leads to memory corruption.
my g5 doesnt get past the modprobe command, invalid memory access in .flush_sigqueue(), some pointer there was null.

a power5 box boots further and shows this error:

oop: loaded (max 42 devices)
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
EXT3 FS on sda3, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
dm-netlink version 0.0.2 loaded
Intel(R) PRO/1000 Network Driver - version 7.0.28-NAPI
Copyright (c) 1999-2005 Intel Corporation.
PCI: Enabling device: (0000:d0:01.0), cmd 3
e1000: 0000:d0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:13:86
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0000:d0:01.1), cmd 3
e1000: 0000:d0:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:13:87
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0001:c8:01.0), cmd 3
e1000: 0001:c8:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:20:d4
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0001:c8:01.1), cmd 3
e1000: 0001:c8:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:20:d5
e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Capability LSM initialized
Adding 1050616k swap on /dev/sda2.  Priority:42 extents:1 across:1050616k
slab error in cache_alloc_debugcheck_after(): cache `size-256': double free, or memory outside object was overwritten
Call Trace:
[C00000006C727850] [C00000000000ED88] .show_stack+0x68/0x1b0 (unreliable)
[C00000006C7278F0] [C0000000000BA974] .__slab_error+0x38/0x50
[C00000006C727970] [C0000000000BB864] .cache_alloc_debugcheck_after+0xcc/0x1a4
[C00000006C727A10] [C0000000000BBA9C] .kmem_cache_alloc+0x160/0x190
[C00000006C727AB0] [C000000000107DF0] .load_elf_binary+0x74/0x1b40
[C00000006C727C20] [C0000000000D1414] .search_binary_handler+0x138/0x408
[C00000006C727CE0] [C0000000000FEB14] .compat_do_execve+0x1e4/0x33c
[C00000006C727D90] [C000000000015198] .compat_sys_execve+0x74/0xf4
[C00000006C727E30] [C0000000000086F8] syscall_exit+0x0/0x40
c000000000432d88: redzone 1: 0xc000000000432d80, redzone 2: 0x5a2cf071.
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
Comment 1 Jens Axboe 2006-01-25 12:54:49 UTC
That looks strange! The loop init is pretty trivial, I can't see what could go wrong. What happens if you use an even larger number?
Comment 2 Chris L Mason 2006-01-25 12:55:44 UTC
Are we sure this isn't just another viewing of our other pop up memory corruption bug(s)?
Comment 3 Jens Axboe 2006-01-25 12:57:06 UTC
It certainly smells like it, the loop init path looks pretty bullet proof.
Comment 4 Jens Axboe 2006-01-25 13:05:09 UTC
Another idea - this could be a ppc compiler bug as well, I suppose. Does the kernel work if you built it with the SLES9 compiler?
Comment 5 Chris L Mason 2006-01-25 13:28:01 UTC
Could you please cvs update, recompile and modprobe crasher threads=8 (or more, don't be shy).
Comment 6 Olaf Hering 2006-01-25 13:48:11 UTC
Freeing unused kernel memory: 248k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
slab error in cache_alloc_debugcheck_after(): cache `size-256': double free, or memory outside object was overwritten
Call Trace:
[C0000000EB3A76C0] [C00000000000ED88] .show_stack+0x68/0x1b0 (unreliable)
[C0000000EB3A7760] [C0000000000BA974] .__slab_error+0x38/0x50
[C0000000EB3A77E0] [C0000000000BB864] .cache_alloc_debugcheck_after+0xcc/0x1a4
[C0000000EB3A7880] [C0000000000BC7CC] .__kmalloc+0x1ac/0x1dc
[C0000000EB3A7920] [C00000000009F200] .kzalloc+0x18/0x4c
[C0000000EB3A79B0] [C0000000001C4710] .kobject_uevent+0x120/0x4dc
[C0000000EB3A7AA0] [C0000000001C4294] .kobject_register+0x68/0x8c
[C0000000EB3A7B30] [C00000000024D750] .bus_add_driver+0x70/0x18c
[C0000000EB3A7BD0] [C00000000024ECAC] .driver_register+0xd0/0xf4
[C0000000EB3A7C80] [D00000000007A484] .scsi_register_driver+0x20/0x38 [scsi_mod]
[C0000000EB3A7D00] [D000000000033570] .init_sd+0x10c/0x12c [sd_mod]
[C0000000EB3A7D90] [C0000000000828B8] .sys_init_module+0x1e4/0x448
[C0000000EB3A7E30] [C0000000000086F8] syscall_exit+0x0/0x40
c000000000432d88: redzone 1: 0xc000000000432d80, redzone 2: 0x5a2cf071.
Loading ipr
ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
ipr 0000:c0:01.0: Found IOA with IRQ: 99

happens even before loop is loaded.
Comment 7 Olaf Hering 2006-01-25 13:59:37 UTC
same with sles9 gcc.
Comment 8 Jens Axboe 2006-01-25 14:01:38 UTC
Olaf, I think this was discussed (and Brian posted a patch) on linux-scsi very recently.
Comment 9 Olaf Hering 2006-01-25 14:34:25 UTC
what patch, the sg ones?
Comment 10 Jens Axboe 2006-01-25 14:36:49 UTC
Hmm can't seem to locate it now, it was a scsi host / ipr possible memory corruption bug iirc.
Comment 11 Olaf Hering 2006-01-25 15:37:16 UTC
since I see it also on a mac, we can rule out ipr.
Comment 12 Olaf Hering 2006-01-25 22:23:46 UTC
see it even on ide only, js20:

(none):~ # init 6
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Slab corruption: start=c00000001db0d178, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c00000000021f204>](.release_mem+0x274/0x2a0)
120: ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b
Prev obj: start=c00000001db0c960, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0000000000bccc4>](.alloc_arraycache+0x40/0x94)
000: 00 00 00 00 00 00 00 80 ba ad f0 0d 00 00 00 00
010: 00 00 00 00 de ad 4e ad ff ff ff ff 00 00 00 00
Boot logging started on /dev/hvc0(/dev/console) at Wed Jan 25 14:20:00 2006
Master Resource Control: previous runlevel: *, switching to runlevel: 6
Running /etc/init.d/halt.local                                        done
Sending all processes the TERM signal...
Sending all processes the KILL signal...                              done
Turning off quota
                                                                      done
Comment 13 Jens Axboe 2006-01-26 09:21:30 UTC
Olaf, did you try Chris' slab checker?
Comment 14 Olaf Hering 2006-01-26 11:00:43 UTC
did not find anything so far, on cranberry.
slab debug was enable since a while, no idea why it starts to be a problem now. maybe it just did not trigger. has scsi (or whatever) some alignment requirements?
Comment 15 Jens Axboe 2006-01-26 11:29:36 UTC
I don't think the alignment has much to do with it. It could be a missing init of some structure, sometimes that works until you turn on slab debugging which fills the memory allocated with a poison.
Comment 16 Olaf Hering 2006-01-26 22:20:47 UTC
does it do anything?

Freeing unused kernel memory: 248k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
Loading crasher
crasher: module not supported by Novell, setting U taint flag.
crasher module (4 threads).  Testing sizes: 32 64 128 192 256 1024 2048 4096
Loading ipr
ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
ipr 0000:d0:01.0: Found IOA with IRQ: 325
ipr 0000:d0:01.0: Starting IOA initialization sequence.
ipr 0000:d0:01.0: Adapter firmware version: 05100030
ipr 0000:d0:01.0: IOA initialized.
scsi0 : IBM 573E Storage Adapter
  Vendor: IBM   H0  Model: HUS151473VL3800   Rev: S37A
  Type:   Direct-Access                      ANSI SCSI revision: 04
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
 sda: sda1 sda2 sda3
sd 0:1:5:0: Attached scsi disk sda
  Vendor: IBM   H0  Model: HUS151473VL3800   Rev: S37A
  Type:   Direct-Access                      ANSI SCSI revision: 04
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write through w/ FUA
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write through w/ FUA
 sdb:<5>sd 0:1:5:0: Attached scsi generic sg0 type 0
 sdb1 sdb2 sdb3
sd 0:1:8:0: Attached scsi disk sdb
sd 0:1:8:0: Attached scsi generic sg1 type 0
  Vendor: IBM       Model: VSBPD2E1  U4SCSI  Rev: 4805
  Type:   Enclosure                          ANSI SCSI revision: 02
 0:1:15:0: Attached scsi generic sg2 type 13
Slab corruption: start=c000000000432000, len=4096
d80: c0 00 00 00 00 43 2d 80 c0 00 00 00 00 43 2d 80
scsi: unknown device type 31
  Vendor: IBM       Model: 573E001           Rev: 0150
  Type:   Unknown                            ANSI SCSI revision: 00
 0:255:255:255: Attached scsi generic sg3 type 31
...
Comment 17 Chris L Mason 2006-01-27 01:51:15 UTC
Do you mean aside from tainting your kernel?

Loading crasher
crasher: module not supported by Novell, setting U taint flag.
crasher module (4 threads).  Testing sizes: 32 64 128 192 256 1024 2048 4096

It runs in the background in a loop:
while(1):
    allocate randomly sized buffers from the list above
    set a pattern in the buffers
    sleep for a random amount of time
    check the pattern
    free the buffers

Basically it increases the slab cache load and complains if it finds errors.  It is best used in conjunction with long running tests where we find evidence of memory corruption after hours or days of use.  Usually it can find trigger the corruption faster and help track things down.
Comment 18 Olaf Hering 2006-01-27 14:33:25 UTC
doesnt happen with mainline kernel, only with our cvs. only 318 patches to go.
Comment 19 Olaf Hering 2006-01-29 03:38:40 UTC
fails with only 159 patches applied.
Comment 20 Olaf Hering 2006-01-29 03:50:49 UTC
and even with only 40 applied
Comment 21 Olaf Hering 2006-01-29 11:53:13 UTC
and even with only 20 applied
Comment 22 Chris L Mason 2006-01-29 16:59:14 UTC
20 patches applied isn't very much at all.  This starts to smell like a difference in configuration options.
Comment 23 Olaf Hering 2006-01-30 10:23:10 UTC
today it doesnt reproduce, even after a few reboots. will keep looking at this.
Comment 24 Olaf Hering 2006-01-30 10:42:31 UTC
reproduced again with 20 patches applied. .config was always the same.
Comment 25 Olaf Hering 2006-01-30 11:03:01 UTC
fresh build with only 10 patches, now I suspect mainline is broken as well:

Freeing unused kernel memory: 260k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
Loading loop
loop: loaded (max 255 devices)
Loading ipr
ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
ipr 0000:c0:01.0: Found IOA with IRQ: 99
ipr 0000:c0:01.0: Starting IOA initialization sequence.
ipr 0000:c0:01.0: Adapter firmware version: 020A004E
ipr 0000:c0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
  Vendor: IBM       Model: ST373453LC        Rev: C51A
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:3:0: Attached scsi disk sda
  Vendor: IBM       Model: VSBPD3E   U4SCSI  Rev: 4812
  Type:   Enclosure                          ANSI SCSI revision: 02
Unable to handle kernel paging request for data at address 0x00000004
Faulting instruction address: 0xc0000000001dcc98
cpu 0x0: Vector: 300 (Data Access) at [c0000000ebcd37e0]
    pc: c0000000001dcc98: ._raw_spin_lock+0x28/0x17c
    lr: c000000000388b40: ._spin_lock+0x10/0x24
    sp: c0000000ebcd3a60
   msr: 8000000000009032
   dar: 4
 dsisr: 40000000
  current = 0xc0000000ebcc1000
  paca    = 0xc0000000004a6e00
    pid   = 26, comm = events/0
enter ? for help
0:mon> t
[c0000000ebcd3af0] c000000000388b40 ._spin_lock+0x10/0x24
[c0000000ebcd3b70] c000000000385380 .klist_del+0x28/0x58
[c0000000ebcd3c00] c000000000262bb0 .device_del+0x50/0x120
[c0000000ebcd3ca0] d00000000007ac18 .scsi_target_reap_work+0xe0/0x12c [scsi_mod]
[c0000000ebcd3d30] c000000000077bdc .run_workqueue+0x108/0x19c
[c0000000ebcd3dd0] c000000000077dc0 .worker_thread+0x150/0x1c0
[c0000000ebcd3ed0] c00000000007d72c .kthread+0x140/0x190
[c0000000ebcd3f90] c000000000025d1c .kernel_thread+0x4c/0x68
0:mon> 
Comment 26 Olaf Hering 2006-01-30 16:57:50 UTC
klist_del was called with a klist_node* which was all zeros.

http://marc.theaimsgroup.com/?l=linux-scsi&m=113864009506400&w=2

no idea if there is a race in scsi or sysfs. I just reboot in a loop right now, and the bug happens rarely.
Comment 27 Olaf Hering 2006-01-30 18:05:05 UTC
last bug was also reported on x86_64.

http://lkml.org/lkml/2006/01/30/164

no idea if they are related.
Comment 28 Olaf Hering 2006-01-30 20:22:59 UTC
a typical value looks like this (if I hit the initial slab thing):

Freeing unused kernel memory: 260k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
Loading loop
Slab corruption: start=c000000000455000, len=4096
d80: c0 00 00 00 00 45 5d 80 c0 00 00 00 00 45 5d 80
loop: loaded (max 255 devices)
Loading ipr

c000000000454f20 d per_cpu__watchdog_task
c000000000454f28 D per_cpu__nr_pagecache_local
c000000000454f30 d per_cpu__page_states
c000000000455088 d per_cpu__ratelimits.2
c000000000455090 d per_cpu__committed_space
c000000000455098 d per_cpu__lru_add_active_pvecs
c000000000455118 d per_cpu__lru_add_pvecs
c000000000455198 d per_cpu__reap_work
c0000000004551f8 d per_cpu__bh_lrus
c000000000455238 d per_cpu__bh_accounting
c000000000455240 d per_cpu__fdtable_defer_list
c0000000004552f0 d per_cpu__blk_cpu_done
c000000000455300 D per_cpu__radix_tree_preloads
c000000000455368 d per_cpu__trickle_count
c00000000045536c d per_cpu__proc_event_counts
c000000000455370 d per_cpu__loopback_stats
c000000000455480 d per_cpu__sockets_in_use
c000000000455500 D per_cpu__netdev_rx_stat
c000000000455580 D per_cpu__softnet_data
c000000000455b00 d per_cpu__net_rand_state
c000000000455b80 d per_cpu__flow_flush_tasklets
c000000000455c00 d per_cpu__flow_hash_info
c000000000455c80 d per_cpu__flow_tables
c000000000455d00 d per_cpu__rt_cache_stat
c000000000455d40 d per_cpu____icmp_socket
c000000000455d48 D __per_cpu_end

Comment 29 Olaf Hering 2006-01-31 13:19:14 UTC
my small slab corruption could be caused by the per_cpu optimizations.
so we have now 3 bugs here.

-the bogus struct device, I hit that alot even without the per_cpu patch
-the 256 slab thing, I did not see it so far with git-bisect
-the slab corruption without details, pointer is in the per_cpu__* range

git-whatchanged 61943c5015acae42fe355bf033bbee8d63691fe3..621009f4baf24581211dbb7b0816e01ed9fa8496

will look at #3 once #1 is analyzed.
Comment 30 Olaf Hering 2006-01-31 20:00:44 UTC
I'm trying without this one now:

From: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: [PATCH] powerpc/64: per cpu data optimisations
X-Git-Commit: 7a0268fa1a3613f2c526a9b3058701b277f6abe1
Comment 31 Olaf Hering 2006-01-31 23:21:17 UTC
I did 117 reboots without the patch above and hit none of the 3 bugs mentioned above. will let it run overnight.
Comment 32 Olaf Hering 2006-02-01 07:56:30 UTC
current kernel cvs + the percpu patch reverted, 460 reboots without trouble.
Comment 33 Olaf Hering 2006-02-01 08:21:06 UTC
yeah, it starts misbehaving (now with entirely different symptoms) right after the first boot.
I will revert the patch now and ask Anton for advice.

now looking for the bogus struct device.
Comment 34 Jens Axboe 2006-02-01 08:52:25 UTC
Olaf, I think I know what is going on. With that patch applied, we cannot use NR_CPUS for init loops anymore. I'm surprised Anton didn't check for more of these, there seems to me many of them. Since you seem to hit the problem with SCSI and io more than others, can you just try the patch I'm attaching below?
Comment 35 Jens Axboe 2006-02-01 08:56:06 UTC
Created attachment 65982 [details]
Only init online CPUs
Comment 36 Forgotten User t1uVJ6IbUR 2006-02-01 11:57:22 UTC
Jens: Nice catch. While I did audit the ppc64 users of per cpu data, I unfortunately did not go through the entire tree. I thought x86-64 had converted to per cpu data for possible cpus only, and if so Im surprised it didnt show up there too.

It does highlight the fact that we should poison the per cpu data pointers for invalid cpus so we oops straight away.
Comment 37 Jens Axboe 2006-02-01 11:59:03 UTC
Anton, that was my assumption as well (that x86-64 had done the same), strange why we don't see it there. Poisoning definitely sounds like a good idea!

Olaf, can you try with the patch I attached (and make sure to remove the revert of the ppc percpu data patch as well, of course)?
Comment 38 Olaf Hering 2006-02-01 12:25:56 UTC
will try it asap. looks like the the beta3 kernel needs either one to make it usable at all. argh. have to double check.
get bad swap file entry at least with cvs.
Comment 39 Olaf Hering 2006-02-01 12:44:45 UTC
yeah, beta3 is bad. try again player 1.
Comment 40 Jens Axboe 2006-02-01 12:48:50 UTC
Olaf, I don't follow - what did you test?
Comment 41 Olaf Hering 2006-02-01 12:57:13 UTC
Jens, I will test this patch on top of Linus git3 shortly. Its just that beta3 as is will hit the bug more often, which makes it unusable. (just to document it here.)
Comment 42 Forgotten User t1uVJ6IbUR 2006-02-01 13:44:11 UTC
Created attachment 66026 [details]
Poison invalid cpus

Ensure that per cpu access to !possible cpus causes a fault instead of silent corruption.
Comment 43 Forgotten User t1uVJ6IbUR 2006-02-01 13:45:59 UTC
Created attachment 66027 [details]
Fix percpu init of invalid cpus

The patch to poison invalid cpus caught some more problems. Here is a patch to fix them.
Comment 44 Jens Axboe 2006-02-01 13:49:46 UTC
Thanks Anton, the SCSI patch needs to use for_each_cpu() as well to be safe (instead of for_each_online_cpu().

Will you push this to Linus asap? I'll send the one I have now.
Comment 45 Olaf Hering 2006-02-02 13:05:03 UTC
will start testing now.
Comment 46 Olaf Hering 2006-02-02 15:20:58 UTC
with just the poison patch on top of 2616rc1git3, I got this:

[boot]0015 Setup Done
Unable to handle kernel paging request for data at address 0xaeeeeeeeef33772e
Faulting instruction address: 0xc000000000429418
cpu 0x0: Vector: 300 (Data Access) at [c000000000453bc0]
    pc: c000000000429418: .sched_init+0x5c/0x184
    lr: c00000000040f550: .start_kernel+0x58/0x288
    sp: c000000000453e40
   msr: 8000000000001032
   dar: aeeeeeeeef33772e
 dsisr: 42000000
  current = 0xc00000000049e250
  paca    = 0xc00000000049ee00
    pid   = 0, comm = swapper
enter ? for help
0:mon> 

will apply the other ones now.
Comment 47 Olaf Hering 2006-02-02 15:28:06 UTC
now I get this:

Freeing unused kernel memory: 256k freed
Starting udevd
Creating devices
Loading sd_mod
Unable to handle kernel paging request for data at address 0xaeeeeeeeef33cc6e
Faulting instruction address: 0xd00000000007dc28
cpu 0x1: Vector: 300 (Data Access) at [c0000000eb3eba90]
    pc: d00000000007dc28: .init_scsi+0xb8/0x140 [scsi_mod]
    lr: d00000000007dbec: .init_scsi+0x7c/0x140 [scsi_mod]
    sp: c0000000eb3ebd10
   msr: 8000000000009032
   dar: aeeeeeeeef33cc6e
 dsisr: 42000000
  current = 0xc0000000eb90c000
  paca    = 0xc00000000049f000
    pid   = 1035, comm = modprobe
enter ? for help
1:mon> 
Comment 48 Olaf Hering 2006-02-02 16:13:28 UTC
hmm, is there a race? adding a printk:
+++ linux-2.6.16-rc1-git3/drivers/scsi/scsi.c
@@ -1245,8 +1245,10 @@ static int __init init_scsi(void)
        if (error)
                goto cleanup_sysctl;
 
-       for_each_online_cpu(i)
+       for_each_online_cpu(i) {
+               printk(KERN_ERR "cpu %d\n", i);
                INIT_LIST_HEAD(&per_cpu(scsi_done_q, i));
+       }
 
        devfs_mk_dir("scsi");
        printk(KERN_NOTICE "SCSI subsystem initialized\n");


Freeing unused kernel memory: 256k freed
Starting udevd
Creating devices
Loading sd_mod
cpu 0
cpu 1
cpu 2
cpu 3
cpu 4
cpu 5
cpu 6
cpu 7
SCSI subsystem initialized
Loading loop
loop: loaded (max 255 devices)
Comment 49 Olaf Hering 2006-02-02 16:50:31 UTC
maybe I forgot mkinitrd or something, at least its working now.

who will send it on?
Comment 50 Jan Beulich 2006-02-03 14:43:36 UTC
Just as a note - the last patch breaks the i386 Xen kernel.
Comment 51 Olaf Hering 2006-02-03 15:12:15 UTC
Jan, which part of it?
Comment 52 Olaf Hering 2006-02-06 15:20:33 UTC
patch is in 2.6.16-rc2-git2

the bogus struct device crash in raw_spinlock is tracked in bug #148133.