Bugzilla – Bug 145459
memory corruption with modprobe loop max_loops=42 in initrd
Last modified: 2008-07-16 15:45:54 UTC
adding loop.ko to initrd with max_loop=42 leads to memory corruption. my g5 doesnt get past the modprobe command, invalid memory access in .flush_sigqueue(), some pointer there was null. a power5 box boots further and shows this error: oop: loaded (max 42 devices) kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. EXT3 FS on sda3, internal journal device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com dm-netlink version 0.0.2 loaded Intel(R) PRO/1000 Network Driver - version 7.0.28-NAPI Copyright (c) 1999-2005 Intel Corporation. PCI: Enabling device: (0000:d0:01.0), cmd 3 e1000: 0000:d0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:13:86 e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0000:d0:01.1), cmd 3 e1000: 0000:d0:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:13:87 e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0001:c8:01.0), cmd 3 e1000: 0001:c8:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:20:d4 e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0001:c8:01.1), cmd 3 e1000: 0001:c8:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:20:d5 e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection kjournald starting. Commit interval 5 seconds EXT3 FS on dm-0, internal journal EXT3-fs: mounted filesystem with ordered data mode. Capability LSM initialized Adding 1050616k swap on /dev/sda2. Priority:42 extents:1 across:1050616k slab error in cache_alloc_debugcheck_after(): cache `size-256': double free, or memory outside object was overwritten Call Trace: [C00000006C727850] [C00000000000ED88] .show_stack+0x68/0x1b0 (unreliable) [C00000006C7278F0] [C0000000000BA974] .__slab_error+0x38/0x50 [C00000006C727970] [C0000000000BB864] .cache_alloc_debugcheck_after+0xcc/0x1a4 [C00000006C727A10] [C0000000000BBA9C] .kmem_cache_alloc+0x160/0x190 [C00000006C727AB0] [C000000000107DF0] .load_elf_binary+0x74/0x1b40 [C00000006C727C20] [C0000000000D1414] .search_binary_handler+0x138/0x408 [C00000006C727CE0] [C0000000000FEB14] .compat_do_execve+0x1e4/0x33c [C00000006C727D90] [C000000000015198] .compat_sys_execve+0x74/0xf4 [C00000006C727E30] [C0000000000086F8] syscall_exit+0x0/0x40 c000000000432d88: redzone 1: 0xc000000000432d80, redzone 2: 0x5a2cf071. e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex NET: Registered protocol family 10 lo: Disabled Privacy Extensions IPv6 over IPv4 tunneling driver
That looks strange! The loop init is pretty trivial, I can't see what could go wrong. What happens if you use an even larger number?
Are we sure this isn't just another viewing of our other pop up memory corruption bug(s)?
It certainly smells like it, the loop init path looks pretty bullet proof.
Another idea - this could be a ppc compiler bug as well, I suppose. Does the kernel work if you built it with the SLES9 compiler?
Could you please cvs update, recompile and modprobe crasher threads=8 (or more, don't be shy).
Freeing unused kernel memory: 248k freed Starting udevd Creating devices Loading sd_mod SCSI subsystem initialized slab error in cache_alloc_debugcheck_after(): cache `size-256': double free, or memory outside object was overwritten Call Trace: [C0000000EB3A76C0] [C00000000000ED88] .show_stack+0x68/0x1b0 (unreliable) [C0000000EB3A7760] [C0000000000BA974] .__slab_error+0x38/0x50 [C0000000EB3A77E0] [C0000000000BB864] .cache_alloc_debugcheck_after+0xcc/0x1a4 [C0000000EB3A7880] [C0000000000BC7CC] .__kmalloc+0x1ac/0x1dc [C0000000EB3A7920] [C00000000009F200] .kzalloc+0x18/0x4c [C0000000EB3A79B0] [C0000000001C4710] .kobject_uevent+0x120/0x4dc [C0000000EB3A7AA0] [C0000000001C4294] .kobject_register+0x68/0x8c [C0000000EB3A7B30] [C00000000024D750] .bus_add_driver+0x70/0x18c [C0000000EB3A7BD0] [C00000000024ECAC] .driver_register+0xd0/0xf4 [C0000000EB3A7C80] [D00000000007A484] .scsi_register_driver+0x20/0x38 [scsi_mod] [C0000000EB3A7D00] [D000000000033570] .init_sd+0x10c/0x12c [sd_mod] [C0000000EB3A7D90] [C0000000000828B8] .sys_init_module+0x1e4/0x448 [C0000000EB3A7E30] [C0000000000086F8] syscall_exit+0x0/0x40 c000000000432d88: redzone 1: 0xc000000000432d80, redzone 2: 0x5a2cf071. Loading ipr ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005) ipr 0000:c0:01.0: Found IOA with IRQ: 99 happens even before loop is loaded.
same with sles9 gcc.
Olaf, I think this was discussed (and Brian posted a patch) on linux-scsi very recently.
what patch, the sg ones?
Hmm can't seem to locate it now, it was a scsi host / ipr possible memory corruption bug iirc.
since I see it also on a mac, we can rule out ipr.
see it even on ide only, js20: (none):~ # init 6 INIT: Switching to runlevel: 6 INIT: Sending processes the TERM signal Slab corruption: start=c00000001db0d178, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. Last user: [<c00000000021f204>](.release_mem+0x274/0x2a0) 120: ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b Prev obj: start=c00000001db0c960, len=2048 Redzone: 0x170fc2a5/0x170fc2a5. Last user: [<c0000000000bccc4>](.alloc_arraycache+0x40/0x94) 000: 00 00 00 00 00 00 00 80 ba ad f0 0d 00 00 00 00 010: 00 00 00 00 de ad 4e ad ff ff ff ff 00 00 00 00 Boot logging started on /dev/hvc0(/dev/console) at Wed Jan 25 14:20:00 2006 Master Resource Control: previous runlevel: *, switching to runlevel: 6 Running /etc/init.d/halt.local done Sending all processes the TERM signal... Sending all processes the KILL signal... done Turning off quota done
Olaf, did you try Chris' slab checker?
did not find anything so far, on cranberry. slab debug was enable since a while, no idea why it starts to be a problem now. maybe it just did not trigger. has scsi (or whatever) some alignment requirements?
I don't think the alignment has much to do with it. It could be a missing init of some structure, sometimes that works until you turn on slab debugging which fills the memory allocated with a poison.
does it do anything? Freeing unused kernel memory: 248k freed Starting udevd Creating devices Loading sd_mod SCSI subsystem initialized Loading crasher crasher: module not supported by Novell, setting U taint flag. crasher module (4 threads). Testing sizes: 32 64 128 192 256 1024 2048 4096 Loading ipr ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005) ipr 0000:d0:01.0: Found IOA with IRQ: 325 ipr 0000:d0:01.0: Starting IOA initialization sequence. ipr 0000:d0:01.0: Adapter firmware version: 05100030 ipr 0000:d0:01.0: IOA initialized. scsi0 : IBM 573E Storage Adapter Vendor: IBM H0 Model: HUS151473VL3800 Rev: S37A Type: Direct-Access ANSI SCSI revision: 04 SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off SCSI device sda: drive cache: write through w/ FUA SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off SCSI device sda: drive cache: write through w/ FUA sda: sda1 sda2 sda3 sd 0:1:5:0: Attached scsi disk sda Vendor: IBM H0 Model: HUS151473VL3800 Rev: S37A Type: Direct-Access ANSI SCSI revision: 04 SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB) sdb: Write Protect is off SCSI device sdb: drive cache: write through w/ FUA SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB) sdb: Write Protect is off SCSI device sdb: drive cache: write through w/ FUA sdb:<5>sd 0:1:5:0: Attached scsi generic sg0 type 0 sdb1 sdb2 sdb3 sd 0:1:8:0: Attached scsi disk sdb sd 0:1:8:0: Attached scsi generic sg1 type 0 Vendor: IBM Model: VSBPD2E1 U4SCSI Rev: 4805 Type: Enclosure ANSI SCSI revision: 02 0:1:15:0: Attached scsi generic sg2 type 13 Slab corruption: start=c000000000432000, len=4096 d80: c0 00 00 00 00 43 2d 80 c0 00 00 00 00 43 2d 80 scsi: unknown device type 31 Vendor: IBM Model: 573E001 Rev: 0150 Type: Unknown ANSI SCSI revision: 00 0:255:255:255: Attached scsi generic sg3 type 31 ...
Do you mean aside from tainting your kernel? Loading crasher crasher: module not supported by Novell, setting U taint flag. crasher module (4 threads). Testing sizes: 32 64 128 192 256 1024 2048 4096 It runs in the background in a loop: while(1): allocate randomly sized buffers from the list above set a pattern in the buffers sleep for a random amount of time check the pattern free the buffers Basically it increases the slab cache load and complains if it finds errors. It is best used in conjunction with long running tests where we find evidence of memory corruption after hours or days of use. Usually it can find trigger the corruption faster and help track things down.
doesnt happen with mainline kernel, only with our cvs. only 318 patches to go.
fails with only 159 patches applied.
and even with only 40 applied
and even with only 20 applied
20 patches applied isn't very much at all. This starts to smell like a difference in configuration options.
today it doesnt reproduce, even after a few reboots. will keep looking at this.
reproduced again with 20 patches applied. .config was always the same.
fresh build with only 10 patches, now I suspect mainline is broken as well: Freeing unused kernel memory: 260k freed Starting udevd Creating devices Loading sd_mod SCSI subsystem initialized Loading loop loop: loaded (max 255 devices) Loading ipr ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005) ipr 0000:c0:01.0: Found IOA with IRQ: 99 ipr 0000:c0:01.0: Starting IOA initialization sequence. ipr 0000:c0:01.0: Adapter firmware version: 020A004E ipr 0000:c0:01.0: IOA initialized. scsi0 : IBM 570B Storage Adapter Vendor: IBM Model: ST373453LC Rev: C51A Type: Direct-Access ANSI SCSI revision: 03 SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off SCSI device sda: drive cache: write through w/ FUA SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off SCSI device sda: drive cache: write through w/ FUA sda: sda1 sda2 sda3 sda4 sd 0:0:3:0: Attached scsi disk sda Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812 Type: Enclosure ANSI SCSI revision: 02 Unable to handle kernel paging request for data at address 0x00000004 Faulting instruction address: 0xc0000000001dcc98 cpu 0x0: Vector: 300 (Data Access) at [c0000000ebcd37e0] pc: c0000000001dcc98: ._raw_spin_lock+0x28/0x17c lr: c000000000388b40: ._spin_lock+0x10/0x24 sp: c0000000ebcd3a60 msr: 8000000000009032 dar: 4 dsisr: 40000000 current = 0xc0000000ebcc1000 paca = 0xc0000000004a6e00 pid = 26, comm = events/0 enter ? for help 0:mon> t [c0000000ebcd3af0] c000000000388b40 ._spin_lock+0x10/0x24 [c0000000ebcd3b70] c000000000385380 .klist_del+0x28/0x58 [c0000000ebcd3c00] c000000000262bb0 .device_del+0x50/0x120 [c0000000ebcd3ca0] d00000000007ac18 .scsi_target_reap_work+0xe0/0x12c [scsi_mod] [c0000000ebcd3d30] c000000000077bdc .run_workqueue+0x108/0x19c [c0000000ebcd3dd0] c000000000077dc0 .worker_thread+0x150/0x1c0 [c0000000ebcd3ed0] c00000000007d72c .kthread+0x140/0x190 [c0000000ebcd3f90] c000000000025d1c .kernel_thread+0x4c/0x68 0:mon>
klist_del was called with a klist_node* which was all zeros. http://marc.theaimsgroup.com/?l=linux-scsi&m=113864009506400&w=2 no idea if there is a race in scsi or sysfs. I just reboot in a loop right now, and the bug happens rarely.
last bug was also reported on x86_64. http://lkml.org/lkml/2006/01/30/164 no idea if they are related.
a typical value looks like this (if I hit the initial slab thing): Freeing unused kernel memory: 260k freed Starting udevd Creating devices Loading sd_mod SCSI subsystem initialized Loading loop Slab corruption: start=c000000000455000, len=4096 d80: c0 00 00 00 00 45 5d 80 c0 00 00 00 00 45 5d 80 loop: loaded (max 255 devices) Loading ipr c000000000454f20 d per_cpu__watchdog_task c000000000454f28 D per_cpu__nr_pagecache_local c000000000454f30 d per_cpu__page_states c000000000455088 d per_cpu__ratelimits.2 c000000000455090 d per_cpu__committed_space c000000000455098 d per_cpu__lru_add_active_pvecs c000000000455118 d per_cpu__lru_add_pvecs c000000000455198 d per_cpu__reap_work c0000000004551f8 d per_cpu__bh_lrus c000000000455238 d per_cpu__bh_accounting c000000000455240 d per_cpu__fdtable_defer_list c0000000004552f0 d per_cpu__blk_cpu_done c000000000455300 D per_cpu__radix_tree_preloads c000000000455368 d per_cpu__trickle_count c00000000045536c d per_cpu__proc_event_counts c000000000455370 d per_cpu__loopback_stats c000000000455480 d per_cpu__sockets_in_use c000000000455500 D per_cpu__netdev_rx_stat c000000000455580 D per_cpu__softnet_data c000000000455b00 d per_cpu__net_rand_state c000000000455b80 d per_cpu__flow_flush_tasklets c000000000455c00 d per_cpu__flow_hash_info c000000000455c80 d per_cpu__flow_tables c000000000455d00 d per_cpu__rt_cache_stat c000000000455d40 d per_cpu____icmp_socket c000000000455d48 D __per_cpu_end
my small slab corruption could be caused by the per_cpu optimizations. so we have now 3 bugs here. -the bogus struct device, I hit that alot even without the per_cpu patch -the 256 slab thing, I did not see it so far with git-bisect -the slab corruption without details, pointer is in the per_cpu__* range git-whatchanged 61943c5015acae42fe355bf033bbee8d63691fe3..621009f4baf24581211dbb7b0816e01ed9fa8496 will look at #3 once #1 is analyzed.
I'm trying without this one now: From: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Subject: [PATCH] powerpc/64: per cpu data optimisations X-Git-Commit: 7a0268fa1a3613f2c526a9b3058701b277f6abe1
I did 117 reboots without the patch above and hit none of the 3 bugs mentioned above. will let it run overnight.
current kernel cvs + the percpu patch reverted, 460 reboots without trouble.
yeah, it starts misbehaving (now with entirely different symptoms) right after the first boot. I will revert the patch now and ask Anton for advice. now looking for the bogus struct device.
Olaf, I think I know what is going on. With that patch applied, we cannot use NR_CPUS for init loops anymore. I'm surprised Anton didn't check for more of these, there seems to me many of them. Since you seem to hit the problem with SCSI and io more than others, can you just try the patch I'm attaching below?
Created attachment 65982 [details] Only init online CPUs
Jens: Nice catch. While I did audit the ppc64 users of per cpu data, I unfortunately did not go through the entire tree. I thought x86-64 had converted to per cpu data for possible cpus only, and if so Im surprised it didnt show up there too. It does highlight the fact that we should poison the per cpu data pointers for invalid cpus so we oops straight away.
Anton, that was my assumption as well (that x86-64 had done the same), strange why we don't see it there. Poisoning definitely sounds like a good idea! Olaf, can you try with the patch I attached (and make sure to remove the revert of the ppc percpu data patch as well, of course)?
will try it asap. looks like the the beta3 kernel needs either one to make it usable at all. argh. have to double check. get bad swap file entry at least with cvs.
yeah, beta3 is bad. try again player 1.
Olaf, I don't follow - what did you test?
Jens, I will test this patch on top of Linus git3 shortly. Its just that beta3 as is will hit the bug more often, which makes it unusable. (just to document it here.)
Created attachment 66026 [details] Poison invalid cpus Ensure that per cpu access to !possible cpus causes a fault instead of silent corruption.
Created attachment 66027 [details] Fix percpu init of invalid cpus The patch to poison invalid cpus caught some more problems. Here is a patch to fix them.
Thanks Anton, the SCSI patch needs to use for_each_cpu() as well to be safe (instead of for_each_online_cpu(). Will you push this to Linus asap? I'll send the one I have now.
will start testing now.
with just the poison patch on top of 2616rc1git3, I got this: [boot]0015 Setup Done Unable to handle kernel paging request for data at address 0xaeeeeeeeef33772e Faulting instruction address: 0xc000000000429418 cpu 0x0: Vector: 300 (Data Access) at [c000000000453bc0] pc: c000000000429418: .sched_init+0x5c/0x184 lr: c00000000040f550: .start_kernel+0x58/0x288 sp: c000000000453e40 msr: 8000000000001032 dar: aeeeeeeeef33772e dsisr: 42000000 current = 0xc00000000049e250 paca = 0xc00000000049ee00 pid = 0, comm = swapper enter ? for help 0:mon> will apply the other ones now.
now I get this: Freeing unused kernel memory: 256k freed Starting udevd Creating devices Loading sd_mod Unable to handle kernel paging request for data at address 0xaeeeeeeeef33cc6e Faulting instruction address: 0xd00000000007dc28 cpu 0x1: Vector: 300 (Data Access) at [c0000000eb3eba90] pc: d00000000007dc28: .init_scsi+0xb8/0x140 [scsi_mod] lr: d00000000007dbec: .init_scsi+0x7c/0x140 [scsi_mod] sp: c0000000eb3ebd10 msr: 8000000000009032 dar: aeeeeeeeef33cc6e dsisr: 42000000 current = 0xc0000000eb90c000 paca = 0xc00000000049f000 pid = 1035, comm = modprobe enter ? for help 1:mon>
hmm, is there a race? adding a printk: +++ linux-2.6.16-rc1-git3/drivers/scsi/scsi.c @@ -1245,8 +1245,10 @@ static int __init init_scsi(void) if (error) goto cleanup_sysctl; - for_each_online_cpu(i) + for_each_online_cpu(i) { + printk(KERN_ERR "cpu %d\n", i); INIT_LIST_HEAD(&per_cpu(scsi_done_q, i)); + } devfs_mk_dir("scsi"); printk(KERN_NOTICE "SCSI subsystem initialized\n"); Freeing unused kernel memory: 256k freed Starting udevd Creating devices Loading sd_mod cpu 0 cpu 1 cpu 2 cpu 3 cpu 4 cpu 5 cpu 6 cpu 7 SCSI subsystem initialized Loading loop loop: loaded (max 255 devices)
maybe I forgot mkinitrd or something, at least its working now. who will send it on?
Just as a note - the last patch breaks the i386 Xen kernel.
Jan, which part of it?
patch is in 2.6.16-rc2-git2 the bogus struct device crash in raw_spinlock is tracked in bug #148133.