Bug 129926 - 10.0 kernel regularly oopses
Summary: 10.0 kernel regularly oopses
Status: RESOLVED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: unspecified
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Jens Axboe
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-10-21 08:40 UTC by Dirk Mueller
Modified: 2005-11-04 04:28 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Don't clear ->elevator_data on queue exit (458 bytes, patch)
2005-10-31 12:53 UTC, Jens Axboe

Description Dirk Mueller 2005-10-21 08:40:48 UTC
Hi,  
 
I see this regularly on one machine:  
 
Unable to handle kernel NULL pointer dereference at virtual address 00000094 
 printing eip: 
c0123e52 
*pde = 00000000 
Oops: 0000 [#1] 
Modules linked in: ipt_LOG ipt_REJECT ipt_owner ipt_state iptable_filter 
ip_tables ip_conntrack_ftp ip_conntrack nls_utf8 eepro100 e100 mii ohci_hcd 
usbcore i2c_piix4 generic i2c_savage4 i2c_algo_bit i2c_core raid5 xor dm_mod 
parport_pc lp parport reiserfs aic7xxx scsi_transport_spi fan thermal 
processor serverworks hpt366 sg ips sd_mod scsi_mod ide_disk ide_core 
CPU:    0 
EIP:    0060:[<c0123e52>]    Not tainted VLI 
EFLAGS: 00010282   (2.6.13-15-default)  
EIP is at del_timer+0x2/0x50 
eax: 00000088   ebx: 00000000   ecx: c036ac18   edx: c0261cb0 
esi: f2ac5f60   edi: c036a7e0   ebp: c036ac40   esp: f3515ef8 
ds: 007b   es: 007b   ss: 0068 
Process bash (pid: 2228, threadinfo=f3514000 task=e2834080) 
Stack: 00000000 00000000 c0261cbf f2ac5120 c0259220 dfc96d88 c025989f f2ac5120  
       f3515f30 f3515f31 c036a975 dfc96d88 c0259989 00000004 00716663 00000000  
       00000000 00000000 c0259900 dfc96d88 c4739000 f3515fa4 c025d791 00000004  
Call Trace: 
 [<c0261cbf>] as_exit_queue+0xf/0x60 
 [<c0259220>] elevator_exit+0x10/0x30 
 [<c025989f>] elevator_switch+0x7f/0xe0 
 [<c0259989>] elv_iosched_store+0x89/0xc0 
 [<c0259900>] elv_iosched_store+0x0/0xc0 
 [<c025d791>] queue_attr_store+0x21/0x30 
 [<c01914fd>] flush_write_buffer+0x1d/0x30 
 [<c0191540>] sysfs_write_file+0x30/0x50 
 [<c0191510>] sysfs_write_file+0x0/0x50 
 [<c015936d>] vfs_write+0x8d/0x170 
 [<c01594fc>] sys_write+0x3c/0x70 
 [<c0102d1b>] sysenter_past_esp+0x54/0x79 
Code: 5e e9 03 ff ff ff 8d 76 00 8b 03 85 c0 74 ec b8 01 00 00 00 5b 5e c3 0f 
0b 3d 01 b9 2d 31 c0 eb c5 90 8d b4 26 00 00 00 00 53 53 <81> 78 0c 6e ad 87 
4b 89 c3 74 05 e8 8e fd ff ff 8b 0b 31 c0 85  
 <5>ips 0000:02:05.0: Reset Request - Flushed Cache 
ips 0000:02:05.0: Reset Request - Flushed Cache
Comment 1 Jens Axboe 2005-10-21 13:44:26 UTC
It looks like ->elevator_data == NULL in as_exit_queue()
Comment 2 Nick Piggin 2005-10-23 02:00:41 UTC
Thanks, I'll try to reproduce. What kind of block device is involved? What steps do you perform to see the oops?
Comment 3 Dirk Mueller 2005-10-24 04:52:31 UTC
It's a cron job, actually. When testing, this was enough to reproduce the bug:

find / &
while true; do echo anticipatory > /sys/block/sda/queue/scheduler; sleep 1; echo cfq > /sys/block/sda/queue/scheduler; sleep 1; done

where /dev/sda holds the root filesystem (/).

It oopses within seconds.


Comment 4 Jens Axboe 2005-10-31 12:53:54 UTC
Created attachment 56002
Don't clear ->elevator_data on queue exit

This patch should fix the oops on switching away from cfq. I will commit to SL100 now. Please test.
Comment 5 Dirk Mueller 2005-11-02 14:31:46 UTC
It survived one day of uptime, longer than ever before.

Comment 6 Jens Axboe 2005-11-02 14:33:56 UTC
Thanks for testing.
Comment 7 Dirk Mueller 2005-11-02 14:41:35 UTC
Thanks for fixing :)
Comment 8 Nick Piggin 2005-11-04 04:28:28 UTC
Thanks for fixing, Jens.