Bug 105779 - Processes hanging in disk access
Summary: Processes hanging in disk access
Status: RESOLVED DUPLICATE of bug 106103
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Beta 2
Hardware: PowerPC-64 All
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Chris L Mason
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-19 09:18 UTC by Andreas Schwab
Modified: 2006-09-28 13:26 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Sysrq-t (50.99 KB, text/plain)
2005-08-19 09:19 UTC, Andreas Schwab

Description Andreas Schwab 2005-08-19 09:18:49 UTC
After a while processes are hanging while writing to disk.  This cannot be 
reproduced with the vanilla kernel.
Comment 1 Andreas Schwab 2005-08-19 09:19:21 UTC
Created attachment 46657
Sysrq-t
Comment 2 Andreas Schwab 2005-08-19 09:35:33 UTC
See avocado.suse.de. 
Comment 3 Olaf Kirch 2005-08-19 09:42:23 UTC
Looks like a problem in the ext3 journaling code. 
Not sure if Jan is back already, he was in China this week. Chris, will 
you assign please? 
Comment 4 Olaf Hering 2005-08-20 12:47:07 UTC
This also happens with reiser, on mac.suse.de, a single-CPU G5. I can't switch
consoles anymore, though the mouse still moves.
Comment 5 Jan Kara 2005-08-22 13:11:46 UTC
Actually, everything seems to be waiting for kjournald to finish. The kjournald
trace looks like:
kjournald     D c0000000000145a8 11264  3672      1          3993  2752 (L-TLB)
Call Trace:
[c00000007e27f670] [c00000007e27f740] 0xc00000007e27f740 (unreliable)
[c00000007e27f840] [c000000000011b10] .__switch_to+0xd0/0x160
[c00000007e27f8d0] [c0000000003d44a0] .schedule+0x590/0xe90
[c00000007e27fa10] [c0000000003d4df0] .io_schedule+0x50/0x90
[c00000007e27faa0] [c0000000000cd988] .sync_buffer+0x68/0x80
[c00000007e27fb20] [c0000000003d5788] .__wait_on_bit+0xc8/0x140
[c00000007e27fbd0] [c0000000003d5ad0] .out_of_line_wait_on_bit+0x90/0xc0
[c00000007e27fcb0] [c0000000000cc9b0] .__wait_on_buffer+0x30/0x50
[c00000007e27fd30] [c00000000018d254] .journal_commit_transaction+0x6a4/0x1580
[c00000007e27fe80] [c00000000019102c] .kjournald+0x11c/0x290
[c00000007e27ff90] [c0000000000145a8] .kernel_thread+0x4c/0x68

So kjournald seems to be waiting for I/O completion, and that never happens. I
guess Jens should know more about this ;) Also, the fact that it happens on both
ext3 and reiser seems to confirm that the problem is probably below the
filesystem layer.
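
To see which tasks are stuck the way kjournald is, a saved Sysrq-t dump can be filtered for tasks in uninterruptible sleep (state "D"). The following is only a rough sketch: it assumes task header lines have the same field layout as the kjournald line quoted above (name, state, PC, free stack, PID, ...), and the function name and dump file name are made up for illustration.

```shell
#!/bin/sh
# Sketch: list tasks in state "D" from a saved Sysrq-t dump.
# Assumed header layout (taken from the kjournald line in this report):
#   <name> <state> <pc> <free-stack> <pid> ...
# Task names containing spaces would break this simple field split.
blocked_tasks() {
    # Print task name and PID for every header line whose state field is "D".
    # Call-trace lines start with "[" and never have "D" as the second field.
    awk '$2 == "D" && NF >= 5 { print $1, $5 }' "$@"
}

# Usage (file name is hypothetical): blocked_tasks sysrq-t.txt
```

With no file argument the function reads the dump from standard input.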
Comment 6 Jens Axboe 2005-08-22 13:48:39 UTC
Which vanilla kernel did you test? And did you use barriers there as well? We
turn them on by default.

In any case, try and reproduce with -o barrier=none (or barrier=0 for ext3) as
well, thanks!
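
The request above could be scripted roughly as follows. This is a sketch, not a tested procedure: it reads the comment as "barrier=0 for ext3, barrier=none for reiserfs", and the helper names, device and mount point are placeholders. It only prints the mount command rather than running it.

```shell
#!/bin/sh
# Sketch: pick the mount option that turns write barriers off for a filesystem.
# Mapping taken from the comment above; anything else is rejected.
barrier_off_opt() {
    case "$1" in
        ext3)     echo "barrier=0" ;;
        reiserfs) echo "barrier=none" ;;
        *)        echo "unknown filesystem type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the mount command for a given fs type, device, mount point.
print_mount_cmd() {
    opt=$(barrier_off_opt "$1") || return 1
    echo "mount -t $1 -o $opt $2 $3"
}

# Usage (device and mount point are placeholders):
#   print_mount_cmd ext3 /dev/sdX1 /mnt/test
```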
Comment 7 Olaf Hering 2005-08-22 13:58:59 UTC
I just built all the -rc* patches without the SUSE patches. I haven't tried to
narrow it down further; the G5 is busy with other testing.
Comment 8 Andreas Schwab 2005-08-22 15:03:29 UTC
avocado currently runs 2.6.13-rc6 with the minimal set of required patches. 
Comment 9 Andreas Schwab 2005-08-23 10:13:08 UTC
When mounting with barrier=0 the problem does not occur any more. 
Comment 10 Jens Axboe 2005-08-23 10:19:32 UTC
Andreas, can you try mounting with barriers again and do:

# echo 2 > /sys/block/sda/queue/iosched/max_depth
# echo 2 > /sys/block/sdb/queue/iosched/max_depth

and see if it still hangs?
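
For more than a couple of disks, the two echo commands above can be wrapped in a small loop. This is a sketch only: the max_depth file exists only when the active I/O scheduler exposes it, so missing files are skipped, and the SYSFS_ROOT variable and function name are additions for illustration (the variable lets the function be pointed at a scratch directory).

```shell
#!/bin/sh
# Sketch: write a max_depth value for several block devices.
# SYSFS_ROOT is an assumption added here; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

set_max_depth() {
    depth="$1"; shift
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/queue/iosched/max_depth"
        if [ -w "$f" ]; then
            # Write the new depth and read it back as confirmation.
            echo "$depth" > "$f" && echo "$dev: max_depth=$(cat "$f")"
        else
            echo "$dev: $f not present or not writable, skipped" >&2
        fi
    done
}

# Usage (as root): set_max_depth 2 sda sdb
```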
Comment 11 Andreas Schwab 2005-08-24 10:13:26 UTC
That appears to be working as well. 
Comment 12 Jens Axboe 2005-08-24 10:47:36 UTC
Thanks for testing, I will change the max_depth default to 2 then.
Comment 13 Jens Axboe 2005-08-24 10:50:37 UTC
Marking as duplicate, no need to track two different bugs for the same issue.

*** This bug has been marked as a duplicate of 106103 ***