Bug 133718 - memory allocation hangs during DVD-RAM writing
Summary: memory allocation hangs during DVD-RAM writing
Status: RESOLVED FIXED
Alias: None
Product: openSUSE 11.0
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: Other Other
: P3 - Medium : Normal (vote)
Target Milestone: ---
Assignee: Jan Kara
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-14 18:11 UTC by Stanislav Brabec
Modified: 2015-12-03 15:59 UTC
5 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
hang-swapoff.log.bz2 (20.32 KB, application/x-bzip2)
2008-06-24 15:09 UTC, Stanislav Brabec
Details
hang-logs.tar.bz2 (29.42 KB, application/x-bzip2)
2008-06-24 15:46 UTC, Stanislav Brabec
Details
convert iprune_mutex to rwsem (1.98 KB, patch)
2009-07-13 16:53 UTC, Nick Piggin
Details | Diff
Patch making UDF avoid IO in udf_clear_inode (6.41 KB, patch)
2009-12-03 13:11 UTC, Jan Kara
Details | Diff

Description Stanislav Brabec 2005-11-14 18:11:19 UTC
During DVD-RAM writing, GNOME/GTK+ applications randomly hang for minutes and then resume work.

Notes:
- No GNOME application is actually using the DVD medium.
- The command line in a virtual console or over ssh works without any problems or hangs.

How to reproduce:
rsync -avH /home /media/dvdram
Then try to use GNOME desktop.

Possible sources of the problem:
- gamin (not probable, note that no application except rsync uses DVD-RAM).
- slow interaction with HAL/D-BUS (more probable).
With older hal/dbus versions, I even experienced a half-hour delay before the CD icon appeared when a CD was inserted into another drive during DVD-RAM writing. I have not yet tried to reproduce this particular problem, but it is easy to check.
Comment 1 Federico Mena Quintero 2005-11-14 18:40:16 UTC
This is going to be tricky to track down.

What does "top" show during the writing process?  Is something gobbling up the CPU?

If you use a multiload applet, does it show a lot of CPU user time, or kernel time?

Does a KDE session have the same problem?

When a particular application hangs, could you attach to it with gdb and get a stack trace?
Comment 2 Stanislav Brabec 2005-11-14 21:43:03 UTC
I have tried to reproduce the problem. I have 768MB of RAM and about 1GB of swap. When the amount of free RAM (non-buffers, non-allocated) decreases to zero (swap is not yet used), the system starts to hang (but I am still able to log in via ssh and run a command). The applet shows a lot of kernel time and nearly no user CPU time.

It is never clear which application will hang, so it's hard to use strace.

As for "start DVD writing and insert a CD into another drive" - it seems to work in 10.0 and I don't see delays any more.

Here is the list of running tasks during a hang of abiword:

Tasks: 141 total,   2 running, 139 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  3.0% sy,  0.0% ni,  0.0% id, 97.0% wa,  0.0% hi,  0.0% si
Mem:    775832k total,   768868k used,     6964k free,     2084k buffers
Swap:   498004k total,     3704k used,   494300k free,   201260k cached

17366 sb        16   0  141m  63m  16m D  1.9  8.4   0:05.42 abiword 
14572 root      15   0  5648 2960 1020 D  0.3  0.4   0:07.53 rsync 
17488 root      16   0  2112 1032  764 R  0.3  0.1   0:00.04 top 
  377 root      15   0     0    0    0 D  0.0  0.0   0:01.21 kswapd0 
 6070 root      16   0  1796  720  632 D  0.0  0.1   0:00.25 hald-addon-stor 
 6090 root      16   0  1792  716  632 D  0.0  0.1   0:00.76 hald-addon-stor 
28513 root      15   0     0    0    0 D  0.0  0.0   0:00.17 pdflush 
12829 sb        16   0  107m  30m  19m D  0.0  4.0   0:03.62 mono 
14178 root      16   0  4616 2072 1572 R  0.0  0.3   0:00.14 mc

My blind guess is that the application is waiting for some data to be swapped out so it can allocate the required amount of memory. I will continue testing with KDE or command-line applications that require a lot of memory.
Comment 3 Federico Mena Quintero 2005-11-14 22:47:10 UTC
The "Mem" line doesn't show many buffers, so there's probably some program gobbling up a lot of RAM.  Can you run top and hit M to sort by memory usage to see if this is the case?

Also, those processes don't show that a lot of RAM is being used.  It's probably some sleeping process that is the culprit.
Comment 4 Stanislav Brabec 2005-11-15 10:37:49 UTC
I have reproduced this problem on the command line (YaST runlevel). ssh, ls and other tools are probably not affected because the amount of memory they allocate is not so large.

System was just writing a lot of big files to DVD-RAM and memory usage was:
Buffers: 689MB
Memory: 71MB
Free: 8MB

It seems that this is not related to hal/dbus, but to memory management.

The reproduction scenario seems to be:

- Start a huge write to DVD-RAM (much bigger than available RAM).
- Wait until write buffers fill most of the memory.
- Start many programs to allocate the whole RAM.

The system now cannot simply free these buffers, and its response is much worse than that of a system with only (RAM minus write buffers) of memory. The system does not swap out parts of memory or write buffers; instead, the allocating application hangs.
Comment 6 Andrea Arcangeli 2005-11-16 09:15:37 UTC
This seems to be just a performance problem, right?

Can you confirm that echo 1 >/proc/sys/vm/dirty_ratio fixes it?

I believe this could be fixed by my recent per-task-predictive-write-throttling research.
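For reference, the tunable Andrea suggests can be inspected and changed like this (lowering it requires root; a sketch, not part of the original report):

```shell
# Show the current dirty_ratio: the percentage of memory that may be
# filled with dirty (not yet written back) pages before writers are throttled.
cat /proc/sys/vm/dirty_ratio

# Lower it to 1% so write buffers cannot eat most of the RAM (run as root):
#   echo 1 > /proc/sys/vm/dirty_ratio
```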
Comment 7 Andrea Arcangeli 2005-11-16 09:18:55 UTC
See the slides I prepared for the lkconf in Tokyo; they cover the per-task-predictive-write-throttling heuristic starting on page 25 (the SUSE Labs conf 2005 slides cover it too).

http://w3d.suse.de/Dev/Labs/Pubs/aa/tokyo-2005.pdf

Comment 8 Stanislav Brabec 2005-11-16 14:17:37 UTC
Yes, this is a performance problem. The system still has 1GB of completely unused swap on the fast hard disc, but it is waiting for some buffers to be flushed to the slow DVD-RAM.

I will test dirty_ratio.
Comment 9 Stanislav Brabec 2005-11-24 20:35:10 UTC
No, "echo 1 >/proc/sys/vm/dirty_ratio" did not fix it.

How to reproduce the worse freeze (hang for 5 minutes, then continue):

- Run many memory-hungry applications (gnumeric, openoffice, evolution, web browser).

- echo 1 >/proc/sys/vm/dirty_ratio

- When memory usage is above 97% (swap use is still 0%), open a really big image (100 MPix or so), for example in eog.

Actual result:
System hangs for 5 minutes.
When the system is back, you see 78% memory usage, 22% of it in buffers, but still 0% of swap.
Comment 10 Andrea Arcangeli 2005-12-14 13:16:29 UTC
The interesting thing to monitor is the size of the dirty cache.

With dirty_ratio = 1 the dirty cache should always be less than 1% of the freeable memory in the system. So it's weird that you get hangs even with so little in dirty cache if your application isn't writing too.

Is your application writing too, or just reading? I wouldn't expect an image viewer to write to disk; it should only allocate memory, right?

My patch should help to avoid blocking in writes, but again you shouldn't be writing frequently. Certainly it can help though, some little write always happens.

Perhaps the major problem is that dvd writing is slow, and the slower it writes, the less effect my patch will have. If we would know the max bandwidth of each device we could do a better guess of how much we're under a write-hog.
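The dirty-cache size Andrea suggests monitoring is exported in /proc/meminfo; a minimal way to sample it (field names as in mainline Linux):

```shell
# Print the current amount of dirty and under-writeback page cache (in kB).
grep -E '^(Dirty|Writeback):' /proc/meminfo

# To watch it during the DVD-RAM write, sample it once per second, e.g.:
#   watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
```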
Comment 11 Stanislav Brabec 2005-12-14 16:42:22 UTC
1% of 768MB is too little to display a large image.

eog sends a message to gconfd (written to the small file ~/.gconf/apps/eog/window/%gconf.xml) and probably creates a little file in /tmp/orbit-* or so.

The current write speed is reported by CD-ROM devices; for DVD-RAM it is typically 3-5MB/s, for Mt. Rainier CD-RW 500kB/s-2MB/s (neither includes seek times).

What I don't understand is: why does it not swap something out in this situation? It has a big, empty swap.
Comment 12 Stanislav Brabec 2006-04-26 17:06:39 UTC
It seems that similar behavior still appears in SuSE Linux 10.1.

How to reproduce:

rsync -av /home/data_to_backup /media/BackupDVD1/ # BackupDVD1 is a DVD-RAM

Then continue with different tasks. The computer responds well until the moment when memory usage rises to 100%. Then it hangs for several minutes.

After killing the rsync process, the hang disappeared in a few seconds.
Comment 14 Greg Kroah-Hartman 2008-02-04 20:46:26 UTC
Is this still relevant against the 10.3 release?

Or SLE10 SP2?
Comment 15 Stanislav Brabec 2008-02-05 09:15:24 UTC
It seems to be much better in 10.3, and the system is well usable while writing to DVD-RAM.

According to Jan Kára, it is a principal problem - the kernel does not know how long flushing a buffer will take. It seems that in 10.3 the kernel does not tend to fill all available free memory with write buffers; instead, it writes data to the disc in smaller batches. Hangs are still present, but they take only a few seconds instead of minutes.

I did not test SLED SP2.
Comment 16 Stanislav Brabec 2008-03-17 16:16:31 UTC
I did not test it properly.

When I am trying to write several gigabytes of /home to DVD-RAM by rsync, it still happens, even in 11.0 Alpha2 plus.

Starting an application that needs memory, plus disk activity, still hangs the system for several minutes, including the mouse pointer.
Comment 17 Stanislav Brabec 2008-03-17 17:55:36 UTC
In Factory, I have actually seen the following strange behavior:

write something
sync
rm -rf BackupDVD1/dir
sync

sync finished, but the LED of the drive still blinks and the system hangs.

A subsequent sync finishes immediately, but the drive still does something.

fuser -mv /media/BackupDVD1 shows nothing as well as fuser -v /dev/sr0.
Comment 19 Nick Piggin 2008-06-16 04:13:07 UTC
I'm sure this is due to some global write throttling problem, or a write fairness problem (depending on what exactly userspace is trying to do when it gets blocked). Until very recent mainline kernels, we did a bad job at managing writeout to multiple devices, especially if one is much slower than another. This probably can't be fixed well in SLES10.

Memory allocation probably should not be getting blocked, however writes likely will be.

Can you try getting a task trace of the blocked tasks while the hang is in progress (sysrq+W or echo w > /proc/sysrq-trigger)? If your kernel doesn't support sysrq+W, get a trace of all tasks with sysrq+T.
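A sketch of capturing the requested traces (requires root, and assumes the magic SysRq interface is compiled into the kernel):

```shell
# Enable all SysRq functions (some distributions restrict them by default).
echo 1 > /proc/sys/kernel/sysrq

# 'w' dumps stack traces of all blocked (uninterruptible, D-state) tasks;
# 't' would dump all tasks instead.
echo w > /proc/sysrq-trigger

# The traces end up in the kernel log.
dmesg | tail -n 200
```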

Comment 20 Stanislav Brabec 2008-06-24 15:09:15 UTC
Created attachment 224038 [details]
hang-swapoff.log.bz2

Log created during a several-minutes-long hang after calling swapoff on openSUSE 11.0 final.

I will try to reproduce it with memory stress caused by normal system activity.
Comment 21 Stanislav Brabec 2008-06-24 15:46:55 UTC
Created attachment 224046 [details]
hang-logs.tar.bz2

I just reproduced the hang in normal conditions (writing to DVD-RAM and compilation with >1G required at once).

First I finished the swapoff from the previous dump and reproduced the hang without swap. Then I enabled swap and the same problem occurred again in a few seconds.
Comment 22 Tristan Hoffmann 2009-03-20 15:38:07 UTC
openSUSE 10.1 is not supported anymore. Please reopen this bug if you still see this issue with any supported version (10.3 or later).
Comment 23 Stanislav Brabec 2009-03-20 15:42:17 UTC
Last verified on 11.0. Reopening.
Comment 24 Jan Kara 2009-05-18 12:02:12 UTC
Looking at the traces: As Stanislav suspected, the hanging processes are hanging on memory allocation. E.g.:
X             D ffff81009d47c400     0 17285  14831
 ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
 ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
 ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
Call Trace:
 [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9
 [<ffffffff80447e1a>] mutex_lock+0x1e/0x22
 [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213
 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
 [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6
 [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d
 [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf
 [<ffffffff8860f2b4>] :nvidia:nv_kern_poll+0x2e/0x73
 [<ffffffff802ad949>] do_select+0x308/0x506
 [<ffffffff802adced>] core_sys_select+0x1a6/0x254
 [<ffffffff802ae0b7>] sys_select+0xb5/0x157

or

gnome-panel   D ffff8100b3db03c0     0 17480  17335
 ffff8100844858a8 0000000000000086 0000000000000000 ffff81000000e288
 ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff8100b3db03c0
 ffff8100bccbc140 ffff8100b3db0700 0000000084485988 ffff8100b3db0700
Call Trace:
 [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9
 [<ffffffff80447e1a>] mutex_lock+0x1e/0x22
 [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213
 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
 [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6
 [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d
 [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf
 [<ffffffff80433d54>] unix_poll+0x20/0xa1
 [<ffffffff803c7c2b>] sock_poll+0x18/0x1a
 [<ffffffff802ad1b8>] do_sys_poll+0x1bc/0x370
 [<ffffffff802ad598>] sys_poll+0x39/0x85 

Now, one of the processes is actually cleaning up inode cache:

gnome-screens D ffff810006d1d598     0 20686      1
 ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
 ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
 ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
Call Trace:
 [<ffffffff804477f3>] io_schedule+0x63/0xa5
 [<ffffffff802c2587>] sync_buffer+0x3b/0x3f
 [<ffffffff80447d2a>] __wait_on_bit+0x47/0x79
 [<ffffffff80447dc6>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff802c24f6>] __wait_on_buffer+0x1f/0x21
 [<ffffffff802c442a>] __bread+0x70/0x86
 [<ffffffff88de9ec7>] :udf:udf_tread+0x38/0x3a
 [<ffffffff88de0fcf>] :udf:udf_update_inode+0x4d/0x68c
 [<ffffffff88de26e1>] :udf:udf_write_inode+0x1d/0x2b
 [<ffffffff802bcf85>] __writeback_single_inode+0x1c0/0x394
 [<ffffffff802bd205>] write_inode_now+0x7d/0xc4
 [<ffffffff88de2e76>] :udf:udf_clear_inode+0x3d/0x53
 [<ffffffff802b39ae>] clear_inode+0xc2/0x11b
 [<ffffffff802b3ab1>] dispose_list+0x5b/0x102
 [<ffffffff802b3d35>] shrink_icache_memory+0x1dd/0x213
 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
 [<ffffffff802951fa>] alloc_page_vma+0x176/0x189
 [<ffffffff802822d8>] __do_fault+0x10c/0x417
 [<ffffffff80284232>] handle_mm_fault+0x466/0x940
 [<ffffffff8044b922>] do_page_fault+0x676/0xabf

There are actually two other processes trying to write something to the DVD media - rsync and pdflush. So we have three processes working with the DVD, and the result is that the drive is probably seeking like mad and not much IO is really getting done... As Stanislav writes, the speed of the drive is a few MB/s in optimal conditions. If we are seeking, I can imagine the drive does something like a few KB or tens of KB per second.

The question is what we can do about this (at least long term). Probably breaking the iprune mutex into per-fs mutexes could help in this particular case. What do you think, Nick?
Comment 25 Nick Piggin 2009-07-13 16:31:48 UTC
Hi Jan and Stanislav, thanks for reviving this. I must have given up on it last year... and it dropped off my radar.

Jan, that is a good idea, but the icache pruning path is probably just going to take mutexes from lots of filesystems as it scans inodes. So maybe we could even just break out of the icache shrinking path if we have iprune_mutex contention? The caller in vmscan should account for this and try to scan a bit more later.
Comment 26 Nick Piggin 2009-07-13 16:39:11 UTC
Actually, we could even just turn it into an rwsem, which would then allow inode reclaim to happen concurrently if one reclaimer is stuck in IO, as is the case here.

Now the problem is that it may just end up getting stuck in inode reclaim! In that case, perhaps UDF should not write back the inode in this path (well, it seems like it needs to write back here, but maybe inode reclaim should avoid such inodes?).
Comment 27 Nick Piggin 2009-07-13 16:53:14 UTC
Created attachment 304763 [details]
convert iprune_mutex to rwsem
Comment 28 Jan Kara 2009-07-13 17:29:07 UTC
I've forgotten about the bug as well :).

Your patch looks good. I'm only afraid it might unveil some bugs in filesystems, because with your patch the cleaning of filesystem inodes can happen from several threads in parallel. But those can probably be fixed.

Also, your patch does something a bit different from what I had in mind - my idea was that if some filesystem gets congested, we just skip cleaning inodes there and hopefully make progress somewhere else (so something like you wrote in Comment #25). That would IMHO also make some sense. As far as UDF is concerned, I'll probably stop it from doing any IO in udf_clear_inode anyway, because that turned out to be a bad idea for several reasons, but there are other filesystems which do non-trivial stuff in their clear_inode methods (e.g. cluster filesystems have to do some network communication before they can release the inode) and they are going to have similar problems as UDF if the network gets slow...
Comment 29 Nick Piggin 2009-07-13 17:46:44 UTC
Bugs -- yes it could happen, good point, although inode_lock should cover most of those (at least, no filesystem is supposed to rely on iprune_mutex, but I get your point).

OK well we certainly could skip cleaning them. So per-fs locks as you suggest, then also doing trylock on them... I guess one problem with that is it still can result in one process stalling (well, one for each slow filesystem).

I think the best approach is to really try to avoid inode reclaim if it is going to be slow (or some work could be pushed to a deferred context). In the simplest case, simply keep inodes off the reclaimable list if more work is required; otherwise, in cases like cluster filesystems, moving the queued work into a kernel thread may be better.

The problem is that we don't want to apply too much penalty to the process in reclaim if it did not cause the work in the first place.

If you can change UDF to avoid writeout here, it would be nice from the memory reclaim standpoint. I think it should be a good idea for desktop users as well, even if we still have some problems with cluster filesystems and such.
Comment 31 Jan Kara 2009-08-06 21:33:18 UTC
Ok, taking the bug for the UDF part. I already have a part of the patch but it's a bit more complicated than I originally thought.
Comment 32 Jan Kara 2009-12-03 13:11:58 UTC
Created attachment 330750 [details]
Patch making UDF avoid IO in udf_clear_inode

This patch should resolve the issue in UDF. I'll let it get some upstream testing and then we can merge it to some of our products if there is demand (openSUSE 11.2?).
Comment 33 Stanislav Brabec 2009-12-03 13:50:00 UTC
I guess that UDF write access is not widely used, but if you add it to 11.2, I can provide some testing.
Comment 34 Jan Kara 2010-01-13 11:40:52 UTC
Ok, upstream (2.6.33-rc1) has all the improvements. I'm also working on improving UDF write pattern which should make writing faster but I think it's out of scope of this bug. So hopefully 11.3 should be considerably better in this regard.

I guess I won't push it to 11.2 if there's no real demand... Thanks for all your input. Closing the bug.
Comment 35 Stanislav Brabec 2013-12-10 18:52:03 UTC
I have just seen a very similar issue in openSUSE 13.1 while copying to an external ext4-formatted USB hard drive (USB 3.0, but the computer has only USB 2.0).

But it is much harder to trigger:

1. Run a large rsync from a fast device to a slower one with nice and ionice -c 3.
2. Run desktop environment in parallel.
3. Watch the free memory.

I see that the desktop works seamlessly most of the time - no delays, no hangs. I have a dual-core system with 4GB RAM. Watching the output of free, the system tries to keep about 200MB of free RAM and about 1GB of buffers (on a system with 8GB RAM and 8 cores it is 2.5GB).

But there is a special condition where it fails, when the following two conditions are met at once:

4. Large files are copied (~2MB per file and more).

5. Desktop requests larger amount of memory at once (e. g. browser is opened with a complicated web page).

At that moment the system starts to use swap and a hang is experienced. (It seems that the slower system response happens on a system with 8GB RAM even without using swap.)
Comment 36 Jan Kara 2013-12-12 13:04:51 UTC
Yeah, that's kind of expected given the load and the configuration - when userspace creates some memory pressure we have to get free pages somewhere. Since the USB storage is slow in cleaning dirty pages, page reclaim likely finds itself under pressure by dirty pages remaining for a long time at the end of the LRU list and either decides to swap (if possible) or synchronously waits for IO to complete before giving back new pages. And that causes hiccups.

Actually, for 4GB or 8GB systems the current dirty_ratio setting is already a bit too high for a standard desktop. Can you try setting /proc/sys/vm/dirty_bytes to "250000000" (~250 MB) and /proc/sys/vm/dirty_background_bytes to "125000000" (~125 MB)? Does it help with the hangs? Can you observe any adverse effects of these changes when using the desktop?

I'm asking because we are thinking about a similar change upstream and you are my Guinea pig :)
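For reference, the limits Jan suggests could be made persistent with a sysctl fragment (the file name below is hypothetical; the values are the ones from this comment):

```
# /etc/sysctl.d/90-dirty-limits.conf  (hypothetical file name)
# Cap the dirty page cache at ~250 MB and start background writeback at ~125 MB.
# Note: setting the *_bytes sysctls overrides the corresponding *_ratio ones.
vm.dirty_bytes = 250000000
vm.dirty_background_bytes = 125000000
```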
Comment 37 Stanislav Brabec 2013-12-12 18:52:25 UTC
I will test it.

My current settings (openSUSE 13.1):

# cat /proc/sys/vm/dirty_ratio
20
# cat /proc/sys/vm/dirty_bytes
0
# cat /proc/sys/vm/dirty_background_bytes
0

My work-around preventing hangs was:

while : ; do echo 3 >/proc/sys/vm/drop_caches ; sleep 30 ; done
Comment 38 Stanislav Brabec 2015-12-02 16:14:41 UTC
I just experienced the problem in Tumbleweed, copying data (a lot of photos) to a 128GB USB 2.0 flash drive using NTFS. Kernel 4.3.0-2-default.
Comment 39 Jan Kara 2015-12-03 09:36:29 UTC
Well, without any traces it is hard to guess where the problem is (and whether it is in a similar place as previously). There are lots of different causes which result in "hangs" in userspace when IO is happening. Also, a USB 2.0 device has different properties than a DVD-RAM.

So can you get sysrq-w output during the hang, open a new bug and attach it there? Thanks!
Comment 40 Stanislav Brabec 2015-12-03 15:59:17 UTC
New bug 957816 reported.