Bugzilla – Bug 957816
system lags a lot while copying to slow devices
Last modified: 2015-12-14 14:32:27 UTC
Created attachment 658269 [details]
dmesg SysRq-w dump while soft freeze occurs

While copying large files to slow devices (DVD-RAM, USB 2.0 flash drives and USB 2.0 attached hard disks), the desktop experiences soft freezes. Depending on the device and the files written, a freeze can last from a fraction of a second to many seconds. It affects either a particular application (e.g. all terminals from one factory if rsync runs in a terminal), all applications, or even freezes mouse movement.

It was previously reported as bug 133718, but it still persists in some form. The attached SysRq-w dump shows one such short freeze (XFCE Terminal frozen for several seconds).
Note that this problem is strongly masked on FAT-based drives by the default "flush" mount option. To reproduce it on a FAT-formatted flash drive, you need to mount the drive without "flush".
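
Before trying to reproduce on a FAT drive, it may help to confirm whether "flush" is actually in effect. The sketch below checks the options column of /proc/mounts; the function name has_flush and the mount point /mnt/usb are hypothetical, chosen only for illustration.

```shell
# has_flush MOUNTPOINT [MOUNTS_FILE]
# Succeeds (exit status 0) if MOUNTPOINT is listed with the "flush" option.
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists for testing.
has_flush() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "flush") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Example (hypothetical mount point):
# has_flush /mnt/usb && echo "flush is active; remount without it to reproduce"
```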
So the reason for the stalls seems to be that page faults end up in direct reclaim waiting for IO. What is your /proc/sys/vm/dirty_ratio? How much memory does your machine have?

Could you sample /proc/meminfo, say every second, while the copy is running and attach it here when the stall happens? Mel, any other idea what to look at?
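
One way to collect the requested samples is a small shell loop. This is a minimal sketch, not a prescribed procedure: the function name sample_meminfo, the "===" timestamp separator and the log file name are assumptions made for illustration.

```shell
# sample_meminfo COUNT [INTERVAL] [SOURCE]
# Prints COUNT timestamped snapshots of SOURCE (default /proc/meminfo),
# sleeping INTERVAL seconds (default 1) between snapshots.
sample_meminfo() {
    count=$1
    interval=${2:-1}
    source_file=${3:-/proc/meminfo}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Timestamp header so stalls can be matched to wall-clock time.
        printf '=== %s ===\n' "$(date '+%H:%M:%S')"
        cat "$source_file"
        i=$((i + 1))
        # No sleep after the final snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: two minutes of one-second samples while the copy runs.
# sample_meminfo 120 1 > meminfo.log
```

During a stall the sampler itself may be delayed, so gaps between timestamps in the log are worth noting as well.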
(In reply to Jan Kara from comment #2)
> So the reason for the stalls seems to be that page faults end up in direct
> reclaim waiting for IO. What is your /proc/sys/vm/dirty_ratio? How much
> memory does your machine have?
>
> Could you sample /proc/meminfo say every second while the copy is running
> and attach it here when the stall happens? Mel any other idea what to look
> at?

Altering dirty_ratio is certainly one option although it's possible it'll simply defer the problem. The wait_iff_congested is only meant to trigger when dirty or under-writeback pages are reaching the end of the LRU multiple times and the device is congested. If it's a case that most or all dirty files are really backed by this device then the stall will trigger.

There is some anecdotal evidence upstream that this problem is worse on recent kernels than it used to be. I do not recall the specifics unfortunately but Michal was working in that area so it should be fresh in his mind. Michal?

An extreme workaround would be to use cgcreate and cgset to create a memory-limited cgroup and run cp within that cgroup with cgexec. That would prevent too much memory being dirtied by the slow storage but it's not desirable as a general solution.
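
The cgroup workaround described above could look roughly like the following sketch. It assumes the libcgroup tools are installed, the cgroup v1 memory controller is mounted (hence memory.limit_in_bytes), and the commands run as root. The group name slowcopy, the helper names and the limit value are hypothetical. Setting DRY_RUN to a non-empty value prints the commands instead of running them.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# copy_in_memcg LIMIT_BYTES SRC DST
# Runs cp inside a memory-limited cgroup (cgroup v1 interface assumed).
copy_in_memcg() {
    limit_bytes=$1
    src=$2
    dst=$3
    group=slowcopy   # hypothetical group name
    run cgcreate -g "memory:$group" || return 1
    run cgset -r "memory.limit_in_bytes=$limit_bytes" "$group" || return 1
    run cgexec -g "memory:$group" cp "$src" "$dst"
    status=$?
    # Remove the group again regardless of how cp exited.
    run cgdelete -g "memory:$group"
    return "$status"
}

# Example: cap the copy at 256 MiB of charged memory (value is a guess).
# copy_in_memcg 268435456 /path/to/large.file /mnt/usb/
```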
cat /proc/sys/vm/dirty_ratio
20

This is the default Tumbleweed kernel 4.3.0-2-default with the default setup. The machine has 8 GB RAM and 4 GB swap.

This is the situation when no copy is running:

free
        total     used     free  shared  buff/cache  available
Mem:  8100000  3444452   176580  185048     4478968    4318020
Swap: 4184768   994068  3190700

The flash drive is 128 GB and it is USB 2.0. I'll prepare a reproducer and post the meminfo log.
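
For a sense of scale, here is a rough estimate of how much data could be dirty before writers are throttled, and how long a slow device would need to drain it. It rests on two assumptions: dirtyable memory is approximated by the "available" column of free (the kernel's own accounting differs in detail), and the drive sustains about 10 MB/s, which is a guess for USB 2.0 flash rather than a measured figure. The function names are hypothetical.

```shell
# dirty_limit_kb RATIO_PERCENT AVAILABLE_KB
# Approximate dirty threshold in kB: RATIO_PERCENT of AVAILABLE_KB.
dirty_limit_kb() {
    echo $(( $2 * $1 / 100 ))
}

# flush_seconds LIMIT_KB RATE_KB_PER_S
# Whole seconds needed to write LIMIT_KB at RATE_KB_PER_S.
flush_seconds() {
    echo $(( $1 / $2 ))
}

# Values from the report: dirty_ratio 20, 4318020 kB available.
dirty_limit_kb 20 4318020      # prints 863604 (about 840 MB)
# Assumed write rate: 10 MB/s = 10240 kB/s.
flush_seconds 863604 10240     # prints 84
```

Under these assumptions, more than a minute of queued writeback to the slow device is possible, which is in line with multi-second stalls when reclaim has to wait on that IO.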