The pernicious USB-stick stall problem
Artem S. Tashkinov recently encountered a problem that will be familiar to at least some readers: write a large amount of data to a slow storage device (a USB stick, say), and the entire system proceeds to just hang, possibly for minutes. This time around, though, Artem made an interesting observation: the system would stall when running with a 64-bit kernel, but no such problem was experienced when using a 32-bit kernel on the same hardware. One might normally expect the block I/O subsystem to be reasonably well isolated from details like the word length of the processor, but, in this case, one would be surprised.
The problem
Linus was quick to understand what was going on here. It all comes down to the problem of matching the rate at which a process creates dirty memory to the rate at which that memory can be written to the underlying storage device. If a process is allowed to dirty a large amount of memory, the kernel will find itself committed to writing a chunk of data that might take minutes to transfer to persistent storage. All that data clogs up the I/O queues, possibly delaying other operations. And, as soon as somebody calls sync(), things stop until that entire queue is written. It's a storage equivalent to the bufferbloat problem.
The developers responsible for the memory management and block I/O subsystems are not entirely unaware of this problem. To prevent it from happening, they have created a set of tweakable knobs under /proc/sys/vm to control what happens when processes create a lot of dirty pages. These knobs are:
- dirty_background_ratio specifies a percentage of memory; when at least that percentage is dirty, the kernel will start writing those dirty pages back to the backing device. So, if a system has 1000 pages of memory and dirty_background_ratio is set to 10% (the default), writeback will begin when 100 pages have been dirtied.
- dirty_ratio specifies the percentage at which processes that are dirtying pages are made to wait for writeback. If it is set to 20% (again, the default) on that 1000-page system, a process dirtying pages will be made to wait once the 200th page is dirtied. This mechanism will, thus, slow the dirtying of pages while the system catches up.
- dirty_background_bytes works like dirty_background_ratio except that the limit is specified as an absolute number of bytes.
- dirty_bytes is the equivalent of dirty_ratio except that, once again, it is specified in bytes rather than as a percentage of total memory.
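For reference, the current settings can be inspected at any time with sysctl or by reading the files in /proc/sys/vm directly; this quick check (standard sysctl usage, not something from the article) shows the defaults described above:

sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_background_bytes vm.dirty_bytes
# vm.dirty_background_ratio = 10    <- background writeback starts at 10% dirty
# vm.dirty_ratio = 20               <- writers are throttled at 20% dirty
# vm.dirty_background_bytes = 0     <- zero means the byte-based limit is not in use
# vm.dirty_bytes = 0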
Setting these limits too low can affect performance: temporary files that will be deleted immediately will end up being written to persistent storage, and smaller I/O operations can lead to lower I/O bandwidth and worse on-disk placement. Setting the limits too high, instead, can lead to the sort of overbuffering described above.
The attentive reader may well be wondering: what happens if the administrator sets both dirty_ratio and dirty_bytes, especially if the values don't agree? The way things work is that either the percentage-based or byte-based limit applies, but not both. The one that applies is simply the one that was set last; so, for example, setting dirty_background_bytes to some value will cause dirty_background_ratio to be set to zero and ignored.
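That last-writer-wins behavior is easy to see from a root shell (a quick illustration, not taken from the article):

sysctl vm.dirty_background_ratio                        # 10 on a default system
sysctl -w vm.dirty_background_bytes=$((16*1024*1024))   # switch to a 16MB byte-based limit
sysctl vm.dirty_background_ratio                        # now 0: the ratio has been cleared and is ignored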
Two other details are key to understanding the behavior described by Artem: (1) by default, the percentage-based policy applies, and (2) on 32-bit systems, that ratio is calculated relative to the amount of low memory — the memory directly addressable by the kernel, not the full amount of memory in the system. In almost all 32-bit systems, only the first ~900MB of memory fall into the low-memory region. So on any current system with a reasonable amount of memory, a 64-bit kernel will implement dirty_background_ratio and dirty_ratio differently than a 32-bit system will. For Artem's 16GB system, the 64-bit dirty_ratio limit would be 3.2GB; the 32-bit system, instead, sets the limit at about 180MB.
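The arithmetic behind those two numbers is simple to check (the ~900MB figure is the low-memory approximation used above):

echo "$((20 * 16 * 1024 / 100)) MB"   # 64-bit: 20% of 16GB of RAM        -> 3276 MB, about 3.2GB
echo "$((20 * 900 / 100)) MB"         # 32-bit: 20% of ~900MB low memory  -> 180 MB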
The (huge) difference between these two limits is immediately evident when writing a lot of data to a slow storage device. The lower limit does not allow anywhere near as much dirty data to accumulate before throttling the process doing the writing, with much better results for the user of the system (unless said user wanted to give up in disgust and go for beer, of course).
Workarounds and fixes
When the problem is that clearly understood, one can start to talk about solutions. Linus suggested that anybody running into this kind of problem can work around it now by setting dirty_background_bytes and dirty_bytes to reasonable values. But it is generally agreed that the default values on 64-bit systems just don't make much sense on contemporary systems. In fact, according to Linus, the percentage-based limits have outlived their usefulness in general:
Things have changed.
Thus, he suggested, the defaults should be changed to use the byte-based limits; either that, or the percentage-based limits could be deemed to apply only to the first 1GB of memory.
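Until the defaults change, the workaround amounts to setting the byte-based knobs by hand. A hedged example of making that persistent is shown below; the 16MB/48MB values are the ones that recur in the comments on this article, not a recommendation from the article itself, and the right numbers depend on the hardware:

# /etc/sysctl.d/99-writeback.conf -- illustrative values only
vm.dirty_background_bytes = 16777216   # start background writeback once 16MB is dirty
vm.dirty_bytes = 50331648              # throttle writers once 48MB is dirty

# apply without rebooting:
sysctl --system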
Of course, it would be nicer to have smarter behavior in the kernel. The
limit that applies to a slow USB device may not be appropriate for a
high-speed storage array. The kernel has logic now that tries to estimate
the actual writeback speeds achievable with each attached device; with that
information, one could try to limit dirty pages based on the amount of time
required to write them all out. But, as Mel Gorman noted, this approach is "not that trivial to implement".
Andreas Dilger argued that the whole idea of building up large amounts of dirty data before starting I/O is no longer useful. The Lustre filesystem, he said, will start I/O with 8MB or so of dirty data; he thinks that kind of policy (applied on a per-file basis) could solve a lot of problems with minimal complexity. Dave Chinner, however, sees a more complex world where that kind of policy will not work for a wide range of workloads.
Dave, instead, suggests that the kernel focus on implementing two fundamental policies: "writeback caching" (essentially how things work now) and "writethrough caching," where much lower limits apply and I/O starts sooner. Writeback would be used for most workloads, but writethrough makes sense for slow devices or sequential streaming I/O patterns. The key, of course, is enabling the kernel to figure out which policy should apply in each case without the need for user intervention. There are some obvious indications, including various fadvise() calls or high-bandwidth sequential I/O, but, doubtless, there would be details to be worked out.
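Nothing here is settled policy, but a user can already approximate writethrough behavior for a single large copy with standard dd options (an illustration, not something proposed in the thread; oflag=direct requires the target filesystem to support direct I/O):

dd if=big.iso of=/mnt/usb/big.iso bs=4M oflag=direct     # bypass the page cache entirely
dd if=big.iso of=/mnt/usb/big.iso bs=4M conv=fdatasync   # cache normally, but flush all data before dd exits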
In the short term, though, we're most likely to see relatively simple
fixes. Linus has posted a patch limiting
the percentage-based calculations to the first 1GB of memory. This kind of
change could conceivably be merged for 3.13; fancier solutions, obviously,
will take longer.
| Index entries for this article | |
|---|---|
| Kernel | Memory management/Writeback |
Posted Nov 7, 2013 8:04 UTC (Thu)
by eru (subscriber, #2753)
[Link] (2 responses)
> but writethrough makes sense for slow devices

And also for all removable devices, because with them it is common you want to flush all pending writes and unmount.
Posted Nov 7, 2013 13:38 UTC (Thu)
by renox (guest, #23785)
[Link] (1 responses)
> And also for all removable devices, because with them it is common you want to flush all pending writes and unmount.

*all* removable devices?? What you describe is common for USB key, sure, but what about USB HDD? I'm sure that some users have such an HDD plugged in all the time and they would object to the performance degradation..
Posted Nov 8, 2013 3:39 UTC (Fri)
by eru (subscriber, #2753)
[Link]
> I'm sure that some users have such an HDD plugged in all the time and they would object to the performance degradation.

I am myself one of those, finding that the easiest way to expand storage on my old home PC. But I think that is still exceptional, and could be solved by passing a parameter when mounting (in fstab? I have those drives mounted the traditional way, instead of hot-plugging), or later via some tuning command.
Posted Nov 7, 2013 8:34 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
But... compare it to what we have now, which is *catastrophically* wrong by default, easily giving you a situation where you can have an amount of data that can take ten minutes to write out and jamming the entire system into a stall until it's done. In order to be as bad as what we have now, a rate-estimation-based system would have to be hit with a device which goes from above-SSD speeds to USB-key speeds on the fly -- and how likely is that?
In this case, I'd say, the perfect is the enemy of the good. What we have now is bad: simpleminded rate-estimation would be better, even if not perfect. Go for that first, pile in the complexity later, and throw away those horrible old knobs. (Whichever was written to last wins?! Ugh!)
Posted Nov 7, 2013 14:48 UTC (Thu)
by jzbiciak (guest, #5246)
[Link]
> a device which goes from above-SSD speeds to USB-key speeds on the fly

I guess it depends on whether there are devices that have their own in-built caching and can absorb quite a few writes until they slow down dramatically. They could exhibit rather bimodal behavior based on the size of the incoming writes.

Also, where does NFS fit in the picture? There, performance may fluctuate quite a bit as well, although I don't know if it's affected by this particular set of knobs. (Seems like it ought to be.) With NFS, you have the combined effects of the buffering on the NFS server as well as all the other people on the network vying for the same bandwidth.

My gut feel tells me any simple-minded rate estimator should also have a fairly quick adaptation rate so it tracks any workload-dependent behavior and other variations in media performance. I.e., it should probably represent recent history (the last several seconds), more than long-term history.
Posted Nov 7, 2013 8:43 UTC (Thu)
by johannbg (guest, #65743)
[Link]
Posted Nov 7, 2013 16:17 UTC (Thu)
by luto (guest, #39314)
[Link] (2 responses)

> The one that applies is simply the one that was set last; so, for example, setting dirty_background_bytes to some value will cause dirty_background_ratio to be set to zero and ignored.

Seriously? I bet that no one (especially the sysctl tool) gets this right.
Posted Nov 7, 2013 21:43 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
In any case the percentage should be based on the same numbers--total RAM, not low memory--regardless of the word size.
Posted Nov 8, 2013 6:42 UTC (Fri)
by neilbrown (subscriber, #359)
[Link]
Guess what. The kernel doesn't even get it right!! Almost but not quite.
There is a global variable "ratelimit_pages" which is effectively a granularity - we only do the expensive tests every "ratelimit_pages" pages.
This gets updated whenever you set dirty_ratio or dirty_bytes. It is set to dirty_thresh / (num_online_cpus() * 32)
However if you set "dirty_bytes" and then "dirty_ratio", the second calculation of ratelimit_pages will be based on the old "dirty_bytes" value, not the new "dirty_ratio" value.
It's a minor bug, but it confirms your assertion that this is an easy interface to get wrong.
[dirty_ratio_handler() should set "vm_dirty_bytes = 0" *before* the call to writeback_set_ratelimit()]
Posted Nov 7, 2013 23:43 UTC (Thu)
by jhhaller (guest, #56103)
[Link] (3 responses)
My question is whether writing dirty pages back more quickly will have a big effect on COW filesystem performance. Copying a large file to a COW filesystem may trigger more COW actions than before.
Posted Nov 11, 2013 0:45 UTC (Mon)
by giraffedata (guest, #1954)
[Link] (2 responses)
fsync is a rather primitive way to cause the system not to keep memory needlessly dirty.

Lots of fadvise and madvise flags have been developed over the years; doesn't one of them get the writeback happening immediately? I created a TV recorder based on a ca 2004 Linux kernel that used msync(MS_ASYNC) for that purpose (yes, I made it write the file via mmap just so I could use msync).
Of course, if the issue is that you think the recorder might not actually be able to keep up with the data arriving, and want to make sure when that happens it just drops TV data and doesn't cripple the rest of the system too, then you do need something synchronous like fsync.
Posted Nov 11, 2013 1:59 UTC (Mon)
by jhhaller (guest, #56103)
[Link] (1 responses)
Posted Nov 11, 2013 4:16 UTC (Mon)
by giraffedata (guest, #1954)
[Link]
Yes, I was not responding to the point. The actual point lost me, because I don't see the connection between the high-volume writing and reading you described and copy-on-write and the stalling of the playback. But I also don't know anything about mythtv or btrfs specifically.
I don't think there's anything inherent in COW that means if you flush a large file write sooner that you make more copies of tree data or some group of blocks, but I may have just totally missed the scenario you have in mind.
Posted Nov 8, 2013 16:12 UTC (Fri)
by ssam (guest, #46587)
[Link] (2 responses)
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio
(I get 10 and 20 on fedora), and then change them to smaller numbers with something like
echo 2 > /proc/sys/vm/dirty_background_ratio
echo 5 > /proc/sys/vm/dirty_ratio
in your /etc/rc.d/rc.local
Posted Nov 9, 2013 4:07 UTC (Sat)
by naptastic (guest, #60139)
[Link]
Posted Nov 11, 2013 0:34 UTC (Mon)
by giraffedata (guest, #1954)
[Link]
I don't see how that will make things noticeably better. In fact, it will probably make it worse.

With the defaults, your system can write up to .2M (where M is the size of your memory) to the USB stick before the system slows to a crawl. Remember that not just the process writing to the stick must wait for writeback before it can dirty more pages - all processes must. With your proposed numbers, the crawl happens for writes to the stick as small as .05M.
Lowering dirty_background_ratio will give your system a probably imperceptible head start (and thus earlier finish) on writing all that data to the USB stick. The headstart will be the amount of time it takes to buffer .08M of writes (10% - 2%).
These workarounds using global parameters can help only in carefully constructed cases. To prevent a slow write to USB from affecting non-USB-writing processes, the kernel would need some kind of memory allocation scheme that distinguishes processes or distinguishes write speed of backing devices.
Posted Nov 9, 2013 4:25 UTC (Sat)
by naptastic (guest, #60139)
[Link]
* - dirty_background_ratio is a signed 32-bit value. If our "fixed value" were 1MiB, you could specify up to 2TiB of buffer space in the future. When we get to 128-bit architectures, we might want to increase the size of dirty_background_ratio to accommodate larger buffers.
Posted Nov 9, 2013 6:58 UTC (Sat)
by Nagilum (guest, #93411)
[Link] (2 responses)

vm.dirty_background_ratio=0

or some other very low number (0..5). Personally I see no reason to delay starting to write dirty data out other than power saving. There is usually only a very slim chance that data will be written that will be deleted right away again so delaying starting to flush the data to disk makes very little sense to me.

If you have multiple writers it may also help with the performance if you have a higher value here but high for me is something like 5.
Posted Nov 15, 2013 13:40 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
You're obviously not a developer (or gentoo user). I have a huge (20/30Gb ramdisk) for temp precisely because I quite often have gigs of data that gets created and deleted pretty quick. What's the point of writing it to disk when my system has plenty of ram?
Cheers,
Wol
Posted Nov 15, 2013 14:03 UTC (Fri)
by Nagilum (guest, #93411)
[Link]

Anyway if nothing else is waiting for IO on that disk then it still wouldn't bother you very much since it won't block anything.

Anyhow you have your use-case solved.
Posted Nov 9, 2013 8:54 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (2 responses)
Good analogy. It stops at the cure though: with TCP/IP it's all about dropping packets! Back-pressure is extremely rare in networking because of Head Of Line blocking.
Speaking of Head Of Line blocking, I suspect the queues involved in this article don't make the difference between users, do they? In other words, someone writing a lot to a slow device will considerably slow other users, correct?
(Yes, I do realize USB sticks don't tend to have a lot of concurrent users :-)
Posted Nov 9, 2013 15:30 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (1 responses)
Posted Nov 11, 2013 5:32 UTC (Mon)
by jzbiciak (guest, #5246)
[Link]
Posted Nov 14, 2013 10:02 UTC (Thu)
by callegar (guest, #16148)
[Link]
Posted Nov 14, 2013 19:29 UTC (Thu)
by chojrak11 (guest, #52056)
[Link] (3 responses)
Posted Nov 14, 2013 19:34 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (2 responses)
One result of this is that average-case data transfer on Windows is noticeably slower than on Linux, which is probably why the kernel folks are loath to copy such a solution.
Posted Nov 14, 2013 19:51 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)

This is classic case of "perfect is the enemy of good". Yes, Windows commits everything to USB stick right away, yes it's slow and inefficient, but it also means that you can actually use USB sticks without worry! With Linux small operations are extremely fast, but try to copy few gigs of data to USB stick and be ready to use a different computer for a few minutes (or hours) because system will be totally hosed.
Posted Nov 14, 2013 21:48 UTC (Thu)
by hummassa (subscriber, #307)
[Link]
Ah, and in my job I see hundreds of USB drives becoming totally hosed every year, by way users writing them on a windows machine and removing without unmounting. Don't do that, even on Windows.
Posted Nov 15, 2013 11:09 UTC (Fri)
by jezuch (subscriber, #52988)
[Link] (1 responses)
I remember there was an issue with page locking, I think, some time ago that was solved with stable pages. But it still doesn't make much sense.
Posted Jan 8, 2014 10:26 UTC (Wed)
by jospoortvliet (guest, #33164)
[Link]
Posted Nov 15, 2013 21:02 UTC (Fri)
by HenrikH (subscriber, #31152)
[Link] (1 responses)
Posted Nov 17, 2013 4:25 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link]
- Heavy output on any local terminal (switching workspaces or to another tmux window makes it work better, so closer to the X side of the I/O pipeline; the switch might not take effect for 20–30 seconds though)
- rdesktop (I have to hide Windows' TTY window(s) during a build since its speed affects my local machine even when on another, hidden, desktop locally)
Posted Jun 12, 2015 20:41 UTC (Fri)
by evultrole (guest, #103116)
[Link] (1 responses)
Got the problem fixed with a custom udev rule.
/usr/lib/udev/rules.d/81-udisks_maxsect.rules
SUBSYSTEMS=="scsi", ATTR{max_sectors}=="240", ATTR{max_sectors}="32678"
My hangs disappeared after a reboot.
Posted Jun 13, 2015 8:04 UTC (Sat)
by paulj (subscriber, #341)
[Link]
Posted Nov 7, 2018 21:11 UTC (Wed)
by sourcejedi (guest, #45153)
[Link]
"The entire system proceeds to just hang" - I think this is misleading :-(. Artem didn't report this, and I don't see any other evidence presented for it.
I am hopeful that it is prevented, or at least mitigated, by the "No-I/O dirty throttling" code that you reported on in 2011 :-). This throttles write() calls to control both the size of the overall writeback cache, and the amount of writeback cache *for the specific backing device*.
Artem did not report the entire system hanging while it flushes cached writes to a USB stick. His report only complained the "sync" command could take up to "dozens of minutes".
In his followup message, Artem reported "the server almost stalls and other IO requests take a lot more time to complete even though `mysqldump` is run with `ionice -c3`". But this was not the USB-stick problem. It happened after creating a 10GB file on an *internal* disk.
I'm not saying there isn't a bufferbloat-style problem. But I can't find any evidence here that excessive writeback cache on one BDI is delaying writes to other BDIs. At least in the simple case you described.
I wrote a StackExchange post about this here.
Posted Apr 5, 2020 16:10 UTC (Sun)
by abdulla95 (guest, #138076)
[Link]
echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes
echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes
Didn't help me. The performance did improve but it would still lag. (I have a 1TB HDD and 8GB RAM)
My question is, is using a hack to go around this a good thing? Like `ionice`, `rsync`, `pv`? I have seen these being thrown around in the internet. And I have used rsync and it works.
Posted Mar 5, 2021 14:23 UTC (Fri)
by kolay.ne (guest, #145247)
[Link] (2 responses)

Still having this issue
Posted Apr 18, 2021 13:19 UTC (Sun)
by LaurentD (guest, #151713)
[Link] (1 responses)
That said, I concur. 2021 already and still seeing the issue here as well. The below does not seem to help much.
echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes
echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes
I wonder: how have other OSs addressed the issue?
Posted Apr 20, 2021 0:42 UTC (Tue)
by flussence (guest, #85566)
[Link]
More serious answer: has anyone benchmarked the bufferbloat in writing directly to a USB stick compared to spinning up a VM, handing the USB device to that, exposing the stick as an NFS share and writing that way? I honestly wouldn't be surprised if the latter works better.
Posted Aug 20, 2021 20:10 UTC (Fri)
by xmready (guest, #153808)
[Link] (2 responses)

I keep imagining a new user going to linux and facing this problem, having to wait 2 hours to unmount his usb stick 2.0 and having to go out looking for a manual solution on google

In my case i am not a new user but i had this problem and i ended up finding a solution and now i can unmount the pendrive as soon as the copy progress bar ends (just like it is in windows)
Posted Aug 21, 2021 1:25 UTC (Sat)
by pizza (subscriber, #46)
[Link] (1 responses)
Patches welcome!
Posted Jul 12, 2023 19:43 UTC (Wed)
by juliano_vs (guest, #166031)
[Link]
correction:

create the file:
/etc/udev/rules.d/60-usb-dirty-pages-udev.rules
with the following content:
ACTION=="add", KERNEL=="sd[a-z]", SUBSYSTEM=="block", ENV{ID_USB_TYPE}=="disk", RUN+="/usr/bin/bash -c 'echo 1 > /sys/block/%k/bdi/strict_limit; echo 16777216 > /sys/block/%k/bdi/max_bytes'"
then restart your machine !

After so many years this should already have a definitive solution in the kernel and not need manual intervention by the user. It's this kind of thing that keeps new users away from the system