GNU C Library version 2.39
Posted Feb 7, 2024 12:34 UTC (Wed) by farnz (subscriber, #17727)
There's a related issue: distros don't currently set per-bdi writeback limits "sensibly" (and indeed, "sensible" is in the eye of the beholder here). If the user expects to remove a device, the per-bdi writeback limits should be set so that the kernel won't buffer more than a second or so of writeback, accepting that the consequence is that all operations on the device slow down as soon as there's a small amount of dirty data to write. For devices intended to be permanently connected, the writeback limits should be large, so that the kernel can delay writes for longer and only pays the penalty of delaying if a large amount of data is pending writeback at shutdown time.
The kernel can't do this itself, because the policy about "permanent" versus "removable" isn't known to the kernel; if you have a USB SSD attached as the main drive for a Raspberry Pi, that's "permanent", and a large writeback limit is reasonable. If you have a USB SSD plugged into that same RPi so that you can copy data to it and then unplug it to move it to another location, that's "removable", and you want the writeback limit to be small.
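As a sketch of where that policy could live in userspace (purely illustrative; it assumes the per-bdi strict_limit and max_bytes attributes present since kernel 6.2, and that "USB-attached" is a good enough proxy for "removable"), a udev rule could cap the dirty buffer for USB disks:

    # Sketch only: match USB-attached whole disks and cap their writeback buffer.
    # %M/%m expand to the device's major/minor numbers, which name its bdi.
    ACTION=="add", SUBSYSTEM=="block", SUBSYSTEMS=="usb", ENV{DEVTYPE}=="disk", \
      RUN+="/bin/sh -c 'echo 1 > /sys/class/bdi/%M:%m/strict_limit && echo 4194304 > /sys/class/bdi/%M:%m/max_bytes'"

A distro could ship something along these lines and let users override it for USB devices that really are permanent, like the RPi boot drive case.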
Posted Feb 7, 2024 13:04 UTC (Wed) by paulj (subscriber, #341)
Interesting parallels. ;)
Posted Feb 7, 2024 14:00 UTC (Wed) by farnz (subscriber, #17727)
The difference is that a networking host doesn't have any way to determine the path capacity - indeed, it can change significantly over time. A storage host usually does have a way to determine the device capacity; we can make good guesses at the number of IOPS the device can do, and at how large each I/O can be before it reduces the number of IOPS the device can handle.
Also, we have the weirdness that for some devices, we want the buffer to be bloated, and for others, we don't; /, /home and other internal filesystems on my laptop can have a very bloated buffer, since there's no practical cost to it, but there is a potential gain from a huge buffer (turning lots of small operations into a smaller number of big operations).
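For the internal-filesystem case, the size of that buffer is governed by the global vm.dirty_* sysctls; as an illustrative sketch (the 8 GiB figure is just an example, not a recommendation), you could inspect and raise them like this:

    # Show the current global writeback thresholds (ratios are percentages of available memory)
    sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_background_bytes vm.dirty_bytes
    # Example only: start background writeback once ~8 GiB of dirty data has accumulated
    sysctl -w vm.dirty_background_bytes=$((8 * 1024 * 1024 * 1024))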
Posted Feb 7, 2024 15:30 UTC (Wed) by paulj (subscriber, #341)
One difference you're raising there is that in the storage case, you have what the networking world would call "content addressable networking". I.e., the process specifies /what/ content to read and write, thus allowing the system (a tiny distributed system, in a sense) to offer caching (including write caching) at various levels. This is something the networking world generally lacks, sadly (?). In networking, the reads/writes are generally intimately tied to the location of the data. Caching is thus minimal, and we have to build very complicated systems to virtualise the location of the data /somewhat/ (within the scope of that complicated system).
Of course, the single-system storage model morphs into that same problem once it exceeds the capacity of the highly-cohesive, coherent single-system model, as the answer will involve introducing much less coherent technologies, i.e. networking. ;)
Posted Feb 7, 2024 16:32 UTC (Wed) by farnz (subscriber, #17727)
The other important difference is that in the storage case, we rarely care about the effects of congestion on shared links. Either we can afford to wait when the in-memory kernel cache flushes out to the device (the internal drive case), or we want to keep the cache small compared to the speed of the device so that it's quick to flush when needed (the removable drive case), and we very rarely have links slower than the devices they're connecting (even under congestion).
Posted Feb 7, 2024 16:55 UTC (Wed) by paulj (subscriber, #341)
But that was what my first comment was pointing at: the noted use-case of /end-process/ workload pacing would start to introduce some of that functionality into the user process (which is equivalent to the "end host"). ;) Who knows where that leads in the future. ;)
Maybe at some point the coherent single-system becomes more of an explicit distributed system. (It already is a distributed system, but hides it very well; HyperTransport, PCIe, etc., are all at least packet-based, but non-blocking and [very nearly] perfectly reliable - making the presentation of a very coherent system much easier than with networking.)
Posted Feb 15, 2024 11:32 UTC (Thu) by tanriol (guest, #131322)
And then the RAM cache of the SSD fills up and the available bandwidth drops. And then the SLC cache area of the SSD fills up and it drops again.
Posted Feb 8, 2024 4:40 UTC (Thu) by intelfx (subscriber, #130118)
Could you please clarify how I make these per-bdi limits work? :-)
Posted Feb 8, 2024 10:28 UTC (Thu) by farnz (subscriber, #17727)
Tested on 6.6.6 and 6.7.3 kernels, on a Fedora 39 system. I have a slow USB device as /dev/sda, which is therefore bdi 8:0. To restrict it to 1 second of dirty data, I need to run two commands as root:
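In outline, the two commands are along these lines (a sketch using the strict_limit and max_bytes attributes documented under /sys/class/bdi/, rather than a verbatim transcript):

    # Enforce this device's own limit even below the global background threshold
    echo 1 > /sys/class/bdi/8:0/strict_limit
    # Cap dirty data for this device at 4 MiB (roughly one second of writes)
    echo $(( 4 * 1024 * 1024 )) > /sys/class/bdi/8:0/max_bytes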
Per the documentation for bdi limits, the first command tells the kernel that this device's limits must be respected even if the global background dirty limit isn't reached; my system has a global background dirty limit of around 8 GiB, so without this, any limit I set below 8 GiB is ignored.
The second sets the actual limit - in this case, 1 second of writes to the device, which is 4 MiB of data. You can see why strict limits matter here, though - without strict limits, the global background dirty limit would push me to 8 GiB of dirty data before the per-bdi limits are even checked, unless I had a lot of writes in progress to other devices. And I want relatively large global limits, since my laptop has a big battery, and when I build a Yocto image I've often got many readers in parallel that need time on the NVMe drive, along with writes that can be delayed; I'd prefer the writes to wait so that I'm blocking on CPU, not on I/O.
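To sanity-check the result, one illustrative approach (again assuming the device is bdi 8:0):

    # Confirm the per-bdi knobs hold the values written above
    cat /sys/class/bdi/8:0/strict_limit /sys/class/bdi/8:0/max_bytes
    # Watch system-wide dirty and writeback pages while copying to the device
    grep -E '^(Dirty|Writeback):' /proc/meminfo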