Bye-bye bdflush()
Linux, like most operating systems, buffers filesystem I/O through memory; a write() call results in a memory copy into the kernel's page cache, but does not immediately result in a write to the underlying block storage device. This buffering is necessary for writes of anything other than complete blocks; it is also important for filesystem performance. Deferring block writes can allow operations to be coalesced, provide opportunities for better on-disk file layout, and enable the batching of operations.
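The gap between "written" and "on disk" can be seen from user space. The following is a minimal sketch, not anything taken from the kernel: write_durably() is a hypothetical helper name, and it assumes a POSIX system where fsync() is the way to ask for one file's buffered data to be pushed out to the device.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then ask the kernel to
 * push them out to the storage device before returning.
 * Returns 0 on success, -1 on any failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        /* write() normally returns once the data is in the page cache. */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() blocks until this file's dirty data has been written back. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call, the data would simply sit in memory until the kernel's background writeback (the subject of the rest of this article) got around to it.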
Buffered file data cannot be allowed to live in memory forever, though; eventually the system must arrange for it to be flushed back to disk. Even the 0.01 Linux release included a version of the sync() system call, which forces all cached filesystem data to be written out. While the kernel would flush some buffers when the buffer cache (which preceded the page cache and was a fixed-size array at that time) filled up, there was no provision for regularly ensuring that all buffers were pushed out to disk. That task was, if your editor's memory serves, handled by a user-space process that would occasionally wake up and call sync().
There are advantages to handling this task in the kernel, though; it has a much better idea of the state of both the buffer cache and the underlying devices. As a step in that direction, the bdflush() system call was added to the 0.99.14y release on February 2, 1994. (This was a different era of kernel development; the preceding 0.99.14x release came out seven hours earlier, and 0.99.14z came out nine hours later.) That implementation was not particularly useful, though; all it did was return a "not implemented" error. An actual bdflush() implementation was not added until the 1.1.3 development kernel in April 1994.
It must be said that bdflush() was a strange system call. It was defined as:
int bdflush(int func, long data);
If func was zero, bdflush() would never return; instead, it would loop within the kernel, occasionally flushing out dirty buffers. In essence, a user-space process would become the kernel buffer-flushing thread by making that call; these were the days before proper kernel threads, after all. Passing func as one would cause some buffers to be flushed immediately. Higher values of func would either read or write the value of a control parameter for the flushing thread; these included the percentage of dirty buffers needed to activate flushing, the number of blocks to write in each cycle, etc.
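The way func selected an operation can be modeled in a few lines of user-space C. This is only a sketch: bdflush_decode() and its types are hypothetical names, and the details beyond the text above (non-positive func values being treated like zero, even values reading a parameter and odd values writing one, with index (func - 2) / 2) follow the bdflush(2) manual page as best recalled and should be treated as assumptions.

```c
/* What a given func value asked bdflush() to do (user-space model only). */
enum bdflush_action {
    BDFLUSH_BECOME_DAEMON, /* func <= 0: never return, flush periodically */
    BDFLUSH_FLUSH_NOW,     /* func == 1: flush some buffers immediately */
    BDFLUSH_READ_PARAM,    /* func >= 2, even: read a tunable (assumed) */
    BDFLUSH_WRITE_PARAM    /* func >= 3, odd: write a tunable (assumed) */
};

struct bdflush_request {
    enum bdflush_action action;
    int param; /* index of the tunable, or -1 if not applicable */
};

/* Hypothetical decoder; it performs no flushing and makes no system call. */
struct bdflush_request bdflush_decode(int func)
{
    struct bdflush_request req = { BDFLUSH_BECOME_DAEMON, -1 };

    if (func <= 0)
        return req;
    if (func == 1) {
        req.action = BDFLUSH_FLUSH_NOW;
        return req;
    }
    /* Assumed encoding: two consecutive func values per tunable. */
    req.param = (func - 2) / 2;
    req.action = (func % 2 == 0) ? BDFLUSH_READ_PARAM : BDFLUSH_WRITE_PARAM;
    return req;
}
```

Under these assumptions, func values 2 and 3 would read and write the first tunable, 4 and 5 the second, and so on; the data argument carried either the destination address or the new value.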
While bdflush() was an improvement, there were a number of problems with it as well. One of those was that it relied on user space for a critical kernel function; if no process ever set itself up with bdflush(), or if that process were killed, bad things would happen. In the 1.3.50 development release (December 1995), the kernel was changed to automatically create a kernel thread (something it could do at that point) to do the flushing work. User space could still call bdflush() to tweak the various parameters, but an attempt to run as the flushing daemon would turn into an immediate call to exit(); that caused the update process started by older init systems to "work", avoiding boot-time unhappiness.
Another problem with bdflush() (or, more specifically, with the underlying implementation) is that, from the beginning, it ran as a single thread. As Linux grew in popularity and found itself on bigger systems, that single thread became an increasingly severe bottleneck. If you have a number of drives on a system, it will eventually take multiple threads to keep them all busy. Thus Andrew Morton replaced the remaining bdflush() infrastructure entirely in 2002 for the 2.5.8 development kernel; in its place was a new set of kernel threads called pdflush. Each pdflush thread was dedicated to a separate physical drive, providing a much-needed scalability improvement.
In December 2002, Morton merged a patch from Robert Love formally deprecating the bdflush() system call, promising that it "will be removed in a future kernel". The pdflush threads were removed in 2009 (for 2.6.32) in favor of a rather more elaborate, workqueue-based, writeback-control mechanism; its workers can still be seen in the form of kernel threads with names like kworker/u8:3-flush-259:0. Meanwhile, though, bdflush() lives on in current kernels, even though it has not done anything for many years.
Now, however, Eric Biederman is proposing to remove bdflush() entirely as part of his larger project to rework the kernel's exit() code. Given that this system call does nothing, was never widely used in the first place, and has been deprecated for nearly 19 years, one might confidently conclude that there are no users left. As it turns out, though, Geert Uytterhoeven has an old m68k image that he occasionally boots, presumably on days when he is overcome with nostalgia. Michael Schmitz has demonstrated, however, that said image still boots successfully in the absence of bdflush(), so it is not an impediment to the system call's removal.
There are no other known users of bdflush() out there, so there would appear to be nothing preventing this removal from happening. At that point, bdflush() will be the first system call removed since late 2019, when sysctl() was deleted, also by Eric Biederman. It would be surprising to see that happen in 5.14, though, given how recently this patch was posted. This system call has endured for almost 19 years after it ceased to be useful; keeping it for another two months until 5.15 does not seem like much of an imposition.
Index entries for this article
Kernel: System calls/bdflush()
Bye-bye bdflush()
Posted Jul 5, 2021 18:41 UTC (Mon) by mb (subscriber, #50428)
The noflushd (http://noflushd.sourceforge.net/) daemon used bdflush() in versions noflushd-2.6.3 and earlier.
With 2.7 I ported it to /proc/sys/vm/dirty_writeback_centisecs.

Bye-bye bdflush()
Posted Jul 5, 2021 21:20 UTC (Mon) by dharding (subscriber, #6509)

Proper cleanup
Posted Jul 5, 2021 22:56 UTC (Mon) by jreiser (subscriber, #11027)

Proper cleanup
Posted Jul 5, 2021 23:54 UTC (Mon) by dvrabel (subscriber, #9500)
-114 common bdflush sys_bdflush
+114 common bdflush sys_ni_syscall
So the system call numbers are still reserved and will now return -ENOSYS.

Bye-bye bdflush()
Posted Jul 5, 2021 23:53 UTC (Mon) by geofft (subscriber, #59789)

Bye-bye bdflush()
Posted Jul 6, 2021 16:06 UTC (Tue) by BenHutchings (subscriber, #37955)
> So I think the compatibility layer for userspace bdflush daemons never worked, right?
It's ugly but I'm sure it worked because it had to work when it was introduced. Also bear in mind that the different functions (daemon, flush-some, and tunables) were removed in stages. Had they been removed at the same time, probably the behaviour would be a bit different. The existing callers that passed func <= 0 would presumably exit if it ever returned, so there was no need for a forceable exit. The existing callers that passed func == 1 would presumably do so in an infinite loop (with a sleep). So these were completely unnecessary processes that could be cleaned up with a forceable exit.

Bye-bye bdflush()
Posted Jul 6, 2021 9:16 UTC (Tue) by error27 (subscriber, #8346)

Bye-bye bdflush()
Posted Jul 7, 2021 0:57 UTC (Wed) by dgc (subscriber, #6611)
Maybe in 2002 that was worth something. However, we have been limited in writeback performance since the start of the SSD era (i.e. since ~2008) or so by having only a single flusher thread per physical block device. Writeback, especially with the delayed allocation design XFS, ext4, btrfs and other modern filesystems have, hits single CPU usage limits long before SSDs hit their hardware capability limits.
-Dave.

Bye-bye bdflush()
Posted Jul 7, 2021 16:59 UTC (Wed) by mwsealey (subscriber, #71282)

Bye-bye bdflush()
Posted Jul 8, 2021 23:07 UTC (Thu) by gerdesj (subscriber, #5446)
"That task was, if your editor's memory serves, handled by a user-space process that would occasionally wake up and call sync()."
I still type sync occasionally. Usually after a dd session to a USB stick and I can't be arsed to grab the mouse and find the widget, right click and "safely remove" or whatever it is called. Sometimes I even eject the device first before whipping it out *.
You don't get more user-space than that! I'm probably not alone either.
(*) Ooerr missus

Bye-bye bdflush()
Posted Jul 13, 2021 10:12 UTC (Tue) by ghane (guest, #1805)
So does the userspace sync command do anything at all?
A long time ago, I was told that one always says:
sync; sync; sync
on a multi-user Unix (there were no desktops) to move other people's data closer to the end of the line, so that my data would fall off onto the disk.
--
Sanjeev

Bye-bye bdflush()
Posted Jul 14, 2021 11:24 UTC (Wed) by sandsmark (guest, #62172)
sync (the command) seems like it calls sync() by default: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/...
And I didn't know before I looked at the source that you could specify a file, but that is neat.