Bye-bye bdflush()

By Jonathan Corbet
July 5, 2021

The addition of system calls to the Linux kernel is a routine affair; it happens during almost every merge window. The removal of system calls, by contrast, is far less common. That appears likely to happen soon, though, as discussions proceed on the removal of bdflush(). Read on for a look at the purpose and history of this obscure system call and to learn whether you will miss it (you won't).

Linux, like most operating systems, buffers filesystem I/O through memory; a write() call results in a memory copy into the kernel's page cache, but does not immediately result in a write to the underlying block storage device. This buffering is necessary for writes of anything other than complete blocks; it is also important for filesystem performance. Deferring block writes can allow operations to be coalesced, provide opportunities for better on-disk file layout, and enable the batching of operations.

Buffered file data cannot be allowed to live in memory forever, though; eventually the system must arrange for it to be flushed back to disk. Even the 0.01 Linux release included a version of the sync() system call, which forces all cached filesystem data to be written out. While the kernel would flush some buffers when the buffer cache (which preceded the page cache and was a fixed-size array at that time) filled up, there was no provision for regularly ensuring that all buffers were pushed out to disk. That task was, if your editor's memory serves, handled by a user-space process that would occasionally wake up and call sync().
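As a rough illustration of that user-space approach, a periodic flusher needs little more than a loop around sync(). The sketch below is not taken from any historical daemon; the function name, the interval, and the iteration cap are assumptions made so that the example terminates.

```c
/* Sketch of an old-style user-space flusher: wake up periodically and
 * ask the kernel to write out cached filesystem data. The name
 * periodic_sync() and its parameters are hypothetical. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <unistd.h>

/* Call sync() `rounds` times, sleeping `interval_secs` seconds between
 * calls. A real daemon would loop forever; the cap keeps the sketch
 * finite. Returns the number of sync() calls made. */
int periodic_sync(unsigned int interval_secs, int rounds)
{
    int done = 0;

    assert(rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        sync();                   /* flush all cached filesystem data */
        done++;
        if (i + 1 < rounds)
            sleep(interval_secs); /* wait before the next pass */
    }
    return done;
}
```

A call such as periodic_sync(30, 10) would issue ten sync() calls about thirty seconds apart; the interval shown here is only an example value.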

There are advantages to handling this task in the kernel, though; it has a much better idea of the state of both the buffer cache and the underlying devices. As a step in that direction, the bdflush() system call was added to the 0.99.14y release on February 2, 1994. (This was a different era of kernel development; the preceding 0.99.14x release came out seven hours earlier, and 0.99.14z came out nine hours later.) That implementation was not particularly useful, though; all it did was return a "not implemented" error. An actual bdflush() implementation was not added until the 1.1.3 development kernel in April 1994.

It must be said that bdflush() was a strange system call. It was defined as:

    int bdflush(int func, long data);

If func was zero, bdflush() would never return; instead, it would loop within the kernel, occasionally flushing out dirty buffers. In essence, a user-space process would become the kernel buffer-flushing thread by making that call; these were the days before proper kernel threads, after all. Passing func as one would cause some buffers to be flushed immediately. Higher values of func would either read or write the value of a control parameter for the flushing thread; these included the percentage of dirty buffers needed to activate flushing, the number of blocks to write in each cycle, etc.
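The way func selected an operation can be modeled in a few lines of user-space C. This is only a sketch of the dispatch described above, not kernel code. The type and function names are invented for illustration, and two details are assumptions drawn from the bdflush(2) man page as best recalled rather than from this article: that negative values behave like zero, and that even values of func from 2 upward read tuning parameter (func - 2) / 2 while odd values from 3 upward write parameter (func - 3) / 2.

```c
/* User-space model of how bdflush() interpreted its func argument.
 * All names here are hypothetical; the parameter numbering is an
 * assumption based on the man page, not on the kernel source. */
#include <assert.h>

enum bdflush_action {
    BDFLUSH_RUN_DAEMON,  /* func <= 0: become the flushing daemon */
    BDFLUSH_FLUSH_SOME,  /* func == 1: flush some buffers now */
    BDFLUSH_READ_PARAM,  /* func >= 2 and even: read a tunable */
    BDFLUSH_WRITE_PARAM  /* func >= 3 and odd: write a tunable */
};

struct bdflush_request {
    enum bdflush_action action;
    int param;           /* tunable index, or -1 if not applicable */
};

struct bdflush_request decode_bdflush_func(int func)
{
    struct bdflush_request req = { BDFLUSH_RUN_DAEMON, -1 };

    if (func <= 0)
        return req;
    if (func == 1) {
        req.action = BDFLUSH_FLUSH_SOME;
        return req;
    }
    /* Even values read, odd values write; both map to the same index. */
    req.action = (func % 2 == 0) ? BDFLUSH_READ_PARAM : BDFLUSH_WRITE_PARAM;
    req.param = (func - 2) / 2;
    assert(req.param >= 0);
    return req;
}
```

Under those assumptions, decode_bdflush_func(4) describes a read of tunable 1, and decode_bdflush_func(5) a write of the same tunable.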

While bdflush() was an improvement, there were a number of problems with it as well. One of those was that it relied on user space for a critical kernel function; if no process ever set itself up with bdflush(), or if that process were killed, bad things would happen. In the 1.3.50 development release (December 1995), the kernel was changed to automatically create a kernel thread (something it could do at that point) to do the flushing work. User space could still call bdflush() to tweak the various parameters, but an attempt to run as the flushing daemon would turn into an immediate call to exit(); that caused the update process started by older init systems to "work", avoiding boot-time unhappiness.

Another problem with bdflush() — or, more specifically, with the underlying implementation — since the beginning is that it was a single thread. As Linux grew in popularity and found itself on bigger systems, that single thread became an increasingly severe bottleneck. If you have a number of drives on a system, it will eventually take multiple threads to keep them all busy. Thus Andrew Morton replaced the remaining bdflush() infrastructure entirely in 2002 for the 2.5.8 development kernel; in its place was a new set of kernel threads called pdflush. Each pdflush thread was dedicated to a separate physical drive, providing a much-needed scalability improvement.

In December 2002, Morton merged a patch from Robert Love formally deprecating the bdflush() system call, promising that it "will be removed in a future kernel". The pdflush threads were removed in 2009 (for 2.6.32) in favor of a rather more elaborate, workqueue-based, writeback-control mechanism; its workers can still be seen in the form of kernel threads with names like kworker/u8:3-flush-259:0. Meanwhile, though, bdflush() lives on in current kernels, even though it has not done anything for many years.

Now, however, Eric Biederman is proposing to remove bdflush() entirely as part of a larger project of his to rework the kernel's exit() code. Given that this system call does nothing, was never widely used in the first place, and has been deprecated for nearly 19 years, one might confidently conclude that there are no users left. As it turns out, though, Geert Uytterhoeven has an old m68k image that he occasionally boots, presumably on days when he is overcome with nostalgia. Michael Schmitz demonstrated, however, that said image still boots successfully in the absence of bdflush(), so it is not an impediment to the system call's removal.

There are no other known users of bdflush() out there, so there would appear to be nothing preventing this removal from happening. At that point, it will be the first system call removed since late 2019, when sysctl() was deleted — by the same Eric Biederman. It would be surprising to see that happen in 5.14, though, given how recently this patch was posted. This system call has endured for almost 19 years after it ceased to be useful; keeping it for another two months until 5.15 does not seem like much of an imposition.

Bye-bye bdflush()

Posted Jul 5, 2021 18:41 UTC (Mon) by mb (subscriber, #50428)

> There are no other known users of bdflush() out there

The noflushd (http://noflushd.sourceforge.net/) daemon used bdflush() in versions noflushd-2.6.3 and earlier.
With 2.7 I ported it to /proc/sys/vm/dirty_writeback_centisecs.
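For readers who have not met that tunable, here is a minimal sketch of reading it from C. It is not noflushd code; the helper names are invented for illustration, and the only thing taken from the comment above is the /proc path.

```c
/* Sketch: read the periodic-writeback interval (in centiseconds) from
 * a procfs-style file. parse_centisecs() and read_centisecs() are
 * hypothetical helper names. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a non-negative decimal value; return -1 on malformed input. */
long parse_centisecs(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || value < 0)
        return -1;
    return value;
}

/* Read the first line of `path` and parse it; return -1 on any error. */
long read_centisecs(const char *path)
{
    char buf[32];
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_centisecs(buf);
}
```

Calling read_centisecs("/proc/sys/vm/dirty_writeback_centisecs") on a Linux system should return the current interval; dividing by 100 gives seconds.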

Bye-bye bdflush()

Posted Jul 5, 2021 21:20 UTC (Mon) by dharding (subscriber, #6509)

For the curious, version 2.7 of noflushd was released in January 2004.

Proper cleanup

Posted Jul 5, 2021 22:56 UTC (Mon) by jreiser (subscriber, #11027)

If Linux really believes in "compatibility forever" then the syscall number should not be re-used for any purpose, and any actual attempt to invoke it should return -ENOSYS. I looked at Biederman's patch. There is no remark about such a cleanup, and no obvious code that does so. Even if the patch does result in the proper cleanup, it still would be appropriate to document explicitly.

Proper cleanup

Posted Jul 5, 2021 23:54 UTC (Mon) by dvrabel (subscriber, #9500)

All the syscall tables have been updated as follows (or similar):

-114 common bdflush sys_bdflush
+114 common bdflush sys_ni_syscall

So the system call numbers are still reserved and will now return -ENOSYS.
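The effect of pointing a table slot at sys_ni_syscall can be shown with a small user-space model. This is a sketch of the idea only, not the kernel's dispatch code: the table size, the demo_ names, and the neighbouring entry at slot 113 are all made up for illustration. Only the number 114 and the -ENOSYS result come from the discussion above.

```c
/* User-space model of a syscall table in which a retired entry is
 * replaced by a "not implemented" stub. Everything prefixed demo_ or
 * DEMO_ is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long, long);

/* Stand-in for the kernel's stub: always reports "no such syscall". */
static long sys_ni_syscall(long a, long b)
{
    (void)a;
    (void)b;
    return -ENOSYS;
}

/* An arbitrary live entry, present only for contrast. */
static long demo_sys_add(long a, long b)
{
    return a + b;
}

#define DEMO_NR_SYSCALLS 116
#define DEMO_NR_BDFLUSH  114

static const syscall_fn demo_table[DEMO_NR_SYSCALLS] = {
    [113]             = demo_sys_add,
    [DEMO_NR_BDFLUSH] = sys_ni_syscall, /* number stays reserved */
};

long demo_dispatch(long nr, long a, long b)
{
    syscall_fn fn;

    if (nr < 0 || nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;
    /* Unpopulated slots behave like the stub in this model. */
    fn = demo_table[nr] ? demo_table[nr] : sys_ni_syscall;
    assert(fn != NULL);
    return fn(a, b);
}
```

In this model the slot is never handed to a different function, so a caller using the old number gets -ENOSYS rather than some unrelated behavior.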

Bye-bye bdflush()

Posted Jul 5, 2021 23:53 UTC (Mon) by geofft (subscriber, #59789)

According to the patch, the current version of bdflush calls exit when func == 1, not when func == 0 as this article (and the manpage) seem to imply would make sense. And it seems like it's been that way since, at least, v2.6.12 (the first version released using Git). So I think the compatibility layer for userspace bdflush daemons never worked, right?

Bye-bye bdflush()

Posted Jul 6, 2021 16:06 UTC (Tue) by BenHutchings (subscriber, #37955)

> So I think the compatibility layer for userspace bdflush daemons never worked, right?

It's ugly but I'm sure it worked because it had to work when it was introduced. Also bear in mind that the different functions (daemon, flush-some, and tunables) were removed in stages. Had they been removed at the same time, probably the behaviour would be a bit different.

The existing callers that passed func <= 0 would presumably exit if it ever returned, so there was no need for a forcible exit.

The existing callers that passed func == 1 would presumably do so in an infinite loop (with a sleep). So these were completely unnecessary processes that could be cleaned up with a forcible exit.

Bye-bye bdflush()

Posted Jul 6, 2021 9:16 UTC (Tue) by error27 (subscriber, #8346)

What a fun trip down memory lane. Thanks, Jon!

Bye-bye bdflush()

Posted Jul 7, 2021 0:57 UTC (Wed) by dgc (subscriber, #6611)

"Another problem with bdflush() — or, more specifically, with the underlying implementation — since the beginning is that it was a single thread. [...] Each pdflush thread was dedicated to a separate physical drive, providing a much-needed scalability improvement."

Maybe in 2002 that was worth something. However, we have been limited in writeback performance since the start of the SSD era (i.e. since ~2008 or so) by having only a single flusher thread per physical block device. Writeback, especially with the delayed allocation design that XFS, ext4, btrfs and other modern filesystems have, hits single-CPU usage limits long before SSDs hit their hardware capability limits.

-Dave.

Bye-bye bdflush()

Posted Jul 7, 2021 16:59 UTC (Wed) by mwsealey (subscriber, #71282)

This article has me worried that there is a second Eric Biederman lurking around...

Bye-bye bdflush()

Posted Jul 8, 2021 23:07 UTC (Thu) by gerdesj (subscriber, #5446)

"That task was, if your editor's memory serves, handled by a user-space process that would occasionally wake up and call sync()."

I still type sync occasionally. Usually after a dd session to a USB stick and I can't be arsed to grab the mouse and find the widget, right click and "safely remove" or whatever it is called. Sometimes I even eject the device first before whipping it out ^*.

You don't get more user-space than that! I'm probably not alone either.

(*) Ooerr missus

Bye-bye bdflush()

Posted Jul 13, 2021 10:12 UTC (Tue) by ghane (guest, #1805)

Question, please:
So does the userspace sync command do anything at all?

A long time ago, I was told that one always says:
sync; sync; sync
on a multi-user Unix (there were no desktops) to move other people's data closer to the end of the line, so that my data would fall off onto the disk.

--
Sanjeev

Bye-bye bdflush()

Posted Jul 14, 2021 11:24 UTC (Wed) by sandsmark (guest, #62172)

> So does the userspace sync command do anything at all?

sync (the command) seems like it calls sync() by default: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/...

And I didn't know before I looked at the source that you could specify a file, but that is neat.
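A minimal sketch of that behavior, assuming the simplest possible split: no argument means sync(), a path means open the file and fsync() it. This is not the coreutils implementation, which has more options and more careful error handling; sync_path() is a made-up name.

```c
/* Sketch of a tiny "sync"-like helper: flush everything when no path
 * is given, or just one file when a path is given. */
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 on success, -1 on failure with errno set. */
int sync_path(const char *path)
{
    int fd, rc, saved_errno;

    if (path == NULL) {
        sync();              /* whole-system flush; cannot report errors */
        return 0;
    }

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = fsync(fd);          /* flush this file's data and metadata */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;     /* preserve fsync()'s error, if any */
    return rc;
}
```

With this helper, sync_path(NULL) corresponds to running the command with no arguments, and sync_path("some/file") to naming a single file.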