Toward non-blocking asynchronous I/O
To perform AIO, a program must set up an I/O context with io_setup(), fill in one or more iocb structures describing the operation(s) to be performed, then submit those structures with io_submit(). A call to io_getevents() can be made to learn about the status of outstanding I/O operations and, optionally, wait for them. All of those system calls should, with the exception of the last, be non-blocking. In the real world, things are more complicated. Memory allocations or lock contention can cause any AIO operation to block before it starts to move any data at all. And, even in the best-supported case (direct file I/O), the operation itself can block in a number of places.
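The sequence described above can be sketched in C using the raw system calls (via syscall(), since glibc does not wrap them; libaio provides thin wrappers around the same interface). For simplicity, this example writes through the page cache to a temporary file rather than using O_DIRECT:

```c
/* Minimal AIO round trip: io_setup(), io_submit(), io_getevents().
 * Illustrative only - error handling is reduced to asserts. */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

int aio_roundtrip(void)
{
	aio_context_t ctx = 0;
	/* io_setup(): create a context that can hold one in-flight request */
	assert(syscall(SYS_io_setup, 1, &ctx) == 0);

	int fd = open("/tmp/aio-demo", O_WRONLY | O_CREAT | O_TRUNC, 0600);
	assert(fd >= 0);

	static const char buf[] = "hello, aio";
	struct iocb cb;
	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_buf = (unsigned long)buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = 0;

	/* io_submit(): queue the request described by the iocb */
	struct iocb *cbs[1] = { &cb };
	assert(syscall(SYS_io_submit, ctx, 1, cbs) == 1);

	/* io_getevents(): wait for completion and check the result */
	struct io_event ev;
	assert(syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1);
	assert(ev.res == sizeof(buf));

	syscall(SYS_io_destroy, ctx);
	close(fd);
	unlink("/tmp/aio-demo");
	return 0;
}
```

Because this write is buffered, the submission itself does most of the work; with O_DIRECT, completion would genuinely arrive later.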
The no-wait AIO patch set from Goldwyn Rodrigues seeks to improve this situation in a number of ways. It does not make AIO any more asynchronous, but it will cause AIO operations to fail with EAGAIN errors rather than block in a number of situations. If a program is prepared for such errors, it can opportunistically try to submit I/O in its main thread; it will then only need to fall back to a separate submission thread in cases where the operation would block.
If a program is designed to use no-wait AIO, it must indicate the fact by setting the new IOCB_RW_FLAG_NOWAIT flag in the iocb structure. That structure has a field (aio_flags) that is meant to hold just this type of flag, but there is a problem: the kernel does not currently check for unknown flags in that field. That makes it impossible to add a new flag, since a calling program can never know whether the kernel it is running on supports that flag or not. Fortunately, that structure contains a couple of reserved fields that are checked in current kernels; the field formerly known as aio_reserved1 is changed to aio_rw_flags in this patch set and used for the new flag.
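In iocb terms, an opportunistic no-wait submission with a fallback path might look like the following fragment. The field and flag names here follow the patch set (aio_rw_flags, IOCB_RW_FLAG_NOWAIT) and could change before merging; queue_for_submission_thread() and handle_error() are hypothetical helpers standing in for the application's fallback logic:

```c
/* Sketch: try to submit without blocking; on EAGAIN, hand the request
 * to a separate submission thread instead. */
struct iocb cb, *cbs[1] = { &cb };

memset(&cb, 0, sizeof(cb));
cb.aio_fildes = fd;
cb.aio_lio_opcode = IOCB_CMD_PWRITE;
cb.aio_buf = (unsigned long)buf;
cb.aio_nbytes = len;
cb.aio_rw_flags = IOCB_RW_FLAG_NOWAIT;	/* fail with EAGAIN rather than block */

if (syscall(SYS_io_submit, ctx, 1, cbs) < 0) {
	if (errno == EAGAIN)
		queue_for_submission_thread(&cb);	/* would have blocked */
	else
		handle_error();
}
```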
One place where an I/O request can block is when the operation triggers writeback; in that case, the request is held up until the writeback completes. This wait happens early in the submission process; in particular, it can happen before io_submit() completes its work and returns. Setting IOCB_RW_FLAG_NOWAIT will cause submission to fail with EAGAIN in this case.
Another common blocking point is I/O submission at the block level, where, in particular, a request can be stalled because the underlying block device is too busy. Avoiding that involves the creation of a new REQ_NOWAIT flag that can be set in the BIO structure used to describe block I/O requests. When that flag is present, I/O submission will, once again, fail with an EAGAIN error rather than block waiting for the level of block-device congestion to fall.
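Inside the kernel, the check might look roughly like this simplified sketch of a 4.x-era block-layer submission path; bdev_congested() is a placeholder for the real congestion test:

```c
/* Sketch: when the queue is congested and the bio carries REQ_NOWAIT,
 * complete it immediately with -EAGAIN instead of sleeping until
 * congestion clears. */
blk_qc_t submit_bio_checked(struct bio *bio)
{
	if ((bio->bi_opf & REQ_NOWAIT) && bdev_congested(bio)) {
		bio->bi_error = -EAGAIN;
		bio_endio(bio);		/* complete the request with the error */
		return BLK_QC_T_NONE;
	}
	return generic_make_request(bio);	/* normal, possibly blocking, path */
}
```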
Support is also needed at the filesystem level; each filesystem has its own places where execution can block on the way to submitting a request. The patch set includes support for Btrfs, ext4, and XFS. In each case, situations like the inability to obtain a lock on the relevant inode will cause a request to fail.
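The filesystem-side pattern is essentially "trylock or EAGAIN" (in the kernel, inode_trylock() plays this role). The control flow can be illustrated with a userspace analogy, using a pthread mutex to stand in for the inode lock; submit_locked() and inode_lock_analog are names invented for this sketch:

```c
#include <errno.h>
#include <pthread.h>

pthread_mutex_t inode_lock_analog = PTHREAD_MUTEX_INITIALIZER;

/* In no-wait mode, try the lock and report EAGAIN instead of sleeping;
 * otherwise take the lock normally, blocking if necessary. */
int submit_locked(int nowait)
{
	if (nowait) {
		if (pthread_mutex_trylock(&inode_lock_analog) != 0)
			return -EAGAIN;	/* would block: caller falls back */
	} else {
		pthread_mutex_lock(&inode_lock_analog);	/* may sleep */
	}
	/* ... do the submission work under the lock ... */
	pthread_mutex_unlock(&inode_lock_analog);
	return 0;
}
```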
All of this work can make AIO better, but only for a limited set of use cases. It only improves direct I/O, for example. Buffered I/O, which has always been a sort of second-class citizen in the AIO layer, is unchanged; there are simply too many places where things can block to try to deal with them all. Similarly, there is no support for network filesystems or for filesystems on MD or LVM volumes — though Rodrigues plans to fill some of those gaps at some future point.
In other words, AIO seems likely to remain useful only for the handful of applications that perform direct I/O to files. There have been a number of attempts to improve the situation in the past, including fibrils, threadlets, syslets, acall, and an AIO reimplementation based on kernel threads done by the original AIO author. None of those has ever reached the point of being seriously considered for merging into the mainline, though. There are a lot of tricky details to be handled in a complete solution, and nobody has ever found the goal important enough to justify the considerable work required. So the kernel will almost certainly continue to crawl forward with incremental improvements to AIO.
| Index entries for this article | |
|---|---|
| Kernel | Asynchronous I/O |
Would still be beneficial to have async fsync even with O_DIRECT
Posted May 31, 2017 6:07 UTC (Wed) by ringerc (subscriber, #3071) [Link] (1 responses)

Toward non-blocking asynchronous I/O
Posted May 31, 2017 20:20 UTC (Wed) by sitsofe (guest, #104576) [Link]

O_DIRECT implies that the I/O won't be left rolling around in the OS's cache, but it says nothing about whether the data is still sitting in the disk device's volatile cache. You could send all I/Os down with O_SYNC too, but speeds would plummet. Thus it's still desirable to be able to send down an fsync (and it would have been preferable if submitting it didn't have to block)...

Toward non-blocking asynchronous I/O
Posted Jun 1, 2017 16:22 UTC (Thu) by oever (guest, #987) [Link] (1 responses)

I appreciate the work on asynchronous I/O very much. There is a lot of potential to improve software with AIO. A good example I came across recently is the venerable find.

find is a single-threaded application that uses blocking I/O. For each directory it reads, it makes one or more getdents() calls, and those calls are made sequentially. find could be sped up by issuing many calls at the same time.

Consider the extreme case where each subsequent directory is on the opposite side of the disk. The disk head would travel to the other side of the disk for each directory. The kernel I/O scheduler cannot help, because it only knows about the next location.

If 100 parallel requests were made instead of one, the I/O scheduler would handle them in an efficient and quick manner: it would read the nearest entries first. It is possible to issue parallel getdents() requests with threads, but that requires one thread per outstanding request: quite an overhead. libuv does this with a thread pool; the approach can roughly double the speed of find on a cold cache.

If this were implemented with libaio, the thread overhead could be eliminated.

Toward non-blocking asynchronous I/O
Posted Jun 2, 2017 12:53 UTC (Fri) by oever (guest, #987) [Link]

Alas, this plan cannot be executed while there is no LIO_GETDENTS in aiocb.

Toward non-blocking asynchronous I/O
Posted Jun 2, 2017 6:19 UTC (Fri) by ssmith32 (subscriber, #72404) [Link] (1 responses)

Toward non-blocking asynchronous I/O
Posted Jun 2, 2017 7:06 UTC (Fri) by peter-b (guest, #66996) [Link]
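Until an LIO_GETDENTS-style operation exists, the thread-per-directory approach described in the find comment above can be sketched as follows; scan_dirs_parallel() is an illustrative name, and real code (like libuv's pool) would bound the number of threads rather than spawning one per directory:

```c
#include <dirent.h>
#include <pthread.h>

struct scan_arg { const char *path; long count; };

/* Worker: read one directory to completion, counting its entries. */
static void *scan_one(void *p)
{
	struct scan_arg *a = p;
	DIR *d = opendir(a->path);
	if (!d)
		return NULL;
	struct dirent *e;
	while ((e = readdir(d)) != NULL)
		a->count++;
	closedir(d);
	return NULL;
}

/* Issue one getdents() stream per directory, each on its own thread, so
 * the kernel sees many outstanding requests at once; returns the total
 * number of entries seen. */
long scan_dirs_parallel(const char **dirs, int n)
{
	pthread_t tid[n];
	struct scan_arg args[n];
	long total = 0;

	for (int i = 0; i < n; i++) {
		args[i].path = dirs[i];
		args[i].count = 0;
		pthread_create(&tid[i], NULL, scan_one, &args[i]);
	}
	for (int i = 0; i < n; i++) {
		pthread_join(tid[i], NULL);
		total += args[i].count;
	}
	return total;
}
```

Build with -pthread. With an asynchronous getdents, the same concurrency could come from a single thread submitting many iocbs.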
