2003 Kernel Summit: Asynchronous I/O
[Posted July 22, 2003 by corbet]
Suparna Bhattacharya ran a session on asynchronous I/O support in the 2.6
kernel. The discussion began with a summary of the current status of AIO
support: the core API is in place,
O_DIRECT AIO works with some
filesystems, and the epoll interface (which is a separate development) is
in place. A number of things remain to be done; most significantly,
buffered filesystem I/O remains synchronous.
That situation could change, however; Suparna has had a set of filesystem
AIO patches in circulation for some months. These patches take a
retry-based approach; the code makes as much progress as it can without
blocking on a given operation, then returns a "try again" error and stops.
Later on, the AIO subsystem will restart the request, in the hopes of
getting further. Only parts of the I/O path have been made truly
asynchronous; a number of operations (getting block request queue entries,
metadata operations) can still block. But it is a step in the right
direction.
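The retry model described above can be sketched in user space. This is only an illustration, not code from Suparna's patches: struct retry_req, retry_read_step(), retry_run(), and the "budget" parameter are hypothetical names, and a bounded memcpy() stands in for "as much progress as possible without blocking."

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request state; this is not a kernel structure. */
struct retry_req {
    const char *src;   /* in-memory data standing in for a file */
    char *dst;         /* caller's buffer */
    size_t len;        /* total bytes requested */
    size_t done;       /* bytes transferred so far */
};

/*
 * One pass over the request. Copies at most `budget` bytes, which stands
 * in for the work that can be done without blocking. Returns 0 once the
 * request is complete, or -EAGAIN ("try again") if it must be restarted.
 */
static int retry_read_step(struct retry_req *req, size_t budget)
{
    size_t left, n;

    assert(req->done <= req->len);
    left = req->len - req->done;
    n = left < budget ? left : budget;

    memcpy(req->dst + req->done, req->src + req->done, n);
    req->done += n;   /* progress is kept across retries */
    return req->done == req->len ? 0 : -EAGAIN;
}

/*
 * Stand-in for the AIO subsystem restarting the request until it
 * finishes. Returns the number of passes needed, or -EINVAL if no
 * progress would ever be possible.
 */
static int retry_run(struct retry_req *req, size_t budget)
{
    int passes = 1;

    if (budget == 0 && req->done < req->len)
        return -EINVAL;
    while (retry_read_step(req, budget) == -EAGAIN)
        passes++;
    return passes;
}
```

The point of the sketch is that each pass resumes from the saved state (the done field) rather than starting over, so the caller never has to sleep inside the operation itself.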
There are some more experimental bits in circulation as well, including an
asynchronous version of get_block() and asynchronous semaphore
operations.
The discussion of experimental stuff was Linus's cue to step in and note
that he was not all that enthusiastic about merging even the
less-experimental parts without more evidence that there is interest in
asynchronous I/O. There is also a lack of publicly-available benchmarks
showing the benefits of AIO. Linus would like to see benchmarks that he
can run that do not involve setting up a huge database. In fact, he wants
"random 16-year-olds in Czechoslovakia" (running the kernel apparently
leaves him little time for geopolitics) to be trying out AIO benchmarks.
With enough benchmarks in circulation, interest in - and use of - AIO
should increase.
Oracle, it seems, has some benchmarks that it will, with luck, be releasing
soon.
Suparna then got into the list of things yet to be done with AIO. These
include more performance tuning, support for more filesystems (currently,
even O_DIRECT AIO is supported only on ext2, ext3, and JFS), a
possible convergence of AIO and the epoll interface, vector AIO (it is more
efficient to submit one request with many iovecs than many separate I/O
control blocks), an AIO version of fsync(), and network
AIO. Network AIO was the topic that drew the most interest; it is, it was
said, the "only sane way" to do zero-copy TCP transfers. Linus suggested
that the way to get network AIO working was to publish a benchmark showing
that Solaris did it faster; then there would be an implementation within
two days.
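As a rough illustration of the iovec idea mentioned above, the existing synchronous writev() call already lets one request carry several buffers. This is not the vector AIO interface under discussion, which did not exist at the time; gather_write() and demo_gather() are hypothetical helpers, and a pipe stands in for a file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send three separate buffers with one writev() call. */
static ssize_t gather_write(int fd, const char *a, const char *b, const char *c)
{
    struct iovec iov[3];

    assert(a != NULL && b != NULL && c != NULL);
    iov[0].iov_base = (void *)a;
    iov[0].iov_len = strlen(a);
    iov[1].iov_base = (void *)b;
    iov[1].iov_len = strlen(b);
    iov[2].iov_base = (void *)c;
    iov[2].iov_len = strlen(c);

    /* One system call covers all three buffers. */
    return writev(fd, iov, 3);
}

/*
 * Usage example: write three fragments into a pipe with a single call,
 * then read them back. Returns the number of bytes read, or -1 on error.
 */
static int demo_gather(char *out, size_t outlen)
{
    int p[2];
    ssize_t n;

    if (outlen == 0 || pipe(p) != 0)
        return -1;
    n = gather_write(p[1], "vector ", "I/O ", "demo");
    close(p[1]);
    if (n < 0) {
        close(p[0]);
        return -1;
    }
    n = read(p[0], out, outlen - 1);
    close(p[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

Without the iovec array, the same transfer would take three separate requests, each with its own setup cost; that is the overhead the vector AIO proposal aims to avoid for I/O control blocks.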
It was also suggested that AIO support should be patched into a few
applications; that would create an incentive for implementing and testing
AIO features. It turns out that some applications (squid and MySQL were
mentioned) already have such support. There were some complaints about the
current glibc implementation of the user-space POSIX AIO functions;
apparently there is a lock there which limits the number of outstanding
operations.
Looking further ahead, some possible improvements to AIO were discussed.
Most of these seem to involve having AIO take over much of the rest of the
I/O infrastructure - an idea which came up at last year's kernel summit as
well. So, for example, the various read and write methods from the
file_operations structure could be removed, and
aio_read() and aio_write() used for everything. The
entire file I/O path could be changed over to a retry-based mode for all
operations. In this future, synchronous operations would be implemented by
invoking the corresponding asynchronous operations, then waiting for the
result. Such changes are far into the 2.7 (or later) future, however.
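The "synchronous on top of asynchronous" layering can be seen in miniature with the user-space POSIX AIO calls. This is only an analogy: the aio_read() below is the POSIX library function, not the kernel's file_operations method of the same name, and sync_pread_via_aio() and demo_sync_over_async() are hypothetical helpers. Older glibc versions may need -lrt at link time.

```c
#define _XOPEN_SOURCE 700
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: a blocking positional read built on POSIX AIO. */
static ssize_t sync_pread_via_aio(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    const struct aiocb *list[1] = { &cb };
    ssize_t n;
    int err;

    assert(buf != NULL);
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    /* Step 1: invoke the asynchronous operation. */
    if (aio_read(&cb) != 0)
        return -1;

    /* Step 2: wait for the result. aio_suspend() may return early
     * (for example on a signal), so the loop rechecks the status. */
    while ((err = aio_error(&cb)) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    /* Collect the final status; this also releases the request. */
    n = aio_return(&cb);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}

/* Usage example: returns 0 if the read-back matches, -1 otherwise. */
static int demo_sync_over_async(void)
{
    char buf[8] = {0};
    FILE *f = tmpfile();
    ssize_t n;

    if (f == NULL)
        return -1;
    if (fputs("hello, aio", f) < 0 || fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    /* Ask for 5 bytes at offset 7; only 3 remain before end of file. */
    n = sync_pread_via_aio(fileno(f), buf, 5, 7);
    fclose(f);
    return (n == 3 && memcmp(buf, "aio", 3) == 0) ? 0 : -1;
}
```

The caller sees an ordinary blocking read, while the only real I/O path underneath is the asynchronous one.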