
2003 Kernel Summit: Asynchronous I/O

This article is part of LWN's 2003 Kernel Developers' Summit coverage.
Suparna Bhattacharya ran a session on asynchronous I/O support in the 2.6 kernel. The discussion began with a summary of the current status of AIO support: the core API is in place, O_DIRECT AIO works with some filesystems, and the epoll interface (which is a separate development) is in place. A number of things remain to be done; most significantly, buffered filesystem I/O remains synchronous.

That situation could change, however; Suparna has had a set of filesystem AIO patches in circulation for some months. These patches take a retry-based approach; the code makes as much progress as it can without blocking on a given operation, then returns a "try again" error and stops. Later on, the AIO subsystem will restart the request, in the hopes of getting further. Only parts of the I/O path have been made truly asynchronous; a number of operations (getting block request queue entries, metadata operations) can still block. But it is a step in the right direction.
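As a rough user-space analogy for the retry-based model described above (not the kernel patches themselves), the sketch below reads from a non-blocking pipe. Each call makes as much progress as it can, stops when the read would block, and is simply called again later. The `RetryingRead` class and its `step()` method are hypothetical names invented for this illustration.

```python
import os


class RetryingRead:
    """Hypothetical request object: collect `want` bytes from a non-blocking fd."""

    def __init__(self, fd, want):
        self.fd = fd
        self.want = want
        self.data = b""  # progress is preserved across retries

    def step(self):
        """Make as much progress as possible without blocking.

        Returns True when the request is complete, False when the caller
        should restart it later (the "try again" case).
        """
        while len(self.data) < self.want:
            try:
                chunk = os.read(self.fd, self.want - len(self.data))
            except BlockingIOError:  # EAGAIN: would block, so stop here
                return False
            if not chunk:  # end of file: nothing more will arrive
                return True
            self.data += chunk
        return True


r, w = os.pipe()
os.set_blocking(r, False)
req = RetryingRead(r, 10)

first = req.step()     # nothing written yet: no progress, retry later
os.write(w, b"hello")
second = req.step()    # partial progress, request still incomplete
os.write(w, b"world")
third = req.step()     # restarted once more, now runs to completion
os.close(r)
os.close(w)
```

The point of the design is that the request object, not a sleeping thread, carries the state between attempts.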

There are some more experimental bits in circulation as well, including an asynchronous version of get_block() and asynchronous semaphore operations.

The discussion of experimental stuff was Linus's cue to step in and note that he was not all that enthusiastic about merging even the less-experimental parts without more evidence that there is interest in asynchronous I/O. There is also a lack of publicly-available benchmarks showing the benefits of AIO. Linus would like to see benchmarks that he can run that do not involve setting up a huge database. In fact, he wants "random 16-year-olds in Czechoslovakia" (running the kernel apparently leaves him little time for geopolitics) to be trying out AIO benchmarks. With enough benchmarks in circulation, interest in - and use of - AIO should increase. Oracle, it seems, has some benchmarks that it will, with luck, be releasing soon.

Suparna then got into the list of things yet to be done with AIO. These include more performance tuning, support for more filesystems (currently, even O_DIRECT is only supported on ext2, ext3, and JFS), a possible convergence of AIO and the epoll interface, vector AIO (it is more efficient to have a lot of iovecs than a bunch of separate I/O control blocks), implementing the AIO version of fsync(), and network AIO. Network AIO was the topic that drew the most interest; it is, it was said, the "only sane way" to do zero-copy TCP transfers. Linus suggested that the way to get network AIO working was to publish a benchmark showing that Solaris did it faster; then there would be an implementation within two days.

It was also suggested that AIO support should be patched into a few applications; that would create an incentive for implementing and testing AIO features. It turns out that some applications (squid and MySQL were mentioned) already have such support. There were some complaints about the current glibc implementation of the user-space POSIX AIO functions; apparently there is a lock there which limits the number of outstanding operations.

Looking further ahead, some possible improvements to AIO were discussed. Most of these seem to involve having AIO take over much of the rest of the I/O infrastructure - an idea which came up at last year's kernel summit as well. So, for example, the various read and write methods from the file_operations structure could be removed, and aio_read() and aio_write() used for everything. The entire file I/O path could be changed over to a retry-based mode for all operations. In this future, synchronous operations would be implemented by invoking the corresponding asynchronous operations, then waiting for the result. Such changes are far into the 2.7 (or later) future, however.
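The idea of building synchronous operations on top of asynchronous ones can be sketched in user space as follows. This is only an illustration under stated assumptions: `ReadRequest`, `submit_read()` and `read_sync()` are hypothetical names, and a helper thread stands in for the kernel's asynchronous path.

```python
import os
import tempfile
import threading


class ReadRequest:
    """Hypothetical in-flight read; `done` is set when it completes."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None


def submit_read(path, offset, length):
    """Asynchronous path: start the read and return immediately."""
    req = ReadRequest()

    def work():
        try:
            with open(path, "rb") as f:
                f.seek(offset)
                req.result = f.read(length)
        finally:
            req.done.set()  # signal completion even if the read failed

    threading.Thread(target=work).start()
    return req


def read_sync(path, offset, length):
    """Synchronous path: invoke the asynchronous operation, then wait."""
    req = submit_read(path, offset, length)
    req.done.wait()
    return req.result


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
data = read_sync(tmp.name, 2, 4)
os.unlink(tmp.name)
```

With this layering there is only one real I/O path to maintain; the synchronous call is a thin wrapper around it.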



Solaris does do it faster!

Posted Jul 23, 2003 12:31 UTC (Wed) by johnjones (guest, #5462)

Solaris is much faster than Linux at AIO networking.

No, really, trust me...

john jones

Interest in asynchronous I/O

Posted Jul 23, 2003 23:02 UTC (Wed) by larryr (guest, #4030)

I have always been mystified by the idea that there is little demand for AIO, or that it needs to prove its merit with benchmarks. To me it seems like it provides a fundamental capability to be utilized directly by applications as integral to their implementation, and to some extent even their design. It is supported by FreeBSD, Solaris, NT, etc.

Oddly enough, around the time 2.5 started, Linus said "I think it's clear that many people do want to have aio support.", and "Done right, it becomes a very natural way of doing event handling, and it could very well be rather useful for many things that use select loops right now."

I guess I should accept that AIO support in 2.6 is going to be weak; I just hope it is good enough to create a demand for dramatic improvements in 2.8...

Larry

Interest in asynchronous I/O

Posted Jul 25, 2003 17:11 UTC (Fri) by GreyWizard (subscriber, #1026)

Could you (or someone else) explain in more technical detail what the benefit of asynchronous I/O is? Why is it better than a select(2) loop?

Interest in asynchronous I/O

Posted Jul 31, 2003 12:52 UTC (Thu) by mwilck (guest, #1966)


The asynchronous IO programming paradigm is more intuitive. No select loop needed. You just launch a number of AIO requests, do whatever else you want, and if you need the data you wait for the AIO completion.

Moreover, between the AIO submission and its completion, the associated buffers belong to the kernel. That is why zero-copy basically only works this way.
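The "launch, do something else, then wait" pattern described in this comment can be imitated in user space. The sketch below is an assumption-laden stand-in: it uses a thread pool and `os.pread()` rather than kernel AIO, so it shows only the programming model and none of the zero-copy buffer handling.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical test data: a small temporary file to read back in pieces.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghijkl")

fd = os.open(tmp.name, os.O_RDONLY)
with ThreadPoolExecutor(max_workers=3) as pool:
    # Launch three positional reads; submit() returns without waiting.
    pending = [pool.submit(os.pread, fd, 4, off) for off in (0, 4, 8)]

    # Stand-in for "whatever else you want" while the reads are in flight.
    other_work = sum(range(1000))

    # Only now, when the data is needed, wait for each completion.
    chunks = [f.result() for f in pending]
os.close(fd)
os.unlink(tmp.name)
```

No readiness loop appears anywhere: the program asks for data, continues, and collects the results when it needs them.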

Interest in asynchronous I/O

Posted Jul 31, 2003 16:39 UTC (Thu) by johnchx (guest, #4262)

One problem with select() is that it scales poorly to very large numbers of file descriptors. It works fine with a couple of dozen, but when you're handling 10,000+ network connections, two costs can consume a significant fraction of the available CPU cycles: passing all of the file descriptors to and from the system call, and (when select() returns) figuring out which of the 10,000 descriptors has become ready.

There's a paper on this at:

http://citeseer.nj.nec.com/cache/papers/.../banga99scalable.pdf

Interest in asynchronous I/O

Posted Nov 26, 2003 10:09 UTC (Wed) by yzhuang (guest, #3178)

epoll has broken that bottleneck very well; it is much more scalable.
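The difference between the two interfaces discussed in this thread can be seen in a small Linux-only sketch using Python's `select` module. The setup (50 pipes, one of them readable) is a made-up scenario for illustration, and the code makes no attempt to measure performance.

```python
import os
import select

# Hypothetical setup: 50 idle pipes, exactly one of which has data waiting.
pipes = [os.pipe() for _ in range(50)]
read_fds = [r for r, _ in pipes]
os.write(pipes[17][1], b"x")

# select(): the whole descriptor list is handed to the kernel on every call,
# and the kernel has to examine every entry to find the ready ones.
ready_select, _, _ = select.select(read_fds, [], [], 0)

# epoll: descriptors are registered once; each wait returns only the
# descriptors that are actually ready.
ep = select.epoll()
for fd in read_fds:
    ep.register(fd, select.EPOLLIN)
ready_epoll = [fd for fd, _ in ep.poll(0)]
ep.close()

for r, w in pipes:
    os.close(r)
    os.close(w)
```

Both calls report the same single descriptor here; what changes with epoll is that the per-call work no longer grows with the number of idle descriptors being watched.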

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds