
Async I/O

Posted Nov 24, 2010 22:02 UTC (Wed) by kleptog (subscriber, #1183)
In reply to: Async I/O by wahern
Parent article: Ghosts of Unix past, part 4: High-maintenance designs

> The fundamental problem with AIO is that the block drivers aren't interruptible at their core. They need to run on a thread and block waiting on certain hardware operations.

Is that really true? I was under the impression that, with modern hardware, interfaces were becoming more network-like, based on sending and receiving messages. You can have dozens of outstanding requests queued on a modern hard disk, so I'm not sure why AIO is such a big deal. Why is requesting data off a file on disk different from requesting a block from a network block device?

The messy part about POSIX AIO is more the interface: it uses signals again, even though they are completely inappropriate for the task (see the earlier article). Yes, you can emulate it with threads, but they have their own baggage. Glibc did aio that way for a while and injecting threads into an otherwise unthreaded program tends to have side-effects (on fork/exec, amongst others). The emulation sucked in other ways, like no more than one request outstanding per FD, WTF!
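
For reference, this is roughly what the signal-driven interface looks like from user space (a minimal sketch only: error handling is omitted, the file name is arbitrary, and older glibc needs -lrt):

    #include <aio.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static char buf[4096];

    /* Completion arrives as a queued signal carrying a pointer to the aiocb.
       (Calling aio_error/aio_return/printf in a handler is not strictly
       async-signal-safe; a real program would only note the completion.) */
    static void on_done(int sig, siginfo_t *si, void *ctx)
    {
        struct aiocb *cb = si->si_value.sival_ptr;
        if (aio_error(cb) == 0)
            printf("read %zd bytes\n", aio_return(cb));
    }

    int main(void)
    {
        struct sigaction sa = { .sa_sigaction = on_done, .sa_flags = SA_SIGINFO };
        sigaction(SIGRTMIN, &sa, NULL);

        int fd = open("some-file", O_RDONLY);
        struct aiocb cb = {
            .aio_fildes = fd,
            .aio_buf    = buf,
            .aio_nbytes = sizeof(buf),
        };
        cb.aio_sigevent.sigev_notify          = SIGEV_SIGNAL;
        cb.aio_sigevent.sigev_signo           = SIGRTMIN;
        cb.aio_sigevent.sigev_value.sival_ptr = &cb;

        aio_read(&cb);
        pause();    /* sketch only: racy; sigsuspend() would be the correct wait */
        close(fd);
        return 0;
    }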

What I'd like to see is a message-passing interface connecting you to the block device queue, where you submit your requests as messages and receive the data back as messages. Perhaps even a read that gives you the data preceded by a header saying which FD it came from. That saves the kernel keeping pointers into user space. send/recvmsg() seem especially well suited.
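
A purely hypothetical sketch of what that could look like -- the struct layouts, the way the queue fd is obtained, and the semantics are all invented here just to illustrate the idea:

    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    struct blkq_req {               /* hypothetical submission header */
        int      fd;                /* which file the data should come from */
        uint64_t offset;
        uint32_t length;
        uint64_t cookie;            /* echoed back in the completion */
    };

    struct blkq_rsp {               /* hypothetical completion header */
        uint64_t cookie;
        int32_t  status;            /* bytes transferred or -errno */
    };

    /* submit a read request on the (imaginary) block-queue socket qfd */
    static ssize_t submit_read(int qfd, int fd, uint64_t off, uint32_t len,
                               uint64_t cookie)
    {
        struct blkq_req req = { fd, off, len, cookie };
        struct iovec iov = { &req, sizeof(req) };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        return sendmsg(qfd, &msg, 0);
    }

    /* reap one completion: a header saying which request it was, then the data */
    static ssize_t reap_one(int qfd, struct blkq_rsp *rsp, void *buf, size_t len)
    {
        struct iovec iov[2] = { { rsp, sizeof(*rsp) }, { buf, len } };
        struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };
        return recvmsg(qfd, &msg, 0);
    }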

Thanks for the kqueue tip, that looks like a solid solution.



Async I/O

Posted Nov 25, 2010 0:01 UTC (Thu) by neilbrown (subscriber, #359)

> Why is requesting data off a file on disk different from requesting a block from a network block device?

Requesting a block from a disk maybe isn't. Inside Linux you call 'submit_bio', and inside the bio is a callback pointer which will eventually get called (theoretically it could be called even before the submit_bio call completes). It is completely async, and it would be quite easy to plug that into an async user-space interface if you had a good one.
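
The in-kernel shape of that is roughly the following (a sketch only -- field names and signatures vary between kernel versions, and my_state/my_complete are placeholders for whatever the caller wants to happen next):

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    /* runs later, possibly in interrupt context, when the I/O finishes */
    static void my_end_io(struct bio *bio, int error)
    {
        struct my_state *st = bio->bi_private;

        my_complete(st, error);     /* decide here what happens next */
        bio_put(bio);
    }

    static void start_read(struct block_device *bdev, sector_t sector,
                           struct page *page, struct my_state *st)
    {
        struct bio *bio = bio_alloc(GFP_NOIO, 1);

        bio->bi_bdev    = bdev;
        bio->bi_sector  = sector;
        bio->bi_private = st;
        bio->bi_end_io  = my_end_io;
        bio_add_page(bio, page, PAGE_SIZE, 0);

        submit_bio(READ, bio);      /* asynchronous: returns immediately */
    }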

But requesting data off a file is different. The fs might have to load one or more indirect blocks before it even knows where to look for the block. And then (for a more advanced fs), it might want to respond to a read failure in some arbitrarily complex fashion. Obviously this is all possible. The fs could encode that state concerning 'what to do next' in some structure and attach it to the bio that is sent off.

In general you might have arbitrarily many levels, each of which attaches state and a callback pointer to a request and ships it down to the lower level, and that state can be arbitrarily complex and will often have 'stage' information for walking through a state machine. What you end up building looks a lot like a traditional call stack with local variables and return addresses. But we already have a well-tested and well-understood mechanism for holding local variables and return addresses. We call it a 'thread'.
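
To illustrate, here is a toy "read a block through one indirect lookup" written both ways; every myfs_* name is invented purely for the example:

    /* Thread style: the call stack holds the intermediate state across the
       two blocking operations. */
    static int myfs_read_block_sync(struct myfs_inode *inode, int blkno,
                                    char *buf)
    {
        int phys = myfs_lookup_indirect(inode, blkno);  /* may sleep */
        return myfs_read_sector(phys, buf);             /* may sleep */
    }

    /* State-machine style: the same logic with the "stack" made explicit.
       Every local variable, plus "where we are", lives in a structure that
       gets threaded through the callbacks. */
    struct myfs_rb_state {
        struct myfs_inode *inode;
        int blkno;
        char *buf;
        enum { RB_LOOKUP, RB_DATA } stage;
        void (*done)(struct myfs_rb_state *st, int status);
    };

    static void myfs_rb_step(struct myfs_rb_state *st, int result)
    {
        switch (st->stage) {
        case RB_LOOKUP:
            st->stage = RB_DATA;
            myfs_submit_read_sector(result, st->buf, myfs_rb_step, st);
            break;
        case RB_DATA:
            st->done(st, result);
            break;
        }
    }

    static void myfs_read_block_async(struct myfs_rb_state *st)
    {
        st->stage = RB_LOOKUP;
        myfs_submit_lookup_indirect(st->inode, st->blkno, myfs_rb_step, st);
    }

Add a few more levels of that and the chain of state structures is a hand-rolled call stack.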

So the question is: when is it better to have lots of threads each running traditional single-threaded code, and when is it better to encode state in a separate data structure and use a state machine to process it all 'asynchronously'?

Naturally we have both approaches in the Linux kernel so we can compare.

The examples of state machines that I am aware of are the block layer (submit_bio, mentioned above) and the sunrpc layer used by NFS. Obviously the tcp/ip stack would work like a state machine, but I have no direct familiarity with its complexity. It would have the benefit of being designed and specified as a state machine, so that might help it a little.

The block layer has a single level - the queue. When lower levels want to do something extra, they make another request and put it back on the queue. This leads to a modest amount of complexity in the different flags on requests that say how much the queue handlers are allowed to re-order them. It is manageable, but it isn't simple.

The sunrpc layer has wheels-within-wheels. Much like the 'indirect block' lookup mentioned earlier, sunrpc might need to perform a portmap lookup to find out which port to send a request to, and might need to try again after a timeout. I think the sunrpc code is reasonably well structured, in that it isn't too hard to get an overview of how things flow around. But I find that going much deeper gets really hard. Following error statuses around is a particular challenge.

If these two are good examples, I would suggest that these state-machine-with-explicit-state approaches should be avoided where possible. I think it cannot be avoided in the block layer. I am less convinced about sunrpc, though at the time it was written it may well have been the best option.

I would be very hesitant about suggesting that filesystems be re-written to be completely asynchronous. It would be much more robust to find a way to fork a blocked task off into a separate thread.

Then an AIO request would either return a status or a pid, and, as you suggested, reading from some fd would return the pids of the tasks as they complete. 'kill' could be used as a cancellation interface. This has all largely been suggested already on lkml. I seem to recall there were some uncertainties about making sure the user-space side always had the same pid. So maybe some issues never got ironed out and nobody put the time into implementing it (that I am aware of).
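
In user-space terms that would look something like the following -- none of these calls exist, the names are invented purely for illustration, and it glosses over how a byte count and a pid would be told apart:

    long r = fork_read(fd, buf, len, off);    /* imaginary syscall */
    if (!aio_went_async(r)) {
        /* nothing blocked: r is the byte count, as with a normal read() */
    } else {
        pid_t worker = (pid_t)r;              /* the blocked part became a task */

        /* later: reap completed requests by pid... */
        pid_t done;
        read(aio_notify_fd, &done, sizeof(done));

        /* ...or cancel an outstanding one with the existing signal interface */
        kill(worker, SIGKILL);
    }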

http://lwn.net/Articles/316806/ seems relevant here.

Async I/O

Posted Nov 25, 2010 17:39 UTC (Thu) by kleptog (subscriber, #1183)

Thanks a lot for that explanation, I see the difficulties now.

State machines are nice and efficient, but hard to understand and program. Probably the most common large state machines, parsers, are generated by other programs (flex & bison) from a higher-level description, and no human can understand the generated output directly.

I don't think anyone has yet defined a language for filesystems that could be used to generate an appropriate state machine; perhaps the state of the art just isn't there yet.

Async I/O

Posted Nov 27, 2010 2:34 UTC (Sat) by skissane (guest, #38675)

Couldn't this be a good case for writing a kernel in a language that has coroutines or continuations? One can write the code in the easier-to-follow procedural style, and then make it asynchronous just by adding "yield" or "call/cc" at the appropriate points. I wonder how hard it would be to come up with a version of C with coroutines, and whether that could be used with the Linux kernel...

Is there some way you could take the synchronous implementation of a driver or filesystem or other kernel component, determine the blocking points (or just require them to be annotated), and then automatically generate a state-machine-based asynchronous implementation from the synchronous one? A single source for both the synchronous and asynchronous implementations would make one more confident that both were correct.
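
Incidentally, plain C can already fake something like this with the well-known switch-based coroutine trick (Simon Tatham's "Coroutines in C", or the protothreads library). A minimal sketch, with the caveat that locals must still live outside the stack:

    /* The "coroutine" records where it left off and re-enters via a switch;
       case labels may sit inside nested loops, so the control flow still
       reads top to bottom. */
    #define CO_BEGIN(st)  switch ((st)->line) { case 0:
    #define CO_YIELD(st)  do { (st)->line = __LINE__; return 0; \
                               case __LINE__:; } while (0)
    #define CO_END(st)    } (st)->line = 0; return 1

    struct co_state { int line; int i; };

    /* returns 0 at each yield (st->i is 1, 2, 3 in turn), then 1 when done */
    static int count_to_three(struct co_state *st)
    {
        CO_BEGIN(st);
        for (st->i = 1; st->i <= 3; st->i++)
            CO_YIELD(st);
        CO_END(st);
    }

Annotating the blocking points and mechanically transforming them into something of this shape is more or less what the question above asks for; the awkward part is everything that would otherwise live on the stack in between.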

Async I/O

Posted Nov 27, 2010 3:06 UTC (Sat) by foom (subscriber, #14868)

But why bother? Kernel threads *are* such a state machine, just in a very convenient form. What good would reinventing threads do?

Async I/O

Posted Nov 29, 2010 11:05 UTC (Mon) by marcH (subscriber, #57642)

> Yes, you can emulate it with threads, but they have their own baggage.

That's an understatement:

http://stackoverflow.com/questions/220752/what-is-the-c-m...
http://blogs.sun.com/dave/entry/parallelism_manifesto

etc.

Async I/O

Posted Nov 29, 2010 19:42 UTC (Mon) by jra (subscriber, #55261)

> Glibc did aio that way for a while and injecting threads into an otherwise
> unthreaded program tends to have side-effects (on fork/exec, amongst
> others). The emulation sucked in other ways, like no more than one request
> outstanding per FD, WTF!

Interestingly enough, I also thought "WTF" when I came across this in glibc. So I wrote a test patch for glibc which removed this restriction and allowed multiple outstanding requests per fd.

When I ran my aio test program with this change, I got a factor-of-5 speedup with glibc aio. But when I ran Samba with this change, using SMB1/smbclient or SMB2 with the re-written Windows redirector (both of which issue multiple outstanding async I/O requests), it went *slower*, at about 0.7 times the single-request-per-fd speed.

Clearly there is something interesting going on here. Ping me if you want to try my glibc patch for yourself (which I haven't promoted as clearly I don't understand what is going on here :-).

Jeremy.

