> Glibc did aio that way for a while and injecting threads into an otherwise
> unthreaded program tends to have side-effects (on fork/exec, amongst
> others). The emulation sucked in other ways, like no more than one request
> outstanding per FD, WTF!
Interestingly enough, I also thought "WTF" when I came across this in glibc. So I wrote a test patch for glibc that removes this restriction and allows multiple outstanding requests per fd.
When I ran my aio test program with this change, I got a factor of 5 speedup with glibc aio. When I ran Samba with this change using SMB1/smbclient, or SMB2 with the re-written Windows redirector (both of which issue multiple outstanding async IO requests), it went *slower*, running at about 0.7 times the single-request-per-fd speed.
Clearly there is something interesting going on here. Ping me if you want to try my glibc patch for yourself (I haven't promoted it, as I clearly don't yet understand why the two cases behave so differently :-).
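
For reference, a minimal sketch (not the actual test program, and not the glibc patch) of what "multiple outstanding requests per fd" means at the POSIX aio level: several aio_read() calls queued on the same descriptor before any of them is waited for. The function name and the NREQ and CHUNK values are made up for illustration.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

#define NREQ  4      /* hypothetical: number of requests queued on one fd */
#define CHUNK 4096   /* hypothetical: bytes per request */

/*
 * Queue NREQ reads on the same fd at consecutive CHUNK-sized offsets,
 * then wait for every request that was queued.
 * Returns the total number of bytes read, or -1 if any request failed.
 */
ssize_t read_chunks_async(int fd, char bufs[NREQ][CHUNK])
{
    struct aiocb cbs[NREQ];
    ssize_t total = 0;
    int issued = 0;
    int failed = 0;
    int i;

    memset(cbs, 0, sizeof(cbs));

    /* Issue all requests up front so they are outstanding together. */
    for (i = 0; i < NREQ; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = CHUNK;
        cbs[i].aio_offset = (off_t)i * CHUNK;
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
        if (aio_read(&cbs[i]) == -1) {
            failed = 1;
            break;          /* still wait for the ones already queued */
        }
        issued++;
    }

    /* Collect results; the control blocks live on this stack frame,
     * so every queued request must finish before we return. */
    for (i = 0; i < issued; i++) {
        const struct aiocb *one[1] = { &cbs[i] };
        ssize_t n;
        int err;

        while ((err = aio_error(&cbs[i])) == EINPROGRESS)
            aio_suspend(one, 1, NULL);   /* retried if interrupted */

        n = aio_return(&cbs[i]);
        if (err != 0 || n < 0)
            failed = 1;
        else
            total += n;
    }

    return failed ? -1 : total;
}
```

Call it with an fd opened on a regular file and a `char bufs[NREQ][CHUNK]` array. On older glibc this needs linking with -lrt. With the stock glibc emulation described in the quoted text, these requests would be serviced one at a time per fd, which is the restriction the patch lifts.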