LWN: Comments on "Ringing in a new asynchronous I/O API" https://lwn.net/Articles/776703/ This is a special feed containing comments posted to the individual LWN article titled "Ringing in a new asynchronous I/O API". Fri, 03 Oct 2025 04:37:29 +0000 Guidance on using io_uring to support 60,000+ TCP connections with <1ms RTT https://lwn.net/Articles/950744/ https://lwn.net/Articles/950744/ Tushar <div class="FormattedComment"> Hi,<br> <p> I am working on building a new application which is required to support 60,000+ TCP connections on a single server (preferably a single POSIX thread) with &lt;1ms RTT. I am considering using io_uring for this.<br> <p> I have not found any data on similar uses of io_uring in other applications. Do you have a benchmark that I could refer to, to see whether something like this is possible?<br> <p> Thanks for your time and help.<br> </div> Thu, 09 Nov 2023 02:30:21 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/788803/ https://lwn.net/Articles/788803/ crzbear <div class="FormattedComment"> this sounds awesome<br> <p> is there any particular reason the kernel has to allocate those buffers<br> couldn't they be passed from userspace in the setup call<br> and then the kernel maps those into its address space<br> <p> while this might obviously lead to not properly aligned buffers,<br> the kernel can check that and return with an error if needed<br> <p> this would do away with the mmapping<br> </div> Fri, 17 May 2019 19:30:26 +0000 How to register more files while using some registered files? 
https://lwn.net/Articles/788131/ https://lwn.net/Articles/788131/ hnakamur <div class="FormattedComment"> My understanding is that you can register a new set of file descriptors after unregistering all of the old ones.<br> <p> io_uring_register(ring_fd, IORING_REGISTER_FILES, fds, nr_files);<br> io_uring_register(ring_fd, IORING_UNREGISTER_FILES);<br> io_uring_register(ring_fd, IORING_REGISTER_FILES, fds2, nr_files2);<br> <p> But what should you do if you want to add some more file descriptors while still using some of the already registered<br> file descriptors?<br> </div> Sat, 11 May 2019 10:35:55 +0000 What about async metadata https://lwn.net/Articles/780716/ https://lwn.net/Articles/780716/ josh <div class="FormattedComment"> <font class="QuotedText">&gt; It would be convenient to have a system call that declares 'I plan to read this file in the near future'.</font><br> <p> The readahead system call does that.<br> </div> Tue, 26 Feb 2019 01:53:49 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777484/ https://lwn.net/Articles/777484/ Wol <div class="FormattedComment"> Can't you stick a "t" in there? uring -&gt; turing <br> <p> Cheers,<br> Wol<br> </div> Thu, 24 Jan 2019 16:09:29 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777453/ https://lwn.net/Articles/777453/ joib <div class="FormattedComment"> For inspiration in the IO world, there's IOCP (<a href="https://en.wikipedia.org/wiki/Input/output_completion_port">https://en.wikipedia.org/wiki/Input/output_completion_port</a> ).<br> </div> Thu, 24 Jan 2019 12:37:23 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777452/ https://lwn.net/Articles/777452/ joib <div class="FormattedComment"> What are the mechanics of doing buffered AIO? I faintly recall all those old attempts (syslets, fibrils, whatever) failed partly because there was no natural in-kernel context for progressing those IO's? 
An in-kernel threadpool is of course always a possibility, but is that noticeably better than a user-space threadpool?<br> </div> Thu, 24 Jan 2019 12:35:32 +0000 What about async metadata https://lwn.net/Articles/777283/ https://lwn.net/Articles/777283/ epa <div class="FormattedComment"> You mentioned posix_fadvise(). That is useful but not quite the stupidly simple interface I had in mind. It requires an open file handle. I envisaged a call that takes a filename and nothing else, works entirely in the background, and does not fail (not even if the file doesn't exist or whatever; it just does nothing in that case).<br> <p> You could then sprinkle these calls all over your code -- including scripting languages -- and get a handy speedup without having to do any real programming.<br> </div> Tue, 22 Jan 2019 12:38:12 +0000 What about async metadata https://lwn.net/Articles/777282/ https://lwn.net/Articles/777282/ epa <div class="FormattedComment"> Yes, I was thinking of a few large files, where the overhead really is in I/O and not in bookkeeping.<br> <p> How about a generalized stat() that lets you open a directory and get info on all the files it contains? That would save a lot of time, and not just for parallel code. Network filesystems, for example.<br> <p> </div> Tue, 22 Jan 2019 12:22:51 +0000 What about async metadata https://lwn.net/Articles/777277/ https://lwn.net/Articles/777277/ dw <div class="FormattedComment"> Isn't this basically what posix_fadvise() gives us already? But IIRC that interface currently or previously blocked while readahead happened.<br> <p> For zipping, imagine something like a 100k item maildir of tiny 1.5kb messages. While the compression is still relatively expensive, a huge chunk of the operation will be wasted on ceremonial serialized filesystem round-trips (open/close/read/stat/getdents/etc). 
To avoid that I'm not sure there is any way around it except a whole bunch of threads keeping as many FS operations in flight (either doing the CPU bits or any IO bits for uncached data) to get even close to a genuinely busy computer.<br> </div> Tue, 22 Jan 2019 11:35:37 +0000 What about async metadata https://lwn.net/Articles/777276/ https://lwn.net/Articles/777276/ epa <div class="FormattedComment"> It would be convenient to have a system call that declares 'I plan to read this file in the near future'. The kernel would make a best effort to get that file into the page cache, using background I/O, while your process continues. So if you are about to zip up a directory, call plan_to_read() on each file, then continue reading them sequentially as normal. It wouldn't be quite as fast as a true parallel implementation, but for some tasks it could give you 80% of the performance gains without having to rewrite your creaky old sequential code.<br> </div> Tue, 22 Jan 2019 10:09:58 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777180/ https://lwn.net/Articles/777180/ zse <div class="FormattedComment"> Nice that we're finally getting a decent AIO option.<br> <p> I haven't found the complete list of opcodes that are proposed, so don't know if this is already in the works, but I'd think you'll also need synchronization primitives (e.g. a barrier so that all io ops before it need to complete before those after the barrier can start).<br> <p> In general this proposal kind of reminds me of the command queues you have for graphics hardware (OpenGL/Vulkan). I'm wondering if there is potential for (partial) unification or at least mutual inspiration...<br> </div> Sun, 20 Jan 2019 13:22:51 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777176/ https://lwn.net/Articles/777176/ nix <div class="FormattedComment"> Yeah, exactly. 
I was confused despite being sure I would be confused and checking the parent article before posting (my fault, not the article's).<br> </div> Sun, 20 Jan 2019 10:20:27 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777167/ https://lwn.net/Articles/777167/ zdzichu <div class="FormattedComment"> sqe and cqe are for “submit” and “completion” queues. Your confusion demonstrates the need for less brief identifiers :)<br> </div> Sat, 19 Jan 2019 20:27:13 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777164/ https://lwn.net/Articles/777164/ nix <div class="FormattedComment"> Ooh that's honestly nicer than plain read()/write() would be. No -EINTR worries, no short reads, the only bit that makes me squint is the cqe/sqe acronyms, which are perhaps too concise to be easily understandable (it's a bit longer, but maybe _cmd and _stat suffixes would be easier to read? You're really only granting yourself one variable letter in the existing scheme, which is... a small naming budget.)<br> <p> </div> Sat, 19 Jan 2019 19:00:22 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777163/ https://lwn.net/Articles/777163/ nix <div class="FormattedComment"> We're in "passing massive structures and/or arrays of structures" land, and trying to *not* produce a horror show like perf_event_open() is fairly high on my priority list! (Though, really, arguing against myself: the reason perf_event_open() is horrifying is that it does a lot, and honestly the same would be true of any attempt to wrap it in a uring, or in anything else. It's not really a win to move from 'this ioctl/syscall has security holes because the interface has too many edges to test exhaustively' to 'this uringed interface has security holes because the interface has too many edges to test exhaustively', alas. 
:( )<br> </div> Sat, 19 Jan 2019 18:57:40 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777136/ https://lwn.net/Articles/777136/ axboe <div class="FormattedComment"> It's the brain making that leap, since uring isn't a word it currently recognizes. This will go away as it becomes a bit more ubiquitous. I see no reason to change the name.<br> </div> Fri, 18 Jan 2019 17:22:56 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777094/ https://lwn.net/Articles/777094/ zdzichu <div class="FormattedComment"> Not only yours.<br> </div> Fri, 18 Jan 2019 12:06:37 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777092/ https://lwn.net/Articles/777092/ NAR <div class="FormattedComment"> My first association/misread was uring - urine and that's not that nice a name. Maybe I just need new glasses.<br> </div> Fri, 18 Jan 2019 10:36:45 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777079/ https://lwn.net/Articles/777079/ axboe <div class="FormattedComment"> Might even make sense to just move away from an ioctl, and provide an sqe entry into the driver through the file_operations, for instance.<br> </div> Fri, 18 Jan 2019 04:21:00 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777069/ https://lwn.net/Articles/777069/ axboe <div class="FormattedComment"> Since it already has all the mechanics to do buffered IO async, any ioctl could easily be channeled through the API. We're already grabbing the files/mm of the original process. Depending on how many arguments you need to the ioctl, we'd need to tweak the sqe a bit.<br> <p> With liburing, it should be _very_ easy for applications to use. If you go native, yes, you need to be a bit more careful, and it's more hairy. 
But even with the basic support liburing has now, you just do:<br> <p> {<br> struct io_uring ring;<br> struct io_uring_sqe *sqe;<br> struct io_uring_cqe *cqe;<br> <p> io_uring_queue_init(queue_depth, &amp;ring, 0);<br> <p> sqe = io_uring_get_sqe(&amp;ring);<br> sqe-&gt;opcode = IORING_OP_READV;<br> sqe-&gt;fd = fd;<br> [...]<br> <p> io_uring_submit(&amp;ring);<br> <p> io_uring_wait_completion(&amp;ring, &amp;cqe);<br> }<br> <p> as a very basic example.<br> </div> Thu, 17 Jan 2019 23:34:55 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777068/ https://lwn.net/Articles/777068/ axboe <div class="FormattedComment"> Don't give away all my secrets :-)<br> </div> Thu, 17 Jan 2019 23:30:06 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777067/ https://lwn.net/Articles/777067/ nix <div class="FormattedComment"> This is such a nice interface, I'm wondering if heavy ioctl() users might be able to reuse some of these ideas to redo their ioctl madness into a nice high-performance command/response ring :) I mean yeah it's not exactly simple for userspace to use, but if the complexity is wrapped away from users its properties are seriously slobbersome.<br> <p> (yes, I help maintain one of those monsters, making heavy use of ioctl() passing intricate structures into and out of the kernel, and the massive use of ioctl() is one thing I at least am hoping to get rid of in the process of getting it ready for upstreaming.)<br> <p> </div> Thu, 17 Jan 2019 23:16:51 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777066/ https://lwn.net/Articles/777066/ nix <div class="FormattedComment"> Nice library name. The plan is to lure people into using it, right? 
:)<br> </div> Thu, 17 Jan 2019 23:12:46 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/777020/ https://lwn.net/Articles/777020/ axboe <div class="FormattedComment"> Just to follow up on this, if you check the latest repo, the ring_fd is now both pollable (in terms of poll(2)/epoll(2), not io-pollable), and io_uring also supports IORING_OP_POLL to offer the same functionality that aio does in that regard.<br> <p> I believe this caters to both of your needs.<br> </div> Thu, 17 Jan 2019 16:26:52 +0000 What about async metadata https://lwn.net/Articles/776980/ https://lwn.net/Articles/776980/ dw <div class="FormattedComment"> There's always newer and better technology around, but tech is only useful when it's compatible with what you already have :) And ZIPs are eeeverywhere<br> </div> Thu, 17 Jan 2019 12:33:09 +0000 What about async metadata https://lwn.net/Articles/776979/ https://lwn.net/Articles/776979/ Sesse <div class="FormattedComment"> So you want to demonstrate that something is obsolete by implementing… an obsolete compression algorithm? :-)<br> <p> (zlib/deflate is still around pretty much only due to huge transition costs, and a fragmented market among the alternatives. Try something like zstd if you want to make a clean break.)<br> </div> Thu, 17 Jan 2019 12:30:57 +0000 Physically-contiguous buffers https://lwn.net/Articles/776968/ https://lwn.net/Articles/776968/ nilsmeyer <div class="FormattedComment"> And you could potentially allocate the buffers as huge pages, right? <br> </div> Thu, 17 Jan 2019 09:45:33 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776961/ https://lwn.net/Articles/776961/ axboe <div class="FormattedComment"> I'm addressing what is a misconception in the original comment. It referred to the "bored" part, which is when you have the kernel side doing polling for you. 
That has nothing to do with epoll(), and epoll() would not be a solution for this at all.<br> <p> If the ring_fd should be pollable, in terms of epoll, absolutely. That would be trivial to add. It would NOT work for IORING_SETUP_IOPOLL for obvious reasons, as you can't sleep for those kinds of completions. But for "normal", IRQ driven IO, adding epoll() support for the CQ side of the ring_fd is straightforward. On the SQ ring side, there's nothing to epoll for. The application knows if the ring is writeable (eg can hold new entries) without entering the kernel.<br> <p> Outside of that, my IOCB_CMD_POLL reference has to do with this:<br> <p> <a href="https://lwn.net/Articles/743714/">https://lwn.net/Articles/743714/</a><br> <p> and adding IORING_OP_POLL for similar functionality on the io_uring side.<br> </div> Thu, 17 Jan 2019 03:28:30 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776960/ https://lwn.net/Articles/776960/ samroberts <div class="FormattedComment"> I think you can forgive the OP for calling what epoll() does "polling", though it doesn't fit your definition, given it's the name of the syscall!<br> <p> The point stands: io_uring should be easily usable with poll/select/epoll so it can be integrated with existing event-loop-based code; networking code in particular is a heavy user of these calls. 
Specifically, this fd<br> <p> <font class="QuotedText">&gt; The return value from io_uring_setup() is a file descriptor that can then be passed to mmap() to map the buffer into the process's address space.</font><br> <p> should be epoll()able.<br> </div> Thu, 17 Jan 2019 03:12:09 +0000 What about async metadata https://lwn.net/Articles/776956/ https://lwn.net/Articles/776956/ dw <div class="FormattedComment"> I have it on my todo list to write a fully CPU/IO-parallel ZIP implementation (because it's fairly straightforward), with an article around it highlighting most of the traditional UNIX tooling is utterly obsolete on pretty much all modern devices. Naturally it can't really benefit from the work here due to the parent comment, but yeah, the problem is very real, and frankly an entirely ridiculous state of affairs<br> </div> Thu, 17 Jan 2019 01:03:19 +0000 Physically-contiguous buffers https://lwn.net/Articles/776950/ https://lwn.net/Articles/776950/ ms-tg <div class="FormattedComment"> This sort of comment by the patch author is a major example of the value of LWN<br> </div> Wed, 16 Jan 2019 23:31:19 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776936/ https://lwn.net/Articles/776936/ HIGHGuY <div class="FormattedComment"> Would the interface support operations that require 2 file descriptors, like splicing from a file to a socket and vice versa? 
I have the impression it doesn’t but could have easily missed something.<br> <p> Slightly more complex operations could be useful in combination with primitives like the P2P PCIe transfers that are being worked on to avoid going through main memory altogether.<br> </div> Wed, 16 Jan 2019 20:09:44 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776928/ https://lwn.net/Articles/776928/ axboe <div class="FormattedComment"> It is, check the git repo!<br> </div> Wed, 16 Jan 2019 19:30:56 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776927/ https://lwn.net/Articles/776927/ arjan <div class="FormattedComment"> void *addr; /* buffer or iovecs */<br> <p> <p> hmm that makes 32/64 compat funky.. wonder if it really should just be a u64<br> </div> Wed, 16 Jan 2019 19:24:52 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776915/ https://lwn.net/Articles/776915/ axboe <div class="FormattedComment"> io_uring will grow support for IORING_OP_POLL, but it's outside the scope of the initial implementation. This will work similarly to the recently added IOCB_CMD_POLL support for aio.<br> <p> Apart from that, I do think you're mixing up the polling with the io polling. One provides a way to signal when data is ready, the other skips IRQs in favor of busy polling for completion events.<br> </div> Wed, 16 Jan 2019 18:07:21 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776914/ https://lwn.net/Articles/776914/ axboe <div class="FormattedComment"> System calls should go into libc, but anything else will reside in liburing. You can clone that here:<br> <p> git://git.kernel.dk/liburing<br> <p> though not a lot of items are in there yet. It does contain helpers to setup/teardown the ring, and submit/complete helpers for applications that don't want (or need) to muck with the ring itself. 
This will grow some more features, the intent is that most applications will _probably_ end up using that instead of handling all the details themselves.<br> </div> Wed, 16 Jan 2019 18:05:21 +0000 Physically-contiguous buffers https://lwn.net/Articles/776911/ https://lwn.net/Articles/776911/ axboe <div class="FormattedComment"> Not only that, but you could also pre-map the SG lists, instead of having to do map SG and unmap SG for each IO. Right now the registered buffers only avoid the get_user_pages() and put_pages() for each IO, which is (by far) the biggest overhead. But if we fix the kernel parts as well, then we can avoid the dma map/unmap for each IO. That'd bypass the split as well, some quick mental math shows we should be able to kill ~5% of the overhead on my box with that.<br> <p> In general we have various pieces of low hanging fruit on the block layer side, which are readily apparent now that we have an efficient interface into the kernel. Work in progress! But I'd like to wrap up io_uring first.<br> </div> Wed, 16 Jan 2019 18:03:46 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776902/ https://lwn.net/Articles/776902/ nix <div class="FormattedComment"> You can just leave it as a separate library, like libaio is now, and libattr, and libacl, and many others. 
(Though if it's useful and glibc wants to use it itself, it's not unimaginable that it might find its way in there in time -- but glibc has harsh backward-compatibility constraints that argue in favour of a trial period in an external library in any case, until we know whether the API the library provides works well.)<br> </div> Wed, 16 Jan 2019 16:19:35 +0000 Ringing in a new asynchronous I/O API https://lwn.net/Articles/776899/ https://lwn.net/Articles/776899/ me@jasonclinton.com <div class="FormattedComment"> <font class="QuotedText">&gt; It's perhaps worth noting at this point that Axboe is working on a user-space library that will hide much of the complexity of this interface from most users.</font><br> <p> What's the procedure for a user-space library tightly coupled to a kernel API, like this one, getting into glibc (or any of the rest of the libcs)?<br> <p> </div> Wed, 16 Jan 2019 16:15:56 +0000