The rapid growth of io_uring
Posted Feb 9, 2020 22:47 UTC (Sun) by dcoutts (subscriber, #5387)
Parent article: The rapid growth of io_uring
The io_uring API is designed to scale to a moderate number of simultaneous I/O operations. The submission and completion rings have to be large enough to cover all the operations in flight at once, and the maximum ring size is 4k entries. That is fine for disk I/O, connect, accept, etc.
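To make the sizing concrete, here is a minimal sketch that asks the kernel for a ring via io_uring_setup(2) and reports the sizes it actually allocated. It assumes Linux headers that provide `<linux/io_uring.h>`; `probe_ring_sizes` is a hypothetical helper name, and the fallback syscall number 425 is an assumption for headers that lack `__NR_io_uring_setup`. The kernel rounds the requested SQ size up to a power of two and, by default, makes the CQ ring twice that size. In containers where io_uring is blocked the call simply fails with an error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 425 is the io_uring_setup number on the generic syscall
 * table; only used if the installed headers do not define it. */
#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425
#endif

/* Hypothetical helper: create an io_uring asking for `entries` SQ slots,
 * record the SQ/CQ sizes the kernel chose, then close the ring again.
 * Returns 0 on success or -errno on failure. */
static int probe_ring_sizes(unsigned entries, unsigned *sq, unsigned *cq)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof p);

    /* glibc has no wrapper for io_uring_setup(2), so call it directly. */
    int fd = (int)syscall(__NR_io_uring_setup, entries, &p);
    if (fd < 0)
        return -errno;

    *sq = p.sq_entries; /* requested size rounded up to a power of two */
    *cq = p.cq_entries; /* twice the SQ size unless IORING_SETUP_CQSIZE is used */
    close(fd);
    return 0;
}
```

Every in-flight operation needs its own SQE and eventually its own CQE, which is why the ring size bounds how many operations can be outstanding at once.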
It's not fine for the "C10k problem" of having tens of thousands of mostly idle network connections, which is what epoll is designed for. We don't really want tens of thousands of pending async recv operations; we just want to wait for data to arrive on any connection, and then issue the I/O operation to collect it.
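For reference, the epoll pattern described above looks roughly like this. It is a minimal sketch: `watch_fd` and `wait_ready` are hypothetical helper names, and the 64-event buffer is an arbitrary choice. The interest set lives in the kernel, so registering thousands of idle fds costs nothing per wait; `epoll_wait` hands back only the fds that actually became ready, and the program then issues the read itself.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: add `fd` to the epoll interest set for readability.
 * An idle registered fd consumes no slot in the event buffer. */
static int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait up to timeout_ms for readable fds and copy up
 * to `max` of them into `ready`. Only ready fds are reported, however many
 * are registered. Returns the number of ready fds, or -1 on error. */
static int wait_ready(int epfd, int *ready, int max, int timeout_ms)
{
    struct epoll_event evs[64];
    if (max > 64)
        max = 64;
    int n = epoll_wait(epfd, evs, max, timeout_ms);
    for (int i = 0; i < n; i++)
        ready[i] = evs[i].data.fd;
    return n;
}
```

A single-threaded loop would call `wait_ready` repeatedly and then `read` or `recv` on each returned fd.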
So what's the plan for handling large numbers of network connections with io_uring, or with some combination of io_uring and epoll? We have IORING_OP_POLL_ADD, but each one costs an io_uring entry, so we can't go over 4k of them. There's IORING_OP_EPOLL_CTL for adjusting the fds in an epoll set, but there's no io_uring operation for epoll_wait. So do we have to use both io_uring and epoll_wait? That needs two threads, so there's no nice single-threaded event loop.
Perhaps I'm missing something. If not, isn't the obvious fix to add an IORING_OP_EPOLL_WAIT? Then we could use IORING_OP_EPOLL_CTL to adjust the set of network fds we're monitoring and issue a single IORING_OP_EPOLL_WAIT to wait for activity on any of them.
Alternatively, io_uring could subsume the epoll API entirely. The single-shot style of IORING_OP_POLL_ADD is actually very nice. But it has to scale to the 10k+ case, so it cannot consume a completion queue entry for every polled fd the way IORING_OP_POLL_ADD does.
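For comparison, epoll already has a single-shot mode: with `EPOLLONESHOT`, a fd is disabled inside the interest set after one event is reported and stays registered until it is re-armed with `EPOLL_CTL_MOD`. A minimal sketch, where `arm_oneshot` is a hypothetical helper name; it shows the one-shot semantics without any per-fd slot being held while the fd is idle.

```c
#include <sys/epoll.h>

/* Hypothetical helper: arm (first_time != 0) or re-arm `fd` for exactly one
 * readability notification. After the event is delivered the kernel keeps
 * the fd in the set but disabled, until this is called again. */
static int arm_oneshot(int epfd, int fd, int first_time)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    int op = first_time ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    return epoll_ctl(epfd, op, fd, &ev);
}
```

Because the fd stays registered while disabled, re-arming is a cheap `EPOLL_CTL_MOD` rather than a fresh submission.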
