
Ringing in a new asynchronous I/O API

Posted Jan 17, 2019 3:12 UTC (Thu) by samroberts (subscriber, #46749)
In reply to: Ringing in a new asynchronous I/O API by axboe
Parent article: Ringing in a new asynchronous I/O API

I think you can forgive the OP for calling what epoll() does "polling", even though it doesn't fit your definition, given that it's the name of the syscall!

The point stands: io_uring should be easily usable with poll/select/epoll so that it can be integrated with existing event-loop-based code; networking code in particular is a heavy user of these calls. Specifically, this fd

> The return value from io_uring_setup() is a file descriptor that can then be passed to mmap() to map the buffer into the process's address space.

should be epoll()able.
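(As a rough illustration of the setup step quoted above: with the raw syscall interface, mapping the rings from that fd might look something like the sketch below. The IORING_OFF_* offsets and io_uring_params fields are taken from linux/io_uring.h as it eventually landed, so details may differ from the patch series under discussion, and error handling is omitted.)

/* Sketch: map the SQ ring, CQ ring and SQE array from the fd returned
 * by io_uring_setup(). Offsets and fields follow linux/io_uring.h as
 * it eventually landed; error handling is omitted for brevity. */
#include <linux/io_uring.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static int sys_io_uring_setup(unsigned entries, struct io_uring_params *p)
{
	return syscall(__NR_io_uring_setup, entries, p);
}

int setup_rings(unsigned entries)
{
	struct io_uring_params p = { 0 };
	int fd = sys_io_uring_setup(entries, &p);
	if (fd < 0)
		return -1;

	/* SQ ring: head/tail indices plus the array of SQE indexes */
	void *sq_ring = mmap(NULL, p.sq_off.array + p.sq_entries * sizeof(unsigned),
			     PROT_READ | PROT_WRITE, MAP_SHARED, fd, IORING_OFF_SQ_RING);
	/* CQ ring: head/tail indices plus the CQE array itself */
	void *cq_ring = mmap(NULL, p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe),
			     PROT_READ | PROT_WRITE, MAP_SHARED, fd, IORING_OFF_CQ_RING);
	/* The submission entries are a separate mapping */
	struct io_uring_sqe *sqes = mmap(NULL, p.sq_entries * sizeof(struct io_uring_sqe),
					 PROT_READ | PROT_WRITE, MAP_SHARED, fd, IORING_OFF_SQES);

	(void)sq_ring; (void)cq_ring; (void)sqes;
	return fd;
}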



Ringing in a new asynchronous I/O API

Posted Jan 17, 2019 3:28 UTC (Thu) by axboe (subscriber, #904) [Link] (9 responses)

I'm addressing a misconception in the original comment. It referred to the "bored" part, which is when you have the kernel side doing the polling for you. That has nothing to do with epoll(), and epoll() would not be a solution for it at all.

As for whether the ring_fd should be pollable, in terms of epoll: absolutely. That would be trivial to add. It would NOT work for IORING_SETUP_IOPOLL, for obvious reasons, as you can't sleep for those kinds of completions. But for "normal", IRQ-driven IO, adding epoll() support for the CQ side of the ring_fd is straightforward. On the SQ ring side, there's nothing to epoll for: the application knows whether the ring is writeable (i.e. can hold new entries) without entering the kernel.
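(A minimal sketch of what that integration could look like from the application side, assuming the ring fd reports EPOLLIN when completions are pending, as described in the follow-up below. The helper names are the later liburing spellings, which postdate this thread.)

#include <liburing.h>
#include <sys/epoll.h>

/* Sketch: drive io_uring completions from an existing epoll loop by
 * watching the ring fd for readability, then reaping CQEs without blocking. */
void epoll_ring_loop(struct io_uring *ring)
{
	int epfd = epoll_create1(0);
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = ring->ring_fd };

	epoll_ctl(epfd, EPOLL_CTL_ADD, ring->ring_fd, &ev);

	for (;;) {
		struct epoll_event out;
		struct io_uring_cqe *cqe;

		if (epoll_wait(epfd, &out, 1, -1) <= 0)
			continue;

		/* Ring fd is readable: drain whatever completions are queued */
		while (io_uring_peek_cqe(ring, &cqe) == 0) {
			/* handle cqe->res / cqe->user_data here */
			io_uring_cqe_seen(ring, cqe);
		}
	}
}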

Outside of that, my IOCB_CMD_POLL reference has to do with this:

https://lwn.net/Articles/743714/

and adding IORING_OP_POLL for similar functionality on the io_uring side.
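(For illustration: the poll opcode eventually landed as IORING_OP_POLL_ADD, and with current liburing a one-shot poll through the ring looks roughly like the sketch below. The helper names here are the merged spellings, not necessarily what the tree contained at the time.)

#include <liburing.h>
#include <poll.h>

/* Sketch: one-shot poll on a socket fd through the ring, analogous
 * to aio's IOCB_CMD_POLL. */
int poll_via_ring(struct io_uring *ring, int sockfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int revents;

	if (!sqe)
		return -1;

	io_uring_prep_poll_add(sqe, sockfd, POLLIN);
	io_uring_sqe_set_data(sqe, (void *)(long)sockfd);
	io_uring_submit(ring);

	io_uring_wait_cqe(ring, &cqe);
	revents = cqe->res;		/* returned poll mask, or -errno */
	io_uring_cqe_seen(ring, cqe);
	return revents;
}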

Ringing in a new asynchronous I/O API

Posted Jan 17, 2019 16:26 UTC (Thu) by axboe (subscriber, #904) [Link] (8 responses)

Just to follow up on this: if you check the latest repo, the ring_fd is now pollable (in terms of poll(2)/epoll(2), not io-pollable), and io_uring also supports IORING_OP_POLL to offer the same functionality that aio does in that regard.

I believe this caters to both of your needs.

Ringing in a new asynchronous I/O API

Posted Jan 17, 2019 23:16 UTC (Thu) by nix (subscriber, #2304) [Link] (7 responses)

This is such a nice interface that I'm wondering if heavy ioctl() users might be able to reuse some of these ideas to redo their ioctl madness as a nice high-performance command/response ring :) I mean, yeah, it's not exactly simple for userspace to use, but if the complexity is wrapped away from users, its properties are seriously slobbersome.

(yes, I help maintain one of those monsters, making heavy use of ioctl() passing intricate structures into and out of the kernel, and the massive use of ioctl() is one thing I at least am hoping to get rid of in the process of getting it ready for upstreaming.)

Ringing in a new asynchronous I/O API

Posted Jan 17, 2019 23:34 UTC (Thu) by axboe (subscriber, #904) [Link] (6 responses)

Since it already has all the mechanics to do buffered IO async, any ioctl could easily be channeled through the API. We're already grabbing the files/mm of the original process. Depending on how many arguments you need to the ioctl, we'd need to tweak the sqe a bit.

With liburing, it should be _very_ easy for applications to use. If you go native, yes, you need to be a bit more careful, and it's more hairy. But even with the basic support liburing has now, you just do:

{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;

	io_uring_queue_init(queue_depth, &ring, 0);

	sqe = io_uring_get_sqe(&ring);
	sqe->opcode = IORING_OP_READV;
	sqe->fd = fd;
	[...]

	io_uring_submit(&ring);

	io_uring_wait_completion(&ring, &cqe);
}

as a very basic example.
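(To round out the example above, handling the completion might look like the following sketch. io_uring_cqe_seen() and the cqe->res conventions are taken from current liburing, so the names may not match what the tree looked like in January 2019.)

#include <liburing.h>
#include <stdio.h>
#include <string.h>

/* Sketch of handling the completion from the example above: negative
 * cqe->res values are -errno, a positive value is the byte count for
 * the readv. io_uring_cqe_seen() releases the CQE slot back to the ring. */
static void handle_completion(struct io_uring *ring, struct io_uring_cqe *cqe)
{
	if (cqe->res < 0)
		fprintf(stderr, "readv failed: %s\n", strerror(-cqe->res));
	else
		printf("readv completed: %d bytes\n", cqe->res);

	io_uring_cqe_seen(ring, cqe);	/* mark the entry as consumed */
}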

Ringing in a new asynchronous I/O API

Posted Jan 18, 2019 4:21 UTC (Fri) by axboe (subscriber, #904) [Link] (1 responses)

It might even make sense to just move away from an ioctl and provide an sqe entry into the driver through the file_operations, for instance.

Ringing in a new asynchronous I/O API

Posted Jan 19, 2019 18:57 UTC (Sat) by nix (subscriber, #2304) [Link]

We're in "passing massive structures and/or arrays of structures" land, and trying to *not* produce a horror show like perf_event_open() is fairly high on my priority list! (Though, really, arguing against myself: the reason perf_event_open() is horrifying is that it does a lot, and honestly the same would be true of any attempt to wrap it in a uring, or in anything else. It's not really a win to move from 'this ioctl/syscall has security holes because the interface has too many edges to test exhaustively' to 'this uringed interface has security holes because the interface has too many edges to test exhaustively', alas. :( )

Ringing in a new asynchronous I/O API

Posted Jan 19, 2019 19:00 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

Ooh, that's honestly nicer than plain read()/write() would be. No -EINTR worries, no short reads; the only bit that makes me squint is the cqe/sqe acronyms, which are perhaps too concise to be easily understandable. (It's a bit longer, but maybe _cmd and _stat suffixes would be easier to read? You're really only granting yourself one variable letter in the existing scheme, which is... a small naming budget.)

Ringing in a new asynchronous I/O API

Posted Jan 19, 2019 20:27 UTC (Sat) by zdzichu (subscriber, #17118) [Link] (1 responses)

sqe and cqe are for "submit" and "completion" queues. Your confusion demonstrates the need for less brief identifiers :)

Ringing in a new asynchronous I/O API

Posted Jan 20, 2019 10:20 UTC (Sun) by nix (subscriber, #2304) [Link]

Yeah, exactly. I was confused despite being sure I would be confused and checking the parent article before posting (my fault, not the article's).

Ringing in a new asynchronous I/O API

Posted Jan 24, 2019 12:35 UTC (Thu) by joib (subscriber, #8541) [Link]

What are the mechanics of doing buffered AIO? I faintly recall that all those old attempts (syslets, fibrils, whatever) failed partly because there was no natural in-kernel context for making progress on those IOs. An in-kernel thread pool is of course always a possibility, but is that noticeably better than a user-space thread pool?

