The rapid growth of io_uring

Posted Jan 24, 2020 21:51 UTC (Fri) by axboe (subscriber, #904)
In reply to: The rapid growth of io_uring by Sesse
Parent article: The rapid growth of io_uring

As soon as the splice support is integrated, you'll have just that. When I originally wrote splice, I also turned sendfile() into a simple wrapper around it, so if you have splice, you have sendfile as well.



The rapid growth of io_uring

Posted Sep 2, 2021 12:36 UTC (Thu) by awkravchuk (guest, #154070)

How exactly did you do that for the general case? splice(2) requires that one of the fds is a pipe, so it looks like sending, e.g., a disk file to a TCP socket wouldn't be possible.

The rapid growth of io_uring

Posted Sep 2, 2021 14:02 UTC (Thu) by farnz (subscriber, #17727)

There are special cases in the kernel (though not, it appears, in the man page) that allow file-to-anything splicing by allocating a secret internal pipe. See splice_direct_to_actor() in fs/splice.c for the code.

The rapid growth of io_uring

Posted Sep 2, 2021 16:49 UTC (Thu) by awkravchuk (guest, #154070)

Great stuff, thanks!
Is there a way to use this from userspace? I'm not an actual kernel hacker, just trying io_uring out :)

The rapid growth of io_uring

Posted Sep 2, 2021 17:14 UTC (Thu) by farnz (subscriber, #17727)

It looks like there's no way to use this from userspace.

However, in the io_uring case, you should be able to build a splice-based sendfile yourself, using a pipe created with pipe(2) as the buffer. Alternatively, you could do something similar with fixed buffers in the ring instead of splice.
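A minimal synchronous sketch of the pipe-as-buffer approach, assuming Linux splice(2) and blocking file descriptors. The helper name splice_sendfile is hypothetical; it does in userspace roughly what the kernel's internal pipe does: splice from the file into a private pipe, then drain the pipe into the output. In io_uring, each of the two splice() calls would instead become an IORING_OP_SPLICE SQE (e.g. via liburing's io_uring_prep_splice()), with IOSQE_IO_LINK set on the first so the drain runs after the fill.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to `count` bytes from `in_fd`, starting at
 * *offset, to `out_fd`, using a private pipe as the intermediate buffer.
 * Synchronous, blocking I/O only. Returns bytes copied, or -1 on error. */
static ssize_t splice_sendfile(int out_fd, int in_fd, loff_t *offset, size_t count)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    ssize_t total = 0;
    while (count > 0) {
        /* Fill: file -> pipe at the caller's offset; splice() advances
         * *offset by the number of bytes actually moved. */
        ssize_t in = splice(in_fd, offset, p[1], NULL, count, SPLICE_F_MOVE);
        if (in < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (in == 0) /* EOF on the input file */
            break;

        /* Drain: pipe -> output, exactly `in` bytes, looping on short writes. */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)left, SPLICE_F_MOVE);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                total = -1;
                goto done;
            }
            left -= out;
        }
        total += in;
        count -= (size_t)in;
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The inner loop drains exactly the number of bytes the fill step reported, so a short read from the file never leaves stale data in the pipe for the next iteration.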

The rapid growth of io_uring

Posted Sep 2, 2021 17:40 UTC (Thu) by awkravchuk (guest, #154070)

That's what I thought. Thank you for the clarification!

The rapid growth of io_uring

Posted Nov 3, 2023 4:03 UTC (Fri) by leo60228 (guest, #167812)

This doesn't work if you want to have multiple calls with different offsets in flight at once, though, does it? If there's a partial read/write, the next splice would use the wrong data, and I can't think of a way to avoid that without only having one splice in flight at a time (which kind of defeats the point, at least for my application).

The rapid growth of io_uring

Posted Nov 3, 2023 12:07 UTC (Fri) by farnz (subscriber, #17727)

If you're building your own equivalent of this trick in io_uring, you can have multiple splices in flight at once; you'd be using two linked SQEs, one of which splices input into the pipe, and the other of which splices the pipe into the output. Offset tracking is in your hands at this point.

The kernel's trick is simply to create the pipe for you when you don't provide one and you're doing synchronous I/O from a file to something that isn't a file.

The rapid growth of io_uring

Posted Nov 3, 2023 22:19 UTC (Fri) by leo60228 (guest, #167812)

I tried implementing that. What I mean is that, for example, if you were trying to copy 0x8000 byte chunks at a time, and the first SQE splicing from the input file to the pipe only copied 0x4000 bytes, the linked SQE would still try to read 0x8000 bytes from the pipe and potentially get the wrong data. Additionally, I'm not sure there's a guarantee that these linked SQEs would be atomic (i.e. if SQE A was linked to SQE B, and SQE C was linked to SQE D, the order A, C, B, D could be allowed and result in data being written to the wrong offsets in the output).

Thinking about it more, I suppose this could be solved by creating a large number of pipes, and making sure that no two SQEs using the same pipe are in flight at the same time. I'm concerned that having many pipes could result in its own performance issues, but it'd probably be fine...?

The rapid growth of io_uring

Posted Nov 5, 2023 10:12 UTC (Sun) by farnz (subscriber, #17727)

You don't have one pipe for all splices; you have one pipe per splice. If the first SQE copies 0x4000 bytes, then the linked SQE can only copy 0x4000 bytes out of the pipe, because there are only 0x4000 bytes in there to copy out. This is exactly what the kernel trick is: create a temporary pipe for the splice to use, so that you're always splicing in and out of pipes.
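A rough sketch of the one-pipe-per-splice idea, assuming synchronous splice(2) on Linux rather than real io_uring submissions; copy_chunk is a hypothetical helper. Each chunk gets its own pipe and carries explicit input and output offsets, so chunks can finish in any order, and a short fill only ever exposes those bytes to the matching drain. With io_uring, each fill/drain pair would become two linked IORING_OP_SPLICE SQEs on that chunk's pipe, with the caller tracking offsets and resubmitting any remainder.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-chunk copy: [start, start + len) of in_fd is copied to
 * the same range of out_fd through a pipe owned by this chunk alone.
 * Returns 0 on success (including EOF before the chunk ended), -1 on error. */
static int copy_chunk(int in_fd, int out_fd, loff_t start, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    loff_t in_off = start, out_off = start;
    int rc = 0;
    while (len > 0) {
        /* Step 1: file -> this chunk's pipe. May be short. */
        ssize_t n = splice(in_fd, &in_off, p[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0) {             /* error, or EOF inside the chunk */
            rc = n < 0 ? -1 : 0;
            break;
        }
        /* Step 2: pipe -> output at the matching offset. The pipe holds
         * exactly n bytes, so it cannot hand over another chunk's data. */
        for (ssize_t left = n; left > 0; ) {
            ssize_t m = splice(p[0], NULL, out_fd, &out_off, (size_t)left, SPLICE_F_MOVE);
            if (m <= 0) {
                rc = -1;
                goto done;
            }
            left -= m;
        }
        len -= (size_t)n;         /* only the remainder is resubmitted */
    }
done:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

Because off_out is passed explicitly, the output file's own position is never used, which is what lets independent chunks land in the right place regardless of completion order.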


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds