Mazzoli: How fast are Linux pipes anyway?

Posted Jun 4, 2022 14:19 UTC (Sat) by atnot (guest, #124910)
In reply to: Mazzoli: How fast are Linux pipes anyway? by martin.langhoff
Parent article: Mazzoli: How fast are Linux pipes anyway?

The core of the issue is really the ownership of the buffers in the write() API.

write() takes the address of a buffer that the user owns. Because it cannot make any assumptions about how long that data is going to remain valid after the syscall, it has no choice but to copy. Because the memory was allocated ahead of time by the user, the kernel can't really make any smart decisions about page size either. Then on read(), the kernel has to do the same thing again, copying out of the pipe into another user-owned buffer.
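
A minimal sketch of why that copy is needed, using Python's thin wrappers around the syscalls. The helper name `write_then_scribble` is made up for illustration. The caller overwrites its buffer right after write() returns, and the reader still sees the original bytes because the kernel took its own copy.

```python
import os

def write_then_scribble(payload: bytes) -> bytes:
    """Write payload to a pipe, clobber the caller's buffer, then read it back."""
    r, w = os.pipe()
    try:
        buf = bytearray(payload)
        os.write(w, buf)            # kernel copies buf into the pipe's own pages
        buf[:] = b"X" * len(buf)    # caller reuses its buffer immediately
        return os.read(r, len(payload))
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    # Keep the payload well under the pipe capacity so write() does not block.
    print(write_then_scribble(b"hello, pipe"))
```

If write() merely kept a reference to the user's buffer, the reader would see the overwritten bytes instead.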

I guess for pipes specifically, one could imagine a flag that makes write() block until there is a corresponding read() call on the other side of the pipe, which would eliminate one copy. But with differing buffer sizes, non-blocking I/O, and other considerations, I presume that'd be a lot of complexity that's unlikely to be worth it, especially since you apparently can't even compute fizzbuzz fast enough to completely saturate a pipe like this. [1]

In that sense I'd say that the premise is wrong and write() is actually already plenty fast by default.

[1] In scenarios where this does matter, there are usually already plenty of better, specialized solutions like mmap, XDP, DPDK, netfilter, etc.



Mazzoli: How fast are Linux pipes anyway?

Posted Jun 5, 2022 17:10 UTC (Sun) by Wol (subscriber, #4433) [Link] (9 responses)

> write() takes the address of a buffer that the user owns. Because it cannot make any assumptions about how long that data is going to remain valid after the syscall, it has no choice but to copy.

What about COW? (Might be tricky I know, but just COW the page containing the buffer.)

Cheers,
Wol

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 5, 2022 18:58 UTC (Sun) by atnot (guest, #124910) [Link] (8 responses)

I don't think that would help. CoW is only faster if the memory isn't actually written to again. But applications don't really keep buffers around unmodified for posterity, nor would there be any real way for them to know when it's okay to reuse them. So in practice, you're likely to just end up with the page that contains the buffer being immediately written to again, at which point you're at the status quo again except with additional page faults and fragmentation. There's not really any way to solve this without deviating significantly from the write() API, which is probably one reason people keep inventing new ways of doing it.

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 8, 2022 1:08 UTC (Wed) by willy (subscriber, #9762) [Link]

Worse than that, write() would have to invalidate the TLB entry for the page in question (in order to make COW work). TLB invalidations are slower than copies.

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 9, 2022 5:09 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (6 responses)

Why can't we just mmap the pipe? It's ultimately "just" a buffer in kernelspace. Is it really that hard to add a userspace mapping for it?

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 9, 2022 5:12 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

(Clarifying: I am aware that you cannot mmap pipes. I'm asking why this restriction shouldn't/can't be lifted.)
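
The current restriction is easy to observe. This is only a sketch: `try_mmap_pipe` is a hypothetical helper, and the exact errno is an assumption (on Linux it is probably ENODEV, since pipes provide no mmap operation), so the code just reports whatever the kernel returns.

```python
import mmap
import os

def try_mmap_pipe(length: int = 4096):
    """Return the errno from mmap()ing a pipe's read end, or None if it worked."""
    r, w = os.pipe()
    try:
        try:
            m = mmap.mmap(r, length, prot=mmap.PROT_READ)
        except OSError as exc:
            return exc.errno        # the kernel refused the mapping
        m.close()
        return None
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    err = try_mmap_pipe()
    print("mmap succeeded" if err is None else os.strerror(err))
```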

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 9, 2022 21:07 UTC (Thu) by anton (subscriber, #25547) [Link]

Peter Syrowatka added mmap() for pipes in his Diplomarbeit (master's thesis, in German).

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 9, 2022 15:01 UTC (Thu) by willy (subscriber, #9762) [Link] (3 responses)

What does it mean to mmap() a pipe?

Let's suppose I have a pipefd and addr = mmap(offset=0, length=1M). Then I call read(4kB) on pipefd. Does the pipe shuffle down so that *addr now refers to what was at 4kB, or does it still have a reference to what was at 0 when I called mmap()?

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 10, 2022 17:32 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

Whichever is easier. As long as it's consistent and well-documented, userspace can figure out the rest.

However, I should point out that, if the pipe does not shuffle down, then you need to add an API for telling userspace the current read/write offsets, so that userspace knows where to begin reading or writing. Regardless, you also want an API for setting (or at least advancing) those offsets, so that userspace can emulate read/write calls. Therefore, you might as well not bother with the shuffling-down logic and just provide full get/set support for the offsets.
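
Roughly what that interface could look like from userspace, as a sketch only. `MappedRing` is entirely hypothetical and uses an anonymous mapping as a stand-in, since no such pipe API exists. The buffer never shuffles down; callers see two monotonically increasing offsets and advance them to emulate read and write.

```python
import mmap

class MappedRing:
    """Hypothetical stand-in for an mmap()-able pipe: a buffer plus two offsets."""

    def __init__(self, size: int = 4096):
        self.buf = mmap.mmap(-1, size)   # anonymous mapping, not a real pipe
        self.size = size
        self.read_off = 0                # total bytes consumed so far
        self.write_off = 0               # total bytes produced so far

    def put(self, data: bytes) -> int:
        """Emulate write(): copy in what fits, then advance write_off."""
        free = self.size - (self.write_off - self.read_off)
        n = min(len(data), free)
        for i in range(n):
            self.buf[(self.write_off + i) % self.size] = data[i]
        self.write_off += n
        return n

    def get(self, count: int) -> bytes:
        """Emulate read(): copy out what is available, then advance read_off."""
        n = min(count, self.write_off - self.read_off)
        out = bytes(self.buf[(self.read_off + i) % self.size] for i in range(n))
        self.read_off += n
        return out

if __name__ == "__main__":
    ring = MappedRing(8)
    ring.put(b"abcdef")
    print(ring.get(4), ring.read_off, ring.write_off)
```

A real design would also need some way to synchronize the two processes updating the offsets, which this single-process sketch leaves out.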

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 10, 2022 17:36 UTC (Fri) by willy (subscriber, #9762) [Link] (1 responses)

Haven't you just reinvented shared memory?

Mazzoli: How fast are Linux pipes anyway?

Posted Jun 10, 2022 17:55 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

Well gee, I thought that's what vmsplice(2) was trying to do in the first place.

But I think the big difference is this: If you e.g. create a file in /dev/shm (or any tmpfs) and just keep writing more and more data to it, it'll get bigger and bigger indefinitely, so you have to periodically seek to zero at both ends (and/or truncate it). Pipes are effectively ring buffers, so they don't have this problem. Your consumer can just call read, and doesn't have to know anything about your fancy mmap nonsense.
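
A small sketch of that difference. The helper name is invented for illustration, and a plain temporary file stands in for one in /dev/shm. Appending the same chunk repeatedly makes the file keep growing, while the pipe happily carries several times its own capacity because each read frees space for the next write.

```python
import os
import tempfile

def grow_file_vs_pipe(chunk: bytes, rounds: int):
    """Return (final file size, total bytes moved through a pipe)."""
    # File: every append stays on record until someone truncates it.
    with tempfile.TemporaryFile() as f:
        for _ in range(rounds):
            f.write(chunk)
        f.flush()
        file_size = os.fstat(f.fileno()).st_size

    # Pipe: each read drains what was written, so the buffer never fills up.
    r, w = os.pipe()
    try:
        moved = 0
        for _ in range(rounds):
            os.write(w, chunk)
            moved += len(os.read(r, len(chunk)))
    finally:
        os.close(r)
        os.close(w)
    return file_size, moved

if __name__ == "__main__":
    # 64 rounds of 4 KiB is 256 KiB, more than the usual 64 KiB pipe capacity.
    print(grow_file_vs_pipe(b"x" * 4096, 64))
```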

