
Zero-copy networking

Zero-copy networking

Posted Jul 4, 2017 12:18 UTC (Tue) by sorokin (guest, #88478)
Parent article: Zero-copy networking

Linux people should keep an eye on what Microsoft does. Microsoft has had zero-copy networking for ages. I believe it had I/O completion ports (queues onto which notifications of completed I/O operations are pushed) even before Linux got epoll support.

Microsoft realized that locking pages in memory is expensive, and in Windows 8 it came up with an API called Registered I/O. It requires an application to register its buffers in advance. I/O operations then just use these registered buffers and therefore don't need to lock any pages.

I believe Linux kernel people should just skip designing zero-copy operations altogether and implement Registered I/O.


Zero-copy networking

Posted Jul 4, 2017 18:49 UTC (Tue) by einstein (subscriber, #2052) [Link] (1 responses)

Good point - In this case, Linux devs ought to look at borrowing worthwhile features even from crappy OSes.

Zero-copy networking

Posted Dec 19, 2017 22:23 UTC (Tue) by immibis (subscriber, #105511) [Link]

The Windows I/O stack has always felt lightyears ahead of Linux to me.

Just because several components are tied together in one product with annoying marketing doesn't mean they all suck. I'm sure the kernel I/O developers had nothing to do with the start screen.

Zero-copy networking

Posted Jul 5, 2017 13:17 UTC (Wed) by abatters (✭ supporter ✭, #6932) [Link]

I don't know Windows, but this is exactly what I was thinking. Register the buffers ahead of time to save the overhead of pinning and unpinning the pages over and over again. See also: SCSI generic (drivers/scsi/sg.c) mmap()ed I/O.

Zero-copy networking

Posted Jul 5, 2017 14:35 UTC (Wed) by clameter (subscriber, #17005) [Link] (2 responses)

Linux has had zero copy networking for more than 10 years. Use the RDMA subsystem to send messages. The RDMA subsystem can even receive messages(!!!). The RDMA subsystem can not only perform remote DMA operations but also send and receive datagrams.

Zero-copy networking

Posted Jul 6, 2017 21:52 UTC (Thu) by wtarreau (subscriber, #51152) [Link] (1 responses)

It has even worked for userland for a while. HAProxy successfully makes use of splice() to perform zero-copy transfers between sockets (receive AND send).

Also it seems to me that this send(MSG_ZEROCOPY) is not much different from doing a vmsplice().

Zero-copy networking

Posted Jul 11, 2017 0:00 UTC (Tue) by klempner (subscriber, #69940) [Link]

The entire point of MSG_ZEROCOPY is the notification that the kernel is done, so the memory can be unpinned and potentially freed or reused. HAProxy's socket to splice pipe to socket path doesn't have that problem, because in that case the memory never has a userspace address.

The fundamental problem with going from application to vmsplice() pipe to TCP socket is that you don't know when the kernel is done with the pages in question, so you don't know when they can be freed or modified. If that memory gets reused before the TCP stack is done with it, you're liable to start, say, leaking crypto keys out to the network.
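
As a rough sketch of what that notification looks like from userspace, the Python below sends with MSG_ZEROCOPY and then reads the completion from the socket's error queue. Treat it as an illustration under assumptions, not a reference implementation: the function names are hypothetical, the numeric constants are the usual Linux values (used as fallbacks in case the `socket` module does not export them), and it assumes a connected TCP socket on a kernel that supports MSG_ZEROCOPY.

```python
import select
import socket
import struct

# Assumed Linux values, used only if the socket module lacks the names.
SO_ZEROCOPY = getattr(socket, "SO_ZEROCOPY", 60)
MSG_ZEROCOPY = getattr(socket, "MSG_ZEROCOPY", 0x4000000)
SO_EE_ORIGIN_ZEROCOPY = 5
SO_EE_CODE_ZEROCOPY_COPIED = 1

# struct sock_extended_err: ee_errno, ee_origin, ee_type, ee_code,
# ee_pad, ee_info, ee_data (16 bytes).
SOCK_EXTENDED_ERR = struct.Struct("=IBBBBII")


def parse_completion(cmsg_data):
    """Return (first_id, last_id, copied) for a zerocopy completion.

    Returns None if the control message is something else. The ids
    count the MSG_ZEROCOPY sends on this socket, starting at 0.
    """
    if len(cmsg_data) < SOCK_EXTENDED_ERR.size:
        return None
    errno_, origin, _type, code, _pad, info, data = (
        SOCK_EXTENDED_ERR.unpack_from(cmsg_data))
    if errno_ != 0 or origin != SO_EE_ORIGIN_ZEROCOPY:
        return None
    return info, data, bool(code & SO_EE_CODE_ZEROCOPY_COPIED)


def send_and_wait(sock, buf, timeout=1.0):
    """Send buf with MSG_ZEROCOPY and wait for one completion.

    buf must not be modified until a completion covering this send
    arrives. Returns parse_completion()'s result, or None on timeout.
    """
    sock.setsockopt(socket.SOL_SOCKET, SO_ZEROCOPY, 1)
    sock.send(buf, MSG_ZEROCOPY)
    poller = select.poll()
    poller.register(sock, 0)  # POLLERR is reported even if not requested
    if not poller.poll(int(timeout * 1000)):
        return None
    _data, ancdata, _flags, _addr = sock.recvmsg(1, 512, socket.MSG_ERRQUEUE)
    for _level, _ctype, cmsg_data in ancdata:
        result = parse_completion(cmsg_data)
        if result is not None:
            return result
    return None
```

Only once a completion whose id range covers a given send has arrived is it safe to overwrite or free that buffer; the vmsplice() path has no equivalent signal.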

Zero-copy networking

Posted Jul 5, 2017 21:27 UTC (Wed) by k8to (guest, #15413) [Link]

Async I/O a la completion ports shouldn't be news to anyone, given that this was the standard means of doing I/O on VMS in the 1980s. The downsides are that it's trickier to get right (there's a lot more opportunity for creating bugs in the application code), and that reaping the benefits means receiving data through an interface that looks essentially nothing like sockets.

Those are prices that must be paid for completely maximizing throughput in high transaction count scenarios, but they're an awkward price for most users.

The registered I/O tweak is relatively recent and somewhat informative, however, driven as it is by modern hardware requirements instead of ancient design concerns.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds