
OLS: A proposal for a new networking API

Posted Jul 23, 2006 15:55 UTC (Sun) by dps (subscriber, #5725)
Parent article: OLS: A proposal for a new networking API

One of the limitations of poll(2) and select(2) is that to determine which file descriptors met the conditions, you need to use a loop. If there are a large number of sockets and high performance is critical, this is arguably suboptimal. Switching to SIGIO is not a complete solution, because two fds changing state generate only one SIGIO, unless I have misunderstood something.

An equivalent interface that more directly indicates which file descriptors met the conditions would be useful.

One could propose implementing zero-copy I/O by marking the pages that read() or write() refer to as copy-on-write, using them directly in kernel space, and giving any process that scribbles on those pages a copy. I can see that read(2) might need to know this happened and perform a copy after all. Disclaimer: I have not investigated the limitations of real hardware or the size of any mm changes required.


OLS: A proposal for a new networking API

Posted Jul 23, 2006 17:16 UTC (Sun) by cventers (guest, #31465)

> One could propose implementing zero copy I/O by marking the pages that
> read() or write() refer to copy on write, using them directly in kernel
> space, and giving those that scribble on those pages a copy. I can see
> that read(2) might need to know this happened and perform a copy after
> all. Disclaimer: I have not investigated the limitations of real
> hardware or size of any mm changes required.

Something like this has already been proposed. The trouble is that taking
the page fault is fairly expensive, and once you have to do the copy _plus_
the TLB flushing, you've spent measurably more time than you would have
spent just doing the copy in the first place.

Copy-on-write is good in some places. During a fork, you have to
invalidate the TLB anyway, so it doesn't hurt much to implement CoW
there (especially since many of the pages will never be copied, either
because the application calls execve() or because it is a daemon like
Apache in which each child has only a portion of non-shared data).

But playing tricks with virtual memory elsewhere (such as in the
networking hot path) is a really bad idea.

Further discussion: http://kerneltrap.org/node/6506

OLS: A proposal for a new networking API

Posted Jul 23, 2006 19:10 UTC (Sun) by busterb (subscriber, #560)

There are a number of poll/select alternatives in various operating systems that work around this limitation by returning a list of only the descriptors that have received an event. On Linux, see epoll(7). libevent is a nice library that abstracts away OS-specific mechanisms such as these.

OLS: A proposal for a new networking API

Posted Jul 24, 2006 13:01 UTC (Mon) by kleptog (subscriber, #1183)

This is where real-time signals come in. They queue, so if two signals were sent you receive two (mostly). You can also arrange to have data attached to the signal so you know who sent it.

I say mostly because there's one caveat: while the signals do queue, they don't queue indefinitely. I think Linux cuts the queue at 32. Once the queue is full, the program has to go back to polling the sockets, which is what you're trying to avoid.

I believe that's the reason they never caught on.

OLS: A proposal for a new networking API

Posted Aug 4, 2006 4:43 UTC (Fri) by efexis (guest, #26355)

...or even swapping with a fresh page? For example, you allocate a 4k page, construct the message, and 'send' it. At that moment, the page table entry that the user process sees is swapped with the one the kernel/driver sees. The user process now has an empty page at that location, ready to write the next message into, and the kernel has access to the page to send to the device.

Same with receiving: the device writes the page to memory, the kernel figures out where it has to go, and maps it into the virtual address space of the process at the location the process requested. That page is now allocated to that process, so the next read would land at a different address. There's no need for copy-on-write, as a page is no longer needed after being passed along, so an empty page in its place would suffice.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds