The first complaint is not that significant, IMHO. First off, Linux is quite efficient at syscalls compared to many other Unixes; where on some of those other Unixes syscalls are to be avoided like the plague, on Linux you only get to worry about this in the most extreme cases. But also, select/poll/epoll (and any other mechanism which retrieves a number of events at a time) have the property that their syscall overhead drops as the load increases: the more events are returned, the more time is spent between calls to select/poll/epoll (in order to process those events), so they are called less and less often, with big chunks of events returned each time.
And again, for the Nth time, the ring buffer does have a syscall every so many events (not unlike a select/poll/epoll syscall every so many events received)! There is a difference when the load is low, as the application can still call kevent_commit() once per N events received, instead of just getting fewer events per call as with select/poll/epoll, but arguably, this lower overhead is only useful at high load, and it disappears there.
The ring buffer scheme has a bad smell to me, in that it reminds me of notification via realtime signals, which could overflow the signal queue and required the application to support this "overload" case with a separate code path. What happens when an event arrives and the ring buffer is full (because the application is slow to process events, which is likely in a high-load situation)? Do we need another code path in the application? Ironically, this would occur at high loads, which is precisely what we're pushing that ring buffer for! Something like epoll, IMHO, having no ceiling (the readiness information is kept in the fd structures in the kernel, if I remember correctly, so you can never run out of space), has the advantage of simplicity, which is not to be sneered at. A single code path means less to debug, smaller code size, and less branching.
In short, I suspect that a large enough "maxevents" parameter to epoll_wait() might yield identical performance results to using the kevent ring buffer, possibly with simpler code too.
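To make that concrete, here's a minimal sketch of the kind of loop I have in mind; MAXEVENTS and the handle_event() helper are made up for the example, but the point is that one epoll_wait() call returns a whole batch of events, so the per-event syscall cost shrinks as the load goes up:

    #include <stdio.h>
    #include <sys/epoll.h>

    #define MAXEVENTS 1024  /* generous batch size: fewer syscalls under load */

    static void handle_event(struct epoll_event *ev)
    {
        (void)ev;           /* application logic would go here */
    }

    int main(void)
    {
        struct epoll_event events[MAXEVENTS];
        int epfd = epoll_create1(0);
        if (epfd < 0) {
            perror("epoll_create1");
            return 1;
        }

        /* ... file descriptors would be added with epoll_ctl() here ... */

        for (;;) {
            int n = epoll_wait(epfd, events, MAXEVENTS, -1);
            if (n < 0) {
                perror("epoll_wait");
                return 1;
            }
            /* Under high load, n approaches MAXEVENTS and the syscall
             * overhead is amortized over the whole batch. */
            for (int i = 0; i < n; i++)
                handle_event(&events[i]);
        }
    }

Note that this single code path covers both the idle and the overloaded case, which is exactly the simplicity I was arguing for above.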
The second complaint is kind of fair, although it has never been that much of a problem for my applications. Knowing whether the fd is readable or writable, along with a pointer to my own data so I can quickly find the context for the event, is quite enough in my case. Maybe there's room for improvement; I would need to be pointed at examples. Also, note that most applications will want an abstraction over this platform-specific code, so that they can substitute a more portable version on non-Linux platforms; an interface that's too radically different or too difficult to emulate might just go unused. But that's a bit of a judgment call. Let's hear more on that point, I'd say.
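For reference, this is all I mean by "a pointer to my own data": epoll lets you stash a pointer when registering the fd and hands it back with each event (struct connection here is a hypothetical stand-in for the application's context data):

    #include <sys/epoll.h>

    struct connection {
        int fd;
        /* ... buffers, parser state, and so on ... */
    };

    static int watch_connection(int epfd, struct connection *conn)
    {
        struct epoll_event ev = {
            .events = EPOLLIN | EPOLLOUT,
            .data.ptr = conn,   /* returned verbatim by epoll_wait() */
        };
        return epoll_ctl(epfd, EPOLL_CTL_ADD, conn->fd, &ev);
    }

    /* Later, in the event loop:
     *     struct connection *conn = events[i].data.ptr;
     * and we're looking straight at the right context data. */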
For the complaint about thread cancellation, I'm with Linus. There is actually one case where this could be a problem, when using edge-triggered events with epoll, but even that could probably be made to behave correctly (by checking for cancellation before pulling the events from the file descriptors, say). select and poll are perfectly safe here: they do not change the state of the file descriptors at all, so another thread calling them again would get the same events again.
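Here's a rough sketch of what I mean for the edge-triggered case, assuming a hypothetical drain() helper and that the fd was stored in data.fd at registration time: the thread checks for cancellation while nothing has been consumed yet, then disables cancellation while the edge is actually being consumed, so a cancellation can never eat the event:

    #include <pthread.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    static void drain(int fd)
    {
        char buf[4096];

        /* Edge-triggered: keep reading until the fd would block. */
        while (read(fd, buf, sizeof(buf)) > 0)
            ;   /* ... hand the data off to the application ... */
    }

    static void handle(struct epoll_event *ev)
    {
        int oldstate;

        pthread_testcancel();   /* safe point: no event consumed yet */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
        drain(ev->data.fd);     /* cancellation cannot strike in here */
        pthread_setcancelstate(oldstate, NULL);
    }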
The lack of support for futexes can seem annoying, but in reality it isn't much of a problem, and integrating them would actually be a lot of trouble for application developers (going back to having to emulate things on non-Linux platforms, again). The biggest thing it would be handy for is semaphores, which are basically a counter (protected by a mutex, or using processor-specific atomic instructions) paired with a condition variable, and it would certainly be doable to make a semaphore that uses a pipe instead of the condition variable. In the simplest case, the pipe can even be used directly as the semaphore, as sketched below, and it should be possible to reduce the number of syscalls with a more complex implementation (although I suspect pthread_cond_broadcast(), which I think is called in sem_post(), also issues a syscall every time it is called).
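A minimal sketch of that pipe-as-semaphore idea, with made-up psem_* names and error handling omitted: each post writes a byte, each wait reads one, and the read blocks while the pipe is empty:

    #include <unistd.h>

    struct psem {
        int fds[2];             /* fds[0]: read end, fds[1]: write end */
    };

    static int psem_init(struct psem *s, unsigned value)
    {
        if (pipe(s->fds) < 0)
            return -1;
        while (value--)         /* preload the initial count */
            write(s->fds[1], "x", 1);
        return 0;
    }

    static void psem_post(struct psem *s)
    {
        write(s->fds[1], "x", 1);
    }

    static void psem_wait(struct psem *s)
    {
        char c;
        read(s->fds[0], &c, 1); /* blocks until a byte arrives */
    }

The nice part is that the read end is a plain file descriptor, so it can go straight into the epoll set, giving exactly the "process other events while waiting" behavior that futex integration was supposed to provide.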
Otherwise, most other uses of futexes in the kind of server application that would use event multiplexing resemble those of spinlocks in the kernel: they don't block for long periods of time, so being able to process other events while waiting on them just wouldn't be that useful.