Kevents and review of new APIs
Posted Aug 24, 2006 9:07 UTC (Thu) by pphaneuf
Parent article: Kevents and review of new APIs
Linus had mentioned kqueue before.
I said before that I don't think system calls are really the issue. First off, if a process is going to sleep, it has to make a system call anyway. Second, for an event-gathering API, as long as you can fetch many events at once (as epoll_wait() and (eurgh) select()/poll() allow), you can get down to a single system call per iteration of the main loop, with a lot of work (and many, many system calls!) done in between each of those calls.
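To make the "one system call per iteration" point concrete, here is a minimal sketch of a single main-loop iteration on top of epoll. The names run_iteration(), handle_ready() and MAX_BATCH are made up for illustration, and the handler just reads and discards; a real server would do its actual work (and its many other system calls) there.

```c
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BATCH 64   /* illustrative batch size, not a recommendation */

/* Hypothetical handler: read (and discard) whatever is available on fd. */
static void handle_ready(int fd)
{
    char buf[4096];
    ssize_t got = read(fd, buf, sizeof buf);
    (void)got;
}

/* One iteration of the main loop: a single epoll_wait() call collects up
 * to MAX_BATCH events, which are then dispatched one by one.
 * Returns the number of events handled, or -1 on error. */
int run_iteration(int epfd, int timeout_ms)
{
    struct epoll_event batch[MAX_BATCH];
    int n = epoll_wait(epfd, batch, MAX_BATCH, timeout_ms);

    for (int i = 0; i < n; i++)
        handle_ready(batch[i].data.fd);
    return n;
}
```

This assumes each descriptor was registered with its own number stored in data.fd. The event-gathering cost is one call no matter how many descriptors turned up ready, up to the batch size.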
Now, something much more profitable would be a way to limit the number of events in the first place. Many protocols have a fixed block size. A way to trigger a readability event only when a full block has arrived, or a writability event only when there is space for X bytes instead of 1, would be a higher-level approach to reducing event delivery overhead, simply by delivering fewer events.
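The closest existing knob I know of for the read side is the SO_RCVLOWAT socket option, which sets a receive low-water mark. A rough sketch follows; set_read_block_size() is a made-up helper name, and whether select()/poll()/epoll actually hold back the readability event until the mark is reached depends on the kernel and the socket type, so treat this as an illustration of the idea rather than something to rely on.

```c
#include <sys/socket.h>

/* Ask the kernel to consider the socket readable only once at least
 * block_size bytes are queued. Returns 0 on success, -1 on error.
 * Assumption: the event API in use honours the low-water mark. */
int set_read_block_size(int fd, int block_size)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
                      &block_size, sizeof block_size);
}
```

For a protocol with 512-byte blocks, that would be set_read_block_size(fd, 512) right after accepting the connection.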
This is exactly how epoll got to be so much more efficient than select()/poll(): not by employing radical new ways of communicating between kernel and userspace (all three take a pointer to a userspace buffer and fiddle with it), but by reducing the complexity of the processing itself by keeping some state in the kernel. epoll doesn't need the list of watched descriptors passed in at event dispatch time, and returns only the interesting events.
The potentially nasty behaviour of overfilling the ring buffer reminds me of what would happen with realtime signals when the queue ran out of space for events: you'd have to fall back to using select(), so you ended up with a lot of complexity, a special "overload" code path, and all the bugs and testing headaches that come with it. Oh well, at least it didn't explode and overwrite earlier event data, like this proposal looks like it'll do!
I really wonder how much of a problem epoll registration and unregistration is (the only issue Ulrich's paper mentioned about epoll), considering how many system calls are usually involved in accepting a new connection. You have to accept the connection, set it to non-blocking, possibly turn off Nagle (depending on the protocol), maybe set TCP_CORK, maybe getsockname() to find out which virtual server we're supposed to act as, maybe set the priority if we need some particular ToS (streaming media servers come to mind as a high-performance ToS-aware kind of application)... Oh, and an epoll_ctl(). Is that going to kill performance, really?
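Spelled out as code, the sequence looks something like the sketch below. accept_and_register() is a made-up name, and I've left out the optional TCP_CORK and ToS steps; the numbered comments count the system calls, with epoll_ctl() being one of six.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept one connection, configure it, and register it with epoll.
 * On success returns the new descriptor and fills *local with the local
 * address (to pick a virtual server); returns -1 on any failure. */
int accept_and_register(int listen_fd, int epfd, struct sockaddr_in *local)
{
    struct epoll_event ev = { .events = EPOLLIN };
    socklen_t len = sizeof *local;
    int one = 1;
    int flags;

    int fd = accept(listen_fd, NULL, NULL);                     /* 1 */
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);                              /* 2 */
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) /* 3 */
        goto fail;

    /* Turn off Nagle. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one) < 0) /* 4 */
        goto fail;

    if (getsockname(fd, (struct sockaddr *)local, &len) < 0)    /* 5 */
        goto fail;

    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)            /* 6 */
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```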
I find epoll rather satisfactory, at the moment, from a performance point of view. My main problem is reducing the number of copies in bulk transfers, an area where Ulrich's paper was interesting and which can be helped at the moment with APIs such as sendfile() and splice(). Also, reducing the number of events themselves would help the performance of some servers.
I think there are also aspects of API usability that this would be a good time to improve. The day I can implement an asynchronous DNS resolver that doesn't fork or use threads (to wait for replies? how silly) and works without too much help from the application will be a great day.