They're just guessing
Posted Sep 2, 2006 9:30 UTC (Sat) by slamb
In reply to: Kevents and review of new APIs
Parent article: Kevents and review of new APIs
The exact issue that was raised by epoll in Ulrich's paper was the overhead of
registering a file descriptor with the epoll_ctl() call before getting the events, and I was
wondering what would he have otherwise. Just getting all the events would be highly
I haven't seen this paper (got a link?), but I'd say there are three options:
- make assumptions - like that because read() returned EWOULDBLOCK you want to know
when it next becomes available for read
- abandon level-driven polling. Edge-driven polling should let you set your notification
preferences to READ|WRITE and leave it there, even if it's available and you don't currently want
to consume it.
- accept a list of changes at the same time as the blocking call. Of course, this is the BSD
way, so the Linux people have to do something different.
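To make the second option concrete, here is a minimal sketch in C, assuming Linux epoll with
EPOLLET on a Unix socketpair. The helper names (watch_edge, poll_edge, edge_demo) are made up
for illustration and are not from any paper or patch discussed here. The descriptor is
registered once for READ|WRITE and the registration is never touched again.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd once for both directions, edge-triggered; never modified again. */
static int watch_edge(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT | EPOLLET };
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Non-blocking check: event mask of one pending edge, or 0 if there is none. */
static unsigned poll_edge(int epfd)
{
    struct epoll_event ev;
    return epoll_wait(epfd, &ev, 1, 0) == 1 ? ev.events : 0;
}

/* Hypothetical demo: records what three successive checks report. */
static void edge_demo(unsigned seen[3])
{
    int sv[2];
    int epfd = epoll_create1(0);
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(epfd >= 0 && rc == 0);

    rc = watch_edge(epfd, sv[0]);
    assert(rc == 0);

    seen[0] = poll_edge(epfd);      /* initial edge: the socket is writable */
    seen[1] = poll_edge(epfd);      /* still writable, but no new edge */

    rc = (int)write(sv[1], "x", 1); /* peer sends one byte */
    assert(rc == 1);
    seen[2] = poll_edge(epfd);      /* data arrived: a new edge is reported */

    close(sv[0]);
    close(sv[1]);
    close(epfd);
}
```

The socket stays writable throughout, yet the second check reports nothing, so an application
that does not currently want to write pays no extra epoll_ctl() call to say so.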
This article makes it painfully obvious to me that the Linux developers as a whole are just
going back to a kevent-like system after mocking it when creating epoll. Well, now
they're finding that the complexity of those other event types is worth it, and that their
system call overhead is too high. Probably should have listened the first time. If the sheer
number of system calls is the problem, it's obvious that in level-driven notification
applications, the FreeBSD approach of passing in all your change notifications at the same
time as blocking is better than the unnecessary system calls of epoll_ctl. (Do the Linux
people only care about edge-driven stuff? Perhaps that's reasonable, but I don't see it
stated anywhere.)
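A rough sketch of the system call arithmetic, in C on Linux. The struct change record and the
apply_and_wait() wrapper are hypothetical, invented here to mimic the shape of a changelist on
top of epoll; they are not a real API. With epoll, each interest change costs its own
epoll_ctl() before the epoll_wait(), so n changes take n + 1 kernel entries, whereas
FreeBSD's kevent() accepts the changelist and the event list in a single call.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical change record, loosely shaped like a changelist entry. */
struct change {
    int      op;      /* EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL */
    int      fd;
    uint32_t events;  /* level-triggered interest mask */
};

static int syscalls_made;  /* counts kernel entries, for illustration only */

/* Apply every pending change, then wait: nch + 1 system calls with epoll. */
static int apply_and_wait(int epfd, const struct change *ch, int nch,
                          struct epoll_event *out, int maxout, int timeout_ms)
{
    for (int i = 0; i < nch; i++) {
        struct epoll_event ev = { .events = ch[i].events };
        ev.data.fd = ch[i].fd;
        syscalls_made++;
        if (epoll_ctl(epfd, ch[i].op, ch[i].fd, &ev) < 0)
            return -1;
    }
    syscalls_made++;
    return epoll_wait(epfd, out, maxout, timeout_ms);
}

/* Hypothetical demo: two interest changes followed by one wait. */
static int batch_demo(void)
{
    int sv[2];
    int epfd = epoll_create1(0);
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(epfd >= 0 && rc == 0);

    struct change ch[] = {
        { EPOLL_CTL_ADD, sv[0], EPOLLIN  },  /* nothing to read yet */
        { EPOLL_CTL_ADD, sv[1], EPOLLOUT },  /* writable right away */
    };
    struct epoll_event out[2];

    syscalls_made = 0;
    int ready = apply_and_wait(epfd, ch, 2, out, 2, 0);

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return ready;
}
```

Two changes plus the wait come to three system calls here; a level-driven application that
toggles write interest on every request pays that extra cost each time around its loop.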
This bizarre extreme of trying to eliminate all system calls by using a ring buffer...well, I
agree with your comment that it sounds exactly like the signal-based polling mistake, and
your comment in an earlier thread that some sort of blocking call is clearly necessary.
Maybe it is true that the copying of event buffers is significant, but I haven't seen
benchmark numbers that demonstrate this is superior, so again it seems that they're just
guessing. That's a poor reason for throwing out what someone has already done in favor of a
much more convoluted and error-prone interface.
I'm glad to see Andrew Morton's voice of reason, both on needing a clear justification for
going against the existing FreeBSD interface and on the documentation. The latter is a
serious problem with Linux interfaces in general. Look at inotify - they have section 2 manual
pages for the system calls but no section 4 manual page for the whole interface. That's
backwards: the system calls are completely obvious; the section 4 manual page is needed to actually
describe what the constants and structure elements mean, among other things.
if I write an asynchronous DNS resolver library, I should have a way to be notified
when a file descriptor is ready or a timeout expires without having to cooperate with other
code ... some libraries, like Qt, libevent and such can do that, but the big problem is that it's a
very basic functionality, and it's worthless if it's not standard (if my library registers its events
with Qt, but the main program uses libevent, nothing happens and my library never gets its
events) ... the point here is to make one that will be good enough to be integrated as the Linux
event API and be integrated in the glibc, so it can be relied upon.
I've always liked liboop for this purpose. It's more usable
by libraries because it is general - you can plug it into Qt's event loop, glib's event loop,
libevent, etc. You don't have to make the sort of assumptions you're talking about to use it. I'd
strongly prefer a well-maintained, liboop-like library to one in glibc like you're talking about.
Largely because I would like my code to run on FreeBSD as well, and because it doesn't require
waiting for the Qt and glib people to rebase their stuff on it. It doesn't even require other people
to use it, though it'd sure be nice.
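To illustrate the general idea (this is not liboop's actual API; every name below is
hypothetical), here is a minimal sketch in C of a loop-neutral interface. The library side
sees only struct event_source; the application supplies whichever backend matches its own
event loop. A poll()-based backend stands in for Qt, glib or libevent here.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

typedef void (*ready_cb)(int fd, void *user);

/* Hypothetical loop-neutral interface: all that a library gets to see. */
struct event_source {
    int (*watch_read)(struct event_source *self, int fd, ready_cb cb, void *user);
};

/* One possible backend built on poll(); an application would supply its own. */
#define MAX_WATCH 8
struct poll_source {
    struct event_source base;  /* first member, so the pointer cast below is valid */
    struct pollfd fds[MAX_WATCH];
    ready_cb      cbs[MAX_WATCH];
    void         *users[MAX_WATCH];
    nfds_t        n;
};

static int poll_watch_read(struct event_source *self, int fd, ready_cb cb, void *user)
{
    struct poll_source *ps = (struct poll_source *)self;
    if (ps->n == MAX_WATCH)
        return -1;
    ps->fds[ps->n] = (struct pollfd){ .fd = fd, .events = POLLIN };
    ps->cbs[ps->n] = cb;
    ps->users[ps->n] = user;
    ps->n++;
    return 0;
}

static void poll_source_init(struct poll_source *ps)
{
    ps->base.watch_read = poll_watch_read;
    ps->n = 0;
}

/* One loop iteration: returns the number of callbacks dispatched, or -1. */
static int poll_source_run_once(struct poll_source *ps, int timeout_ms)
{
    int dispatched = 0;
    if (poll(ps->fds, ps->n, timeout_ms) < 0)
        return -1;
    for (nfds_t i = 0; i < ps->n; i++) {
        if (ps->fds[i].revents & POLLIN) {
            ps->cbs[i](ps->fds[i].fd, ps->users[i]);
            dispatched++;
        }
    }
    return dispatched;
}

/* Library side (say, a resolver): knows nothing about the backend in use. */
static void on_reply(int fd, void *user)
{
    char c;
    if (read(fd, &c, 1) == 1)
        (*(int *)user)++;
}

static int resolver_attach(struct event_source *src, int fd, int *replies)
{
    return src->watch_read(src, fd, on_reply, replies);
}

/* Hypothetical demo: a pipe plays the part of the resolver's socket. */
static int adapter_demo(void)
{
    int p[2], replies = 0;
    struct poll_source ps;
    int rc = pipe(p);
    assert(rc == 0);

    poll_source_init(&ps);
    rc = resolver_attach(&ps.base, p[0], &replies);
    assert(rc == 0);

    rc = (int)write(p[1], "r", 1);
    assert(rc == 1);
    poll_source_run_once(&ps, 0);

    close(p[0]);
    close(p[1]);
    return replies;
}
```

Swapping the backend means writing another watch_read implementation; resolver_attach() and
on_reply() stay as they are, which is the property that makes this style usable from libraries.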