Posted Aug 3, 2006 13:31 UTC (Thu) by pphaneuf
Parent article: Toward a kernel events interface
One thing that has always bothered me about the existing event delivery mechanisms on Unix/Linux is that they do not allow for distributing events to multiple components of an application.
But this has been discussed before by Linus and others, a long time ago.
Now, I don't like to cite Win32 too often, but they have a lot of experience with centralized event delivery, as almost all Win32 programs end up having to deal with it eventually. Specifically, on that platform, you can associate a callback with messages (events), and it will be called to handle the message. It is done in a different way than the API Linus had proposed: callbacks are really associated with "windows", and one can create an invisible window (those are rather lightweight objects) to handle events. The GetMessage() function does not automatically call the callbacks; there is a separate DispatchMessage() function to do that, so it can be done selectively by the application getting the messages. One can also get only the messages for a given window (overriding the handling of other windows). There are also a few more message types, including, most significantly, timer messages.
Distribution of timers is also important. As I mentioned an equally long time ago, a timer object that would be represented as a file descriptor would be very useful. As things are, the code calling the event delivery function has sole control over the timeout.
What does all this give us? Well, a classic example is making an asynchronous DNS resolver that is almost as easy to use as gethostbyname(). A DNS resolver is interested in two events, one where it waits for a reply on a file descriptor, and another where it has a timeout after which it either resends its requests or fails. A hypothetical asynchronous resolver on Win32 could simply create an invisible window, bind its socket readiness events to it, register a timer event for the retransmits/failure, send its requests and return. It could take a simple function pointer as a callback that it would call upon finishing its work, and this would all "simply happen". If it returned the handle to its invisible window, you could then make a synchronous version by simply implementing an event loop that filters on the window handle until the callback gets called (if you wanted to use it the "normal" asynchronous way, you'd simply ignore the return value, which is easy to do).
This is simply impossible to do on Linux right now without ridiculous overhead (all we're ). Assuming a distributed event dispatching like Linus proposed, one would still need a thread or a subprocess to write to a file descriptor to implement the timeout! A timer file descriptor object would be quite the easy and orthogonal extension, but it just goes hand-in-hand with distributed event dispatching. There are some distributed event dispatchers, like the GLib mainloop, libevent and others, but unification is key.
Note that epoll's file descriptors could be seen as similar to the invisible window, but they have several differences. They are missing the "call the callback" part of Linus' proposal, which is key. If they had this, they could be used in a hierarchy, but that could be rather inefficient, as the callback for a lower level epoll would get called, then dispatch its own events, possibly on multiple levels. How would libraries know which epoll handle to register their own with? There would need to be a "wait for events on all epoll handles of this process", but then, this would absolutely require the callback mechanism to be centralized (what would the main application event loop do with events meant for the asynchronous DNS resolver?). And the timers are still missing...
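To show what the missing "call the callback" part amounts to, here is a rough userspace sketch layered on top of epoll. This is not Linus' proposed API, only an approximation of it. The names ev_handler, ev_register(), ev_dispatch_once() and count_ready() are invented for the example. Each registration carries its own callback, so the loop that waits does not need to know who an event is meant for.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical handler record: a descriptor plus the callback owning it. */
struct ev_handler {
    int fd;
    void (*cb)(int fd, void *arg);
    void *arg;
};

/* Register a handler for readability. The record must stay alive for as
 * long as it is registered, since epoll only stores the pointer. */
int ev_register(int epfd, struct ev_handler *h)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = h };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Wait up to timeout_ms, then call the callback of every ready
 * descriptor. Returns the number of callbacks invoked, or -1 on error. */
int ev_dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event evs[16];
    int n = epoll_wait(epfd, evs, 16, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct ev_handler *h = evs[i].data.ptr;
        h->cb(h->fd, h->arg);
    }
    return n;
}

/* Example callback: consume one byte and bump the counter passed as arg. */
void count_ready(int fd, void *arg)
{
    char c;
    if (read(fd, &c, 1) == 1)
        ++*(int *)arg;
}
```

This still leaves the question of which epoll handle a library should register with, which is exactly why I think it needs to be a process-wide facility rather than something each program rebuilds.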
The key issue is having unified event dispatching. The Portland Project is doing some work on unifying mainloops, mostly in the area of GUI toolkits, but I think this should be a system-level facility, not just for GUI programs (their needs are just much more obvious, as when a library tries to pop up a GTK+ dialog in a Qt program without falling apart). We could all simply use libevent, say, but the problem is getting everyone to switch boats. A mechanism originating from the kernel and/or the glibc people, with promises of better performance, could be key to adoption.