Kevents and review of new APIs

Posted Aug 25, 2006 9:18 UTC (Fri) by pphaneuf (subscriber, #23480)
In reply to: Kevents and review of new APIs by vmole
Parent article: Kevents and review of new APIs

In what situation does one open files in rapid succession without sockets being more or less involved at the same time? Consider that open() itself is synchronous and blocking, so you already have a good bit of overhead. If you start thinking about an "async open", then I'll refer you to this.

I would remind you that a big way to decrease the overhead of event handling is to have fewer events in the first place, hence the usefulness of interest sets and such (think of X11's event_mask: on a remote display, event dispatching can be slow). At some point, you'll have to make a syscall to tell the kernel whether or not you want events for a given file descriptor. At best, it could be specified through flags when opening, but I don't think this is an issue worth addressing.

I think what Ulrich was referring to was more when you want to "fiddle" with the interest set rather frequently. For example, picture a user-space port forwarder, where it starts waiting for readability on both sockets, but upon receiving a packet, will add writability of the other socket to its interest set (maybe remove readability of the socket where the packet arrived, for flow control), etc...
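To make the "fiddling" concrete, here is a minimal sketch in Python of one direction of such a forwarder, using level-triggered readiness through the standard selectors module. The names relay_demo, side_a and side_b are made up for illustration, socket pairs stand in for real connections, and end-of-file handling is left out. On Linux each sel.modify() ends up as one or more epoll_ctl() calls, which is the overhead under discussion.

```python
import selectors
import socket


def relay_demo():
    """Forward one packet and count how often the interest set is changed."""
    sel = selectors.DefaultSelector()
    client, side_a = socket.socketpair()   # client <-> forwarder
    side_b, server = socket.socketpair()   # forwarder <-> server
    peer = {side_a: side_b, side_b: side_a}
    pending = {side_a: b"", side_b: b""}   # bytes waiting to be written to each socket
    changes = 0

    # Start by waiting for readability on both forwarder sockets.
    for sock in (side_a, side_b):
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)

    client.send(b"ping")
    done = False
    for _ in range(10):                    # bounded loop, enough for this demo
        if done:
            break
        for key, mask in sel.select(timeout=1.0):
            sock = key.fileobj
            if mask & selectors.EVENT_READ:
                data = sock.recv(4096)     # EOF handling omitted in this sketch
                dst = peer[sock]
                pending[dst] += data
                # Change 1: also wait for the other side to become writable.
                sel.modify(dst, selectors.EVENT_READ | selectors.EVENT_WRITE)
                changes += 1
            if mask & selectors.EVENT_WRITE and pending[sock]:
                sent = sock.send(pending[sock])
                pending[sock] = pending[sock][sent:]
                if not pending[sock]:
                    # Change 2: nothing left to write, back to read only.
                    sel.modify(sock, selectors.EVENT_READ)
                    changes += 1
                    done = True

    received = server.recv(4096)
    sel.close()
    for sock in (client, side_a, side_b, server):
        sock.close()
    return received, changes
```

Even in this stripped-down form, every forwarded packet costs two interest set changes on top of the wait itself.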

This particular situation is a case where edge-triggered events (well supported by epoll, by the way) are indicated, as you can set the interest set of both sockets to read and write, and leave them as-is, keeping track of the state of the sockets in user-space.
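As a rough illustration of that approach, the sketch below registers a socket once for read and write with edge triggering and never touches the interest set again; readiness is remembered in a plain dictionary instead. It uses Python's select.epoll, so it is Linux-only, and edge_triggered_demo and the state dictionary are names invented for this example.

```python
import select
import socket


def edge_triggered_demo():
    """Register once with EPOLLIN | EPOLLOUT | EPOLLET, then leave it as-is."""
    ours, peer = socket.socketpair()
    ours.setblocking(False)
    ep = select.epoll()
    ep.register(ours.fileno(),
                select.EPOLLIN | select.EPOLLOUT | select.EPOLLET)

    # Readiness is remembered here, in user space, between edges.
    state = {"readable": False, "writable": False}
    batches = []  # number of events returned by each poll

    def collect():
        events = ep.poll(0)  # timeout 0: return immediately
        for _fd, mask in events:
            if mask & select.EPOLLIN:
                state["readable"] = True
            if mask & select.EPOLLOUT:
                state["writable"] = True
        batches.append(len(events))

    collect()            # a fresh socket is writable: one edge
    collect()            # nothing changed, so no event is reported again
    peer.send(b"x")
    collect()            # data arrived: a new edge
    snapshot = dict(state)

    # With edge triggering, drain the socket until it would block,
    # then record in user space that it is no longer readable.
    try:
        while ours.recv(4096):
            pass
    except BlockingIOError:
        state["readable"] = False

    ep.close()
    ours.close()
    peer.close()
    return batches, snapshot, state
```

The price of never calling epoll_ctl() again is that the application has to keep the state dictionary honest: it must read or write until it gets EAGAIN before clearing a flag.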

Lastly, what other "local events" are there?


Kevents and review of new APIs

Posted Aug 31, 2006 20:05 UTC (Thu) by vmole (guest, #111) [Link]

Async I/O includes things other than open(), ya know. If it takes longer to deal with the completion of the async_read() than it would to just call read() in the first place, async i/o becomes pointless.

Something that opens lots of files without using sockets: compilers. Consider some auditing system that wants to know every time a file is accessed.

And while sockets may be involved, a web server, news server or IMAP server might open, read, and possibly write a *lot* of files for one socket instance.

As for other local events, consider a piece of shared memory, with a master process that wants to know anytime one of the children writes to it. Yeah, it can be done with an atomic counter and polling, but an event would be much cleaner. And I've had a need for this.

Designing a general kernel events mechanism around the limitations of socket open() seems shortsighted.

Kevents and review of new APIs

Posted Sep 1, 2006 12:20 UTC (Fri) by pphaneuf (subscriber, #23480) [Link]

I think you misunderstood what kevent is for. kevent isn't concerned with all sorts of events, but rather with the specific types of events that can wake a process up. Just about all of those events are file descriptor events, due to the Unix design ("everything is a file", or close enough).

The exact issue raised about epoll in Ulrich's paper was the overhead of registering a file descriptor with the epoll_ctl() call before getting the events, and I was wondering what he would have instead. Just getting all the events would be highly inefficient.

To deal point by point with your reply: Linux-AIO uses a single file descriptor to get notifications on all the operations it does (reads and writes, on all other file descriptors, be they files, sockets or other). This file descriptor can most likely be put in an epoll interest set. An auditing system (such as already exists in the form of Dazuko, inotify and such) would most likely deliver its events on a file descriptor (which you can put in epoll and get notified when those events arrive). Web, news and IMAP servers could use Linux-AIO (covered earlier), but normal filesystem-based file descriptors are "always readable", even when they aren't, so you usually don't want to use them in an event mechanism like kevent or epoll (being always "ready", they make your application busy-spin, eating 100% CPU).

Processes communicating bulk data through shared memory often use a Unix domain socket to notify the other process that it should get the data. X11's MIT-SHM extension works this way, for example, but simpler systems that just write a single byte (enough to make the file descriptor go "readable") are also seen. Unix domain sockets involve more copies for bulk data, but writing a single byte to wake the other process up is very cheap. If the notification is one-way only, a pipe is enough.
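A minimal sketch of the one-way variant, in Python for brevity: the child puts the bulk data in an anonymous shared mapping, then writes a single byte on a pipe to wake the parent, which is waiting in select(). The function name shared_memory_demo and the layout (a 4-byte length followed by the payload) are assumptions made for this example, and it relies on fork(), so it is Unix-only.

```python
import mmap
import os
import select
import struct


def shared_memory_demo(payload=b"bulk data"):
    """Child fills shared memory, then wakes the parent with one byte on a pipe."""
    shm = mmap.mmap(-1, 4096)        # anonymous mapping, shared across fork()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the data first, then the wake-up byte.
        status = 1
        try:
            shm[4:4 + len(payload)] = payload
            shm[0:4] = struct.pack("<I", len(payload))
            os.write(w, b"\x01")
            status = 0
        finally:
            os._exit(status)         # never return into the parent's code

    # Parent: sleep until the pipe goes readable, then look at the memory.
    ready, _, _ = select.select([r], [], [], 5.0)
    if not ready:
        raise TimeoutError("no notification from child")
    os.read(r, 1)                    # consume the wake-up byte
    (length,) = struct.unpack("<I", shm[0:4])
    data = bytes(shm[4:4 + length])

    os.waitpid(pid, 0)
    os.close(r)
    os.close(w)
    shm.close()
    return data
```

The read end of the pipe is an ordinary file descriptor, so it could sit in an epoll interest set next to the sockets instead of being passed to select().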

You also missed a couple of other interesting cases: central processing of signals and of timeouts. Signals can also be dealt with using the "single byte written on a pipe" trick, from the signal handler, deferring the work to the other end of the pipe. Timeouts can be dealt with using, well, the timeout parameter of epoll_wait(), of course.
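Both tricks fit in a few lines. The sketch below is in Python and uses select.select() as a portable stand-in for epoll_wait(); its timeout argument plays the same role. The names wait_for_event and self_pipe_demo are invented for the example, SIGUSR1 is an arbitrary choice, and the code has to run in the main thread because that is where Python allows signal handlers to be installed.

```python
import os
import select
import signal


def wait_for_event(read_fd, timeout):
    """Block until the pipe is readable or the timeout expires."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return "timeout"
    os.read(read_fd, 64)             # drain the wake-up byte(s)
    return "signal"


def self_pipe_demo():
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)

    def handler(signum, frame):
        # Do as little as possible here: just wake up the event loop.
        try:
            os.write(w, b"\0")
        except BlockingIOError:
            pass                     # pipe full: a wake-up is already pending

    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        first = wait_for_event(r, 0.05)          # nothing pending yet
        os.kill(os.getpid(), signal.SIGUSR1)     # send ourselves a signal
        second = wait_for_event(r, 1.0)          # woken through the pipe
    finally:
        signal.signal(signal.SIGUSR1, previous)
        os.close(r)
        os.close(w)
    return first, second
```

The real work for the signal happens after wait_for_event() returns, in normal program context rather than inside the handler.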

The main problem I have with epoll is still that it doesn't centralise the event dispatching for libraries. A new API should include a callback function when events arrive, which would get called without needing cooperation between unrelated pieces of code. For example, if I write an asynchronous DNS resolver library, I should have a way to be notified when a file descriptor is ready or a timeout expires without having to cooperate with other code. Right now, code in a library has to provide a way to let the code that will be doing the call to epoll_wait know that it has a specific timeout or that if it gets an event on a certain file descriptor, it should pass it on.
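To show the kind of interface I mean, here is a rough sketch in Python. The Dispatcher class and its watch_readable(), call_later() and run_once() methods are hypothetical, not an existing API; the sketch is built on the standard selectors module purely for illustration. A library registers its own callbacks, and the main program only has to keep calling run_once().

```python
import heapq
import itertools
import selectors
import socket
import time


class Dispatcher:
    """Hypothetical central dispatcher: libraries register callbacks with it."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()
        self._timers = []                 # heap of (deadline, seq, callback)
        self._seq = itertools.count()     # tie-breaker for equal deadlines

    def watch_readable(self, fileobj, callback):
        self._sel.register(fileobj, selectors.EVENT_READ, callback)

    def call_later(self, delay, callback):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), callback))

    def run_once(self):
        timeout = None
        if self._timers:
            timeout = max(0.0, self._timers[0][0] - time.monotonic())
        # File descriptor callbacks first, then any expired timers.
        for key, _mask in self._sel.select(timeout):
            key.data(key.fileobj)
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _deadline, _seq, callback = heapq.heappop(self._timers)
            callback()


def dispatcher_demo():
    """A pretend resolver library registers its socket and its timeout."""
    loop = Dispatcher()
    resolver_sock, fake_server = socket.socketpair()
    seen = []
    loop.watch_readable(resolver_sock, lambda s: seen.append(s.recv(16)))
    loop.call_later(0.01, lambda: seen.append("timeout"))
    fake_server.send(b"reply")
    for _ in range(100):              # the main program just runs the loop
        if "timeout" in seen:
            break
        loop.run_once()
    resolver_sock.close()
    fake_server.close()
    return seen
```

The main program never learns which file descriptors or timeouts the library cares about, which is exactly what epoll_wait() on its own cannot offer.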

Some libraries, like Qt, libevent and such, can do that, but the big problem is that it's very basic functionality, and it's worthless if it's not standard (if my library registers its events with Qt, but the main program uses libevent, nothing happens and my library never gets its events). These libraries already do a good job, but the point here is to make one that will be good enough to be adopted as the Linux event API and integrated into glibc, so it can be relied upon.

They're just guessing

Posted Sep 2, 2006 9:30 UTC (Sat) by slamb (guest, #1070) [Link]

The exact issue raised about epoll in Ulrich's paper was the overhead of registering a file descriptor with the epoll_ctl() call before getting the events, and I was wondering what he would have instead. Just getting all the events would be highly inefficient.

I haven't seen this paper (got a link?), but I'd say there are three options:

  1. make assumptions - like that because read() returned EWOULDBLOCK you want to know when it next becomes available for reading
  2. abandon level-driven polling. Edge-driven polling should let you set your notification preferences to READ|WRITE and leave it there, even if it's available and you don't currently want to consume it.
  3. accept a list of changes at the same time as the blocking call. Of course, this is the BSD way, so the Linux people have to do something different.

This article makes it painfully obvious to me that the Linux developers as a whole are just guessing. They're going back to a kevent-like system after mocking it when creating epoll. Well, now they're finding that the complexity of those other event types is worth it, and that their system call overhead is too high. Probably should have listened the first time. If the sheer number of system calls is the problem, it's obvious that in level-driven notification applications, the FreeBSD approach of passing in all your change notifications at the same time as blocking is better than the unnecessary system calls of epoll_ctl. (Do the Linux people only care about edge-driven stuff? Perhaps that's reasonable, but I don't see it stated anywhere.)

This bizarre extreme of trying to eliminate all system calls by using a ring buffer...well, I agree with your comment that it sounds exactly like the signal-based polling mistake, and your comment in an earlier thread that some sort of blocking call is clearly necessary. Maybe it is true that the copying of event buffers is significant, but I haven't seen benchmark numbers that demonstrate this is superior, so again it seems that they're just guessing. That's a poor reason for throwing out what someone has already done in favor of a much more convoluted and error-prone interface.

I'm glad to see Andrew Morton's voice of reason, both on needing a clear justification for going against the existing FreeBSD interface and on the documentation. The latter is a serious problem with Linux interfaces in general. Look at inotify - they have section 2 manual pages for the system calls but no section 4 manual page for the whole interface. That's worthless - the system calls are completely obvious; the section 4 manual page is needed to actually describe what the constants and structure elements mean, among other things.

if I write an asynchronous DNS resolver library, I should have a way to be notified when a file descriptor is ready or a timeout expires without having to cooperate with other code ... Some libraries, like Qt, libevent and such can do that, but the big problem is that it's a very basic functionality, and it's worthless if it's not standard (if my library registers its events with Qt, but the main program uses libevent, nothing happens and my library never gets its events) ... the point here is to make one that will be good enough to be integrated as the Linux event API and be integrated in the glibc, so it can be relied upon.

I've always liked liboop for this purpose. It's more usable by libraries because it is general - you can plug it in to Qt's event loop, glib's event loop, libevent, etc. You don't have to make the sort of assumptions you're talking about to use it. I'd strongly prefer a well-maintained, liboop-like library to one in glibc like you're talking about. Largely because I would like my code to run on FreeBSD as well, and because it doesn't require waiting for the Qt and glib people to rebase their stuff on it. It doesn't even require other people to use it, though it'd sure be nice.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds