
Edge-triggered interfaces are too difficult?

The new epoll interface was covered here back in October, 2002. The epoll system calls offer a significant performance improvement for applications which must frequently poll large numbers of file descriptors. They do so by performing the setup work only once, and then trapping new I/O events as they occur.

One aspect of the epoll interface is that it is edge-triggered; it will only return a file descriptor as being available for I/O after a change has happened on that file descriptor. In other words, if you tell epoll to watch a particular socket for readability, and a certain amount of data is already available for that socket, epoll will block anyway. It will only flag that socket as being readable when new data shows up.

Edge-triggered interfaces have their own advantages and disadvantages. One of their disadvantages, as epoll author Davide Libenzi has discovered, would appear to be that many programmers do not understand edge-triggered interfaces. Additionally, most existing applications are written for level-triggered interfaces (such as poll() and select()) instead. Rather than fight this tide, he has sent out a new patch which switches epoll over to level-triggered behavior. A subsequent patch makes the behavior configurable on a per-file-descriptor basis.
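To make the difference concrete, here is a minimal sketch in C. It is written against the epoll API as it exists in current Linux kernels (epoll_create1(), epoll_ctl(), epoll_wait() and the per-descriptor EPOLLET flag); the patches discussed here may have differed in detail. The helper name events_after_partial_read() is invented for illustration. It queues two bytes on a pipe, consumes only one of them after the first notification, and then asks epoll again without blocking.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns how many events a second, non-blocking
 * epoll_wait() reports after the first notification was only partially
 * consumed. Pass 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns -1 on setup failure.
 */
int events_after_partial_read(uint32_t extra_flags)
{
    int pipefd[2];
    int epfd;
    int result = -1;
    char c;
    struct epoll_event ev = {0};
    struct epoll_event got;

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epoll;

    /* Two bytes arrive; the first wait reports the fd as readable. */
    if (write(pipefd[1], "ab", 2) != 2)
        goto out_epoll;
    if (epoll_wait(epfd, &got, 1, 0) != 1)
        goto out_epoll;

    /* Consume only one of the two bytes, leaving data pending. */
    if (read(pipefd[0], &c, 1) != 1)
        goto out_epoll;

    /* Timeout 0: report what is ready right now without blocking. */
    result = epoll_wait(epfd, &got, 1, 0);

out_epoll:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Called with 0 (level-triggered), the second check still reports the descriptor, since data remains pending. Called with EPOLLET, it reports nothing, because no new data has arrived since the first notification.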

The end result is a more flexible epoll interface that can be more easily used in existing applications. The patch has not been merged as of this writing, but there does not seem to be any reason why it shouldn't be. After all, epoll has not yet appeared in a stable kernel release; now is the best time to be making improvements to the interface.



Edge-triggered interfaces are too difficult?

Posted Mar 13, 2003 17:30 UTC (Thu) by dneto (guest, #4954)

I'm no kernel hacker, but it seems to me that edge-triggered
interfaces make race conditions more likely.

Something like this can get you into deadlock.

    get_all_pending_data(fd);
    // Oops, data arrives here...
    while ( ! data_arrives(fd) ) {
        get_all_pending_data(fd);
    }


It's best to allow overlap with a level-sensitive interface:

    while ( block_until_data_pending(fd) ) {
        get_all_pending_data(fd);
    }
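As a rough illustration of why the level-sensitive loop above tolerates that overlap, the following C sketch writes to a pipe before it starts waiting and then calls poll(). The function name poll_sees_pending_data() is made up for this example. Because poll() reports the current state of the descriptor rather than a change, the already-pending byte is still reported.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes one byte to a pipe *before* waiting on it,
 * then asks poll() whether the read end is readable. Returns 1 if poll()
 * reported the already-pending data, 0 if it timed out, -1 on error.
 */
int poll_sees_pending_data(void)
{
    int pipefd[2];
    struct pollfd pfd;
    int ready;

    if (pipe(pipefd) < 0)
        return -1;

    /* The data "arrives" before the program starts waiting. */
    if (write(pipefd[1], "x", 1) != 1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    pfd.fd = pipefd[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Level-triggered: poll() looks at the current state of the fd. */
    ready = poll(&pfd, 1, 1000);

    close(pipefd[0]);
    close(pipefd[1]);

    if (ready < 0)
        return -1;
    return (ready == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```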

Edge-triggered interfaces are too difficult?

Posted Mar 14, 2003 8:59 UTC (Fri) by dank (guest, #1865)

I've been programming with edge-triggered
interfaces for some time now (sigio, epoll)
and I love 'em. The paradigm is

    for (;;) {
        get next event on any monitored fd
        handle that event
    }

where handling an event means e.g. reading from the
associated fd until there's no more to read,
using nonblocking mode.
Piece o' cake. The big plus for me (besides the
blazing speed) is that I *never have to reset
my interest mask*. I've written a lot of nonblocking
I/O engines, and I never got good at computing the
interest mask. So not having to do it anymore
is a big relief.

The only big surprise for new
users is that they *really do have to read
until there's no more to read*, otherwise
the next event will never come.
It's these new users, and people porting stuff
from Solaris, who will most appreciate the
level-triggered approach, which is more forgiving
(it doesn't care how much you read, etc).

See http://www.kegel.com/c10k.html for my
notes on the subject.
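The "read until there's no more to read" step might look something like the following C sketch. The names set_nonblocking() and drain_fd() are invented here and do not come from the comment or from any library. A real handler would process each buffer instead of merely counting bytes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a nonblocking fd until the kernel reports there is nothing
 * left (EAGAIN/EWOULDBLOCK) or the peer closed it (read() returns 0).
 * Returns the total number of bytes consumed, or -1 on a real error.
 */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n > 0) {
            total += n;        /* a real handler would process buf here */
            continue;
        }
        if (n == 0)
            return total;      /* end of file */
        if (errno == EINTR)
            continue;          /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;      /* drained: safe to wait for the next event */
        return -1;
    }
}
```

Stopping before EAGAIN is the mistake the comment warns about: with an edge-triggered interface, the leftover data produces no further event.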

Edge-triggered interfaces are too difficult?

Posted Mar 15, 2003 6:17 UTC (Sat) by IkeTo (subscriber, #2122)

> for (;;) {
> get next event on any monitored fd
> handle that event
> }

Isn't that exactly the same code the parent says might end up in a race condition? In particular, what should happen if an event arrives exactly *after* you have handled all events and before you start monitoring again? It seems that to resolve this race, the kernel must keep a "marker" recording whether an unreported event has occurred. Then we have another problem: the user may get the event during the "handle that event" phase, so the kernel needs to check that the "marker" is still valid before returning to the user. While solvable, it seems not much less work than a level-triggered interface.

On the other hand, is the user-land side really "a piece of cake" compared to level-triggered semantics? You need exactly the same code, except that in the "handle that event" part the program can read any event as opposed to reading all events?

Edge-triggered interfaces are too difficult?

Posted Mar 16, 2003 8:07 UTC (Sun) by Ross (subscriber, #4065)

By "edge triggered" they don't mean that you don't get an event if you don't check at the right time. They mean that you only get told about an event once, rather than every time.

So, if data arrived right after you checked the socket, you wouldn't know about the new event. No problem. You would be notified about it next time you checked.

I don't understand dneto's example at all.

A well-written select() application will already treat notifications like one-time events and do as much input as possible. Not doing so just means more round-trips through the event loop.
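The claim that an event is remembered until the next check can be tried out with a small C sketch. It assumes the epoll API of current Linux kernels (epoll_create1() and EPOLLET), which may not match the 2003 patch in every detail, and the function name event_seen_on_next_check() is invented for illustration. Data is written while the program is not inside epoll_wait(), and the following non-blocking check is examined.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: registers a pipe's read end edge-triggered, lets
 * data arrive while the program is NOT inside epoll_wait(), and then
 * checks for events. Returns the event count reported by that later
 * check, or -1 on setup failure.
 */
int event_seen_on_next_check(void)
{
    int pipefd[2];
    int epfd;
    int result = -1;
    struct epoll_event ev = {0};
    struct epoll_event got;

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epoll;

    /* Nothing has happened yet, so a non-blocking check finds no events. */
    if (epoll_wait(epfd, &got, 1, 0) != 0)
        goto out_epoll;

    /* The event occurs "between" checks: nobody is waiting right now. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out_epoll;

    /* The kernel queued the event, so the next check can report it. */
    result = epoll_wait(epfd, &got, 1, 0);

out_epoll:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```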

Edge-triggered interfaces are too difficult?

Posted Mar 16, 2003 9:04 UTC (Sun) by IkeTo (subscriber, #2122)

> So, if data arrived right after you checked the socket, you
> wouldn't know about the new event. No problem. You would be
> notified about it next time you checked.

I've made it rather clear in my last post that this is doable, although not exactly easy. Do you notice what the kernel must do, under the current device driver interface, for this whole thing to work like this? Let's have a short account:

1. When an event occurs in the hardware, the device driver calls poll_wait() to tell the kernel that some event has occurred.

2. The kernel now calls the poll() method of the device driver---no matter whether anybody is calling select right now! Otherwise a race condition might occur.

3. The kernel must remember what event occurred, so that if select is called later, it can be returned right away.

4. When select is called, the remembered events must be checked, and if they are the ones the user is waiting for, they must be returned without waiting for the device to call poll_wait() again.

5. When a user actually does a read, write or anything that affects the events (say ioctl()), the poll() method must be called again to update the current status of the device. Otherwise the device may report a "change of readability" event even though the event is related to something that occurred before the last read. Alternatively, in the above point, the select call must call poll() to confirm that the events are still there before returning to the user.

Who in the business of OS development will write such an interface when in fact a level triggered interface requires...

1. On select, the poll() method of the device is called to see if the event waited for is already happening, and if so select returns right away. Otherwise it waits until the device driver calls poll_wait(), and repeats.

2. When a hardware event occurs, the device driver will call poll_wait(), and at this time everybody waiting on poll_wait() will be woken up.

..., and that's it?! The interface might make sense on an interface like epoll or even poll, where the kernel is advised that the user might call poll() on a set of fds. Then it might reduce the number of times that the kernel needs to call the poll() method of the device driver. But for select(), where the user can suddenly call, and after each invocation nothing is left in the kernel, it seems stupid.

select() actually IS edge triggered

Posted Mar 14, 2003 14:08 UTC (Fri) by paulsheer (guest, #3925)

Yes, select() for some devices IS edge triggered.
If select() marks a file descriptor available for
reading/writing (on some devices on some OS's) it WON'T mark
it again. You HAVE to read/write from it if you requested
event notification for that file descriptor.

Therefore, in my select_tut man page (see recent
man page package from Linux doc), I make it clear
never to ask for event notification on a file
descriptor unless you intend to respond with a
read or write. Some devices work level triggered,
but programs that count on this may break.

The programs that do not use select() properly
should be fixed.

The whole point of the new system call (or it
ought to be the whole point) is that old select()
programs can simply #ifdef HAVE_EPOLL and get
the benefit of O(better) file descriptor event
handling -- with little or no other changes.

(This has nothing to do with what the standards
might say. Most programs are coded to run on
machines, not on standards :-)


select() actually IS edge triggered

Posted Mar 15, 2003 8:18 UTC (Sat) by IkeTo (subscriber, #2122)

> (This has nothing to do with what the standards
> might say. Most programs are coded to run on
> machines, not on standards :-)

Except perhaps to determine who is broken and thus must change. :-)

BTW, as I understand the implementation, Linux select() calls a "poll method" of the device driver at the beginning to ask for the current device status, and then, if the device is not ready, waits on a wait queue which is woken up by various device driver routines via the poll_wait() interface. Upon wakeup the device is checked again for ready operations. Under this implementation I see no reason why a device would be edge-triggered: the device driver simply has to do much more work to make it edge-triggered without introducing a lot of race conditions.

On the other hand, if there really are devices out there that *are* edge triggered on select(), then the advice that "one must read/write the fd if you select() it" is not sufficient. Instead, the advice should be "one must set the fd to non-blocking and read/write each ready fd until the call returns -1 with errno=EAGAIN or errno=EINTR, or until you no longer need to read or write the fd, in which case you must no longer select() on it". Personally I've never seen a single scenario where the latter is needed. The former is always needed, for the simple reason that if you never read/write an fd that you select(), then once the fd is ready it stays ready and breaks select(), so select() becomes a very expensive form of busy-polling.

select() actually IS edge triggered

Posted Mar 16, 2003 8:12 UTC (Sun) by Ross (subscriber, #4065)

Yep, if you are using select() to do nonblocking operations without setting the socket to nonblocking mode, your code will break. Believe me, I've done it :) It won't happen predictably, but it can happen. Basically treat the output from select() as a hint, and then try the operation until you get EAGAIN (or EWOULDBLOCK or whatever), and be prepared to handle the case where the very first call returns that error.

select() actually IS edge triggered

Posted Mar 16, 2003 8:41 UTC (Sun) by IkeTo (subscriber, #2122)

> Yep, if you are using select() to do nonblocking operations
> without setting the socket to nonblocking mode, your code will
> break.

I have seen it only when many threads/processes are waiting on the same socket/fd. Then all threads wake up but only one can get the event that occurred. This is even documented for the accept() system call (perhaps because it always bites people writing a multiprocess server). Of course you must respect the guarantee of select: only the first read() or write() is guaranteed to be non-blocking, and it may not write all the data you want to write, or fill the whole buffer that you give it. Then you must wait for select() to tell you that the fd is ready again. But if you do respect that, I have never seen it fail, even though I use it rather regularly. Perhaps the advice is a good safeguard, but if any type of device or file has that behaviour (without documenting that it does not work with select), then it is broken and should be reported as a bug.

BTW, your problem is just the opposite to the problem that paulsheer mentioned. What he says is that select might not return even if some fd is ready---because it has been returned in previous call of select. You say just the opposite: select might return even if no fd is ready. This should never happen, unlike the problem mentioned by paulsheer, which seems to be a gray area in the man page.

select() actually IS edge triggered

Posted Mar 16, 2003 19:00 UTC (Sun) by Ross (subscriber, #4065)

Yes, I have seen this happen "in real life" on several systems including Linux. The information given by select() is just a "snapshot" in time. You seem to be claiming that this is something new with edge triggered notification. It's not.

You point out the case I'm talking about: accept(), but the problem isn't caused by reading or writing more than once. In fact, if you use blocking sockets, there is no way to avoid the problem.

Specifically, I was talking about connections which are canceled between the call to select() and the call to accept(). The problem in general is that things can change between those two system calls.

If a connection is made to a server with a blocking listen socket, the server will be notified of that new connection by select(). But the client can then cancel the connection before the server calls accept(). If there are no other waiting connections, the server will block until another connection is made. This is a classic race condition, like the signal-right-before-select() race and the alarm()-without-sigsetjmp() race. Most systems do not prevent this from happening (though a few will keep dead sockets in the accept queue so they can be handled as errors by accept).
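One common way to sidestep this particular race is to make the listen socket nonblocking, so that accept() returns an error instead of stalling when the connection has vanished. The C sketch below assumes a TCP socket on the loopback interface; the helper names make_nonblocking_listener() and try_accept() are invented for illustration and are not taken from the comment.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates a nonblocking TCP listen socket on the
 * loopback interface. Port 0 asks the kernel for any free port.
 * Returns the fd, or -1 on failure.
 */
int make_nonblocking_listener(void)
{
    struct sockaddr_in addr;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: tries to accept one connection without blocking.
 * Returns the new fd, -1 if no connection is available (including one
 * that went away after select() reported it), or -2 on a real error.
 */
int try_accept(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);

        if (fd >= 0)
            return fd;
        if (errno == EINTR)
            continue;          /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK ||
            errno == ECONNABORTED)
            return -1;         /* nothing to accept: wait in select() again */
        return -2;
    }
}
```

A server loop would call try_accept() whenever select() flags the listen socket and simply return to select() on -1.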

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds