Kernel events without kevents

Posted Mar 15, 2007 13:07 UTC (Thu) by pphaneuf (guest, #23480)
Parent article: Kernel events without kevents

This is nothing short of fantastic! We're getting really close to my litmus test of being able to implement an asynchronous DNS resolver without threads that simply calls a callback when a request is done. I have been wishing for exactly this for a while now (note that I didn't ask for signals, because I know how to fake those myself, but now I get them for free! huzzah!).

All that's missing is a way to make it so that epoll_wait() automatically calls my callback when its event is tripped, just like Linus described back in 2000. :-)

There's still a bit of the "library problem", where a library still has to cooperate with the application that linked with it in order to have its events processed, but it's not too bad. With this and epoll, a library could create its own epoll fd, put all its things in there, return it to the application and tell it to call a certain function when it is readable. Even if epoll_wait() did automatically call back event handlers, an application would have to pass it an epoll fd to register its events with anyway.

On Windows, this is handled with a hidden window that has a WNDPROC, and it's one notch better, because the application doesn't have to know anything. It's as if there were no epoll_create(), and the other epoll syscalls used a single list of events per process. But we're close enough that I think we're just about good here, at least for the next little while...


signals without signal stacks

Posted Mar 15, 2007 15:31 UTC (Thu) by pjones (subscriber, #31722) [Link]

It's better than that - with signalfd(), you can use e.g. malloc(), printf(), and backtrace() while handling a signal.

That's really a huge win. It'll also mean things like Xorg won't need to inject crap like VT_CHANGE handling into its event loop from a signal handler.

signals without signal stacks

Posted Mar 15, 2007 15:43 UTC (Thu) by pphaneuf (guest, #23480) [Link]

You could do malloc() and printf() with the "write a byte on a pipe in the signal handler" trick, but backtrace()? It won't be like doing it from a signal handler, it'll just give the stack trace of where you read() from the signal handler fd, no? That, again, would be just like the pipe trick.

Not that it's not neat, it saves a whole lot of coding, having to save things on the side so you can look at them later, but it basically operates the same way. I didn't focus on those very much, because handling signals in a library, through the pipe trick or not, is just a bad idea, IMHO (the application hooks the same signal, and then everything goes to hell in a handbasket).

But the timer is something that didn't really need coordination with an application (from the point of view of a library), but had to, because, well, that's the way it was. I had some idiotic tricks that worked, but were just disgusting (a thread started from my library to handle timers and write to a pipe to wake up the main loop, eww!). Well, not anymore!

signals without signal stacks

Posted Mar 15, 2007 16:11 UTC (Thu) by pjones (subscriber, #31722) [Link]

Well, it's not _always_ a bad idea. Consider the SIGPIPE vs db4 problem. If you have a library using both db4 and a network, you sometimes need to block signals - but you still want them to be raised to the caller. With signalfd(), you can do a fairly simple callback system sanely, which you really can't do as cleanly with the old-style signal/sigaction interfaces.

(yeah, arguably you can raise(), but you still have to have a doesn't-do-much signal handler deep in a library, which can get really ugly really fast)

Kernel events without kevents

Posted Mar 15, 2007 19:29 UTC (Thu) by mtaht (✭ supporter ✭, #11087) [Link]

Um, er, epoll can call your callback with only a tiny bit of wrapping

see http://boston.conman.org/2007/03/08
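
As an illustration of how small that wrapping can be (this is a sketch, not the code from the linked page): store a pointer to a handler record in epoll_event.data.ptr and call through it after epoll_wait() returns. The names ev_handler, ev_add(), ev_dispatch() and count_event() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

typedef void (*ev_callback)(int fd, uint32_t events, void *arg);

/* One record per registered descriptor; must outlive the registration. */
struct ev_handler {
    int fd;
    ev_callback cb;
    void *arg;
};

/* Register a handler; epoll hands the pointer back with each event. */
int ev_add(int epfd, struct ev_handler *h, uint32_t events)
{
    struct epoll_event ev;

    assert(h != NULL && h->cb != NULL);
    memset(&ev, 0, sizeof ev);
    ev.events = events;
    ev.data.ptr = h;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Wait once and call the callback of every ready descriptor.
 * Returns the number of events dispatched, or -1 on error. */
int ev_dispatch(int epfd, int timeout_ms)
{
    struct epoll_event evs[16];
    int i, n = epoll_wait(epfd, evs, 16, timeout_ms);

    for (i = 0; i < n; i++) {
        struct ev_handler *h = evs[i].data.ptr;
        h->cb(h->fd, evs[i].events, h->arg);
    }
    return n;
}

/* Sample callback: consume one byte and bump the counter in `arg`. */
void count_event(int fd, uint32_t events, void *arg)
{
    char c;

    (void)events;
    if (read(fd, &c, 1) == 1)
        ++*(int *)arg;
}
```

The catch discussed elsewhere in this thread still applies: a library and an application can only share such a loop if both agree on the same handler record layout.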

Kernel events without kevents

Posted Mar 15, 2007 21:01 UTC (Thu) by bronson (subscriber, #4806) [Link]

http://svn.u32.net/io/trunk/ works pretty well too but I haven't gotten around to documenting it yet... if ever...

Kernel events without kevents

Posted Mar 16, 2007 7:03 UTC (Fri) by pphaneuf (guest, #23480) [Link]

Thanks, it seems interesting, I'll definitely have a look, since I'm in the process of making a similar edge-triggered wrapper, but with the added twist of being multithreaded (a limited number of threads, so that more events can be handled in a given amount of time, to use multicore systems and such while still being event driven).

Kernel events without kevents

Posted Mar 16, 2007 11:55 UTC (Fri) by bronson (subscriber, #4806) [Link]

I agree, multicore is here to stay. That code lets me run one epoll poller per thread and one thread per core (plus a few maintenance threads). I haven't tried it under serious load yet so there may be a few small bugs left to wiggle out, and the poller selection is utterly hacked (it's a todo item), but it works for me so far.

Feel free to mail me at bronson at domain rinspin.com.

Kernel events without kevents

Posted Mar 16, 2007 6:59 UTC (Fri) by pphaneuf (guest, #23480) [Link]

Indeed, and I very much love epoll for that, but for that to work between a library and an application (with the library putting things into the epoll fd, and the application being the one calling epoll_wait()), they pretty much have to use the same tiny bit of wrapping.

After that, it can totally be done, but if you make your library for, say, libevent, and someone tries using it in a Qt program, it's a pain.

Kernel events without kevents

Posted Mar 16, 2007 2:59 UTC (Fri) by wahern (subscriber, #37304) [Link]

Huh? I've been using asynchronous DNS resolvers for years:

ADNS
C-Ares
UDNS

My core event loop is libevent, which handles callbacks for signals, timers and I/O readiness. I currently use C-Ares for sending and receiving raw DNS messages, and my own lookup API in my async meta-API library libevnet (since C-Ares tries to mirror the useless gethostbyname interface). In libevnet you can ask for an MX+A record, and it will ultimately always get back A (and/or AAAA if you specified) records suitable for sending mail. And this can be expanded upon, so that you can ask the library to do the smart thing:

s = socket_open(&socket_defaults);
socket_name_init(&n, "google.com", "smtp", LOOKUP_IN_MX|LOOKUP_IN_A|LOOKUP_IN_AAAA);
tv.tv_sec = 5;
socket_connect(s, &n, &my_callback, my_arg, &tv);

Kernel events without kevents

Posted Mar 16, 2007 10:49 UTC (Fri) by pphaneuf (guest, #23480) [Link]

I was using that as an example of a library that uses multiple file descriptors and has its own timeouts. And yes, it is already possible, my issue is that it's just so clunky.

For example, ADNS has two calls to fiddle with your select()/poll() parameters before and after, so if you use something else, you have to hack a bit to find out its file descriptors. In particular, it doesn't match well with an API where you register your interests only once, which is the case for every single new API (because it's fundamentally more efficient than starting from scratch every time).

C-Ares cuts down on the hacking a bit, since it gives out the file descriptors it's interested in up-front, rather than having to go through an array of struct pollfd. But it's still oriented toward a "from scratch every time" API like select()/poll(), so you have to remember what it said last time, and tweak your interest set accordingly. Not my idea of fun and painless, but I've done it, and I've lived through it.

UDNS restricts itself to using a single file descriptor, in an attempt to make this integration easier, but this comes at the cost of not being able to do TCP queries (which are required when a response is too large, which is actually fairly common for MX queries of large sites). So, it arguably crippled itself functionally in order to do what I said, still leaving timeout management to deal with (but that part is easy, at least).

With timerfd, one can use epoll_create() in a library, return that to the application and tell them quite simply "when this fd is readable, call this function here", and that's it. The main application can use select(), poll(), or whatever it feels like using, it doesn't have to deal with anything in its timeout management, it's all reduced to a single bit of information: is this fd readable?

Xlib is also like that (through ConnectionNumber()), which makes it very easy to deal with, but that's a bit easier, since it doesn't have timeouts and really just has the one file descriptor to deal with. Hence my using asynchronous DNS as an example with multiple descriptors (if you don't punt on the TCP queries) and timeouts.

Kernel events without kevents

Posted Mar 24, 2007 1:20 UTC (Sat) by slamb (guest, #1070) [Link]

That's a problem of poor library interfaces, not poor kernel interfaces. And in the case of C-Ares, it's not even true. Look at ARES_OPT_SOCK_STATE_CB.

Kernel events without kevents

Posted Mar 24, 2007 1:11 UTC (Sat) by slamb (guest, #1070) [Link]

I'm not sure what you're asking for. What new kernel interface do you need for asynchronous, single-threaded DNS resolution? There are several such libraries already, and they work fine for me.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds