The new timerfd() API

By Jonathan Corbet
September 25, 2007

The timerfd() system call was added in the 2.6.22 kernel. The core idea behind timerfd() - allowing a process to associate a file descriptor with timer events - is not controversial, but the implementation of this idea did, belatedly, raise a few eyebrows. In particular, Michael Kerrisk pointed out that timerfd() was inconsistent with (and less powerful than) the existing timer-related system calls, and, besides, the 2.6.22 version did not even work as advertised. After a fair amount of discussion, it became clear that the issues with this system call would not be worked out in the 2.6.23 time frame. So the 2.6.23-rc7 prepatch disabled timerfd() altogether in an attempt to prevent application developers from using an API which is going to change.

Prompted by all of this, Davide Libenzi (the creator of the original timerfd() system call) has posted a proposal for a revised timerfd() API. The single system call has turned into three different calls with a few new features.

Under the new API, an application wanting to create a file descriptor for timer events would make a call to:

    int timerfd_create(int clockid);

The clockid argument describes which clock should be used; it will be either CLOCK_MONOTONIC or CLOCK_REALTIME. The return value will, if all goes well, be the requested file descriptor.

A timer event can be requested with:

    int timerfd_settime(int fd, int flags, const struct itimerspec *timer,
                        struct itimerspec *previous);

Here, fd is a file descriptor obtained from timerfd_create(), and timer gives the desired expiration time (and re-arming interval value, if desired). This time is normally a relative time, but if the TFD_TIMER_ABSTIME bit is set in flags, it will be interpreted as an absolute time instead. If previous is not NULL, the pointed-to structure will be filled with the previous value of the timer. This ability to obtain the previous value is one of the features which was lacking in the original timerfd() implementation.

That implementation also had no way for an application to simply ask what the current value of the timer was. The new API provides a function for querying a timer non-destructively:

    int timerfd_gettime(int fd, struct itimerspec *timer);

This system call will store the current expiration time (if any) associated with fd into timer.

The read() interface is essentially unchanged. A process which reads from a timer file descriptor will block if the timer has not yet expired. It will then read a 64-bit integer value indicating how many times the timer has expired since it was last read. A timer file descriptor can be passed to poll(), allowing timers to be handled in an application's main event loop.

Responses to the new API proposal have been muted at best; hopefully this silence means that developers are happy with the new system calls. The alternative is that this iteration of timerfd() will not be reviewed any more extensively than its predecessor was. As things stand, the new set of system calls looks likely to be merged for 2.6.24.

Clarity fix

Posted Sep 27, 2007 9:43 UTC (Thu) by mlawren (guest, #10136) [Link]

The following sentence confused me:

"That implementation also had no way..."

Perhaps it would be better as:

"The old implementation also had no way..."

The new timerfd() API

Posted Sep 27, 2007 23:42 UTC (Thu) by bronson (subscriber, #4806) [Link] (1 responses)

> The alternative is that this iteration of timerfd() will not be reviewed any more extensively than its predecessor was.

Just make sure the manpage is written before the patch is merged?

The new timerfd() API

Posted Oct 3, 2007 21:10 UTC (Wed) by mkerrisk (subscriber, #1978) [Link]

> > The alternative is that this iteration of timerfd() will not
> > be reviewed any more extensively than its predecessor was.
>
> Just make sure the manpage is written before the patch is merged?

http://thread.gmane.org/gmane.linux.kernel/584510

Mark the API beta

Posted Sep 28, 2007 14:14 UTC (Fri) by addw (guest, #1771) [Link]

Someone recently made a comment that a non-trivial program should be written[**] to use the new API before it became set in stone. I certainly think that this would be a good idea. Some API additions are obviously right, but with most of them I expect that this is not so until it has been tested in real usage.

Perhaps new APIs should be flagged beta until this has been done and the result discussed. The reason that we don't like changing APIs is that it will break code "out there", i.e. code that we don't know about. If someone wants to use a beta API then they take it upon themselves to check that the kernel API has not changed.

Kernel developers should be encouraged to listen to application developers who use beta APIs.

[**] or something existing adapted to use it.

Error conditions, other considerations

Posted Sep 28, 2007 22:24 UTC (Fri) by filker0 (guest, #31278) [Link] (2 responses)

I have not followed this, and my quick tracing of articles didn't lead to answers to a few questions that I have.

1. What error is returned if a non-timer fd is used in a call to timerfd_settime() or timerfd_gettime()?
2. What error is returned (if any) when a timer fd is passed to close()?
3. Are the timer fds unique across the entire system?
4. Are timer fds inherited or duplicated across a fork? Exec?
5. Are timers destroyed when the process ends?
6. Can one process set a timer for another?

If the answer to #6 is "Yes", it would introduce a nice IPC mechanism that I could see being useful in GUI, simulation, and automated test software. Of course, the answer to #6 depends on #s 3 and 4. Also, certain of these would have security implications as well.

I suppose I ought to get a recent kernel source distro, apply the proposed patches, then search out the man page, look at the implementation, and see if I can find any flaws in the API that should be addressed.

Error conditions, other considerations

Posted Oct 3, 2007 21:18 UTC (Wed) by mkerrisk (subscriber, #1978) [Link] (1 responses)

The right place to ask these (good) questions is of the developer, on the kernel mailing list. But, here goes:

> 1. What error is returned if a non-timer fd is used in a
> call to timerfd_settime() or timerfd_gettime()?

I have not tested this, but it should give EINVAL.

> 2. What error is returned (if any) when a timer fd is passed to close()?

That is not an error. See the man page:
http://thread.gmane.org/gmane.linux.kernel/584510

> 3. Are the timer fds unique across the entire system?

No.

> 4. Are timer fds inherited or duplicated across a fork? Exec?

Yes and yes.

> 5. Are timers destroyed when the process ends?

The file descriptor is closed. If some other process has a file descriptor (because of fork(), for example), then I believe the timer should continue to exist. This should be tested.

> 6. Can one process set a timer for another?

See 5.

Thanks for the pointer

Posted Oct 4, 2007 1:50 UTC (Thu) by filker0 (guest, #31278) [Link]

Thanks for the response. I'd not seen the manpage before. I'm not on the kernel mailing list due to lack of time right now.

I did some development on an embedded kernel some years ago that had a feature similar to these timers. They could be anonymous (not visible to other tasks) or named. Named timers persisted from creation to the end of time (well, until you shut down or reset the box) -- it was an embedded system, we didn't need to delete them. A named timer could be created and set, then closed. Any task could open the timer and do a blocking read, which would block the task until it expired, or a non-blocking read, which would return the number of days/hours/minutes/seconds/ticks until the next expiration; additional reads would return the next expiration after that, and so on. A blocked task could be awakened from a read by an AST (asynchronous system trap; this was more RSX-like than Unix-like). You set the timer by writing to it (it was a binary structure that contained timer information). Timers could be set using relative or absolute time, and could have single or repeating events, and you could have up to 8 events of any type queued up in a timer. Each event structure also had a chain of completion and cancellation callbacks that could, in theory, be arbitrarily long. Since it was an embedded system, we provided no real protection on these things, and if you scheduled a callback from your task and then terminated, the callback would still get called unless you cancelled it before termination.

It was a very useful and somewhat nifty system for inter-task synchronization in a mostly asynchronous application (a high-end video terminal).

Anonymous timers were pretty much the same except that they were destroyed when your task exited and were not visible to other tasks. We didn't have fork(), nor exec() as Unix/Linux has them, so nothing ever got inherited. All the code on the system was always there (we ran out of ROM), and a task was just a thread of execution that was started with a system call.

Anyway, thanks again for the response.