Improving the kernel timers API

By Jonathan Corbet
October 9, 2017

The kernel's timer interface has been around for a long time, and its API shows it. Beyond a lack of conformance with current in-kernel interface patterns, the timer API is not as efficient as it could be and stands in the way of ongoing kernel-hardening efforts. A late addition to the 4.14 kernel paves the way toward a wholesale change of this API to address these problems.

It is worth noting that the kernel has two core timer mechanisms. One of those — the high-resolution timer (or "hrtimer") — subsystem, is focused on near-term events where the timer is expected to run to completion. The other subsystem is just called "kernel timers"; it offers less precision but is more efficient in situations where the timer will probably be canceled before it fires. There are many places in the kernel where timers are used to detect when a device or a network peer has failed to respond within the expected time; when, as usual, the expected response does happen, the timer is canceled. Kernel timers are well suited to that kind of use. The work at hand focuses on that second type of timer.

Kernel timers are described by the timer_list structure, defined in <linux/timer.h>:

    struct timer_list {
	unsigned long		expires;
	void			(*function)(unsigned long);
	unsigned long		data;
	/* ... other stuff elided ... */
    }

The expires field contains the expiration time of the timer (in jiffies); on expiration, function() will be called with the given data value. It is possible to fill in a timer_list structure manually, but it is more common to use the setup_timer() macro:

    void setup_timer(timer, function, data);

There are a number of issues with this API, as argued by Kees Cook. The data field bloats the timer_list structure unnecessarily and, as an unadorned unsigned long value, it resists any sort of type checking. It is not uncommon for callers to cast pointer values to and from this value, for example. For these reasons, it is far more common in current kernel APIs to dispense with the data field and, instead, just pass a pointer to the relevant structure (the timer_list structure in this case) to the callback. Likely as not, that structure is embedded within a larger structure containing the information the callback needs anyway, so a simple container_of() call can replace the casting of the unsigned long value.

As might be expected, though, Cook has concerns about this API that go beyond matching the current kernel style. One of those is that a buffer overflow in the area of a timer_list structure may be able to overwrite both the function pointer and the data passed to the called function, allowing arbitrary calls within the kernel. That, naturally, makes timer_list structures interesting to attackers, and explains why Cook has been trying to harden timers for a while. The prototype of the timer callback, containing a single unsigned long argument, is also evidently an impediment to "future control flow integrity work". It would be better if the callback had a unique prototype that was visibly different from all of the other kernel functions taking an unsigned long argument.

Cook has been working on changes to the timer interface for a while in an attempt to address these issues. The core idea is simple: get rid of the data value and just pass the timer_list structure to the timeout function. The actual transition, though, is complicated by the existence of 800 or so setup_timer() call sites in the kernel now. Trying to change them all at once would not be anybody's idea of fun, so a phased approach is needed.

In this case, Cook has introduced a new function for the initialization of timers:

    void timer_setup(struct timer_list *timer, void (*callback)(struct timer_list *),
		     unsigned int flags);

For the time being, timer_setup() simply stores a pointer to timer in the data field. Note that the prototype of the callback has changed to expect the timer_list pointer.

With that function in place, calls to setup_timer() can be replaced at leisure, as long as each corresponding timer callback function is adjusted accordingly. For the most part, as can be seen in this example, the changes are trivial. Many timer callbacks already were casting the data value to a pointer to the structure they needed; they just need a one-line change to obtain that from the timer_list pointer instead. A new from_timer() macro has been added to make those conversions a bit less verbose.

The addition of timer_setup() was merged just prior to the 4.14-rc3 release — rather later in the release cycle than one would ordinarily expect to see the addition of new interfaces. The purpose of this timing was clear enough: it clears the way for the conversion of all of those setup_timer() calls, a task which, it is hoped, will be completed for the 4.15 kernel release. Once that is done, the underlying implementation can be changed to drop the data value and the setup_timer() interface can be removed entirely. At the end, the kernel will be equipped with a timer mechanism that is a little bit more efficient, more like other in-kernel APIs, and easier to secure.

Index entries for this article
Kernel	Security/Kernel hardening
Kernel	Timers

Improving the kernel timers API

Posted Oct 10, 2017 16:48 UTC (Tue) by jhoblitt (subscriber, #77733) [Link] (8 responses)

Another win for weakly typed languages...

Seriously, why does C allow casting between non-pointer and pointers types? It would be nice if there was a gcc pragma to disable that.

Improving the kernel timers API

Posted Oct 10, 2017 20:25 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

Seriously, why does C allow casting between non-pointer and pointers types?

Because in a lot of the environments where people still think using C makes sense, sometimes you need to be able to compose an integer from part of a pointer, or a pointer from parts of integers.

Improving the kernel timers API

Posted Oct 11, 2017 7:17 UTC (Wed) by walken (subscriber, #7089) [Link] (4 responses)

> Seriously, why does C allow casting between non-pointer and pointers types?

So that you can implement malloc().

Improving the kernel timers API

Posted Oct 11, 2017 9:57 UTC (Wed) by sdalley (subscriber, #18550) [Link] (3 responses)

Use of malloc should never require a cast, if it is prototyped to return a pointer to void.

The under-the-hood implementation of malloc might require casting between integers and pointers, although I've always managed with just pointers and pointer arithmetic.

Use of C casts is almost always a sign of poor programming and potential bugs, and/or lingering heritage of code before prototypes or const qualifiers etc were a thing. The need for casts very often goes away with correct prototyping, or judicious use of a union.

Improving the kernel timers API

Posted Oct 20, 2017 19:53 UTC (Fri) by Jandar (subscriber, #85683) [Link]

> implement malloc()

> Use of malloc

Improving the kernel timers API

Posted Oct 12, 2018 3:07 UTC (Fri) by samiam95124 (guest, #120873) [Link] (1 responses)

There is nothing preventing you from encapsulating malloc() specific to your application. Thus:

struct timer_list {

...

}

struct timer_list *get_timer_list(void) { return (struct timer_list*) malloc(sizeof(timer_list)); }

etc.

I come from a strongly typed language background, so I use that paradigm a lot in my code.

The reason why you cast from malloc, even though it returns void, is so you have some idea what type is being converted into, not just leaving it implicit.

PS if you use the get_xxx_entry() and put_xxx_entry() paradigm in your code, you create a centerpoint for the allocation and deallocation of types, and can do advanced things based on that. For example, I typically count my structures by type, +1 for each get() and -1 for each put(). This allows you to assert() for things like underflow (more puts than gets), and non-zero counts after clean up/shutdown routines.

In addition, probably because I distrust storage allocators, I keep most of my structures in a free list on put(), and pass them out again in get(). This reduces fragmention, as well as supporting the idea that if you use a certain data structure once, you are probably going to use them again (and again). Finally, using a single #ifdef flag you can turn off reuse/recycle in the get()/put() routines and use that to trace down use after dispose and similar bugs. If you turn off recycle and your bug goes away, there you go.

Improving the kernel timers API

Posted Oct 12, 2018 19:10 UTC (Fri) by sdalley (subscriber, #18550) [Link]

Hmm, that's all rather neat and tidy. You're using the cast in a self-documenting way.

And I share your preference for pre-allocating free lists for specific things. This kind of thing comes into its own in hard-realtime programming, when you can't afford to have non-deterministic mallocs inserting random delays in high priority threads.

Improving the kernel timers API

Posted Oct 12, 2017 6:02 UTC (Thu) by error27 (subscriber, #8346) [Link]

The funny thing is that I had just added support for tracking function pointers stored as unsigned long to Smatch. I wonder if any other code does this or it can be removed?

http://repo.or.cz/smatch.git/commitdiff/5fe8df0add883597c...

It was part of infrastructure to link function pointers to their data arg...

Improving the kernel timers API

Posted Oct 12, 2017 7:46 UTC (Thu) by iq-0 (subscriber, #36655) [Link]

> Seriously, why does C allow casting between non-pointer and pointers types? It would be nice if there was a gcc pragma to disable that.

Because it lacks generics and generic types and most uses are in some way of storing or passing along some unknown thing. Some uses might be handled through 'union's, but those are hardly better and won't work in truly generic decoupled code.

The fact that some uses in the kernel are extremely sensitive to how large some variables are only makes it worse.

Improving the kernel timers API

Posted Oct 10, 2017 18:07 UTC (Tue) by josh (subscriber, #17465) [Link] (3 responses)

At the risk of bikeshedding, it seems odd and confusing to redesign setup_timer and call the new interface timer_setup.

timer_setup()

Posted Oct 10, 2017 18:28 UTC (Tue) by corbet (editor, #1) [Link] (2 responses)

It needed a new name so it could coexist with the old interface, and subsystem_action() is the preferred pattern these days. I'm not sure what a better choice would have been.

timer_setup()

Posted Oct 10, 2017 20:09 UTC (Tue) by dc123 (guest, #117760) [Link] (1 responses)

Is there a convention for marking deprecated APIs in the kernel?

Maybe a comment added to the deprecated function definition would help folks sort out the difference between timer_setup() and setup_timer() more effectively.

timer_setup()

Posted Oct 10, 2017 20:11 UTC (Tue) by corbet (editor, #1) [Link]

There is a __deprecated marker, but it is kind of deprecated itself because it creates too much warning noise. In this case, the old interface should simply disappear in 4.15, so the opportunity for confusion should be somewhat limited.

Automating conversion to the new kernel timers API

Posted Oct 12, 2017 10:09 UTC (Thu) by jnareb (subscriber, #46500) [Link] (1 responses)

I wonder if it would be possible to convert old API to new API automatically, for example using Coccinelle: http://coccinelle.lip6.fr/ .

Automating conversion to the new kernel timers API

Posted Oct 19, 2017 6:21 UTC (Thu) by kees (subscriber, #27264) [Link]

Of the 900ish timer callsites, 600ish can (and will) be converted with Coccinelle. However, the rest need special care. One thing blocking better Coccinelle handling is shown in the conversion linked in the article: having the timer callback defined in a different file than the setup_timer(). Those I had to get creative with.

Not mentioned in the article is the even earlier API, init_timer(), which left the function and data fields to be assigned separately, open-coded. This is all getting cleaned up too. And finally, there were init_*_timer(), setup_*_timer(), and TIMER_*_INITIALIZER() variants which all just set flags on the timer. Those are also being removed.