Improving the kernel timers API
It is worth noting that the kernel has two core timer mechanisms. One of those — the high-resolution timer (or "hrtimer") — subsystem, is focused on near-term events where the timer is expected to run to completion. The other subsystem is just called "kernel timers"; it offers less precision but is more efficient in situations where the timer will probably be canceled before it fires. There are many places in the kernel where timers are used to detect when a device or a network peer has failed to respond within the expected time; when, as usual, the expected response does happen, the timer is canceled. Kernel timers are well suited to that kind of use. The work at hand focuses on that second type of timer.
Kernel timers are described by the timer_list structure, defined in <linux/timer.h>:
struct timer_list {
unsigned long expires;
void (*function)(unsigned long);
unsigned long data;
/* ... other stuff elided ... */
}
The expires field contains the expiration time of the timer (in jiffies); on expiration, function() will be called with the given data value. It is possible to fill in a timer_list structure manually, but it is more common to use the setup_timer() macro:
void setup_timer(timer, function, data);
There are a number of issues with this API, as argued by Kees Cook. The data field bloats the timer_list structure unnecessarily and, as an unadorned unsigned long value, it resists any sort of type checking. It is not uncommon for callers to cast pointer values to and from this value, for example. For these reasons, it is far more common in current kernel APIs to dispense with the data field and, instead, just pass a pointer to the relevant structure (the timer_list structure in this case) to the callback. Likely as not, that structure is embedded within a larger structure containing the information the callback needs anyway, so a simple container_of() call can replace the casting of the unsigned long value.
As might be expected, though, Cook has concerns about this API that go
beyond matching the current kernel style. One of those is that a buffer
overflow in the area of a timer_list structure may be able to
overwrite both the function pointer and the data passed to the called
function, allowing arbitrary calls within the kernel. That, naturally,
makes timer_list structures interesting to attackers, and explains
why Cook has been trying to harden timers
for a while. The
prototype of the timer callback, containing a single unsigned long
argument, is also evidently an impediment to "future control flow
integrity work
". It would be better if the callback had a unique
prototype that was visibly different from all of the other kernel functions
taking an unsigned long argument.
Cook has been working on changes to the timer interface for a while in an attempt to address these issues. The core idea is simple: get rid of the data value and just pass the timer_list structure to the timeout function. The actual transition, though, is complicated by the existence of 800 or so setup_timer() call sites in the kernel now. Trying to change them all at once would not be anybody's idea of fun, so a phased approach is needed.
In this case, Cook has introduced a new function for the initialization of timers:
void timer_setup(struct timer_list *timer, void (*callback)(struct timer_list *),
unsigned int flags);
For the time being, timer_setup() simply stores a pointer to timer in the data field. Note that the prototype of the callback has changed to expect the timer_list pointer.
With that function in place, calls to setup_timer() can be replaced at leisure, as long as each corresponding timer callback function is adjusted accordingly. For the most part, as can be seen in this example, the changes are trivial. Many timer callbacks already were casting the data value to a pointer to the structure they needed; they just need a one-line change to obtain that from the timer_list pointer instead. A new from_timer() macro has been added to make those conversions a bit less verbose.
The addition of timer_setup() was merged just prior to the
4.14-rc3 release — rather later in the release cycle than one would
ordinarily expect to see the addition of new interfaces. The purpose of
this timing was clear enough: it clears the way for the conversion of all
of those setup_timer() calls, a task which, it is hoped, will be
completed for the 4.15 kernel release. Once that is done, the underlying
implementation can be changed to drop the data value and the
setup_timer() interface can be removed entirely. At the end,
the kernel will
be equipped with a timer mechanism that is a little bit more efficient,
more like other in-kernel APIs, and easier to secure.
| Index entries for this article | |
|---|---|
| Kernel | Security/Kernel hardening |
| Kernel | Timers |
Posted Oct 10, 2017 16:48 UTC (Tue)
by jhoblitt (subscriber, #77733)
[Link] (8 responses)
Seriously, why does C allow casting between non-pointer and pointers types? It would be nice if there was a gcc pragma to disable that.
Posted Oct 10, 2017 20:25 UTC (Tue)
by mpr22 (subscriber, #60784)
[Link]
Because in a lot of the environments where people still think using C makes sense, sometimes you need to be able to compose an integer from part of a pointer, or a pointer from parts of integers.
Posted Oct 11, 2017 7:17 UTC (Wed)
by walken (subscriber, #7089)
[Link] (4 responses)
So that you can implement malloc().
Posted Oct 11, 2017 9:57 UTC (Wed)
by sdalley (subscriber, #18550)
[Link] (3 responses)
The under-the-hood implementation of malloc might require casting between integers and pointers, although I've always managed with just pointers and pointer arithmetic.
Use of C casts is almost always a sign of poor programming and potential bugs, and/or lingering heritage of code before prototypes or const qualifiers etc were a thing. The need for casts very often goes away with correct prototyping, or judicious use of a union.
Posted Oct 20, 2017 19:53 UTC (Fri)
by Jandar (subscriber, #85683)
[Link]
!=
> Use of malloc
Posted Oct 12, 2018 3:07 UTC (Fri)
by samiam95124 (guest, #120873)
[Link] (1 responses)
struct timer_list {
...
}
struct timer_list *get_timer_list(void) { return (struct timer_list*) malloc(sizeof(timer_list)); }
etc.
I come from a strongly typed language background, so I use that paradigm a lot in my code.
The reason why you cast from malloc, even though it returns void, is so you have some idea what type is being converted into, not just leaving it implicit.
PS if you use the get_xxx_entry() and put_xxx_entry() paradigm in your code, you create a centerpoint for the allocation and deallocation of types, and can do advanced things based on that. For example, I typically count my structures by type, +1 for each get() and -1 for each put(). This allows you to assert() for things like underflow (more puts than gets), and non-zero counts after clean up/shutdown routines.
In addition, probably because I distrust storage allocators, I keep most of my structures in a free list on put(), and pass them out again in get(). This reduces fragmention, as well as supporting the idea that if you use a certain data structure once, you are probably going to use them again (and again). Finally, using a single #ifdef flag you can turn off reuse/recycle in the get()/put() routines and use that to trace down use after dispose and similar bugs. If you turn off recycle and your bug goes away, there you go.
Posted Oct 12, 2018 19:10 UTC (Fri)
by sdalley (subscriber, #18550)
[Link]
And I share your preference for pre-allocating free lists for specific things. This kind of thing comes into its own in hard-realtime programming, when you can't afford to have non-deterministic mallocs inserting random delays in high priority threads.
Posted Oct 12, 2017 6:02 UTC (Thu)
by error27 (subscriber, #8346)
[Link]
http://repo.or.cz/smatch.git/commitdiff/5fe8df0add883597c...
It was part of infrastructure to link function pointers to their data arg...
Posted Oct 12, 2017 7:46 UTC (Thu)
by iq-0 (subscriber, #36655)
[Link]
Because it lacks generics and generic types and most uses are in some way of storing or passing along some unknown thing. Some uses might be handled through 'union's, but those are hardly better and won't work in truly generic decoupled code.
The fact that some uses in the kernel are extremely sensitive to how large some variables are only makes it worse.
Posted Oct 10, 2017 18:07 UTC (Tue)
by josh (subscriber, #17465)
[Link] (3 responses)
Posted Oct 10, 2017 18:28 UTC (Tue)
by corbet (editor, #1)
[Link] (2 responses)
Posted Oct 10, 2017 20:09 UTC (Tue)
by dc123 (guest, #117760)
[Link] (1 responses)
Maybe a comment added to the deprecated function definition would help folks sort out the difference between timer_setup() and setup_timer() more effectively.
Posted Oct 10, 2017 20:11 UTC (Tue)
by corbet (editor, #1)
[Link]
Posted Oct 12, 2017 10:09 UTC (Thu)
by jnareb (subscriber, #46500)
[Link] (1 responses)
Posted Oct 19, 2017 6:21 UTC (Thu)
by kees (subscriber, #27264)
[Link]
Not mentioned in the article is the even earlier API, init_timer(), which left the function and data fields to be assigned separately, open-coded. This is all getting cleaned up too. And finally, there were init_*_timer(), setup_*_timer(), and TIMER_*_INITIALIZER() variants which all just set flags on the timer. Those are also being removed.
Improving the kernel timers API
Improving the kernel timers API
Seriously, why does C allow casting between non-pointer and pointers types?
Improving the kernel timers API
Improving the kernel timers API
Improving the kernel timers API
Improving the kernel timers API
Improving the kernel timers API
Improving the kernel timers API
Improving the kernel timers API
Improving the kernel timers API
It needed a new name so it could coexist with the old interface, and subsystem_action() is the preferred pattern these days. I'm not sure what a better choice would have been.
timer_setup()
timer_setup()
There is a __deprecated marker, but it is kind of deprecated itself because it creates too much warning noise. In this case, the old interface should simply disappear in 4.15, so the opportunity for confusion should be somewhat limited.
timer_setup()
Automating conversion to the new kernel timers API
Automating conversion to the new kernel timers API
