The case of the prematurely freed SKB

By Jonathan Corbet
February 28, 2017

CVE-2017-6074 is the vulnerability identifier for a use-after-free bug in the kernel's network stack. This vulnerability is apparently exploitable in local privilege-escalation attacks. The problem, introduced in 2005, is easily fixed, but it points at a couple of shortcomings in the kernel development process; as a result, it would not be surprising if more bugs of this variety were to turn up in the near future.

One of the network subsystem's core data structures is the sk_buff structure, which often goes by the name SKB. This complex structure contains and describes a packet for its entire life cycle in the kernel. As with many kernel data structures, SKBs are reference-counted. Whenever some kernel function takes a reference to an SKB, it increments the internal reference count; the reference is released by decrementing that count. As long as the count is nonzero, a reference to the SKB exists somewhere in the kernel, so the structure itself needs to continue to exist.

Normal kernel conventions call for the existence of a pair of functions, ending in _get() and _put(), to acquire and release a reference to a data structure. Thus, for example, a reference to a kref structure is obtained with kref_get() and released with kref_put(). The networking layer does provide skb_get() which, as expected, increments the reference count on an SKB. There is also an skb_put(), but it has nothing to do with reference counts at all; instead, it increments some internal pointers to reflect that some data has been added to the packet. To release a reference to an SKB, one calls one of:

    void kfree_skb(struct sk_buff *skb);
    void consume_skb(struct sk_buff *skb);
    void dev_kfree_skb(struct sk_buff *skb);
    void  __kfree_skb(struct sk_buff *skb);

Both kfree_skb() and consume_skb() will decrement the reference count and, if the result is zero, free the structure. They differ only in how they affect the various network statistics; kfree_skb() implies that a packet was dropped, while consume_skb() implies that it was used in some way. The macro dev_kfree_skb(), intended for use in drivers, turns into a call to consume_skb().

__kfree_skb() is different: it frees the SKB unconditionally without even looking at the reference count. Needless to say, this looks like a hazardous thing to do with a reference-counted data structure. And, indeed, that is exactly where the DCCP protocol code went wrong: it called __kfree_skb() on an SKB that, with the right sequence of steps, had another reference on it, leading to a classic use-after-free vulnerability. The fix is a simple switch to consume_skb() instead.

A comment next to the __kfree_skb() implementation says: "This is an internal helper function. Users should always call kfree_skb". A quick grep shows 94 call sites to __kfree_skb() in the kernel, 46 of which are in device drivers. It is not entirely implausible to think that maybe one or two of those other calls might represent a similar bug. Clearly some auditing is indicated here, but it might also be time to review this somewhat irregular reference-counting API, which seems designed to create just this kind of vulnerability. There are also many places in the code where the SKB reference count is directly manipulated without using the accessor functions; those, too, seem relatively likely to harbor bugs.

It's worth noting that this API is far from new. The SKB reference count was added for the 2.2 kernel in 1999, and the distinction between kfree_skb() and __kfree_skb() was added with it, though the former was the only caller of the latter then. The 2.4 kernel saw more __kfree_skb() calls within the core networking code, but none in drivers. The slow spread of potentially dangerous __kfree_skb() calls is the result of an API decision made nearly twenty years ago.

Al Viro once said: "Bugs are like mushrooms - found one, look around for more." In this case, it is instructive to look at this fix to the TCP subsystem. Masayuki Nakagawa noticed a use-after-free problem in the TCP code resulting to an ill-advised call to __kfree_skb(). The problem was fixed (by calling kfree_skb() this time) — in 2007, just over ten years ago. Might this fix have been the sort of mushroom that called out for a wider search?

When the DCCP protocol implementation was merged in 2005, much of it was, shall we say, heavily influenced by the existing TCP code. So the function that had not yet been fixed by Nakagawa, called tcp_rcv_state_process(), found a strong echo in the new function dccp_rcv_state_process(); that, naturally, is the function that was patched, in a nearly identical way, to fix CVE-2017-6074. In other words, a mushroom was indeed found, but nobody thought to look in the copy-and-pasted DCCP code for a similar fungal invasion. An opportunity to fix a privilege-escalation vulnerability was lost ten years ago.

This particular vulnerability was eventually unearthed with the syzkaller fuzzer, and it is now fixed. Fuzzing is a valuable tool for the identification of problems like this. But it cannot replace the simple process of looking over the kernel code in search of bugs that are similar to those that have been recently identified. While our community values code contributions and the industry tends to support developers well, we fall down when it comes to this sort of mushroom hunting, with the result that relatively easily identified bugs remain in the code for a decade or more. It seems unlikely that the black-hat community is not making the effort to look for echoes of recently fixed bugs.

Over the last year or two, there have been signs that the kernel community is getting more serious about improving security. Some hardening work is finally making its way into the mainline, and tools like syzkaller are unearthing longstanding issues. But this vulnerability shows a couple of areas where the kernel community could be doing a lot better than it is: designing safer internal APIs and looking for repeated buggy patterns. If we truly want to be the most secure system out there, both the community and the companies that support it may want to think about how these shortcomings can be rectified.

Index entries for this article
Kernel	Security/Vulnerabilities
Security	Linux kernel

The case of the prematurely freed SKB

Posted Mar 1, 2017 7:24 UTC (Wed) by marcH (subscriber, #57642) [Link] (5 responses)

> So the function that had not yet been fixed by Nakagawa, called tcp_rcv_state_process(), found a strong echo in the new function dccp_rcv_state_process();

Can't something like coccinelle search for such "semantic copy/paste" based on the very first fix and... find mushrooming bugs?

The case of the prematurely freed SKB

Posted Mar 2, 2017 8:25 UTC (Thu) by wsa (guest, #52415) [Link] (4 responses)

Yes, coccinelle is awesome for finding other mushrooms. But picking them (=fixing bugs) requires a lot more attention.

I used to do that. When I received a bugfix in my subsystem which looked like a generic issue to me, I wrote a coccinelle script to see how widespread it was. However, fixing it automatically was seldomly possible. Usually, the coccinelle generated patches needed manual changes and a really careful review in order to avoid regressions. The code in question was often not touched for a while, so getting it upstream without being able to test it was slow (rightfully, I'd say).

That all easily exceeded the free-time capacity I had for hacking such stuff. I tried to get funding for it but failed ("no predictable deliverables"). So, I stopped writing cocci scripts because knowing where mushrooms are and then not being able to pick them gets frustrating.

The case of the prematurely freed SKB

Posted Mar 3, 2017 15:48 UTC (Fri) by grmd (guest, #4391) [Link] (3 responses)

> knowing where mushrooms are and then not being able to pick them gets frustrating

Would it have been possible to identify likely errors and raise them for others to consider?

The case of the prematurely freed SKB

Posted Mar 3, 2017 17:23 UTC (Fri) by wsa (guest, #52415) [Link] (2 responses)

Well, identifying the real problems from false positives is one big chunk of the work already. coccinelle is awesome for finding points of interest, but they all need proper review and second thought. And posting lists with false positives is not a good idea IMO.

The case of the prematurely freed SKB

Posted Mar 3, 2017 18:15 UTC (Fri) by marcH (subscriber, #57642) [Link] (1 responses)

> And posting lists with false positives is not a good idea IMO.

Lists not. Maintainers why not?

The case of the prematurely freed SKB

Posted Mar 3, 2017 19:34 UTC (Fri) by wsa (guest, #52415) [Link]

Frankly, didn't think of that. Has pros and cons, but yeah, might be an option for some cases.

The case of the prematurely freed SKB

Posted Mar 1, 2017 11:01 UTC (Wed) by knuto (subscriber, #96401) [Link]

Note that *all* the 46 calls to __kfree_skb in drivers in v4.10 are in *cxgb* = Chelsio drivers,
so at least in the driver space the problem is "contained".

The case of the prematurely freed SKB

Posted Mar 2, 2017 9:29 UTC (Thu) by oshepherd (guest, #90163) [Link] (9 responses)

I realise that it's a convention now and probably impossible and more dangerous to change, but... I can't help but feel _get and _put postfixes for increasing and decreasing the reference count are truly atrocious naming.

AddRef/DecRef, Retain/Release. There are just so many better options for naming ref counting functions. Bonus points if your AddRef/Retain function returns a pointer to the object you're increasing the ref count on, so you can do it while assigning...

The case of the prematurely freed SKB

Posted Mar 2, 2017 21:32 UTC (Thu) by neilbrown (subscriber, #359) [Link] (5 responses)

"Retain" and "AddRef" are each 6 characters. "get" is 3 characters.
"get" wins easily.

I agree that having "get" return the pointer is a win.

The case of the prematurely freed SKB

Posted Mar 3, 2017 8:47 UTC (Fri) by oshepherd (guest, #90163) [Link]

A 3 character length increase doesn't bother me compared to the overhead of thinking "skbuf_get? get what?"

The case of the prematurely freed SKB

Posted Mar 3, 2017 13:14 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

What about inc and dec?

The case of the prematurely freed SKB

Posted Mar 3, 2017 18:18 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

> "Retain" and "AddRef" are each 6 characters. "get" is 3 characters. "get" wins easily.

Are you being serious? Granted: https://martinfowler.com/bliki/TwoHardThings.html

but... 3 characters winning over 6, really?

The case of the prematurely freed SKB

Posted Mar 3, 2017 21:29 UTC (Fri) by neilbrown (subscriber, #359) [Link] (1 responses)

> but... 3 characters winning over 6, really?

Yes, really. These things are used a lot. Common words tend to be shorter than uncommon word, and for good reason. Less wasted space for example.
It doesn't really matter what the letters are as long as they are used consistently.
If you made a convention that "thing_foo" incremented the ref count and returned the pointer, and "thing_bar" decremented the refcount and discarded when it became zero, then we would all add "_foo" and "_bar" to our language fairly quickly and there would be no confusion.
I could argue that would be better than the current situation where we mostly use "get" and "put", but they don't always mean quite the same thing.

When new people come to the code and see "get" and "put" it might seem like it makes it easy for them. But it can just as easily lead them to think they understand something that they don't. Using more letters might just give them more confidence that the meaning of the word is "obvious", which is probably isn't.

The case of the prematurely freed SKB

Posted Mar 3, 2017 21:46 UTC (Fri) by marcH (subscriber, #57642) [Link]

Then it hurts me to say this but you want acronyms. They're short AND sure not to be too tainted by plain English preconceptions.

(It hurts me because I live in a world of massive abuse of acronyms)

The case of the prematurely freed SKB

Posted Mar 6, 2017 12:41 UTC (Mon) by robbe (guest, #16131) [Link] (1 responses)

You did not state *why* you think get/put is that bad. What would someone unfamiliar with the kernel conventions think these functions do?

The case of the prematurely freed SKB

Posted Mar 6, 2017 23:13 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

One could think of it as getting or putting a count into the counter rather than getting a reference and the returning it. I had the same problem for the longest time with the power symbol. I saw it as 0 for a closed circuit and 1 for an open circuit. Exactly backwards.

The case of the prematurely freed SKB

Posted Mar 9, 2017 13:27 UTC (Thu) by bcopeland (subscriber, #51750) [Link]

> I realise that it's a convention now and probably impossible and more dangerous to change, but... I can't help but feel _get and _put postfixes for increasing and decreasing the reference count are truly atrocious naming.

My understanding is that this convention goes way back. At least 6th edition UNIX used iget() and iput() to maintain reference counts (e.g. see p.85 in http://www.lemis.com/grog/Documentation/Lions/book.pdf). So it not only (probably) predates people using "retain" as the verb here, but it also predates Linux.