The case of the prematurely freed SKB
One of the network subsystem's core data structures is the sk_buff structure, which often goes by the name SKB. This complex structure contains and describes a packet for its entire life cycle in the kernel. As with many kernel data structures, SKBs are reference-counted. Whenever some kernel function takes a reference to an SKB, it increments the internal reference count; the reference is released by decrementing that count. As long as the count is nonzero, a reference to the SKB exists somewhere in the kernel, so the structure itself needs to continue to exist.
Normal kernel conventions call for the existence of a pair of functions, ending in _get() and _put(), to acquire and release a reference to a data structure. Thus, for example, a reference to a kref structure is obtained with kref_get() and released with kref_put(). The networking layer does provide skb_get() which, as expected, increments the reference count on an SKB. There is also an skb_put(), but it has nothing to do with reference counts at all; instead, it increments some internal pointers to reflect that some data has been added to the packet. To release a reference to an SKB, one calls one of:
    void kfree_skb(struct sk_buff *skb);
    void consume_skb(struct sk_buff *skb);
    void dev_kfree_skb(struct sk_buff *skb);
    void  __kfree_skb(struct sk_buff *skb);
Both kfree_skb() and consume_skb() will decrement the reference count and, if the result is zero, free the structure. They differ only in how they affect the various network statistics; kfree_skb() implies that a packet was dropped, while consume_skb() implies that it was used in some way. The macro dev_kfree_skb(), intended for use in drivers, turns into a call to consume_skb().
__kfree_skb() is different: it frees the SKB unconditionally without even looking at the reference count. Needless to say, this looks like a hazardous thing to do with a reference-counted data structure. And, indeed, that is exactly where the DCCP protocol code went wrong: it called __kfree_skb() on an SKB that, with the right sequence of steps, had another reference on it, leading to a classic use-after-free vulnerability. The fix is a simple switch to consume_skb() instead.
A comment next to the
__kfree_skb() implementation says: 
"This is an internal helper function. Users should always call
kfree_skb
".  A quick grep shows 94 call sites to
__kfree_skb() in the kernel, 46 of which are in device
drivers.  It is not entirely implausible to think that maybe one or two of
those other calls might represent a similar bug.  Clearly some auditing is
indicated here, but it might also be time to review this somewhat irregular
reference-counting API, which seems designed to create just this kind of
vulnerability.  There are also many places in the code where the SKB
reference count is directly manipulated without using the accessor
functions; those, too, seem relatively likely to harbor bugs.
It's worth noting that this API is far from new. The SKB reference count was added for the 2.2 kernel in 1999, and the distinction between kfree_skb() and __kfree_skb() was added with it, though the former was the only caller of the latter then. The 2.4 kernel saw more __kfree_skb() calls within the core networking code, but none in drivers. The slow spread of potentially dangerous __kfree_skb() calls is the result of an API decision made nearly twenty years ago.
Al Viro once said: "Bugs are like
mushrooms - found one, look around for more.
"  In this case, it is
instructive to look at this
fix to the TCP subsystem.  Masayuki Nakagawa noticed a use-after-free
problem in the TCP code resulting to an ill-advised call to
__kfree_skb().  The problem was fixed (by calling
kfree_skb() this time) — in 2007, just over ten years ago.  Might
this fix have been the sort of mushroom that called out for a wider search? 
When the DCCP protocol implementation was merged in 2005, much of it was, shall we say, heavily influenced by the existing TCP code. So the function that had not yet been fixed by Nakagawa, called tcp_rcv_state_process(), found a strong echo in the new function dccp_rcv_state_process(); that, naturally, is the function that was patched, in a nearly identical way, to fix CVE-2017-6074. In other words, a mushroom was indeed found, but nobody thought to look in the copy-and-pasted DCCP code for a similar fungal invasion. An opportunity to fix a privilege-escalation vulnerability was lost ten years ago.
This particular vulnerability was eventually unearthed with the syzkaller fuzzer, and it is now fixed. Fuzzing is a valuable tool for the identification of problems like this. But it cannot replace the simple process of looking over the kernel code in search of bugs that are similar to those that have been recently identified. While our community values code contributions and the industry tends to support developers well, we fall down when it comes to this sort of mushroom hunting, with the result that relatively easily identified bugs remain in the code for a decade or more. It seems unlikely that the black-hat community is not making the effort to look for echoes of recently fixed bugs.
Over the last year or two, there have been signs that the kernel community
is getting more serious about improving security.  Some hardening work is
finally making its way into the mainline, and tools like syzkaller are
unearthing longstanding issues.  But this vulnerability shows a couple of
areas where the kernel community could be doing a lot better than it is:
designing safer internal APIs and looking for repeated buggy patterns.  If we
truly want to be the most secure system out there, both the community and
the companies that support it may want to think about how these
shortcomings can be rectified.
| Index entries for this article | |
|---|---|
| Kernel | Security/Vulnerabilities | 
| Security | Linux kernel | 
      Posted Mar 1, 2017 7:24 UTC (Wed)
                               by marcH (subscriber, #57642)
                              [Link] (5 responses)
       
> So the function that had not yet been fixed by Nakagawa, called tcp_rcv_state_process(), found a strong echo in the new function dccp_rcv_state_process(); 
Can't something like coccinelle search for such "semantic copy/paste" based on the very first fix and... find mushrooming bugs? 
     
    
      Posted Mar 2, 2017 8:25 UTC (Thu)
                               by wsa (guest, #52415)
                              [Link] (4 responses)
       
I used to do that. When I received a bugfix in my subsystem which looked like a generic issue to me, I wrote a coccinelle script to see how widespread it was. However, fixing it automatically was seldomly possible. Usually, the coccinelle generated patches needed manual changes and a really careful review in order to avoid regressions. The code in question was often not touched for a while, so getting it upstream without being able to test it was slow (rightfully, I'd say). 
That all easily exceeded the free-time capacity I had for hacking such stuff. I tried to get funding for it but failed ("no predictable deliverables"). So, I stopped writing cocci scripts because knowing where mushrooms are and then not being able to pick them gets frustrating. 
     
    
      Posted Mar 3, 2017 15:48 UTC (Fri)
                               by grmd (guest, #4391)
                              [Link] (3 responses)
       
Would it have been possible to identify likely errors and raise them for others to consider? 
     
    
      Posted Mar 3, 2017 17:23 UTC (Fri)
                               by wsa (guest, #52415)
                              [Link] (2 responses)
       
     
    
      Posted Mar 3, 2017 18:15 UTC (Fri)
                               by marcH (subscriber, #57642)
                              [Link] (1 responses)
       
Lists not. Maintainers why not? 
     
    
      Posted Mar 3, 2017 19:34 UTC (Fri)
                               by wsa (guest, #52415)
                              [Link] 
       
     
      Posted Mar 1, 2017 11:01 UTC (Wed)
                               by knuto (subscriber, #96401)
                              [Link] 
       
 
     
      Posted Mar 2, 2017 9:29 UTC (Thu)
                               by oshepherd (guest, #90163)
                              [Link] (9 responses)
       I realise that it's a convention now and probably impossible and more dangerous to change, but... I can't help but feel _get and _put postfixes for increasing and decreasing the reference count are truly atrocious naming.
 AddRef/DecRef, Retain/Release. There are just so many better options for naming ref counting functions. Bonus points if your AddRef/Retain function returns a pointer to the object you're increasing the ref count on, so you can do it while assigning...
      
           
     
    
      Posted Mar 2, 2017 21:32 UTC (Thu)
                               by neilbrown (subscriber, #359)
                              [Link] (5 responses)
       
I agree that having "get" return the pointer is a win. 
 
     
    
      Posted Mar 3, 2017 8:47 UTC (Fri)
                               by oshepherd (guest, #90163)
                              [Link] 
       
     
      Posted Mar 3, 2017 13:14 UTC (Fri)
                               by mathstuf (subscriber, #69389)
                              [Link] 
       
     
      Posted Mar 3, 2017 18:18 UTC (Fri)
                               by marcH (subscriber, #57642)
                              [Link] (2 responses)
       
Are you being serious? Granted: https://martinfowler.com/bliki/TwoHardThings.html 
but... 3 characters winning over 6, really? 
 
     
    
      Posted Mar 3, 2017 21:29 UTC (Fri)
                               by neilbrown (subscriber, #359)
                              [Link] (1 responses)
       
Yes, really.  These things are used a lot.  Common words tend to be shorter than uncommon word, and for good reason.  Less wasted space for example. 
When new people come to the code and see "get" and "put" it might seem like it makes it easy for them.  But it can just as easily lead them to think they understand something that they don't. Using more letters might just give them more confidence that the meaning of the word is "obvious", which is probably isn't. 
 
     
    
      Posted Mar 3, 2017 21:46 UTC (Fri)
                               by marcH (subscriber, #57642)
                              [Link] 
       
(It hurts me because I live in a world of massive abuse of acronyms) 
     
      Posted Mar 6, 2017 12:41 UTC (Mon)
                               by robbe (guest, #16131)
                              [Link] (1 responses)
       
     
    
      Posted Mar 6, 2017 23:13 UTC (Mon)
                               by mathstuf (subscriber, #69389)
                              [Link] 
       
     
      Posted Mar 9, 2017 13:27 UTC (Thu)
                               by bcopeland (subscriber, #51750)
                              [Link] 
       
My understanding is that this convention goes way back.  At least 6th edition UNIX used iget() and iput() to maintain reference counts (e.g. see p.85 in http://www.lemis.com/grog/Documentation/Lions/book.pdf).  So it not only (probably) predates people using "retain" as the verb here, but it also predates Linux. 
     
    The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
so at least in the driver space the problem is "contained".
The case of the prematurely freed SKB
      The case of the prematurely freed SKB
      
"get" wins easily.
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
It doesn't really matter what the letters are as long as they are used consistently.
If you made a convention that "thing_foo" incremented the ref count and returned the pointer, and "thing_bar" decremented the refcount and discarded when it became zero, then we would all add "_foo" and "_bar" to our language fairly quickly and there would be no confusion.
I could argue that would be better than the current situation where we mostly use "get" and "put", but they don't always mean quite the same thing. 
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
The case of the prematurely freed SKB
      
 
           