If you don't understand why using a possibly NULL pointer _before_ checking it is an obvious bug, the joke is on you. And it isn't even a joke - it is sad.
The bug itself is extremely trivial and glaringly obvious. There is no need to involve the C standard to understand it. Suggesting a "compiler bugs" for this, I am sorry I have to say it, is quite silly.
As I already said, the exploitation of the bug however is creative and impressive - my hat is off for finding the bug and exploiting it to whoever did it.
Posted Jul 17, 2009 20:34 UTC (Fri) by corbet (editor, #1)
[Link]
There is an interesting aspect to this that nobody has mentioned: I have seen reviewers advising the removal of NULL pointer checks with the reasoning that, should a NULL pointer ever come about, the resulting kernel oops is just as useful as a "this pointer was null!" message. In the mean time, the overhead of the check can be avoided.
The idea, clearly, is that the memory management hardware can be counted on to do the NULL check so there's no need to do it in software too. But if that address can be mapped, that reasoning clearly does not hold. I don't doubt there are other situations like this one in the kernel code; for most systems, disallowing mappings at zero seems like a reasonable step to take.
Removing NULL checks
Posted Jul 17, 2009 21:02 UTC (Fri) by corbet (editor, #1)
[Link]
I had to dig a bit, but here's an example of developers being advised that some NULL checks are not needed. In retrospect, that doesn't seem like the best advice...
Removing NULL checks
Posted Jul 17, 2009 21:06 UTC (Fri) by stevenb (guest, #11536)
[Link]
Anyway, the "it appears to be a GCC bug in the end." bit in the LWN story about this is unjustified IMHO.
Removing NULL checks
Posted Jul 17, 2009 22:39 UTC (Fri) by nix (subscriber, #2304)
[Link]
It would be an excessive optimization, if it wasn't that GCC has a way to
turn it off... which the kernel is not using.
Removing NULL checks
Posted Jul 17, 2009 22:56 UTC (Fri) by mikov (subscriber, #33179)
[Link]
I don't think that disabling this option for a kernel compile is reasonable. It is not reasonable to have to check every new optimization for every new version of a compiler, assuming the optimizations themselves are valid.
What is reasonable on the other hand is to know C and to write valid C code, which doesn't break when a butterfly flaps its wings. I still can't understand why everybody is making such a big deal over such a trivial bug. It is similar to relying on the value of "(a++) + (a++)" - basic C stuff that hopefully everybody knows by now.
If this was an user app nobody would think twice about it. So, what, the kernel is an excuse for sloppy coding?? I would hope that the opposite was true...
Removing NULL checks
Posted Jul 17, 2009 23:33 UTC (Fri) by stevenb (guest, #11536)
[Link]
Posted Jul 17, 2009 21:44 UTC (Fri) by mikov (subscriber, #33179)
[Link]
I think there are three completely orthogonal issues here, which seems to be confusing people:
1. The kernel is (mostly) written in C and uses a general purpose C compiler (even more than one, if you count Intel). So the kernel should follow the C rules (and the C standard), even if on the surface it seems that these rules don't always make sense in the kernel context. For example, under DOS dereferencing a NULL pointer would not generate any kind of error. That doesn't mean that it is a good practice to remove NULL pointer checks for all reads. It is still undefined behavior according to the C standard and any C compiler for DOS is free to apply the same (correct!) optimization.
I think what people don't realize is that such optimizations are not only valid, but necessary. While there are the few exceptional cases like this one where they seem to get in the way, they are useful in the majority of cases.
2. Whether the _tun_ pointer should ever be expected to be NULL in this point. I have no idea, but if it is supposed to have been checked earlier, it is reasonable not to check again and this is a simple logical bug.
3. Whether the kernel should trap NULL pointer accesses in hardware for kernel code. Again I don't know, but personally I prefer that it did, unless there is good reason not to. Is there one?
Still relying on this can be risky, for example:
int * a = NULL;
x = a[10000];
The first page may be unmapped, but we are not accessing it.
Removing NULL checks
Posted Jul 17, 2009 23:24 UTC (Fri) by nowster (subscriber, #67)
[Link]
This would never have happened if the kernel were written in some modern language more recent than C. I propose rewriting the kernel in INTERCAL.
Removing NULL checks
Posted Jul 19, 2009 11:13 UTC (Sun) by eupator (guest, #44581)
[Link]
I suspect you may underestimate the modernism of C or the classicism of INTERCAL.
Removing NULL checks
Posted Jul 18, 2009 19:09 UTC (Sat) by darwish07 (subscriber, #49520)
[Link]
You have a nice memory: I remember being advised to strip a BUG_ON line since the kernel OOPS mechanism will handle it - http://lkml.org/lkml/2008/2/13/71
I'm not sure if (or how) can this be exploited though ...
Removing NULL checks
Posted Jul 20, 2009 10:03 UTC (Mon) by makomk (guest, #51493)
[Link]
Hopefully, that shouldn't be exploitable. I'm pretty sure the kernel
policy is that userspace shouldn't be able to trigger BUG_ONs
deliberately, so removing one in favour of just dereferencing the null
pointer ought to be safe assuming there isn't a bug elsewhere. (Of course,
if there is, it could turn it into a security vulnerability.)
Removing NULL checks
Posted Jul 20, 2009 12:46 UTC (Mon) by spender (subscriber, #23067)
[Link]
Andrew Morton is wrong, a BUG_ON is in no way equivalent to just dereferencing a null. What if I have my page mapped at NULL?
If that author adds one more call to smack_netlabel and forgets to check for sock->sk == NULL before calling it, instead of just causing an OOPS he's now potentially created a vulnerability which can result in arbitrary code execution.
-Brad
Nonsense. This is trivial stuff.
Posted Jul 18, 2009 1:12 UTC (Sat) by spender (subscriber, #23067)
[Link]
Bug that causes a trivial oops (only terminates the 'exploit' process) is not security relevant.
Bug that results in arbitrary code execution due to a gcc optimization is both clever and important.
Or are you stuck in the mindset that all bugs are equal?
People aren't involving the C standard or compiler optimizations to talk about a bug, they're talking about how something which by the appearance of the source is unexploitable can be turned exploitable.
-Brad
Nonsense. This is trivial stuff.
Posted Jul 18, 2009 2:02 UTC (Sat) by mikov (subscriber, #33179)
[Link]
Bug that results in arbitrary code execution due to a gcc optimization is both clever and important.
I would not qualify a bug as clever. Perhaps you mean the exploit, and yes, it is indeed very clever.
The code optimization angle is not really that important. I am sure that there are many other exploits that depend on how GCC generates its code, so this is no different. For example exploits that depend on the layout of variables in the stack. It is exactly the same thing.
In this case however people are getting confused because the explanation for the exploit depends on parts of the C language which are not so well known.
People aren't involving the C standard or compiler optimizations to talk about a bug, they're talking about how something which by the appearance of the source is unexploitable can be turned exploitable.
Nonsense. Any bug is potentially exploitable - you simply never know. No experienced developer would decide that a bug is 100% un-exploitable, especially one with an invalid pointer dereference. Of course it is not obvious how to exploit it, but it never is.
All crashing bugs are very very important.
Nonsense. This is trivial stuff.
Posted Jul 19, 2009 0:55 UTC (Sun) by xilun (subscriber, #50638)
[Link]
A good reviewer will not rule out possibility of exploitation, because: 1) it is undefined behavior, 2) NULL pointer dereferencing had been exploited before. A good reviewer will certainly _not_ think just by reading the source that this is unexploitable.
How might we eliminate undefined behaviour?
Posted Jul 20, 2009 0:41 UTC (Mon) by xoddam (subscriber, #2322)
[Link]
Question for language lawyers:
If the behaviour of a program is undefined, we can't rule out the risk that it is exploitable.
So how might we move towards eliminating undefined behaviour from a given piece of software?
It would be nice if the compiler was able to produce meaningful warnings everywhere that the program might ever do anything undefined.
Of course this is impossible in the general case because the task of static analysis is NP-complete (mostly because the program's inputs are not known to the compiler).
However there is surely a way to produce *many* such meaningful warnings. In particular, it must be possible wherever the compiler is presently able to exploit the undefined-ness of an operation to make an optimisation.
Of course if we were to attempt this immediately we would start with a huge, huge number of false positives due to race conditions and the like which are for the most part not bothersome in practice.
If it were then possible to gradually reduce these warnings by meaningful annotations and the judicious introduction of memory barriers and relevant runtime checking, any progress would then be most reassuring.
How might we eliminate undefined behaviour?
Posted Jul 20, 2009 1:41 UTC (Mon) by mikov (subscriber, #33179)
[Link]
A poster in The Register (Steve 105) gave a good example of why a warning in this case, while it does seem desirable at first sight, is probably not a good idea in general:
Surely the reason for the optimisation is (among other things) code like this:
If foo gets inlined into bar, the compiler can spot that the null pointer check in the inlined code is unnecessary and remove it. This is a most excellent optimisation (granted, in this example foo and bar do so little work that other optimisations may render it unnecessary).
I think GCC would do good if it reported this a warning iff using the pointer and checking the pointer are in the same original function - that is the optimization didn't appear as a result of other optimizations like global inlining, etc.
That would have caught our case. I am not sure how difficult it would to implement this distinction though - I suspect quite.
How might we eliminate undefined behaviour?
Posted Jul 20, 2009 2:34 UTC (Mon) by xoddam (subscriber, #2322)
[Link]
Point taken; there is some subltety involved :-)
But I do think that we can reasonably ask the compiler to distinguish between cases where an optimisation is *demonstrably* safe (ie. the result can be determined at compile time, as in your inlining case) and cases where the optimisation can result in totally different behaviour between naive and optimised object code.
The particular optimisation that resulted in this particular vulnerability does not follow from eager evaluation of known quantities but from an assumption on the part of the compiler that the program is so correct as to never step outside the language specification. Since the spec leaves so much undefined, and programmers are human, and input is untrusted, this assumption is less than completely valid -- and therefore such optimisations merit at the very least a BIG FAT WARNING.
How might we eliminate undefined behaviour?
Posted Jul 20, 2009 3:01 UTC (Mon) by MisterIO (guest, #36192)
[Link]
I think you're mixing two different things. 1 is the warning or something similar that could be emitted in a clear bug like the one in the original code(and some static checkers should really find that kind of bug), 2 is the gcc's optimization, that has nothing to do with the bug. Sure, the optimization allows the exploitation of the bug, but the warning about the bug doesn't have anything to do with it and, even without the optimization, the bug is still there so the warning makes sense. I'm not saying that gcc should warn about it at all costs, but it's clearly something that a static checker should be able to do. For example it seems like cppcheck has accepted my suggestion to include it among its checks.
How might we eliminate undefined behaviour?
Posted Jul 20, 2009 9:33 UTC (Mon) by stevenb (guest, #11536)
[Link]
Inlining is just one of the problems, but in general "good warnings" and "optimization" don't go well together.
It shouldn't be too hard to warn about this in GCC before inlining in recent GCCs. GCC internals could look like this, for the current trunk (r149803):
* put a new pass after pass_early_warn_uninitialized (which runs before inlining), say pass_null_pointer_check
* look for null-pointer checks in the function
* for every found null-pointer check, see if there is a dereference of the pointer that dominates the check.
Implementing the details is left as an exercise for the reader. This would only warn when optimizing.
The problem is that you could actually need optimizations to get a reliable warning. If you try to warn before doing any optimizations (including inlining) then you wouldn't warn about a snippet like this one:
It shouldn't be very hard to construct cases where you get missed warnings or false positives depending on what optimizations you do before figuring out what to warn about.
If you warn after optimizing (including inlining perhaps) you may get lots of false positives. But if you warn before optimizations, you may not warn for cases that are obvious by inspection.
Nonsense. This is trivial stuff.
Posted Jul 20, 2009 9:38 UTC (Mon) by tialaramex (subscriber, #21167)
[Link]
Bug that causes a trivial oops (only terminates the 'exploit' process) is not security relevant.
-sigh-
Suppose there is a process which is supposed to watch a log and take certain actions (maybe alter a firewall rule) depending on what it sees in the logs. Now suppose there's a bug in that program, all the bug enables me (the attacker) to do is force it to call the read syscall with the log fd, a buf value of my choosing (maybe from bytes in the log) and count of zero.
On the face of it, this seems useless for me as an attacker.
Suppose I find a way to get read() to Oops if the buf value is N-1 for a particular magic value N, regardless of the value of count.
On the face of it, this too seems useless. A real security "expert" from the Internet has just assured us that it's not security relevant. But..
Combine them, and I've disabled the firewall tweaking log file reader. This was defending against brute force attacks on a network service, which I promptly break into.
Nonsense. This is trivial stuff.
Posted Jul 20, 2009 12:23 UTC (Mon) by spender (subscriber, #23067)
[Link]
Your post is the perfect reason why I'm the security expert and you're not.
Your entire "suppose I find a way" argument is based off a non-existent bug. I was talking about the vulnerability I exploited. The only way you're going to turn that into a security problem for this firewall app you've invented is if you can get arbitrary code execution in the firewall app to force it to open a device it never wanted to open in the first place, and then perform an additional poll on the device that it never wanted to do in the first place. If you have arbitrary code execution within the firewall app, you might as well just call kill (or just fail with your arbitrary code execution and crash the process)!