
Fun with NULL pointers, part 2

By Jonathan Corbet
July 21, 2009
Fun with NULL pointers, part 1 took a detailed look at the long chain of failures which allowed the kernel to be compromised by way of a NULL pointer dereference. Eliminating that particular bug was a straightforward fix; it was, in fact, fixed before the nature of the vulnerability was widely understood. The importance of this particular problem is, in one sense, relatively small; there are very few distributions which shipped vulnerable versions of the kernel. But this exploit suggests that there could be a whole class of related problems in the kernel; there is a definite chance that similar vulnerabilities could be discovered - if, indeed, they have not already been found.

One obvious problem is that, when the security module mechanism is configured into the kernel, security modules are allowed to override the administrator-specified limit (mmap_min_addr) on the lowest valid user-space address. This behavior is a violation of the understanding by which security modules operate: they are supposed to be able to restrict privileges, but never increase them. In this case, the mere presence of SELinux increased privilege, and the policy enforced by most SELinux deployments failed to close that hole (comments in the exploit code suggest that AppArmor fared no better).

Additionally, with security modules configured out entirely, mmap_min_addr was not enforced at all. The mainline now has a patch which causes the mmap_min_addr sysctl knob to always be in effect; this patch has also been put into the 2.6.27.27 and 2.6.30.2 updates (as have many of the others described here).

Things are also being fixed at the SELinux level. Future versions of Red Hat's SELinux policy will no longer allow unconfined (but otherwise unprivileged) processes to map pages into the bottom of the address space. There are still some open problems, though, especially when programs like WINE are thrown into the mix. It's not yet clear how the system can securely support a small number of programs needing the ability to map the zero page. Ideas like running WINE with root privilege - thus, perhaps, carrying Windows-like behavior a little too far - have garnered little enthusiasm.

There is another way around mmap_min_addr which also must be addressed: a privileged process which is run under the SVR4 personality will, at exec() time, have a read-only page mapped at the zero address. Evidently some old SVR4 programs expect that page to be there, but its presence helps to make null-pointer exploits possible. So another patch merged into mainline and the stable updates resets the SVR4 personality (or, at least, the part that maps the zero page) whenever a setuid program is run. This patch is enough to defeat the pulseaudio-based trick which was used to gain access to a zero-mapped page.
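
As an illustration of the mechanism (a hypothetical sketch, not the published exploit), a program only needs to set the personality and call exec() for the kernel to map that page; on a kernel carrying the fixes described here, or with mmap_min_addr enforced for the process, no such page will appear:

    /*
     * Hypothetical sketch (not the published exploit): set the SVR4
     * personality, re-exec, then show the lowest mapping in
     * /proc/self/maps.  On an unfixed, unrestricted kernel the first
     * line can start at address 0; on a fixed kernel it will not.
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/personality.h>

    int main(int argc, char *argv[])
    {
        char line[256];
        FILE *maps;

        if (argc == 1) {
            personality(PER_SVR4);  /* PER_SVR4 includes MMAP_PAGE_ZERO */
            execl("/proc/self/exe", argv[0], "after-exec", (char *)NULL);
            perror("execl");
            return 1;
        }

        /* After exec(): print the first (lowest) mapping. */
        maps = fopen("/proc/self/maps", "r");
        if (maps && fgets(line, sizeof(line), maps))
            fputs(line, stdout);
        return 0;
    }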

This change is not enough for some users, who have requested the ability to turn off the personality feature altogether. The ability to run binaries from 386-based Unix systems just lacks the importance it had in, say, 1995, so some question whether the personality feature makes any sense given its costs. Linus answered:

We could probably get rid of that idiotic feature. It's simply not important enough any more. Does anybody really care? At the same time, over years we've grown _other_ personality flags, and some of them are still relevant.

In particular, it seems that the ability to disable address-space randomization (which is a personality feature) is useful in a number of situations. So personality() is likely to stay, but its zero-page mapping feature might go away.

Yet another link in the chain of failure is the removal of the null-pointer check by the compiler. This check would have stopped the attack, but GCC optimized it out on the theory that the pointer could not (by virtue of already having been dereferenced) be NULL. GCC (naturally) has a flag which disables that particular optimization; so, from now on, kernels will, by default, be compiled with the -fno-delete-null-pointer-checks flag. Given that NULL might truly be a valid pointer value in the kernel, it probably makes sense to disable this particular optimization indefinitely.
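
To illustrate (a generic sketch, not the actual tun.c code): in the fragment below, the assignment on the first line is taken by the optimizer as proof that dev is non-NULL, so the later test may be silently dropped unless the new flag is used.

    /* Generic sketch of the dereference-then-check pattern; not the
     * actual TUN driver code. */
    struct device {
        int flags;
    };

    int get_flags(struct device *dev)
    {
        int flags = dev->flags;  /* the dereference happens first...       */

        if (dev == NULL)         /* ...so GCC treats this test as dead     */
            return -1;           /* code and may delete it, unless built   */
                                 /* with -fno-delete-null-pointer-checks.  */
        return flags;
    }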

One could well argue, though, that while all of the above changes are good, they also partly miss the point: a quality kernel would not be dereferencing NULL pointers in the first place. It's those dereferences which are the real bug, so they should really be the place where the problem is fixed. There is some interesting history here, though, in that kernel developers have often been advised to omit checks for NULL pointers. In particular, code like:

    BUG_ON(some_pointer == NULL);
    /* dereference some_pointer */

has often seen the BUG_ON() line removed with a comment like:

If we dereference NULL then the kernel will display basically the same information as would a BUG, and it takes the same action. So adding a BUG_ON here really doesn't gain us anything.

This reasoning is based on the idea that dereferencing a NULL pointer will cause a kernel oops. On its face, it makes sense: if the hardware will detect a NULL-pointer dereference, there is little point in adding the overhead of a software check too. But that reasoning is demonstrably faulty, as shown by this exploit. There are even legitimate reasons for mapping page zero, so it will never be true that a NULL pointer is necessarily invalid. One assumes that the relevant developers understand this now, but there may be a lot of places in the kernel where necessary pointer checks were removed from the code.

Most of the NULL pointer problems in the kernel are probably just oversights, though. Most of those, in turn, are not exploitable; if there is no way to cause the kernel to actually encounter a NULL pointer in the relevant code, the lack of a check does not change anything. Still, it would be nice to fix all of those up.

One way of finding these problems may be the Smatch static analysis tool. Smatch went quiet for some years, but it appears that Dan Carpenter is working on it again; he recently posted a NULL pointer bug that Smatch found for him. If Smatch could be turned into a general-purpose tool that could find this sort of problem, the result should be a more secure kernel. It is unfortunate that checkers like this do not seem to attract very many interested developers; free software is very much behind the state of the art in this area and it hurts us.

Another approach is being taken by Julia Lawall, who has put together a Coccinelle "semantic patch" to find and fix check-after-dereference bugs like the one found in the TUN driver. A series of patches (example) has been posted to fix a number of these bugs. Cases where a pointer is checked after the first dereference are probably a small subset of all the NULL pointer problems in the kernel, but each one indicates a situation where the programmer thought that a NULL pointer was possible and problematic. So they are all certainly worth fixing.
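
Schematically, the bug and the fix applied by such a patch look like this (a hypothetical fragment in the style of the TUN driver bug, not the actual kernel code):

    #include <stddef.h>

    struct sock;                             /* opaque for this sketch */
    struct tun_struct { struct sock *sk; };

    /* Before: the pointer is dereferenced, then checked; the check comes
     * too late (and, as noted above, may even be removed by the compiler). */
    struct sock *get_sock_buggy(struct tun_struct *tun)
    {
        struct sock *sk = tun->sk;
        if (!tun)
            return NULL;
        return sk;
    }

    /* After: test the pointer before the first dereference. */
    struct sock *get_sock_fixed(struct tun_struct *tun)
    {
        if (!tun)
            return NULL;
        return tun->sk;
    }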

All told, the posting of this exploit has served as a sort of wakeup call for the kernel community; it will, with luck, result in the cleaning up of a lot of code and the closing of a number of security problems. Brad Spengler, the author of the exploit, is clearly hoping for a little more, though: he has often expressed concerns that serious kernel security bugs are silently fixed or dismissed as being denial-of-service problems at worst. Whether that will change remains to be seen; in the kernel environment, many bugs can have security implications which are not immediately obvious when the bug is fixed. So we may not see more bugs explicitly advertised as security issues, but, with luck, we will see more bugs fixed.

Index entries for this article
Kernel: Security/Vulnerabilities
Security: Linux kernel
Security: Vulnerabilities/Privilege escalation



Fun with NULL pointers, part 2

Posted Jul 21, 2009 21:31 UTC (Tue) by spender (guest, #23067) [Link] (14 responses)

"One assumes that the relevant developers understand this now, but there may be a lot of places in the kernel where necessary pointer checks were removed from the code."

I wouldn't be so sure: those mails you link to are from 2008; we had gone through the whole "NULL pointer dereferences can be exploitable" thing back in 2007 when I released my first exploit.

But I hope you're right ;)

-Brad

Fun with NULL pointers, part 2

Posted Jul 21, 2009 21:50 UTC (Tue) by ebiederm (subscriber, #35028) [Link] (13 responses)

No. BUG_ON(ptr == NULL) is generally a bad idea because it is fixing
a problem in the wrong place. Leading to maintenance problems etc.

In some common places, like sysfs, where there are a lot of users and mistakes are common because of the volume of use of the interface, it makes sense.

In general it is better to simply make it impossible for a NULL pointer
to reach someplace. Or to handle that NULL pointer dereference.

Checking for the impossible just makes the code harder to understand.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 21:53 UTC (Tue) by eparis (guest, #33060) [Link] (7 responses)

Clearly you never make mistakes, but when Herbert Xu does, I think it proves that any of us mere mortals can. Writing code defensively is clearly a good idea and one I think all of us should embrace.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 22:11 UTC (Tue) by ebiederm (subscriber, #35028) [Link] (6 responses)

As best I can tell Herbert badly resolved a merge between my changes
to tun.c and his changes to tun.c, and simply had not realized how much
I had changed tun.c.

As for writing code defensively. Doing that routinely doubles your number of bugs.

You have the wrong input should never have been generated in the first place.

You have the wrong BUG_ON in the function.

Bugs per line of code are roughly constant. Therefore you want a design
that uses fewer lines of code.

What made all of this interesting is the fact someone found a way around the defensive measure of mmap_min_addr.

Eric

Fun with NULL pointers, part 2

Posted Jul 22, 2009 7:01 UTC (Wed) by PaXTeam (guest, #24616) [Link]

> What made all of this interesting is the fact someone found a way around the defensive measure of mmap_min_addr.

It's not at all interesting because any feature which is *designed* to be circumventible will be, well, circumvented. From day one it was obvious and publicly discussed that exploit writers would go after the needed capability (that assuming they don't have better bugs to exploit not needing it ;).

Fun with NULL pointers, part 2

Posted Jul 22, 2009 9:10 UTC (Wed) by epa (subscriber, #39769) [Link] (3 responses)

> You have the wrong input should never have been generated in the first place.

> You have the wrong BUG_ON in the function.

> Bugs per line of code are roughly constant. Therefore you want a design that uses fewer lines of code.

Adding the null pointer check does not cause the first bug you mention, although it may help to reveal it. The second item is a coding style issue, not a bug (and your logic is circular; you try to show that adding the BUG_ON is a bad idea, so you cannot assume that in your argument). The third assertion needs some evidence, and even if true cannot be used to show that adding any particular line of code introduces a bug, any more than you could use it to justify removing one particular line of code from the middle of your program. (It is sometimes the case that adding overzealous error checking introduces a bug, but that needs to be shown in each individual case, and you haven't shown it here.)

Are you an idiot or just play one on TV?

Posted Jul 23, 2009 5:35 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

It's not even funny: you are discussing one particular function in its current incarnation whereas Eric discusses the whole approach.

> Adding the null pointer check does not cause the first bug you mention, although it may help to reveal it.

But it can crash the perfectly working system tomorrow after some kind of refactoring.

> The second item is a coding style issue, not a bug (and your logic is circular; you try to show that adding the BUG_ON is a bad idea, so you cannot assume that in your argument).

Today human mistakes are promoted to "code style issue"? News to me...

> The third assertion needs some evidence, and even if true cannot be used to show that adding any particular line of code introduces a bug

There were a lot of investigations. Number of bugs per line of code does not change too much when you switch languages, paradigms, etc. It reduces if you do a lot of refactoring (like kernel developers do) and shrinks when you are using different autodetection tools (which they are using too), but this change is pretty limited. It'll be somewhat strange to see that BUG_ON is somehow 100 times less buggy than the rest of the code.

> any more than you could use it to justify removing one particular line of code from the middle of your program.

Yup. Justification is the same as for the rest of the code: if your function will continue to work, why have this line in the first place? Redundant comments can be kept around, but an actual line of code? Please - a lot of what kernel developers do is this exact "removal of lines of code from the middle of your program".

> (It is sometimes the case that adding overzealous error checking introduces a bug, but that needs to be shown in each individual case, and you haven't shown it here.)

And now this sounds almost adequate. If overzealous error checking sometimes introduces a bug then it's a good idea to keep it out of the program: asserts, coverity, etc. These things can detect bugs but they can not introduce new bugs (a false positive is not a bug as it's not compiled into the actual kernel).

Defensive programming makes sense when you suspect other code has lower quality (userspace, drivers for obscure devices, etc), but to try to protect code from code of equal quality with so-called "defensive programming" is to increase the number of bugs!

Are you an idiot or just play one on TV?

Posted Jul 23, 2009 11:15 UTC (Thu) by epa (subscriber, #39769) [Link] (1 responses)

Why yes, I was talking about this particular case. Doing a null pointer check earlier (whether with BUG_ON(x==NULL) or some other test) would have avoided a local root exploit in this one case. That suggests that the whole approach is not completely useless, even if in other cases it doesn't help.

I am not arguing for 'defensive programming' as sometimes practised, where you cruft up the code with workarounds to silently return or do something random if a caller is buggy. That tends to hide bugs and thus cause buggier software in the long run. (Although even this kind of defensive programming can occasionally be useful, for example in safety-critical systems where the code must keep running no matter what.)

> There were a lot of investigations. Number of bugs per line of code does not change too much when you switch languages, paradigms, etc.

That is quite true but you cannot reason as follows: on average, more lines of code means more bugs, therefore an average 101-line program is buggier than an average 100-line program, therefore adding *this particular line of code* to *this particular* program is likely to cause a bug! Of course it depends on the individual case! Some kinds of code such as assertions (runtime checking) and type annotation (compile-time checking) exist only to catch bugs, and it doesn't make sense to say that adding them will tend to make code buggier just based on a general rule about lines of code.

An incorrect BUG_ON(x==NULL) certainly can introduce a bug into a program, but then an incorrect anything can introduce a bug.

+1

Posted Jul 23, 2009 14:57 UTC (Thu) by xoddam (subscriber, #2322) [Link]

+1, for sanity.

Fun with NULL pointers, part 2

Posted Jul 23, 2009 12:27 UTC (Thu) by ortalo (guest, #4654) [Link]

Don't you think a static checker could benefit from BUG_ON()-like information (or ASSERT() or whatever you want to name it) by using the given check/assumption for code analysis.
In this case, the additional line is more an annotation (like those many static checkers introduce) than source code.
As a developper you may legitimately want that such annotations be placed elsewhere than in the middle of code, but then where would you like to place them?

Side note: I really do not buy the constant number_of_fault/loc assumption. But well... that's another debate; and anyway IMHO your argument with respect to code readability and maintanability is totally valid.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 23:32 UTC (Tue) by johill (subscriber, #25196) [Link] (3 responses)

Besides, depending on the platform the compiler could elide the BUG_ON(ptr == NULL) anyway, no? It's often just
if (unlikely(ptr == NULL))
    out_of_line_bug();
or something like that?

Fun with NULL pointers, part 2

Posted Jul 22, 2009 4:12 UTC (Wed) by gmaxwell (guest, #30048) [Link]

Yes, if the compiler can prove that it's impossible for the condition to be true it can and will omit the check. Sadly, because the compiler's visibility is so narrow it usually can't do this except in the most obvious of situations.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 9:53 UTC (Wed) by mb (subscriber, #50428) [Link] (1 responses)

Well, for most architectures the BUG() functions are __attribute__((noreturn)) or something equivalent. So the compiler cannot assume that the execution flow reaches to after the if (!ptr) check at all, if ptr is NULL. So it should not optimize it out in that case.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 11:23 UTC (Wed) by johill (subscriber, #25196) [Link]

Really? I thought the out_of_line_bug() in my example would have that attribute, but not the BUG() itself. But powerpc for example is different though, it just inserts a trap (twi or such).

Fun with NULL pointers, part 2

Posted Aug 11, 2009 18:50 UTC (Tue) by bdonlan (guest, #51264) [Link]

BUG_ON doesn't fix a problem - it complains about one. It's a bit like
assert(!condition) in userspace - if you actually hit it, the program's
already doomed. The Linux kernel tries to keep going a bit, just because it's
difficult to get debug information after a panic, but this is _not_ fixing a
problem, and a tripped BUG_ON is _by definition_ a bug.

As such, the only disadvantage to extra BUG_ONs is the run-time overhead from
them.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 21:35 UTC (Tue) by i3839 (guest, #31386) [Link] (7 responses)

> has often seen the BUG_ON() line removed with a comment like:
>
> If we dereference NULL then the kernel will display basically the same
> information as would a BUG, and it takes the same action. So adding a
> BUG_ON here really doesn't gain us anything.
>
> This reasoning is based on the idea that dereferencing a NULL pointer
> will cause a kernel oops.

For the Smack example, if you read the code it's clear that the BUG_ON check makes no sense. It is in a static function and all callers either check or dereference the pointer. More generically, adding BUG_ONs everywhere for pointers that shouldn't be NULL is the wrong approach: Either make sure that all callers don't pass in a NULL pointer, or if that isn't possible, add a check and return with an error.

For the other example, the kernel had better make sure that it notices when the pointer is NULL, one way or the other.
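
For what it's worth, the "add a check and return with an error" option looks something like this (a made-up example, not code from the kernel):

    #include <errno.h>

    struct widget { int count; };

    /* Refuse a NULL pointer with an error return instead of oopsing
     * (or relying on a BUG_ON). */
    int widget_bump(struct widget *w)
    {
        if (!w)
            return -EINVAL;
        w->count++;
        return 0;
    }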

Fun with NULL pointers, part 2

Posted Jul 22, 2009 9:15 UTC (Wed) by epa (subscriber, #39769) [Link] (6 responses)

> Either make sure that all callers don't pass in a NULL pointer, or if that isn't possible, add a check and return with an error.

Suppose you take the first option; you check all the code that could call your routine and make sure NULL is never passed. Then there is no need for a runtime check, right? In theory this is true but I know I make mistakes, so even after checking all the code I would still put in the check for this 'impossible condition' to stop a small oversight causing a big problem.

It's the same argument for any assertion you put in your code: since they are checking for something that can never ever happen, why bother with them at all? All programmers make mistakes, but the better ones are aware of this fact and take appropriate measures.

(This is not an argument for sloppy 'defensive programming' as sometimes practised, where you insert code to silently return the wrong answer or do something random if the inputs aren't as you expect. You want a macro like BUG_ON or assert() which loudly announces the problem, not hides it.)

Fun with NULL pointers, part 2

Posted Jul 22, 2009 10:19 UTC (Wed) by i3839 (guest, #31386) [Link] (1 responses)

Making sure that a pointer is never NULL is easy when it is passed to a static function. For public functions it is harder and more checks are needed, but if all callers are limited to one directory it's still not hard.

If you pass along a pointer and say "hey, do something with this", in general that code did something with it before, otherwise the second function could be called directly.

For pointers within structures it's much harder though, then it helps to have clean code with clear lifetime rules. If it's messy enough you can make things more messy by adding NULL checks everywhere. If this adds more unexpected error paths then it's not a good way forwards though. If it's unclear if something can be NULL or not then in the long term it's much better to clean up the code. Like assertions BUG_ON is good while developing a new piece of code to make sure all assumptions hold. When the code gets more into shape and becomes more stable there should be not much need for such checks anymore.

An exception is low level library like code which can be used by many users and which wants to prevent them making the same bloody mistakes. It's also a good way to prevent corruptions from spreading around.

The further away the code giving the input is from the code using it the more checking is necessary.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 11:14 UTC (Wed) by epa (subscriber, #39769) [Link]

> Making sure that a pointer is never NULL is easy when it is passed to a static function. For public functions it is harder and more checks are needed, but if all callers are limited to one directory it's still not hard.

It's not hard, but I know that I am able to make mistakes on even the simplest tasks. I also know that I am not quite the world's worst programmer or the most careless, so if I can screw up, others can too. That's why even after you have carefully gone through the code and made sure that 'obviously', NULL can never be passed, it is still a good idea to include a check just in case, to stop a minor oversight becoming a major bug (such as the root exploit discussed in the article).

The best option is for the compiler to statically ensure non-null pointers; tools like Splint let you be certain that getting null is impossible.

> If it's unclear if something can be NULL or not then in the long term it's much better to clean up the code.

Absolutely agreed. If it's unclear whether NULL is allowed, then BUG_ON(x==NULL) is not the right way, except as a temporary debugging aid. You need to check the code and make a decision - can NULL ever legally be passed here? If the answer is no, then you should make sure all the callers are behaving properly - but then when you've finished put in the BUG_ON check anyway, because even Linus makes mistakes.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 13:25 UTC (Wed) by dgm (subscriber, #49227) [Link] (3 responses)

Looks like a variation of the "end to end principle" (http://en.wikipedia.org/wiki/End-to-end_principle). No matter that callers check for NULL, the callee has still to check it, again.

Also, as someone has pointed out, NULL is just *one* invalid pointer value (even if a common one). The kernel had better be testing to reject pointers into user space, except when such access is intended.

I don't know enough about Linux internals. Would it be possible to make those tests with some memory-protection trickery?

Fun with NULL pointers, part 2

Posted Jul 22, 2009 16:03 UTC (Wed) by brianomahoney (guest, #6206) [Link] (2 responses)

"Add more checks of all types, we all make mistakes" is pure nonsense; it will make the kernel slow and fat, with no real improvement in reliability. It is the mark of a confused and poor programmer who has no sense of designing and debugging kernel code.

This bug happened because the author wrote patently idiotic code, using a pointer and THEN checking it. This code, and all others like it, need to be fixed so they actually do what they are intended to do. We do not need a kernel full of "ah, well, but" checks and other cruft.

The secondary, and very worrying, thing is GCC silently dropping the check; we do not need optimizations like that without a STRONG warning. But I guess the motivations of the GCC developers are different here, and they really cannot win: there is constant pressure to improve compiled code and not to introduce more baby-minding warnings.

Perhaps Coverity can help here, but we need more and better analysis, and focused bug-fixing by good developers, not the injection of confused, well-meaning tests.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 23:26 UTC (Wed) by i3839 (guest, #31386) [Link]

> This bug happened because the author wrote patently idiotic code, using a pointer and THEN checking it.

My understanding was that two people worked on the same code and that this was a result of a bad merge, though I could be confusing this with another case.

Fun with NULL pointers, part 2

Posted Jul 23, 2009 8:09 UTC (Thu) by dgm (subscriber, #49227) [Link]

Except that the kernel is not designed, it evolves. Linus has pointed it out several times. At most, some features are initially designed, but from there on, it's all evolution.

Maybe you write perfectly designed code that works wonderfully and will never need an update, fix or improvement. I certainly don't. I make mistakes; that's why my code is full of assertions, and I try to manage impossible situations in a sane way, just in case.

With regards to fat and slow... Any check can be made optional, by simply putting it between a pair of #ifdefs. Maybe somebody will appreciate a more rugged, even if slower, kernel for certain servers that are exposed to the outside world. An important point would be to measure how much slower that kernel would be.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 21:36 UTC (Tue) by christian.convey (guest, #39159) [Link] (6 responses)

Doesn't Coverity already scan the kernel periodically for this kind of thing? (scan.coverity.com)

If so, how was this missed? Did someone intentionally tell Coverity to ignore the warning about checking for NULL *after* dereferencing the pointer?

Fun with NULL pointers, part 2

Posted Jul 22, 2009 8:47 UTC (Wed) by msmeissn (subscriber, #13641) [Link] (5 responses)

Well, Coverity scans the kernel.

They issued a press release that they found the issue (REVERSE_INULL I
would guess).

Someone just needs to read and investigate all the reports, and it's quite an amount of not-so-fun work.

Ciao, Marcus

Fun with NULL pointers, part 2

Posted Jul 22, 2009 12:13 UTC (Wed) by nick.lowe (guest, #54609) [Link] (4 responses)

Fun with NULL pointers, part 2

Posted Jul 22, 2009 12:51 UTC (Wed) by spender (guest, #23067) [Link] (3 responses)

Now that the blog was posted, I can mention that the tun.c bug was found by Coverity months ago and reported, but it went unfixed.

Couple that with the information from this study:
http://www.usenix.org/events/usenix09/tech/full_papers/gu...

that 40% of the issues reported by the static checkers aren't being triaged.

-Brad

Fun with NULL pointers, part 2

Posted Jul 22, 2009 20:52 UTC (Wed) by hmh (subscriber, #3838) [Link] (2 responses)

Is the information that the checker produces actually getting to the developers that can do something about it?

That seems to be a valid question...

Fun with NULL pointers, part 2

Posted Jul 22, 2009 21:04 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

this is actually a very interesting presentation. videos are available at
http://www.usenix.org/events/usenix09/tech/ (I believe they are available to everyone, not just usenix members, please let me know if that is not the case)

there are a number of trends (not very surprising in retrospect)

if you have a false positive, the chances of anyone paying attention to further reports in that section of the code drop drastically.

if you have a maintainer who looks at one report, the chances of them dealing with all of them in that area go up significantly

if a fix is produced as a result of a report, the chances of all the reports for a given area being looked at and fixed go up drastically

if there is not an active maintainer of that section of code the chances of the reports being looked at go down drastically.

a large percentage of real bugs and vulnerabilities have had reports on that section of the code (note that this is _not_ the same thing as saying that a large percentage of reports have been found to cause real bugs and vulnerabilities)

there is a significant push from kernel developers to avoid rushing to clean up the code just to silence the warnings; the correlation between these reports and more significant bugs is large enough that many of them want the entire section where a report is found to be examined, since there is a significant chance that a more significant and subtle bug is lurking in that area.

Fun with NULL pointers, part 2

Posted Jul 23, 2009 2:36 UTC (Thu) by spender (guest, #23067) [Link]

The videos require being a USENIX member.

-Brad

Incorrect statements:

Posted Jul 21, 2009 21:37 UTC (Tue) by eparis (guest, #33060) [Link] (3 responses)

"One obvious problem is that, when the security module mechanism is configured into the kernel, the administrator-specified limits on the lowest valid user-space virtual address are ignored."

WRONG. Flat wrong. There are differences in what is/was required when using SELinux and not using SELinux, but it was never ignored.

"The mainline now has a patch which causes the map_min_addr sysctl knob to always be in effect; this patch has also been put into the 2.6.27.27 and 2.6.30.2 updates"

This is true, but wrong in context. The knob causes mmap_min_addr to be applied with !CONFIG_SECURITY. But the whole paragraph was about having it set....

Incorrect statements:

Posted Jul 21, 2009 22:06 UTC (Tue) by corbet (editor, #1) [Link] (2 responses)

Hmm, I misread the patch and blew it. Not the first time. The text has been tweaked somewhat and is hopefully less fictional now; sorry for the confusion.

Incorrect statements:

Posted Jul 21, 2009 23:12 UTC (Tue) by eparis (guest, #33060) [Link] (1 responses)

Less fictional (but I still wouldn't call it perfect) *smile*

I did try to specifically address the issue of the differences between SELinux and non-SELinux mmap_min_addr use in a blog I created today.

http://eparis.livejournal.com/606.html

From the point of view of an authenticated and logged in user SELinux is doing a worse job and I clearly want to fix this. From the point of view of a remote attack or in light of people who run dumb things (run wine), SELinux was taking a better approach.

I'm hoping to fix those dumb things.

Incorrect statements:

Posted Jul 23, 2009 2:45 UTC (Thu) by spender (guest, #23067) [Link]

We have a word for those "dumb things" that grant more privilege than the system normally would: "vulnerability." So where's the CVE?

-Brad

Fun with NULL pointers, part 2

Posted Jul 21, 2009 22:04 UTC (Tue) by mjthayer (guest, #39183) [Link] (5 responses)

Presumably if a NULL pointer can be a valid pointer under certain circumstances then checks for NULL to determine invalid pointers would be wrong under the same circumstances, and the right thing would be to avoid using pointers which have not been initialised to their final value. Note that gcc *can* spot and warn about that if the pointer is not initialised to NULL first.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 22:18 UTC (Tue) by sourcejedi (guest, #45153) [Link] (4 responses)

Actually, I don't think it is valid for kernel code to dereference NULL. The reason being, it would be a __user address, which should be accessed using copy_from_user() or similar.

Fun with NULL pointers, part 2

Posted Jul 21, 2009 23:01 UTC (Tue) by mjthayer (guest, #39183) [Link] (3 responses)

That speaks for checking pointers against the user-kernel boundary rather than against NULL. Slightly less efficient I know, but more effective defensively.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 11:50 UTC (Wed) by etienne_lorrain@yahoo.fr (guest, #38022) [Link] (2 responses)

And the i386 has a very nice assembly instruction for that, which perfectly fits: "bound".
The nice thing is that it is very short, and it never messes with the branch prediction subsystem because, if the value is not inside the boundaries, an exception is generated.
I know of just one piece of software using that instruction, and it works as intended.

Fun with NULL pointers, part 2

Posted Jul 23, 2009 2:20 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (1 responses)

Not really all that nice:

Intel's designers added the bound instruction to allow a quick check of the range of a value in a register. This is useful in Pascal, for example, when checking array bounds validity and when checking to see if a subrange integer is within an allowable range. There are two problems with this instruction, however. On 80486 and Pentium/586 processors, the bound instruction is generally slower than the sequence of instructions it would replace:

     cmp     reg, LowerBound
     jl      OutOfBounds
     cmp     reg, UpperBound
     jg      OutOfBounds

On the 80486 and Pentium/586 chips, the sequence above only requires four clock cycles assuming you can use the immediate addressing mode and the branches are not taken; the bound instruction requires 7-8 clock cycles under similar circumstances and also assuming the memory operands are in the cache. A second problem with the bound instruction is that it executes an int 5 if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5 interrupt handler routine to print the screen. Therefore, if you execute a bound instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default int 5 handler with one of your own, pressing the PrtSc key will transfer control to your bound instruction handler. Although there are ways around this problem, most people don't bother since the bound instruction is so slow.

Fun with NULL pointers, part 2

Posted Jul 23, 2009 8:54 UTC (Thu) by etienne_lorrain@yahoo.fr (guest, #38022) [Link]

Nowadays, if you are not in a tight loop written in assembly, you no longer count the number of cycles an instruction takes but the time it takes to load it into the level-1 cache, and the time to reload the previous cache line after executing your instruction; that is basically proportional to the size of the instruction.
The two cmp solution needs 16 bytes (in protected mode) if the out-of-bound handler is within 256 bytes of the test, and 32 bytes if not: that is a complete cache line.
The bound solution needs 8 bytes, mostly because it does not encode the out-of-bound address handler.
The difference loading the other 24 bytes is a lot more significant than the 4 cycles difference.
Even the failed branch prediction you will probably get is more important - even the fact that you have polluted the branch prediction cache is probably more important.
The default INT5 screen print handler is not accessible under Linux, BIOS is not mapped and the APIC is configured differently, if I remember well you have a SIGBUS exception in user mode and something as easy to trap/abort in kernel mode.

Interesting blog

Posted Jul 21, 2009 23:42 UTC (Tue) by jamesmrh (guest, #31622) [Link] (1 responses)

Btw, someone has started posting security analysis of bugs which have been reported or fixed upstream:

http://xorl.wordpress.com/

I don't know who it is, but for anyone wanting to help in this area, please check it out.

Interesting blog

Posted Jul 22, 2009 0:09 UTC (Wed) by spender (guest, #23067) [Link]

I just posted a demo exploit for the SGI GRU vulnerability mentioned on the blog:
http://grsecurity.net/~spender/exploit_demo.c
Total development time: 1 hour.

-Brad

Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:11 UTC (Wed) by bartoldeman (guest, #4205) [Link] (11 responses)

The null page is only really *needed* for vm86() -- for 16-bit protected mode code an LDT with a constant non-zero base offset for all segments can be used and a pure 16-bit protected mode program (I think many use vm86() though!) that runs in Wine does not even know it.

But for vm86() there is no easy and fast solution: every program that uses it (Wine, DOSEMU, X when it's been changed to run without root privs, a few others) will have to use CPU emulation. Now mapping the zero page needs CAP_SYS_RAWIO but that by itself is a much higher capability. That is, CAP_SYS_RAWIO allows direct I/O to hardware, surely giving root-like capabilities, even if the kernel is watertight. That way, one could argue for CAP_MAY_MAP_ZERO or something like that.

A technical solution would be Brad's UDEREF, or the Linux 2.0 kernel method of using segment limits, where user space needs to be accessed via FS and there is no direct kernel address space via CS/DS/ES below 3GB (keep in mind, vm86() is i386 only) but those segment limits aren't so popular anymore and I'm not sure how they interact with modern virtualization environments.

Another way would be to sandbox vm86() completely so it has the zero page,
which can be alias-mapped elsewhere, accessible -- after all, inside vm86() 16-bit code you can't do system calls. But that would mean changing the page protections on every vm86() call and exit, and also on every interrupt (unmapping page 0 as soon as an IRQ happens, to be effective). Too complex, I think.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:26 UTC (Wed) by bartoldeman (guest, #4205) [Link]

Correction about Wine:
http://www.nabble.com/Re%3A-vm86-mode-is-not-supported-p2...

Alexandre Julliard:
Win16 applications don't need vm86, and work just fine on 64-bit.

OK, that means that with some LDT tricks Win16 apps may work in Wine. I'm not 100% sure but I know the LDT offset trick works for DPMI applications in DOSEMU which I tested a bit last year. Wine can also execute some DOS apps, and those won't be possible in i386 without zero mapping unless Wine acquires a CPU emulator or someone can think of another solution.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:53 UTC (Wed) by jreiser (subscriber, #11027) [Link]

> The null page is only really *needed* for vm86()...

That is just not true [except of course for arguments regarding universality, or emulating every access to page 0 via SIGSEGV, etc.]. I have written programs where having page 0 is essential (for elegance, or ease of maintenance, or time performance, or space performance, etc.), and I object to those programs becoming non-functional.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:54 UTC (Wed) by spender (guest, #23067) [Link] (8 responses)

UDEREF is a PaX feature, developed entirely by the PaX Team -- not by me.

You are correct though that the real problem is any invalid userland access in general. There are many bugs that have existed and will exist that involve (or can involve) userland in ways that simply preventing mappings within the first 64kb won't stop. For instance, in the exploit I developed today that I linked to in this thread, I don't have to worry about how I'll inject arbitrary code into the kernel -- all I need is to get it to execute my already-existing code in userland (allowing for all the auditing/SELinux/AppArmor/LSM disabling code to be reused easily).

With a one-byte write of 0 (a value I don't control) to an address my technique allows me to 100% reliably control, on x86 I set the highest byte in a function pointer belonging to a module to 0, turning it into a userland address -- game over.

-Brad

Fun with NULL pointers, part 2

Posted Jul 22, 2009 4:54 UTC (Wed) by bojan (subscriber, #14302) [Link] (7 responses)

> You are correct though that the real problem is any invalid userland access in general.

Given the above and the fact that solutions to this type of bug exist (if I understand UDEREF correctly), why is the kernel not being patched so we don't see this ever again? Or am I being overly naive?

Yes, you are overly naive...

Posted Jul 22, 2009 5:42 UTC (Wed) by khim (subscriber, #9252) [Link] (6 responses)

> Given the above and the fact that solutions to this type of bug exist (if I understand UDEREF correctly), why is the kernel not being patched so we don't see this ever again? Or am I being overly naive?

The sad truth is that UDEREF is history now. The only architecture where it was useful is slowly going away: most architectures never had segments and Intel/AMD "lost" them recently: x86-64 does not have full segments in 64-bit mode... Segments retained only one attribute, the base address, and nothing else - good for TLS, not enough for UDEREF...

Yes, you are overly naive...

Posted Jul 22, 2009 6:56 UTC (Wed) by PaXTeam (guest, #24616) [Link] (5 responses)

Note that the fundamental problem in all this NULL deref misery is the lack of userland/kernel virtual address space separation. UDEREF/i386 simulates it by using the IA-32 segmentation logic, but there're certainly other ways to do the same, say address space ID tags in the TLB. Unfortunately AMD had butchered the segmentation logic without providing an alternative (it's not only about security, virtualization vendors weren't that happy either).

Yes, you are overly naive...

Posted Jul 22, 2009 8:02 UTC (Wed) by gmaxwell (guest, #30048) [Link] (2 responses)

How terrible for performance would it be to make all userspace accessible memory no-execute on kernel entrance? (or otherwise achieve the same result, like making it unreachable and then faulting it back in with NX-set).

Yes, you are overly naive...

Posted Jul 22, 2009 17:25 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

this would be _very_ expensive. changing the page tables is relatively expensive (especially if you have to do a lot of them)

Yes, you are overly naive...

Posted Jul 22, 2009 18:53 UTC (Wed) by PaXTeam (guest, #24616) [Link]

well, the actual page table manipulation would not be that expensive, with some tradeoffs you can reduce it to changing a few top-level page table entries and a single TLB flush, which would be a few hundred cycles or so.

however there's more cost to this: TLB repopulation which would inevitably occur after returning to userland. that is the real expense as we're talking about up to hundreds of TLB entries on modern CPU cores, each potentially missing in the data cache and incurring hundreds of cycles.

Yes, you are overly naive...

Posted Jul 22, 2009 11:13 UTC (Wed) by ajb (subscriber, #9694) [Link] (1 responses)

According to Wikipedia, newer CPUs have an 'address space' tag which can be used to avoid flushing the TLB on VM switches. Could it also be used for kernel vs user mode? It would require cooperating with the VM monitor though. (I wish VMs could be nested - it would be nice to be able to run untrusted code on EC2 without using QEMU.)

Although that doesn't obviously help for the cases where the kernel needs to access user mode memory; one would still have to change the permissions to access it and then change them back.

NKX-bit

Posted Jul 22, 2009 23:37 UTC (Wed) by i3839 (guest, #31386) [Link]

Seems like the execute (and perhaps some other) memory permissions should be split up into user and kernel versions, that would solve this particular problem at least. That said, if they just got used to a separate no-execute bit, it may take a really long time before they introduce a no-kernel-execute bit (or ring0, whatever).

Not NULL assertions

Posted Jul 22, 2009 10:14 UTC (Wed) by iq-0 (subscriber, #36655) [Link]

It is argued that adding assertions to your code helps you describe and optionally enforce your contract (design/programming by contract).

I agree that the default BUG_ON(..) is the wrong way to go, but removing it removes a piece of information, namely that we assume the pointer to be not null. I think that a better solution would be to introduce:

ASSERT(expr) and possibly ASSERT_NOT_NULL(ptr) or ASSERT_VALID_PTR(ptr), which also checks for error pointers. They perfectly describe what we expect from our input parameters, and with a DEBUG_ASSERTIONS configuration option you would activate these additional checks (which for production systems would not be enabled by default).
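
A minimal sketch of what that could look like (the config option and macro names here are made up, building on the kernel's existing BUG_ON() and IS_ERR() helpers):

    /* Hypothetical sketch only; CONFIG_DEBUG_ASSERTIONS and these macros
     * do not exist in the kernel. */
    #ifdef CONFIG_DEBUG_ASSERTIONS
    # define ASSERT(expr)           BUG_ON(!(expr))
    # define ASSERT_NOT_NULL(ptr)   BUG_ON((ptr) == NULL)
    # define ASSERT_VALID_PTR(ptr)  BUG_ON((ptr) == NULL || IS_ERR(ptr))
    #else
    # define ASSERT(expr)           do { } while (0)
    # define ASSERT_NOT_NULL(ptr)   do { } while (0)
    # define ASSERT_VALID_PTR(ptr)  do { } while (0)
    #endif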

The other thing I personally encountered during some kernel programming was that it's often not clear whether a pointer could be NULL at a given point. Ideally one would always write input and output documentation for functions, but I don't think that would be realistic and it would get outdated soon (or people make the wrong assumptions because the documentation said so, but forgot to mention the out-of-memory error pointer).

The trickiest bits, however, are the hooks one writes in device drivers, which are called by some other parts of the kernel. There are far fewer of those, and changes there are always done very carefully because there are many users of those callbacks (both calling and providing). So any changes here are much more thoroughly thought through and, given the care that must be taken (and, given the semantics change, any resulting code that has to be modified to handle the new semantics), the documentation update would be pretty minor. Ideally you should also be able to introduce (preferably with one simple config option) a way for a debug proxy function to be inserted, or a debug hook to be called, before calling the actual hook function. This would probably mean that calling a hook or registering a hook should be made more explicit, which would be a good thing (especially calling a hook; registering is very frequent, but calls are much more sporadic).

This of course assumes that some people would be willing to run test-kernels which are easily 5% slower than the production versions, though they should be more secure and at least as stable as soon as the first wrinkles are ironed out.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 12:08 UTC (Wed) by MisterIO (guest, #36192) [Link]

A check for a situation where a pointer dereference is followed by a check for NULL on that pointer will be present in the next release of cppcheck(1.35).

Fun with NULL pointers, part 2

Posted Jul 22, 2009 14:57 UTC (Wed) by error27 (subscriber, #8346) [Link]

One thing that is delaying smatch is that I'm cycling through Africa right now :P. The smatch web page is out of date. I gave up trying to ssh in and fix it from internet cafes, but now I'm in South Africa maybe it's possible.

The new link for smatch is: http://repo.or.cz/w/smatch.git/

One issue with printing an error after a dereference is that some macros do needless checking.

But it's easy enough to write a script for that. I don't have electricity tonight but hopefully I will be able to post a script in two or three days. I wrote a first draft of the script yesterday and it did find a bug.

Suggestion

Posted Jul 24, 2009 1:31 UTC (Fri) by bojan (subscriber, #14302) [Link] (5 responses)

It would be great if LWN could engage Brad and PaXTeam to write an article in which they lay out what/how the kernel should be changed so that we don't see bugs like this exploited any more.

Suggestion

Posted Jul 24, 2009 8:58 UTC (Fri) by Trou.fr (subscriber, #26289) [Link] (4 responses)

I concur. Some well-detailed explanation of the current shortcomings of the kernel, and how they could possibly be fixed. Without ranting ;)

Suggestion

Posted Jul 24, 2009 13:04 UTC (Fri) by corbet (editor, #1) [Link] (3 responses)

You're not the only one to think about such things...I just noticed that LinuxFR.org has posted an interview with Brad. Only possible problem is that it's in French...

Suggestion

Posted Jul 24, 2009 14:35 UTC (Fri) by spender (guest, #23067) [Link] (1 responses)

The original English version was just posted as a comment to the article:
http://linuxfr.org/2009/07/24/25761.html

-Brad

Suggestion

Posted Jul 24, 2009 14:37 UTC (Fri) by corbet (editor, #1) [Link]

Cool, that speeds the reading process for some of us. FWIW, here's a direct link to the comment.

Suggestion

Posted Jul 24, 2009 23:29 UTC (Fri) by eparis123 (guest, #59739) [Link]

An older interview, from June, with the other half of the PaX team:

http://www.phrack.org/issues.html?issue=66&id=2#article


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds