
Linux 2.6.30 exploit posted

From:  spender-AT-grsecurity.net (Brad Spengler)
To:  full-disclosure-AT-lists.grok.org.uk
Subject:  Linux 2.6.30+/SELinux/RHEL5 test kernel 0day, exploiting the unexploitable
Date:  Thu, 16 Jul 2009 22:26:45 -0400
Message-ID:  <20090717022645.GA26706@grsecurity.net>
Archive-link:  Article

Title says it all, exploit is at:
http://grsecurity.net/~spender/cheddar_bay.tgz

Everything is described and explained in the exploit.c file.
I exploit a bug that by looking at the source is unexploitable;
I defeat the null ptr dereference protection in the kernel on 
both systems with SELinux and those without.
I proceed to disable SELinux/AppArmor/LSM/auditing.

Exploit works on both 32bit and 64bit kernels.

Links to videos of the exploit in action are present in the exploit 
code.

Greets to vendor-sec, 
-Brad



----- End forwarded message -----

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/



Linux 2.6.30 exploit posted

Posted Jul 17, 2009 13:42 UTC (Fri) by trasz (guest, #45786) [Link]

By looking at the source, this is an obvious coding error - kernel first dereferences a pointer, and after that checks whether it's NULL. Where is the compiler bug there?
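
(Schematically, the pattern under discussion looks like this - a minimal, self-contained sketch modeled on the tun driver bug, not the exact kernel code:)

    #include <stdio.h>

    struct tun_struct { int sk; };

    int tun_poll(struct tun_struct *tun)
    {
        int sk = tun->sk;   /* the pointer is dereferenced here first...   */
        if (!tun)           /* ...and only checked afterwards; gcc infers  */
            return -1;      /* tun != NULL from the line above and may     */
        return sk;          /* delete this test entirely                   */
    }

    int main(void)
    {
        struct tun_struct t = { 42 };
        printf("%d\n", tun_poll(&t));
        return 0;
    }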

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:24 UTC (Fri) by regala (guest, #15745) [Link]

No, you don't understand. This pointer should never be NULL; the security problem is that it can be NULL in the code executed after the check. So the exploit first has to cause this pointer to be NULL, and then it relies on the fact that gcc wrongly moves the check away, assuming it is unnecessary because it comes after the pointer has already been dereferenced; that assumption can be wrong under certain loads, or in SMP or preemptible configs.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:39 UTC (Fri) by trasz (guest, #45786) [Link]

This code wouldn't be valid in SMP or in a preemptible kernel either, unless some locking were added, which would completely change the situation and would probably prevent the problematic optimization from being applied. Another possible fix would be to make the variable volatile. Anyway, this is a programming error.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:56 UTC (Fri) by spender (guest, #23067) [Link]

Well, since my videos show me exploiting the bug on multiple SMP and preemptible kernels, that's false. If you read the exploit source you'd see this.

-Brad

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:08 UTC (Fri) by trasz (guest, #45786) [Link]

Erm, let me restate what I've written above - the kernel code mentioned above wouldn't be valid in any case, including SMP and preemptible kernels, contrary to what Regala said in the comment before.

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 8:51 UTC (Sat) by Ross (guest, #4065) [Link]

There is no point in checking if it is NULL _after_ you have used it. As soon as you have dereferenced a NULL pointer you can't predict what the program will do, and there is no point in blaming the compiler about it. GCC doesn't claim to allow this and it is not required to by any of the C standards.

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 16:59 UTC (Sat) by xilun (guest, #50638) [Link]

Given this source code, gcc has every right to remove the check, and you obviously don't understand what concurrency, SMP, and preemptible configs do and do not imply.

The real bug is allowing address 00000000 to be mapped. This can never be right.

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 13:18 UTC (Sun) by PaXTeam (guest, #24616) [Link]

> The real bug is allowing to map 00000000. This can never be right.

how else would you be able to run v8086 mode tasks then?

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:28 UTC (Fri) by Trou.fr (subscriber, #26289) [Link]

because if the check had not been optimized away, you wouldn't have been able to get past it to exploit the vulnerability.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 20:43 UTC (Fri) by stevenb (guest, #11536) [Link]

Then the author of the code should have made the pointer volatile, or should have protected it with a lock. The optimization that GCC did appears to be valid (but IANALL).

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 17:55 UTC (Fri) by packet (guest, #5202) [Link]

There is no compiler bug here. As paragraph 6.5.3.2.4 of the C99 standard
says, "If an invalid value has been assigned to the pointer, the behavior
of the unary * operator is undefined." Undefined here means "behavior,
upon use of a nonportable or erroneous program construct or of erroneous
data, for which this International Standard imposes no requirements"
(paragraph 3.4.3.1). Judging by the standard it seems to be legal to
optimize the null pointer guard away, as it has no side-effects in the
case of a valid pointer value. In the case of this kernel bug
this "undefined behavior" unfortunately makes the bug exploitable which
it presumably would not have been had the guard not been optimized away.
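
(Concretely, under that reading the compiler is entitled to treat the dereference-then-check pattern as if it had been written without the check - an illustrative sketch, reusing the hypothetical tun_poll() from above:)

    struct tun_struct { int sk; };

    int tun_poll(struct tun_struct *tun)
    {
        /* What the optimizer effectively emits: the !tun test is gone,
           because a NULL tun would already have invoked undefined
           behavior on the first line of the original function. */
        return tun->sk;
    }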

Yes, this is a compiler bug.

Posted Jul 18, 2009 14:31 UTC (Sat) by xoddam (subscriber, #2322) [Link]

Uh, the zero pointer value is not an 'invalid value' if there's readable memory at address zero. The standard declines to define correct behavior for 'invalid' pointers because there are many ways for a pointer to be 'invalid' in some sense or another, and several ways for an implementation to handle them (return garbage, return zero, deliver a signal, halt).

The exploit works because in this case, null is NOT an invalid value, but the compiled code behaves as though it is (and as though the compiler knows how the dereferencing of an invalid value will be handled ... which it doesn't). So yes, it's a compiler bug.

Yes, this is a compiler bug.

Posted Jul 18, 2009 17:20 UTC (Sat) by xilun (guest, #50638) [Link]

So if you move the compiler outside the C-standard-respecting equation, then this is a compiler bug? Makes absolutely no sense. This is a C compiler, not a "I would prefer it do that" compiler. You had better define your own language where dereferencing a NULL pointer is NOT undefined.

The platform is defined by the compiler and the runtime. The compiler definitely has the right to consider that dereferencing a NULL pointer is always undefined, because, well, the standard precisely says so. There is no story of mapping or not mapping pages at this point: NULL pointer dereferencing is undefined behavior, and NULL pointer dereferencing stays undefined behavior. A NULL pointer has been dereferenced, so the behavior is undefined (and it's not very surprising that behavior left undefined by a standard can be an exploitable security hole). It is as simple as that.

The bug also is in allowing address 0 to be mapped, which can't be a sane way to serve a sane purpose.

Yes, this is a compiler bug.

Posted Jul 18, 2009 22:53 UTC (Sat) by mstefani (guest, #31644) [Link]

> The bug also is in allowing to map address 0, which can't be a sane way to serve a sane purpose.
If you happen to need to run a DOS program in Wine, or sometimes even a Win32 application that still uses some DOS calls, then you need access to the memory at 0x00000000.
http://wiki.winehq.org/PreloaderPageZeroProblem

Yes, this is a compiler bug.

Posted Jul 19, 2009 0:07 UTC (Sun) by xilun (guest, #50638) [Link]

I could argue that supporting programs written for broken systems is not a sane purpose ;)

Yes, this is a compiler bug.

Posted Jul 19, 2009 8:00 UTC (Sun) by packet (guest, #5202) [Link]

Footnote 84 of the C99 standard clearly says so: "Among the invalid
values for dereferencing a pointer by the unary * operator are a null
pointer, an address inappropriately aligned for the type of object
pointed to, and the address of an object after the end of its lifetime."
Before that sentence, only one exception is mentioned: (&*E == E) is
true, even in the case of a null pointer. IMHO the compiler can legally
do just about anything in the case of dereferencing a null pointer. The
lowest addressable object in C lives at address 1*sizeof(object). If null
were allowed as a valid pointer, there would be no pointer value left to
represent an invalid pointer. Sadly, this means that at least one byte (in
the C sense: a byte is a char) of the address space cannot be legally
addressed in C. If you consider this a bug, it's a language bug, but not a
compiler bug.

Linux 2.6.30 exploit posted

Posted Jul 24, 2009 15:39 UTC (Fri) by kubarebo (guest, #59791) [Link]

I think that this exploit is a cascade failure initiated by the ill-conceived gcc patch. If you look at the original gcc patch submitted by Cygnus, you'll read the following:

"If all paths to a null pointer test use the same pointer in a memory load/store, then the pointer must have a non-null value at the test. Otherwise, one of the load/stores leading to the test would have faulted."

The assumption that "load/stores [that use the null pointer] [...] would have faulted" is generally wrong.

On many common architectures, virtual memory location 0 can be mapped and used. Most "small" embedded platforms allow its use. x86 surely does, too.

I think that the optimization was rather ill-conceived. The memory load/store should not be assumed to fail unless the underlying architecture universally guarantees that, or at least the target does. In this case, the target definitely does not guarantee anything of the sort, and the optimization is simply broken. I don't think there's much more to it. As a workaround, all Linux kernel builds should disable the particular optimization (if possible).
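
(GCC does in fact provide such a switch: -fno-delete-null-pointer-checks disables exactly this optimization. A build would pass it alongside the usual flags, e.g.:)

    gcc -O2 -fno-delete-null-pointer-checks -c tun.c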

I deal daily with code where the compiled C code dereferences NULL pointers simply because the underlying architecture supports it, and not letting you do it wastes one byte/word of RAM. On chips with only 256 bytes of RAM you may not want that. On x86 in real mode, the IDT starts at address 0 and some odd tasks like copying the IDT have to start at 0.

Methinks that the NULL pointer concept should be purged from the C standard; it is really up to the developer to ensure that a pointer is valid, and using magic values to indicate invalid pointers is very much environment-dependent. It has no place in a rather platform-agnostic language standard, IMHO.

Linux 2.6.30 exploit posted

Posted Jul 24, 2009 20:44 UTC (Fri) by nix (subscriber, #2304) [Link]

GCC isn't ever going to compile code for platforms with only 256 bytes of
RAM. It's always been targeted at systems larger than that.

The C Standard disagrees that it's up to the developer to ensure a pointer
is valid: it doesn't just ban null pointers but also things like dividing
pointers by two, XORing pointers, shoving pointers off the ends of arrays
and so on. Attempting to dictate the semantics of pointers in these
situations would be catastrophic for optimization: even simple code motion
would probably have to be banned.

A simple flag (and a target flag) that says 'this compilation
specifically/this target usually has nonfaulting accesses to (void *)0' is
all that's needed, and we have that already.

Linux 2.6.30 exploit posted

Posted Apr 13, 2011 17:23 UTC (Wed) by Blaisorblade (guest, #25465) [Link]

I agree that the flag is enough.
Anyway, IIRC, by the C standard (6.5.8.5), comparing pointers to different objects gives rise to undefined behavior, while in C++ it gives an unspecified result [1]. Therefore, while it would be valid for GCC (in C) to produce a crashing program, it would be probably of little use.

[1] http://stackoverflow.com/questions/4909766/is-it-unspecif...

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 13:44 UTC (Fri) by clugstj (subscriber, #4020) [Link]

After reading his source, I've got to say that Brad needs some serious psychiatric help.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 13:58 UTC (Fri) by regala (guest, #15745) [Link]

how nice of you...

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:02 UTC (Fri) by spender (guest, #23067) [Link]

I guess you didn't watch the cheddar bay video and don't find the humor in all of it ;)

-Brad

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:57 UTC (Fri) by clugstj (subscriber, #4020) [Link]

Brad, you seem to be seething with anger toward anyone who doesn't see what an awesome genius you are, delighting in others' mistakes to make yourself seem better than them. I'm sorry that I don't see the Splender of Brad.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 16:42 UTC (Fri) by drag (guest, #31333) [Link]

Personally I don't give a shit how angry or insane he may appear to be. He did a good job finding the exploit and did a good thing being public about it.

So he deserves not only to be thanked for finding the bug, but he should also be thanked for being honest about it. There is money to be made in 0-day exploits and he could have profited financially before going public, if he ever decided to go public at all.

So that is something that I can appreciate quite a bit.

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 5:34 UTC (Sun) by mingo (subscriber, #31122) [Link]

> So he deserves not only to be thanked for finding the bug, but he should also be thanked for being honest about it. There is money to be made in 0-day exploits and he could have profited financially before going public, if he ever decided to go public at all.

The timing suggests that he noticed the fix to the NULL dereference, not the bug itself. He could have found the original bug in February and gone public about it then, but he (like others who reviewed that code) didn't notice the (obvious in hindsight) bug.

What he did was to demonstrate that a thought-to-be-unexploitable NULL dereference kernel crash can be exploited due to a cascade of failures in other components: compiler [to a lesser degree] and SELinux [to a larger degree].

This was useful, as .31 could have been released with the fix but there was no -stable back-port tag for the fix to make it into .30. Also, perhaps more importantly in terms of practical impact, the two cascading failures in SELinux and GCC were also worth fixing and could avoid (or at least reduce the probability of) future exploits.

(This still leaves open the theoretical possibility of him having known about the original networking bug (introduced in February) and having exploited it - only going public with the exploit once the NULL dereference fix went upstream. I don't think this happened.)

So the disclosure was useful but, to be fair to the original poster, also not fully friendly. It maximized his own gain, regardless of the consequences. Posting a zero-day exploit in the middle of the summer holiday season might be seen as reckless and irresponsible by someone who happened to be on vacation in that time-frame.

Regarding the suggestion of a personality disorder by the original poster, the observation sounds plausible - but indeed irrelevant as you point out. The field of finding exploits is unforgiving: you spend days, weeks, months and years reading the worst possible code people can throw out, with just a few instances of something real big being found.

In that time you don't actually modify the code you read in any public way, you don't interact with people, and you don't socialize with those developers. You don't even relate to the code in any personal way; you try to find certain patterns of badness. While developers have happy users (we hope ;-), exploit finders get little if any positive feedback.

This, almost by definition, distorts the personality and creates a false sense of superiority: if only I were allowed to hack this code, I'd clearly do such a better job. And they call this Linus a genius while he allows such obvious crap. Morons!

So yes, the somewhat childish attitude and messaging, the hatred, the self-promoting PR, the exaggeration, the sense of superiority and the narcissism are all pretty normal in that field of activity. Most likely they are compounded by some inevitable level of paranoia and, where morals are weak, perhaps also by the constant financial lure of the criminal side, mixed with the fear of going too far and becoming a felon.

Such patterns also draw external attacks (mixed with the emotional, defensive attitude of developers when one out of ten thousand commits per kernel cycle turns out to be seriously buggy - bringing out the worst in them: initially ridiculing or downplaying the exploit writer), which creates a self-reinforcing cycle of violence that deforms the psyche.

Without sounding patronizing, IMHO those are forces strong enough to bend steel, let alone the human psyche. I think that such exploit-finding work should be done in organized, perhaps government-sponsored setups, with proper safeguards and humane work conditions. It's useful to society at large, and it's a pity that it's currently done in such an unstructured, random way, burning through and bending good and smart people fast.

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 8:02 UTC (Sun) by dlang (guest, #313) [Link]

actually, I believe that the fix for this will be in 2.6.30.2, which is due to be released any time now (potentially over the weekend). the preliminary patches were released on thursday or friday, moments before Greg K-H left on a trip, with the expectation being that if no problems were found with them the -stable release would happen in a couple of days.

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 9:08 UTC (Sun) by MisterIO (guest, #36192) [Link]

I think that if you standardize this kind of work too much, you're gonna strip out most of the fun that people who search for bugs feel doing it. I don't see any problem if they exaggerate, or if they're narcissists or childish. It's actually funnier like this. And after all, by your reasoning, how would you classify some of Torvalds' incinerating posts? I find them funnier and more entertaining to read than a hypothetical technically equivalent but more formal replacement.

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 13:54 UTC (Sun) by spender (guest, #23067) [Link]

If you can't laugh at yourself, others will do it for you.

A static checker could have found the bug. What makes you think a blackhat can't find bugs when they're introduced? They want to actually compromise machines that have unfixed vulnerabilities, you know. And they don't post their findings online, especially in such a nice presentation as mine.

The fact that in my free time I was able to spend 5 minutes and figure out that the particular bugfix, which was ignored for its security implications, was in fact exploitable isn't really relevant here. I'm not the person you need to worry about (unless all you really worry about is embarrassing public disclosure of embarrassing vulnerabilities like the SELinux one).

BTW, as that hugely embarrassing SELinux vulnerability is currently being brushed under the carpet as an "errata", I've gone ahead and submitted a CVE request for it myself. The previous do_brk() bypass of mmap_min_addr received a CVE in 2007; this case should be no different. An advisory will follow.

While I'm here, just a side-note on why I won't ever be cooperating in ways you might prefer in the future:

On June 2nd, I sent a private mail to Jakub Jelinek discussing some problems with FORTIFY_SOURCE I had encountered when evaluating its usefulness for the kernel (by doing the actual porting work, marking allocators with the appropriate attributes, and implementing a few other tricks of my own). I found it to be very poor: only 40% coverage in the kernel, basically missing everything but the most trivial of cases that didn't need protection in the first place. Specifically, one of the things I mentioned was that FORTIFY_SOURCE wasn't able to determine the size of arrays within structures, and given how widely structures are used in the kernel, having proper bounds checking on their elements is pretty important (quoted from his reply):
> I have a structure in grsecurity, struct gr_arg. It looks like:
>
> +struct gr_arg {
> + struct user_acl_role_db role_db;
> + unsigned char pw[GR_PW_LEN];
> + unsigned char salt[GR_SALT_LEN];
> + unsigned char sum[GR_SHA_LEN];
> + unsigned char sp_role[GR_SPROLE_LEN];
> + struct sprole_pw *sprole_pws;
> + dev_t segv_device;
> + ino_t segv_inode;
> + uid_t segv_uid;
> + __u16 num_sprole_pws;
> + __u16 mode;
> +};
>
> I have a function, called chkpw, its declaration looks like:
> int chkpw(struct gr_arg *entry, unsigned char *salt, unsigned char *sum);
>
> within that function, I do the following:
>
> memset(entry->pw, 0, GR_PW_LEN);
>
> If I put a __builtin_object_size(entry->pw, 0/1) check above that, it's
> always -1.

Here's his reply from Jun 3rd:

"The above description is useless, you need to provide complete (though
preferrably minimal) self-contained preprocessed testcase.
I'm not going to second guess what your code looks like."
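
(A minimal self-contained testcase along the lines he asked for might look like this; the names and sizes are illustrative, not the actual grsecurity code:)

    #include <stdio.h>
    #include <string.h>

    #define GR_PW_LEN 128

    struct gr_arg {
        unsigned char pw[GR_PW_LEN];
    };

    int chkpw(struct gr_arg *entry)
    {
        /* One would expect GR_PW_LEN here for type 1; on the GCC
           versions being discussed this reportedly evaluated to
           (size_t)-1, i.e. "size unknown". */
        printf("%zu\n", __builtin_object_size(entry->pw, 1));
        memset(entry->pw, 0, GR_PW_LEN);
        return 0;
    }

    int main(void)
    {
        struct gr_arg arg;
        return chkpw(&arg);
    }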

Apparently my description was so useless that the next day, on Jun 4th, what gets submitted to gcc?
http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00419.html

No credit, no reply of thanks. This, combined with the attempted cover-up of the SELinux vulnerability, means I'll be going back to selling vulnerabilities in any Red Hat-related technologies (exec-shield, SELinux, etc.) as I did in the past. $1000 for an exec-shield vulnerability back in 2003, I think? (I can't seem to find the picture I took of the check with "exec-shield" in the memo line ;)) It's still not fully fixed today. Maybe it was from 2004, judging by this post where I mention doing so: http://lwn.net/Articles/112880/ It was to a legitimate purchaser, who (unfortunately for you, I guess) doesn't have a policy of notifying the vendor.

PS: I don't need a lecture on ego, or on feeling like I can do things better than everyone else, from the very faszkalap kernel hacker who is hated by everyone for those very things.

-Brad

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 17:53 UTC (Sun) by nix (subscriber, #2304) [Link]

> This, combined with the attempted cover-up of the SELinux vulnerability, means I'll be going back to selling vulnerabilities in any Red Hat-related technologies
I take back everything I just said about your not doing things like selling vulnerabilities, then.

(You really do only care about getting your name in lights, don't you? Actual system security obviously comes second or you wouldn't even consider selling vulns.)

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 19:26 UTC (Sun) by vonbrand (guest, #4458) [Link]

Sorry, but I have to agree that a code snippet with a rather vague description is next to useless. And the commit you have issues with could very well be "independent invention" (or, for the terminally paranoid, somebody took your snippet and made it into a complete example).

So you found a collection of bugs that in total turn out to be a serious, exploitable vulnerability. Congratulations, more power to you! That some of the pieces (which by themselves aren't exploitable) aren't taken too seriously was to be expected, given the above. No "sweeping under the rug" here.

Please consider that there are tens of thousands of changesets flowing into the kernel each release cycle. If only a few turn out to have exploitable bugs, that is a huge success ratio. Sure, it is sadly still not good enough.

Also, not everybody who finds and fixes a problem is able to (or even interested in) find out whether the bug was a security problem, much less to develop exploit code. That very few bug fixes are labeled "security risk" is to be expected; no dark coverup need be suspected here.

Linux 2.6.30 exploit posted

Posted Jul 30, 2009 13:57 UTC (Thu) by lysse (guest, #3190) [Link]

> somebody took your snippet and made it into a complete example

Or someone did what I've done myself in the past - tersely pointed out "useless bug report is useless", but then thought "oh, but hang on, what if there *is* a problem there?" and gone digging around themselves until they realised what the issue was and fixed it.

There's always another option, and there's always another way it could have happened.

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 9:50 UTC (Mon) by makomk (guest, #51493) [Link]

The SELinux mmap_min_addr bypass vulnerability... isn't one, exactly. It's
documented behaviour of mmap_min_addr that if you're using SELinux,
mmap_min_addr has no effect and SELinux controls the minimum address.
(It's not documented in Documentation/sysctl/vm.txt though by the looks of
it. Fail.)

Now, Red Hat should set it for robustness reasons, but if they don't it's
not Linux's fault exactly.

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 12:40 UTC (Mon) by spender (guest, #23067) [Link]

Where's this documented behavior you talk about? Here's the documentation for it straight from the configuration help:

config SECURITY_DEFAULT_MMAP_MIN_ADDR
        int "Low address space to protect from user allocation"
        depends on SECURITY
        default 0
        help
          This is the portion of low virtual memory which should be protected
          from userspace allocation. Keeping a user from writing to low pages
          can help reduce the impact of kernel NULL pointer bugs.

          For most ia64, ppc64 and x86 users with lots of address space
          a value of 65536 is reasonable and should cause no problems.
          On arm and other archs it should not be higher than 32768.
          Programs which use vm86 functionality would either need additional
          permissions from either the LSM or the capabilities module or have
          this protection disabled.

          This value can be changed after boot using the
          /proc/sys/vm/mmap_min_addr tunable.

Distros do bother to set /proc/sys/vm/mmap_min_addr. It mattered before, when mmap_min_addr was bypassed via do_brk(). It matters now that everyone by default can bypass mmap_min_addr simply by having SELinux enabled.

-Brad

Linux 2.6.30 exploit posted

Posted Jul 22, 2009 10:00 UTC (Wed) by ortalo (guest, #4654) [Link]

Hey... I have rarely read such an interesting comment, especially in association with a real security failure.

First, thanks for the nice wrap up.
Second, as I am involved in security-related teaching activities, would you perhaps allow me to present your text to my students for comment?

Finally, let me express some additional concerns.
Government-funded or otherwise organized vulnerability research may actually already be occurring without leading to security improvements: think of military-funded organizations, or the simple selfish (and commercially compatible) self-protection of big players. I wonder how we could guarantee that such organizations contribute to overall security. But I totally agree with you that such organized research is still too rare; hence we still rely far too much on individual achievements in this area.
Then there is a deeper question: don't we feel the need for technical vulnerability research because we do not put enough effort into providing security guarantees (or mechanisms, or properties) in our systems? (And yes, I know I speak to an audience that already certainly does much more than any other in this area - I would probably not even openly express this concern if I did not know that.)

Linux 2.6.30 exploit posted

Posted Aug 2, 2009 15:30 UTC (Sun) by mingo (subscriber, #31122) [Link]

> Second, as I am involved in security-related teaching activities, would you perhaps allow me to present your text to my students for comment?

Sure, feel free!

> Finally, let me express some additional concerns. Government-funded or otherwise organized vulnerability research may actually already be occurring without leading to security improvements: think of military-funded organizations, or the simple selfish (and commercially compatible) self-protection of big players. I wonder how we could guarantee that such organizations contribute to overall security. But I totally agree with you that such organized research is still too rare; hence we still rely far too much on individual achievements in this area. Then there is a deeper question: don't we feel the need for technical vulnerability research because we do not put enough effort into providing security guarantees (or mechanisms, or properties) in our systems? (And yes, I know I speak to an audience that already certainly does much more than any other in this area - I would probably not even openly express this concern if I did not know that.)

How much effort we put into various fields is largely supply-demand driven.

Firstly, the main drive in the 'fix space' is towards problems that affect people directly.

A bug that crashes people's boxes will get prime-time attention. A missing feature that keeps people from utilizing their hardware or apps optimally too gets a fair shot and all the market forces work on them in a healthy way.

'Security issues' are not included in that 'direct space' - the ordinary user is rarely affected by security problems in a negative way.

So computer security has become a field that is largely fear-driven: with a lot of artificial fear-mongering going on, with frequent innuendo, snake-oil merchants and all the other parasitic tactics that can get the (undeserved) attention (and resources) of people who are not affected by those issues.

I think it's difficult to see where the right balance is, given how hard it is to measure the security of a given system.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 18:12 UTC (Fri) by willezurmacht (guest, #58372) [Link]

Inferiority complex right there. Drugs can help you, keep fighting the good fight. Cheddar Bay, the battle never forgotten, the deadliest of all catches. Crabs? Yeah, mang.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 17:53 UTC (Fri) by sharms (guest, #57357) [Link]

I think it was hilarious, FYI - the fprintfs and ASCII graphics were a thing of beauty.

Linux 2.6.30 exploit posted

Posted Jul 23, 2009 18:31 UTC (Thu) by rogue@autistici.org (guest, #59770) [Link]

Good work spender

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:07 UTC (Fri) by MisterIO (guest, #36192) [Link]

Is that piece of code a copy and paste? Because it's a very strange error. For example, I think I could forget about the check (not :)), but I don't think I could put the check immediately after the assignment (there's just a mask in between)!

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:13 UTC (Fri) by MisterIO (guest, #36192) [Link]

Note to self: having said this means that it'll happen to me too for sure!

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:20 UTC (Fri) by MisterIO (guest, #36192) [Link]

Second note: I need to strengthen my knowledge of English!

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:18 UTC (Fri) by mjcox@redhat.com (guest, #31775) [Link]

For tracking: this is CVE-2009-1897.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:28 UTC (Fri) by Ajaxelitus (guest, #56754) [Link]

gcc makes the seemingly reasonable assumption that once a pointer has been dereferenced, the successful dereference proves that it is non-NULL, so any check for NULL afterwards is superfluous.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:30 UTC (Fri) by Ajaxelitus (guest, #56754) [Link]

It would have been better for gcc to print a 'test for NULL after use' warning than for it to silently optimize away the code.

It's not the worst offender

Posted Jul 17, 2009 16:15 UTC (Fri) by khim (subscriber, #9252) [Link]

> It would have been better for gcc to print a 'test for NULL after use' warning than for it to silently optimize away the code.

If you think GCC is bad then think again. RVCT will happily optimize away things like "if (!this) { ... }" - because the standard guarantees that this is never NULL!

It's not the worst offender

Posted Jul 17, 2009 22:32 UTC (Fri) by nix (subscriber, #2304) [Link]

A while back Robert Dewar described an extreme example of something similar here (though this involved uninitialized variables instead).

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 1:38 UTC (Sat) by kjp (guest, #39639) [Link]

Agreed. I can't believe -Wall wouldn't turn something like that on.

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 8:55 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

One of Raymond Chen's rules applies: Features do not exist by default.

If you (or the kernel developer responsible for this goof, or anyone else) want GCC to emit a diagnostic for this scenario then they need to write code to detect the scenario (which may be very tricky depending on how spread through GCC the different aspects of it are) and write an informative warning message.

If nobody has yet done this, there is no warning included in -Wall.

(Insert generic complaint about how when I was a lad we had to write our own compilers, and it were up hill both ways)

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 10:58 UTC (Mon) by muntyan (guest, #58894) [Link]

> Features do not exist by default.

Or they get removed. GCC folks break warnings by "optimizations".

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 19:17 UTC (Mon) by stevenb (guest, #11536) [Link]

Ah, so fact-based, the parent. Examples?

Linux 2.6.30 exploit posted

Posted Jul 21, 2009 5:14 UTC (Tue) by muntyan (guest, #58894) [Link]

Linux 2.6.30 exploit posted

Posted Jul 21, 2009 6:08 UTC (Tue) by stevenb (guest, #11536) [Link]

Great example. It totally justifies your scare quotes around "optimizations" and the claim that warnings are intentionally broken.

Not.

Did you actually read the bug? Getting warnings right is hard. See http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings#pro...

Linux 2.6.30 exploit posted

Posted Jul 21, 2009 9:43 UTC (Tue) by muntyan (guest, #58894) [Link]

Did *you* read the bug report? Some quotes: "this is a regression", "It has never worked on the tree-ssa branch", "The 4.x compilers does not warn when using unset variables. The 3.x compilers did warn on this: ...". Gcc-3 produces better warnings, while gcc-4 gives you code which crashes 2% faster.

Linux 2.6.30 exploit posted

Posted Jul 30, 2009 13:59 UTC (Thu) by lysse (guest, #3190) [Link]

Apparently you both read the bug report and came away with different meanings of it. Now stop waving your willies at each other and have a nice cup of tea.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:36 UTC (Fri) by epa (subscriber, #39769) [Link]

So why is this assumption true for userspace code but unsound when running in the kernel?

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:44 UTC (Fri) by trasz (guest, #45786) [Link]

Userspace code would get a SEGV signal due to the null pointer dereference and probably exit. Pretty much any kernel other than Linux would panic and restart. In Linux there is this strange "Oops" mechanism, probably inspired by Windows 95, which makes the kernel try to continue. I've got the impression that it would make the code exit somehow instead of continuing past the null check that got optimized away, though.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:11 UTC (Fri) by hegel (guest, #49501) [Link]

The Oops mechanism doesn't try to continue either - it kills the current process most of the time (see http://www.faqs.org/docs/Linux-HOWTO/Kernel-HOWTO.html#AE...). The exploit does mmap(NULL, MAP_FIXED) - that's why it doesn't crash.
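
(For reference, mapping page zero from userspace is an ordinary mmap() call once the mmap_min_addr restriction is out of the way; a rough sketch:)

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Succeeds only where vm.mmap_min_addr (or the LSM policy
           overriding it, as with SELinux here) permits address 0. */
        void *p = mmap(0, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("page zero mapped at %p\n", p);
        return 0;
    }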

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 9:11 UTC (Sat) by Ross (guest, #4065) [Link]

Yes, that's the usual response. But the compiler can apply the same logic in userspace, and there are ways to make NULL dereferences not crash (installing a SEGV signal handler, mapping something at page zero).

In any case, it's not like this would be a good idea for userspace code. If you care enough to check if a pointer is NULL, it should really be done before dereferencing it, otherwise it is too late to stop any undefined behavior. Even if you can be sure an implementation will just crash the program, what's the point of adding code afterwards which isn't reachable?

undefined behaviour

Posted Jul 17, 2009 15:06 UTC (Fri) by bluebirch (guest, #58264) [Link]

Gcc is making the reasonable assumption that the code doesn't rely on undefined behaviour. This has produced security bugs before, e.g. when gcc started removing tests like x + n >= x (n > 0) which were checking for integer overflows.
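
(A sketch of that overflow-check situation; the function names are made up:)

    #include <limits.h>

    /* Signed overflow is undefined in C, so for n > 0 the compiler may
       treat x + n >= x as always true and delete the test. */
    int would_overflow_bad(int x, int n)
    {
        return !(x + n >= x);      /* may be folded to: return 0; */
    }

    /* Overflow-safe form (for x >= 0, n >= 0): nothing here overflows. */
    int would_overflow_good(int x, int n)
    {
        return n > INT_MAX - x;
    }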

undefined behaviour

Posted Jul 17, 2009 15:55 UTC (Fri) by forthy (guest, #1525) [Link]

The question is: is that reasonable? And what is "undefined behavior"? The GCC maintainers (the language talibans) argue that whatever the C99 standard does not define accurately (and that's pretty much all of it ;-) is left open, "undefined". I suppose that if x+n >= x holds true, then x+n really ought to be greater than x - which is not true when compiled by GCC. Same for dereferencing the NULL pointer: this is not defined in C, but being undefined does not mean "it breaks". When it may not break, optimizing the test away is simply wrong. If GCC's optimizer worked a bit differently, it would reorder the access and the test - because the test causes the function to return without using the accessed field (this value is dead, and dead loads can be optimized away) - and then by the same funny logic the test is both vital and can be optimized away ;-). Not.

But I've basically given up on the GCC maintainers doing reasonable things, anyway. Their interpretation of standards is just upside down: a standard is a compromise between various implementers and users, so that different qualities of implementation can claim to be "standard". It's like the POSIX discussion we had some time ago here (the ext4 problems), where POSIX allowed a file system to lose data and leave the filesystem in a limbo state. An implementation that does so is not a good one.

Standard writers can drive toward good implementations while still allowing bad ones: use the word "should" instead of "shall". Like "a file system should preserve a consistent state in case of a crash" - this means "best effort", and the amount of reasonable effort is left to the implementer. A C compiler should wrap numbers around on a two's complement system, not trap, not crash. The code should honor the execution model of the underlying machine (which, e.g., can dereference null pointers). And making a language defined only in vague terms is not a good idea, because writing programs in an underspecified language won't work. The "good" practice of implementing such an underspecified language is to define the terms properly and then stick to them - and hope the practice catches on, so that the next round of standardization can encode them with "shall" instead of "should".

undefined behaviour

Posted Jul 17, 2009 16:29 UTC (Fri) by bluebirch (guest, #58264) [Link]

I think it is safe to say that the gcc maintainers are very close to the spirit of the C standard:

"The reason some behavior has been left undefined is to allow compilers for a wide variety of instruction set architectures to generate more efficient executable code for well-defined behavior, which was deemed important for C's primary role as a systems implementation language;" (from Wikipedia[1])

Relying on undefined behaviour means the programmer needs to fully understand the (common) implementation. In any case the code is easier to understand if it doesn't rely on this magic. E.g.:

INT_MAX - x >= n // instead of x + n > x

[1] http://en.wikipedia.org/wiki/C_(programming_language)#Undefined_behavior

undefined behaviour

Posted Jul 17, 2009 19:18 UTC (Fri) by ehabkost (subscriber, #46058) [Link]

The point is that gcc itself is relying on undefined behavior, not the program. If dereferencing a NULL pointer is undefined, gcc must NOT assume that the program will crash anyway and that the check for NULL can therefore be optimized out.

undefined behaviour

Posted Jul 17, 2009 19:38 UTC (Fri) by foom (subscriber, #14868) [Link]

Well, that's a wrong point. When the C standard says something is undefined, that means the *compiler* can do whatever it wants. GCC would be within its rights to automatically exec NetHack whenever you dereference a null pointer. :) But instead it chooses to assume that the program will crash, and optimizes the rest of the program accordingly. Which is also a perfectly valid option. "Undefined" is for the benefit of the compiler, not a constraint upon the compiler.

Actually no, the idea is slightly different...

Posted Jul 18, 2009 0:57 UTC (Sat) by khim (subscriber, #9252) [Link]

> GCC would be within its rights to automatically exec NetHack whenever you dereference a null pointer. :) But instead it chooses to assume that the program will crash, and optimizes the rest of the program accordingly.

No, no, no. Nothing of the sort. The idea is different and somewhat simpler:
1. The behavior is undefined, so the program can do anything it wants. It can destroy the world, for god's sake!
2. A program which can destroy the world is pretty useless, so obviously people will not write such a program.
3. Ergo the program stays within defined behavior. Somehow. The compiler does not need to guarantee that - it's the programmer's responsibility.
4. This means some other part of the program checks the pointer for NULL (even if the compiler has no idea which one).
5. And that means the next check is redundant and can be removed.

That's why the discussion that goes in circles when GCC developers talk with normal users is so hard for an outsider to understand:
A. This result is totally ridiculous - fix it!
B. This is undefined behavior - fix your program. WONTFIX.
A. What do you mean "undefined behavior"? It introduces security bugs.
B. This is undefined behavior - fix your program. WONTFIX.
A. Argh. This is all just stupid: how can you even imagine such behavior?
B. This is UNDEFINED behavior - fix your program. WONTFIX.
Ad infinitum...

GCC developers really don't care what happens to a program with undefined behavior. Not one jot. What happens, happens: ICE, crash, whatever. The programmer must ensure his (or her) program does not contain undefined constructs - then and only then is it time to complain.

Note: not all behaviors come from the C standard. Some come from other standards, and some come from discussions on mailing lists (for example, if you go by the C standard alone it becomes impossible to write multithreaded programs, so there are some additional guarantees invented by the GCC developers). But if you agree that something is "undefined behavior" then the resolution WONTFIX comes automatically.

Actually no, the idea is slightly different...

Posted Jul 18, 2009 7:03 UTC (Sat) by ABCD (subscriber, #53650) [Link]

I agree with most of your post, except that I think the GCC devs do care if you get an ICE: in that case, there should have been either an informative error message, or some kind of output. As I understand it, in an ideal world (where GCC doesn't have any bugs, per the GCC devs' definition of a GCC bug) an ICE is never the proper response to any input, no matter how undefined.

Actually no, the idea is slightly different...

Posted Jul 18, 2009 10:58 UTC (Sat) by nix (subscriber, #2304) [Link]

Undefined behaviour only and precisely means 'behaviour which is not
defined by the language standard'. It is a matter of QoI, tastefulness and
portability whether the GCC devs choose to define this behaviour.
There are many things that ISO C considers undefined that GCC has defined:
we call them language extensions. But if GCC also doesn't define what
happens, well, it's not tested, anything goes.

(Undefined behaviour isn't a magic word. It means *exactly what it says*.)

Actually no, the idea is slightly different...

Posted Jul 19, 2009 0:12 UTC (Sun) by xilun (guest, #50638) [Link]

The compiler does not introduce security bugs. The program simply contains bugs that trigger undefined behavior, and undefined behavior can very often be exploited. It is as simple as that.

Do you run all of your programs in production continuously under Valgrind? Because given the way you (incorrectly) interpret the spirit of the standard, I think you should, for your own security. Memory is cheap and CPUs are fast anyway.

Actually no, the idea is slightly different...

Posted Jul 19, 2009 0:36 UTC (Sun) by dlang (guest, #313) [Link]

What I would consider reasonable 'undefined' behavior for a compiler, when a null pointer is dereferenced, would be to put whatever garbage it wants in the resulting variable. Making _use_ of the resulting data is where you would run into grief (because the data is essentially random). Another reasonable thing to do would be whatever you would do if the pointer pointed at an address that didn't exist in the system.

Deciding to overwrite the hard drive, run nethack, etc. may technically qualify as undefined, but is definitely not reasonable.

Deciding to 'optimize away' a check immediately afterwards that tests whether the pointer is null (prior to the resulting variable being used) is not reasonable either.

Actually no, the idea is slightly different...

Posted Jul 19, 2009 1:22 UTC (Sun) by xilun (guest, #50638) [Link]

I'm NOT calling for the compiler to intentionally generate malicious code (that would be stupid, even if still conforming). I'm saying that it does NOT have to generate extra checks by default, nor refrain from optimising out useless ones, because that would not be in the spirit of C.

"Deciding to 'optimize away' a check immediately afterwards that tests whether the pointer is null (prior to the resulting variable being used) is not reasonable"

In the name of what?

You could arbitrarily disallow a lot of optimisations with such an edict.

Everybody who has actually read the C standard knows that it perfectly well allows such an optimisation. See 6.3.2.3. See 6.5.3.2 (note).

Refraining from an optimisation just because it could make a buggy program buggier is not reasonable. Just fix the buggy program. Or, in some limited cases, explicitly use the flag that the GCC maintainers kindly provide, so that even if your program is not strictly conforming, it remains conforming given this particular compiler and compilation option.

undefined behaviour

Posted Jul 17, 2009 19:43 UTC (Fri) by bluebirch (guest, #58264) [Link]

Gcc isn't assuming the program will crash with (this) undefined behaviour. Gcc is assuming that the program doesn't get into undefined behaviour. And if it does, that is a bug in the program.

The correct behaviour is…, well, undefined. So opening a security hole is no less correct (by definition) than crashing or causing "demons to fly out of your nose".

undefined behaviour

Posted Jul 17, 2009 20:58 UTC (Fri) by stevenb (guest, #11536) [Link]

That is what you say.

When GCC makes that the default behavior, there will be others bashing GCC for not optimizing away an obviously unnecessary null-pointer check.

Whatever GCC does, there will always be folks around here and everywhere else who disagree with it. GCC-bashing is just the favorite hobby of the entire FOSS developer community, it seems. Just sad...

undefined behaviour

Posted Jul 18, 2009 9:04 UTC (Sat) by Ross (guest, #4065) [Link]

That doesn't quite make sense. First, compilers are able to do whatever they like when a program does something undefined -- so in that sense they certainly rely on the ability to make the program do whatever they like when the behavior is undefined (otherwise the behavior would be specified in the standard and it would no longer be undefined). But where the logic in your statement fails is that it is presumably the program which needs to stick to defined behavior in order to, well, act in a useful and predictable manner. Any program relying on some specific thing happening (or not happening) when it triggers undefined behavior is going to be unhappy with the results. Even if the code is tested and the programmer can see that the compiler does "the right thing", there is no guarantee it will happen that way every time, or that other compilers (or updates to the same compiler) will produce the same effect.

So your other question is who defines what is defined. The C standard defines what things trigger undefined behavior and it is the obligation of the program to avoid them. It's like a contract between programs and the compiler. In a few places specific compilers may define behavior in some of those cases as extensions, but there is no such extension here. Any program which fails to avoid dereferencing of NULL pointers and cares about the subsequent execution of the program is buggy, and blaming the compiler for the problem is not reasonable.

undefined behaviour

Posted Jul 19, 2009 0:01 UTC (Sun) by xilun (guest, #50638) [Link]

If a compiler vendor wants to go beyond the standard, defining behaviors the standard leaves undefined, he can. If he does not want to, he does not have to, no matter how much you insult him.

If a programmer wants to write non-portable code by using such vendor-specific extensions, he can. Such constructs do indeed exist between Linux and GCC, but not in this case.

If no such extension exists and some code triggers an undefined behavior, then it is just a bug in that code. It can be because of a false assumption by the programmer about what the standard says (so the programmer thinks a program is portable and correct on some aspect while it is not) or because of a plain bug (which seems to be the case here). In NO way is this a bug in the compiler. In NO way is this an indication that the compiler is bad. More people than you suspect value compilers that produce fast binaries, thanks to optimisations completely allowed by the standard.

As long as the compiler does not pretend to support undefined behaviors it does not in fact support, this is reasonable. This is indeed the very way to use standardized technology: do not rely on non-standard features unless you are conscious that this is non-portable and sure that the way you use it is supported by the implementation.

And about NULL pointer dereferencing: the unreasonable behavior is to think it can be dereferenced under special circumstances. There is no good reason to do that. The very purpose of the NULL pointer is to designate that something does NOT point to a valid object. ("If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.") The standard even says that NULL is an _invalid_ value for dereferencing a pointer.

As for "A C compiler should wrap around numbers on a two's complement system": this is not in the standard, and it is indeed NOT what a lot of people think the compiler should do when the alternative is an optimisation. It clearly can't be portable, as C is not limited to two's complement computers, and it is not needed even on two's complement computers: you can cast to (unsigned) and then back to (int) if you really do want two's complement behavior on such machines, with the additional advantage that such a construct tells future readers of the code that there can be an overflow there which the original programmer has thought about.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:32 UTC (Fri) by trasz (guest, #45786) [Link]

This says a lot about the code quality, btw. Not only would this error have been easy to spot if someone had actually looked at the code, but it would also have been noticed by automated code analysis tools like Coverity, if someone had actually bothered to use them.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:40 UTC (Fri) by bluebirch (guest, #58264) [Link]

From the ISC diary:

“Why is it so fascinating? Because a source code audit of the vulnerable code would never find this vulnerability”

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:46 UTC (Fri) by trasz (guest, #45786) [Link]

Which is false, IMHO - it is readily visible that you're checking the pointer a few lines below dereferencing it.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 14:53 UTC (Fri) by spender (guest, #23067) [Link]

But it would be clear from the source that the bug is not exploitable for privilege escalation (it would just cause a crash when dereferencing tun). Say I had a page mmapped at 0 in that case and this gcc problem didn't exist: then there would be no crash, and the function would have returned with an error (due to the !tun check). But because that check doesn't exist in the compiled code, not only does no crash occur, but I'm able to do what should be impossible from a review of the source: get arbitrary code execution in kernel context. It's all explained very well in the exploit code (80% of it is comments), if you had read it.

-Brad

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:05 UTC (Fri) by trasz (guest, #45786) [Link]

Ok, I agree. So, to sum up, it's a programming error that normally would just make the kernel crash, but due to a compiler optimization it makes the kernel exploitable.

One thing I wonder is whether the optimization is actually invalid. What does the C standard say about program behaviour after dereferencing a NULL pointer? Does it say anything?

C's notion of null is not really relevant

Posted Jul 17, 2009 15:45 UTC (Fri) by xoddam (subscriber, #2322) [Link]

It's normal, and could make sense, to succeed in dereferencing a pointer containing zero if you have readable memory mapped at address zero. Some older platforms used 'page zero' memory as cache -- it was faster (or took less object code) to reach.

So the C standard can say nothing about it. Where the C standard does mention null pointers is in saying that, when assigning (or comparing) the constant zero to a pointer, a specific value must be used that means 'null'. The semantics of this 'null' value need not be the same as the address zero (ie. it may be some special nonzero bit pattern when represented in a particular pointer of a particular type) and this allows for weird platforms where zero is a commonly used address value and there is some other special invalid address register value which can be efficiently detected or used to raise an exception.

OTOH Linux could say something about it, eg. by not permitting anyone to map anything at address zero. This might break one or two native-code emulators of ancient zero-page-using OSes but surely no-one will cry over those in 2009.

None of which is relevant to whether it's legal for gcc to remove code during optimisation that relies on common-but-not-universal behaviour. This is definitely a compiler bug.

It's possible in practice for the pointer to be zero, and for execution to continue after reading the contents of address zero, so the test should never have been removed.

C's notion of null _is_ the _only_ relevant

Posted Jul 19, 2009 0:38 UTC (Sun) by xilun (guest, #50638) [Link]

"C's notion of null is not really relevant" <-- yeeeehhhaaaaaa! let's write a C compiler, but let's redefine the langage, because nothion defined by the standard are not relevant. What a joke. Create your own langage if you want. Leave C and GCC alone.

Re read C99 again. And again. And again. Maybe then you will understand. The null pointer is not just a pointer containing zero. The null pointer is something you should not dereference. Ever. Any decent C programmer knows that. It's also known for a while now that such errors are sometimes exploitable.

A pointer containing zero is not a valid pointer supported by the C langage if it happens that the representation of a null pointer is value zero, because it would then break LOT of clauses of the standard. If you want to dereference a pointer containing zero, you must do that in assembly langage. Any other way is calling for problems.

So in the name of what can this be a compiler bug when the compiler is absolutely compliant with the langage standard on this point? You are misunderstanding what C is and what it is not. The most it could be is a feature you could request, but you do not even have to because this feature already exist (there is a flag to disable the optimisation, so _this_ particular point _is_ defined at least for the translation phase when you use the flag).

If the kernel support mapping address 0 and given that Linux only support systems where the representation of NULL is 0x0 AFAIK, they should use the flag. GCC maintainers for the C language just don't have to make the default behavior defined for every undefined behavior of the standard just because you want that, even if binaries are 5x slower. C is not Java; live with it.

C's notion of null _is_ the _only_ relevant

Posted Jul 19, 2009 11:19 UTC (Sun) by nix (subscriber, #2304) [Link]

The *attacker* mmap()ed address zero, not the kernel. I suppose there
should be an option to make NULL not all-bits-zero when inside the kernel,
but then you'd have to adjust pointers in transit from userspace and
comparisons to it would be slower and so on.

C's notion of null _is_ the _only_ relevant

Posted Jul 20, 2009 0:16 UTC (Mon) by xoddam (subscriber, #2322) [Link]

Okay. My mistake. I re-read it.

The standard does indeed say you can't rely on successfully dereferencing null.

(So if you need to read the contents of memory at zero, you are doing something nonstandard and should explicitly tell the compiler if you want it to mean something, or use assembler as you propose).

OTOH if I as a C programmer don't *need* the contents of memory at zero (ie. I'm not writing wine or DOSEmu or equivalent) but a pointer may be invalid, I must do any necessary checking *before* dereferencing a pointer.

So I must grudgingly admit that the compiler is within its rights to make my program do something -- anything -- utterly unexpected once I've made such a stupid mistake as dereferencing null.

But in general programs that do utterly unexpected stuff in any circumstances are bad practice.

So I'd sure-as-hell like a big fat WARNING if the optimiser proposes to remove an entire if statement and thereby possibly make my OS kernel behave -- in practice, not in the meta-universe of the C standard -- in unexpected, exploitable ways.

If there's no such warning from gcc, that *is* a bug. Just IMO.

C's notion of null _is_ the _only_ relevant

Posted Jul 20, 2009 6:47 UTC (Mon) by nix (subscriber, #2304) [Link]

Some undefined behaviour does require a diagnostic, but the universe of undefined behaviour is unbounded, and determining whether some things are undefined rams you right into Rice's theorem and the halting problem.

Spotting null dereferences in the general case certainly is undecidable. Warning when the compiler *already* spots a null dereference, as you propose, is not hard - but have you thought about the case of NULL dereferences being brought into a function via inlining? I'd try a GCC with this warning on a large template-heavy C++ codebase before considering it: see how many false positives you get.

C's notion of null _is_ the _only_ relevant

Posted Jul 21, 2009 6:22 UTC (Tue) by alankila (guest, #47141) [Link]

You may be underestimating how easy it is to generate dead code. Functions which are called with statically allocated objects passed via pointers will never get a NULL and thus any if (foo == NULL) check is unnecessary. A good compiler doesn't generate object code that spends time testing conditions that can't be true, so it is reasonable that it eliminates this test. Neither can it produce a warning without driving everyone crazy, because I stipulate that this is a very common situation in defensively written code.
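
A sketch of the situation alankila describes (all names hypothetical): every caller passes the address of a static object, so once read_checked() is inlined the defensive NULL test is provably dead, and removing it silently is exactly what one wants from the optimizer.

static int counter;                 /* a statically allocated object */

static int read_checked(int *p)
{
    if (p == NULL)                  /* defensive check */
        return 0;
    return *p;
}

int current_count(void)
{
    return read_checked(&counter);  /* &counter can never be NULL */
}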

C's notion of null _is_ the _only_ relevant

Posted Jul 21, 2009 7:09 UTC (Tue) by xoddam (subscriber, #2322) [Link]

Determining that code is dead is easy (and I heartily approve of it) if the actual values can be computed at compile time. For the particular case you mention (all callers pass pointers which are known not to be null), you would probably need whole-program optimisation to determine it.

However, knowing that the program has already attempted to dereference a pointer is not quite the same as statically determining that the pointer is definitely non-NULL.

I submit that removing such a test when some possible sources of the value are not visible to the compiler is an excessive optimisation and warrants a warning.

People write defensive code for a reason.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:48 UTC (Fri) by clugstj (subscriber, #4020) [Link]

"C" says that once you dereference a null pointer, all bets are off. UNIX says that you get a SIG_SEGV signal that can't be restarted. So, the GCC optimization is for usermode code only.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 16:09 UTC (Fri) by bluebirch (guest, #58264) [Link]

I don't understand why the optimisation should be wrong for kernel code. It doesn't violate the "all bets are off".

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 17:11 UTC (Fri) by clugstj (subscriber, #4020) [Link]

True, but since the kernel developers expect a certain thing to happen when a null pointer is dereferenced, they were caught unaware of the effect of the optimization. (my guess)

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 21:01 UTC (Fri) by stevenb (guest, #11536) [Link]

Then kernel folks should use -fno-delete-null-pointer-checks.

Linux 2.6.30 exploit posted

Posted Jul 21, 2009 7:23 UTC (Tue) by xoddam (subscriber, #2322) [Link]

As I understand it, that option would also prevent the optimisation in the case where the pointer's value can be determined not to be NULL at compile time, eg. by having already passed such a test in all callers.

But it's *fine* to remove a test and its consequent if the result can be determined by the compiler. It's not that we don't want the compiler to optimise, it's that this particular optimisation removes a *necessary* check whose result was *not* known at compile time.

(YES, the pointer was dereferenced already so the program's behaviour is -- if the test is true -- already UNDEFINED and it's no-one's fault but the programmer's if he's eaten by a sea serpent. But the dereference does not actually prove that the pointer is not null, so IMO the compiler is not justified in removing this test without a warning).

I'm not asking the compiler to warn every time it removes dead code. I'm asking for the compiler not to assume that a pointer value is valid, and thus that tests for its validity are dead code, once it has been dereferenced.

Coders are human, not gods. Compilers are not gods either, merely tools for coders.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 16:05 UTC (Fri) by forthy (guest, #1525) [Link]

The behavior of dereferencing a null pointer (or any other invalid pointer) is undefined (page 79 of the ISO/IEC 9899:1999 draft). This doesn't mean "crash" - undefined means undefined; you only know what happens if you define it (and then the compiler has to ensure the implementation behaves as defined). GCC's reasoning here amounts to: the dereferencing worked, so it can't be the null pointer. This is the same rubbish argument as holding (x+n >= x) to be always true because x+n didn't fail - overflows are undefined by the C99 standard (but not by the execution model used on GCC's targets, which all use two's complement circular number spaces!).
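
The overflow analogue mentioned above can be shown in a one-liner (illustrative): because signed overflow is undefined in C99, gcc may fold the comparison to "always true", even though two's-complement hardware would wrap (see -fstrict-overflow).

int no_overflow(int x)
{
    return x + 1 > x;   /* may be compiled to "return 1;" */
}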

Looks like some older kernels are affected too

Posted Jul 17, 2009 14:35 UTC (Fri) by dowdle (subscriber, #659) [Link]

See this video where a "test" 2.6.18 kernel from RHEL is exploited... although it isn't clear yet what, if any, other versions might be affected:

http://www.youtube.com/watch?v=iN1fdvktRbk

Looks like some older kernels (with backports?) are affected too

Posted Jul 17, 2009 16:20 UTC (Fri) by mjw (subscriber, #16740) [Link]

That most likely refers to a (non-product) test kernel that has the tun packet accounting backported:
https://bugzilla.redhat.com/show_bug.cgi?id=495863

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:31 UTC (Fri) by dtlin (subscriber, #36537) [Link]

$ git tag --contains 33dccbb050bbe35b88ca8cf1228dcf3e4d4b3554
v2.6.30
v2.6.30-rc1
v2.6.30-rc2
v2.6.30-rc3
v2.6.30-rc4
v2.6.30-rc5
v2.6.30-rc6
v2.6.30-rc7
v2.6.30-rc8
v2.6.31-rc1
v2.6.31-rc2
v2.6.31-rc3
$ git tag --contains 3c8a9c63d5fd738c261bd0ceece04d9c8357ca13
v2.6.31-rc3
As far as I can tell, 2.6.30-rc1 through 2.6.31-rc2 are vulnerable. When I saw the fix on LKML last week, I was wondering whether that hole was exploitable... I guess it is.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 21:30 UTC (Fri) by dtlin (subscriber, #36537) [Link]

Looks like 2.6.30.2 will also be fixed.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 15:37 UTC (Fri) by kjp (guest, #39639) [Link]

Lol:

" if SELinux is enabled, it allows pulseaudio to map at 0
UPDATE: not just that, SELinux lets any user in unconfined_t map at
0, overriding the mmap_min_addr restriction! pulseaudio is not
needed at all! Having SELinux enabled actually *WEAKENS* system
security for these kinds of exploits!
"

SELinux policy issue

Posted Jul 17, 2009 22:41 UTC (Fri) by jamesmrh (guest, #31622) [Link]

Yes, there was a mistake in the SELinux policy which allowed the unconfined user to bypass the mmap_min_addr check; the check would otherwise have been enforced, if enabled (many disable it to get wine etc. working, btw - google "disable mmap_min_addr"). This is being fixed in the policy.

The lesson learned here is that policy changes need more careful review, and that we need to ask whether a policy change is capable of weakening default security.

The LSM interface is theoretically designed to only allow further restriction of access, but this is a special case, where we are applying policy to a kernel compilation option which can also have its value set via a sysctl. It's not a typical "access this resource or not?" decision.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 16:01 UTC (Fri) by mjw (subscriber, #16740) [Link]

Fascinating. This relies on getting page zero mapped, which was indeed possible because of another bug (since fixed in 2.6.31-rc3). The wonders of personalities and capabilities. Full explanation at: http://blog.cr0.org/2009/06/bypassing-linux-null-pointer....

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 16:25 UTC (Fri) by spender (guest, #23067) [Link]

Don't forget (as noted in the exploit) that on machines with SELinux, due to a ridiculously embarrassing vulnerability where both the LSM framework and default SELinux policies are at fault, anyone can mmap at NULL regardless of the mmap_min_addr setting, without needing pulseaudio or any other suid app. Just look at the exploit.

BTW, the fact that vendor-sec had a week to watch a video where I say I mmap at NULL on a machine with SELinux without using pulseaudio, and none of them thought to write a program consisting of one line to test it... is both incredibly sad and hilarious.

And that SELinux vulnerability surely goes back a long time (how far exactly I haven't bothered to check).

-Brad

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 16:57 UTC (Fri) by iabervon (subscriber, #722) [Link]

Every time a null-pointer-dereference bug comes up, I wonder why the kernel doesn't arrange, on entry to the kernel from a userspace task that has mmapped address 0, for null pointers to still cause a fault, and refuse to handle such a fault when it occurs within the kernel. It's not like the kernel ever needs to read from virtual address 0, even if userspace does, or that userspace cares a ton about avoiding cache misses at address 0. I think userspace processes that mmap at address 0 are actually running code that expects this to be a trap of some sort handled by the operating system, more like mmio than ordinary RAM, or they're so old that they don't expect memory access to be faster than taking the fault on a modern system would be anyway.

NULL pointers in kernel space

Posted Jul 17, 2009 17:13 UTC (Fri) by corbet (editor, #1) [Link]

I think the only way that could be done would be to just remove the user-space page tables entirely on entry into the kernel. There's a precedent for that - the 4G/4G patches did it years ago. But it's a very expensive thing to do, to the point that people wouldn't really stand for it. Far better to just configure the kernel not to allow mappings at zero.

NULL pointers in kernel space

Posted Jul 18, 2009 1:15 UTC (Sat) by proski (subscriber, #104) [Link]

Perhaps the 4G/4G split was wrongly marketed as a way to use more memory. Had it been presented as a security measure, it would have had a better chance. Many users are ready to pay a high price for security.

NULL pointers in kernel space

Posted Jul 18, 2009 9:09 UTC (Sat) by PaXTeam (guest, #24616) [Link]

given the security track record of the 4:4 split patch, i don't think you'd want to pay that price ;)

mmap(0,,,MAP_FIXED,,) is *useful*

Posted Jul 17, 2009 17:41 UTC (Fri) by jreiser (subscriber, #11027) [Link]

For over thirty years I have been using mmap(0,,,MAP_FIXED,,) to implement fraid, a file debugger. Invoke gdb on fraid, run with the name of the target file. fraid does open+fstat+mmap(0,.st_size,,MAP_FIXED,fd,0); then traps into gdb. That's the whole program: a dozen lines. The identity mapping from offset in the file to address in the address space of the debugger is exceedingly powerful. This is not something to give up, ever.
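
Something like the following captures the idea (a sketch reconstructed from the description above, not jreiser's actual source; error handling is omitted, and the mapping at address 0 only succeeds where mmap_min_addr permits it):

#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat st;
    int fd = open(argv[1], O_RDONLY);   /* the target file */

    fstat(fd, &st);
    mmap(0, st.st_size, PROT_READ,      /* file offset == address */
         MAP_FIXED | MAP_PRIVATE, fd, 0);
    raise(SIGTRAP);                     /* trap into the attached gdb */
    return 0;
}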

mmap(0,,,MAP_FIXED,,) is *useful*

Posted Jul 17, 2009 18:15 UTC (Fri) by quotemstr (subscriber, #45331) [Link]

I don't understand -- how is a "file debugger" useful?

mmap(0,,,MAP_FIXED,,) is *useful*

Posted Jul 17, 2009 18:32 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Limit it to running only as root, then. Or add a special capability.

when all you have is a hammer

Posted Jul 17, 2009 19:34 UTC (Fri) by xorbe (guest, #3165) [Link]

um seriously, using GDB to examine a binary file is just the wrong tool ... clever, but wrong!

mmap(0,,,MAP_FIXED,,) is *useful*

Posted Jul 17, 2009 19:39 UTC (Fri) by iabervon (subscriber, #722) [Link]

It's not the address space of the debugger, it's the process being debugged, right? So you wouldn't actually have any code running while the processor has a mapping in the TLB at address zero, because fraid is stopped and gdb is probing it indirectly. So you wouldn't care if any process which has a mapping at address zero has lousy performance on return from interrupt or system call.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 17:21 UTC (Fri) by kingswood (guest, #59623) [Link]

Is it a zero-day exploit if the exploit is posted six days after the patch on linux-kernel?
http://lkml.org/lkml/2009/7/11/20 has the fix for the dereferencing bug.

Obviously the bad guys could be reading LKML too.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 17:35 UTC (Fri) by dtlin (subscriber, #36537) [Link]

Make that 11 days: http://lkml.org/lkml/2009/7/6/19 came earlier, and is the
commit that Linus actually pulled (through Dave Miller's net tree).

Nonsense. This is trivial stuff.

Posted Jul 17, 2009 19:47 UTC (Fri) by mikov (guest, #33179) [Link]

Is this an April Fool's joke? Is everybody insane? This is a trivial bug that is immediately visible and would be discovered by any source code audit.

The linked post from Bojan Zdrnja is utter nonsense - he appears to be profoundly confused.

I don't see how this has generated such an amount of discussion. It is as simple as this:

1. Use a pointer that could be NULL.
2. Only then check whether the pointer is NULL.

What is there to discuss? What compiler bugs?

The exploitation of the bug may be creative, but the bug itself is OBVIOUS!!! Really, this is the first time in ages when I am disappointed by LWN. Please, somebody should edit the story.

Nonsense. This is trivial stuff.

Posted Jul 17, 2009 20:03 UTC (Fri) by spender (guest, #23067) [Link]

If you don't understand the difference between a
harmless oops and arbitrary code execution in
kernel context, or why these bugs are important,
then apparently the joke's on you.

-Brad

Nonsense. This is trivial stuff.

Posted Jul 17, 2009 20:16 UTC (Fri) by mikov (guest, #33179) [Link]

If you don't understand why using a possibly NULL pointer _before_ checking it is an obvious bug, the joke is on you. And it isn't even a joke - it is sad.

The bug itself is extremely trivial and glaringly obvious. There is no need to involve the C standard to understand it. Suggesting "compiler bugs" for this is, I am sorry to say, quite silly.

As I already said, the exploitation of the bug, however, is creative and impressive - my hat is off to whoever found the bug and exploited it.

Nonsense. This is trivial stuff.

Posted Jul 17, 2009 20:34 UTC (Fri) by corbet (editor, #1) [Link]

There is an interesting aspect to this that nobody has mentioned: I have seen reviewers advising the removal of NULL pointer checks with the reasoning that, should a NULL pointer ever come about, the resulting kernel oops is just as useful as a "this pointer was null!" message. In the mean time, the overhead of the check can be avoided.

The idea, clearly, is that the memory management hardware can be counted on to do the NULL check so there's no need to do it in software too. But if that address can be mapped, that reasoning clearly does not hold. I don't doubt there are other situations like this one in the kernel code; for most systems, disallowing mappings at zero seems like a reasonable step to take.
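
To sketch the reviewers' reasoning (use_dev() and struct device are hypothetical stand-ins): both versions report a NULL pointer, but the second does so only as long as nothing can ever be mapped at address zero.

struct device;                        /* opaque, for illustration */
extern int use_dev(struct device *);  /* hypothetical helper; dereferences its argument */

int start_checked(struct device *dev)
{
    if (!dev)
        return -1;            /* explicit test: a compare and a branch */
    return use_dev(dev);
}

int start_unchecked(struct device *dev)
{
    return use_dev(dev);      /* rely on the MMU to fault on NULL */
}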

Removing NULL checks

Posted Jul 17, 2009 21:02 UTC (Fri) by corbet (editor, #1) [Link]

I had to dig a bit, but here's an example of developers being advised that some NULL checks are not needed. In retrospect, that doesn't seem like the best advice...

Removing NULL checks

Posted Jul 17, 2009 21:06 UTC (Fri) by stevenb (guest, #11536) [Link]

Anyway, the "it appears to be a GCC bug in the end" bit in the LWN story about this is unjustified, IMHO.

Removing NULL checks

Posted Jul 17, 2009 22:39 UTC (Fri) by nix (subscriber, #2304) [Link]

It would be an excessive optimization, if it weren't that GCC has a way to turn it off... which the kernel is not using.

Removing NULL checks

Posted Jul 17, 2009 22:56 UTC (Fri) by mikov (guest, #33179) [Link]

I don't think that disabling this optimization for the kernel compile is reasonable. It is not reasonable to have to check every new optimization in every new version of the compiler, assuming the optimizations themselves are valid.

What is reasonable, on the other hand, is to know C and to write valid C code, which doesn't break when a butterfly flaps its wings. I still can't understand why everybody is making such a big deal over such a trivial bug. It is similar to relying on the value of "(a++) + (a++)" - basic C stuff that hopefully everybody knows by now.

If this were a user app, nobody would think twice about it. So, what, the kernel is an excuse for sloppy coding?? I would hope that the opposite was true...

Removing NULL checks

Posted Jul 17, 2009 23:33 UTC (Fri) by stevenb (guest, #11536) [Link]

This optimization is not even new. It was added to GCC ten years ago, see
http://gcc.gnu.org/news/null.html

Removing NULL checks

Posted Jul 17, 2009 21:44 UTC (Fri) by mikov (guest, #33179) [Link]

I think there are three completely orthogonal issues here, which seem to be confusing people:

1. The kernel is (mostly) written in C and uses a general purpose C compiler (even more than one, if you count Intel). So the kernel should follow the C rules (and the C standard), even if on the surface it seems that these rules don't always make sense in the kernel context. For example, under DOS dereferencing a NULL pointer would not generate any kind of error. That doesn't mean that it is a good practice to remove NULL pointer checks for all reads. It is still undefined behavior according to the C standard and any C compiler for DOS is free to apply the same (correct!) optimization.

I think what people don't realize is that such optimizations are not only valid, but necessary. While there are the few exceptional cases like this one where they seem to get in the way, they are useful in the majority of cases.

2. Whether the _tun_ pointer should ever be expected to be NULL at this point. I have no idea, but if it is supposed to have been checked earlier, it is reasonable not to check it again, and this is then a simple logical bug.

3. Whether the kernel should trap NULL pointer accesses in hardware for kernel code. Again I don't know, but personally I prefer that it did, unless there is good reason not to. Is there one?

Still, relying on this can be risky. For example:

int *a = NULL;
int x = a[10000];

The first page may be unmapped, but we are not accessing it: a[10000] reads from byte offset 40000 (with 4-byte ints), well past a single 4096-byte guard page.

Removing NULL checks

Posted Jul 17, 2009 23:24 UTC (Fri) by nowster (subscriber, #67) [Link]

This would never have happened if the kernel were written in some modern language more recent than C. I propose rewriting the kernel in INTERCAL.

Removing NULL checks

Posted Jul 19, 2009 11:13 UTC (Sun) by eupator (guest, #44581) [Link]

I suspect you may underestimate the modernism of C or the classicism of INTERCAL.

Removing NULL checks

Posted Jul 18, 2009 19:09 UTC (Sat) by darwish07 (guest, #49520) [Link]

You have a good memory. I remember being advised to strip a BUG_ON line since the kernel OOPS mechanism would handle it - http://lkml.org/lkml/2008/2/13/71

I'm not sure if (or how) this can be exploited, though...

Removing NULL checks

Posted Jul 20, 2009 10:03 UTC (Mon) by makomk (guest, #51493) [Link]

Hopefully, that shouldn't be exploitable. I'm pretty sure the kernel
policy is that userspace shouldn't be able to trigger BUG_ONs
deliberately, so removing one in favour of just dereferencing the null
pointer ought to be safe assuming there isn't a bug elsewhere. (Of course,
if there is, it could turn it into a security vulnerability.)

Removing NULL checks

Posted Jul 20, 2009 12:46 UTC (Mon) by spender (guest, #23067) [Link]

Andrew Morton is wrong; a BUG_ON is in no way equivalent to just dereferencing a NULL pointer. What if I have a page mapped at NULL?
If that author adds one more call to smack_netlabel and forgets to check for sock->sk == NULL before calling it, instead of just causing an OOPS he's now potentially created a vulnerability which can result in arbitrary code execution.

-Brad

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 1:12 UTC (Sat) by spender (guest, #23067) [Link]

A bug that causes a trivial oops (which only terminates the 'exploit' process) is not security relevant.
A bug that results in arbitrary code execution due to a gcc optimization is both clever and important.
Or are you stuck in the mindset that all bugs are equal?

People aren't involving the C standard or compiler optimizations to talk about a bug, they're talking about how something which by the appearance of the source is unexploitable can be turned exploitable.

-Brad

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 2:02 UTC (Sat) by mikov (guest, #33179) [Link]

> A bug that results in arbitrary code execution due to a gcc optimization is both clever and important.

I would not qualify a bug as clever. Perhaps you mean the exploit, and yes, it is indeed very clever.

The code optimization angle is not really that important. I am sure there are many other exploits that depend on how GCC generates code - for example, exploits that depend on the layout of variables on the stack. It is exactly the same thing.

In this case however people are getting confused because the explanation for the exploit depends on parts of the C language which are not so well known.

> People aren't involving the C standard or compiler optimizations to talk about a bug, they're talking about how something which by the appearance of the source is unexploitable can be turned exploitable.

Nonsense. Any bug is potentially exploitable - you simply never know. No experienced developer would declare a bug 100% un-exploitable, especially one involving an invalid pointer dereference. Of course it is not obvious how to exploit it, but it never is.

All crashing bugs are very very important.

Nonsense. This is trivial stuff.

Posted Jul 19, 2009 0:55 UTC (Sun) by xilun (guest, #50638) [Link]

A good reviewer will not rule out the possibility of exploitation, because: 1) it is undefined behavior, and 2) NULL pointer dereferences have been exploited before. A good reviewer will certainly _not_ conclude, just by reading the source, that this is unexploitable.

How might we eliminate undefined behaviour?

Posted Jul 20, 2009 0:41 UTC (Mon) by xoddam (subscriber, #2322) [Link]

Question for language lawyers:

If the behaviour of a program is undefined, we can't rule out the risk that it is exploitable.

So how might we move towards eliminating undefined behaviour from a given piece of software?

It would be nice if the compiler was able to produce meaningful warnings everywhere that the program might ever do anything undefined.

Of course this is impossible in the general case - the problem is undecidable, not least because the program's inputs are not known to the compiler.

However there is surely a way to produce *many* such meaningful warnings. In particular, it must be possible wherever the compiler is presently able to exploit the undefined-ness of an operation to make an optimisation.

Of course if we were to attempt this immediately we would start with a huge, huge number of false positives due to race conditions and the like which are for the most part not bothersome in practice.

If it were possible to gradually reduce these warnings through meaningful annotations and the judicious introduction of memory barriers and relevant runtime checks, any progress would be most reassuring.

How might we eliminate undefined behaviour?

Posted Jul 20, 2009 1:41 UTC (Mon) by mikov (guest, #33179) [Link]

A poster in The Register (Steve 105) gave a good example of why a warning in this case, while it does seem desirable at first sight, is probably not a good idea in general:

Surely the reason for the optimisation is (among other things) code like this:

inline char foo(char *p) { if (p == 0) return 0; else return *p; }

char bar(char *p) { *p = 2; return foo(p); }

int main() { char c = 0; return bar(&c); }

If foo gets inlined into bar, the compiler can spot that the null pointer check in the inlined code is unnecessary and remove it. This is a most excellent optimisation (granted, in this example foo and bar do so little work that other optimisations may render it unnecessary).

I think GCC would do well to report this as a warning iff the use of the pointer and the check of the pointer are in the same original function - that is, if the redundancy didn't appear as a result of other optimizations like global inlining.

That would have caught our case. I am not sure how difficult this distinction would be to implement, though - quite, I suspect.

How might we eliminate undefined behaviour?

Posted Jul 20, 2009 2:34 UTC (Mon) by xoddam (subscriber, #2322) [Link]

Point taken; there is some subtlety involved :-)

But I do think that we can reasonably ask the compiler to distinguish between cases where an optimisation is *demonstrably* safe (ie. the result can be determined at compile time, as in your inlining case) and cases where the optimisation can result in totally different behaviour between naive and optimised object code.

The particular optimisation that resulted in this particular vulnerability does not follow from eager evaluation of known quantities but from an assumption on the part of the compiler that the program is so correct as to never step outside the language specification. Since the spec leaves so much undefined, and programmers are human, and input is untrusted, this assumption is less than completely valid -- and therefore such optimisations merit at the very least a BIG FAT WARNING.

How might we eliminate undefined behaviour?

Posted Jul 20, 2009 3:01 UTC (Mon) by MisterIO (guest, #36192) [Link]

I think you're mixing two different things. The first is the warning (or something similar) that could be emitted for a clear bug like the one in the original code - and some static checkers really should find that kind of bug. The second is gcc's optimization, which has nothing to do with the bug. Sure, the optimization allows the exploitation of the bug, but the warning about the bug is independent of it: even without the optimization, the bug is still there, so the warning makes sense. I'm not saying that gcc should warn about it at all costs, but it's clearly something a static checker should be able to do. For example, it seems cppcheck has accepted my suggestion to include it among its checks.

How might we eliminate undefined behaviour?

Posted Jul 20, 2009 9:33 UTC (Mon) by stevenb (guest, #11536) [Link]

Inlining is just one of the problems, but in general "good warnings" and "optimization" don't go well together.

It shouldn't be too hard to warn about this before inlining in recent GCCs. Internally it could look like this, for the current trunk (r149803):
* put a new pass after pass_early_warn_uninitialized (which runs before inlining), say pass_null_pointer_check
* look for null-pointer checks in the function
* for every found null-pointer check, see if there is a dereference of the pointer that dominates the check.

Implementing the details is left as an exercise for the reader. This would only warn when optimizing.

The problem is that you could actually need optimizations to get a reliable warning. If you try to warn before doing any optimizations (including inlining) then you wouldn't warn about a snippet like this one:

void bar(char *p) {
    char *q = p;
    *q = 2;
    if (p)
        foo(p);
}

because you wouldn't find a dereference of p before the optimizers run - but the compiler will copy-propagate p into q to give:

void bar(char *p) {
    char *q = p;
    *p = 2;
    if (p)
        foo(p);
}

which would have given you the warning...

It shouldn't be very hard to construct cases where you get missed warnings or false positives depending on what optimizations you do before figuring out what to warn about.

If you warn after optimizing (including inlining perhaps) you may get lots of false positives. But if you warn before optimizations, you may not warn for cases that are obvious by inspection.

Nonsense. This is trivial stuff.

Posted Jul 20, 2009 9:38 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

“Bug that causes a trivial oops (only terminates the 'exploit' process) is not security relevant.”

-sigh-

Suppose there is a process which is supposed to watch a log and take certain actions (maybe alter a firewall rule) depending on what it sees in the logs. Now suppose there's a bug in that program; all the bug enables me (the attacker) to do is force it to call the read syscall with the log fd, a buf value of my choosing (maybe from bytes in the log), and a count of zero.

On the face of it, this seems useless for me as an attacker.

Suppose I find a way to get read() to Oops if the buf value is N-1 for a particular magic value N, regardless of the value of count.

On the face of it, this too seems useless. A real security "expert" from the Internet has just assured us that it's not security relevant. But..

Combine them, and I've disabled the firewall tweaking log file reader. This was defending against brute force attacks on a network service, which I promptly break into.

Nonsense. This is trivial stuff.

Posted Jul 20, 2009 12:23 UTC (Mon) by spender (guest, #23067) [Link]

Your post is the perfect reason why I'm the security expert and you're not.

Your entire "suppose I find a way" argument is based on a non-existent bug. I was talking about the vulnerability I exploited. The only way you're going to turn that into a security problem for this firewall app you've invented is if you can get arbitrary code execution in the firewall app to force it to open a device it never wanted to open in the first place, and then perform an additional poll on the device that it never wanted to do in the first place. If you have arbitrary code execution within the firewall app, you might as well just call kill (or just fail with your arbitrary code execution and crash the process)!

Try again.

-Brad

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 0:54 UTC (Sat) by bojan (subscriber, #14302) [Link]

Yeah, that was what I thought exactly when I saw it:

struct sock *sk = tun->sk;  // initialize sk with tun->sk
…
if (!tun)
    return POLLERR;  // if tun is NULL return error

It is _way_ too late to check once tun has already been used. The rest comes down to other problems that may already exist in the kernel.

PS. I am not the author of that article, despite having the same first name.

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 1:18 UTC (Sat) by spender (guest, #23067) [Link]

Way too late to prevent a trivial oops which only terminates the process causing the oops, with no side-effects on system stability? Yes.
Way too late to prevent arbitrary code execution if -fno-delete-null-pointer-checks was used? No.

Without gcc's optimization, the code would have been unexploitable for arbitrary code execution. That's why it's being discussed.

-Brad

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 9:21 UTC (Sat) by Ross (guest, #4065) [Link]

You are saying that the optimization is what "changes" this.

But it doesn't even have to be an optimization -- the compiler can assume a pointer is valid anywhere in the execution as long as it has the same value and that value is, was, or is going to be dereferenced. And there is no "standard" behavior without said optimization unless you assume a lot about the way the compiler works (basically it would be assuming the compiler doesn't track which pointers are dereferenced). The program is telling something to the compiler which the compiler is likely to use, just like a compiler will almost always assume local variables keep the same value between assignments (unless their address is passed or they are marked volatile). You wouldn't complain about a compiler "skipping" reads of a stack slot. Dereferencing a pointer is shouting to the compiler: "hey! this pointer is good (obviously not NULL) because I'm using it!".
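
Ross's stack-slot analogy can be made concrete (illustrative names): just as a dereference tells the compiler a pointer is good, a local whose address never escapes tells the compiler that no call can change its value.

extern void g(void);

int f(void)
{
    int x = 1;

    g();          /* x's address never escapes, so g() cannot modify it... */
    return x;     /* ...and gcc may simply return the constant 1 */
}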

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 10:05 UTC (Sat) by bojan (subscriber, #14302) [Link]

Yeah, I get that. I'm referring to the code itself, which on its face is trivially wrong. In other words, the check for tun == NULL should come _before_ sk = tun->sk, not after it.

Nonsense. This is trivial stuff.

Posted Jul 20, 2009 8:07 UTC (Mon) by bojan (subscriber, #14302) [Link]

After reading some of the C spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1336.pdf), section 6.3.2.3 has this:

> An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.57) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

So, my guess is that GCC folks simply put two and two together and got four. In other words, if tun was zero, it would not be pointing to any valid object. Therefore, the fact that the programmer is using it must mean that the programmer _knows_ that it will be pointing to a valid object (i.e. is non-zero). Therefore, the check for it being zero later on is superfluous and can be removed. Ergo, no bug in GCC.

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 15:59 UTC (Sat) by pflugstad (subscriber, #224) [Link]

I basically agree - the bug is obvious to anyone doing C coding. I actually would have expected the various static code analysis tools (Stanford Checker, Sparse, etc.) to catch this kind of stuff.

But the other side of the discussion is that this bug would have been unexploitable if GCC had not optimized away the null check.

Nonsense. This is trivial stuff.

Posted Jul 18, 2009 16:14 UTC (Sat) by mikov (guest, #33179) [Link]

> But the other side of the discussion is that this bug would have been unexploitable if GCC had not optimized away the null check.

Yes. I wonder what kind of process led to finding this bug... Looking at the assembler output? Discovering it is very impressive. Brad really deserves a lot of praise.

But still, this is no different from any exploit which relies on knowledge of how a specific compiler works - e.g. how it places variables on the stack, and so on. It is exactly the same thing if you think about it. But nobody would claim that GCC has a bug because it placed the return address on the stack where it could be overwritten.

Brad keeps claiming that the important thing is that on the surface this bug doesn't present as security related. While that is true, nobody can predict how several bugs will interact. In my book any crashing bug is potentially security related until proven otherwise.

Nonsense. This is trivial stuff.

Posted Jul 19, 2009 3:27 UTC (Sun) by gmaxwell (guest, #30048) [Link]

Eh, if I were betting I'd suggest that this was found by running a simple static analysis tool that warns you of the check after the dereference. Many will.

No fancy examination of the assembly was required to find a bug which justified analysis for exploitability. The exact compiler behaviour was only needed for the creation of the exploit, which is pretty usual, not the finding of the bug.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 22:39 UTC (Fri) by nix (subscriber, #2304) [Link]

That's bloody ingenious, I must say. Horrible but ingenious.

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 23:35 UTC (Fri) by cde (guest, #46554) [Link]

Hi Brad,

Congrats on your exploit :) You did the right thing by releasing it right away, the OSS community owes you one.

Cheers,

Christophe

Linux 2.6.30 exploit posted

Posted Jul 17, 2009 23:41 UTC (Fri) by cde (guest, #46554) [Link]

I mean, releasing the details of this vulnerability without sitting on it.

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 10:53 UTC (Sat) by nix (subscriber, #2304) [Link]

Or (the worst case, which has been known although I'm sure Brad has never
done it) selling it to the blackhats instead of disclosing it. For all
that people whine about full disclosure, it's always got to be better than
*that*.

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 17:46 UTC (Sat) by spender (guest, #23067) [Link]

Indeed. Also, in case you weren't aware, the 'shady' purchasers (organized crime) of exploits pay out a lot more than the legitimate purchasers.
Also, while some legitimate purchasers do notify the vendor of the vulnerability, others do not. Customers of such a legitimate purchaser get copies of fully-working exploits (not PoCs) for vulnerabilities that are unfixed.

Keep that in mind.

-Brad

Then this vulnerability was only published because it was "spoiled goods"?

Posted Jul 20, 2009 5:38 UTC (Mon) by khim (subscriber, #9252) [Link]

> Customers of such a legitimate purchaser get copies of fully-working exploits (not PoCs) for vulnerabilities that are unfixed.

Does this mean that vulnerabilities like the one discussed are only ever disclosed when they cannot be sold? This time the fix was introduced before the exploit (sure, it was not included in a -stable release, but it was in git head already), so it was impossible to sell it to a "legitimate purchaser"?

Then this vulnerability was only published because it was "spoiled goods"?

Posted Jul 20, 2009 11:20 UTC (Mon) by nix (subscriber, #2304) [Link]

Well, since he just said elsewhere in this thread that he's going to start selling RH vulnerabilities, one must assume the answer, this time, was no, but in future will be yes.

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 20:42 UTC (Mon) by SEJeff (guest, #51588) [Link]

So where does 3Com's ZDI (Zero-Day Initiative) lie? On the good side or the bad?

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 13:21 UTC (Sat) by jengelh (subscriber, #33263) [Link]

Does this also affect the versions immediately before 2.6.30, e.g. 2.6.27–2.6.29?

Linux 2.6.30 exploit posted

Posted Jul 18, 2009 17:49 UTC (Sat) by spender (guest, #23067) [Link]

No; as mentioned in the exploit, the code that introduced the vulnerability was added in the 2.6.30 kernel. Previous kernels aren't vulnerable to this particular bug (but may be vulnerable to other bugs made exploitable by not turning the gcc optimization off).

Additionally, the SELinux vulnerability likely affects all current distributions that use SELinux and is currently unfixed, so you're susceptible to exploitation of any existing null ptr dereference vulnerability.

-Brad

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 0:58 UTC (Sun) by mikov (guest, #33179) [Link]

> Previous kernels aren't vulnerable to this particular bug (but may be vulnerable to other bugs made exploitable by not turning the gcc optimization off).

Can you explain why you think this is a common pattern in the kernel - using a pointer, and only later checking it for NULL?

It seems to me that it is highly unlikely that other code is affected by this problem. The bug is just horrible code, and the kernel is not _that_ bad.

And that is even ignoring the fact that nobody should be doing NULL pointer dereferences to begin with. My understanding is that unmapping the NULL page is mainly a diagnostic feature - not a security measure.

Linux 2.6.30 exploit posted

Posted Jul 19, 2009 22:15 UTC (Sun) by madscientist (subscriber, #16861) [Link]

I've been seeing a series of patches going in on various mailing lists that fix problems like this over the last few days. The issue is that the kernel uses lots of complex data structures, and programmers want to simplify the code by creating local variables. So, you might see a function like:

int foo(struct bar *barp)
{
    struct subbar *sb = barp->sub;  /* barp is dereferenced here... */

    if (!barp)                      /* ...so this check comes too late */
        return -1;
    /* ... */
}

In this context it's not so hard to see, but in a function with lots of local variables, and which has been constructed over time (as the kernel structures add more abstraction), it might not be so clear.

It is true that even the most basic static code analysis tool will find this. I remember Coverity was doing "pro bono" checking of the kernel for a while; is that still going on? Maybe they haven't done 2.6.30 yet? Or maybe this class of errors was deemed low priority?

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 0:20 UTC (Mon) by jengelh (subscriber, #33263) [Link]

» but in a function with lots of local variables, and which has been constructed over time [...], it might not be so clear. [...] Static code analysis tool[s] will find this.

These days, this would probably be done with coccinelle/spatch. It does not need to be a full problem-resolving patch, just one that flags the spot. Along the lines of the following example (I do not claim to have hit the spatch syntax exactly right):

@@
type localtype;
identifier localid, data, member;
statement s;
@@
-localtype localid = data->member;
+willnotcompile localtype localid = data->member;
 if (!data)
   s;

Linux 2.6.30 exploit posted

Posted Jul 20, 2009 20:39 UTC (Mon) by tdz (subscriber, #58733) [Link]

A lot of this discussion reminds me of the noise around the original ext4 behavior.

I wish people would respect the relevant standards and not just make up some idea of how things should work. Otherwise, they shouldn't complain if something goes wrong; and they especially shouldn't blame others for those errors.

Regards, Thomas


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds