LWN.net Logo

Random numbers for ASLR

By Jake Edge
May 13, 2009

Two weeks ago on this page, we looked at two problems in the Linux implementation of address space layout randomization (ASLR). At the time, neither had been addressed by kernel hackers, but, since then, both have been. One was a rather trivial fix, but the other, regarding random number generation (RNG), led to a bit of a contentious thread on linux-kernel.

There is always some tension between performance and security, and that is part of why there was disagreement about how to fix a, clearly faulty, RNG that is used for ASLR. As noted in our earlier article, recently reported research had shown that the address space of a process could be reconstructed because of the poor RNG used to generate the values. Some part of the early discussion of the problem, along with the patch originally proposed by Matt Mackall, occurred on the private security@kernel.org mailing list. It surfaced on linux-kernel when Linus Torvalds asked for opinions on Mackall's fix:

Also, does anybody have any commentary or opinion on the patch by Matt Mackall to use stronger random numbers than "get_random_int()". I wonder what the performance impact of that is - "get_random_int()" is very cheap by design, and many users may consider calling "get_random_bytes()" to be overkill and a potential performance issue.

Quite frankly, the way "get_random_bytes()" works now (it does a _full_ sha thing every time), I think it's insane overkill. But I do have to admit that our current "get_random_int()" is insane _underkill_.

I'd like to improve the latter without going to [quite] the extreme that matt's patch did.

Mackall's patch used get_random_bytes()—a kernel-internal source of random numbers which is equivalent to user space reading from /dev/urandom—to define a get_random_u32(). That function had a much better name as well as better random numbers. But, as Torvalds pointed out, it does a lot more work. Mackall believed that was a reasonable starting point:

As to what's the appropriate sort of RNG for ASLR to use, finding a balance between too strong and too weak is tricky. I'd rather move things to a known safe point and back it off to acceptable performance from there if anyone complains. So let's do my patch now.

Torvalds came up with a patch that improved the randomness of get_random_int() by keeping the calculated hash data between invocations, using that and a bit of noise from the system (pid + jiffies) to feed back into the half_md4_transform() function. Though it still uses the key that changes every five minutes, there is general agreement that this increased the strength of the RNG without impacting performance.

But, half_md4_transform() does a reduced round version of the MD4 hash. MD4 itself is essentially "broken" in that collisions can be trivially generated, so a reduced round version is weaker still. That concerns Mackall:

I still don't like it. I bounced it off some folks on the adversarial side of things and they didn't think it looked strong enough either. Full MD5 collisions can be generated about as fast as they can be checked, which makes _reduced strength_ MD4 not much better than an LFSR [linear feedback shift register] in terms of attack potential.

Instead of using the MD4 variant or his original patch, Mackall suggested using SHA1 as a compromise of sorts. He did some benchmarks on the two and found that the SHA1 variant was about half as fast as the MD4 (.660us vs. .326us), which seemed like an acceptable tradeoff to him. Ingo Molnar disagreed, noting that it came to roughly 1% of the performance of fork() (which uses get_random_int() for the starting stack_canary value). He also wanted to know what threat model necessitated using SHA1. Mackall replied:

My threat model is that someone more clever and with a lot more expertise attacking systems than either you or me will be able to leverage the extreme weakness of this hash (O(1) attacks against the *full* version!) into an attack that incrementally exposes the hidden RNG state. I've asked a couple such people whether they think that's likely, and they've said yes.

In fact, it's been known for over a decade that reduced-round MD4 such as ours is *not one way* and that preimages (aka our hidden state) can be found in less than an hour on a *mid-90s era PC*:

Torvalds, though, is unimpressed by Mackall's "only twice as long" argument: "In the kernel, we tend to never even talk about how many _times_ slower something is. We talk about cycles or small percentages". In his typical fashion, Torvalds also points out another flaw he sees in Mackall's argument:

I mean, really. The virtual address randomization was never meant to be "cryptographically secure" in that sense. Dammit, look at the code: it only takes something like 8 bits of the results _anyway_.

In other words, YOUR WHOLE ARGUMENT IS TOTALLY INSANE. You talk about "cryptographically secure hashes" for some 8-bit value. Listen to yourself. At that point, any cryptographer will just ridicule you. There's no point in trying to break the randomness, because you'll be much better off just trying a lot of different values.

In the end, Torvalds's code (with a small addition by Molnar) went in to the kernel, without the random driver maintainer's, Mackall's, "Acked-by" tag. While the uses of get_random_int() in today's kernel may not require that much in the way of randomness, it does seem wrong, at some level, to have a function of that name, which doesn't really perform its job. But, clearly this alleviates much, perhaps all, of the ASLR problems that prompted the change. That can only be a good thing.


(Log in to post comments)

Random numbers for ASLR

Posted May 14, 2009 4:27 UTC (Thu) by wahern (subscriber, #37304) [Link]

What's wrong w/ the ARC4-derived PRNG, arc4random(), that OpenBSD uses? Faster than SHA-1, it exemplifies an acceptable balance between security and performance. Not to mention there are more modern one-way hashes which are both stronger and faster to compute than SHA-1.

Linux suffers from one of the worst cases of NIH syndrome I've ever come across. It's only become worse over the years.

Random numbers for ASLR

Posted May 14, 2009 5:10 UTC (Thu) by bvdm (guest, #42755) [Link]

Yes, exactly. I find this compromise a bit shocking.

The best practice, which is fairly well-known from NIST and academic sources, is to use a strong (and expensive) RNG source to initialize a stream cipher. Preferably a trusted block cipher such as AES in a stream mode-of-operation should be used given the security history of stream ciphers, though RC4 may still be okay.

The stream output is by definition cryptographically still strong but computationally far less expensive random bytes. Definitely much cheaper than using a crypto hash function such as the note-quite-broken-yet SHA-1 or the obsolete MD4/5.

Why is a hack being implemented when a standard solution exist? Why is MD4 still in the kernel at all?

Random numbers for ASLR

Posted May 14, 2009 5:22 UTC (Thu) by ebiederm (subscriber, #35028) [Link]

Because the code was in the kernel and already debugged.

Because a fix was needed.

Because no one who was good with pseudo random number generators cared to join in the conversation.

Because just about anything is better than a constant value.

So are you willing to write a patch to correct the situation?

This is problem of communication

Posted May 14, 2009 14:00 UTC (Thu) by khim (subscriber, #9252) [Link]

So are you willing to write a patch to correct the situation?

A lot of guys can and will write a patch (including me), but few a ready to fight with kernel developers...

This is problem of communication

Posted May 14, 2009 18:10 UTC (Thu) by zooko (subscriber, #2589) [Link]

Yeah, I personally would hesitate to offer patches or analysis to lkml, simply because I don't want to be part of a conversation with that tone.

For what it is worth, it sounds like Matt Mackall stated a plausible threat -- that an attacker might be able to predict results from get_random_int(), thus being able to predict the address space randomization that is supposed to stop him. Linus's reply as quoted in this article, saying that Matt's concern is "TOTALLY INSANE" doesn't make sense to me.

I don't think that any cryptographer would ridicule Matt for this concern. To the contrary, I've always observed cryptographers (the vast majority of them) to be polite and precise. Engaging in ridicule leads one to make mistakes. ;-)

This is problem of communication

Posted May 15, 2009 0:10 UTC (Fri) by droundy (subscriber, #4559) [Link]

From what I could tell, his point was that they're only using 8 bits of randomness in the ASLR (presumably because of address-layout constraints and not wanting to waste address space?), which means that 256 tries will get you the answer anyhow. If there's a simple and easy attack, why should we work so hard to prevent a tricky sophisticated attack that achieves the same result?

But then again, maybe I misunderstood, in which case I'll be glad to be corrected...

Random numbers for ASLR

Posted May 15, 2009 15:43 UTC (Fri) by NRArnot (subscriber, #3033) [Link]

Patch?

How about changing the name? get_randomish_int()? get_somewhat_random_int()? fast_get_randomish_int()?

Any of these would make it clear what is (or isn't) being accomplished.

Random numbers for ASLR

Posted May 14, 2009 15:18 UTC (Thu) by copsewood (subscriber, #199) [Link]

The question Linus raised concerns how much slower can we accept a kernel everyone uses in order to defend against a secondary attack against address space layout randomisation defences (ASLR) against a primary class of attack (buffer or heap overflow) when insufficient bits of entropy are available within the ASLR defence to prevent brute force attacks against this defence in the first place. If 8 bits of genuine entropy are available, this makes a successful buffer overflow exploit work one in 2**8 i.e. 256 times attempted compared to without ASLR, and if 16 bits are available the exploit might work one in 2**16 times given a fully unpredictable ASLR within this set of permutations compared to a system without ASLR.

So the whole point of ASLR isn't to make a buffer or heap attack impossible, but to make it more difficult for an attacker to have code of their choosing executed by a vulnerable application, which is hopefully more likely to crash when an exploit is attempted than give a privilege escalation to the attacker. In a more fully secure system, the attacker wouldn't be able to carry out the primary buffer or heap overflow attack on a vulnerable application through an unvalidated input method, because these primary security bases would have been dealt with first; ASLR is a second line of defence and not the first.

So if the ASLR coding is too performance expensive it will have to be a compile time option and too few people will compile it in to make it of any use to them at all. Better to have enough speed that it doesn't have to be an option and to have enough unpredictability so those trying to defeat it are more likely to use brute force than PRNG state prediction.

Random numbers for ASLR

Posted May 14, 2009 20:55 UTC (Thu) by wahern (subscriber, #37304) [Link]

This argument is pure misdirection. There should be little doubt that cryptographically strong PRNGs exist which are just as performant as whatever ridiculous MD4 hack is being used now. Clearly there are cryptographers falling over themselves to try provide the code to Linus & Co.; he's just not hearing it.

Use a strong PRNG now, and worry (or not worry, as it may be) about ASLR entropy later. But who's going to care about the latter if the former would be the weak chain, regardless.

And let's not forget that there's an army of wholly untalented OEM engineers who will follow Linus & Co's lead, and will be (and surely are) using this inadequate PRNG for sundry cases in existing products, mimicking the same, poor calculus Linus uses in his stubborn defense.

Random numbers for ASLR

Posted May 14, 2009 21:47 UTC (Thu) by nix (subscriber, #2304) [Link]

There should be little doubt that cryptographically strong PRNGs exist which are just as performant as whatever ridiculous MD4 hack is being used now. Clearly there are cryptographers falling over themselves to try provide the code to Linus & Co.; he's just not hearing it.
If so, they're not doing it in that thread. Matt presented a PRNG that was twice as slow as the existing (crappy but cheap) MD4 one, to be used in time-critical contexts like process execution. That's not going to fly, given that that path has attention paid to every last cycle.

Random numbers for ASLR

Posted May 15, 2009 11:55 UTC (Fri) by man_ls (guest, #15091) [Link]

Linus wanted to speak about a small % instead of a big factor. Just from reading the article Ingo provided the answer: looking at the numbers in context we are talking about a 1% performance hit in fork(). If your system is completely CPU-bound and fork() takes half the CPU then your task will take 0.5% more to execute (seven extra minutes every day). I think it is quite acceptable for even a small increase in security.

Random numbers for ASLR

Posted May 15, 2009 12:33 UTC (Fri) by hppnq (guest, #14462) [Link]

If you genuinely worry about address space related exploits, you will know that ASLR is not really going to save the day regardless of the RNG used, although obviously some randomization is needed.

Random numbers for ASLR

Posted May 15, 2009 18:17 UTC (Fri) by nix (subscriber, #2304) [Link]

ASLR can make attacks very much harder, but only on 64-bit systems (and of
course only if more than 8 bits of randomness is used). On 32-bit there
isn't the room to make attacks significantly harder :(

Random numbers for ASLR

Posted May 14, 2009 13:01 UTC (Thu) by nix (subscriber, #2304) [Link]

How much slower is arc4random() than the current function, though? *That* is why Linus rejected SHA-1 in the first place...

(and this last-names kick gets seriously silly when you find yourself referring to Linus by his last name. Not even the Economist did that.)

Mr Torvalds

Posted May 15, 2009 12:01 UTC (Fri) by man_ls (guest, #15091) [Link]

and this last-names kick gets seriously silly when you find yourself referring to Linus by his last name.
If you are referring to Jake as the author of the article then I have to disagree. A consistent naming convention (first name or last name) adds to the clarity of the story. I find it mildly distracting when Jon refers to Linus by first name in an article, even if it is completely justified given the circumstances.
Not even the Economist did that.
How come?

Mr Torvalds

Posted May 15, 2009 12:21 UTC (Fri) by hppnq (guest, #14462) [Link]

You too?! Edge, you mean.

Mr Torvalds

Posted May 15, 2009 18:32 UTC (Fri) by man_ls (guest, #15091) [Link]

But I am not a reporter, my friend :D I try to follow the usual (altough perhaps outfashioned) rule: use first name for those people with whom I have had personal contact (maybe not too personal, even an exchange in a public forum qualifies); and family name elsewhere. So you see, when I say "Linus" I am actually being a bit pretentious. And of course I have had several pleasant exchanges with Jon and Jake.

Mr Torvalds

Posted May 15, 2009 14:47 UTC (Fri) by jake (editor, #205) [Link]

> Not even the Economist did that.

Yeah I was a bit puzzled by nix's comment, assuming, without any real knowledge, that the Economist did its normal 'Mr. Torvalds' on second reference. Which your link would seem to bear out.

There is, after all, more than one 'Linus' working on the kernel. So far, I haven't run into a problem with folks sharing the same last name in a particular article, but first names do collide from time to time.

jake

Mr Torvalds

Posted May 15, 2009 18:16 UTC (Fri) by nix (subscriber, #2304) [Link]

My memory is failing, it seems: that link was pretty persuasive.

(I'm surprised the kernel hasn't run into the Rasmussen/Smith problem yet:
in my experience family names are a *lot* more limited in supply than
personal names. This is especially notable in China.)

Random numbers for ASLR

Posted May 21, 2009 9:21 UTC (Thu) by PaXTeam (subscriber, #24616) [Link]

i think there're quite a few issues conflated in this discussion, let me try to clear them up.

1. purpose of ASLR

i think as the creator of ASLR, i can speak with some authority here: ASLR was *NOT* designed or ever meant to protect against local attacks. really. it was meant as a temporary solution against certain exploit techniques used in remote attacks only.

there're two very simple reasons for this decision: one is that on localhost it's very easy to bruteforce even 32 bits of randomness (if you remember last year's task refcount overflow bug), and we can't get much more than that even for 64 bit userland (note that for remote attacks even 32 bit userland is adequately protected, at least in PaX, despite what the Shacham paper claimed).

the other reason is that anyone able to run an exploit on localhost is much better off by exploiting a kernel bug simply because the kernel is always there whereas the userland process/program may or may not be on the target box (and may not even give root access directly anyway when exploited). not to mention that there's virtually no protection against exploiting kernel bugs, except some attempts in PaX. so it's a lot more economical to develop and use kernel exploits on localhost. this line of reasoning was true back in 2001 as much as it is true (even more so) these days.

2. PRNG for ASLR

for the above mentioned reasons it should be clear that there's no need for a strong PRNG for ASLR's use as the same infoleak that would allow an attack on the PRNG would already destroy the protection provided by ASLR anyway, there's no use in attacking the PRNG itself. one could digress into situations where the attacker leaks randomization info from one process and uses it to reconstruct another's, but that's easy to defend against without resorting to a strong PRNG.

3. fast and 'good enough' PRNG

independently of ASLR, it would still be nice if the kernel provided a 'strong' but fast PRNG device that one could for example use to sanitize a harddrive at raw write speeds, something that isn't possible with /dev/urandom for example. if such a PRNG existed it could then of course be used for ASLR as well but ASLR itself can live with less (ditto for the SSP cookie by the way).

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds