Two weeks ago on this page, we looked at
two problems in the Linux implementation of address space layout
randomization (ASLR). At the time, neither had been addressed by kernel
hackers, but, since then, both have been. One was a rather trivial fix,
but the other, regarding random number generation (RNG), led to a bit of a
contentious thread on linux-kernel.
There is always some tension between performance and security, and that is
part of why there was disagreement about how to fix the clearly faulty RNG
that is used for ASLR. As noted in our earlier article, recently reported
research had shown that the address space of a process could be
reconstructed because of the poor RNG used to generate the values.
Some of the early discussion of the problem, along with the patch
originally proposed by Matt Mackall, occurred on a private
mailing list. It surfaced on linux-kernel when Linus
Torvalds asked for opinions on Mackall's patch:
Also, does anybody have any commentary or opinion on the patch by Matt
Mackall to use stronger random numbers than "get_random_int()". I wonder
what the performance impact of that is - "get_random_int()" is very cheap
by design, and many users may consider calling "get_random_bytes()" to be
overkill and a potential performance issue.
Quite frankly, the way "get_random_bytes()" works now (it does a _full_
sha thing every time), I think it's insane overkill. But I do have to
admit that our current "get_random_int()" is insane _underkill_.
I'd like to improve the latter without going to [quite] the extreme that
matt's patch did.
Mackall's patch used get_random_bytes()—a kernel-internal
source of random numbers which is equivalent to user space reading from
/dev/urandom—to define a get_random_u32(). That
function had a much better name as well as better random numbers. But, as
Torvalds pointed out, it does a lot more work. Mackall believed that was a reasonable starting point:
As to what's the appropriate sort of RNG for ASLR to use, finding a
balance between too strong and too weak is tricky. I'd rather move
things to a known safe point and back it off to acceptable performance
from there if anyone complains. So let's do my patch now.
Torvalds came up with a patch that improved
the randomness of get_random_int() by keeping the calculated hash
data between invocations, using that and a bit of noise from the system
(pid + jiffies) to feed back into the
half_md4_transform() function. Though it still uses the key that
changes every five minutes, there is general agreement that this increased
the strength of the RNG without impacting performance.
But half_md4_transform() does a reduced-round version of the MD4
hash. MD4 itself is essentially "broken" in that collisions can be
trivially generated, so a reduced-round version is weaker still. That
did not satisfy Mackall:
I still don't like it. I bounced it off some folks on the adversarial
side of things and they didn't think it looked strong enough either.
Full MD5 collisions can be generated about as fast as they can be
checked, which makes _reduced strength_ MD4 not much better than an
LFSR [linear feedback shift register] in terms of attack potential.
Instead of using the MD4 variant or his original patch, Mackall suggested
using SHA1 as a compromise of sorts. He benchmarked the two and found that the SHA1
variant was about half as fast as the MD4 version (0.660µs vs. 0.326µs), which seemed
like an acceptable tradeoff to him. Ingo Molnar disagreed, noting that it came to roughly 1%
of the cost of fork() (which uses get_random_int() to generate the
starting stack_canary value). He also wanted to know what threat
model necessitated using SHA1. Mackall replied:
My threat model is that someone more clever and with a lot more
expertise attacking systems than either you or me will be able to
leverage the extreme weakness of this hash (O(1) attacks against the
*full* version!) into an attack that incrementally exposes the hidden
RNG state. I've asked a couple such people whether they think that's
likely, and they've said yes.
In fact, it's been known for over a decade that reduced-round MD4 such
as ours is *not one way* and that preimages (aka our hidden state) can
be found in less than an hour on a *mid-90s era PC*:
Torvalds, though, is unimpressed by
Mackall's "only twice as
long" argument: "In the kernel, we tend to never even talk
about how many
_times_ slower something is. We talk about cycles or small
percentages". In his typical fashion, Torvalds also points out
another flaw he
sees in Mackall's argument:
I mean, really. The virtual address randomization was never meant to be
"cryptographically secure" in that sense. Dammit, look at the code: it
only takes something like 8 bits of the results _anyway_.
In other words, YOUR WHOLE ARGUMENT IS TOTALLY INSANE. You talk about
"cryptographically secure hashes" for some 8-bit value. Listen to
yourself. At that point, any cryptographer will just ridicule you. There's
no point in trying to break the randomness, because you'll be much better
off just trying a lot of different values.
In the end, Torvalds's code (with a small addition by Molnar) went into
the kernel, without an "Acked-by" tag from Mackall, the random driver maintainer.
While the uses of get_random_int() in today's kernel may not
require that much in the way of randomness, it does seem wrong, at some
level, to have a function of that name that doesn't really do its job.
But the change clearly alleviates much, perhaps all, of the ASLR weakness that
prompted it. That can only be a good thing.