Theoretical vs. practical cryptography in the kernel
Port numbers assigned to network sockets are not an especially secure item — they are only 16 bits, after all. That said, there is value in keeping them from being predictable; an attacker who can guess which port number will be assigned next can interfere with communications and, in the worst case, inject malicious data. Seemingly back in March, Amit Klein reported a port-guessing vulnerability to the kernel's security team; properly exploited, this vulnerability could be used to inject malicious answers to DNS queries, as one example.
The source of the problem comes down to how the kernel selects port numbers, which should be chosen randomly so as to not be guessable by an attacker. The kernel is able to generate random numbers that, as far as anybody knows, are not predictable, but doing so takes time — more time than the network stack is willing to wait. So, instead, the networking code calls prandom_u32(), which is a much simpler PRNG; it is effectively a linear-feedback shift register. That makes it fast, but unsuited to cryptographic operations; its output is a relatively simple function of its state, so anybody who can figure out what its internal state is can predict its output going forward. Klein, it seems, was able to do exactly that by observing the port numbers assigned by the kernel.
Since the problem results from an attacker having worked out the internal state of prandom_u32(), an obvious way to address the problem is to change that state, preferably before the attacker learns what it is. The patch that was applied (written by Willy Tarreau) does this by mixing some random data into the pool used by prandom_u32() on every interrupt. This reseeding is also done when process accounting is performed, just to ensure that it happens even if no interrupts are coming in. An attacker can figure out the internal state just as easily as before but, the reasoning goes, by the time that has been done and the information can be used, said state has changed.
The complaints
On August 4, Marc Plumb posted a
request that Tarreau's patch be immediately reverted and rethought.
His concerns came down to this: "it allows an attacker to determine the
state and [from] there [the] output of /dev/random. I sincerely hope that
this was not
the intended goal :)
". Since /dev/random is the kernel's
"real" random-number generator, compromising its state could lead to far
worse consequences than guessable port numbers; that was indeed not
anybody's intended goal.
Perturbing the state of prandom_u32() doesn't seem like something that could wreck the security of an unrelated random-number generator, so it is worth looking into the specifics of the complaint. Those were best explained by George Spelvin (a pseudonym associated with a long-time occasional kernel developer). In short: an attacker might be able to work out the value of the additional random bits, and those same bits are fed into the kernel's main entropy pool.
As Spelvin explained, the motivation for adding noise to the PRNG's state is a fear that an attacker may know the current contents of that state. An attacker with that knowledge can, obviously, predict the future "random" values that will be generated by the PRNG. If you inject k bits of random data into the PRNG, the attacker will lose that knowledge — to an extent. But, if k is sufficiently small, this attacker can simply try the PRNG with all 2k possibilities and see which one corresponds to the actual observed output. Once that has been done, the attacker has determined the value of those k bits and regained knowledge of the PRNG's internal state.
That might be bad enough, suggesting that the fix that was applied may be ineffective at actually closing the vulnerability. The other part of the complaint, though, is that the 32 bits applied to the PRNG come from the "fast pool", which can be thought of as a sort of queue of random bits waiting to be fed into /dev/random. These bits are not removed from the fast pool; they are still, eventually, used to add entropy to the primary random-number generator. This, it is argued, gives the attacker insight into the kernel's real random numbers and might compromise the random-number generator entirely.
There is general agreement in the kernel community that such a compromise is highly unlikely to lead to anything good. That is where the agreement stops, though.
If nothing else, Spelvin argued, the injection of random data into the PRNG must be done in a way that does not allow the guessing of that data. In practice, that means making k sufficiently large that a brute-force attack to calculate k bits becomes impractical; this "catastrophic reseeding" makes the use of that data safer. Even better, of course, would be to replace the simple PRNG with something that does not so readily expose its internal state.
Responses
Tarreau responded that the assumption that the attacker knows the PRNG state is not warranted; the purpose of adding the noise was to make guessing that state much more difficult. If the time required to calculate the PRNG's internal state is longer than the time between perturbations, there should never be a point where the attacker truly knows that state. If that is true, then concerns about exposing the random bits added to the PRNG go away.
He also questioned the practicality of any attack on /dev/random even if those bits are exposed:
Linus Torvalds's response was rather more direct and categorical:
Because I don't think they exist. And I think it's actively dangerous to continue on the path of thinking that stable and "serious" algorithms that can be analyzed are better than the one that end up depending on random hardware details.
More generally, he said, the insistence on theoretical security has often had the effect of making the kernel less secure, and he had a number of examples to cite. The random-number generator traditionally blocked in the absence of sufficient entropy to initialize it, causing users to avoid it entirely; this issue was only addressed in late 2019. The insistence that only "analyzable" entropy sources could be used to feed the random-number generator has led to systems with no sources of entropy at all, resulting in vast numbers of embedded devices having the same "random" state. And more; the whole message merits a read.
This same problem, he added later, delayed the resolution of the port-number-guessing problem:
The addition of noise, he said, can also be useful in the face of a Meltdown-style vulnerability. In that case, an attacker may well be able to read out the state of the random-number generator, but the regular mixing of noise should limit the usefulness of that information. In general, adding noise is the opposite of the "analyzability" that (he argues) crypto people want, and that is a good thing. It is better, he said, if the internal state of any random-number generator is not susceptible to analysis. Torvalds was quite clear that he would entertain no patches addressing theoretical problems.
Where to from here
It seems evident that the patch in question will not be reverted anytime soon; that would require the demonstration of a practical attack, and it is far from clear that anybody can do that. So, for now, port numbers generated by Linux should be a bit harder to guess and, presumably, the random pool as a whole remains uncompromised.
That said, nobody felt the need to defend the simple PRNG that underlies prandom_u32() now; the door is clearly open to replacing it outright if a suitable alternative can be found. That alternative would have to be something that does not make guessing its internal state so easy, but it would also have to retain the performance characteristics of the current implementation. Torvalds has proclaimed that he will not accept a replacement that slows things down.
Spelvin has proposed a
replacement described as "a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key
"; Tarreau played with it some and
discussions over how to improve it have been ongoing. Tarreau also said that he is
working on a replacement based on the Middle Square Weyl Sequence
random-number generator [PDF], but no patches have been posted yet.
Meanwhile, Ted Ts'o, the maintainer of the kernel's random-number generator, has expressed worries that perhaps prandom_u32() is being used in places where it shouldn't be:
Replicating that grep suggests that there are nearly 500 call sites to be worried about — and that doesn't count any BPF programs in the wild using bpf_get_prandom_u32(). Replacing prandom_32() might help to mitigate such concerns, but it will not eliminate them. The whole point of prandom_32() is that performance takes priority over security, so if it is being used in places that require a proper random-number generator, there may indeed be a need for some changes in the near future.
Overall, random-number generation remains a surprisingly difficult
problem. It's hard to predict which issues will occupy kernel developers
in, say, 2030, but chances are good that random numbers will be one of
them. Meanwhile, Linux has always been a bit of a battleground between
theoretically superior solutions and those that are deemed to work in
real-world situations; that, too, is unlikely to change anytime soon.
| Index entries for this article | |
|---|---|
| Kernel | Random numbers |
| Kernel | Security/Random number generation |
