Theoretical vs. practical cryptography in the kernel
Port numbers assigned to network sockets are not an especially secure item — they are only 16 bits, after all. That said, there is value in keeping them from being predictable; an attacker who can guess which port number will be assigned next can interfere with communications and, in the worst case, inject malicious data. Back in March, Amit Klein reported a port-guessing vulnerability to the kernel's security team; properly exploited, this vulnerability could be used to inject malicious answers to DNS queries, as one example.
The source of the problem comes down to how the kernel selects port numbers, which should be chosen randomly so as to not be guessable by an attacker. The kernel is able to generate random numbers that, as far as anybody knows, are not predictable, but doing so takes time — more time than the network stack is willing to wait. So, instead, the networking code calls prandom_u32(), which is a much simpler PRNG; it is effectively a linear-feedback shift register. That makes it fast, but unsuited to cryptographic operations; its output is a relatively simple function of its state, so anybody who can figure out what its internal state is can predict its output going forward. Klein, it seems, was able to do exactly that by observing the port numbers assigned by the kernel.
Since the problem results from an attacker having worked out the internal state of prandom_u32(), an obvious way to address the problem is to change that state, preferably before the attacker learns what it is. The patch that was applied (written by Willy Tarreau) does this by mixing some random data into the pool used by prandom_u32() on every interrupt. This reseeding is also done when process accounting is performed, just to ensure that it happens even if no interrupts are coming in. An attacker can figure out the internal state just as easily as before but, the reasoning goes, by the time that has been done and the information can be used, said state has changed.
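The dynamic is easy to see with a toy generator. The sketch below is a simplified stand-in (a single 16-bit LFSR, not the kernel's actual implementation) showing both halves of the story: an attacker who observes output can clone an LFSR's state and predict it perfectly, and mixing unseen noise into the state makes the clone diverge.

```python
# Toy illustration only: a single 16-bit Galois LFSR that returns its
# whole state word, so one observed output hands the attacker everything.
# The kernel's prandom_u32() is more elaborate, but the principle is the same.

def lfsr_step(state):
    """One step of a 16-bit Galois LFSR (tap mask 0xB400)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

# The victim generates "random" numbers; the attacker sees one of them.
victim = 0xACE1
observed = victim = lfsr_step(victim)

# The attacker clones the state from that single observed output and
# predicts the following values perfectly.
clone = observed
predictions, truth = [], []
for _ in range(5):
    clone = lfsr_step(clone)
    victim = lfsr_step(victim)
    predictions.append(clone)
    truth.append(victim)

assert predictions == truth   # the clone tracks the victim exactly

# The applied fix, in miniature: mix noise the attacker has not seen
# into the state.  The clone immediately diverges from the victim.
victim ^= 0x1234              # stand-in for interrupt-time noise
clone = lfsr_step(clone)
victim = lfsr_step(victim)
divergence = clone != victim
```

Of course, as described below, the attacker can try to recover the injected noise by brute force; the reseeding only helps if it happens faster than that search can run.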
The complaints
On August 4, Marc Plumb posted a request that Tarreau's patch be immediately reverted and rethought. His concerns came down to this: "it allows an attacker to determine the state and [from] there [the] output of /dev/random. I sincerely hope that this was not the intended goal :)". Since /dev/random is the kernel's "real" random-number generator, compromising its state could lead to far worse consequences than guessable port numbers; that was indeed not anybody's intended goal.
Perturbing the state of prandom_u32() doesn't seem like something that could wreck the security of an unrelated random-number generator, so it is worth looking into the specifics of the complaint. Those were best explained by George Spelvin (a pseudonym associated with a long-time occasional kernel developer). In short: an attacker might be able to work out the value of the additional random bits, and those same bits are fed into the kernel's main entropy pool.
As Spelvin explained, the motivation for adding noise to the PRNG's state is a fear that an attacker may know the current contents of that state. An attacker with that knowledge can, obviously, predict the future "random" values that will be generated by the PRNG. If you inject k bits of random data into the PRNG, the attacker will lose that knowledge — to an extent. But, if k is sufficiently small, the attacker can simply try the PRNG with all 2^k possibilities and see which one corresponds to the actual observed output. Once that has been done, the attacker has determined the value of those k bits and regained knowledge of the PRNG's internal state.
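That brute-force search can be sketched directly. The generator and output function below are illustrative toys (a textbook LCG exposing its top 16 bits), not kernel code: the attacker knows the pre-reseed state, enumerates all 2^k candidate noise values, and keeps the ones consistent with the observed output.

```python
# Illustrative sketch of the 2^k brute-force search described above.
# The LCG and 16-bit output window are stand-ins, not the kernel's PRNG.

M = 2 ** 32

def step(state):
    # One step of a linear congruential generator (Numerical Recipes constants).
    return (state * 1664525 + 1013904223) % M

def output(state):
    return state >> 16        # the attacker observes only the top 16 bits

def outputs_after(s, n=2):
    outs = []
    for _ in range(n):
        s = step(s)
        outs.append(output(s))
    return outs

k = 8
state = 0xDEADBEEF            # PRNG state, already known to the attacker
noise = 0b10110010            # k secret bits the defender mixes in

observed = outputs_after(state ^ noise)   # outputs seen after the reseed

# The attacker tries all 2^k possible noise values against those outputs;
# any surviving candidate reveals the reseeded internal state.
matches = [g for g in range(2 ** k) if outputs_after(state ^ g) == observed]
```

With k = 8 the search is a trivial 256 trials; only a k large enough to make 2^k infeasible — the "catastrophic reseeding" discussed below — defeats it.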
That might be bad enough, suggesting that the fix that was applied may be ineffective at actually closing the vulnerability. The other part of the complaint, though, is that the 32 bits applied to the PRNG come from the "fast pool", which can be thought of as a sort of queue of random bits waiting to be fed into /dev/random. These bits are not removed from the fast pool; they are still, eventually, used to add entropy to the primary random-number generator. This, it is argued, gives the attacker insight into the kernel's real random numbers and might compromise the random-number generator entirely.
There is general agreement in the kernel community that such a compromise is highly unlikely to lead to anything good. That is where the agreement stops, though.
If nothing else, Spelvin argued, the injection of random data into the PRNG must be done in a way that does not allow the guessing of that data. In practice, that means making k sufficiently large that a brute-force attack to calculate k bits becomes impractical; this "catastrophic reseeding" makes the use of that data safer. Even better, of course, would be to replace the simple PRNG with something that does not so readily expose its internal state.
Responses
Tarreau responded that the assumption that the attacker knows the PRNG state is not warranted; the purpose of adding the noise was to make guessing that state much more difficult. If the time required to calculate the PRNG's internal state is longer than the time between perturbations, there should never be a point where the attacker truly knows that state. If that is true, then concerns about exposing the random bits added to the PRNG go away.
He also questioned the practicality of any attack on /dev/random even if those bits are exposed.
Linus Torvalds's response was rather more direct and categorical:
Because I don't think they exist. And I think it's actively dangerous to continue on the path of thinking that stable and "serious" algorithms that can be analyzed are better than the one that end up depending on random hardware details.
More generally, he said, the insistence on theoretical security has often had the effect of making the kernel less secure, and he had a number of examples to cite. The random-number generator traditionally blocked in the absence of sufficient entropy to initialize it, causing users to avoid it entirely; this issue was only addressed in late 2019. The insistence that only "analyzable" entropy sources could be used to feed the random-number generator has led to systems with no sources of entropy at all, resulting in vast numbers of embedded devices having the same "random" state. And more; the whole message merits a read.
This same problem, he added later, delayed the resolution of the port-number-guessing problem.
The addition of noise, he said, can also be useful in the face of a Meltdown-style vulnerability. In that case, an attacker may well be able to read out the state of the random-number generator, but the regular mixing of noise should limit the usefulness of that information. In general, adding noise is the opposite of the "analyzability" that (he argues) crypto people want, and that is a good thing. It is better, he said, if the internal state of any random-number generator is not susceptible to analysis. Torvalds was quite clear that he would entertain no patches addressing theoretical problems.
Where to from here
It seems evident that the patch in question will not be reverted anytime soon; that would require the demonstration of a practical attack, and it is far from clear that anybody can do that. So, for now, port numbers generated by Linux should be a bit harder to guess and, presumably, the random pool as a whole remains uncompromised.
That said, nobody has felt the need to defend the simple PRNG that underlies prandom_u32(); the door is clearly open to replacing it outright if a suitable alternative can be found. That alternative would have to be something that does not make guessing its internal state so easy, but it would also have to retain the performance characteristics of the current implementation. Torvalds has proclaimed that he will not accept a replacement that slows things down.
Spelvin has proposed a replacement described as "a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key"; Tarreau played with it some and discussions over how to improve it have been ongoing. Tarreau also said that he is working on a replacement based on the Middle Square Weyl Sequence random-number generator [PDF], but no patches have been posted yet.
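A generator along the lines Spelvin describes might look something like the sketch below. The seeding layout (SipHash's initialization constants XORed with a 128-bit key) and the output function are illustrative guesses based only on the quoted description, not the code that was actually posted.

```python
# A hypothetical PRNG built on the standard SipHash round function,
# seeded with 128 bits of key.  State layout and output function are
# guesses for illustration; this is not Spelvin's posted patch.

MASK = (1 << 64) - 1

def rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK

def sipround(v0, v1, v2, v3):
    # The standard SipHash ARX round.
    v0 = (v0 + v1) & MASK; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK; v3 = rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32)
    return v0, v1, v2, v3

class SipRand:
    def __init__(self, k0, k1):
        # SipHash's initialization constants XORed with the 128-bit key.
        self.v = [k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D,
                  k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573]

    def next_u64(self):
        self.v = list(sipround(*sipround(*self.v)))
        return (self.v[0] ^ self.v[3]) & MASK

a = SipRand(0x0123456789ABCDEF, 0xFEDCBA9876543210)
b = SipRand(0x0123456789ABCDEF, 0xFEDCBA9876543210)
c = SipRand(0, 1)

seq_a = [a.next_u64() for _ in range(4)]
seq_b = [b.next_u64() for _ in range(4)]
seq_c = [c.next_u64() for _ in range(4)]
```

Unlike an LFSR, the output here is not a simple linear function of the state, so observing outputs does not readily reveal the 256-bit internal state — at the cost of more work per number generated, which is exactly the tradeoff under debate.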
Meanwhile, Ted Ts'o, the maintainer of the kernel's random-number generator, has expressed worries that perhaps prandom_u32() is being used in places where it shouldn't be.
Replicating that grep suggests that there are nearly 500 call sites to be worried about — and that doesn't count any BPF programs in the wild using bpf_get_prandom_u32(). Replacing prandom_u32() might help to mitigate such concerns, but it will not eliminate them. The whole point of prandom_u32() is that performance takes priority over security, so if it is being used in places that require a proper random-number generator, there may indeed be a need for some changes in the near future.
Overall, random-number generation remains a surprisingly difficult problem. It's hard to predict which issues will occupy kernel developers in, say, 2030, but chances are good that random numbers will be one of them. Meanwhile, Linux has always been a bit of a battleground between theoretically superior solutions and those that are deemed to work in real-world situations; that, too, is unlikely to change anytime soon.
Index entries for this article:
Kernel: Random numbers
Kernel: Security/Random number generation
Posted Aug 13, 2020 21:56 UTC (Thu)
by mrecho (guest, #53167)
[Link]
Posted Aug 13, 2020 22:03 UTC (Thu)
by jkingweb (subscriber, #113039)
[Link]
Posted Aug 13, 2020 23:32 UTC (Thu)
by ms-tg (subscriber, #89231)
[Link] (5 responses)
Given how recently the terms of respectful discussion of the Linux kernel were a focus of articles at LWN, I was wondering if it might benefit all of us to also write an article reviewing the tone and forms taken in this discussion, in addition to this excellent coverage of the substance of it?
Posted Aug 14, 2020 18:58 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
Is George Spelvin a theoretician?
I guess George was simply refusing to see Linus' point of view that there's no point having a system that doesn't work. I have the same problem with filesystem devs, especially the transition from ext3 to 4 ... by accident, ext3's journal made sure that the CONTENTS of files were saved. ext4 changed that. The filesystem guys' argument seemed to be "but we can get the computer up and running quicker if we don't care about the contents". But if you DON'T KNOW whether the user's data has been corrupted, a few minutes less booting the system is most definitely NOT a worthwhile tradeoff against having to run a data restore or integrity check over a multi-terabyte database ...
At the end of the day, the point of the computer is not to run the operating system - it's to do jobs for users, and unlike a lot of computer guys Linus doesn't forget that. That's why linux has conquered pretty much everything except the desktop - and that's why MS is so hard to dislodge from the desktop, because they know that too ...
Cheers,
Wol
Posted Aug 15, 2020 15:38 UTC (Sat)
by nivedita76 (subscriber, #121790)
[Link]
And let's please not rehash the ext3 vs ext4 flamewar.
Posted Aug 15, 2020 6:34 UTC (Sat)
by LtWorf (subscriber, #124958)
[Link]
Posted Aug 15, 2020 7:22 UTC (Sat)
by ragnar (guest, #139237)
[Link]
Posted Aug 31, 2020 12:29 UTC (Mon)
by milesrout (subscriber, #126894)
[Link]
It's an internet mailing list. People say 'FFS' on internet mailing lists and have done since before I was born. Get used to it or get off the internet.
Posted Aug 14, 2020 1:02 UTC (Fri)
by robert.cohen@anu.edu.au (subscriber, #6281)
[Link] (3 responses)
How hard would it be to remove the bits from the "fast pool" when they are added to prandom_u32()? That way they wouldn't be added to both RNGs.
Posted Aug 14, 2020 7:21 UTC (Fri)
by kleptog (subscriber, #1183)
[Link] (2 responses)
I also think the issue is way overblown. So 32 bits of this fast pool are used elsewhere. Even if they were completely exposed, it doesn't help you with the other 96 bits, and they are eventually mixed even more before being fed to the actual pool.
If you require that all inputs to your random number generator be verifiably random then you have set your goals impossibly high as the article states. Entropy is a theoretical measure, there is no actual way to determine how random something actually is. You'd need to examine the entire multiverse and count the number of universes which are (apparently) identical except for those bits. A better approach seems to me to collect as much unpredictable data as you can and keep on mixing. Estimating the actual randomness is (IMHO) a fool's game.
Posted Aug 16, 2020 4:36 UTC (Sun)
by rahvin (guest, #16953)
[Link]
Posted Aug 31, 2020 11:52 UTC (Mon)
by cpitrat (subscriber, #116459)
[Link]
Hold my beer ...
Posted Aug 14, 2020 2:45 UTC (Fri)
by flussence (guest, #85566)
[Link] (22 responses)
(I guess this would sound more appealing if we didn't already have getrandom(2)…)
Posted Aug 14, 2020 18:47 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
If a flag says "wait until sufficient entropy is available" how many people will use it anyway, and then wonder why their system fails to boot ...
At the end of the day if programmers don't understand (or more likely bother to read) the documentation, ANY attempted fix is going to be a disaster.
Cheers,
Wol
Posted Aug 27, 2020 9:50 UTC (Thu)
by Vipketsh (guest, #134480)
[Link]
The real issue is that there is no reasonable response a program can make when it is told "no". If you asked for random, you need that random to continue, there is no way around that. You can read or write as much documentation as you want, but the situation doesn't change.
Posted Aug 14, 2020 19:49 UTC (Fri)
by shemminger (subscriber, #5739)
[Link] (19 responses)
Posted Aug 14, 2020 21:04 UTC (Fri)
by atnot (subscriber, #124910)
[Link] (18 responses)
Posted Aug 14, 2020 22:38 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
If the PRNG is not very secure, as in this case, then it is possible for an attacker to work out the PRNG's state. Once a plausible adversary has the ability to determine the value of an input to the PRNG (i.e. its seed), then that seed is no longer unpredictable, and it is not unreasonable to call it "used up" in that case, particularly if the same seed has also been reused for other purposes under the assumption that it was unpredictable. However, this characterization is problematic, because it encourages the (wrong) assumption that generating N bits of random data "uses up" f(N) bits of input entropy for some f() - which is certainly not the case with a proper CSPRNG. This was the whole problem with the random(4) interface randomly blocking for (in most cases) no good reason.
Posted Aug 15, 2020 3:06 UTC (Sat)
by Paf (subscriber, #91811)
[Link] (16 responses)
The idea of reseeding (/mixing in new entropy in to) a prng, by the way, is much like the idea of re-keying in cryptography. It resets the search space for anyone trying to crack it. But if only one bit was added...? So we *have* to try to measure entropy.
What if we made heavy use of a source that turns out to be predictable or mostly predictable? The effective state space of the prng seed is dramatically reduced.
You say if the state of a prng is not known it doesn’t get easier to crack as its use continues. Given this, why does anyone ever rekey in cryptography? There are in fact many cryptographic attacks that require large quantities of cypher text (obviously with the same key). So, you rekey a lot, whether or not you know such an attack exists. Same for the prng handling ‘random‘.
So, we *have* to try to measure/guess entropy of our inputs - attackers certainly will. And, sure, we can’t do it perfectly, but we can get a decent idea of how predictable things are in practice, and use that.
Posted Aug 15, 2020 13:55 UTC (Sat)
by kleptog (subscriber, #1183)
[Link] (5 responses)
I don't see how this follows. Since entropy is only additive there is no reason not to simply include any and every piece of data that could possibly contain some entropy. For attackers it might be interesting to try and estimate the entropy, but for the system (Linux in this case) the only useful tactic is to be adding entropy faster than an attacker can constrain the search space.
Rekeying in data streams is indeed for resetting the search space. For a CSPRNG this means adding entropy equal to the size of the internal state. However, unlike in a data stream you don't actually need to know or care when this happens. There's no actual reason to be trying to estimate the entropy since it doesn't actually make a difference.
So throw in the block numbers of data being read from disk. Do they contain entropy? Who knows? Let a theoretical attacker worry about it. We've at least wasted their time having to think about it.
Posted Aug 15, 2020 18:38 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (2 responses)
Cheers,
Wol
Posted Aug 16, 2020 5:37 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Aug 26, 2020 20:04 UTC (Wed)
by amarao (guest, #87073)
[Link]
Posted Aug 16, 2020 18:55 UTC (Sun)
by nivedita76 (subscriber, #121790)
[Link] (1 responses)
The reason to reseed any PRNG, CS or not, is because you are concerned about an attacker using some means to exfiltrate the state of the PRNG (for a non-CSPRNG, this might be as easy as observing a moderate number of outputs, an LFSR just requires observing a number of outputs that is proportional to the state size, for example).
Given the scenario that reseeding is protecting against, adding small amounts of entropy at a time provides almost no additional security, and has two problems. Firstly, it effectively "wastes" that entropy: if you had waited to accumulate enough entropy and add it all at the same time, you could actually have gained security. Secondly, it actually reveals the bits you are mixing in to the attacker. If those bits are then also used later to seed a different PRNG, you've reduced the security of that PRNG as well, for no gain in security of the first one.
Posted Aug 24, 2020 14:16 UTC (Mon)
by ecree (guest, #95790)
[Link]
Except that this isn't actually true.
Viewing the attack from the information-theoretical perspective, there is _no difference_ between "mix in the entropy in dribs and drabs as you collect it" and "save it up until you can catastrophically re-seed": either you're adding entropy faster than it leaks (in which case you're fine) or you're leaking state faster than you add entropy (in which case you're screwed).
The difference only appears when you consider the amount of work required to brute-force-invert the mixing function _if_ state is leaking fast enough to allow the information-theoretic attack; partial-state-extension on frequent small re-seedings adds up to less total work than one big re-seed because the 2^k factor dominates.
Catastrophic re-seeding is thus important if your security depends on the cryptographic strength of your PRNG, but _not_ if your security depends on adding noise to the state faster than the attacker can obtain information to constrain that state (e.g. by observing outputs) — in which case cat reseed may actually _harm_ you by letting the attacker reach a greater partial state in the (longer) interval between re-seed events.
> for a non-CSPRNG, this might be as easy as observing a moderate number of outputs, an LFSR just requires observing a number of outputs that is proportional to the state size, for example
From the information-theoretical perspective, the same is true of any PRNG, CS or no, because you can always enumerate all possible states and see which one matches. Observing (on average) n consecutive outputs and doing O(n*2^n) work will crack an n-bit CSPRNG just fine. The difference with the LFSR is that there's a fairly easy 'cryptanalysis' that lets you do it with much less work than brute force.
One of the things that drove me crazy reading the original thread on netdev was the way some people (especially Spelvin) equivocated between the information-theoretical argument and the work argument. Tbh it looked like taking the Schneier cat-reseed paper and cargo-culting it into some kind of panacea-cum-universal-necessity.
Posted Aug 15, 2020 16:09 UTC (Sat)
by imMute (guest, #96323)
[Link] (9 responses)
One answer is damage mitigation. If the encryption key gets leaked, how much data is immediately available?
>What if we made heavy use of a source that turns out to be predictable or mostly predictable? The effective state space of the prng seed is dramatically reduced.
That's why you use more than one source of entropy. Suppose you don't trust RDRAND. I say mix it into the entropy pool anyway and just don't "credit" those bits. So far I've not seen anyone show that an untrusted entropy source is capable of *reducing* entropy of the pool.
Posted Aug 15, 2020 23:18 UTC (Sat)
by excors (subscriber, #95769)
[Link]
There's the idea shown in "Prototyping an RDRAND Backdoor in Bochs" (https://archive.org/stream/pocorgtfo03#page/n17/mode/1up). If code does something like "entropy_pool ^= RDRAND()" to mix in the new randomness, a malicious implementation of RDRAND could return "entropy_pool ^ x" where x is a random-looking value that is known to the attacker, cancelling out all the previous entropy. To someone who doesn't know x, the output of RDRAND would still look perfectly random, and the CPU is otherwise following the architecture specification perfectly (which makes it harder for users to observe any malicious behaviour).
That's probably not very useful in an emulator, because a malicious emulator could leak the content of entropy_pool in plenty of other ways, and it's probably difficult in a physical CPU without affecting the timing or performance counters in a way that someone would eventually notice. But still, it means in principle RDRAND can reduce entropy without doing anything other than returning a carefully constructed value.
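The cancellation described above is easy to show in miniature; the values below are arbitrary stand-ins.

```python
# Sketch of the RDRAND-backdoor idea: if the pool is updated with
# "pool ^= rdrand()", a malicious rdrand that can see the pool may return
# pool ^ x, erasing all accumulated entropy and leaving an attacker-chosen
# value.  All numbers here are arbitrary illustrations.

attacker_x = 0x5DEECE66D              # known only to the attacker
pool = 0x81F5CA3B9E24D07A             # the pool's accumulated secret entropy

def malicious_rdrand(current_pool):
    # Looks perfectly random to anyone who doesn't know attacker_x.
    return current_pool ^ attacker_x

pool ^= malicious_rdrand(pool)        # the usual "mix in more entropy" step
# pool is now exactly attacker_x: all previous entropy has cancelled out.
```

This is why mixing functions that hash the input together with the old state (rather than plain XOR) are preferred: the malicious source would then need to invert the hash to steer the pool.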
Posted Aug 16, 2020 16:23 UTC (Sun)
by Paf (subscriber, #91811)
[Link] (7 responses)
Cryptography is hard; the idea that we can add whatever to the pool forever is a dangerous one (the possibility of deliberately tainted inputs noted above, for example). I don’t think Linus is making that argument. He’s arguing about where the fence goes, not about removing it.
Posted Aug 16, 2020 16:49 UTC (Sun)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Umm, I don't think that's how it works at all. Adding a gigabyte of zeros to the pool doesn't reduce the entropy in it at all. It doesn't add anything either, but it doesn't hurt.
Tainted inputs are an issue, but that just means to need to add as many sources as you can find, and you have to be careful how you add things. If any untainted source is included you have basically made it infeasible for the attacker.
Posted Aug 20, 2020 4:43 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
> Umm, I don't think that's how it works at all. Adding a gigabyte of zeros to the pool doesn't reduce the entropy in it at all. It doesn't add anything either, but it doesn't hurt.
The original quote is in fact true, but does not mean what you think it means. If you didn't get enough data with real randomness, then you could have a predictable state. This happens regardless of whether or not you also added a bunch of stuff that wasn't random.
The problem, therefore, is whether they should do something with the entropy accounting, not whether they should allow reusing the fast pool bits in this fashion at all.
Posted Aug 17, 2020 7:18 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (4 responses)
HMAC(oldState, "Predictable stuff") -> newState
Doesn't tell us anything about oldState or newState, even though we're quite certain what "Predictable stuff" is. Worse, if we're a traitor inside the machine we can't usefully influence newState despite knowing exactly what oldState was and having total control over "Predictable stuff".
Posted Sep 2, 2020 18:29 UTC (Wed)
by tanriol (guest, #131322)
[Link] (3 responses)
In the HMAC example, before the "mixing" you had an N-bit pool in one of the 2**N possible states, and when mixing every input state becomes an output state. However, as multiple input states can become a single output state, the number of possible states decreases after this operation. This looks like the birthday paradox setting, so I'd guess that a series of known random mappings applied to the pool can reduce the number of possible states to sqrt(2**N) at most.
Posted Sep 8, 2020 11:31 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
I don't see why a method that exploits this for real (on an actual HMAC with an actual N-bit pool) wouldn't also work perfectly well to collide arbitrary hashes, which is not supposed to be practical. But maybe you can explain why it doesn't do that?
Posted Sep 8, 2020 12:11 UTC (Tue)
by tanriol (guest, #131322)
[Link] (1 responses)
In the original statement you said that spewing a bunch of something does not benefit the adversary *at all*. The way I understood that was that the N-bit pool still has N bits of entropy. That's obviously not the case because the number of possible states goes down, and quick experiments with a really small toy HMAC confirm that (repeatedly running it causes the entropy to reduce from the original 16 bits to 8 or so, just as expected from the birthday paradox style configuration). To avoid that you need your primitive to be an (input-driven, obviously) permutation in the state space so that the number of states stays the same. There are constructions like this - IIUC, SHA3 uses one, unlike MD5/SHA1/SHA2 - but you need to pay attention to this.
On the other hand, for collision attacks N/2 is already the baseline due to the birthday paradox, so by itself this effect does not provide any improvements over that - a hash resistant to collision attacks would also be resistant to this effect.
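A toy version of that experiment, with a fixed random mapping standing in for the toy HMAC, shows the collapse toward roughly 2^(N/2) reachable states:

```python
# Repeatedly apply one fixed random (non-permutation) mapping to an
# N-bit state space and count how many states remain reachable.  The
# survivors are the mapping's cyclic points, roughly sqrt(pi/2 * 2^N)
# of them -- about 2^(N/2), the "birthday paradox" level.

import random

N = 12                                 # 12-bit toy pool
SIZE = 1 << N

rng = random.Random(20200813)          # fixed seed for reproducibility
f = [rng.randrange(SIZE) for _ in range(SIZE)]   # one fixed random mapping

states = set(range(SIZE))              # start from every possible state
while True:
    nxt = {f[s] for s in states}       # image of the current state set
    if len(nxt) == len(states):        # f is now injective on the survivors,
        break                          # so the set can shrink no further
    states = nxt

survivors = len(states)                # expect on the order of 2^(N/2) = 64
```

For N = 12 this typically leaves a few dozen states out of 4096, matching the "16 bits down to 8 or so" observation above when scaled; a state-update that is a keyed permutation (rather than a many-to-one mapping) avoids the shrinkage entirely.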
Posted Sep 9, 2020 10:00 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
Or would that 'random' re seeding cause issues?
Would appreciate review of the language used
Both could easily use up all the available entropy for other high need users.
It seems Linus is in a similar line of thinking - who cares if we can't *prove* that all these things are secure, throw a shitload of them into the mix and make the attackers job *extremely* difficult.
If you add a bunch of stuff that turns out to not be random, then you could make the state predictable if you didn’t get enough data with real randomness in. Linus is having an argument of degree here - this stuff looks pretty random, let’s add it in. He’s very much not suggesting you add ... whatever ... to the pool.
You can in fact "add whatever to the pool forever"