Theoretical vs. practical cryptography in the kernel
Port numbers assigned to network sockets are not an especially secure item — they are only 16 bits, after all. That said, there is value in keeping them from being predictable; an attacker who can guess which port number will be assigned next can interfere with communications and, in the worst case, inject malicious data. Back in March, Amit Klein reported a port-guessing vulnerability to the kernel's security team; properly exploited, this vulnerability could be used to inject malicious answers to DNS queries, as one example.
The source of the problem comes down to how the kernel selects port numbers, which should be chosen randomly so as to not be guessable by an attacker. The kernel is able to generate random numbers that, as far as anybody knows, are not predictable, but doing so takes time — more time than the network stack is willing to wait. So, instead, the networking code calls prandom_u32(), which is a much simpler PRNG; it is effectively a linear-feedback shift register. That makes it fast, but unsuited to cryptographic operations; its output is a relatively simple function of its state, so anybody who can figure out what its internal state is can predict its output going forward. Klein, it seems, was able to do exactly that by observing the port numbers assigned by the kernel.
Since the problem results from an attacker having worked out the internal state of prandom_u32(), an obvious way to address the problem is to change that state, preferably before the attacker learns what it is. The patch that was applied (written by Willy Tarreau) does this by mixing some random data into the pool used by prandom_u32() on every interrupt. This reseeding is also done when process accounting is performed, just to ensure that it happens even if no interrupts are coming in. An attacker can figure out the internal state just as easily as before but, the reasoning goes, by the time that has been done and the information can be used, said state has changed.
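The dynamic is easy to see with a toy generator. The sketch below is a simplified stand-in (a single 16-bit LFSR, not the kernel's actual implementation) showing both halves of the story: an attacker who observes output can clone an LFSR's state and predict it perfectly, and mixing unseen noise into the state makes the clone diverge.

```python
# Toy illustration only: a single 16-bit Galois LFSR that returns its
# whole state word, so one observed output hands the attacker everything.
# The kernel's prandom_u32() is more elaborate, but the principle is the same.

def lfsr_step(state):
    """One step of a 16-bit Galois LFSR (tap mask 0xB400)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

# The victim generates "random" numbers; the attacker sees one of them.
victim = 0xACE1
observed = victim = lfsr_step(victim)

# The attacker clones the state from that single observed output and
# predicts the following values perfectly.
clone = observed
predictions, truth = [], []
for _ in range(5):
    clone = lfsr_step(clone)
    victim = lfsr_step(victim)
    predictions.append(clone)
    truth.append(victim)

assert predictions == truth   # the clone tracks the victim exactly

# The applied fix, in miniature: mix noise the attacker has not seen
# into the state.  The clone immediately diverges from the victim.
victim ^= 0x1234              # stand-in for interrupt-time noise
clone = lfsr_step(clone)
victim = lfsr_step(victim)
divergence = clone != victim
```

Of course, as described below, the attacker can try to recover the injected noise by brute force; the reseeding only helps if it happens faster than that search can run.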
The complaints
On August 4, Marc Plumb posted a request that Tarreau's patch be immediately reverted and rethought. His concerns came down to this: "it allows an attacker to determine the state and [from] there [the] output of /dev/random. I sincerely hope that this was not the intended goal :)". Since /dev/random is the kernel's "real" random-number generator, compromising its state could lead to far worse consequences than guessable port numbers; that was indeed not anybody's intended goal.
Perturbing the state of prandom_u32() doesn't seem like something that could wreck the security of an unrelated random-number generator, so it is worth looking into the specifics of the complaint. Those were best explained by George Spelvin (a pseudonym associated with a long-time occasional kernel developer). In short: an attacker might be able to work out the value of the additional random bits, and those same bits are fed into the kernel's main entropy pool.
As Spelvin explained, the motivation for adding noise to the PRNG's state is a fear that an attacker may know the current contents of that state. An attacker with that knowledge can, obviously, predict the future "random" values that will be generated by the PRNG. If you inject k bits of random data into the PRNG, the attacker will lose that knowledge — to an extent. But, if k is sufficiently small, the attacker can simply try the PRNG with all 2^k possibilities and see which one corresponds to the actual observed output. Once that has been done, the attacker has determined the value of those k bits and regained knowledge of the PRNG's internal state.
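That brute-force search can be sketched directly. The generator and output function below are illustrative toys (a textbook LCG exposing its top 16 bits), not kernel code: the attacker knows the pre-reseed state, enumerates all 2^k candidate noise values, and keeps the ones consistent with the observed output.

```python
# Illustrative sketch of the 2^k brute-force search described above.
# The LCG and 16-bit output window are stand-ins, not the kernel's PRNG.

M = 2 ** 32

def step(state):
    # One step of a linear congruential generator (Numerical Recipes constants).
    return (state * 1664525 + 1013904223) % M

def output(state):
    return state >> 16        # the attacker observes only the top 16 bits

def outputs_after(s, n=2):
    outs = []
    for _ in range(n):
        s = step(s)
        outs.append(output(s))
    return outs

k = 8
state = 0xDEADBEEF            # PRNG state, already known to the attacker
noise = 0b10110010            # k secret bits the defender mixes in

observed = outputs_after(state ^ noise)   # outputs seen after the reseed

# The attacker tries all 2^k possible noise values against those outputs;
# any surviving candidate reveals the reseeded internal state.
matches = [g for g in range(2 ** k) if outputs_after(state ^ g) == observed]
```

With k = 8 the search is a trivial 256 trials; only a k large enough to make 2^k infeasible — the "catastrophic reseeding" discussed below — defeats it.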
That might be bad enough, suggesting that the fix that was applied may be ineffective at actually closing the vulnerability. The other part of the complaint, though, is that the 32 bits applied to the PRNG come from the "fast pool", which can be thought of as a sort of queue of random bits waiting to be fed into /dev/random. These bits are not removed from the fast pool; they are still, eventually, used to add entropy to the primary random-number generator. This, it is argued, gives the attacker insight into the kernel's real random numbers and might compromise the random-number generator entirely.
There is general agreement in the kernel community that such a compromise is highly unlikely to lead to anything good. That is where the agreement stops, though.
If nothing else, Spelvin argued, the injection of random data into the PRNG must be done in a way that does not allow the guessing of that data. In practice, that means making k sufficiently large that a brute-force attack to calculate k bits becomes impractical; this "catastrophic reseeding" makes the use of that data safer. Even better, of course, would be to replace the simple PRNG with something that does not so readily expose its internal state.
Responses
Tarreau responded that the assumption that the attacker knows the PRNG state is not warranted; the purpose of adding the noise was to make guessing that state much more difficult. If the time required to calculate the PRNG's internal state is longer than the time between perturbations, there should never be a point where the attacker truly knows that state. If that is true, then concerns about exposing the random bits added to the PRNG go away.
He also questioned the practicality of any attack on /dev/random even if those bits are exposed.
Linus Torvalds's response was rather more direct and categorical:
Because I don't think they exist. And I think it's actively dangerous to continue on the path of thinking that stable and "serious" algorithms that can be analyzed are better than the one that end up depending on random hardware details.
More generally, he said, the insistence on theoretical security has often had the effect of making the kernel less secure, and he had a number of examples to cite. The random-number generator traditionally blocked in the absence of sufficient entropy to initialize it, causing users to avoid it entirely; this issue was only addressed in late 2019. The insistence that only "analyzable" entropy sources could be used to feed the random-number generator has led to systems with no sources of entropy at all, resulting in vast numbers of embedded devices having the same "random" state. And more; the whole message merits a read.
This same problem, he added later, delayed the resolution of the port-number-guessing problem.
The addition of noise, he said, can also be useful in the face of a Meltdown-style vulnerability. In that case, an attacker may well be able to read out the state of the random-number generator, but the regular mixing of noise should limit the usefulness of that information. In general, adding noise is the opposite of the "analyzability" that (he argues) crypto people want, and that is a good thing. It is better, he said, if the internal state of any random-number generator is not susceptible to analysis. Torvalds was quite clear that he would entertain no patches addressing theoretical problems.
Where to from here
It seems evident that the patch in question will not be reverted anytime soon; that would require the demonstration of a practical attack, and it is far from clear that anybody can do that. So, for now, port numbers generated by Linux should be a bit harder to guess and, presumably, the random pool as a whole remains uncompromised.
That said, nobody has felt the need to defend the simple PRNG that underlies prandom_u32(); the door is clearly open to replacing it outright if a suitable alternative can be found. That alternative would have to be something that does not make guessing its internal state so easy, but it would also have to retain the performance characteristics of the current implementation. Torvalds has proclaimed that he will not accept a replacement that slows things down.
Spelvin has proposed a replacement described as "a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key"; Tarreau played with it some and discussions over how to improve it have been ongoing. Tarreau also said that he is working on a replacement based on the Middle Square Weyl Sequence random-number generator [PDF], but no patches have been posted yet.
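A generator along the lines Spelvin describes might look something like the sketch below. The seeding layout (SipHash's initialization constants XORed with a 128-bit key) and the output function are illustrative guesses based only on the quoted description, not the code that was actually posted.

```python
# A hypothetical PRNG built on the standard SipHash round function,
# seeded with 128 bits of key.  State layout and output function are
# guesses for illustration; this is not Spelvin's posted patch.

MASK = (1 << 64) - 1

def rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK

def sipround(v0, v1, v2, v3):
    # The standard SipHash ARX round.
    v0 = (v0 + v1) & MASK; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK; v3 = rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32)
    return v0, v1, v2, v3

class SipRand:
    def __init__(self, k0, k1):
        # SipHash's initialization constants XORed with the 128-bit key.
        self.v = [k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D,
                  k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573]

    def next_u64(self):
        self.v = list(sipround(*sipround(*self.v)))
        return (self.v[0] ^ self.v[3]) & MASK

a = SipRand(0x0123456789ABCDEF, 0xFEDCBA9876543210)
b = SipRand(0x0123456789ABCDEF, 0xFEDCBA9876543210)
c = SipRand(0, 1)

seq_a = [a.next_u64() for _ in range(4)]
seq_b = [b.next_u64() for _ in range(4)]
seq_c = [c.next_u64() for _ in range(4)]
```

Unlike an LFSR, the output here is not a simple linear function of the state, so observing outputs does not readily reveal the 256-bit internal state — at the cost of more work per number generated, which is exactly the tradeoff under debate.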
Meanwhile, Ted Ts'o, the maintainer of the kernel's random-number generator, has expressed worries that perhaps prandom_u32() is being used in places where it shouldn't be.
Replicating that grep suggests that there are nearly 500 call sites to be worried about — and that doesn't count any BPF programs in the wild using bpf_get_prandom_u32(). Replacing prandom_u32() might help to mitigate such concerns, but it will not eliminate them. The whole point of prandom_u32() is that performance takes priority over security, so if it is being used in places that require a proper random-number generator, there may indeed be a need for some changes in the near future.
Overall, random-number generation remains a surprisingly difficult problem. It's hard to predict which issues will occupy kernel developers in, say, 2030, but chances are good that random numbers will be one of them. Meanwhile, Linux has always been a bit of a battleground between theoretically superior solutions and those that are deemed to work in real-world situations; that, too, is unlikely to change anytime soon.
Index entries for this article:
Kernel: Random numbers
Kernel: Security/Random number generation
Posted Aug 13, 2020 21:56 UTC (Thu)
by mrecho (guest, #53167)
[Link]
Posted Aug 13, 2020 22:03 UTC (Thu)
by jkingweb (subscriber, #113039)
[Link]
Posted Aug 13, 2020 23:32 UTC (Thu)
by ms-tg (subscriber, #89231)
[Link] (5 responses)
Given how recently the terms of respectful discussion of the Linux kernel were a focus of articles at LWN, I was wondering if it might benefit all of us to also write an article reviewing the tone and forms taken in this discussion, in addition to this excellent coverage of the substance of it?
Posted Aug 14, 2020 18:58 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
Is George Spelvin a theoretician?
I guess George was simply refusing to see Linus' point of view that there's no point having a system that doesn't work. I have the same problem with filesystem devs, especially the transition from ext3 to 4 ... by accident, ext3's journal made sure that the CONTENTS of files were saved. ext4 changed that. The filesystem guys' argument seemed to be "but we can get the computer up and running quicker if we don't care about the contents". But if you DON'T KNOW whether the user's data has been corrupted, a few minutes less booting the system is most definitely NOT a worthwhile tradeoff against having to run a data restore or integrity check over a multi-terabyte database ...
At the end of the day, the point of the computer is not to run the operating system - it's to do jobs for users, and unlike a lot of computer guys Linus doesn't forget that. That's why linux has conquered pretty much everything except the desktop - and that's why MS is so hard to dislodge from the desktop, because they know that too ...
Cheers,
Wol
Posted Aug 15, 2020 15:38 UTC (Sat)
by nivedita76 (subscriber, #121790)
[Link]
And let's please not rehash the ext3 vs ext4 flamewar.
Posted Aug 15, 2020 6:34 UTC (Sat)
by LtWorf (subscriber, #124958)
[Link]
Posted Aug 15, 2020 7:22 UTC (Sat)
by ragnar (guest, #139237)
[Link]
Posted Aug 31, 2020 12:29 UTC (Mon)
by milesrout (subscriber, #126894)
[Link]
It's an internet mailing list. People say 'FFS' on internet mailing lists and have done since before I was born. Get used to it or get off the internet.
Posted Aug 14, 2020 1:02 UTC (Fri)
by robert.cohen@anu.edu.au (subscriber, #6281)
[Link] (3 responses)
How hard would it be to remove the bits from the "fast pool" when they are added to prandom_u32()? That way they wouldn't be added to both RNGs.
Posted Aug 14, 2020 7:21 UTC (Fri)
by kleptog (subscriber, #1183)
[Link] (2 responses)
I also think the issue is way overblown. So 32 bits of this fast pool are used elsewhere. Even if they were completely exposed, it doesn't help you with the other 96 bits, and they are eventually mixed even more before being fed to the actual pool.
If you require that all inputs to your random number generator be verifiably random then you have set your goals impossibly high as the article states. Entropy is a theoretical measure, there is no actual way to determine how random something actually is. You'd need to examine the entire multiverse and count the number of universes which are (apparently) identical except for those bits. A better approach seems to me to collect as much unpredictable data as you can and keep on mixing. Estimating the actual randomness is (IMHO) a fool's game.
Posted Aug 16, 2020 4:36 UTC (Sun)
by rahvin (guest, #16953)
[Link]
Posted Aug 31, 2020 11:52 UTC (Mon)
by cpitrat (subscriber, #116459)
[Link]
Hold my beer ...
Posted Aug 14, 2020 2:45 UTC (Fri)
by flussence (guest, #85566)
[Link] (22 responses)
(I guess this would sound more appealing if we didn't already have getrandom(2)…)
Posted Aug 14, 2020 18:47 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
If a flag says "wait until sufficient entropy is available" how many people will use it anyway, and then wonder why their system fails to boot ...
At the end of the day if programmers don't understand (or more likely bother to read) the documentation, ANY attempted fix is going to be a disaster.
Cheers,
Wol
Posted Aug 27, 2020 9:50 UTC (Thu)
by Vipketsh (guest, #134480)
[Link]
The real issue is that there is no reasonable response a program can make when it is told "no". If you asked for random, you need that random to continue, there is no way around that. You can read or write as much documentation as you want, but the situation doesn't change.
Posted Aug 14, 2020 19:49 UTC (Fri)
by shemminger (subscriber, #5739)
[Link] (19 responses)
Posted Aug 14, 2020 21:04 UTC (Fri)
by atnot (subscriber, #124910)
[Link] (18 responses)
Posted Aug 14, 2020 22:38 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
If the PRNG is not very secure, as in this case, then it is possible for an attacker to work out the PRNG's state. Once a plausible adversary has the ability to determine the value of an input to the PRNG (i.e. its seed), then that seed is no longer unpredictable, and it is not unreasonable to call it "used up" in that case, particularly if the same seed has also been reused for other purposes under the assumption that it was unpredictable. However, this characterization is problematic, because it encourages the (wrong) assumption that generating N bits of random data "uses up" f(N) bits of input entropy for some f() - which is certainly not the case with a proper CSPRNG. This was the whole problem with the random(4) interface randomly blocking for (in most cases) no good reason.
Posted Aug 15, 2020 3:06 UTC (Sat)
by Paf (subscriber, #91811)
[Link] (16 responses)
The idea of reseeding (/mixing in new entropy in to) a prng, by the way, is much like the idea of re-keying in cryptography. It resets the search space for anyone trying to crack it. But if only one bit was added...? So we *have* to try to measure entropy.
What if we made heavy use of a source that turns out to be predictable or mostly predictable? The effective state space of the prng seed is dramatically reduced.
You say if the state of a prng is not known it doesn’t get easier to crack as its use continues. Given this, why does anyone ever rekey in cryptography? There are in fact many cryptographic attacks that require large quantities of cypher text (obviously with the same key). So, you rekey a lot, whether or not you know such an attack exists. Same for the prng handling ‘random‘.
So, we *have* to try to measure/guess entropy of our inputs - attackers certainly will. And, sure, we can’t do it perfectly, but we can get a decent idea of how predictable things are in practice, and use that.
Posted Aug 15, 2020 13:55 UTC (Sat)
by kleptog (subscriber, #1183)
[Link] (5 responses)
I don't see how this follows. Since entropy is only additive there is no reason not to simply include any and every piece of data that could possibly contain some entropy. For attackers it might be interesting to try and estimate the entropy, but for the system (Linux in this case) the only useful tactic is to be adding entropy faster than an attacker can constrain the search space.
Rekeying in data streams is indeed for resetting the search space. For a CSPRNG this means adding entropy equal to the size of the internal state. However, unlike in a data stream you don't actually need to know or care when this happens. There's no actual reason to be trying to estimate the entropy since it doesn't actually make a difference.
So throw in the block numbers of data being read from disk. Do they contain entropy? Who knows? Let a theoretical attacker worry about it. We've at least wasted their time having to think about it.
Posted Aug 15, 2020 18:38 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (2 responses)
Cheers,
Wol
Posted Aug 16, 2020 5:37 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Aug 26, 2020 20:04 UTC (Wed)
by amarao (guest, #87073)
[Link]
Posted Aug 16, 2020 18:55 UTC (Sun)
by nivedita76 (subscriber, #121790)
[Link] (1 responses)
The reason to reseed any PRNG, CS or not, is because you are concerned about an attacker using some means to exfiltrate the state of the PRNG (for a non-CSPRNG, this might be as easy as observing a moderate number of outputs, an LFSR just requires observing a number of outputs that is proportional to the state size, for example).
Given the scenario that reseeding is protecting against, adding small amounts of entropy at a time provides almost no additional security, and has two problems. Firstly, it effectively "wastes" that entropy: if you had waited to accumulate enough entropy and add it all at the same time, you could actually have gained security. Secondly, it actually reveals the bits you are mixing in to the attacker. If those bits are then also used later to seed a different PRNG, you've reduced the security of that PRNG as well, for no gain in security of the first one.
Posted Aug 24, 2020 14:16 UTC (Mon)
by ecree (guest, #95790)
[Link]
Except that this isn't actually true.
Viewing the attack from the information-theoretical perspective, there is _no difference_ between "mix in the entropy in dribs and drabs as you collect it" and "save it up until you can catastrophically re-seed": either you're adding entropy faster than it leaks (in which case you're fine) or you're leaking state faster than you add entropy (in which case you're screwed).
The difference only appears when you consider the amount of work required to brute-force-invert the mixing function _if_ state is leaking fast enough to allow the information-theoretic attack; partial-state-extension on frequent small re-seedings adds up to less total work than one big re-seed because the 2^k factor dominates.
Catastrophic re-seeding is thus important if your security depends on the cryptographic strength of your PRNG, but _not_ if your security depends on adding noise to the state faster than the attacker can obtain information to constrain that state (e.g. by observing outputs) — in which case cat reseed may actually _harm_ you by letting the attacker reach a greater partial state in the (longer) interval between re-seed events.
> for a non-CSPRNG, this might be as easy as observing a moderate number of outputs, an LFSR just requires observing a number of outputs that is proportional to the state size, for example
From the information-theoretical perspective, the same is true of any PRNG, CS or no, because you can always enumerate all possible states and see which one matches. Observing (on average) n consecutive outputs and doing O(n*2^n) work will crack an n-bit CSPRNG just fine. The difference with the LFSR is that there's a fairly easy 'cryptanalysis' that lets you do it with much less work than brute force.
One of the things that drove me crazy reading the original thread on netdev was the way some people (especially Spelvin) equivocated between the information-theoretical argument and the work argument. Tbh it looked like taking the Schneier cat-reseed paper and cargo-culting it into some kind of panacea-cum-universal-necessity.
Posted Aug 15, 2020 16:09 UTC (Sat)
by imMute (guest, #96323)
[Link] (9 responses)
One answer is damage mitigation. If the encryption key gets leaked, how much data is immediately available?
>What if we made heavy use of a source that turns out to be predictable or mostly predictable? The effective state space of the prng seed is dramatically reduced.
That's why you use more than one source of entropy. Suppose you don't trust RDRAND. I say mix it into the entropy pool anyway and just don't "credit" those bits. So far I've not seen anyone show that an untrusted entropy source is capable of *reducing* entropy of the pool.
Posted Aug 15, 2020 23:18 UTC (Sat)
by excors (subscriber, #95769)
[Link]
There's the idea shown in "Prototyping an RDRAND Backdoor in Bochs" (https://archive.org/stream/pocorgtfo03#page/n17/mode/1up). If code does something like "entropy_pool ^= RDRAND()" to mix in the new randomness, a malicious implementation of RDRAND could return "entropy_pool ^ x" where x is a random-looking value that is known to the attacker, cancelling out all the previous entropy. To someone who doesn't know x, the output of RDRAND would still look perfectly random, and the CPU is otherwise following the architecture specification perfectly (which makes it harder for users to observe any malicious behaviour).
That's probably not very useful in an emulator, because a malicious emulator could leak the content of entropy_pool in plenty of other ways, and it's probably difficult in a physical CPU without affecting the timing or performance counters in a way that someone would eventually notice. But still, it means in principle RDRAND can reduce entropy without doing anything other than returning a carefully constructed value.
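The cancellation described above is easy to show in miniature; the values below are arbitrary stand-ins.

```python
# Sketch of the RDRAND-backdoor idea: if the pool is updated with
# "pool ^= rdrand()", a malicious rdrand that can see the pool may return
# pool ^ x, erasing all accumulated entropy and leaving an attacker-chosen
# value.  All numbers here are arbitrary illustrations.

attacker_x = 0x5DEECE66D              # known only to the attacker
pool = 0x81F5CA3B9E24D07A             # the pool's accumulated secret entropy

def malicious_rdrand(current_pool):
    # Looks perfectly random to anyone who doesn't know attacker_x.
    return current_pool ^ attacker_x

pool ^= malicious_rdrand(pool)        # the usual "mix in more entropy" step
# pool is now exactly attacker_x: all previous entropy has cancelled out.
```

This is why mixing functions that hash the input together with the old state (rather than plain XOR) are preferred: the malicious source would then need to invert the hash to steer the pool.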
Posted Aug 16, 2020 16:23 UTC (Sun)
by Paf (subscriber, #91811)
[Link] (7 responses)
Cryptography is hard; the idea that we can add whatever to the pool forever is a dangerous one (the possibility of deliberately tainted inputs noted above, for example). I don’t think Linus is making that argument. He’s arguing about where the fence goes, not about removing it.
Posted Aug 16, 2020 16:49 UTC (Sun)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Umm, I don't think that's how it works at all. Adding a gigabyte of zeros to the pool doesn't reduce the entropy in it at all. It doesn't add anything either, but it doesn't hurt.
Tainted inputs are an issue, but that just means to need to add as many sources as you can find, and you have to be careful how you add things. If any untainted source is included you have basically made it infeasible for the attacker.
Posted Aug 20, 2020 4:43 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
> Umm, I don't think that's how it works at all. Adding a gigabyte of zeros to the pool doesn't reduce the entropy in it at all. It doesn't add anything either, but it doesn't hurt.
The original quote is in fact true, but does not mean what you think it means. If you didn't get enough data with real randomness, then you could have a predictable state. This happens regardless of whether or not you also added a bunch of stuff that wasn't random.
The problem, therefore, is whether they should do something with the entropy accounting, not whether they should allow reusing the fast pool bits in this fashion at all.
Posted Aug 17, 2020 7:18 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (4 responses)
HMAC(oldState, "Predictable stuff") -> newState
Doesn't tell us anything about oldState or newState, even though we're quite certain what "Predictable stuff" is. Worse, if we're a traitor inside the machine we can't usefully influence newState despite knowing exactly what oldState was and having total control over "Predictable stuff".
Posted Sep 2, 2020 18:29 UTC (Wed)
by tanriol (guest, #131322)
[Link] (3 responses)
In the HMAC example, before the "mixing" you had an N-bit pool in one of the 2**N possible states, and when mixing every input state becomes an output state. However, as multiple input states can become a single output state, the number of possible states decreases after this operation. This looks like the birthday paradox setting, so I'd guess that a series of known random mappings applied to the pool can reduce the number of possible states to sqrt(2**N) at most.
Posted Sep 8, 2020 11:31 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
I don't see why a method that exploits this for real (on an actual HMAC with an actual N-bit pool) wouldn't also work perfectly well to collide arbitrary hashes, which is not supposed to be practical. But maybe you can explain why it doesn't do that?
Posted Sep 8, 2020 12:11 UTC (Tue)
by tanriol (guest, #131322)
[Link] (1 responses)
In the original statement you said that spewing a bunch of something does not benefit the adversary *at all*. The way I understood that was that the N-bit pool still has N bits of entropy. That's obviously not the case because the number of possible states goes down, and quick experiments with a really small toy HMAC confirm that (repeatedly running it causes the entropy to reduce from the original 16 bits to 8 or so, just as expected from the birthday paradox style configuration). To avoid that you need your primitive to be an (input-driven, obviously) permutation in the state space so that the number of states stays the same. There are constructions like this - IIUC, SHA3 uses one, unlike MD5/SHA1/SHA2 - but you need to pay attention to this.
On the other hand, for collision attacks N/2 is already the baseline due to the birthday paradox, so by itself this effect does not provide any improvements over that - a hash resistant to collision attacks would also be resistant to this effect.
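A toy version of that experiment, with a fixed random mapping standing in for the toy HMAC, shows the collapse toward roughly 2^(N/2) reachable states:

```python
# Repeatedly apply one fixed random (non-permutation) mapping to an
# N-bit state space and count how many states remain reachable.  The
# survivors are the mapping's cyclic points, roughly sqrt(pi/2 * 2^N)
# of them -- about 2^(N/2), the "birthday paradox" level.

import random

N = 12                                 # 12-bit toy pool
SIZE = 1 << N

rng = random.Random(20200813)          # fixed seed for reproducibility
f = [rng.randrange(SIZE) for _ in range(SIZE)]   # one fixed random mapping

states = set(range(SIZE))              # start from every possible state
while True:
    nxt = {f[s] for s in states}       # image of the current state set
    if len(nxt) == len(states):        # f is now injective on the survivors,
        break                          # so the set can shrink no further
    states = nxt

survivors = len(states)                # expect on the order of 2^(N/2) = 64
```

For N = 12 this typically leaves a few dozen states out of 4096, matching the "16 bits down to 8 or so" observation above when scaled; a state-update that is a keyed permutation (rather than a many-to-one mapping) avoids the shrinkage entirely.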
Posted Sep 9, 2020 10:00 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
Or would that 'random' re seeding cause issues?
Would appreciate review of the language used
Both could easily use up all the available entropy for other high need users.
It seems Linus is in a similar line of thinking - who cares if we can't *prove* that all these things are secure, throw a shitload of them into the mix and make the attackers job *extremely* difficult.
If you add a bunch of stuff that turns out to not be random, then you could make the state predictable if you didn’t get enough data with real randomness in. Linus is having an argument of degree here - this stuff looks pretty random, let’s add it in. He’s very much not suggesting you add ... whatever ... to the pool.
You can in fact "add whatever to the pool forever"