
Actually still confused

Posted May 1, 2015 10:10 UTC (Fri) by tpo (subscriber, #25713)
In reply to: Actually still confused by fandingo
Parent article: Random numbers from CPU execution time jitter

So I've been reading that paper [1] and think it is neither particularly clear nor precise. But my opinion of the paper is beside the issue I'm interested in, which is how the random generator actually works.

The one thing that has become clearer to me - thank you! - is that there exists a mechanism to add input data to the entropy pool which has the property of not reducing the entropy already in the pool, no matter what the quality of the new input data is. I've not verified that claim, but I assume it is true, it being a long-standing mathematical result. That's good news to me.
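
To make that property concrete for myself, here is a toy sketch in Python (SHA-256 standing in for the kernel's actual mixing function, which it is not): even when the attacker fully knows the data being mixed in, the number of states the pool could be in does not shrink.

```python
import hashlib

def mix(pool: bytes, data: bytes) -> bytes:
    # Toy mixing function: hash the old pool state together with the new input.
    return hashlib.sha256(pool + data).digest()

# Suppose the attacker knows the pool is one of 256 states (8 bits of entropy).
candidates = {mix(b"seed", bytes([i])) for i in range(256)}

# Mix in data the attacker fully knows (zero bits of entropy).
known = b"attacker-controlled"
after = {mix(c, known) for c in candidates}

# The attacker still faces 256 equally plausible states: entropy not reduced.
print(len(after))  # 256 (barring an astronomically unlikely hash collision)
```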

However you write:

> There is an entropy pool of data that is filled immediately and always stays full. Over time, this pool has new random data from a variety of sources mixed into it. As data is mixed in, the kernel estimates how much entropy it thinks is now in the pool and sets a counter appropriately. In the background, there is a kernel thread that checks a different output pool. If the pool isn't full, f(epool) is run to populate the output pool.

I think the contentious claim here is that "the entropy pool ... always stays full". If you mean "stays full" in the sense of "a stack that never has an element popped from it", then I agree, since the pool is a fixed-size structure that, even when "empty", still contains "something", even if it's all zeros. However, that is not what is relevant in this discussion. The relevant point is that by generating random data from the pool you transfer entropy out of it. I quote the paper:

"When k bytes need to be generated, ... k output bytes are generated
from this pool and the entropy counter is decreased by k bytes."

Thus if we measure "fullness" by the metric relevant here - the amount of entropy contained in the entropy pool - then the pool is *not* always full, and is in fact sometimes even empty, as in the case where ssh-keygen pulls random data out of /dev/random and blocks because the kernel is unable to refill the pool from its entropy sources.
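
The accounting the paper describes can be sketched like this (a toy model in Python, with made-up names, not the kernel's actual data structures): the estimate is credited when input is mixed in, debited when output is read, and a read blocks when the estimate runs out.

```python
import os

class ToyBlockingPool:
    """Toy model of /dev/random-style entropy accounting."""
    def __init__(self, size_bytes=64):
        self.size = size_bytes
        self.entropy = 0  # estimated entropy, in bytes

    def credit(self, nbytes):
        # Mixing in input raises the estimate, capped at the pool size.
        self.entropy = min(self.entropy + nbytes, self.size)

    def read(self, k):
        # /dev/random semantics: block when the estimate is too low.
        if self.entropy < k:
            raise BlockingIOError("would block: not enough estimated entropy")
        self.entropy -= k
        return os.urandom(k)  # stand-in for the real output function

pool = ToyBlockingPool()
pool.credit(16)
data = pool.read(16)   # drains the estimate to zero
# pool.read(1) would now raise, modelling ssh-keygen blocking on /dev/random
```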

All this said, the above is only my understanding acquired by reading what I have been referred to and what I could find. My understanding may well still be insufficient and wrong. If you've put up enough with an ignorant of my likeness then I can fully understand that. Otherwise I'll be happy to hear more and try to improve my understanding of the matter.

Thanks,
*t

[1] https://eprint.iacr.org/2012/251.pdf



Actually still confused

Posted May 1, 2015 13:52 UTC (Fri) by cesarb (subscriber, #6266)

Think of the pool's entropy as a "measure of unpredictability". If the entropy is ten bits, for instance, you'd need at most 1024 guesses to find the pool state.

You should think of the random generator as having two separate and independent parts: the pool itself and the output function.

Inputs are mixed into the pool using a cryptographic function which takes two values, the previous pool state and the input value, and outputs the new pool state. This function thoroughly mixes its inputs, such that a one-bit difference in either of them results, on average, in half of the output bits changing, and, other than by guessing, it's not possible to get from the output back to the inputs.
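
You can see that avalanche behaviour with any decent cryptographic hash; a quick Python sketch (SHA-256 here, merely as an illustration, not what the kernel uses for its pool):

```python
import hashlib

def pool_update(pool: bytes, data: bytes) -> bytes:
    # Toy mixing function: previous state plus input -> new state.
    return hashlib.sha256(pool + data).digest()

a = pool_update(b"\x00" * 32, b"input")
b = pool_update(b"\x00" * 32, b"inpuu")  # differs from b"input" in one bit

# Count differing bits between the two 256-bit outputs.
diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
print(diff)  # close to 128, i.e. about half of the 256 output bits
```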

Suppose you start with the pool containing only zeros. You add to it an input containing one bit of entropy. Around half of the pool bits will flip, and you can't easily reverse the function to get back the input, but since it's only one bit of entropy you can make two guesses and find one which matches the new pool state. Each new bit of entropy added doubles the number of guesses you need to make; but due to the pigeonhole principle, you can't have more entropy than the number of bits in the pool.
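
The guessing attack above is trivial to write down; a toy Python sketch, with SHA-256 as a stand-in mixing function, where three bits of entropy fall to at most eight guesses:

```python
import hashlib

def mix(pool: bytes, data: bytes) -> bytes:
    return hashlib.sha256(pool + data).digest()

start = b"\x00" * 32              # the pool's known all-zero initial state
secret = (5).to_bytes(1, "big")   # 3 bits of entropy: some value in 0..7
observed = mix(start, secret)

# With n bits of entropy, at most 2**n guesses recover the pool state.
found = None
for guess in range(8):            # 2**3 candidates
    if mix(start, guess.to_bytes(1, "big")) == observed:
        found = guess
        break
print(found)  # 5
```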

To read from the pool, you use the second part, the output function. It again is a cryptographic function which takes the whole pool as its input, mixes it together, and outputs a number. Other than by guessing, it's not possible to get from this output to the pool state.

Now let's go back to the one-bit example. The pool started with zero entropy (a fixed initial state), and got one bit of entropy added. It can now be in one of two possible states. It goes through the output function, which prevents one reading the output from getting back to the pool state. However, since there were only two possible states (one bit of entropy), you can try both and see which one would generate the output you got... and now the pool state is completely predictable, that is, it now has zero bits of entropy again! By reading from the pool, even with the protection of the output function, you reduced its entropy. Not only that, but there were only two possible outputs, so the output itself had only one bit of entropy, no matter how many bits you had asked for.
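
That two-state attack, sketched in Python (again with hashes standing in for the real pool and output functions):

```python
import hashlib

def mix(pool: bytes, data: bytes) -> bytes:
    return hashlib.sha256(pool + data).digest()

def out(pool: bytes) -> bytes:
    # Toy output function: hashes the whole pool down to a 64-bit output.
    return hashlib.sha256(b"out" + pool).digest()[:8]

# One bit of entropy: the pool is in one of exactly two states.
states = [mix(b"\x00" * 32, b"0"), mix(b"\x00" * 32, b"1")]
output = out(states[1])  # the "real" pool happened to take the second state

# The attacker tries both candidate states against the observed output,
# finds the single match, and now knows the pool state exactly: its
# entropy is back to zero. The 64-bit output itself had only two possible
# values, so it carried just one bit of entropy.
matches = [s for s in states if out(s) == output]
print(len(matches))  # 1
```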

Now if you read a 32-bit number from a pool with 33 bits of entropy, you can make many guesses and find out a possible pool state. However, again due to the pigeonhole principle, there are on average two possible pool states which will generate the same 32-bit output. Two pool states = one bit. So by reading 32 bits, you reduced the remaining entropy to one bit.
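
The pigeonhole counting is easy to check exhaustively with toy sizes; a Python sketch with a 5-bit pool and a 4-bit output (an illustration of the counting argument, nothing more):

```python
import hashlib
from collections import Counter

def out4(pool_state: int) -> int:
    # Toy output function mapping a 5-bit pool state to a 4-bit output.
    return hashlib.sha256(pool_state.to_bytes(1, "big")).digest()[0] % 16

# 32 possible pool states (5 bits), 16 possible outputs (4 bits).
preimages = Counter(out4(s) for s in range(32))

# Averaged over the 16 possible outputs, each output is consistent with
# 32 / 16 = 2 pool states: about one bit of entropy remains after a read.
print(sum(preimages.values()) / 16)  # 2.0
```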

This is important because if you can predict the pool state, you can predict what it will output next, which is obviously bad.

----

Now, why isn't the situation that bad in practice? First, the input entropy counter tends to underestimate the entropy being added (by design, since it's better to underestimate than to overestimate). Second, "by guessing" can take a long enough time to be impractical. Suppose you have a 1024-bit pool, and read 1023 bits from it. In theory, the pool state should be almost completely predictable: there should be only two possible states. In practice, you would have to do more than 2^1000 guesses (a ridiculously large number) before you could actually make it that predictable.
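
Just how ridiculous 2^1000 is can be seen with a bit of arithmetic (the 10^12 guesses/second rate is an arbitrarily generous assumption):

```python
# Even at 10**12 guesses per second, 2**1000 guesses take absurdly long.
seconds = 2**1000 // 10**12
years = seconds // (365 * 24 * 3600)
print(years > 10**280)  # True: the year count has roughly 280 digits
```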

However, that only applies after the pool got unpredictable enough. That's why the new getrandom() system call (which you should use instead of reading /dev/random or /dev/urandom) will always block (or return failure in non-blocking mode) until the pool has gotten enough entropy at least once.
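
From Python, getrandom() is exposed as os.getrandom() (Python 3.6+, Linux only), which by default blocks only until the pool has been initialized once; a small sketch, with a fallback so it also runs on platforms without the call:

```python
import os

# os.getrandom() blocks until the kernel pool has been initialized at
# least once, then (with default flags) never blocks again.
if hasattr(os, "getrandom"):
    key = os.getrandom(16)   # preferred: getrandom(2) under the hood
else:
    key = os.urandom(16)     # fallback for platforms without getrandom()
print(len(key))  # 16
```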


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds