Random numbers and virtual-machine forks
The kernel's RNG maintains an "entropy pool" from which random numbers are derived. As randomness from the environment is harvested, it is mixed into the pool, keeping the level of entropy up. Every running system has its own pool, naturally, with its own internal state. If two systems were to somehow end up with their entropy pools containing the same data, they would produce the same sequence of random numbers, for a while at least. That is something that should never happen.
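To make that structure concrete, here is a toy sketch of the pattern (emphatically not the kernel's actual algorithm, which uses real cryptographic primitives): harvested samples are mixed into a fixed-size pool, and outputs are derived from the pool's state. Two systems whose pools held identical contents would emit identical outputs until their mix-ins diverge.

```c
/*
 * Toy illustration of an "entropy pool": harvested event data is mixed
 * into a fixed-size pool, and outputs are derived from the pool state.
 * This is NOT the kernel's algorithm; it only shows the mix-in/extract
 * structure described above.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define POOL_WORDS 8

static uint64_t pool[POOL_WORDS];

/* Mix one 64-bit sample into the pool with a simple diffusion step. */
static void pool_mix(uint64_t sample)
{
    for (int i = 0; i < POOL_WORDS; i++) {
        pool[i] ^= sample + 0x9e3779b97f4a7c15ULL * (i + 1);
        pool[i] = (pool[i] << 13) | (pool[i] >> 51);   /* rotate */
        sample ^= pool[i];
    }
}

/* Derive one output word; a real design would use a cryptographic hash. */
static uint64_t pool_extract(void)
{
    uint64_t out = 0;

    for (int i = 0; i < POOL_WORDS; i++)
        out ^= pool[i];
    pool_mix(out);                      /* perturb the pool between outputs */
    return out;
}

int main(void)
{
    struct timespec ts;

    /* "Harvest" some environmental data: here, just clock readings. */
    for (int i = 0; i < 4; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        pool_mix((uint64_t)ts.tv_sec ^ (uint64_t)ts.tv_nsec);
    }
    printf("%016llx\n", (unsigned long long)pool_extract());
    printf("%016llx\n", (unsigned long long)pool_extract());
    return 0;
}
```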
But, as Donenfeld pointed out in a patch set first released in February, there is a way that two systems can end up with the same entropy-pool content. If a running virtual machine is somehow duplicated, the entropy pool will be duplicated with it. This can happen if a machine is checkpointed and restored, or if it forks for any reason. Once the restored or forked machine starts running, it will reproduce the sequence of random data created by the previous instance until the addition of new entropy perturbs the pool.
Microsoft, it seems, has already addressed this concern in Windows; the solution takes the form of a "virtual-machine ID" made available via the ACPI firmware. When a virtual machine forks or is restarted, the ID is changed. The kernel can watch this value and, on noticing that it has changed, conclude that some sort of virtual-machine fork has occurred and that action is necessary to keep the random-data stream unique. Some virtualization systems, including QEMU, have implemented this functionality, so it makes sense for Linux to make use of it as well.
The patch set thus adds a new "vmgenid" driver that makes the virtual-machine ID available to the kernel. When this driver is notified (by the ACPI firmware) of a change, it checks the ID and, if that has changed, calls a new function (crng_vm_fork_inject()) to inform the RNG that it needs to muddy up the entropy pool. This is done by mixing in that same virtual-machine ID. It is not claimed to be the ultimate in security, but it does address the immediate problem; Donenfeld intends to merge this work for 5.18.
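The shape of that logic can be sketched in a few lines of plain C (this is an illustration, not the driver itself; read_vmgenid() is a placeholder for the ACPI access, and the crng_vm_fork_inject() prototype shown here is assumed rather than taken from the patches):

```c
/*
 * Sketch of the vmgenid logic: compare a cached copy of the 128-bit
 * generation ID with a freshly read one and, on a change, hand the new
 * ID to the RNG. read_vmgenid() stands in for the real ACPI access;
 * the crng_vm_fork_inject() prototype is assumed, not the kernel's.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint8_t cached_id[16];

/* Placeholder: a real driver reads this from the buffer that the ACPI
 * VMGENID device points at. */
static void read_vmgenid(uint8_t id[16])
{
    memset(id, 0x42, 16);
}

/* Stand-in for the RNG hook named in the article (prototype assumed). */
static void crng_vm_fork_inject(const void *unique_id, size_t len)
{
    (void)unique_id;
    printf("generation ID changed (%zu bytes): perturbing entropy pool\n", len);
}

/* Called when the firmware notifies us that something may have changed. */
static void vmgenid_notify(void)
{
    uint8_t fresh[16];

    read_vmgenid(fresh);
    if (memcmp(fresh, cached_id, sizeof(fresh)) != 0) {
        crng_vm_fork_inject(fresh, sizeof(fresh));
        memcpy(cached_id, fresh, sizeof(fresh));
    }
}

int main(void)
{
    read_vmgenid(cached_id);   /* snapshot the ID at "boot" */
    vmgenid_notify();          /* no change: nothing happens */
    cached_id[0] ^= 1;         /* simulate a fork changing the ID */
    vmgenid_notify();          /* change detected: RNG gets reseeded */
    return 0;
}
```

In the real driver, the comparison happens in an ACPI notification handler, which is what creates the race window Donenfeld complains about below.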
This project does not stop there, though; in a later email, Donenfeld described the changes he would like to make next. He started by complaining about the design of Microsoft's solution, which has a race condition designed into it. The kernel cannot respond to a virtual-machine fork until it notices that the generation ID has changed; the new virtual machine could run for some time before that happens, and it will generate duplicate random numbers during that time. It would have been better, he said, to provide a simple "generation counter" that could be quickly polled by the CPU every time random data is requested; that would allow a change to be caught immediately. "But that's not what we have, because Microsoft didn't collaborate with anybody on this, and now it's implemented in several hypervisors".
Having gotten that off his chest, he proceeded to the real task at hand: propagating the news about a virtual-machine fork to other interested kernel subsystems. He originally envisioned creating a new power-management event to serve as a sort of notifier, but that was seen as an awkward fit; a virtual-machine fork isn't really related to power management in the end. So Donenfeld posted a new solution creating a separate notifier (using the kernel's existing notifier mechanism) to inform subsystems. The first user is, unsurprisingly, the WireGuard VPN, which needs to know about such events:
When a virtual machine forks, it's important that WireGuard clear existing sessions so that different plaintext is not transmitted using the same key+nonce, which can result in catastrophic cryptographic failure.
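A consumer of that notification would use the kernel's ordinary notifier_block pattern, roughly as sketched below; the registration helpers are named as in the posted patches and should be treated as illustrative rather than a settled API.

```c
/*
 * Sketch of a kernel module subscribing to VM-fork notifications via
 * the standard notifier mechanism. The register/unregister helper
 * names follow the posted patches; treat them as illustrative.
 */
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/random.h>

static int vmfork_event(struct notifier_block *nb, unsigned long action,
                        void *data)
{
    /* A WireGuard-like user would discard its stale sessions here. */
    pr_info("VM fork detected: discarding stale cryptographic state\n");
    return NOTIFY_DONE;
}

static struct notifier_block vmfork_nb = {
    .notifier_call = vmfork_event,
};

static int __init vmfork_demo_init(void)
{
    return register_random_vmfork_notifier(&vmfork_nb);
}

static void __exit vmfork_demo_exit(void)
{
    unregister_random_vmfork_notifier(&vmfork_nb);
}

module_init(vmfork_demo_init);
module_exit(vmfork_demo_exit);
MODULE_LICENSE("GPL");
```

WireGuard's actual callback clears established sessions rather than just logging, per the quoted rationale above.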
User-space code may benefit from knowing about virtual-machine forks as well; for example, the Samba server may want to reset sessions in that situation. For user space, Donenfeld proposes adding a new virtual file that programs can poll for VM-fork notifications. This file is currently located in /proc/sys, even though it is not a true sysctl knob in that it cannot be written to as a way of tuning system behavior.
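A user-space consumer might look roughly like the following; the file path is a placeholder (the article only says the file lives under /proc/sys), and the poll semantics follow the usual convention for pollable sysctl files, where POLLPRI/POLLERR signals a change.

```c
/*
 * Sketch of a user-space program waiting for VM-fork notifications on
 * the proposed pollable file. The path below is a placeholder, not the
 * name used by the patches.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int fd = open("/proc/sys/kernel/random/vmfork", O_RDONLY); /* placeholder path */

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (read(fd, buf, sizeof(buf)) < 0)     /* initial read arms the poll */
        perror("read");

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLPRI };

        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & (POLLPRI | POLLERR))) {
            /* The VM has been forked: reset sessions, reseed PRNGs, etc. */
            fprintf(stderr, "VM fork detected\n");
            lseek(fd, 0, SEEK_SET);
            if (read(fd, buf, sizeof(buf)) < 0)
                perror("read");
        }
    }
}
```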
Response to this work has been positive, overall; kernel developers tend to have little appetite for catastrophic cryptographic failure. That said, Greg Kroah-Hartman did observe: "It seems crazy that the 'we just were spawned as a new vm' notifier is based in the random driver, but sure, put it here for now!" So this work, too, seems destined for merging for the 5.18 kernel release. That should help to close a vulnerability that many of us may not have ever been aware existed.
| Index entries for this article | |
|---|---|
| Kernel | Random numbers |
| Kernel | Security/Random number generation |
| Kernel | Virtualization |
Posted Mar 11, 2022 18:43 UTC (Fri) by bof (subscriber, #110741)
I mean, duplicate MACs? Duplicate IPs? Huh? What am I missing?
Posted Mar 11, 2022 21:14 UTC (Fri) by josh (subscriber, #17465)
Fast per-client VMs: boot your VM to the point where it's listening for connections, then fork it.
You don't need to worry about IPs or MAC addresses if your VMM is handling routing and giving your VM an internal-use-only IP.
Posted Mar 12, 2022 0:58 UTC (Sat) by iabervon (subscriber, #722)
There are various applications for checkpoint+restore+restore+..., where the copies are mostly not contemporaneous. It's really nice for producing test environments that aren't full of the remains of previous testing, for example, but you generally don't need to have your pristine copy running all the time and being forked into new copies while live.
Posted Mar 12, 2022 0:00 UTC (Sat) by ejr (subscriber, #51652)
There are "true" random numbers. Defining that (to me) is as difficult as defining true.
There are "cryptographic" random numbers. Far, far outside my bailiwick.
There are "sufficient" but reproducible random numbers. A climate simulation may depend on random numbers for everything from error bounds to intermediate perturbations. If you cannot reproduce the results across a variety of hardware, using those results in international negotiations is, um, tricky. Similar for software debugging where it is my problem.
And certainly there are more.
Randomness is our construct. In our views of nature, sometimes random becomes a pattern. The issue in my uneducated opinion is defining what "random" means in each context. I work with randomized algorithms, so I understand *a few* of those contexts. It ain't easy.
Posted Mar 12, 2022 1:14 UTC (Sat) by NYKevin (subscriber, #129325)
Here's my attempt: A number X is "truly" random if:
1. We know that X follows some specific distribution. For example, we know that "the number of radioactive decay events in an hour" (or a minute, or a second) follows a Poisson distribution with the lambda parameter determined by the quantity of the radioactive element and its half-life (as well as the length of the period sampled). Similarly, "the number of heads when you flip ten fair coins" follows a binomial distribution with p=0.5 and n=10.
2. We know that CDF(X) is not correlated with any quantity that can be physically measured, other than X itself. For radioactive decay events, this is believed to be true under our current theory of physics. For coin flips, it may be correlated with the initial orientation of the coin as well as other physical factors, but those are certainly very *hard* to measure.
3. Nobody knows the value of X (yet).
> There are "cryptographic" random numbers. Far, far outside my bailiwick.
To a rough approximation, "cryptographically secure" means "close enough to truly random that it's unrealistic that an attacker will be able to predict the value in practice." Depending on the application, there may be an allowance for "the attacker can make limited predictions, but they are not accurate enough to be useful for real, plausible cryptanalytic attacks against a particular algorithm." For example, AES-256 provides 256-bit security, meaning that a naive brute-force attack would require trying 2^256 possible keys. If the key were generated in such a way that the attacker can imperfectly predict 2 bits of it, then it still has at least 254-bit security, and 2^254 is still stupidly huge. In practice it's more complicated than that, because you don't actually do a brute-force attack; instead you do various more complicated attacks, which I don't fully understand, and those don't require the full 2^256 attempts in the first place. But that's OK, because even a much "smaller" number like 2^100 is still too big for a practical attack.
Posted Mar 12, 2022 7:02 UTC (Sat) by wtarreau (subscriber, #51152)
That's why, in the end, it's important to combine as many dependencies as possible. Even if some of them are observable by some parties, as long as you mix in as many different ones as possible, you progressively remove the ability of any observer to observe the whole model.
In the situation here, I think that jitter entropy and general system activity will quickly make the pools diverge. Probably, after a few seconds to minutes, the two systems will be completely independent. But for the short initial period, one could be vulnerable to analysis performed on the other one, especially if the target hardware is known and allows the observer to mimic it very closely and limit the divergence.
Thus overall Jason's work on this is definitely useful.
Posted Mar 14, 2022 14:31 UTC (Mon) by plugwash (subscriber, #29694)
One immediate question that springs to mind is "what about userland"?
As you say the kernel RNGs will likely diverge quickly, so the window of opportunity for failure is fairly narrow. OTOH RNGs in userland that are rarely or never re-seeded and are used in a predictable way may remain in sync for much longer.
Crypto libraries often already have code in place to defend against fork calls, but presumably cloning the whole VM would go unnoticed by such countermeasures.
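The process-level countermeasure plugwash mentions usually looks something like the sketch below: register an atfork handler that reseeds the library's generator in the child. A cloned or restored VM never calls fork(), so a handler like this never runs, which is exactly the gap being discussed. The generator here is a toy stand-in; pthread_atfork() and getrandom() are the real APIs.

```c
/*
 * Sketch of a library defending against fork(): the child reseeds its
 * user-space PRNG via a pthread_atfork() handler. A VM clone bypasses
 * fork() entirely, so this protection never triggers for it.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/random.h>
#include <unistd.h>

static uint64_t prng_state;

static void reseed(void)
{
    /* getrandom() pulls fresh seed material from the kernel. */
    if (getrandom(&prng_state, sizeof(prng_state), 0) != sizeof(prng_state))
        abort();
}

static uint64_t prng_next(void)        /* xorshift64: toy generator only */
{
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 7;
    prng_state ^= prng_state << 17;
    return prng_state;
}

int main(void)
{
    reseed();
    pthread_atfork(NULL, NULL, reseed);   /* child reseeds after fork() */

    if (fork() == 0)
        printf("child:  %016llx\n", (unsigned long long)prng_next());
    else
        printf("parent: %016llx\n", (unsigned long long)prng_next());
    return 0;
}
```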
Posted Apr 2, 2022 7:05 UTC (Sat) by sammythesnake (guest, #17693)
What you're describing here is what's called a "hidden variable model" in which there are unknown substrates below the observed behaviour that could (in principle) become known, modelled, and predicted (you'd still have the challenge of making suitable observations in the face of engineering limitations and the Heisenberg uncertainty principle of course...)
I understand that there are proofs for various quantum phenomena that preclude the possibility of such hidden variables. Off the top of my head, I'm not 100% certain that nuclear decay is one of the phenomena covered by these proofs, but I think it is.
I once tried to follow a description of such a proof, but ended up glassy eyed and needing a nap...
Posted Mar 12, 2022 20:47 UTC (Sat) by ballombe (subscriber, #9523)
They are:
1/ ideal RNG: each bit is an independent random variable. We have no way to build that.
2/ hardware RNG: each bit is given by a physical process we expect to be random (slow)
3/ Pseudo-RNG (PRNG): a deterministic state machine that produces output that looks statistically like an ideal RNG for batches of length at most some bound (e.g. 2^32) (usually fast)
4/ Cryptographically-secure PRNG: a PRNG for which it is believed to be computationally infeasible to recover the state from a batch of outputs. (Note that if you have unlimited computational power, you can break it by trying all possible states.)
5/ Backdoored PRNG: as 4/, but there is a key that allows recovering the state from a medium-sized batch of output.
6/ Mixed RNG: a mix of 2/ and 3/ as used by the kernel. A physical process is used to change the state of the PRNG from time to time. A compromise between speed, security, and hardware integrity.
Posted Mar 13, 2022 5:04 UTC (Sun) by NYKevin (subscriber, #129325)
Posted Mar 13, 2022 9:06 UTC (Sun) by atnot (subscriber, #124910)
* I am not a cryptographer, this is not cryptography advice
Posted Mar 13, 2022 22:39 UTC (Sun) by NYKevin (subscriber, #129325)
1. We want each individual output to be as unpredictable as possible. That means it should contain the maximum possible amount of entropy.
2. We want each individual output to provide as little information about the PRNG's internal state as possible. That means it should contain the *minimum* possible amount of entropy.
Obviously, you can't satisfy both goals, so you have to compromise on one or the other, or else you have to use something other than the PRNG's internal state (e.g. RDRAND, jitter, etc.) to increase the entropy of each individual output from a non-PRNG-derived source. Now, I'm assuming that the kernel does use such a non-PRNG source, but that's mathematically equivalent to expanding the entropy pool to include other factors outside the control of the PRNG itself - so you still run into the same "we're running out of entropy" problem as before. The net amount of entropy entering the entire system as a whole must equal or exceed the net amount of entropy leaving the system via PRNG outputs.
Posted Mar 14, 2022 11:20 UTC (Mon) by atnot (subscriber, #124910)
Cryptographic hash functions are designed such that if you have a hash function of size n you need on the order of 2^n inputs to determine any bit of the plaintext with odds >50%. For BLAKE-256 the best known attack is 2^232, which is the number I was citing. So while it is mathematically true that entropy is leaving the pool through the hash function, cryptography means that entropy is infeasible to recover. Every bit is so heavily entangled with every other bit that you get no information until you've broken effectively the entire hash.
If you like, although it's not completely accurate, you can think of it as the bits of information gained from the output of the cryptographic hash function being smeared very thin across its layers and layers of hidden inputs and outputs, never thick enough to get over 50% at the output until you collect a truly extraordinary number of them.
Another way to think about it is via encryption. If you encrypt 16 bytes with a key and the output is 16 bytes, that output must appear random to an outside observer, otherwise the algorithm would leak information. If you used that to encrypt the pool with itself and handed that out, would you be removing 16 bytes from the pool? The bits are correlated, sure, but what does that mean for you if you cannot determine the correlation? Tracking entropy "movement" isn't really a useful concept at that point.
Posted Mar 14, 2022 11:23 UTC (Mon) by cesarb (subscriber, #6266)
The trick is that, even though you might have in theory exhausted the entropy of the system, the only known way to take advantage of it is a brute force search. With a large enough state (around a couple hundred bits), brute force search becomes infeasible.
Posted Mar 14, 2022 13:25 UTC (Mon) by kleptog (subscriber, #1183)
Ideal RNGs are unpredictable, no matter what information you have. That's why cryptographers love them: the proofs become very simple.
CS-PRNGs are intended to be totally predictable if you know the internal state, but not if you just know some of the output.
So in the case of forked VMs, suppose you started one VM up and got a bunch of random numbers. Then you start a second fork. If you are given the two VMIDs, plus the first stream of random numbers, can you predict the random numbers for the new VM? In principle: sure. But you need to think about the computing power required. And you need to decide on a computing model: does it matter if you have a quantum computer?
It's like the discussions about mixing in timing information. Maybe it's predictable, but it matters only if the attacker knows what's being mixed in. So you're better off just mixing everything in; as soon as there's one source that the attacker doesn't know exactly, you've already made it computationally impossible to reverse.
Posted Mar 14, 2022 20:19 UTC (Mon) by wittenberg (subscriber, #4473)
We do. You can buy off-the-shelf random number generators which depend on quantum processes for their randomness. The first google hit for one is: https://www.quintessencelabs.com/products/qstream-quantum... which claims to offer 1 Gb/sec (after whitening, 8 Gb/sec before whitening). If quantum processes are not random, we have a huge hole in our understanding of the physics which underlies chips. Possible, but unlikely.
I don't know about this particular device, but last I checked, similar devices cost about $5000 (US).
--David
Posted Mar 12, 2022 4:25 UTC (Sat) by Otus (subscriber, #67685)
Sure, there are CPUs that don't support it, but how many of them are being used for VM cloning? Seems to me that just mixing in the cycle counter in that case would solve things.
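For what it's worth, the cycle-counter fallback being suggested is cheap to express; a minimal x86-only sketch (using the real __rdtsc() compiler intrinsic, with an illustrative mixing step, not any kernel code) might look like this:

```c
/*
 * Minimal x86-only sketch of the suggestion above: fold the cycle
 * counter into whatever seed material is available. __rdtsc() is a
 * compiler intrinsic; the mix here is a simple illustrative hash step.
 */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

static uint64_t mix64(uint64_t x)       /* splitmix64-style finalizer */
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

int main(void)
{
    uint64_t seed = 0;                  /* stands in for existing seed state */

    seed = mix64(seed ^ __rdtsc());     /* cheap per-request perturbation */
    printf("%016llx\n", (unsigned long long)seed);
    return 0;
}
```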
Posted Mar 13, 2022 22:32 UTC (Sun) by NYKevin (subscriber, #129325)
Posted Mar 14, 2022 6:06 UTC (Mon) by Otus (subscriber, #67685)
Posted Mar 14, 2022 8:43 UTC (Mon) by Wol (subscriber, #4433)
All it takes is an application (which doesn't know) to access RDRAND directly, and you're in trouble.
The article gives the example of WireGuard which, when it hits one of these, locks the system HARD. Actually, that could possibly be behind why my system locks up every now and then ...
Cheers,
Wol
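For reference, "accessing RDRAND directly" from an application typically means using the compiler intrinsic, retried a few times because the instruction can transiently fail; a minimal sketch (x86-64 only, built with -mrdrnd):

```c
/*
 * What direct RDRAND access from an application looks like: the
 * _rdrand64_step() intrinsic, retried a few times since the
 * instruction can transiently report failure.
 */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    unsigned long long value;

    for (int tries = 0; tries < 10; tries++) {
        if (_rdrand64_step(&value)) {
            printf("%016llx\n", value);
            return 0;
        }
    }
    fprintf(stderr, "RDRAND kept failing\n");
    return 1;
}
```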
Posted Mar 14, 2022 16:11 UTC (Mon) by Otus (subscriber, #67685)
I do wonder if the latter should be something more generic (not tied to vmid), since I can easily imagine other cases where you might want to tell everyone to reseed. For example, if you are using something like systemd-random-seed.service to feed entropy you trust more than the kernel's entropy collection during boot.
But anyway, that's just academic; clearly something like this is required. Thanks.
Posted Mar 12, 2022 5:58 UTC (Sat) by mtthu (subscriber, #123091)
Posted Mar 12, 2022 16:45 UTC (Sat) by rwmj (subscriber, #5474)
Posted Mar 17, 2022 7:42 UTC (Thu) by gasche (subscriber, #74946)
RNG algorithm designers have devised "splittable" generators, which make it possible to separate a single RNG state into two states that look independent. Our solution was to change our PRNG algorithm to use a "splittable" one (we chose LXM: https://dl.acm.org/doi/abs/10.1145/3485525 ). On `spawn`, the parent thread "splits" its PRNG state, keeps one state and passes the other to the child.
This has the very good property that if threads are created (and create threads themselves, etc.) and consume random numbers without communicating/synchronizing with each other, then the PRNG states and random numbers produced are deterministic -- for a given tree of spawns and draws, the results depend only on the initial PRNG state of the ancestor thread. So, for example, debugging strategies involving fixing a random seed at program startup for reproducibility still work in a multi-threaded scenario. (In contrast, if you initialize new threads' PRNG states by re-seeding, or if you maintain a single PRNG state protected by a mutex, the results will depend on the reseeding input or the non-deterministic scheduling of the threads with respect to each other.)
Of course, this works because we have full control of the OCaml runtime: we can add specific code, run on the parent at spawn time, whenever a new thread is spawned. In the VM situations described here, this would correspond to notifying the VM that it is about to be cloned, and giving it the ability to change its own state and also decide parts of the new clone's state. There is no mechanism providing this amount of control, and working around this is the main point of the article. Unfortunately, the vmgenid approach means that the child will find out after the fact that it was cloned, but the parent is never told, so the "splitting" approach is not applicable -- the splitting operation modifies both the parent and the child PRNG state.
Some PRNG generators support a "jump" primitive where the PRNG state advances by many iterations (for example 2^128) at once. Upon finding out that it is a cloned child, a VM could "jump" by a fixed amount. Unfortunately, this does not solve the problem of guaranteeing independence of PRNG states. It works if you clone a VM (1 jump), then clone the cloned VM again (2 jumps), etc., but it does not work if you clone the original VM several times: all copies will have jumped the same amount, so they end up in the same state. With this little control over the cloning protocol, it looks like reseeding is the only option.
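For readers unfamiliar with the idea, here is a toy sketch of "splitting" in C; it is not LXM and not cryptographically strong, but it shows the key property the comment relies on: deriving the child state also advances the parent state, so repeated splits from the same parent yield different children.

```c
/*
 * Toy illustration of a "splittable" PRNG (not LXM, not secure):
 * splitting derives a child state from the parent and advances the
 * parent as a side effect, so the two streams differ from then on.
 */
#include <stdint.h>
#include <stdio.h>

struct rng { uint64_t state; };

static uint64_t splitmix64(uint64_t *s)
{
    uint64_t z = (*s += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

static uint64_t rng_next(struct rng *r)
{
    return splitmix64(&r->state);
}

/* Split: the child gets a state derived from (and distinct from) the
 * parent's, and the parent's state advances as a side effect. */
static struct rng rng_split(struct rng *parent)
{
    struct rng child = { .state = rng_next(parent) ^ 0xa5a5a5a5a5a5a5a5ULL };
    return child;
}

int main(void)
{
    struct rng parent = { .state = 42 };
    struct rng child = rng_split(&parent);

    printf("parent: %016llx\n", (unsigned long long)rng_next(&parent));
    printf("child:  %016llx\n", (unsigned long long)rng_next(&child));
    return 0;
}
```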
Posted May 23, 2022 8:29 UTC (Mon) by cortana (subscriber, #24596)
https://docs.microsoft.com/en-us/windows-server/identity/...
Posted Jan 12, 2023 4:22 UTC (Thu) by kmeyer (subscriber, #50720)
[1]: https://github.com/freebsd/freebsd-src/commit/ffac39deae0...