Problems emerge for a unified /dev/*random
In mid-February, we reported on the plan to unite the two kernel devices that provide random numbers; /dev/urandom was to effectively just be another way to access the random numbers provided by /dev/random. That change made it as far as the mainline during the Linux 5.18 merge window, but it was quickly reverted when problems were found. It may be possible to do that unification someday, but, for now, there are environments that need their random numbers early on—without entropy or the "Linus jitter dance" being available on the platform.
A bunch of changes for the kernel random-number generator (RNG) were merged by Linus Torvalds on March 21. Those changes included unifying the two RNG devices, because it was hoped that no mainstream platforms would lack a source of unpredictable data that would allow the RNG pool to initialize in short order at boot time. For several years now, the jitter dance has used CPU execution-time jitter to initialize the pool in less than a second; it relies on the differences in code-execution speed of repetitive operations, which are unpredictable on modern CPUs because of caches, branch prediction, and the like. But some systems lack jitter and have no other source of unpredictable data, which leads to the boot process hanging while waiting for the RNG pool to initialize.
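As a rough illustration of the idea (a hypothetical user-space sketch, not the kernel's implementation), one can time short bursts of work and fold the unpredictable low bits of each timing delta into a state word:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    }

    int main(void)
    {
        volatile uint32_t sink = 0;
        uint64_t pool = 0;
        uint64_t prev = now_ns();

        /* Time many short bursts of work; cache and branch-predictor
         * effects make each delta slightly unpredictable. */
        for (int i = 0; i < 4096; i++) {
            for (int j = 0; j < 64; j++)
                sink += (uint32_t)j * 2654435761u;
            uint64_t t = now_ns();
            /* Rotate-and-xor is a stand-in for real cryptographic mixing. */
            pool = (pool << 7 | pool >> 57) ^ (t - prev);
            prev = t;
        }
        printf("jitter-derived sample: %016llx\n", (unsigned long long)pool);
        return 0;
    }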
De-unification
Guenter Roeck reported a problem the day after the code was merged. He saw "a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at 'Saving random seed:'". He bisected the problem to the patch that unified the RNG devices, and noted that reverting it fixes the problems he found.
As would be expected, a user-space regression of that sort led Torvalds to say that he would revert the patch. The idea was good, but it "causes problems for various platforms that can't do jitter entropy and have nothing else happening either". Jason A. Donenfeld, the author of the patch and one of the kernel RNG maintainers, agreed with that assessment. Later that day, Torvalds reverted the unification; "This isn't hugely unexpected - we tried it, it failed, so now we'll revert it."
But Donenfeld was interested in finding out more about the underlying problem and asked Roeck for information about the QEMU virtual machines (VMs) used. If the unified-RNG-devices idea is ever going to return, "understanding everything about why the previous time failed might be a good idea", Donenfeld said. He poked around in one of the VM images and discovered the boot-time shell script that was printing the message in question. It was using a stored random seed to initialize /dev/urandom by writing to it, then reading from the device to grab a new seed to store for the next boot.
There are some problems with that approach, however. The first is that writing to /dev/urandom will only mix the data into the pool; it does not credit any entropy, so the RNG subsystem still does not initialize. Because it does not initialize, and the changes merged (and since reverted) made /dev/urandom block until it is initialized, the boot process would hang when it tried to read the new seed. That was a clear user-space regression that was not going to be tolerated.
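A minimal sketch of the pattern those scripts implement, with a hypothetical seed-file path (the real scripts do the equivalent with shell utilities):

    #include <fcntl.h>
    #include <unistd.h>

    #define SEED_LEN 512

    int main(void)
    {
        unsigned char seed[SEED_LEN];
        ssize_t n;

        /* Feed last boot's seed to the RNG; it is mixed into the pool
         * but credited with no entropy, so the RNG stays uninitialized. */
        int sf = open("/var/lib/random-seed", O_RDONLY);   /* hypothetical path */
        int rf = open("/dev/urandom", O_RDWR);
        if (rf < 0)
            return 1;
        if (sf >= 0) {
            n = read(sf, seed, SEED_LEN);
            if (n > 0)
                write(rf, seed, (size_t)n);
            close(sf);
        }

        /* Immediately read a fresh seed for the next boot; with the
         * now-reverted change this read would block, and without it the
         * data may be poor if the RNG is not yet initialized. */
        if (read(rf, seed, SEED_LEN) == SEED_LEN) {
            sf = open("/var/lib/random-seed", O_WRONLY | O_CREAT | O_TRUNC, 0600);
            if (sf >= 0) {
                write(sf, seed, SEED_LEN);
                close(sf);
            }
        }
        close(rf);
        return 0;
    }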
A more insidious problem, perhaps, is that a seed written to the RNG before it initializes will not actually be used until after the pool is initialized properly; "you might write in a perfectly good seed to /dev/urandom, but what you read out for the subsequent seed may be complete deterministic crap", Donenfeld said.
The data that gets written to the device is not credited with any entropy unless the RNDADDTOENTCNT ioctl() command is used, but that also means the fast initialization pool is not updated with the seed data, so the random numbers returned from the non-blocking /dev/urandom before it is initialized are much worse. That behavior has been true for quite some time, he said, but it makes the "innocuous pattern" of writing a seed and reading a new one into a potentially serious flaw; he thought he had a "quick unobtrusive fix" for that.
Torvalds said that he hates the "no entropy means that we can't use it" idea that exists in the RNG; "It's a disease, I tell you." It is the direct cause of that second problem, he said, continuing:
By all means the code can say "I can't credit this as entropy", but the fact that it then doesn't even mix it into the fast pool is just wrong, wrong, wrong.

I think *that* is what we should fix. The fact is, urandom has long-standing semantics as "don't block", and that it shouldn't care about the (often completely insane) entropy crediting rules.
But that "don't care about entropy rules" should then also mean "oh, we'll mix things in even if we don't credit entropy".
Donenfeld agreed with that view: "In general, your intuition is correct, I think, that the entropy crediting scheme is sort of insane and leads to problems." He pointed to his recent report on kernel RNG changes, noting that it talks about other RNGs that could be used (e.g. Fortuna), which do not suffer from the same types of problems, so that "might be something to look at seriously in the future".
Another patch
Donenfeld posted a patch on March 22 to try to address the problem with reading poor-quality random numbers for seed files during early boot. He said that he had fixed a related problem in systemd, but that cleaner fix did not help the existing shell scripts.
So this patch fixes the issue by including /dev/urandom writes as part of the "fast init", but not crediting it as part of the fast init counter. This is more or less exactly what's already done for kernel-sourced entropy whose quality we don't know, when we use add_device_randomness(), which both contributes to the input pool and to the fast init key.
Torvalds wondered why reads from /dev/urandom did not simply use the initializing pool, since the data written to the device is already mixed into that. Donenfeld agreed that his approach was less than perfect, but said that it is far better now that some changes he made for 5.18 are in place. He had a lengthy explanation of the goals of his changes and of the differences between the input pool and the fast init pool. The input pool is used to rekey the ChaCha cipher that actually provides the random bytes after initialization, while the fast init pool is used before the full 256 bits of entropy are gathered in the input pool. Until that entropy is gathered (from various kernel sources, a hardware RNG, entropy credited from user space, or from jitter), the input pool is not properly initialized, thus an alternative needs to be used:
The "pre init pool", the "fast init pool", the "super terrible but at least not zero pool", the "all bets are off pool", ... whatever you want to call it. Why a separate pool for pre init? Because the real input pool needs to accumulate 256 bits of entropy before it's safe to use.Your suggestion is to instead not have a separate pool, but perhaps just do separate accounting. That might work to some degree, but the devil is in the details, and that sounds a lot harder and messier to code.
He reiterated there may be some other longer-term solutions to consider, but did not think Torvalds's suggestion would help much. Ted Ts'o, the other kernel RNG maintainer, cautioned that writing to /dev/urandom is not a privileged operation, so a malicious user-space program could write specific values to it; that is the reason why inputs to the device are not used "until there is a chance for it to be mixed in with other entropy which is hopefully not under the control of malicious userspace".
Now, I recognize that things are a bit special in early boot, and if we have a malicious script running in a systemd unit script, we might as well go home. But something to consider is whether we want to do [something] special if the process writing to /dev/[u]random has CAP_SYS_ADMIN, or some such.
Donenfeld said that the input provided by writing to the RNG devices is cryptographically hashed, so "we can haphazardly mix whatever any user wants, without too much concern" as long as it is not given entropy credit. While crediting entropy when the writes to /dev/urandom are done from a process with CAP_SYS_ADMIN (or some other privilege) might be reasonable, there may be user-space code that is expecting the current behavior. He noted that the problem he saw in the shell scripts being used by Roeck's tests is not really a kernel bug:
[...] this has _always_ been broken, and those shell scripts have _always_ been vulnerable. Maybe the kernel should fix that, but due to the ambiguity of the /dev/urandom write interface, maybe the best fix is actually in userspace itself, which means it'd work on old kernels too (which are rather common for the embedded devices that tend to have those types of shell scripts).
Donenfeld pointed to his systemd fix and one he submitted for Buildroot as a better approach. The idea is that user space hashes the old seed with the data it receives from reading the RNG device before storing it away for the next boot. That way, even if the RNG is not initialized, "the amount of entropy in the new seed will stay the same or get better, but not appreciably regress".
RNDADDTOENTCNT
In his proposed patch, Donenfeld noted that the RNDADDTOENTCNT ioctl() command is a poor interface for crediting entropy, in part because of the separation of the pools during initialization. The RNDADDTOENTCNT command simply tells the kernel to credit the number of entropy bits that is passed in, but the write to the RNG device has already happened (without credit). Repeatedly writing small amounts of data to the RNG device and then crediting it would give an attacker a window to brute-force the exact data that was written from the fast init pool. If the write-credit cycle is done enough times to fully initialize the input pool, the attacker could then brute-force the initial state of the pool.
The RNDADDENTROPY command combines the write and the credit in the same call, which allows the kernel to make a better decision on what to do with it, he said. With RNDADDTOENTCNT, the kernel does not know whether the data is simply meant to perturb the pool, or whether it should be counted as truly unpredictable data, until after the data has already been processed. He suggested that deprecating RNDADDTOENTCNT might be a good plan; Ts'o concurred on that.
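As an illustration of the difference, here is a hedged sketch of both interfaces from C (error handling trimmed); RNDADDENTROPY passes the data and the credit together, while the two-step pattern writes first and credits afterward:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/random.h>

    /* One-step interface: the kernel receives the data and the credit
     * together, so it can decide what to do with the input.  Requires
     * CAP_SYS_ADMIN. */
    static int add_credited(int fd, const void *seed, int len)
    {
        struct rand_pool_info *info;
        int ret;

        info = malloc(sizeof(*info) + len);
        if (!info)
            return -1;
        info->entropy_count = len * 8;   /* credit, in bits */
        info->buf_size = len;
        memcpy(info->buf, seed, len);
        ret = ioctl(fd, RNDADDENTROPY, info);
        free(info);
        return ret;
    }

    /* Two-step pattern discussed above: by the time the credit arrives,
     * the write has already been processed without it. */
    static int add_then_credit(int fd, const void *seed, int len)
    {
        int bits = len * 8;

        if (write(fd, seed, len) != len)
            return -1;
        return ioctl(fd, RNDADDTOENTCNT, &bits);
    }

    int main(void)
    {
        unsigned char seed[64] = { 0 };   /* stand-in seed material */
        int fd = open("/dev/urandom", O_WRONLY);

        if (fd < 0)
            return 1;
        add_credited(fd, seed, sizeof(seed));
        add_then_credit(fd, seed, sizeof(seed));
        close(fd);
        return 0;
    }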
Alex Xu did some research on the uses of RNDADDTOENTCNT in existing code, which led Donenfeld to question his patch. There are, it seems, programs that do exactly what Donenfeld was worried about. For example, maxwell is a daemon to feed jitter entropy into the Linux RNG; Xu said it operates as follows:
sandy-harris/maxwell is a "jitter entropy" daemon, similar to haveged. It writes 4 bytes of "generated entropy" to /dev/random, then calls RNDADDTOENTCNT, then repeats.
Donenfeld replied: "Okay bingo. The existence of this means that this patch will definitely introduce a new vulnerability." An attacker could brute-force the data 32 bits at a time, so his patch would need to change, he said. Part of the problem is that some early boot efforts to initialize the RNG in user space are not actually doing so unless there is entropy being credited during that process. David Laight said: "You can't really expect startup scripts to be issuing ioctl requests." But that is exactly what is required if there are no other sources of entropy, as Donenfeld explained:
Crediting bits has required an ioctl since forever. Those shell scripts have been broken forever. The proposal here is to add new behavior to support those old broken shell scripts.

Fortunately, it seems sort of fixable. But only sort of. There are also a lot of complications, as detailed above. Part of it is that some people use /dev/urandom writes expecting it not to credit, while others use that and then later manually credit it with the ioctl. Both of those in general seem like not a very good interface for seeding the rng. The correct interface to use is RNDADDENTROPY, which takes both the data and whether it should be credited, since then the kernel knows what the intentions are and can do something smart with it. Barring that knowledge, we're in this vague middle ground where it's unclear what the user intends to do.
There was some discussion of possible heuristics for when to credit entropy for writes to the RNG devices, such as crediting only during early boot or when the process has certain privileges, but all of those approaches are fraught. There are other cases where crediting that entropy could be problematic or catastrophic.
Eric Biggers provided an example of where things might go wrong. The Android system writes its kernel command line to /dev/urandom early in the boot process with the expectation that it will not count as entropy, "given that the command line might not contain much entropy, or any at all".
For those reasons, Donenfeld came to the conclusion that not changing the current behavior was the right thing to do.
Based on this, the fact that shell scripts cannot seed the RNG anyway, and due to the hazards in trying to retrofit some heuristics onto an interface that was never designed to work like this, I'm convinced at this point that the right course of action here is to leave this alone. There's no combination of /dev/urandom write hacks/heuristics that do the right thing without creating some big problem elsewhere. It just does not have the right semantics for it, and changing the existing semantics will break existing users.
His plan is to work with user-space utilities to make changes that reflect the current reality, so that file-based seeding works well everywhere. To that end he has created a simple SeedRNG program that is intended to be incorporated into programs that need this kind of functionality. If that work is successful, it "might lead to a better ecosystem and less boot time blocking and all that jazz".
It is clear that Donenfeld has injected some needed energy into the maintenance of the kernel RNG code. There is a lot of work going on in that area right now and, seemingly, more to come. For now, we are still facing that longtime kernel bugaboo, entropy gathering woes early in the boot process, but perhaps there is some light on the horizon in the form of other techniques that might improve the situation. In the meantime, reworking user space to properly use the facilities we do have looks like the right approach.
Comments

Posted Mar 29, 2022 22:02 UTC (Tue) by jepler (subscriber, #105975)
It appears that I (A) asked the kernel how much entropy was available via /proc/sys/kernel/random/entropy_avail, (B) if it was at least 3072, slept (max entropy was/is 4096?) (C) otherwise, read enough data from my device, wrote it to /dev/random (not urandom!), and credited it with ioctl RNDADDTOENTCNT
So in this case, it looks like big gulps of data were added by my random daemon. Hooray for good luck in a long ago decision.
In any case, I don't use the device anymore.
Posted Mar 30, 2022 12:39 UTC (Wed) by ncm (guest, #165)
Such devices might be more trustworthy than the "hardware RNG" instructions provided on processor cores, which we have seen demonstrated (in a recent AMD erratum) may be reliably turned off from microcode, and, we may presume, from an appropriate not-publicly-documented instruction sequence. Whether to try to defend against such an attack depends on your threat model, of course; if so, an unreliable source of random numbers might be the least of your problems. But defense in depth has rarely been a mistake.
Posted Mar 30, 2022 17:35 UTC (Wed) by jhoblitt (subscriber, #77733)
Is there any hope of a sane reboot under sysfs?
Posted Mar 31, 2022 23:05 UTC (Thu) by gerdesj (subscriber, #5446)
A booting system is obviously not in the same state as it will be when it's fully initialised and running, so why insist on pseudo devices like random being a "one size fits all"?
VPNs and webservers for example are huge consumers of randomness and can happily wait until the system is up and running and firing on all four. They could really benefit from a relatively simple random that "knows" that suitable sources of entropy are available and cryptographic researchers aren't going to get sarcastic! They tend to be bloody complicated and any simplification would be a good thing: eg an assumption about the quality of random.
The things that need early random can use brandom and accept its limitations and work with them and then switch to random when it's available if they need to.
Another option might be to have one random device that describes how useful it thinks it is and leave the consumer to take action based on that. "Hi I'm Mike and here's a stream of gibberish-ish(3)". The road to Hell is paved with odd interfaces ...
Posted Apr 1, 2022 4:14 UTC (Fri) by wtarreau (subscriber, #51152)
In the embedded world, a boot loader often has access to several sources of unpredictable data that it could collect and pass along:

- pre-init device contents (uninitialized RAM usually contains noise, except in VMs); output GPIOs may also read noise before they're configured as outputs. UARTs often read a first crappy byte. Many chips also include a "reset cause" register that indicates power-on, reset, exception, etc.
- device reset timings (slow devices such as UARTs do not always take an integral number of cycles to reset); an RTC's second transition also solely depends on when the device was booted, relative to the current second.
- the device's configuration: very often you'll find a MAC or some device-specific WiFi calibration data stored in a special area on the flash, that may differ from device to device. When running in a VM, some arguments may come from other means.
- other external persistent info (e.g. any RTC time value that varies between boots)

But often all these data are lost after the boot loader finishes initialization and transfers execution to the kernel. We'll *need* to standardize a solution for this, that boot loaders will have to use for future kernels if we want to improve the situation for such embedded devices. Otherwise they're too deterministic. I do remember that the SSH key I used to have on my old NSLU2 existed on at least 89 other devices connected to the net... This definitely shows that without early entropy there's little hope to collect more later and whatever we'll try to do can result in frustration. And while VMs are terrible for this, at least they can benefit from entropy being spoon-fed at boot by the hypervisor.
Posted Apr 1, 2022 12:07 UTC (Fri) by wtarreau (subscriber, #51152)
For the PC world, grub might be too late due to the BIOS often doing most of the cleanup. However on PCs there's often a video card whose memory is not reset and which contains garbage. I think it already happened to all of us to power-cycle a PC, then discover a phantom image of the previous session for a fraction of a second when typing "startx" because that memory wasn't completely lost yet. And most PCs have hardware RNGs and jitter entropy anyway ;-)
Posted Apr 7, 2022 23:55 UTC (Thu) by cypherpunks2 (guest, #152408)
This has been noticed on StackExchange, it seems: https://security.stackexchange.com/questions/183506/rando...