Problems emerge for a unified /dev/*random
In mid-February, we reported on the plan to unite the two kernel devices that provide random numbers; /dev/urandom was to effectively just be another way to access the random numbers provided by /dev/random. That change made it as far as the mainline during the Linux 5.18 merge window, but it was quickly reverted when problems were found. It may be possible to do that unification someday, but, for now, there are environments that need their random numbers early on—without entropy or the "Linus jitter dance" being available on the platform.
A bunch of changes for the kernel random-number generator (RNG) were merged by Linus Torvalds on March 21. Those changes included unifying the two RNG devices, because it was hoped that no mainstream platforms would lack a source of unpredictable data that would allow the RNG pool to initialize in short order at boot time. For several years now, the jitter dance has used CPU execution-time jitter to initialize the pool in less than a second; it relies on the differences in code-execution speed of repetitive operations, which are unpredictable on modern CPUs because of caches, branch prediction, and the like. But some systems lack jitter and have no other source of unpredictable data, which leads to the boot process hanging while waiting for the RNG pool to initialize.
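As a rough illustration of the idea (a hypothetical user-space sketch, not the kernel's implementation), one can time short bursts of work and fold the unpredictable low bits of each timing delta into a state word:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    }

    int main(void)
    {
        volatile uint32_t sink = 0;
        uint64_t pool = 0;
        uint64_t prev = now_ns();

        /* Time many short bursts of work; cache and branch-predictor
         * effects make each delta slightly unpredictable. */
        for (int i = 0; i < 4096; i++) {
            for (int j = 0; j < 64; j++)
                sink += (uint32_t)j * 2654435761u;
            uint64_t t = now_ns();
            /* Rotate-and-xor is a stand-in for real cryptographic mixing. */
            pool = (pool << 7 | pool >> 57) ^ (t - prev);
            prev = t;
        }
        printf("jitter-derived sample: %016llx\n", (unsigned long long)pool);
        return 0;
    }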
De-unification
Guenter Roeck reported a problem the day after the code was merged. He saw "a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at 'Saving random seed:'". He bisected the problem to the patch that unified the RNG devices, and noted that reverting it fixes the problems he found.
As would be expected, a user-space regression of that sort led Torvalds to say that he would revert the patch. The idea was good, but it "causes problems for various platforms that can't do jitter entropy and have nothing else happening either". Jason A. Donenfeld, the author of the patch and one of the kernel RNG maintainers, agreed with that assessment. Later that day, Torvalds reverted the unification; "This isn't hugely unexpected - we tried it, it failed, so now we'll revert it."
But Donenfeld was interested in finding out more about the underlying problem and asked Roeck for information about the QEMU virtual machines (VMs) used. If the unified-RNG-devices idea is ever going to return, "understanding everything about why the previous time failed might be a good idea", Donenfeld said. He poked around in one of the VM images and discovered the boot-time shell script that was printing the message in question. It was using a stored random seed to initialize /dev/urandom by writing to it, then reading from the device to grab a new seed to store for the next boot.
There are some problems with that approach, however. The first is that writing to /dev/urandom will only mix the data into the pool; it does not credit any entropy, so the RNG subsystem still does not initialize. Because it does not initialize, and the changes merged (and since reverted) made /dev/urandom block until it is initialized, the boot process would hang when it tried to read the new seed. That was a clear user-space regression that was not going to be tolerated.
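A minimal sketch of the pattern those scripts implement, with a hypothetical seed-file path (the real scripts do the equivalent with shell utilities):

    #include <fcntl.h>
    #include <unistd.h>

    #define SEED_LEN 512

    int main(void)
    {
        unsigned char seed[SEED_LEN];
        ssize_t n;

        /* Feed last boot's seed to the RNG; it is mixed into the pool
         * but credited with no entropy, so the RNG stays uninitialized. */
        int sf = open("/var/lib/random-seed", O_RDONLY);   /* hypothetical path */
        int rf = open("/dev/urandom", O_RDWR);
        if (rf < 0)
            return 1;
        if (sf >= 0) {
            n = read(sf, seed, SEED_LEN);
            if (n > 0)
                write(rf, seed, (size_t)n);
            close(sf);
        }

        /* Immediately read a fresh seed for the next boot; with the
         * now-reverted change this read would block, and without it the
         * data may be poor if the RNG is not yet initialized. */
        if (read(rf, seed, SEED_LEN) == SEED_LEN) {
            sf = open("/var/lib/random-seed", O_WRONLY | O_CREAT | O_TRUNC, 0600);
            if (sf >= 0) {
                write(sf, seed, SEED_LEN);
                close(sf);
            }
        }
        close(rf);
        return 0;
    }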
A more insidious problem, perhaps, is that a seed written to the RNG before it initializes will not actually be used until after the pool is initialized properly; "you might write in a perfectly good seed to /dev/urandom, but what you read out for the subsequent seed may be complete deterministic crap", Donenfeld said.
The data that gets written to the device is not credited with any entropy unless the RNDADDTOENTCNT ioctl() command is used, but that also means the fast initialization pool is not updated with the seed data, so the random numbers returned from the non-blocking /dev/urandom before it is initialized are much worse. That behavior has been true for quite some time, he said, but it makes the "innocuous pattern" of writing a seed and reading a new one into a potentially serious flaw; he thought he had a "quick unobtrusive fix" for that.
Torvalds said that he hates the "no entropy means that we can't use it" idea that exists in the RNG; "It's a disease, I tell you." It is the direct cause of that second problem, he said, continuing:
By all means the code can say "I can't credit this as entropy", but the fact that it then doesn't even mix it into the fast pool is just wrong, wrong, wrong.

I think *that* is what we should fix. The fact is, urandom has long-standing semantics as "don't block", and that it shouldn't care about the (often completely insane) entropy crediting rules.
But that "don't care about entropy rules" should then also mean "oh, we'll mix things in even if we don't credit entropy".
Donenfeld agreed with that view: "In general, your intuition is correct, I think, that the entropy crediting scheme is sort of insane and leads to problems." He pointed to his recent report on kernel RNG changes, noting that it talks about other RNGs that could be used (e.g. Fortuna), which do not suffer from the same types of problems, so that "might be something to look at seriously in the future".
Another patch
Donenfeld posted a patch on March 22 to try to address the problem with reading poor-quality random numbers for seed files during early boot. He said that he had fixed a related problem in systemd, but that cleaner fix did not help the existing shell scripts.
So this patch fixes the issue by including /dev/urandom writes as part of the "fast init", but not crediting it as part of the fast init counter. This is more or less exactly what's already done for kernel-sourced entropy whose quality we don't know, when we use add_device_randomness(), which both contributes to the input pool and to the fast init key.
Torvalds wondered why reads from /dev/urandom did not simply use the initializing pool, since the data written to the device is already mixed into that. Donenfeld agreed that his approach was less than perfect, but said that it is far better now that some changes he made for 5.18 are in place. He had a lengthy explanation of the goals of his changes and of the differences between the input pool and the fast init pool. The input pool is used to rekey the ChaCha cipher that actually provides the random bytes after initialization, while the fast init pool is used before the full 256 bits of entropy are gathered in the input pool. Until that entropy is gathered (from various kernel sources, a hardware RNG, entropy credited from user space, or from jitter), the input pool is not properly initialized, thus an alternative needs to be used:
The "pre init pool", the "fast init pool", the "super terrible but at least not zero pool", the "all bets are off pool", ... whatever you want to call it. Why a separate pool for pre init? Because the real input pool needs to accumulate 256 bits of entropy before it's safe to use.Your suggestion is to instead not have a separate pool, but perhaps just do separate accounting. That might work to some degree, but the devil is in the details, and that sounds a lot harder and messier to code.
He reiterated there may be some other longer-term solutions to consider, but did not think Torvalds's suggestion would help much. Ted Ts'o, the other kernel RNG maintainer, cautioned that writing to /dev/urandom is not a privileged operation, so a malicious user-space program could write specific values to it; that is the reason why inputs to the device are not used "until there is a chance for it to be mixed in with other entropy which is hopefully not under the control of malicious userspace".
Now, I recognize that things are a bit special in early boot, and if we have a malicious script running in a systemd unit script, we might as well go home. But something to consider is whether we want to do [something] special if the process writing to /dev/[u]random has CAP_SYS_ADMIN, or some such.
Donenfeld said that the input provided by writing to the RNG devices is cryptographically hashed, so "we can haphazardly mix whatever any user wants, without too much concern" as long as it is not given entropy credit. While crediting entropy when the writes to /dev/urandom are done from a process with CAP_SYS_ADMIN (or some other privilege) might be reasonable, there may be user-space code that is expecting the current behavior. He noted that the problem he saw in the shell scripts being used by Roeck's tests is not really a kernel bug:
[...] this has _always_ been broken, and those shell scripts have _always_ been vulnerable. Maybe the kernel should fix that, but due to the ambiguity of the /dev/urandom write interface, maybe the best fix is actually in userspace itself, which means it'd work on old kernels too (which are rather common for the embedded devices that tend to have those types of shell scripts).
Donenfeld pointed to his systemd fix and one he submitted for Buildroot as a better approach. The idea is that user space hashes the old seed with the data it receives from reading the RNG device before storing it away for the next boot. That way, even if the RNG is not initialized, "the amount of entropy in the new seed will stay the same or get better, but not appreciably regress".
RNDADDTOENTCNT
In his proposed patch, Donenfeld noted that the RNDADDTOENTCNT ioctl() command is a poor interface for crediting entropy, in part because of the separation of the pools during initialization. The RNDADDTOENTCNT command simply tells the kernel to credit the number of entropy bits that is passed in, but the write to the RNG device has already happened (without credit). Repeatedly writing small amounts of data to the RNG device and then crediting it would give an attacker a window to brute-force the exact data that was written from the fast init pool. If the write-credit cycle is done enough times to fully initialize the input pool, the attacker could then brute-force the initial state of the pool.
The RNDADDENTROPY command combines the write and the credit in the same call, which allows the kernel to make a better decision on what to do with it, he said. With RNDADDTOENTCNT, the kernel does not know whether the data is simply meant to perturb the pool, or whether it should be counted as truly unpredictable data, until after the data has already been processed. He suggested that deprecating RNDADDTOENTCNT might be a good plan; Ts'o concurred on that.
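As an illustration of the difference, here is a hedged sketch of both interfaces from C (error handling trimmed); RNDADDENTROPY passes the data and the credit together, while the two-step pattern writes first and credits afterward:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/random.h>

    /* One-step interface: the kernel receives the data and the credit
     * together, so it can decide what to do with the input.  Requires
     * CAP_SYS_ADMIN. */
    static int add_credited(int fd, const void *seed, int len)
    {
        struct rand_pool_info *info;
        int ret;

        info = malloc(sizeof(*info) + len);
        if (!info)
            return -1;
        info->entropy_count = len * 8;   /* credit, in bits */
        info->buf_size = len;
        memcpy(info->buf, seed, len);
        ret = ioctl(fd, RNDADDENTROPY, info);
        free(info);
        return ret;
    }

    /* Two-step pattern discussed above: by the time the credit arrives,
     * the write has already been processed without it. */
    static int add_then_credit(int fd, const void *seed, int len)
    {
        int bits = len * 8;

        if (write(fd, seed, len) != len)
            return -1;
        return ioctl(fd, RNDADDTOENTCNT, &bits);
    }

    int main(void)
    {
        unsigned char seed[64] = { 0 };   /* stand-in seed material */
        int fd = open("/dev/urandom", O_WRONLY);

        if (fd < 0)
            return 1;
        add_credited(fd, seed, sizeof(seed));
        add_then_credit(fd, seed, sizeof(seed));
        close(fd);
        return 0;
    }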
Alex Xu did some research on the uses of RNDADDTOENTCNT in existing code, which led Donenfeld to question his patch. There are, it seems, programs that do exactly what Donenfeld was worried about. For example, maxwell is a daemon to feed jitter entropy into the Linux RNG; Xu said it operates as follows:
sandy-harris/maxwell is a "jitter entropy" daemon, similar to haveged. It writes 4 bytes of "generated entropy" to /dev/random, then calls RNDADDTOENTCNT, then repeats.
Donenfeld replied: "Okay bingo. The existence of this means that this patch will definitely introduce a new vulnerability." An attacker could brute-force the data 32 bits at a time, so his patch would need to change, he said. Part of the problem is that some early boot efforts to initialize the RNG in user space are not actually doing so unless there is entropy being credited during that process. David Laight said: "You can't really expect startup scripts to be issuing ioctl requests." But that is exactly what is required if there are no other sources of entropy, as Donenfeld explained:
Crediting bits has required an ioctl since forever. Those shell scripts have been broken forever. The proposal here is to add new behavior to support those old broken shell scripts.

Fortunately, it seems sort of fixable. But only sort of. There are also a lot of complications, as detailed above. Part of it is that some people use /dev/urandom writes expecting it not to credit, while others use that and then later manually credit it with the ioctl. Both of those in general seem like not a very good interface for seeding the rng. The correct interface to use is RNDADDENTROPY, which takes both the data and whether it should be credited, since then the kernel knows what the intentions are and can do something smart with it. Barring that knowledge, we're in this vague middle ground where it's unclear what the user intends to do.
There was some discussion of possible heuristics for when to credit entropy for writes to the RNG devices, such as crediting only during early boot or when the process has certain privileges, but all of those approaches are fraught. There are other cases where crediting that entropy could be problematic or catastrophic.
Eric Biggers provided an example of where things might go wrong. The Android system writes its kernel command line to /dev/urandom early in the boot process with the expectation that it will not count as entropy, "given that the command line might not contain much entropy, or any at all".
For those reasons, Donenfeld came to the conclusion that not changing the current behavior was the right thing to do.
Based on this, the fact that shell scripts cannot seed the RNG anyway, and due to the hazards in trying to retrofit some heuristics onto an interface that was never designed to work like this, I'm convinced at this point that the right course of action here is to leave this alone. There's no combination of /dev/urandom write hacks/heuristics that do the right thing without creating some big problem elsewhere. It just does not have the right semantics for it, and changing the existing semantics will break existing users.
His plan is to work with user-space utilities to make changes that reflect the current reality, so that file-based seeding works well everywhere. To that end he has created a simple SeedRNG program that is intended to be incorporated into programs that need this kind of functionality. If that work is successful, it "might lead to a better ecosystem and less boot time blocking and all that jazz".
It is clear that Donenfeld has injected some needed energy into the maintenance of the kernel RNG code. There is a lot of work going on in that area right now and, seemingly, more to come. For now, we are still facing that longtime kernel bugaboo, entropy gathering woes early in the boot process, but perhaps there is some light on the horizon in the form of other techniques that might improve the situation. In the meantime, reworking user space to properly use the facilities we do have looks like the right approach.
Comments

Posted Mar 29, 2022 22:02 UTC (Tue) by jepler (subscriber, #105975)
It appears that I (A) asked the kernel how much entropy was available via /proc/sys/kernel/random/entropy_avail, (B) if it was at least 3072, slept (max entropy was/is 4096?) (C) otherwise, read enough data from my device, wrote it to /dev/random (not urandom!), and credited it with ioctl RNDADDTOENTCNT
So in this case, it looks like big gulps of data were added by my random daemon. Hooray for good luck in a long ago decision.
In any case, I don't use the device anymore.
Posted Mar 30, 2022 12:39 UTC (Wed) by ncm (guest, #165)
Such devices might be more trustworthy than the "hardware RNG" instructions provided on processor cores, which we have seen demonstrated (in a recent AMD erratum) may be reliably turned off from microcode, and, we may presume, from an appropriate not-publicly-documented instruction sequence. Whether to try to defend against such an attack depends on your threat model, of course; if so, an unreliable source of random numbers might be the least of your problems. But defense in depth has rarely been a mistake.
Posted Mar 30, 2022 17:35 UTC (Wed) by jhoblitt (subscriber, #77733)
Is there any hope of a sane reboot under sysfs?
Posted Mar 31, 2022 23:05 UTC (Thu) by gerdesj (subscriber, #5446)
A booting system is obviously not in the same state as it will be when it's fully initialised and running, so why insist on pseudo devices like random being a "one size fits all"?
VPNs and webservers for example are huge consumers of randomness and can happily wait until the system is up and running and firing on all four. They could really benefit from a relatively simple random that "knows" that suitable sources of entropy are available and cryptographic researchers aren't going to get sarcastic! They tend to be bloody complicated and any simplification would be a good thing: eg an assumption about the quality of random.
The things that need early random can use brandom and accept its limitations and work with them and then switch to random when it's available if they need to.
Another option might be to have one random device that describes how useful it thinks it is and leave the consumer to take action based on that. "Hi I'm Mike and here's a stream of gibberish-ish(3)". The road to Hell is paved with odd interfaces ...
Posted Apr 1, 2022 4:14 UTC (Fri) by wtarreau (subscriber, #51152)
In the embedded world, a boot loader often has access to several sources of unpredictable data that it could collect and pass along:

- pre-init device contents (uninitialized RAM usually contains noise, except in VMs); output GPIOs may also read noise before they're configured as outputs. UARTs often read a first crappy byte. Many chips also include a "reset cause" register that indicates power-on, reset, exception, etc.
- device reset timings (slow devices such as UARTs do not always take an integral number of cycles to reset); an RTC's second transition also solely depends on when the device was booted, relative to the current second.
- the device's configuration: very often you'll find a MAC or some device-specific WiFi calibration data stored in a special area on the flash, that may differ from device to device. When running in a VM, some arguments may come from other means.
- other external persistent info (e.g. any RTC time value that varies between boots)

But often all these data are lost after the boot loader finishes initialization and transfers execution to the kernel. We'll *need* to standardize a solution for this, that boot loaders will have to use for future kernels if we want to improve the situation for such embedded devices. Otherwise they're too deterministic. I do remember that the SSH key I used to have on my old NSLU2 existed on at least 89 other devices connected to the net... This definitely shows that without early entropy there's little hope to collect more later and whatever we'll try to do can result in frustration. And while VMs are terrible for this, at least they can benefit from entropy being spoon-fed at boot by the hypervisor.
Posted Apr 1, 2022 12:07 UTC (Fri) by wtarreau (subscriber, #51152)
For the PC world, grub might be too late due to the BIOS often doing most of the cleanup. However on PCs there's often a video card whose memory is not reset and which contains garbage. I think it already happened to all of us to power-cycle a PC, then discover a phantom image of the previous session for a fraction of a second when typing "startx" because that memory wasn't completely lost yet. And most PCs have hardware RNGs and jitter entropy anyway ;-)
Posted Apr 7, 2022 23:55 UTC (Thu) by cypherpunks2 (guest, #152408)
This has been noticed on StackExchange, it seems: https://security.stackexchange.com/questions/183506/rando...