Fixing getrandom()
A report of a boot hang in the 5.3 series has led to an enormous, somewhat contentious thread on the linux-kernel mailing list. The proximate cause was a set of changes that made the ext4 filesystem do less I/O early in the boot phase, incidentally causing fewer interrupts, but the underlying issue was the getrandom() system call, which was blocking until the /dev/urandom pool was initialized, as designed. Since the system in question was not gathering enough entropy due to the lack of unpredictable interrupt timings, the call would hang more or less forever. That has called into question the design and implementation of getrandom().
Ahmed S. Darwish reported the original problem and tracked it down to the GNOME Display Manager (GDM), which handles graphical logins. It turns out that GDM was calling getrandom() in order to generate the "MIT magic cookie" that is used for authorization by the X Window System. As was pointed out by several in the mega-thread, using cryptographic-strength random numbers for the cookie (or much of anything in terms of X Window security) is well beyond the pale—a much weaker random number generator could have been used with no loss of security. Darwish noted that the call "only" requests a small number of random bytes (five calls requesting 16 bytes each) but, as Theodore Y. Ts'o said, that doesn't matter: by default getrandom() will not return anything until the cryptographic random number generator (CRNG) is initialized—which requires entropy.
When Darwish originally bisected the problem, he pinpointed an ext4 commit that had the effect of reducing the amount of disk I/O that was being done early in the boot process. That performance enhancement also, unfortunately, turned out to reduce the amount of entropy gathered on Darwish's laptop—to the point it would not boot. That change has been reverted for now.
getrandom()
Back in 2014, getrandom() was added at least partly in response to a complaint from the LibreSSL project that Linux lacked a way to get random numbers in the face of file-descriptor exhaustion. The "approved" mechanism was to read from /dev/urandom, but if an attacker arranged that all of the file descriptors were already open, that method could fail. So getrandom() was created to provide a way to get random numbers without a file descriptor or, even, a visible /dev/urandom (e.g. from a container or chroot()). In fact, getrandom() was intentionally designed to block until the /dev/urandom pool is initialized; prior to getrandom() there was no way for user space to be sure that enough entropy had been gathered to properly initialize the pool. Since the behavior of /dev/urandom is part of the kernel ABI, it could not change; a new system call was under no such constraints, of course.
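The difference between the two interfaces can be sketched from user space. The Python snippet below is only an illustration (Python's os.getrandom() wraps the system call on Linux); the helper names are invented here, and the os.urandom() fallback for platforms without getrandom() is an assumption of the sketch rather than anything from the discussion.

```python
import os


def random_from_device(n: int) -> bytes:
    """Classic approach: needs a free file descriptor and a visible /dev/urandom."""
    # open() fails with EMFILE/ENFILE if descriptors are exhausted, and with
    # ENOENT if the device node is missing (for example in a bare chroot).
    with open("/dev/urandom", "rb", buffering=0) as dev:
        return dev.read(n)


def random_from_syscall(n: int) -> bytes:
    """getrandom() approach: no file descriptor and no device node required."""
    if hasattr(os, "getrandom"):
        # flags=0: blocks until the CRNG has been initialized.
        return os.getrandom(n)
    # Assumption for this sketch: use os.urandom() where getrandom() is absent.
    return os.urandom(n)
```

Both helpers are meant for small requests such as the 16-byte reads mentioned earlier; larger reads would need a loop to handle short returns.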
Initializing the CRNG requires 512 bits of estimated entropy, or 4096 interrupts using the current calculations, which Ts'o said were conservatively chosen. getrandom() is clearly documented to block until that happens, but it has not stopped user space from sometimes using it incorrectly. Ts'o said that is going to be a problem moving forward:
Linus Torvalds noted that the RDRAND instruction does not exist everywhere, so it is no panacea. He is also concerned that problems stemming from fewer interrupts will only get worse:
Error return
Ts'o suggested adding a kind of "fail safe" flag that would let callers request that getrandom() only block for, say, two minutes; after that, the best available random numbers would be returned. But Torvalds believes that blocking by default is simply wrong. He said that any new flag should request the blocking behavior explicitly so that unthinking users get what they expect. Or perhaps an error could be returned:
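Both ideas, a bounded wait and an error return, can be approximated in user space today. The sketch below is not the flag Ts'o proposed; it emulates that behavior with the existing GRND_NONBLOCK flag. The function name, the polling interval, and the use of TimeoutError are assumptions made for illustration, and it assumes Linux with Python 3.6 or later.

```python
import os
import time


def getrandom_bounded(n, max_wait=120.0, fail_on_timeout=False, poll_interval=0.1):
    """Wait at most max_wait seconds for the CRNG; return (data, seeded)."""
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # GRND_NONBLOCK: fail with EAGAIN rather than block while the
            # CRNG is still uninitialized.
            return os.getrandom(n, os.GRND_NONBLOCK), True
        except BlockingIOError:
            pass
        if time.monotonic() >= deadline:
            if fail_on_timeout:
                # "Error return" variant: the caller decides what to do next.
                raise TimeoutError("CRNG not initialized in time")
            # "Fail safe" variant: reading /dev/urandom directly does not
            # block, but the bytes may be weak this early in boot.
            with open("/dev/urandom", "rb", buffering=0) as dev:
                return dev.read(n), False
        time.sleep(poll_interval)
```

The second element of the result tells the caller whether the bytes came from an initialized CRNG, which is exactly the information that is lost when the kernel silently hands back "best available" data.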
Several seemed in agreement with that approach; Darwish posted an RFC patch along those lines. Alexander E. Patrakov reworked the commit message, but also complained about the idea of returning an error and forcing user space to deal with the problem ("the whole result looks like shifting the responsibility/blame without achieving anything useful").
Ts'o clearly thinks it is a bad idea, overall, but somewhat waspishly further modified the patch so that the blocking behavior was configurable at build time (or via a kernel command-line parameter). Darwish took that one step further and increased the length of the commit message again, adding even more background and details. In addition, at Torvalds's request, the -EINVAL return was removed, so that getrandom() effectively reverted to the same behavior as reading from /dev/urandom: callers get the "best" randomness available at the time of the call.
Lennart Poettering disagreed with that approach, calling it "sticking your head in the sand" by providing bad random numbers to potentially sensitive key-generation operations early in the boot process. He suggested that the problem is not in the kernel at all and that it should be solved in user space:
Part of the problem may be that once the GNU C Library (glibc) got around to adding a wrapper for getrandom(), an OpenBSD-like getentropy() call was also added. However the OpenBSD version does not block, while the glibc version is implemented using getrandom() and, thus, can block indefinitely in the early boot process. Developers calling getentropy() might well be unaware of this little "gotcha"—though it is documented in the man page. As Torvalds and others mentioned in the thread, another problem is that once the system has blocked waiting for entropy, said entropy is likely to never arrive. User space needs to cause things to happen (e.g. keys pressed, disks accessed) to produce the interrupts necessary to get the CRNG initialized.
Yet another part of the problem that Torvalds sees is that there are (at least) two different kinds of users of getrandom() who are passing 0 for the flags value (which he calls "getrandom(0)"): those that actually want/need to block in order to get random numbers only after the CRNG has been initialized and those who are just after "good" random numbers and didn't think too hard about it. Callers of glibc's getentropy() could also fall into that latter category.
Limiting delays
Unsurprisingly, Torvalds was not in favor of a configuration option; his first solution was to limit the wait time of getrandom() to 15 seconds on the first call when the CRNG is not initialized, reducing the delay on each subsequent call so that the maximum possible delay is 30 seconds. The code returns -EAGAIN in that case so that user space can detect it. In a comment in the code (repeated in the email message), he said: "Just asking for blocking random numbers is completely and fundamentally wrong, and the kernel will not play that game."
That set off another huge sub-thread. Poettering once again said that the problem should not be solved by the kernel in a "never trust userspace" fashion. Darwish posted another version of his patch set that proposed a getrandom2() system call, which used new flag names to be ever more explicit about the intentions of the caller. But there are still plenty of flag bits available for getrandom(), Torvalds said, so introducing a new system call seems unnecessary.
Instead, he suggested reworking the flag values to better represent what was being asked for. The GRND_EXPLICIT flag would be used to indicate that user space "knows what it is doing", so if it explicitly asks to block forever, that will be honored. The GRND_SECURE and GRND_INSECURE values would ask for blocking and non-blocking behavior respectively, but both would also set the GRND_EXPLICIT bit. The patch left the getrandom(0) case alone, but Torvalds has plans for that as well:
"And the new cases are defined to *not* warn. In particular, GRND_INSECURE very much does *not* warn about early urandom access when crng isn't ready. Because the whole point of that new mode is that the user knows it isn't secure."
Ts'o wondered why the getrandom(0) case was not simply mapped to the same behavior as GRND_SECURE, which would effectively be the same as it is today, but Torvalds was adamant that was the wrong approach; he is concerned that Ts'o is getting overly caught up in what Torvalds sees as theoretical attacks and is missing the real getrandom(0) problem. Torvalds intends the patch to be backported to the stable kernels, so any change to getrandom(0) will be in a separate, mainline-only patch.
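The relationship between the proposed flags can be pictured as a few bit definitions. In the sketch below, only GRND_NONBLOCK and GRND_RANDOM carry their real, existing values; the value chosen for GRND_EXPLICIT and the exact encodings of GRND_SECURE and GRND_INSECURE are assumptions for illustration and may differ from the actual patch. The describe() helper is hypothetical as well.

```python
GRND_NONBLOCK = 0x0001  # existing flag
GRND_RANDOM = 0x0002    # existing flag
GRND_EXPLICIT = 0x0004  # assumed value for the proposed bit

# Assumed encodings: both new modes carry the EXPLICIT bit, and the
# insecure one additionally asks never to block.
GRND_SECURE = GRND_EXPLICIT
GRND_INSECURE = GRND_EXPLICIT | GRND_NONBLOCK


def describe(flags: int) -> str:
    """Classify a flags value according to the scheme described above."""
    if flags & GRND_EXPLICIT:
        if flags & GRND_NONBLOCK:
            return "explicit: never block, caller accepts insecure data"
        return "explicit: block until the CRNG is initialized"
    if flags == 0:
        return "ambiguous default, the getrandom(0) case"
    return "legacy flags without the explicit bit"
```

Because the explicit bit is set in both new modes, the kernel could tell a caller that deliberately chose its behavior apart from one that just passed 0 without thinking about it.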
Jitter entropy
Patrakov asked about using "jitter entropy" as is done by the haveged entropy daemon. Using haveged is being suggested by some distributions as a way to ensure that there is enough entropy early in the system boot. He noted that the technique is controversial as some are concerned that it is not truly random data. Torvalds said that he is one of the skeptics, but thought it might provide a solution to the current mess:
"Making absolutely nobody happy, but working in practice. And maybe encouraging the people who don't like jitter entropy to use GRND_SECURE instead."
But getrandom() has been in the kernel for five years and in glibc for more than two years, so it is clearly part of the kernel ABI. The behavior that some want when they call getrandom(0) should not be arbitrarily changed to provide "bad" random numbers in a way that breaks user-space programs. That was the upshot of Andy Lutomirski's argument in the thread. He agreed that the getrandom() call was poorly thought out before it was added, but that should not change now:
Lutomirski also believes it is a "straight up kernel bug" that blocking in getrandom(0) early in the boot deadlocks the system by waiting for entropy. He suggested actively fixing that problem: "How about we make getrandom() (probably actually wait_for_random_bytes()) do something useful to try to seed the RNG if the system is otherwise not doing IO." Torvalds is in agreement with that, though he seems to be leaning toward the jitter-entropy stopgap:
"That will _work_, but it will also make the security-people nervous, which is just one more hint that they should move to GRND_SECURE[_BLOCKING]."
The goal is to ensure that callers are really aware that they are asking to block (and potentially deadlock) waiting for the CRNG to be properly initialized. In the ambiguous default case, that may well not be the case, so Torvalds is determined to find a way to make that not block:
He is concerned about the amount of time it might take to gather enough jitter entropy to initialize the CRNG, however. He suggested that he was willing to block as long as 15 seconds, but thought that might require some kind of accelerated jitter-entropy technique. Patrakov said that acceleration was not needed as the existing technique can generate plenty of entropy in two seconds. In addition, as had also been noted elsewhere in the thread, Matthew Garrett pointed out that the Zircon kernel for the Fuchsia operating system initializes its CRNG using jitter entropy, which may lend some credibility to the technique.
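For readers unfamiliar with the idea, jitter entropy amounts to timing a small piece of work over and over and mixing the unpredictable variation in those timings into a pool. The toy below illustrates the concept only: it is not haveged's algorithm, not what Zircon does, and not any proposed kernel code. It makes no attempt to estimate how much entropy it collects, so it should not be used to generate keys. The function name and the choice of SHA-256 as the mixing function are assumptions of the sketch.

```python
import hashlib
import time


def jitter_sample(rounds: int = 4096) -> bytes:
    """Hash the low bits of successive timer deltas into a 32-byte digest."""
    pool = hashlib.sha256()
    prev = time.perf_counter_ns()
    for _ in range(rounds):
        # A little busy work, so that the elapsed time depends on caches,
        # scheduling, and other hard-to-predict machine state.
        sum(range(64))
        now = time.perf_counter_ns()
        # Keep only the low byte of the delta, where the jitter lives.
        pool.update(((now - prev) & 0xFF).to_bytes(1, "little"))
        prev = now
    return pool.digest()
```

Whether such timings are truly unpredictable is exactly the point of controversy mentioned above; the code shows the mechanism, not a proof of its quality.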
ABI
In a departure from his usual stance, Torvalds seems fairly unconcerned about changing the kernel ABI in this case. He said that any breakage from changing getrandom(0) to time out was theoretical, but that the boot deadlock problem was real. In order for the generation of keys to fail under that scheme, he said, they would have to be generated at boot on idle machines that are not doing anything that would allow entropy to be collected. As Garrett noted, though, that is the exact scenario for which the getrandom(0) behavior was designed. Torvalds does not see that kind of key generation as anything other than a hypothetical, it seems.
The main difference between the proposals from Torvalds and Lutomirski is whether to provide some way for getrandom() callers to block, possibly forever. Torvalds is willing to have that as a non-default option to getrandom(), while Lutomirski would prefer to simplify getrandom() (though the patch text calls it "getentropy()") and remove all of the machinery behind the /dev/random blocking pool. The net effect would be that users who truly need today's getrandom(0) behavior could still get it by reading /dev/random.
The thread is long and twisty; Torvalds's final decision is not yet clear. It does seem that something will be done to getrandom(0), but whether it times out or switches to jitter entropy in the problematic case is unclear. It does also seem that the blocking random number pool's days are numbered, as well, based on Torvalds's statements in the thread. But the final shape of those changes is not yet apparent.
It would seem that, once again, the kernel development community has failed in the design of an API/ABI. According to Torvalds and others, the default for getrandom() should never have been "block forever", but that information comes five years too late. API/ABI review is an area that the kernel has struggled with over the years; hopefully situations like this will provide enough incentive to take some extra time (and do some testing, though that probably would not have mattered here) before committing to an ABI that has to be supported, for the most part, anyway, forever.
Posted Sep 27, 2019 16:25 UTC (Fri)
by jcm (subscriber, #18262)
[Link] (7 responses)
Posted Sep 27, 2019 17:53 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link]
Posted Sep 27, 2019 17:55 UTC (Fri)
by patrakov (subscriber, #97174)
[Link] (4 responses)
Posted Sep 27, 2019 18:37 UTC (Fri)
by jem (subscriber, #24231)
[Link] (3 responses)
Posted Sep 27, 2019 19:05 UTC (Fri)
by walters (subscriber, #7396)
[Link] (2 responses)
Posted Sep 29, 2019 20:05 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Sep 29, 2019 20:27 UTC (Sun)
by patrakov (subscriber, #97174)
[Link]
OTOH, jitter entropy will definitely help here, up to the point of making it completely unneeded to save entropy between reboots.
Posted Oct 1, 2019 20:54 UTC (Tue)
by kmeyer (subscriber, #50720)
[Link]
Posted Sep 27, 2019 18:03 UTC (Fri)
by flussence (guest, #85566)
[Link] (8 responses)
I can see the hysterical tech tabloid headlines already: “systemd announces business plan to brick all old systems unless you purchase an expensive security dongle”.
Posted Sep 27, 2019 18:09 UTC (Fri)
by jccleaver (guest, #127418)
[Link] (7 responses)
It's worth pointing out that half this problem is actually *caused* by having moved everything into systemd. If you needed entropy in early but post-initramfs boot and needed to be sure it was there, it was trivial enough to put some sort of arbitrary shell action way up in the script to do it.
Posted Sep 28, 2019 16:36 UTC (Sat)
by mads (subscriber, #55377)
[Link]
Posted Sep 29, 2019 9:26 UTC (Sun)
by mezcalero (subscriber, #45103)
[Link] (5 responses)
Hence, no, systemd is not causing this, systemd does what it can, but it can't magically create entropy where there is none.
Or to say this differently: that "arbitrary shell script" you are envisioning, what is it supposed to do? Where would it derive entropy from, where neither the kernel nor systemd do or could do it at least as well?
If you care, have a look here, about the approach systemd takes to help you with the general problem: https://systemd.io/RANDOM_SEEDS.html
Lennart
Posted Sep 29, 2019 19:37 UTC (Sun)
by flussence (guest, #85566)
[Link]
I've got a system where the NVRAM is probably fine, but it has a broken EFI implementation (AMI), where nobody bothered to implement deallocating deleted vars, so eventually it'd start returning -ENOSPC for every write operation. Me naively leaving pstore panic logging enabled soon flushed that out (followed by real panic at efibootmgr failing, and a day of downtime trying to figure out what went wrong and tearing the room up to get at a CMOS jumper).
The kernel help text for EFI features could use a gentle reminder that yes, EFI firmware *is* written by the same nincompoops as the bad old BIOSes of the 90s, and should be equally mistrusted.
Posted Sep 30, 2019 19:30 UTC (Mon)
by wahern (subscriber, #37304)
[Link] (3 responses)
Posted Oct 1, 2019 17:28 UTC (Tue)
by alonz (subscriber, #815)
[Link] (2 responses)
Posted Oct 1, 2019 19:07 UTC (Tue)
by wahern (subscriber, #37304)
[Link] (1 responses)
Some systems are just hopelessly broken when it comes to entropy. And that can't be fixed. But those systems are increasingly (and at this point likely *entirely*), small, embedded systems. It was always the responsibility of the designers of those systems to either make sure there's an entropy source available or design their firmware so that it wasn't necessary (i.e. no sshd generating a new private key on first boot). Are we going to let them hold back the inevitable *forever*? At some point we have to hold the stragglers' feet to the fire and cut our losses on the installed base--most of which would never upgrade, anyhow, and are unlikely to even be using getrandom(2) in the first place.
With the prevalence of not only RDRAND and similar on-chip sources, but also many other sources (e.g. Intel QuickAssist provided a hardware generator on the NIC controller since *before* it was even branded QuickAssist, EFI provides randomness, which in some cases comes from a hardware source--but that's a QoI issue), it's time to make the switch over to assuming (*loudly* assuming) that strong entropy is available at boot or will be available very shortly after boot (see CPU jitter hack as a last-ditch effort). Almost all of userland already makes this assumption, and has for quite some time, rightly or wrongly; now the ball is in the kernel's court to make good on that assumption to the best of its ability.
This *will* happen eventually, the only question is how long we'll wring our hands over misplaced concern for embedded platforms that are and were fundamentally broken. It's been almost 15 years since the VIA C3 included an on-chip RNG. Embedded designers have had ample warning about the necessity of providing strong entropy for a long time.
Posted Oct 1, 2019 20:23 UTC (Tue)
by wahern (subscriber, #37304)
[Link]
Can this scenario exist? Sure. Does it exist? We should assume so. The only question is what's the risk, and does that risk outweigh the risk of not improving other aspects of the system's randomness semantics with the consequence that software will attempt to compensate *poorly*. And, again, what's that relative risk within the context of embedded system + systemd - RNG - clock?
Posted Sep 27, 2019 18:37 UTC (Fri)
by cesarb (subscriber, #6266)
[Link] (2 responses)
Isn't "idle machines that are not doing anything else" exactly the situation in the first boot of a newly-installed distribution, which is when the long-term ssh host keys (which do need strong random numbers) are usually generated?
Posted Oct 4, 2019 7:14 UTC (Fri)
by kmeyer (subscriber, #50720)
[Link] (1 responses)
Posted Oct 4, 2019 11:36 UTC (Fri)
by Jandar (subscriber, #85683)
[Link]
Posted Sep 27, 2019 18:44 UTC (Fri)
by mgedmin (subscriber, #34497)
[Link] (3 responses)
Posted Sep 28, 2019 12:07 UTC (Sat)
by corsac (subscriber, #49696)
[Link] (2 responses)
Posted Sep 28, 2019 19:44 UTC (Sat)
by patrakov (subscriber, #97174)
[Link] (1 responses)
Posted Sep 29, 2019 9:01 UTC (Sun)
by corsac (subscriber, #49696)
[Link]
Posted Sep 27, 2019 21:07 UTC (Fri)
by vstinner (subscriber, #42675)
[Link]
OpenBSD fixed this boot issue: their bootloader loads entropy from disk, and the installer collects enough entropy. It doesn't cover all cases (read-only livecd, systems with no entropy source to feed new entropy, etc.), but it fixes the "lack of entropy at boot" issue for the common case.
--
In 2015, a systemd script used Python to compute a hash. Python blocked on getrandom() at boot.
When the bug was reported, a very long discussion started on the bug tracker and continued on the python-dev mailing list. A new mailing list was created just to discuss this bug :-)
In the end, we decided to fall back on /dev/urandom if getrandom() blocks, to initialize the "secret hash seed". This secret is used to randomize the dictionary hash function, to reduce the risk of a denial-of-service attack on dictionaries.
Moreover, the os.urandom() function has been modified on Linux (and Solaris: systems providing getrandom() syscall/function) to block (on purpose) until the system collected enough entropy.
Calling os.getrandom(1, os.GRND_NONBLOCK) can be used to check if getrandom() is going to block or not. Some people asked for this feature, but I'm not sure that it's really used in practice.
PEP 524 (https://www.python.org/dev/peps/pep-0524/) describes the issue and the fix.
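The check and the fallback described above might be sketched roughly as follows, assuming Linux and Python 3.6 or later. The function names are hypothetical and this is not CPython's actual implementation. Note that the fallback reads the device node directly, because os.urandom() itself now blocks until the system has collected enough entropy.

```python
import os


def crng_ready() -> bool:
    """Return True if getrandom() would currently return without blocking."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        # EAGAIN: the kernel has not collected enough entropy yet.
        return False


def hash_seed(n: int = 16) -> bytes:
    """Seed for a non-security-critical use; never blocks, even at boot."""
    try:
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        # Fall back on /dev/urandom, which returns data even when the
        # CRNG is not yet initialized.
        with open("/dev/urandom", "rb", buffering=0) as dev:
            return dev.read(n)
```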
--
Python was an early adopter of getrandom() syscall, before it was exposed as a function in the glibc ;-)
Python keeps a file descriptor open on /dev/urandom for best performance. Some badly written applications close the file descriptor by mistake, so Python detects whether the file descriptor has changed (by comparing st_dev and st_ino) to work around application bugs. Moreover, there is no lock on the file descriptor, for best performance, which requires detecting when two threads open the file "at the same time".
So well, getrandom() avoids all these issues.
Posted Sep 28, 2019 5:08 UTC (Sat)
by ncm (guest, #165)
[Link] (7 responses)
Posted Sep 28, 2019 6:00 UTC (Sat)
by alonz (subscriber, #815)
[Link]
For most uses, a simple userspace solution that runs very early in the boot sequence and credits some environment noise as entropy should be sufficient. This would solve even the “initial SSHD seed” concerns — however it is easily broken by distributors / packagers who might remove it in the name of “faster boot”.
Posted Sep 29, 2019 6:04 UTC (Sun)
by edeloget (subscriber, #88392)
[Link]
Posted Sep 30, 2019 8:26 UTC (Mon)
by anton (subscriber, #25547)
[Link] (3 responses)
Posted Sep 30, 2019 11:59 UTC (Mon)
by excors (subscriber, #95769)
[Link] (1 responses)
> you need a way to get the raw data (transformation into JPEG usually tries to get rid of the noise that we want for the RNG)
It's not just the JPEG compression - the Android camera API is happy to give you uncompressed YUV but that still wouldn't be raw enough. You'd want the (typically) 10-bit Bayer data directly from the sensor, before the ISP has tried to make it look pretty (doing noise reduction, adjusting levels in a way that might saturate the noise out of existence, smoothing the image, etc). And you probably want to manually configure the sensor to maximise noise (long exposure, high gain, disable binning, etc). Android provides enough control to let applications request that, but I don't know how many of the camera drivers implement it fully, so it's probably not a very portable approach.
Posted Sep 30, 2019 12:49 UTC (Mon)
by anton (subscriber, #25547)
[Link]
Posted Oct 3, 2019 10:40 UTC (Thu)
by NRArnot (subscriber, #3033)
[Link]
Personally I'd go with a boot parameter "paranoia = n" (maybe the current and maximum value is 11, with a nod to Spinal Tap). 10 would allow use of the random number generator on the CPU chip if there is one, and thereby solve all the problems other than the possibility that (insert conspiracy theory here).
Posted Nov 19, 2021 16:51 UTC (Fri)
by Lawless-M (guest, #155377)
[Link]
Posted Sep 28, 2019 5:52 UTC (Sat)
by josh (subscriber, #17465)
[Link] (2 responses)
Many distributions (both live and initial-boot) generate SSH keys on boot. They do this *today*. That's not a hypothetical, that's a case that Debian folks have been discussing for a while now, where systems take forever to boot. This is still a bug today, if you don't have a hardware random number generator.
Posted Oct 4, 2019 7:21 UTC (Fri)
by kmeyer (subscriber, #50720)
[Link] (1 responses)
Posted Oct 4, 2019 9:23 UTC (Fri)
by zdzichu (subscriber, #17118)
[Link]
Posted Sep 28, 2019 6:09 UTC (Sat)
by alonz (subscriber, #815)
[Link] (1 responses)
This would mean that a userspace that doesn't initialize randomness early enough will just fail, loudly and deterministically. So even the folks who try to “optimize boot time” by just removing boot-time items without thinking won't be able to build a broken system that boots but isn't secure.
Posted Oct 4, 2019 7:31 UTC (Fri)
by kmeyer (subscriber, #50720)
[Link]
As to the rest, the APIs you ask for already exist.
> make getrandom() (and /dev/{u,}random) return an error if they have no randomness to provide.
getrandom(1, GRND_NONBLOCK) ⇒ -1/EAGAIN; poll(/dev/random, POLLIN, 0) ⇒ 0.
> This would mean that a userspace that doesn't initialize randomness early enough will just fail, loudly and deterministically.
All you need to do is add one of the above checks and a printf() to your userspace init process to produce the loud warning.
> So even the folks who try to “optimize boot time” by just removing boot-time items without thinking won't be able to build a broken system that boots but isn't secure.
That is the status quo with correct use of getrandom().
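The poll()-based check mentioned above could look something like this in a Python init helper. This is a sketch under the assumption that /dev/random is present and that the code runs on Linux; the function names and the warning text are made up for the example.

```python
import os
import select
import sys


def dev_random_readable() -> bool:
    """poll(/dev/random, POLLIN, 0): True if a read would not block."""
    fd = os.open("/dev/random", os.O_RDONLY | os.O_NONBLOCK)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        # Timeout 0: return immediately; an empty list means "not readable".
        return bool(poller.poll(0))
    finally:
        os.close(fd)


def warn_if_unseeded() -> bool:
    """Print a loud warning to stderr when randomness is not yet available."""
    ready = dev_random_readable()
    if not ready:
        print("WARNING: entropy pool not ready", file=sys.stderr)
    return ready
```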
Posted Sep 28, 2019 9:57 UTC (Sat)
by dd9jn (✭ supporter ✭, #4459)
[Link] (3 responses)
Using Stephan Müller's jitter based entropy generator inside the kernel is by any means the Right Thing to do - even if it is for now only a fallback. In Libgcrypt's Windows version we already use it because on Windows the JitterRNG is the only non-external-hardware RNG which has been approved by Germany's BSI for use in restricted communication at the VS-NfD level. On Linux getrandom has been evaluated as fine but nevertheless we mix some entropy from the JitterRNG into our own entropy pool. Right, we also use RDRAND in addition and that is technically okay. But because RDRAND can't be evaluated the evaluation of Libgcrypt assumes that RDRAND adds 0 bits of entropy to the pool.
Posted Sep 28, 2019 19:52 UTC (Sat)
by patrakov (subscriber, #97174)
[Link] (2 responses)
Posted Sep 28, 2019 20:38 UTC (Sat)
by joib (subscriber, #8541)
[Link]
(Though in my non-expert opinion, it seems having a jitter entropy generator in the kernel for supported targets would be the least bad approach of those discussed here. Those few that run unsupported targets are hopefully sufficiently clueful that they can use a hw RNG, haveged, or maybe they don't need early boot random numbers anyway.)
Posted Oct 2, 2019 0:54 UTC (Wed)
by mangix (guest, #126006)
[Link]
Posted Sep 29, 2019 7:29 UTC (Sun)
by patrakov (subscriber, #97174)
[Link] (4 responses)
https://lore.kernel.org/lkml/CAHk-=wgjC01UaoV35PZvGPnrQ81...
Posted Sep 30, 2019 10:54 UTC (Mon)
by joib (subscriber, #8541)
[Link] (3 responses)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
Posted Sep 30, 2019 11:57 UTC (Mon)
by patrakov (subscriber, #97174)
[Link] (2 responses)
"""
If things as late as GDM/gnome-session are still "early boot", then which service does not count as early boot? See the problem?
Posted Sep 30, 2019 13:19 UTC (Mon)
by Otus (subscriber, #67685)
[Link]
From the point of view of the random pools, before this change, anything before the user gets a login screen is early boot. That's when you start getting more than a trickle of entropy.
Posted Oct 1, 2019 9:39 UTC (Tue)
by ceplm (subscriber, #41334)
[Link] (6 responses)
Posted Oct 1, 2019 9:49 UTC (Tue)
by ceplm (subscriber, #41334)
[Link] (3 responses)
Posted Oct 1, 2019 16:53 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Oct 4, 2019 6:50 UTC (Fri)
by kmeyer (subscriber, #50720)
[Link] (1 responses)
Posted Oct 10, 2019 20:28 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Oct 1, 2019 17:05 UTC (Tue)
by zlynx (guest, #2285)
[Link] (1 responses)
At any rate, it's a real problem. Because if they do something during boot such as generate their own SSH or SSL certificates the attacker only has to guess a few possibilities.
Posted Oct 1, 2019 19:02 UTC (Tue)
by ceplm (subscriber, #41334)
[Link]
You're assuming that there is a reasonably-initialized system clock at the point where entropy is required – this is just as wrong as any of the other assumptions regarding entropy.
2. Optionally, the installer can also generate and write out sshd host keys. There's not a lot of reason to wait until first boot for that.
Inject entropy from the disk using the bootloader and similar problem fixed in Python
The actual scarce resource (in my opinion 😏) is random data that can be trusted by a truly-paranoid person. (Whether the paranoia is justified or not is a different question; I would expect the smart paranoid to use a hardware RNG, not trust the off-the-shelf randomness from a general-purpose computer + OS).
I do not know if a microphone or radio are good random sources, but a camera is. The resolution of camera sensors is high enough that the randomness of the photons coming in is reflected in the raw sensor output (and it is a lot for a (not too) bright picture). However, that means that the sensor must be on and receive significant light on booting, and you need a way to get the raw data (transformation into JPEG usually tries to get rid of the noise that we want for the RNG).
Thermal noise is relatively small compared to photon noise if the sensor receives significant photons, but may be enough for initializing the RNG. And of course you don't want to have so much brightness and so much exposure that the sensor saturates, but you can recognize anything approaching saturation, and then use a shorter exposure time if too many pixels are saturated. Combining high gain with long exposure will give more thermal noise in darkness, but produce saturation if there is light.
A template cannot contain pregenerated host keys, because every VM would have the same key.
Using the installer every time a new VM is created is not feasible; the installation process takes too much time. Creating a new VM is something that should take no more than a few seconds.
In my opinion, a better solution would be to remove the automatic collection of entropy from the kernel at boot time, and require userspace to provide randomness (or to explicitly start the kernel randomness collection). And - make getrandom() (and /dev/{u,}random) return an error if they have no randomness to provide.
While this was triggered by what is arguably a user space bug with GDM/gnome-session asking for secure randomness during early boot, when they didn't even need any such truly secure thing, the issue ends up being that our "getrandom()" interface is prone to that kind of confusion, because people don't think very hard about whether they want to block for sufficient amounts of entropy.
"""