Fixing getrandom()

Posted Sep 27, 2019 18:09 UTC (Fri) by jccleaver (guest, #127418)
In reply to: Fixing getrandom() by flussence
Parent article: Fixing getrandom()

> I can see the hysterical tech tabloid headlines already: “systemd announces business plan to brick all old systems unless you purchase an expensive security dongle”.

It's worth pointing out that half this problem is actually *caused* by having moved everything into systemd. If you needed entropy in early but post-initramfs boot and needed to be sure it was there, it was trivial enough to put some sort of arbitrary shell action way up in the script to do it.

Fixing getrandom()

Posted Sep 28, 2019 16:36 UTC (Sat) by mads (subscriber, #55377)

But it would be trivial to add that to systemd too, and why does it have to be a shell script? (Not that systemd can't use bash, but why should it?)

Fixing getrandom()

Posted Sep 29, 2019 9:26 UTC (Sun) by mezcalero (subscriber, #45103)

systemd reads a random seed off disk just fine for you, no need to write any script for that. The problem with that approach is that it's waaaay too late: we can only credit a random seed read from disk when we can also update it on disk, so that it is never reused. This means /var needs to be writable, which happens really late during boot, long after we already needed entropy, and long after the initrd.
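To make the ordering constraint concrete, here is a minimal sketch of "only use the seed if it could be replaced on disk first". It is not systemd's implementation; the function name `consume_seed`, the 32-byte size, and the hashing step are assumptions made for illustration only.

```python
import hashlib
import os

SEED_SIZE = 32  # bytes; an assumed size for this sketch


def consume_seed(path):
    """Return the stored seed only if it was replaced on disk first.

    Returns None when the seed cannot be read or the replacement cannot be
    written (for example, because the filesystem is still read-only).
    A real tool would also restrict the file's permissions.
    """
    tmp = path + ".tmp"
    try:
        with open(path, "rb") as f:
            old = f.read()
        # Derive the next seed from the old one plus whatever the OS offers,
        # so the file contents change even if os.urandom() is weak right now.
        new = hashlib.sha256(b"next-seed" + old + os.urandom(SEED_SIZE)).digest()
        with open(tmp, "wb") as f:
            f.write(new)
            f.flush()
            os.fsync(f.fileno())
        # Atomically swap the new seed into place before the old one is used.
        os.replace(tmp, path)
    except OSError:
        return None  # could not update the file: do not credit the old seed
    return old
```

The point of the sketch is the failure path: while the seed file lives on a filesystem that is not yet writable, the function has nothing it can safely hand back, which is the "too late" problem described above.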

Hence, no, systemd is not causing this, systemd does what it can, but it can't magically create entropy where there is none.

Or to say this differently: that "arbitrary shell script" you are envisioning, what is it supposed to do? Where would it derive entropy from that neither the kernel nor systemd already uses, or could use at least as well?

If you care, have a look here at the approach systemd takes to help you with the general problem: https://systemd.io/RANDOM_SEEDS.html

Lennart

Fixing getrandom()

Posted Sep 29, 2019 19:37 UTC (Sun) by flussence (guest, #85566)

So, I had a lingering question that the last paragraph of that page answers sufficiently.

I've got a system where the NVRAM is probably fine, but it has a broken EFI implementation (AMI), where nobody bothered to implement deallocating deleted vars, so eventually it'd start returning -ENOSPC for every write operation. Me naively leaving pstore panic logging enabled soon flushed that out (followed by real panic at efibootmgr failing, and a day of downtime trying to figure out what went wrong and tearing the room up to get at a CMOS jumper).

The kernel help text for EFI features could use a gentle reminder that yes, EFI firmware *is* written by the same nincompoops as the bad old BIOSes of the 90s, and should be equally mistrusted.

Fixing getrandom()

Posted Sep 30, 2019 19:30 UTC (Mon) by wahern (subscriber, #37304)

Who cares if it's reused? The same 32-byte random seed cryptographically mixed with a non-repeating nonce, like the system clock, would have the same strength as a new 32-byte random seed, presuming the seed remains confidential. Long-term it's better to change the seed in case confidentiality was unknowingly violated or the nonce repeats (clock reset), but as a practical matter those aren't prerequisites to having a reasonably sane and secure system, especially considering the alternative--a multiplicity of userland CSPRNGs all scavenging for entropy independently.
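As a rough illustration of the mixing described here, the sketch below keys HMAC-SHA256 with the stored seed and feeds it a nonce. The names `boot_key` and `clock_nonce` and the `boot-seed-v1` label are hypothetical, and HMAC is just one possible choice of "cryptographic mixing", not something the comment specifies.

```python
import hashlib
import hmac
import struct
import time


def boot_key(seed: bytes, nonce: bytes) -> bytes:
    """Mix a long-lived secret seed with a per-boot nonce.

    The output is only as unpredictable as the seed is secret, and only
    differs between boots if the nonce differs.
    """
    return hmac.new(seed, b"boot-seed-v1" + nonce, hashlib.sha256).digest()


def clock_nonce() -> bytes:
    """Use the wall clock as a nonce; this assumes the clock never repeats."""
    return struct.pack(">Q", time.time_ns())
```

Calling `boot_key(seed, clock_nonce())` on each boot yields a different 32-byte value as long as the clock moves forward. If the nonce repeats (a reset clock, say), the output repeats too, which is the caveat noted above.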

Fixing getrandom()

Posted Oct 1, 2019 17:28 UTC (Tue) by alonz (subscriber, #815)

You're assuming that there is a reasonably-initialized system clock at the point where entropy is required – this is just as wrong as any of the other assumptions regarding entropy.

Fixing getrandom()

Posted Oct 1, 2019 19:07 UTC (Tue) by wahern (subscriber, #37304)

I am assuming it, but I think it's a reasonable assumption--that there are adequate sources available to minimize the chance of a repeating nonce. The system clock was just an example. And of course there are systems where the assumption can't hold at all. So what? How many systems do there need to be to hold back everybody else? 1%? 0.1%? 0.01%? 0%?

Some systems are just hopelessly broken when it comes to entropy. And that can't be fixed. But those systems are increasingly (and at this point likely *entirely*), small, embedded systems. It was always the responsibility of the designers of those systems to either make sure there's an entropy source available or design their firmware so that it wasn't necessary (i.e. no sshd generating a new private key on first boot). Are we going to let them hold back the inevitable *forever*? At some point we have to hold the stragglers' feet to the fire and cut our losses on the installed base--most of which would never upgrade, anyhow, and are unlikely to even be using getrandom(2) in the first place.

With the prevalence of not only RDRAND and similar on-chip sources, but also many other sources (e.g. Intel QuickAssist provided a hardware generator on the NIC controller since *before* it was even branded QuickAssist, EFI provides randomness, which in some cases comes from a hardware source--but that's a QoI issue), it's time to make the switch over to assuming (*loudly* assuming) that strong entropy is available at boot or will be available very shortly after boot (see CPU jitter hack as a last-ditch effort). Almost all of userland already makes this assumption, and has for quite some time, rightly or wrongly; now the ball is in the kernel's court to make good on that assumption to the best of its ability.
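For reference, this is roughly how a userland program can find out whether the kernel considers its pool initialized, instead of assuming it. It is a minimal, Linux-only sketch using Python's `os.getrandom()` wrapper around getrandom(2); the helper name `try_getrandom` is hypothetical.

```python
import os


def try_getrandom(n):
    """Ask the kernel for n random bytes without blocking.

    Returns None if the kernel's pool is not yet initialized, which is the
    early-boot case this thread is about.
    """
    try:
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        return None
```

On a system that has been up for a while this simply returns the bytes. What a caller should do on `None` (wait, fail loudly, or fall back to something weaker) is exactly the policy question under discussion.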

This *will* happen eventually, the only question is how long we'll wring our hands over misplaced concern for embedded platforms that are and were fundamentally broken. It's been almost 15 years since the VIA C3 included an on-chip RNG. Embedded designers have had ample warning about the necessity of providing strong entropy for a long time.

Fixing getrandom()

Posted Oct 1, 2019 20:23 UTC (Tue) by wahern (subscriber, #37304)

Also, just to be clear, the context for the boot seed was systemd. The overlap of embedded systems lacking both hardware entropy such as RDRAND and a reliable system clock but still running systemd is likely not very large. But then you also need to discount that by the odds of consecutive boots where systemd couldn't re-save a seed. *And* you need to discount it further by the odds the system was doing anything security critical. *And* you need to discount this by the odds that such a scenario would be distinguishable and exploitable.

Can this scenario exist? Sure. Does it exist? We should assume so. The only question is what's the risk, and does that risk outweigh the risk of not improving other aspects of the system's randomness semantics with the consequence that software will attempt to compensate *poorly*. And, again, what's that relative risk within the context of embedded system + systemd - RNG - clock?