LWN: Comments on "Another try for getrandom() in the vDSO" https://lwn.net/Articles/980447/ This is a special feed containing comments posted to the individual LWN article titled "Another try for getrandom() in the vDSO". en-us Sun, 14 Sep 2025 09:08:11 +0000 Sun, 14 Sep 2025 09:08:11 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net RFC 7539 https://lwn.net/Articles/984420/ https://lwn.net/Articles/984420/ steve1440 <div class="FormattedComment"> RFC 7539 (ChaCha20 stream cipher link) has been obsoleted by RFC 8439.<br> <a href="https://datatracker.ietf.org/doc/html/rfc8439">https://datatracker.ietf.org/doc/html/rfc8439</a><br> <p> </div> Fri, 02 Aug 2024 19:48:21 +0000 End goal https://lwn.net/Articles/980942/ https://lwn.net/Articles/980942/ NYKevin <div class="FormattedComment"> (Just in case I wasn't clear enough: Attackers usually want to compromise as many users as possible, so an attack that only affects a tiny fraction of the computer-using population is simply not worth developing. Especially when a significant portion of that tiny fraction is made up of security researchers, whom malware authors generally try to avoid hitting in order to further obfuscate their work.)<br> </div> Fri, 05 Jul 2024 23:31:31 +0000 End goal https://lwn.net/Articles/980941/ https://lwn.net/Articles/980941/ NYKevin <div class="FormattedComment"> Simpler, and far less common. The people running desktop VMs are mostly security researchers and a few power users and hobbyists. Especially if it's desktop *Linux*. In the real world, to a first approximation, a Linux VM is nearly always a cloud VM.<br> </div> Fri, 05 Jul 2024 23:28:39 +0000 End goal https://lwn.net/Articles/980931/ https://lwn.net/Articles/980931/ comex <div class="FormattedComment"> I think you're still envisioning a server. 
That's definitely one possible scenario, but as I described in the rest of my comment, a simpler scenario is a desktop VM where the user is manually pausing the VM and either restoring it from snapshot or cloning it. Yes, this normally drops TCP connections, but not for no apparent reason.<br> <p> </div> Fri, 05 Jul 2024 20:43:59 +0000 End goal https://lwn.net/Articles/980930/ https://lwn.net/Articles/980930/ NYKevin <div class="FormattedComment"> <span class="QuotedText">&gt; When I mentioned TLS, I was imagining a scenario where the VM just happens to fork while some software on it is coincidentally in the middle of a TLS connection, and meanwhile there is an on-path network attacker specifically waiting for it to fork, with a custom TCP implementation designed to paper over the broken sequencing.</span><br> <p> That would require the application to be originally deployed in a broken state where it randomly drops TCP connections for no apparent reason. Maybe there are some people who do that, but I wouldn't want to work there.<br> </div> Fri, 05 Jul 2024 20:28:12 +0000 Make vm fork information available instead? https://lwn.net/Articles/980899/ https://lwn.net/Articles/980899/ bluca <div class="FormattedComment"> It is a very sad situation indeed, especially because in the end, even though it could be really trivially fixed, the result is that from that point of view, Windows works better as a VM guest than Linux.<br> </div> Fri, 05 Jul 2024 15:32:09 +0000 Make vm fork information available instead? https://lwn.net/Articles/980897/ https://lwn.net/Articles/980897/ bluca <div class="FormattedComment"> Those are excellent questions. The issue is that the random subsystem maintainer also happens to be the vmgenid acpi driver maintainer.<br> </div> Fri, 05 Jul 2024 15:29:53 +0000 Make vm fork information available instead?
https://lwn.net/Articles/980895/ https://lwn.net/Articles/980895/ WolfWings <div class="FormattedComment"> Especially because it seems like they moved their proposal entirely out of random's purview since they're not tracking reseeds but an entirely different and specific event, so I don't see why the NACK from random matters when it's no longer integrating with their subsystem?<br> </div> Fri, 05 Jul 2024 15:24:19 +0000 Known number of threads - bad assumption? https://lwn.net/Articles/980819/ https://lwn.net/Articles/980819/ corbet You can always allocate more space for more threads, so that should not be a problem. Fri, 05 Jul 2024 11:28:07 +0000 Not sure this solves more headaches than it creates https://lwn.net/Articles/980816/ https://lwn.net/Articles/980816/ neverpanic <div class="FormattedComment"> I'm not convinced this actually solves more headaches than it creates. Yes, random number generation is kind of slow, and it would be great if cryptographic libraries could just point to getrandom() whenever they needed random numbers without being concerned that it is going to be slow.<br> <p> Linus insists on actual users speaking up, but I'm convinced those do exist. Heck, we looked at the OpenSSL RNG setup and thought about ripping out their three-tiered primary-per-process and public/private-per-thread DRBG tree and just replacing it, or parts of it, with getrandom(). Now, we considered this because of FIPS certification and the requirements it imposes on chained DRBGs, all of which become much simpler if you just don't chain.<br> <p> But, and this brings me to the two points I want to make: Merging this patchset in Linux does not solve the issue of a slow RNG on other platforms, so cryptographic libraries with cross-platform support (i.e., essentially all of them) will still keep a multi-level setup, seed from the kernel and then run a user space DRBG, or just continue accepting that they are slow.
So yeah, it would be great if this existed, but anybody who takes this seriously and isn't just optimizing for a single platform already has to deal with this in user space anyway, so there really isn't as big of a gain as the author claims.<br> <p> And second, the author claims this would use the exact same DRBG algorithm as used by the kernel, but completely ignores that the kernel will use a different DRBG in FIPS mode (because ChaCha20 isn't FIPS-compliant). This is also the source of the headache I have with the proposal. What's currently in the kernel sort of works for us, but if this lands we need to take a closer look to either figure out how to use a CTR-DRBG in user space instead of the ChaCha20 one, plus fulfill the DRBG chaining requirements that NIST requires, or figure out a way to completely disable this.<br> <p> For these reasons, I don't think this should land. Just the generation counter that Linus proposed, that sounds like a good improvement for user space to do the right thing™ on fork or VM cloning, but the rest of the stuff is possibly an improvement for the top 0.5% of cloud load balancers, and pointless for everybody else.<br> </div> Fri, 05 Jul 2024 11:15:26 +0000 Make vm fork information available instead? https://lwn.net/Articles/980815/ https://lwn.net/Articles/980815/ fw <div class="FormattedComment"> MADV_WIPEONFORK was added to Linux 4.14: <a href="https://www.man7.org/linux/man-pages/man2/madvise.2.html">https://www.man7.org/linux/man-pages/man2/madvise.2.html</a><br> <p> It does not cover VM forks, though.
People disagree about whether this is a problem.<br> </div> Fri, 05 Jul 2024 10:53:22 +0000 droppable sounds great https://lwn.net/Articles/980810/ https://lwn.net/Articles/980810/ skissane <div class="FormattedComment"> See this patch <a href="https://lore.kernel.org/linux-mm/20240703183115.1075219-2-Jason@zx2c4.com/T/">https://lore.kernel.org/linux-mm/20240703183115.1075219-2...</a><br> <p> The difference in semantics with MADV_FREE is explained in the comments on the last file in the patch (mm/rmap.c):<br> <p> 1. "Unlike MADV_FREE mappings, VM_DROPPABLE ones can be dropped even if they've been dirtied."<br> 2. "Unlike MADV_FREE mappings, VM_DROPPABLE ones never get swap backed on failure to drop."<br> </div> Fri, 05 Jul 2024 09:01:28 +0000 Linus commenting.... https://lwn.net/Articles/980808/ https://lwn.net/Articles/980808/ skissane <div class="FormattedComment"> Linus disagrees with the vgetrandom_alloc() syscall. He seems to be suggesting, instead, exposing VM_DROPPABLE as a new mmap() flag, MAP_DROPPABLE, so anyone can use it, not just vgetrandom.<br> <p> <a href="https://lore.kernel.org/all/CAHk-=win2mesMNEfL-KZQ_jk1YH8N8dL9r=7XOLp28_WMazpVg@mail.gmail.com/">https://lore.kernel.org/all/CAHk-=win2mesMNEfL-KZQ_jk1YH8...</a><br> </div> Fri, 05 Jul 2024 08:51:46 +0000 droppable sounds great https://lwn.net/Articles/980803/ https://lwn.net/Articles/980803/ jtaylor <div class="FormattedComment"> can't you do what you can do with VM_DROPPABLE already with madvise(MADV_FREE)?<br> <p> I'm not quite clear on the difference, I assume VM_DROPPABLE can have memory reset to zero on read at any time while MADV_FREE can only be zeroed after the madvise once?<br> </div> Fri, 05 Jul 2024 08:15:19 +0000 Known number of threads - bad assumption?
https://lwn.net/Articles/980802/ https://lwn.net/Articles/980802/ taladar <div class="FormattedComment"> I imagine whoever wrote that requirement to specify the number of threads does not work on code bases outside of C where every library has its innards hanging out for everyone to see. How are you supposed to even know how many threads the overall program uses if you have proper encapsulation that hides thread use, possibly in indirect dependencies?<br> </div> Fri, 05 Jul 2024 07:44:49 +0000 droppable sounds great https://lwn.net/Articles/980801/ https://lwn.net/Articles/980801/ smurf <div class="FormattedComment"> Also, when you have VM_DROPPABLE (and a generation counter in the VDSO) the rest can easily be implemented in libc.<br> </div> Fri, 05 Jul 2024 07:12:37 +0000 End goal https://lwn.net/Articles/980800/ https://lwn.net/Articles/980800/ comex <div class="FormattedComment"> When I mentioned TLS, I was imagining a scenario where the VM just happens to fork while some software on it is coincidentally in the middle of a TLS connection, and meanwhile there is an on-path network attacker specifically waiting for it to fork, with a custom TCP implementation designed to paper over the broken sequencing.<br> <p> As for why the VM forks in the first place, well, as one possibility, it could be a desktop VM which the user manually chose to fork (while some service was talking to the network in the background). Some desktop VM software offers cloning as an option. Or even without cloning, the risks seem similar if the VM is just restored from a snapshot.<br> <p> Admittedly, waiting for a desktop VM to be forked/restored seems like a pretty niche thing for an attacker to do, but not completely unrealistic.
I'm sure there are people who make a habit of regularly restoring their VMs from snapshot.<br> </div> Fri, 05 Jul 2024 05:51:59 +0000 droppable sounds great https://lwn.net/Articles/980798/ https://lwn.net/Articles/980798/ ringerc <div class="FormattedComment"> I'm actually more interested in VM_DROPPABLE than the randomness patch.<br> <p> That sounds like something that would be exceedingly handy for garbage collected languages' object reuse lists, various in-app caches, etc.<br> </div> Fri, 05 Jul 2024 05:00:04 +0000 Make vm fork information available instead? https://lwn.net/Articles/980797/ https://lwn.net/Articles/980797/ donald.buczek <div class="FormattedComment"> OMG, this is really sad.<br> </div> Fri, 05 Jul 2024 04:49:40 +0000 End goal https://lwn.net/Articles/980794/ https://lwn.net/Articles/980794/ NYKevin <div class="FormattedComment"> Personally, I'm unclear on how a mid-request fork would even function. Both parent and child VMs are (presumably) going to be trying to talk to the remote host at the same time. If your guests have distinct public IP addresses, then at least one of the two connections should break right there. If not, then you'd have interleaved TCP streams with duplicate sequence numbers, and the whole thing should fall apart pretty quickly.<br> <p> The only case I can think of where this could possibly work is when the application is in cahoots with the VM management plane and intentionally causes VM forks to occur as part of its request handling logic. But in that case, this is trivial: At worst, you have to design your request handling logic to ensure that the fork does not happen at the same time as some delicate crypto code is running (e.g. take a lock). 
Which you were maybe already doing anyway, since some code in this genre uses a userspace CSPRNG instead of getrandom (for the performance reasons cited in the article), and that absolutely requires you to be aware of forking and mitigate it.<br> </div> Fri, 05 Jul 2024 04:17:21 +0000 End goal https://lwn.net/Articles/980787/ https://lwn.net/Articles/980787/ josh <div class="FormattedComment"> In general, uses of vmfork that care about security are forking at request time, not at some random time in the middle.<br> </div> Thu, 04 Jul 2024 23:56:31 +0000 End goal https://lwn.net/Articles/980785/ https://lwn.net/Articles/980785/ comex <div class="FormattedComment"> I still don’t understand how this design allows programs to be truly safe in the face of VM cloning. Sure, with kernel support you can fix things up if the clone happens before or during the call to getrandom(). But what if it happens right after? When the program has gotten some random bytes but has yet to do anything with them?<br> <p> I suppose it’s enough for some cases. Suppose you need to sign a message with a cryptographic scheme that requires a random nonce, where it’s very bad to sign two messages with the same (or perhaps related) nonce. Then you can ensure that your code generates the nonce with a single call to getrandom(), *after* reading in or otherwise determining the message to be signed. If the VM forks before or during the call to getrandom(), you’ll get a fresh new nonce (theoretically). If it forks after, then you’ll get the same nonce, but it’ll be used to sign the same message, which isn’t (necessarily) insecure.<br> <p> But what about more complex protocols, particularly interactive ones (e.g. TLS)? Is a single call to getrandom() a flexible enough API? It feels like programs need a way to say “abort this connection if a VM fork has happened at any time since the connection started”. 
Perhaps even something that’s atomic with send() somehow.<br> <p> Disclaimer: Not a cryptography expert.<br> </div> Thu, 04 Jul 2024 22:39:28 +0000 Make vm fork information available instead? https://lwn.net/Articles/980783/ https://lwn.net/Articles/980783/ bluca <div class="FormattedComment"> That was proposed and implemented (and even almost merged) several times in several different forms, but always rejected by the random maintainer:<br> <p> <a href="https://lore.kernel.org/lkml/e1c03136-b873-1f1d-8b06-d9186566fc0c@amazon.es/T/">https://lore.kernel.org/lkml/e1c03136-b873-1f1d-8b06-d918...</a><br> <a href="https://lore.kernel.org/lkml/e09ce9fd-14cb-47aa-a22d-d295e466fbb4@amazon.com/">https://lore.kernel.org/lkml/e09ce9fd-14cb-47aa-a22d-d295...</a><br> </div> Thu, 04 Jul 2024 22:12:14 +0000 Make vm fork information available instead? https://lwn.net/Articles/980776/ https://lwn.net/Articles/980776/ mb <div class="FormattedComment"> <span class="QuotedText">&gt;virtual-machine forks, that can compromise a random-number generator and make a reseeding necessary</span><br> <p> Wouldn't it be better to make fork information more easily available to userspace?<br> A userspace RNG could react on it and it could potentially have more uses beyond RNG.<br> <p> This all sounds like it should be solved at a lower level.<br> </div> Thu, 04 Jul 2024 19:54:19 +0000 Linus commenting.... https://lwn.net/Articles/980774/ https://lwn.net/Articles/980774/ knewt <div class="FormattedComment"> Funnily enough, Linus started commenting on it less than an hour after this article was posted 😂<br> <p> The "deconflicting new syscall numbers for 6.11" thread on lkml: <a href="https://lore.kernel.org/lkml/CAHk-=wiGk+1eNy4Vk6QsEgM=Ru3jE40qrDwgq_CSKgqwLgMdRg@mail.gmail.com/">https://lore.kernel.org/lkml/CAHk-=wiGk+1eNy4Vk6QsEgM=Ru3...</a><br> </div> Thu, 04 Jul 2024 19:38:45 +0000