Voodoo coding

Posted Jul 14, 2014 11:20 UTC (Mon) by cesarb (subscriber, #6266)
In reply to: Voodoo coding by mezcalero
Parent article: First Release of LibreSSL Portable Available

> This is so confused. The only correct thing to do if things fail, is well, to let them fail. Return an error, there's really no shame in that. People should be prepared that things fail.

The problem here is that C does not have exceptions.

It does no good to return an error code if everyone ignores it. It's especially bad in crypto: failure to seed the RNG results in something which _looks_ like a valid key/iv/nonce, works like a valid key/iv/nonce, but completely breaks the underlying mathematical assumptions the crypto algorithms depend on, by being easily guessable and/or not unique. Two years later, someone finally notices, and the whole Internet has to generate new keys (this has happened before).

With exceptions, ignoring the error kills the program. Without exceptions, the only sane way out is to pretend it was an uncaught exception and kill the program.

@busterb, if you are reading this, I can see where mezcalero is coming from: he's a systemd developer, and it's really bad if the init process is killed (though not nearly as bad as a crypto key compromise), so init system developers tend to develop allergies to libraries which kill their own process.

How about this suggestion: only the initial seed (and the first reseed after a fork) should kill the process on a failure return from getentropy(). If it fails on other reseeds, accept the failure (generating a few extra bytes with the RNG itself and using them as the new seed) and keep going. This way, a developer using libressl would only have to force a reseed (by trying to get a random number) at the start of the program (if you can't open a fd at that point, you have bigger problems and it's best to just dump core) and after a fork, and the developer would know the library won't randomly (heh) kill the program after that point.
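
A minimal sketch of that policy, under stated assumptions: `rng_state`, `rng_reseed()`, `entropy_fn` and the two `fake_entropy_*` helpers are hypothetical names, not LibreSSL's API, and `stretch_old_seed()` is only a placeholder for "generate a few bytes with the RNG itself". The point is the control flow: a failed first seed (or first reseed after a fork) aborts, while a failed later reseed degrades gracefully.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SEED_LEN 32

/* Hypothetical entropy callback: 0 on success, -1 on failure. */
typedef int (*entropy_fn)(void *buf, size_t len);

struct rng_state {
    unsigned char seed[SEED_LEN];
    int seeded;   /* 0 until the first successful seed */
    pid_t pid;    /* process that performed the last fresh seed */
};

enum reseed_result { RESEED_FRESH, RESEED_DEGRADED };

/* Placeholder only, NOT a real generator: a real library would run
 * its stream cipher here to derive the replacement seed. */
static void stretch_old_seed(struct rng_state *st)
{
    size_t i;
    for (i = 0; i < SEED_LEN; i++)
        st->seed[i] = (unsigned char)(st->seed[i] ^
                                      st->seed[(i + 1) % SEED_LEN] ^ i);
}

enum reseed_result rng_reseed(struct rng_state *st, entropy_fn get)
{
    unsigned char fresh[SEED_LEN];
    int must_succeed;

    assert(st != NULL && get != NULL);
    /* Never seeded, or seeded by a different process (we are a fork child). */
    must_succeed = !st->seeded || st->pid != getpid();

    if (get(fresh, sizeof fresh) == 0) {
        memcpy(st->seed, fresh, sizeof fresh);
        st->seeded = 1;
        st->pid = getpid();
        return RESEED_FRESH;
    }
    if (must_succeed) {
        /* Treat it like an uncaught exception. */
        fputs("rng: no entropy for initial seed, aborting\n", stderr);
        abort();
    }
    stretch_old_seed(st);   /* later reseed failed: keep going */
    return RESEED_DEGRADED;
}

/* Test doubles standing in for a real entropy source. */
int fake_entropy_ok(void *buf, size_t len)
{
    memset(buf, 0xAB, len);
    return 0;
}

int fake_entropy_fail(void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -1;
}
```

With this shape, an application would call `rng_reseed()` once at startup and once in each child after fork(); after those two points a failing entropy source can no longer kill the process.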

> And yuck. A new syscall? Of course this can fail too. For example very likely on all kernels that don't have that yet, i.e. all of today's...

Today's kernels shouldn't fail because they have the sysctl syscall (the idea is to try the new syscall first, then fall back to /dev/urandom, then fall back to sysctl). The idea is to get the getentropy() syscall (or equivalent; I'd propose a syscall with an extra flags parameter) into the kernel before sysctl is gone for good, so there won't be kernel versions where it all fails.
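
The fallback order can be sketched as a list of sources tried in turn. This is an illustration, not the LibreSSL code: `entropy_chain()` and `entropy_fn` are hypothetical names, `entropy_from_missing_syscall()` merely simulates a kernel that lacks the new syscall, and the sysctl source is left out because sysctl(2) is not available everywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical entropy source: 0 on success, -1 on failure. */
typedef int (*entropy_fn)(void *buf, size_t len);

/* Stand-in for the proposed syscall on a kernel that does not have it. */
int entropy_from_missing_syscall(void *buf, size_t len)
{
    (void)buf;
    (void)len;
    errno = ENOSYS;
    return -1;
}

/* Second choice: needs a free fd and a visible /dev/urandom node. */
int entropy_from_urandom(void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n > 0)
            got += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;           /* interrupted: retry */
        else
            break;              /* EOF or hard error */
    }
    close(fd);
    return got == len ? 0 : -1;
}

/* Try each source in order. Returns the index of the first source that
 * succeeded, or -1 if every one of them failed. */
int entropy_chain(const entropy_fn *sources, size_t nsources,
                  void *buf, size_t len)
{
    size_t i;

    assert(sources != NULL && buf != NULL);
    for (i = 0; i < nsources; i++)
        if (sources[i](buf, len) == 0)
            return (int)i;
    return -1;
}
```

Returning the index rather than a plain success flag lets the caller log which source was actually used, which is useful when a fallback silently becomes the normal path.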

----

As an aside: it probably can't be done because of API compatibility concerns, but the way I'd do it if it was possible and didn't cause any new problems would be to open the fd to /dev/urandom early _and keep it open_ (let the kernel close it on exit or exec). If reading from an open /dev/urandom fd fails, you probably have bigger problems.
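
A sketch of the keep-it-open approach, assuming hypothetical `urandom_init()` and `urandom_read()` helpers (these are not part of any existing library). `O_CLOEXEC` makes the kernel close the descriptor on exec, as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Opened once, early, and deliberately never closed by us. */
static int urandom_fd = -1;

/* Call at program start, before fds can run out or a chroot hides /dev. */
int urandom_init(void)
{
    if (urandom_fd >= 0)
        return 0;               /* already open */
    urandom_fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    return urandom_fd >= 0 ? 0 : -1;
}

/* Fill buf completely from the already-open descriptor. */
int urandom_read(void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    if (urandom_fd < 0)
        return -1;              /* urandom_init() was never successful */
    while (got < len) {
        ssize_t n = read(urandom_fd, p + got, len - got);
        if (n > 0)
            got += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;           /* interrupted: retry */
        else
            return -1;
    }
    return 0;
}
```

The API compatibility concern is visible even in this small sketch: a library doing this holds a descriptor the application did not ask for, and code that closes "all unknown fds" would break it.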

Voodoo coding

Posted Jul 14, 2014 19:15 UTC (Mon) by ledow (guest, #11753)

I'd be much more wary of a supposedly secure program not checking the return code of a function vital to its operation than an OS that deliberately and carefully returns that code in the first place.

Most programs in the world do not care about the randomness of an RNG. Only one type really does: those that handle public key encryption. If such a program fails to check THE most important part of its initialisation, and does not at least print a warning on stderr, then there's a bigger problem than how we signal that kind of error to it.

And, personally, I'd much prefer a warning of the "deprecation" kind in my logs from init if something goes wrong with that function, than any application crashing because it can't handle a particular syscall. If people are running secure systems and ignore printk messages that tell them the program used a function that it shouldn't, then they get what they deserve.

The "exceptions in C" thing is really just another dig at the language of choice in all these matters. There are plenty of ways for a C program to signal there was a problem - for instance failing any further calls until it has been properly initialised, setting a particular flag, returning a code to callers, etc. If people still AREN'T BOTHERING to check - whatever that method is - that's pretty much the death-knell to any kind of supposedly "secure" program, to my eyes.
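
One of the options listed above, failing every further call until initialisation has succeeded, might look like the following. All names (`crypto_ctx`, `ctx_seed()`, `ctx_generate_key()`) and the 16-byte minimum are assumptions made for illustration; the actual crypto work is elided.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SEED_LEN 16   /* assumed minimum, for illustration only */

enum ctx_status {
    CTX_OK = 0,
    CTX_ERR_NOT_SEEDED = -1,
    CTX_ERR_BAD_SEED = -2
};

struct crypto_ctx {
    int seeded;   /* the "particular flag": set only by a good seed */
};

void ctx_init(struct crypto_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->seeded = 0;
}

enum ctx_status ctx_seed(struct crypto_ctx *ctx,
                         const unsigned char *entropy, size_t len)
{
    assert(ctx != NULL);
    if (entropy == NULL || len < MIN_SEED_LEN)
        return CTX_ERR_BAD_SEED;   /* context stays unusable */
    /* A real library would feed `entropy` into its generator here. */
    ctx->seeded = 1;
    return CTX_OK;
}

/* Every public entry point starts with the same guard. */
enum ctx_status ctx_generate_key(struct crypto_ctx *ctx)
{
    assert(ctx != NULL);
    if (!ctx->seeded)
        return CTX_ERR_NOT_SEEDED;
    /* Key generation would happen here. */
    return CTX_OK;
}
```

A caller that ignores the return value of `ctx_seed()` still cannot obtain a key from an unseeded context, although nothing forces that caller to check the return value of `ctx_generate_key()` either.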

Voodoo coding

Posted Jul 14, 2014 19:57 UTC (Mon) by alonz (subscriber, #815)

Just to correct one misconception—public key encryption is not the only case where randomness is mandatory. Quite a few other crypto primitives/schemes will fail subtly when used with bad randomness. A nice overview can be found here.

Deterministic public-key encryption is an active research area; for many uses (including common cases, such as key exchange) it actually is feasible.

Voodoo coding

Posted Jul 14, 2014 19:46 UTC (Mon) by wahern (subscriber, #37304)

The problem with sysctl is that Red Hat has removed sysctl syscalls by default. sysctl(2) will _always_ fail on modern stock Red Hat systems. It also fails on all the Gentoo systems I've tried, but I'm not sure if that's the default or a deliberate decision by our sysadmins. I only realized this recently as I use Debian and Debian-derivatives, and despite knowing about the kernel option I never fathomed that large vendors (especially ones which make claims to stable ABIs and APIs) would knowingly disable sysctl syscalls, considering all the software (like Tor) which depended on it at the time.

So sysctl({CTL_KERN, KERN_RANDOM, RANDOM_UUID}) is no longer a viable alternative. The only way to directly access kernel randomness is through an open reference to /dev/urandom or /proc/sys/kernel/random/uuid (the /proc sysctl interface).
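
For illustration, reading the /proc interface mentioned above could look like this. `read_kernel_uuid()` is a hypothetical helper; it assumes a Linux system with /proc mounted and simply fails otherwise. Each read of that file returns a freshly generated random UUID in its 36-character text form, which carries 122 random bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define KERNEL_UUID_LEN 36   /* 32 hex digits plus 4 dashes */

/* Returns 0 and a NUL-terminated UUID string on success, -1 on failure
 * (no /proc, no free fd, short read, ...). */
int read_kernel_uuid(char out[KERNEL_UUID_LEN + 1])
{
    FILE *f;
    size_t n;

    assert(out != NULL);
    f = fopen("/proc/sys/kernel/random/uuid", "r");
    if (f == NULL)
        return -1;
    n = fread(out, 1, KERNEL_UUID_LEN, f);
    fclose(f);
    if (n != KERNEL_UUID_LEN)
        return -1;
    out[KERNEL_UUID_LEN] = '\0';
    return 0;
}
```

Like /dev/urandom, this still needs a file descriptor and a visible filesystem path, so it does not remove the failure modes that the sysctl(2) route avoided.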

That's the crux of the issue. If sysctl was still available then all would be well, other than some bickering over a sysctl versus a dedicated syscall interface.

In short: sysctl(2) is dead for all practical purposes on Linux. Now Linux behaves pretty much like Solaris, which never had sysctl (a later BSD extension). A lack of sysctl is one of the most annoying things about Solaris (although that's a long list).

OS X's arc4random also relies on /dev/urandom, since it copied an early FreeBSD implementation from before FreeBSD added sysctl({CTL_KERN, KERN_ARND}). And it will silently fail if /dev/urandom isn't visible when it initially seeds! And although I've long tried to support systems like OS X, Solaris, and FreeBSD<10.0 which lacked a kernel entropy syscall, I've always considered them second-class citizens in this regard, and was willing to live with a disclaimer about possible issues. But now that Linux is second-class in this regard, it's a much more intolerable situation.

Voodoo coding

Posted Jul 14, 2014 20:05 UTC (Mon) by alonz (subscriber, #815)

By the way—another underutilized source of entropy in Linux programs is the vector returned by getauxval(AT_RANDOM). Sure, it is intended for use by libc (e.g. to produce stack canaries), but when nothing else is available, it can be very valuable.
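
A minimal sketch of reading that vector, assuming Linux with glibc 2.16 or later (where getauxval() is declared in <sys/auxv.h>); `auxv_random_bytes()` is a hypothetical wrapper. The kernel supplies 16 bytes at exec time, so the value is the same for the whole life of the process, and libc may already have used part of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

#define AT_RANDOM_LEN 16   /* the kernel provides exactly 16 bytes */

/* Copies the 16 kernel-provided bytes into out.
 * Returns 0 on success, -1 if the auxiliary vector has no AT_RANDOM entry. */
int auxv_random_bytes(unsigned char out[AT_RANDOM_LEN])
{
    const void *p;

    assert(out != NULL);
    /* getauxval() returns 0 when the entry is missing. */
    p = (const void *)(uintptr_t)getauxval(AT_RANDOM);
    if (p == NULL)
        return -1;
    memcpy(out, p, AT_RANDOM_LEN);
    return 0;
}
```

Because the bytes never change after exec, they are better treated as one more input to mix into a seed than as a replacement for a real entropy call.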

Voodoo coding

Posted Jul 14, 2014 20:42 UTC (Mon) by wahern (subscriber, #37304)

Nice. I was unaware of that interface, although it doesn't help with forking, etc.

But it looks like Linux finally supports a fork-safe issetugid implementation. Linux was one of the last systems which didn't provide issetugid or a similar interface for detecting whether the current process or (crucially) an ancestor was setuid or setgid. glibc had a hack in its loader for supporting secure_getenv and similar behavior, but it wasn't guaranteed to work in children because it depended on the real and effective IDs being different, which wouldn't be the case if you effectively dropped privileges.

Voodoo coding

Posted Jul 14, 2014 21:20 UTC (Mon) by wahern (subscriber, #37304)

Caveat emptor: On OS X issetugid is another broken stub (like pselect) which doesn't actually implement the correct behavior, and was apparently thrown in so software can compile while remaining silently, delightfully bug ridden. Although at least the pselect man page documents the broken behavior.

The BSDs and Solaris implement the correct behavior, as does Linux's new getauxval(AT_SECURE). That is, the status is inherited across fork but not exec.
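
On Linux the check might be wrapped as below. This is a sketch assuming glibc's getauxval(); `process_is_secure_exec()` and `careful_getenv()` are hypothetical names, the latter loosely modelled on the secure_getenv behaviour mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Nonzero if the kernel flagged this program as needing "secure"
 * treatment at exec time (for example a setuid or setgid binary).
 * Note that getauxval() also returns 0 when the entry is absent. */
int process_is_secure_exec(void)
{
    return getauxval(AT_SECURE) != 0;
}

/* Ignore the environment entirely when running with elevated trust. */
const char *careful_getenv(const char *name)
{
    assert(name != NULL);
    if (process_is_secure_exec())
        return NULL;
    return getenv(name);
}
```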

Voodoo coding

Posted Jul 15, 2014 16:41 UTC (Tue) by busterb (subscriber, #560)

Hmm, that is interesting, I'll check it out.

Solaris 10 and 11.0 also apparently have issues with issetugid, though it kind-of works (they apparently didn't patch it for 10 because not enough software used it yet?)

http://mcarpenter.org/blog/2013/01/15/solaris-issetugid(2)-bug

Though there are more issues building on Solaris 10 so far, so we haven't crossed that bridge yet.

Voodoo coding

Posted Jul 15, 2014 16:55 UTC (Tue) by busterb (subscriber, #560)

Huh, I ran the same test as above (the one for Solaris) on OS X 10.9.4, and at first glance it appears to have the same issue:

test: main: issetugid: 1
test: parent: issetugid: 1
test: parent: uid: 1000
test: parent: euid: 0
test: child: issetugid: 0
test: child: uid: 1000
test: child: euid: 0

Voodoo coding

Posted Jul 14, 2014 20:23 UTC (Mon) by wahern (subscriber, #37304)

For the record (lest somebody try to use me as a strawman), all my code always checked the return value of sysctl and fell back on /dev/urandom. If that failed my apps then went through the typical horrible hacks of manually collecting entropy, although I realize now that was a poor engineering decision--obscuring the Red Hat kernel changes for far too long--and am changing all such code to bail by default.