
Voodoo coding

Posted Jul 13, 2014 20:46 UTC (Sun) by mezcalero (subscriber, #45103)
In reply to: Voodoo coding by cesarb
Parent article: First Release of LibreSSL Portable Available

This is so confused. The only correct thing to do if things fail, is well, to let them fail. Return an error, there's really no shame in that. People should be prepared that things fail.

And yuck. A new syscall? Of course this can fail too. For example very likely on all kernels that don't have it yet, i.e. all of today's...

This sounds like an awful lot of noise around something that isn't really a real problem anyway, since that chroot example is awfully constructed...

If these are the problems the libressl folks think are important, then libressl is not going to look any cleaner in a few years than openssl does now...



Voodoo coding

Posted Jul 14, 2014 3:42 UTC (Mon) by wahern (subscriber, #37304)

CSPRNG calls are so deeply embedded within program logic that allowing the calls to fail in the normal course of execution (i.e. hitting a descriptor limit) is as sane as allowing unsigned addition to throw an error. It's simply not a sane interface. (Of course if the CSPRNG cannot be seeded it should abort the program.)

Also, chroot jails _should_ be within volumes mounted nodev. Nor should /proc be visible. The point of a chroot jail is to minimize kernel attack surface.

This is probably why OpenBSD just added a getentropy syscall to replace their sysctl interface: it allows simple, wholesale disabling of the sysctl interface entirely using systrace.

Researchers have settled on the sane behavior of a CSPRNG syscall: block until initial seeding, then never block again. And DJB has argued that once seeded the kernel CSPRNG should never be seeded again, as it would be superfluous and might provide more opportunity for malicious hardware to exfiltrate bits undetectably.

The issue is simply no longer debatable. The proper API is precisely something like getentropy.

Voodoo coding

Posted Jul 14, 2014 4:39 UTC (Mon) by andresfreund (subscriber, #69562)

> CSPRNG calls are so deeply embedded within program logic that allowing the calls to fail in the normal course of execution (i.e. hitting a descriptor limit) is as sane as allowing unsigned addition to throw an error. It's simply not a sane interface. (Of course if the CSPRNG cannot be seeded it should abort the program.)

What? You compare a single-instruction issue with a call that performs a long series of complex mathematical computations? With kernel interaction, entropy estimation, et al.? Really?
If you write safety-critical code in a way that makes it impossible or infeasible to check for errors when using a CSPRNG: please stay away from anything I might possibly use.

I don't have particularly strong feelings for/against getentropy() but this argument isn't doing it any favors.

> Also, chroot jails _should_ be within volumes mounted nodev. Nor should /proc be visible. The point of a chroot jail is to minimize kernel attack surface.

There's some value in that argument, but I think in reality the likelihood of opening new holes in software because /dev/null, /dev/urandom, /proc/self et al. aren't available is much higher than the security benefit.

Voodoo coding

Posted Jul 14, 2014 19:10 UTC (Mon) by wahern (subscriber, #37304)

Anyone who manages to turn an algorithm requiring O(1) space into O(N) space, especially an algorithm existing in a definite and fixed problem space that does not nor will ever benefit from any type of abstraction (in the manner of file objects), probably shouldn't be writing software, period. (Granted, we got to where we are by contingent history, so I don't blame the people who came up with /dev/urandom, only the people who defend it despite overwhelming experience and reason.)

At the end of the day it's a QoI issue.

If I have a non-blocking server and an already established socket to a browser and want to establish a secure channel with perfect forward secrecy, and I try to generate some random numbers, but the operation of simply generating a random number could fail, do you have any idea how f'ing ugly it is to insert a _timer_ and a loop trying to acquire that resource? Of course it's possible. But it's infinitely nastier than dealing with other kinds of failures, and completely unnecessary. (And compound all of this by trying to do this in a library, lest you simply argue that one should open /dev/urandom and leave it open, which is sensible but still problematic.)

But thanks for the ad hominem. Even though I check every malloc call, handle multiplicative overflow when I can't prove it's safe, and try to regularly test these failure paths (which is the most difficult of all); and despite the fact that you'd probably have to stop using Apple products, Google products, and several other services and products if you wanted to avoid using my software directly or indirectly; and notwithstanding the fact that my /dev/urandom wrappers have been used in all manner of software, including derivatives in some extremely popular open source software; I guess I never thought about how easy it is to overcome the design problems with /dev/urandom.

Voodoo coding

Posted Jul 14, 2014 19:37 UTC (Mon) by andresfreund (subscriber, #69562)

> If I have a non-blocking server and an already established socket to a browser and want to establish a secure channel with perfect forward secrecy, and I try to generate some random numbers, but the operation of simply generating a random number could fail, do you have any idea how f'ing ugly it is to insert a _timer_ and a loop trying acquire that resource?

Why do you need a timer? Why is this different from any of the other dozen or two things you need to do to establish an encrypted connection to another host?
If error handling in any of these parts - many of which are quite likely to fail (DNS, connection establishment, public/private key crypto, session key negotiation, renegotiation) - is a fundamental structural problem, something went seriously wrong.

> Of course it's possible. But it's infinitely nastier than dealing with other kinds of failures, and completely unnecessary. (And compound all of this by trying to do this in a library, lest you simply argue that one should open /dev/urandom and leave it open, which is sensible but still problematic.)

You argued that it's required to do this without /dev/urandom because it is *impossible* to do error handling there. Which has nothing to do with being asynchronous, btw.
Note that /dev/urandom - if it actually blocked significantly for the amounts of data we're talking about here - would allow for *more* of an async API than a dedicated getentropy() call. The latter basically has zero chance of ever getting that. You're making arguments up.

Voodoo coding

Posted Jul 14, 2014 20:12 UTC (Mon) by wahern (subscriber, #37304)

I never said it was impossible. I said it wasn't a sane interface.

And I stand by that claim. Why make something which could fail when you don't have to and it's trivial not to?

I always try to write my server programs in a manner which can handle request failures without interrupting service to existing connections. There are various patterns to make this more convenient and less error prone, but one of the most effective is RAII (although I don't use C++), where you acquire all the necessary resources as early as possible, channeling your failure paths into as few areas as possible. I also use specialized list, tree, and hash routines which I can guarantee will allow me to complete a long set of changes to complex data structures free of OOM concerns. One must rigorously minimize the areas that could encounter failure conditions so as to ensure as few bugs as possible in the few areas that are contingent on the success or failure of logical operations.

But how many applications do you know of which bother trying to ensure entropy is available at the very beginning of process startup or request servicing? How would you even do this in a generic fashion? Is it really sane to open a descriptor for every request, or to cache a separate descriptor inside every component or library that might need randomness? If you seed another generator, how do you handle forking? getpid? pthread_atfork? There's a reason most PRNGs (CSPRNGs included) support automatic seeding; not just for convenience, but for sane behavior in the common case.

Hacks and tweaks to the kernel implementation of /dev/urandom to ensure entropy is ready as soon as possible are a perennial bicker-fest, and yet those can't compare to the contortions applications would need to go through just to maintain a descriptor. And they'd all be doing it differently! That's not a recipe for a secure application ecosystem. And getting people to use third-party libraries (like Nick Matthewson's excellent libottery) would be like herding cats and adds an unnecessary dependency. It harks back to the bad old days of EGD, before even /dev/urandom was available.

Of course it's possible. Lots of things are possible, but not all things are practical given limited human and machine resources, and even less unequivocally contribute to a safer software ecosystem free of hidden traps.

When I talk about CSPRNGs being deeply embedded within other algorithms, imagine things like a random sort, or a random UUID generator. These are almost always implemented through a single routine and normally would never need to communicate a failure because _logically_ they should never fail. And yet they could fail, even with valid input, if you rely on /dev/urandom without taking other extraordinary measures completely unrelated to the core algorithm.

Computational complexity attacks, side-channel attacks, etc., have made the use of CSPRNGs useful and in many cases mandatory within many different kinds of algorithms which once upon a time could never fail.

Voodoo coding

Posted Jul 14, 2014 7:28 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

Except that chroot is NOT a way to minimize an attack surface. The docs say so. And the root user has tons of ways to escape the chroot on Linux.

Voodoo coding

Posted Jul 14, 2014 14:18 UTC (Mon) by rsidd (subscriber, #2582)

The OP said "chroot jail", not "chroot" -- presumably meaning something like the FreeBSD version.

Voodoo coding

Posted Jul 14, 2014 18:53 UTC (Mon) by wahern (subscriber, #37304)

A chroot jail implies dropping privileges. It's not much of a jail if you can walk out.

Voodoo coding

Posted Jul 14, 2014 18:54 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

Linux doesn't have chroot jails.

Voodoo coding

Posted Jul 14, 2014 19:16 UTC (Mon) by wahern (subscriber, #37304)

chdir, chroot, setgid, setuid, etc.

Linux absolutely does support chroot jails. And plenty of software does this, and it's 100% portable to almost all POSIX-compliant or POSIX-aspiring systems. (Notwithstanding the fact that chroot was removed from POSIX.)

Actually, Linux supports chroot jails more than most, as PaX has patches which can prevent even root from breaking out using the normal methods, and there are patches floating around which allow you to keep descriptors to directories outside the chroot jail open by preventing use of fchdir or openat which would allow you to break out.

Voodoo coding

Posted Jul 14, 2014 20:21 UTC (Mon) by PaXTeam (guest, #24616)

PaX itself doesn't have the hardened chroot feature, grsecurity does.

Voodoo coding

Posted Jul 15, 2014 18:28 UTC (Tue) by drag (guest, #31333)

root in chroot still has root privileges. Unless you are extremely careful, breaking out of a chroot 'jail' is _VERY_ easy.

If chroot made sense from a security perspective we wouldn't have any need for things like 'LXC containers'.

Voodoo coding

Posted Jul 15, 2014 20:34 UTC (Tue) by wahern (subscriber, #37304)

If you setgid and setuid to a non-privileged user and don't have any open directory descriptors, how easy is it to get out?

There are issues with signal and ptrace, but those are easily fixed by using a specialized UID and GID per service.

Arguing that root can break out of a chroot jail is a strawman. Nobody runs as root inside a chroot jail.

And if you're really paranoid, neither LXC nor even full-blown virtualization is sufficient, because the Linux kernel (like all software) is riddled with bugs, and last time I checked sophisticated hackers didn't find themselves defeated by the presence of VMWare or KVM.

Voodoo coding

Posted Jul 15, 2014 23:39 UTC (Tue) by dlang (guest, #313)

well, you would get out of root as quickly as you can after establishing the chroot, and if you properly minimize the things accessible inside the chroot you make it harder to find a local exploit to get back to root.

Voodoo coding

Posted Jul 14, 2014 11:20 UTC (Mon) by cesarb (subscriber, #6266)

> This is so confused. The only correct thing to do if things fail, is well, to let them fail. Return an error, there's really no shame in that. People should be prepared that things fail.

The problem here is that C does not have exceptions.

It does no good to return an error code if everyone ignores it. It's especially bad in crypto: failure to seed the RNG results in something which _looks_ like a valid key/iv/nonce, works like a valid key/iv/nonce, but completely breaks the underlying mathematical assumptions the crypto algorithms depend on, by being easily guessable and/or not unique. Two years later, someone finally notices, and the whole Internet has to generate new keys (this has happened before).

With exceptions, ignoring the error kills the program. Without exceptions, the only sane way out is to pretend it was an uncaught exception and kill the program.

@busterb, if you are reading this, I can see where mezcalero is coming from: he's a systemd developer, and it's really bad if the init process is killed (though not nearly as bad as a crypto key compromise), so init system developers tend to develop allergies to libraries which kill their own process.

How about this suggestion: only the initial seed (and the first reseed after a fork) should kill the process on a failure return from getentropy(). If it fails on other reseeds, accept the failure (generating an extra few bytes with the RNG itself and using them as the new seed) and keep going. This way, a developer using libressl would only have to force a reseed (by trying to get a random number) at the start of the program (if you can't open a fd at that point, you have bigger problems and it's best to just dump core) and after a fork, and the developer would know the library won't randomly (heh) kill the program after that point.

> And yuck. A new syscall? Of course this can fail too. For example very likely on all kernels that dont't have that yet, i.e. all of today's...

Today's kernels shouldn't fail because they have the sysctl syscall (the idea is to try the new syscall first, then fall back to /dev/urandom, then fall back to sysctl). The idea is to get the getentropy() syscall (or equivalent; I'd propose a syscall with an extra flags parameter) into the kernel before sysctl is gone for good, so there won't be kernel versions where it all fails.

----

As an aside: it probably can't be done because of API compatibility concerns, but the way I'd do it if it was possible and didn't cause any new problems would be to open the fd to /dev/urandom early _and keep it open_ (let the kernel close it on exit or exec). If reading from an open /dev/urandom fd fails, you probably have bigger problems.

Voodoo coding

Posted Jul 14, 2014 19:15 UTC (Mon) by ledow (guest, #11753)

I'd be much more wary of a supposedly secure program not checking the return code of a function vital to its operation than an OS that deliberately and carefully returns that code in the first place.

Most programs in the world do not care about the randomness of a RNG. Only one type really does - those that handle public key encryption. If that program fails to check THE most important part of its initialisation and not at least throw out a warning string on stderr, then there's a bigger problem than how we signal that kind of error to it.

And, personally, I'd much prefer a warning of the "deprecation" kind in my logs from init if something goes wrong with that function, than any application crashing because it can't handle a particular syscall. If people are running secure systems and ignore printk messages that tell them the program used a function that it shouldn't, then they get what they deserve.

The "exceptions in C" thing is really just another dig at the language of choice in all these matters. There are plenty of ways for a C program to signal there was a problem - for instance failing any further calls until it has been properly initialised, setting a particular flag, returning a code to callers, etc. If people still AREN'T BOTHERING to check - whatever that method is - that's pretty much the death-knell to any kind of supposedly "secure" program, to my eyes.

Voodoo coding

Posted Jul 14, 2014 19:57 UTC (Mon) by alonz (subscriber, #815)

Just to correct one misconception—public key encryption is not the only case where randomness is mandatory. Quite a few other crypto primitives/schemes will fail subtly when used with bad randomness. A nice overview can be found here.

Deterministic public-key encryption is an active research area; for many uses (including common cases, such as key exchange) it actually is feasible.

Voodoo coding

Posted Jul 14, 2014 19:46 UTC (Mon) by wahern (subscriber, #37304)

The problem with sysctl is that Red Hat has removed sysctl syscalls by default. sysctl(2) will _always_ fail on modern stock Red Hat systems. It also fails on all the Gentoo systems I've tried, but I'm not sure if that's the default or a deliberate decision by our sysadmins. I only realized this recently as I use Debian and Debian derivatives, and despite knowing about the kernel option I never fathomed that large vendors (especially ones which make claims to stable ABIs and APIs) would knowingly disable sysctl syscalls, considering all the software (like Tor) which depended on it at the time.

So sysctl({CTL_KERN, KERN_RANDOM, RANDOM_UUID}) is no longer a viable alternative. The only way to directly access kernel randomness is through an open reference to /dev/urandom or /proc/sys/kernel/random/uuid (the /proc sysctl interface).

That's the crux of the issue. If sysctl was still available then all would be well, other than some bickering over a sysctl versus a dedicated syscall interface.

In short: sysctl(2) is dead for all practical purposes on Linux. Now Linux behaves pretty much like Solaris, which never had sysctl (a later BSD extension). A lack of sysctl is one of the most annoying things about Solaris (although that's a long list).

OS X's arc4random also relies on /dev/urandom, since it copied an early FreeBSD implementation from before FreeBSD added sysctl({CTL_KERN, KERN_ARND}). And it will silently fail if /dev/urandom isn't visible when it initially seeds! And although I've long tried to support systems like OS X, Solaris, and FreeBSD<10.0 which lacked a kernel entropy syscall, I've always considered them second-class citizens in this regard, and been willing to live with a disclaimer about possible issues. But now that Linux is second-class in this regard, it's a much more intolerable situation.

Voodoo coding

Posted Jul 14, 2014 20:05 UTC (Mon) by alonz (subscriber, #815)

By the way—another underutilized source of entropy in Linux programs is the vector returned by getauxval(AT_RANDOM). Sure, it is intended for use by libc (e.g. to produce stack canaries), but when nothing else is available, it can be very valuable.

Voodoo coding

Posted Jul 14, 2014 20:42 UTC (Mon) by wahern (subscriber, #37304)

Nice. I was unaware of that interface, although it doesn't help with forking, etc.

But it looks like Linux finally supports a fork-safe issetugid implementation. Linux was one of the last systems which didn't provide issetugid or a similar interface for detecting whether the current process or (crucially) an ancestor was setuid or setgid. glibc had a hack in its loader for supporting secure_getenv and similar behavior, but it wasn't guaranteed to work in children because it depended on the real and effective IDs being different, which wouldn't be the case if you effectively dropped privileges.

Voodoo coding

Posted Jul 14, 2014 21:20 UTC (Mon) by wahern (subscriber, #37304)

Caveat emptor: On OS X issetugid is another broken stub (like pselect) which doesn't actually implement the correct behavior, but apparently thrown in so software can compile while remaining silently, delightfully bug ridden. Although at least the pselect man page documents the broken behavior.

The BSDs and Solaris implement the correct behavior, as does Linux's new getauxval(AT_SECURE). That is, the status is inherited across fork but not exec.

Voodoo coding

Posted Jul 15, 2014 16:41 UTC (Tue) by busterb (subscriber, #560)

Hmm, that is interesting, I'll check it out.

Solaris 10 and 11.0 also apparently have issues with issetugid, though it kind-of works (they apparently didn't patch it for 10 because not enough software used it yet?)

http://mcarpenter.org/blog/2013/01/15/solaris-issetugid(2)-bug

Though there are more issues building on Solaris 10 so far, so we haven't crossed that bridge yet.

Voodoo coding

Posted Jul 15, 2014 16:55 UTC (Tue) by busterb (subscriber, #560)

Huh, ran the same test as above for Solaris on OS X 10.9.4, it would appear to have the same issue at first glance:

test: main: issetugid: 1
test: parent: issetugid: 1
test: parent: uid: 1000
test: parent: euid: 0
test: child: issetugid: 0
test: child: uid: 1000
test: child: euid: 0

Voodoo coding

Posted Jul 14, 2014 20:23 UTC (Mon) by wahern (subscriber, #37304)

For the record (lest somebody try to use me as a strawman), all my code always checked the return value of sysctl and fell back on /dev/urandom. If that failed my apps then went through the typical horrible hacks of manually collecting entropy, although I realize now that was a poor engineering decision--obscuring the Red Hat kernel changes for far too long--and am changing all such code to bail by default.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds