OpenBSD routes around POSIX
Theo de Raadt is unhappy with the current state of the standards-mandated "random" number generation in OpenBSD, so he is planning to largely ignore those standards going forward. Essentially he contends that by conflating two different modes of using random numbers, the standards consign all random-number-using programs to getting poor random numbers. He intends to change the behavior such that programs have to opt-in to the poorer quality randomness, but that means OpenBSD's random-number API will no longer be compliant with standards like POSIX, C89, and others.
The API in question is really three families of functions: rand()/srand(), random()/srandom(), and the xrand48()/srand48() functions. According to De Raadt's message to the OpenBSD tech mailing list, each of the three provides the same basic functionality and is used in one of two ways by programs in the OpenBSD ports tree: either always getting seeded (using srand() and friends) with something hard to predict (thus indicating an interest in good random numbers) or providing a means for the user to set the seed (to reproduce a sequence of "random" numbers). Because the same interface has to produce both, it dooms those wanting good random numbers to using a random number generator (RNG) that can produce reproducible results—"bad" random numbers in his eyes.
If seed values are constant or saved and passed into the seeding functions (e.g. srand()), the same sequence of random numbers must be produced, according to the standards—or maybe not; it could be implementation-dependent. Reusing the same random numbers can be useful for benchmarking, debugging, re-running simulations, or replaying that perfect game of Nethack. But it requires a certain kind of underlying RNG algorithm—one that is not periodically mixed with hard-to-predict (and impossible to reproduce) values, as modern cryptographic RNGs are. Some programs truly need the ability to replay random streams, which necessitates using weaker algorithms, but those that don't need it should not have to suffer with weaker random numbers.
While De Raadt speculated that perhaps this API came about from a similar process that produced the backdoored Dual_EC_DRBG algorithm, that seems rather unlikely. But the state of RNGs has certainly advanced since the standards were published, as has the understanding of (some of) those using them. De Raadt is an advocate of the arc4random() API, which does not provide any way to replay a stream of random numbers. His plan is to essentially replace the POSIX, et al. API with an arc4random()-based implementation—ignoring any requests for replaying the stream.
arc4random() is a cryptographic RNG that gets reseeded periodically from the OpenBSD kernel random number subsystem. It is similar to the Linux /dev/urandom device and the recently added getrandom() system call.
The idea is that no programs in the ports tree (or otherwise built for OpenBSD) would need to change, but when calling rand() and friends would suddenly start getting stronger random numbers coming from arc4random(). Any seed value provided would be ignored, so programs that depend on that mode will need to change. A call to srand_deterministic() (or the equivalent for the other families) would switch the RNG from arc4random() mode to the older, weaker algorithm that can provide reproducible random streams.
Based on his analysis, a little over 100 of the 8800 packages in ports will require an addition of a call to srand_deterministic() to get the old behavior. That means 8700 packages will now get strong random numbers if they use random numbers at all (1285 packages use the random API directly, but some of those are libraries, which may be used by other packages). The cost is:
Since OpenBSD releases its kernel and ports in lockstep, there should be fewer problems than if Linux (or Glibc, really) tried to do this. There is also the small matter of standards compliance. The Linux kernel has sometimes ignored standards when there was a good reason, but Glibc tends to be a lot more conservative about such changes, so switching to non-deterministic random numbers for rand() and friends is unlikely to ever happen for Linux. That's too bad, in some ways, since De Raadt is right about the weakness of the random API. Someday, attackers may find a way to predict the random numbers generated by some program on Linux—with disastrous consequences.
Index entries for this article | |
---|---|
Security | Random number generation |
Posted Dec 11, 2014 2:02 UTC (Thu)
by JoeBuck (subscriber, #2330)
[Link] (6 responses)
Seems to me that it would be better to use a new API for applications that need truly random numbers rather than breaking portability on the old one.
Posted Dec 11, 2014 3:45 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 11, 2014 4:38 UTC (Thu)
by jzbiciak (guest, #5246)
[Link] (1 responses)
I actually use a deterministic random number generator in many of my programs, exactly for the purpose of reproducibility. But, I'm writing programs such as "random test generators" where a given input plus a given seed should produce the same test every time. I also therefore use a dedicated random number class. (C++11 makes this very easy and very nice.) In fact, I use what I call "forked" random number generators, where a master generator (seeded by the test case seed) seeds child generators that I then hand to various subsystems. (Those children can be forked further for sub-subsystems, etc.) The behavior of the whole tree depends on the original seed, but a tweak in behavior in one subsystem doesn't disturb the stream of pseudo-random numbers in another subsystem. This makes debugging and testing go much more smoothly. All this to say: If you're writing simulation software that relies on the quality of the random numbers, reproducibility, and portability across multiple platforms, you likely abandoned the standard library rand()/random()/xrand48() a long time ago. You probably grabbed a copy of Mersenne Twister or any number of other, better generators and wrote your own wrappers around these things. I know I did eons ago. (And now with C++11's excellent random number library, I'm doing it with much less pain.) I guess a part of me is a little surprised that 8800 packages showed up using rand() and friends. Then again, how many of them are actually negatively affected by bad random numbers?
Posted Dec 11, 2014 17:46 UTC (Thu)
by alankila (guest, #47141)
[Link]
Posted Dec 11, 2014 5:44 UTC (Thu)
by rsidd (subscriber, #2582)
[Link]
Posted Dec 22, 2014 15:51 UTC (Mon)
by mirabilos (subscriber, #84359)
[Link] (1 responses)
Besides, POSIX doesn’t guarantee reproducibility between exec() anyway…
Posted Dec 22, 2014 16:33 UTC (Mon)
by raven667 (subscriber, #5198)
[Link]
Posted Dec 11, 2014 2:28 UTC (Thu)
by josh (subscriber, #17465)
[Link] (6 responses)
1) For applications that never seed the RNG, automatically use a cryptographically secure RNG.
2) For applications that do seed the RNG, use the result to seed a deterministic but otherwise secure PRNG (*not* a simple linear congruential generator).
3) Emit a warning whenever an application makes a call to time() and subsequently makes a call to seed the RNG with the same value.
Posted Dec 11, 2014 7:29 UTC (Thu)
by butlerm (subscriber, #13312)
[Link] (4 responses)
"If rand() is called before any calls to srand() are made, the same sequence shall be generated as when srand() is first called with a seed value of 1."
Emitting a warning could be problematic too, unless it were done during compilation instead of at runtime of course.
Posted Dec 11, 2014 9:57 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (3 responses)
I imagine that the number of packages rely on this (not terribly useful...) property is even smaller than the number of OpenBSD that Theo wants to "break".
Breaking backward compatibility is bad. Preserving "insecure by default" behaviours is worse?
Posted Dec 11, 2014 14:14 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (2 responses)
Posted Dec 11, 2014 17:40 UTC (Thu)
by itvirta (guest, #49997)
[Link]
But it's hard to know if a given use of rand() is security-sensitive or not.
Posted Dec 11, 2014 18:01 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
This is why OpenBSD strives to be "secure by default" in all possible ways. Seems reasonably successful so far.
Posted Dec 11, 2014 15:52 UTC (Thu)
by lambda (subscriber, #40735)
[Link]
Ted Unangst wrote a good blog post on all of the broken, bizarre ways that people seed random number generators in programs. Theo de Raadt also listed all of the arbitrary constant values people use as seeds. It goes way beyond just seeding it with time().
The sheer amount of dubious use of constant and other arbitrary seed values, coupled with the relatively low number of packages that actually depend on deterministic behavior, is part of the justification they used for doing the breaking change. Now, such a breaking change really wouldn't work well in the Linux world, but if you want to root this out, you're going to need to do better than just looking for the output of time().
Posted Dec 12, 2014 12:38 UTC (Fri)
by epa (subscriber, #39769)
[Link] (1 responses)
I want to be able to choose a seed so that I can repeat the same pseudo-random test case I did last time, but that doesn't mean I want any https: requests made by my test code to silently start using a weaker RNG. Partly this can be solved by having an explicit API for stronger random numbers: arc4random() will always be strong, no matter what srand() type things have gone on in the background. But it would also be good to have an explicit API for pseudo-random number generation, ideally with the new seed value returned each time so it can be passed along by the caller.
Posted Dec 12, 2014 12:57 UTC (Fri)
by cesarb (subscriber, #6266)
[Link]
If your program or library uses the rand()/random()/etc family of calls, it's supposed to be able to work fine with a deterministic RNG (after all, that's what you get in non-OpenBSD systems).
The non-deterministic RNG for rand()/random()/etc is a bonus, like ASLR: if your program's security depends on it, you're doing something wrong.
For uses of random numbers that are unrelated to cryptography, the ability to replay the sequence is an important feature. This is particularly true for simulation: if a bug is found from random simulation, fuzzing, or the like, you want to be able to find it again. If the seed is recorded the interesting traces can be replayed, and seeds can be produced using truly random processes to sample more of the simulation space.
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
OpenBSD routes around POSIX
http://pubs.opengroup.org/onlinepubs/9699919799/functions...
OpenBSD routes around POSIX
Anyone who uses rand() for anything remotely security sensitive is sufficiently clueless about security (and C programming) that trying to remove the "sharp implements" from their environment is a lost cause.
OpenBSD routes around POSIX
OpenBSD routes around POSIX
It seems the argument here is that this is an easy solution to a certain class of
possible problems. Certainly easier than actually going through all the uses and,
educating the coders and replacing the calls with something more suitable.
Supposedly computers are fast enough to use a better rng, too.
OpenBSD routes around POSIX
OpenBSD routes around POSIX
3) Emit a warning whenever an application makes a call to time() and subsequently makes a call to seed the RNG with the same value.
Per-process choosing could be problematic
Per-process choosing could be problematic