OpenBSD routes around POSIX

By Jake Edge
December 10, 2014

Theo de Raadt is unhappy with the current state of the standards-mandated "random" number generation in OpenBSD, so he is planning to largely ignore those standards going forward. Essentially he contends that by conflating two different modes of using random numbers, the standards consign all random-number-using programs to getting poor random numbers. He intends to change the behavior such that programs have to opt-in to the poorer quality randomness, but that means OpenBSD's random-number API will no longer be compliant with standards like POSIX, C89, and others.

The API in question is really three families of functions: rand()/srand(), random()/srandom(), and the xrand48()/srand48() functions. According to De Raadt's message to the OpenBSD tech mailing list, each of the three provides the same basic functionality and is used in one of two ways by programs in the OpenBSD ports tree: either always getting seeded (using srand() and friends) with something hard to predict (thus indicating an interest in good random numbers) or providing a means for the user to set the seed (to reproduce a sequence of "random" numbers). Because the same interface has to produce both, it dooms those wanting good random numbers to using a random number generator (RNG) that can produce reproducible results—"bad" random numbers in his eyes.

If seed values are constant or saved and passed into the seeding functions (e.g. srand()), the same sequence of random numbers must be produced, according to the standards—or maybe not; it could be implementation-dependent. Reusing the same random numbers can be useful for benchmarking, debugging, re-running simulations, or replaying that perfect game of Nethack. But it requires a certain kind of underlying RNG algorithm—one that is not periodically mixed with hard-to-predict (and impossible to reproduce) values, as modern cryptographic RNGs are. Some programs truly need the ability to replay random streams, which necessitates using weaker algorithms, but those that don't need it should not have to suffer with weaker random numbers.

While De Raadt speculated that perhaps this API came about from a similar process that produced the backdoored Dual_EC_DRBG algorithm, that seems rather unlikely. But the state of RNGs has certainly advanced since the standards were published, as has the understanding of (some of) those using them. De Raadt is an advocate of the arc4random() API, which does not provide any way to replay a stream of random numbers. His plan is to essentially replace the POSIX, et al. API with an arc4random()-based implementation—ignoring any requests for replaying the stream.

arc4random() is a cryptographic RNG that gets reseeded periodically from the OpenBSD kernel random number subsystem. It is similar to the Linux /dev/urandom device and the recently added getrandom() system call.

The idea is that no programs in the ports tree (or otherwise built for OpenBSD) would need to change, but when calling rand() and friends would suddenly start getting stronger random numbers coming from arc4random(). Any seed value provided would be ignored, so programs that depend on that mode will need to change. A call to srand_deterministic() (or the equivalent for the other families) would switch the RNG from arc4random() mode to the older, weaker algorithm that can provide reproducible random streams.

Based on his analysis, a little over 100 of the 8800 packages in ports will require an addition of a call to srand_deterministic() to get the old behavior. That means 8700 packages will now get strong random numbers if they use random numbers at all (1285 packages use the random API directly, but some of those are libraries, which may be used by other packages). The cost is:

There may be a few broken because the have a deeply hidden dependency on determinism. Sorry, but that is what it is going to take.

Since OpenBSD releases its kernel and ports in lockstep, there should be fewer problems than if Linux (or Glibc, really) tried to do this. There is also the small matter of standards compliance. The Linux kernel has sometimes ignored standards when there was a good reason, but Glibc tends to be a lot more conservative about such changes, so switching to non-deterministic random numbers for rand() and friends is unlikely to ever happen for Linux. That's too bad, in some ways, since De Raadt is right about the weakness of the random API. Someday, attackers may find a way to predict the random numbers generated by some program on Linux—with disastrous consequences.

Index entries for this article
Security	Random number generation

OpenBSD routes around POSIX

Posted Dec 11, 2014 2:02 UTC (Thu) by JoeBuck (subscriber, #2330) [Link] (6 responses)

For uses of random numbers that are unrelated to cryptography, the ability to replay the sequence is an important feature. This is particularly true for simulation: if a bug is found from random simulation, fuzzing, or the like, you want to be able to find it again. If the seed is recorded the interesting traces can be replayed, and seeds can be produced using truly random processes to sample more of the simulation space.

Seems to me that it would be better to use a new API for applications that need truly random numbers rather than breaking portability on the old one.

OpenBSD routes around POSIX

Posted Dec 11, 2014 3:45 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

I think that de Raadt probably sees the problem as serious enough to warrant the change. There are functions such as srand_deterministic and rand_deterministic already. But these have always been platform-specific anyways (so expecting Windows to give the same stream as Linux (or even glibc and musl) was already a lost cause for replays and such). de Raadt was also nice enough to come up with a list of projects which seem to actually expect the "deterministic" API, but that is of course limited to the OpenBSD ports selection.

OpenBSD routes around POSIX

Posted Dec 11, 2014 4:38 UTC (Thu) by jzbiciak (guest, #5246) [Link] (1 responses)

I actually use a deterministic random number generator in many of my programs, exactly for the purpose of reproducibility. But, I'm writing programs such as "random test generators" where a given input plus a given seed should produce the same test every time.

I also therefore use a dedicated random number class. (C++11 makes this very easy and very nice.) In fact, I use what I call "forked" random number generators, where a master generator (seeded by the test case seed) seeds child generators that I then hand to various subsystems. (Those children can be forked further for sub-subsystems, etc.)

The behavior of the whole tree depends on the original seed, but a tweak in behavior in one subsystem doesn't disturb the stream of pseudo-random numbers in another subsystem. This makes debugging and testing go much more smoothly.

All this to say: If you're writing simulation software that relies on the quality of the random numbers, reproducibility, and portability across multiple platforms, you likely abandoned the standard library rand()/random()/xrand48() a long time ago. You probably grabbed a copy of Mersenne Twister or any number of other, better generators and wrote your own wrappers around these things. I know I did eons ago. (And now with C++11's excellent random number library, I'm doing it with much less pain.)

I guess a part of me is a little surprised that 8800 packages showed up using rand() and friends. Then again, how many of them are actually negatively affected by bad random numbers?

OpenBSD routes around POSIX

Posted Dec 11, 2014 17:46 UTC (Thu) by alankila (guest, #47141) [Link]

Since you mention MT, I thought I'll add a bit here. My new geek hero, Melissa O'Neill, recently invented a new family of random number generators called PCG, where P stands for permuted. It's basically linear congruential generator combined to an output improving step, usually some variant of xor and shifting, but the consequence is a generator whose performance is near theoretical optimum, as it seems to pass statistical tests using fewest possible bits of state, e.g. mere 32 bits suffice to pass the SmallCrush test suite.

OpenBSD routes around POSIX

Posted Dec 11, 2014 5:44 UTC (Thu) by rsidd (subscriber, #2582) [Link]

Most people who do scientific simulations don't use the OS RNG functions -- they use, eg, the GSL versions (which support specifying a seed and replaying) or something else. Mainly, the OS-supplied versions are unreliable -- even if the OpenBSD version is excellent, the program needs to be portable.

OpenBSD routes around POSIX

Posted Dec 22, 2014 15:51 UTC (Mon) by mirabilos (subscriber, #84359) [Link] (1 responses)

If you want something that can be replayed, it’s not “random”. You list valid use cases, but I question that they can be use cases for an RNG. Sure, do it, but please don’t call it RNG, not even PRNG. Implementing an LFSR, LCG, etc. is dead easy.

Besides, POSIX doesn’t guarantee reproducibility between exec() anyway…

OpenBSD routes around POSIX

Posted Dec 22, 2014 16:33 UTC (Mon) by raven667 (subscriber, #5198) [Link]

I'm pretty sure that any PRNG given the same inputs will produce the same outputs and that is randomness that is suitable for crypto purposes. The randomness is introduced by seeding with something unpredictable and not knowing how far into the stream you have read not that the values produced will be unpredictable if you know the seed and how many bytes have been read.

OpenBSD routes around POSIX

Posted Dec 11, 2014 2:28 UTC (Thu) by josh (subscriber, #17465) [Link] (6 responses)

It seems like glibc could safely use a similar but slightly more conservative approach. rand() and family are deterministic but system-specific; the same seed value is *not* guaranteed to produce the same result on other systems, or even on a different version of the same system. So:

1) For applications that never seed the RNG, automatically use a cryptographically secure RNG.

2) For applications that do seed the RNG, use the result to seed a deterministic but otherwise secure PRNG (*not* a simple linear congruential generator).

3) Emit a warning whenever an application makes a call to time() and subsequently makes a call to seed the RNG with the same value.

OpenBSD routes around POSIX

Posted Dec 11, 2014 7:29 UTC (Thu) by butlerm (subscriber, #13312) [Link] (4 responses)

Generating non-deterministic output if no seed is set is a violation of the POSIX and ISO C standards, unfortunately, which state:

"If rand() is called before any calls to srand() are made, the same sequence shall be generated as when srand() is first called with a seed value of 1."
http://pubs.opengroup.org/onlinepubs/9699919799/functions...

Emitting a warning could be problematic too, unless it were done during compilation instead of at runtime of course.

OpenBSD routes around POSIX

Posted Dec 11, 2014 9:57 UTC (Thu) by marcH (subscriber, #57642) [Link] (3 responses)

> "If rand() is called before any calls to srand() are made, the same sequence shall be generated as when srand() is first called with a seed value of 1."

I imagine that the number of packages rely on this (not terribly useful...) property is even smaller than the number of OpenBSD that Theo wants to "break".

Breaking backward compatibility is bad. Preserving "insecure by default" behaviours is worse?

OpenBSD routes around POSIX

Posted Dec 11, 2014 14:14 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (2 responses)

Anyone who uses rand() for anything remotely security sensitive is sufficiently clueless about security (and C programming) that trying to remove the "sharp implements" from their environment is a lost cause.

OpenBSD routes around POSIX

Posted Dec 11, 2014 17:40 UTC (Thu) by itvirta (guest, #49997) [Link]

Yes.

But it's hard to know if a given use of rand() is security-sensitive or not.
It seems the argument here is that this is an easy solution to a certain class of
possible problems. Certainly easier than actually going through all the uses and,
educating the coders and replacing the calls with something more suitable.
Supposedly computers are fast enough to use a better rng, too.

OpenBSD routes around POSIX

Posted Dec 11, 2014 18:01 UTC (Thu) by marcH (subscriber, #57642) [Link]

Every time anyone says or writes "Everyone just do this!", a kitten dies. Or worse.

This is why OpenBSD strives to be "secure by default" in all possible ways. Seems reasonably successful so far.

OpenBSD routes around POSIX

Posted Dec 11, 2014 15:52 UTC (Thu) by lambda (subscriber, #40735) [Link]

3) Emit a warning whenever an application makes a call to time() and subsequently makes a call to seed the RNG with the same value.

Ted Unangst wrote a good blog post on all of the broken, bizarre ways that people seed random number generators in programs. Theo de Raadt also listed all of the arbitrary constant values people use as seeds. It goes way beyond just seeding it with time().

The sheer amount of dubious use of constant and other arbitrary seed values, coupled with the relatively low number of packages that actually depend on deterministic behavior, is part of the justification they used for doing the breaking change. Now, such a breaking change really wouldn't work well in the Linux world, but if you want to root this out, you're going to need to do better than just looking for the output of time().

Per-process choosing could be problematic

Posted Dec 12, 2014 12:38 UTC (Fri) by epa (subscriber, #39769) [Link] (1 responses)

So what happens when some library calls srand_deterministic() and changes the behaviour for the rest of the program? Or when the main program calls srand_deterministic() and thus affects any libraries it calls?

I want to be able to choose a seed so that I can repeat the same pseudo-random test case I did last time, but that doesn't mean I want any https: requests made by my test code to silently start using a weaker RNG. Partly this can be solved by having an explicit API for stronger random numbers: arc4random() will always be strong, no matter what srand() type things have gone on in the background. But it would also be good to have an explicit API for pseudo-random number generation, ideally with the new seed value returned each time so it can be passed along by the caller.

Per-process choosing could be problematic

Posted Dec 12, 2014 12:57 UTC (Fri) by cesarb (subscriber, #6266) [Link]

> So what happens when some library calls srand_deterministic() and changes the behaviour for the rest of the program? Or when the main program calls srand_deterministic() and thus affects any libraries it calls?

If your program or library uses the rand()/random()/etc family of calls, it's supposed to be able to work fine with a deterministic RNG (after all, that's what you get in non-OpenBSD systems).

The non-deterministic RNG for rand()/random()/etc is a bonus, like ASLR: if your program's security depends on it, you're doing something wrong.