Python and crypto-strength random numbers by default
There are various types of random number generators (RNGs) that target different use cases, but a programming language can only have one default. For high-security random numbers (e.g. cryptographic keys and the like), it is a grievous error to use the wrong kind of RNG, while other use cases are typically more forgiving. The Python community is in the middle of a debate about how it should be handling random numbers within the language's standard library.
The random module
On the python-ideas mailing list, Guido van Rossum kicked off the discussion by noting that he had been contacted by OpenBSD founder Theo de Raadt about the random.random() RNG that ships with Python. That RNG is based on the Mersenne Twister (MT) algorithm, which is a pseudo-RNG (PRNG) with an enormously long period. The PRNG can either be seeded by the user (with a call to random.seed()) or, by default, it is seeded with 2500 bytes of entropy from the os.urandom() pool (the strong, non-blocking random pool supplied by the operating system). From a given seed, the same sequence of random numbers will be generated, which makes it deterministic.
Using MT-derived random numbers for cryptographic or other security purposes is definitely not recommended by security experts. Given enough output from the RNG, the seed can be recovered, which means that all further random numbers can be perfectly predicted—generally with disastrous results. But for many uses, such as games, testing, or simulations, the random module provides a simple-to-use way to get perfectly adequate random numbers. The Python documentation for the module clearly indicates that it should not be used for security purposes and suggests either os.urandom() or the random.SystemRandom class.
But Van Rossum noted that "as the meme goes -- nobody reads the
docs
", so De Raadt's concern is that people will simply reach for
the default provided by the language—no matter what they are using the
random numbers for. According to Van Rossum, De Raadt is advocating the
arc4random()
RNG that OpenBSD uses (and, in fact, has switched
to by default). Some of that advocacy can be seen in a forwarded email message that Van Rossum
posted. Though De Raadt was invited to participate in the thread,
"he's too
busy
" to do so, which left Van Rossum in a bit of a quandary:
That set off quite a discussion, which now spans four threads in python-ideas.
Donald Stufft responded that
he sees a problem with the current approach, but that he doesn't have any
"great
solution
" for it:
But, as Paul Moore and others pointed out, those with non-security needs for random numbers should be able to expect to access them easily from the standard library. He often uses the random module for games of various sorts, as well as for things like Monte Carlo simulations where millions of results are used. While the reproducibility feature is not particularly important to him, he would expect that a simple API would provide it. People doing crypto or other security programming ought to be looking for an RNG beyond the default:
Steven D'Aprano concurred:
Part of the problem, of course, is that there is a large body of existing code that expects the random module to behave in a certain way. Some of it requires the reproducibility feature and isn't being used for security purposes. Changing the default would break those programs a few years down the road (if the change follows the normal Python deprecation schedule). As Stephen J. Turnbull put it:
Stufft's proposal
That first thread got rather long, so Stufft attempted something of a reboot in a message that was rather long itself. He had talked to De Raadt to try to better understand his concerns, which Stufft said were not being reflected in the earlier thread. He broke down the users of random numbers into three groups: those who need deterministic output (and reproducibility), those who need cryptographic-strength random numbers (and don't want determinism), and those who aren't sure if their use case is security-sensitive or not. The last group generally doesn't need deterministic output, but currently that's what they get by default. It is Stufft's belief (that is shared with De Raadt, he said) that the members of that group would be better served with a crypto-strength RNG.
Those in group #3 might be told to use os.urandom() (or the random.SystemRandom class that is based on it), but they may end up using the standard random functions due to either the performance of the stronger alternative or bad advice (if, indeed, their need is security-sensitive). In fact, Stufft said, the performance of os.urandom() is one of the main reasons not to make it the default:
Stufft looked at StackOverflow and other sites with Python code to see what kinds of advice there was about RNG choices. His post had several snippets of code that used the module-level random functions in dubious ways.
That led him to propose that Python change its random-number handling in several ways. It would provide a means to access a fast user-space RNG like arc4random() and add a new class (random.SomeKindOfRandom) that uses it. The proposal would also move the MT-based RNG to a random.DeterministicRandom class and deprecate the module-level functions in the random module. That would mean that users would (eventually) have to choose the kind of randomness they wanted before they would be able to get random numbers. They would call functions like:
random.SomeKindOfRandom().random() random.DeterministicRandom().random()SystemRandom would still be available for those who wanted that RNG mechanism. Obviously, SomeKindOfRandom would need a better name, though there are objections to the whole idea of doing user-space RNG. Stufft pointed out that security expert Thomas Ptacek is not in favor of user-space RNGs like arc4random().
As might be guessed, Stufft's suggestions were not met with universal acclaim. There were concerns about backward compatibility and for how today's code could be switched, with minimal changes, to run (correctly) in this new world.
But Nick
Coghlan strongly objected to the idea of
deprecating all of the module-level functions. As he pointed out, nearly
all of the module-level functions could provide any kind of random numbers,
though they need to have good performance.
Making someone choose what type they want, before they can
even get
a random number is "a *huge* regression in Python's usability for
educational use cases
". He said that a common "hello world" kind
of program for using random numbers in Python is to roll a six-side die:
>>> from random import randint
>>> randint(1, 6)
6
He continued:
He noted that there are calls at the module-level that imply the need for a deterministic RNG (like seed() and others that are stateful) and which can be deprecated to help those who really need them to move to an alternative API. Essentially, he is asking that those who don't care about the RNG wars (and those new to Python in particular) be left out—they can still call the functions they call today.
Breaking compatibility "in the name of the public good
" is
the wrong approach, Antoine Pitrou said.
He and others are tired of seeing security trump all other considerations
whenever a topic like this is raised. Since a change like Stufft has
suggested can't fix the problem of bad advice on the internet, compromising
on a change that maintains compatibility is preferable.
Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. Not Python 3.6.
(in case you're wondering, trying to make all published code on the Internet secure by appropriately changing the interpreter's "behaviour" to match erroneous expectations - even *documented* as erroneous - is *not* reasonable - no matter how hard you try, there will always be occurrences of broken code that people copy and paste around)
Coghlan's proposals
Coghlan started another thread as something of a "pre-PEP" on enhancing the random module. It would create three types of RNGs in Python: seedable, seedless, and system. The seedless version would be the default and guidance would be provided that those who need deterministic random numbers should use the seeded version, while those with strong security needs should use the system RNG.
Coghlan's proposal lays out plans for Python 3.6, which is likely destined for late 2016 (or early 2017), and for 3.7, which is probably due sometime in 2018—a change of this nature for Python can take a while, for sure. Essentially, his idea is to deprecate the stateful methods and module-level functions (seed(), getstate(), and setstate()), except for the SeedableRandom class. Those functions would warn the user, but not actually cause an error—except if the default type has been changed.
The most controversial part of Coghlan's proposal would provide a random.set_default_instance() function to specify which of the three types would be used by the module-level functions. Both Stufft and D'Aprano are concerned that would allow any module to affect all of the random numbers generated from that point on, potentially nullifying the choice another module has made.
Beyond that, though, Moore would like to see some kind of cost/benefit analysis. He is
trying to remain open-minded, but is concerned about the disruption for
users, particularly those who don't have any security requirements for
their use of random numbers. While he recognizes that it is somewhat
emotionally stated, the benefits to him seem to be: "Users of code
written based on bad advice will be protected from
the consequences (as long as the code runs on a sufficiently new
version of Python)
".
Based on some of that feedback, Coghlan went ahead and drafted PEP 504 that simplifies earlier proposals. It dispenses with the user-space CSPRNG (e.g. arc4random() or something similar) and changes the default to the system RNG (i.e. random.urandom()). If a program uses random.seed() or the other stateful calls (i.e. getstate() or setstate()), the random module will switch to the existing MT-based deterministic PRNG.
He proposes that this be done for 3.6, with a
silent-by-default deprecation warning for the fallback to MT. In 3.7 it
would instead be a visible-by-default runtime warning.
In his announcement of the PEP, he said: "That approach would provide a definite security improvement over the
status quo, while restricting the compatibility break to a performance
regression in applications that use the module level API without
calling seed(), getstate() or setstate().
"
Van Rossum's reaction
But it turns out that Van Rossum is not
particularly happy with that PEP as the end result of the discussion.
He stopped following the "mega-threads
", but doesn't see much
value in changing things:
I am fine with adding more secure ways of generating random numbers. But we already have random.SystemRandom(), so there doesn’t seem to be a hurry?
Though Stufft reiterated his belief that
the existing module "guides you towards using an
insecure source of random numbers rather than a secure one
",
Van Rossum was not impressed with that
argument:
The discussion is continuing at the time of this writing, but there is a
sense that adding access to a fast user-space PRNG, perhaps based
on ChaCha20
(like arc4random()), may be in the offing. A change in the
default would leave some
users and programs behind eventually, but it would seem that most believe users who
require deterministic
random numbers make up a small minority. Van Rossum, though, seems
unconvinced that there is much urgency for a solution. He suggested adding
the new method for 3.6, but waiting "a few releases
" to decide
if any change to the default was warranted. "Security isn't served
well by panicky over-reaction
", he said.
There is plenty of time left in the 3.6 development schedule, so perhaps other proposals and ideas will eventually win the day. But with Van Rossum firmly opposed to the changes that have been proposed so far, a big change seems unlikely for 3.6. Only time will tell.
Index entries for this article | |
---|---|
Security | Python |
Security | Random number generation |
Posted Sep 16, 2015 17:38 UTC (Wed)
by jtaylor (subscriber, #91739)
[Link] (1 responses)
Posted Sep 16, 2015 18:49 UTC (Wed)
by jimparis (guest, #38647)
[Link]
This paper describes PRNG attacks and has some real-world examples of a many PHP applications with PRNGs that were vulnerable in some form. It seems like the most frequent attack is in things like password reset tokens: request a password reset yourself, check your email and figure out the server's PRNG state, request a password reset for your victim, and use the known PRNG state to predict their token:
This page describes an online betting-type game where the attacker was able to predict results from previous ones:
These slides describe an attack on WPS that involves figuring out the PRNG state (slide 15):
Posted Sep 16, 2015 20:56 UTC (Wed)
by fredrik (subscriber, #232)
[Link]
As a python novice, my impression is that I spend more time reading documentation for python modules than for other languages. Perhaps because I find the documentation for a lot of modules a pleasure to read. They give a swift introduction, offer relevant examples, mention relevant implementation details, and sometimes suggest other modules that perhaps are better suited to some more tangential use case.
So, perhaps an alternative proposal is to ensure that the documentation for random.random() is very entertaining, so entertaining that it competes with 'from __future__ import braces' et.al.
Posted Sep 16, 2015 22:51 UTC (Wed)
by xorbe (guest, #3165)
[Link] (1 responses)
Posted Sep 17, 2015 14:42 UTC (Thu)
by robbe (guest, #16131)
[Link]
Posted Sep 17, 2015 8:20 UTC (Thu)
by hthoma (subscriber, #4743)
[Link] (3 responses)
Posted Sep 17, 2015 13:03 UTC (Thu)
by njh (subscriber, #4425)
[Link] (2 responses)
The article suggest that the existing MT algorithm is seeded from urandom by default, if no seed value is provided.
Why not return values from the MT algorithm if random.random() is called after an explicit seed has been set, but switch to a newer CSPRNG algorithm if random.random() is called without seeding? That would mean that existing Monte Carlo simulation code (and similar) would continue to give results that matched runs from before the change, but stronger random numbers would be produced by default in cases where reproducibility isn't a concern.
Posted Sep 17, 2015 19:10 UTC (Thu)
by droundy (subscriber, #4559)
[Link]
Posted Sep 17, 2015 21:21 UTC (Thu)
by njs (subscriber, #40338)
[Link]
AFAICT, the only reason not to (besides the usual one that it would take some work to implement and maintain) is that Guido considers it "a hack" :-(
Posted Sep 17, 2015 12:12 UTC (Thu)
by ianmcc (subscriber, #88379)
[Link]
For numerical applications (Monte Carlo) you want deterministic random numbers, but seeded non-deterministically (but with the seed stashed away somewhere, so that you can reproduce the calculation later if necessary).
For a game, it would vary - if its a networked multiplayer game then you may well want it to be deterministic, so that multiple clients can do the same 'dice rolls' and stay in sync. If its an online poker game then you definitely want it to be non-deterministic (but you'd probably get the random numbers from a server somewhere anyway).
In all of these cases you'd want to choose very carefully the random number generator and how its used. No 'default' generator will cover all of these cases. The default generator should only be used for 'throwaway' applications, where you need some randomness but don't care whether its deterministic or secure, as long as it meets some minimum level of quality. And for that, MT is already overkill.
Posted Sep 17, 2015 14:39 UTC (Thu)
by ssam (guest, #46587)
[Link] (2 responses)
Posted Sep 17, 2015 14:56 UTC (Thu)
by corbet (editor, #1)
[Link] (1 responses)
Posted Sep 17, 2015 23:44 UTC (Thu)
by xorbe (guest, #3165)
[Link]
Posted Sep 18, 2015 15:46 UTC (Fri)
by diegor (subscriber, #1967)
[Link] (1 responses)
http://www.rocketaware.com/man/man3/random.3.htm
If we follow the Theo's reasoning, we should expect that random() in standard C library was using arc4random.
So I wonder why Theo suggests that python should do it?
Posted Sep 18, 2015 22:21 UTC (Fri)
by wahern (subscriber, #37304)
[Link]
OpenBSD changed the behavior starting with OpenBSD 5.7.
Posted Sep 18, 2015 19:20 UTC (Fri)
by david.a.wheeler (subscriber, #72896)
[Link] (8 responses)
E.G., in Java, SecureRandom provides a cryptographically strong random number generator (RNG), while Random does not. See: http://docs.oracle.com/javase/7/docs/api/java/security/Se...
Posted Sep 18, 2015 22:36 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (7 responses)
Theo did his homework before arriving at his decision: "I have spent the last week researching all the uses of the srand(), srandom(), and srand48() subsystems in the ports tree." (https://lwn.net/Articles/625562/). So he had a better idea than most about the potential impact.
Posted Sep 19, 2015 0:14 UTC (Sat)
by anselm (subscriber, #2796)
[Link] (6 responses)
If people are sloppy about random numbers in their cryptographic code, why would anyone want to assume that they are not just as sloppy about the rest of their cryptographic code? Sure, we can arrange for the RNG to get “upgraded” to fix this, but this may end up being the least of our worries.
Posted Sep 19, 2015 2:51 UTC (Sat)
by wahern (subscriber, #37304)
[Link] (2 responses)
2) Similar to #1, sometimes even if your task doesn't require cryptographic resilience it's still a bad idea to use random, et al. The seeding functions often leak state about your process (see jimparis' first 1st). I never use non-CSPRNGs when the results will leak directly or indirectly (e.g. sorting order) over the network. But most people don't even think about this, even experienced engineers. Leaking your PID, system time, etc, is not smart, particularly when it's trivial to avoid doing so.
3) You could make this argument about anything--we can't do it perfectly, so let's do nothing. In this case it's a trivial solution with minimal costs that could offer much benefit. Reasonable people can disagree, of course. But at the end of the day opinions should give way to empirical data. Theo compiled some empirical data and made a decision. Furthermore, implementing this doesn't preclude taking other measures in other contexts.
Posted Sep 19, 2015 11:12 UTC (Sat)
by kleptog (subscriber, #1183)
[Link]
Posted Sep 20, 2015 16:57 UTC (Sun)
by apoelstra (subscriber, #75205)
[Link]
4) Failure modes for RNGs are often undetectable by testing. You can (and should) look for really specific failure modes, like always outputting the same value, but in general if you use a bad RNG the rest of your program's performance and correctness will not be affected in the least. Just security.
5) The output of a bad RNG can stick around forever. If you've got a long-lived private key, if it turns out it was generated poorly then you are screwed, even if this was years ago. (Now that I think about it, I have no recollection of what software or version I used to generate my PGP keys. So in principle any RNG failure in the news could be affecting me.) Combine this with #4 and you have a very dangerous situation indeed.
Compare this to other crypto failure modes, like leaving secure data in memory, timing side-channels, etc., which go away once the software is fixed (or even when the software stops running).
Posted Sep 20, 2015 5:29 UTC (Sun)
by njs (subscriber, #40338)
[Link] (2 responses)
Obviously you and I know that computer "random numbers" are this weird thing that are almost-like-random but with extremely subtle failures that only matter in certain obscure but high-stakes contexts... but it's not really newbie programmers' fault that they don't automatically know this. (Sure, there's a warning in the docs, but if you don't know to look for it...)
Posted Sep 23, 2015 1:35 UTC (Wed)
by dvdeug (guest, #10998)
[Link] (1 responses)
* E.g. exists a, b, i, j such that (a > 0.0) && (b + a == b) or (i > 0) && (j > 0) && (i + j < 0).
Posted Sep 23, 2015 15:26 UTC (Wed)
by apoelstra (subscriber, #75205)
[Link]
CSPRNGs are also deterministic. Determinism is not what burns cryptographically-inexperienced programmers.
Posted Sep 19, 2015 5:51 UTC (Sat)
by reubenhwk (guest, #75803)
[Link] (4 responses)
Posted Sep 19, 2015 5:54 UTC (Sat)
by reubenhwk (guest, #75803)
[Link] (3 responses)
Posted Sep 24, 2015 14:29 UTC (Thu)
by dag- (guest, #30207)
[Link] (2 responses)
Yours is an excellent use-case :-)
Posted Oct 2, 2015 8:07 UTC (Fri)
by marcH (subscriber, #57642)
[Link] (1 responses)
Immutability, good. Side-effects, bad.
Posted Oct 2, 2015 14:47 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link]
> Because of dangling replies.
Well, dag- did say "if there was no reply to it", which would address the problem of dangling replies.
Even without that restriction, a simple "this comment has been retracted by the author" placeholder message with a link to the original text would seem to me to be a reasonable compromise.
Python and crypto-strength random numbers by default
By good I mean for example a properly seeded MT or similar, not hilariously broken stuff like linear congruential with poor parameters or the Debian ssl key bug.
Python and crypto-strength random numbers by default
http://www.icir.org/vern/papers/witty-imc05.pdf
https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_...
http://jonasnick.github.io/blog/2015/07/08/exploiting-csg...
http://www.slideshare.net/0xcite/offline-bruteforce-attac...
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
The default implementation should be
Python and crypto-strength random numbers by default
def random():
return 4
If that does not meet your requirements, then you can select the random generator based on what you need.
Cue the obligatory Dilbert link.
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Random vs. Cryptographically random are typically separate
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default
Python and crypto-strength random numbers by default