LWN: Comments on "Python and crypto-strength random numbers by default"

Python and crypto-strength random numbers by default

nybble41 — Fri, 02 Oct 2015 14:47:44 +0000

>> Do you also wonder why one cannot remove a comment (e.g. if there was no reply to it) ?

> Because of dangling replies.

Well, dag- did say "if there was no reply to it", which would address the problem of dangling replies.

Even without that restriction, a simple "this comment has been retracted by the author" placeholder message with a link to the original text would seem to me to be a reasonable compromise.

Python and crypto-strength random numbers by default

marcH — Fri, 02 Oct 2015 08:07:43 +0000

Because of dangling replies.

Immutability, good. Side-effects, bad.

Python and crypto-strength random numbers by default

dag- — Thu, 24 Sep 2015 14:29:21 +0000

Do you also wonder why one cannot remove a comment (e.g. if there was no reply to it) ?

Yours is an excellent use-case :-)

Random vs. Cryptographically random are typically separate

apoelstra — Wed, 23 Sep 2015 15:26:28 +0000

> If they don't understand that random.random uses a deterministic RNG, then they probably have a lot of other problems in programming.

CSPRNGs are also deterministic. Determinism is not what burns cryptographically-inexperienced programmers.
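
To make the point concrete, here is a toy sketch of a hash-based generator. The class name `ToyHashDRBG` and its construction (SHA-256 over a secret key plus a counter) are assumptions made purely for illustration, not a vetted design and not something taken from the article. Given the same key it replays the same stream; what protects its users is that the key is secret and unpredictable, not any lack of determinism.

```python
import hashlib


class ToyHashDRBG:
    """Toy counter-mode generator: block i = SHA-256(key || i).

    Hypothetical illustration only; do not use for real cryptography.
    """

    def __init__(self, key: bytes):
        self._key = key
        self._counter = 0

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            block = hashlib.sha256(
                self._key + self._counter.to_bytes(8, "big")
            ).digest()
            out += block
            self._counter += 1
        # Unused bytes of the last block are simply discarded.
        return out[:n]


# Same key, same stream: the generator is fully deterministic.
a = ToyHashDRBG(b"secret key")
b = ToyHashDRBG(b"secret key")
same_stream = a.read(16) == b.read(16)

# A different key gives an unrelated stream.
c = ToyHashDRBG(b"another key")
different_stream = ToyHashDRBG(b"secret key").read(16) != c.read(16)
```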

Random vs. Cryptographically random are typically separate

dvdeug — Wed, 23 Sep 2015 01:35:42 +0000

Computer "numbers" are these weird things that are almost-like-numbers with fairly non-subtle failure modes in many contexts.* A lot of learning computer programming is learning all these details where things don't work the way one would naively expect them to. If they don't understand that random.random uses a deterministic RNG, then they probably have a lot of other problems in programming.

* E.g. exists a, b, i, j such that (a > 0.0) && (b + a == b) or (i > 0) && (j > 0) && (i + j < 0).
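
Both conditions in the footnote can be reproduced in a few lines. This is a minimal sketch: the specific values are arbitrary choices, and because Python's own integers never overflow, a 32-bit signed C integer is simulated with `ctypes` (an assumption about which integer width the footnote has in mind).

```python
import ctypes

# Floating point: adding a small positive number can leave b unchanged,
# because 1.0 is far below the spacing between doubles near 1e17.
a = 1.0
b = 1e17
absorbed = (a > 0.0) and (b + a == b)

# Fixed-width integers: ctypes.c_int32 silently truncates to 32 bits,
# mimicking two's-complement wraparound of a signed C int.
i = 2**31 - 1
j = 1
wrapped = ctypes.c_int32(i + j).value
overflowed = (i > 0) and (j > 0) and (wrapped < 0)
```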

Random vs. Cryptographically random are typically separate

apoelstra — Sun, 20 Sep 2015 16:57:10 +0000

I have a couple more:

4) Failure modes for RNGs are often undetectable by testing. You can (and should) look for really specific failure modes, like always outputting the same value, but in general if you use a bad RNG the rest of your program's performance and correctness will not be affected in the least. Just security.

5) The output of a bad RNG can stick around forever. If you've got a long-lived private key, if it turns out it was generated poorly then you are screwed, even if this was years ago. (Now that I think about it, I have no recollection of what software or version I used to generate my PGP keys. So in principle any RNG failure in the news could be affecting me.) Combine this with #4 and you have a very dangerous situation indeed.

Compare this to other crypto failure modes, like leaving secure data in memory, timing side-channels, etc., which go away once the software is fixed (or even when the software stops running).
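
A rough sketch of point 4, assuming a deliberately crude statistical check (the helper `looks_uniform` is hypothetical): a generator seeded with a publicly known constant passes exactly the same test as one backed by the operating system, even though every one of its outputs is predictable.

```python
import random


def looks_uniform(rng, samples=100_000, tolerance=0.01):
    """Crude sanity check: the sample mean of rng.random() is near 0.5."""
    mean = sum(rng.random() for _ in range(samples)) / samples
    return abs(mean - 0.5) < tolerance


# Completely predictable: anyone who knows the constant seed knows every output.
predictable = random.Random(0)
# Backed by the operating system's entropy source (os.urandom).
unpredictable = random.SystemRandom()

predictable_passes = looks_uniform(predictable)
unpredictable_passes = looks_uniform(unpredictable)
```

Both checks succeed, which is the problem: functional and statistical tests say nothing about whether an attacker can guess the seed.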

Random vs. Cryptographically random are typically separate

njs — Sun, 20 Sep 2015 05:29:31 +0000

I've come around to feeling that it's unfair to call programmers "sloppy" because they thought that a function called random.random() would return random numbers. Everyone understands how dice and coin flips work, and that's how random number generators are always taught to new programmers, and if random.random() actually worked the way dice and coin flips worked then it would be safe to use for any purpose :-/.

Obviously you and I know that computer "random numbers" are these weird things that are almost-like-random but with extremely subtle failures that only matter in certain obscure but high-stakes contexts... but it's not really newbie programmers' fault that they don't automatically know this. (Sure, there's a warning in the docs, but if you don't know to look for it...)

Random vs. Cryptographically random are typically separate

kleptog — Sat, 19 Sep 2015 11:12:02 +0000

There's still the documentation question: I see that the warning about it not being cryptographically secure was added in the 2.7 docs; prior to that you had to read the wall of text to discover that. The default behaviour of random() now, if you don't seed it, is to read 32 bytes from urandom (if available) and use that as the seed. If you're not generating lots of random numbers this should be sufficient. You wouldn't want to generate crypto keys that way, but it's not that bad.

Python and crypto-strength random numbers by default

reubenhwk — Sat, 19 Sep 2015 05:54:13 +0000

Never mind... Got to the end of the article...

Python and crypto-strength random numbers by default

reubenhwk — Sat, 19 Sep 2015 05:51:21 +0000

Why arc4random? Chacha8 is much more secure and faster.

Random vs. Cryptographically random are typically separate

wahern — Sat, 19 Sep 2015 02:51:36 +0000

1) Sometimes people don't realize that their code requires cryptographic resilience. Just look at the examples posted above by jimparis. None of those involve obvious cases like generating cipher keys.

2) Similar to #1, sometimes even if your task doesn't require cryptographic resilience it's still a bad idea to use random, et al. The seeding functions often leak state about your process (see jimparis' first example). I never use non-CSPRNGs when the results will leak directly or indirectly (e.g. sorting order) over the network. But most people don't even think about this, even experienced engineers. Leaking your PID, system time, etc, is not smart, particularly when it's trivial to avoid doing so.

3) You could make this argument about anything--we can't do it perfectly, so let's do nothing. In this case it's a trivial solution with minimal costs that could offer much benefit. Reasonable people can disagree, of course. But at the end of the day opinions should give way to empirical data. Theo compiled some empirical data and made a decision. Furthermore, implementing this doesn't preclude taking other measures in other contexts.
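
As a sketch of why points 1 and 2 matter, assume a hypothetical server that seeds a non-cryptographic PRNG with its process ID and then hands out tokens derived from it. The helper names, the PID value, and the search bound are all made up for illustration; the idea is only that a small seed space can be searched exhaustively from a single leaked output.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def make_token(seed, length=16):
    """Derive a token from a non-cryptographic PRNG seeded with `seed`."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def recover_seed(token, max_seed=32768):
    """Brute-force every candidate seed until one reproduces the token."""
    for candidate in range(max_seed + 1):
        if make_token(candidate, len(token)) == token:
            return candidate
    return None


# Hypothetical server seeding from its process ID (a small search space).
server_pid = 4242
leaked_token = make_token(server_pid)

# The attacker sees one token and recovers the seed, and with it the PID.
guessed_pid = recover_seed(leaked_token)
```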

Random vs. Cryptographically random are typically separate

anselm — Sat, 19 Sep 2015 00:14:05 +0000

If people are sloppy about random numbers in their cryptographic code, why would anyone want to assume that they are not just as sloppy about the rest of their cryptographic code? Sure, we can arrange for the RNG to get “upgraded” to fix this, but this may end up being the least of our worries.

Random vs. Cryptographically random are typically separate

wahern — Fri, 18 Sep 2015 22:36:42 +0000

But many people don't think about security consequences, or if they do they arrive at the wrong conclusion. I think the point here is that by making a CSPRNG the default, you mitigate the impact of sloppy analysis. Of course, attempting to second-guess programmers this way is often a bad idea. I understand the pushback. But in this case, all things considered, it seems like an easy win.

Theo did his homework before arriving at his decision: "I have spent the last week researching all the uses of the srand(), srandom(), and srand48() subsystems in the ports tree." (https://lwn.net/Articles/625562/). So he had a better idea than most about the potential impact.

Python and crypto-strength random numbers by default

wahern — Fri, 18 Sep 2015 22:21:44 +0000

It helps if you read a version of the man page from this decade: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/ma...

OpenBSD changed the behavior starting with OpenBSD 5.7.

Random vs. Cryptographically random are typically separate

david.a.wheeler — Fri, 18 Sep 2015 19:20:26 +0000

In many languages, "random" and "cryptographically random" are different.

E.g., in Java, SecureRandom provides a cryptographically strong random number generator (RNG), while Random does not. See: http://docs.oracle.com/javase/7/docs/api/java/security/Se...
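
Python's standard library draws the same line: `random.Random` (the Mersenne Twister behind the module-level functions) versus `random.SystemRandom`, which reads from `os.urandom()`. A minimal sketch of the difference follows; the variable names and seed value are illustrative only.

```python
import os
import random

# Non-cryptographic: Mersenne Twister, reproducible from its seed.
mt = random.Random(1234)
mt_again = random.Random(1234)
reproducible = mt.random() == mt_again.random()

# OS-backed: seed() is a no-op stub, so output cannot be replayed.
secure = random.SystemRandom()
secure.seed(1234)
first = secure.getrandbits(128)
secure.seed(1234)
second = secure.getrandbits(128)
replayed = first == second  # equal only with probability about 2**-128

# Raw bytes straight from the operating system.
key_material = os.urandom(32)
```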

Python and crypto-strength random numbers by default

diegor — Fri, 18 Sep 2015 15:46:29 +0000

It seems to me that not even OpenBSD replaced the default random generator with a "high-security" generator:

http://www.rocketaware.com/man/man3/random.3.htm

If we follow Theo's reasoning, we should expect that random() in the standard C library would be using arc4random.

So I wonder why Theo suggests that python should do it?

Python and crypto-strength random numbers by default

xorbe — Thu, 17 Sep 2015 23:44:25 +0000

No, ob XKCD link

https://xkcd.com/221/

Python and crypto-strength random numbers by default

njs — Thu, 17 Sep 2015 21:21:57 +0000

> Why not return values from the MT algorithm if random.random() is called after an explicit seed has been set, but switch to a newer CSPRNG algorithm if random.random() is called without seeding?

AFAICT, the only reason not to (besides the usual one that it would take some work to implement and maintain) is that Guido considers it "a hack" :-(

Python and crypto-strength random numbers by default

droundy — Thu, 17 Sep 2015 19:10:10 +0000

This was precisely my thought. It doesn't require changing the documentation at all, as you still should use the explicitly secure one if you need it, to avoid attacks that rely on a module silently degrading random.random. But it "fixes" the vast majority of the naive code out there.

Python and crypto-strength random numbers by default

corbet — Thu, 17 Sep 2015 14:56:36 +0000

Cue the obligatory Dilbert link.

Python and crypto-strength random numbers by default

robbe — Thu, 17 Sep 2015 14:42:26 +0000

The former will not care whether their dice rolls can be predicted with some effort (or is gambling allowed in your local kindergärten?). The latter will read the docs. So let's keep things like they are.

Python and crypto-strength random numbers by default

ssam — Thu, 17 Sep 2015 14:39:49 +0000

The default implementation should be

def random():
    return 4

If that does not meet your requirements, then you can select the random generator based on what you need.

Python and crypto-strength random numbers by default

njh — Thu, 17 Sep 2015 13:03:30 +0000

The tweak that immediately occurred to me would be to change which RNG algorithm is used, based on whether it is explicitly seeded or not.

The article suggests that the existing MT algorithm is seeded from urandom by default, if no seed value is provided.

Why not return values from the MT algorithm if random.random() is called after an explicit seed has been set, but switch to a newer CSPRNG algorithm if random.random() is called without seeding? That would mean that existing Monte Carlo simulation code (and similar) would continue to give results that matched runs from before the change, but stronger random numbers would be produced by default in cases where reproducibility isn't a concern.
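
A rough sketch of how that switch could look, under stated assumptions: the class `HybridRandom` is hypothetical, and `random.SystemRandom` stands in for the "newer CSPRNG algorithm" since the proposal does not name one. Only `random()` is shown; a real version would have to cover the rest of the module's API.

```python
import random


class HybridRandom:
    """Hypothetical sketch: OS-backed until seeded, Mersenne Twister after."""

    def __init__(self):
        self._rng = random.SystemRandom()

    def seed(self, value):
        # An explicit seed signals that reproducibility is wanted.
        self._rng = random.Random(value)

    def random(self):
        return self._rng.random()

    @property
    def reproducible(self):
        return not isinstance(self._rng, random.SystemRandom)


rng = HybridRandom()
unseeded_value = rng.random()      # from os.urandom via SystemRandom

rng.seed(42)
seeded_value = rng.random()        # from Mersenne Twister, repeatable
matches_stdlib = seeded_value == random.Random(42).random()
```

Existing simulation code that calls `seed()` would keep producing the same numbers as before, while code that never seeds would get the stronger source.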

Python and crypto-strength random numbers by default

ianmcc — Thu, 17 Sep 2015 12:12:22 +0000

I think it's a bad idea to try to make the default RNG cryptographically secure. People who don't know what they're doing will misuse it anyway, and if they're too clueless to read the documentation and learn how to do cryptography properly (which probably involves stepping away from the keyboard and using a pre-existing and known-good library instead) then they'll surely make lots of other mistakes (from elementary to subtle) with the cryptography too. Indeed, keeping the standard random number generator not suitable for crypto is a good thing, because it's an easy-to-spot marker that indicates that there are likely a whole range of insecurities in the code. Like finding brown M&M's in the bowl. http://www.snopes.com/music/artists/vanhalen.asp

For numerical applications (Monte Carlo) you want deterministic random numbers, but seeded non-deterministically (but with the seed stashed away somewhere, so that you can reproduce the calculation later if necessary).
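
The Monte Carlo pattern described above can be sketched as follows. The `estimate_pi` function and the `run_record` dictionary are illustrative assumptions; in practice the seed would be written to a log or results file.

```python
import os
import random


def estimate_pi(rng, samples=20_000):
    """Monte Carlo estimate of pi from points in the unit square."""
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples


# Pick the seed non-deterministically, then record it with the results.
seed = int.from_bytes(os.urandom(8), "big")
result = estimate_pi(random.Random(seed))
run_record = {"seed": seed, "result": result}

# Later: replay the exact same calculation from the stashed seed.
replayed = estimate_pi(random.Random(run_record["seed"]))
```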

For a game, it would vary - if it's a networked multiplayer game then you may well want it to be deterministic, so that multiple clients can do the same 'dice rolls' and stay in sync. If it's an online poker game then you definitely want it to be non-deterministic (but you'd probably get the random numbers from a server somewhere anyway).

In all of these cases you'd want to choose very carefully the random number generator and how it's used. No 'default' generator will cover all of these cases. The default generator should only be used for 'throwaway' applications, where you need some randomness but don't care whether it's deterministic or secure, as long as it meets some minimum level of quality. And for that, MT is already overkill.

Python and crypto-strength random numbers by default

hthoma — Thu, 17 Sep 2015 08:20:58 +0000

I don't read the documentation either. But if I see an API that has a seed() function, I strongly assume that subsequent calls to random() are deterministic. If it were not deterministic, there would be no point in having a seed() function at all, IMHO.

Python and crypto-strength random numbers by default

xorbe — Wed, 16 Sep 2015 22:51:59 +0000

One flavor of Python for the kindergarten classroom. One for professional production.

Python and crypto-strength random numbers by default

fredrik — Wed, 16 Sep 2015 20:56:04 +0000

I have no opinion on the core topic but want to object to the argument that people don't read the docs.

As a python novice, my impression is that I spend more time reading documentation for python modules than for other languages. Perhaps because I find the documentation for a lot of modules a pleasure to read. They give a swift introduction, offer relevant examples, mention relevant implementation details, and sometimes suggest other modules that perhaps are better suited to some more tangential use case.

So, perhaps an alternative proposal is to ensure that the documentation for random.random() is very entertaining, so entertaining that it competes with 'from __future__ import braces' et al.

Python and crypto-strength random numbers by default

jimparis — Wed, 16 Sep 2015 18:49:57 +0000

This paper describes reverse-engineering a malicious worm by watching a /8 to see which IPs the worm probed. From there, they determined PRNG state and seeds, and used that to deduce everything from the system uptime and disk count of the infected hosts, to tracking down the "patient zero" computer where the worm started:
http://www.icir.org/vern/papers/witty-imc05.pdf

This paper describes PRNG attacks and has some real-world examples of many PHP applications with PRNGs that were vulnerable in some form. It seems like the most frequent attack is in things like password reset tokens: request a password reset yourself, check your email and figure out the server's PRNG state, request a password reset for your victim, and use the known PRNG state to predict their token:
https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_...

This page describes an online betting-type game where the attacker was able to predict results from previous ones:
http://jonasnick.github.io/blog/2015/07/08/exploiting-csg...

These slides describe an attack on WPS that involves figuring out the PRNG state (slide 15):
http://www.slideshare.net/0xcite/offline-bruteforce-attac...
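
For CPython's own `random` module, the "figure out the PRNG state" step in the password-reset scenario can be sketched as below. This is an idealised illustration, not the method of any of the linked papers: it assumes the attacker sees 624 consecutive raw 32-bit outputs (`getrandbits(32)`), which real applications rarely expose so directly, and the helper names are made up. It works because the Mersenne Twister's output tempering is invertible, so each output reveals one word of internal state.

```python
import random


def undo_right_shift_xor(value, shift):
    """Invert y ^= y >> shift on a 32-bit word."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ (result >> shift)
    return result


def undo_left_shift_xor(value, shift, mask):
    """Invert y ^= (y << shift) & mask on a 32-bit word."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ ((result << shift) & mask)
    return result


def untemper(output):
    """Recover the internal MT19937 state word behind one 32-bit output."""
    y = undo_right_shift_xor(output, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y


def clone_from_outputs(outputs):
    """Build a generator that continues the stream of 624 observed outputs."""
    state = tuple(untemper(o) for o in outputs) + (624,)
    clone = random.Random()
    clone.setstate((3, state, None))
    return clone


victim = random.Random()  # default seeding from the OS
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_from_outputs(observed)
predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
```

Note that the victim here is seeded from the OS, so good seeding alone does not prevent prediction once enough output has been observed.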

Python and crypto-strength random numbers by default

jtaylor — Wed, 16 Sep 2015 17:38:32 +0000

I'm curious, has there ever been a documented issue/exploit/break-in caused by the use of a good-but-not-crypto-good random number generator in the real world?
By good I mean for example a properly seeded MT or similar, not hilariously broken stuff like a linear congruential generator with poor parameters or the Debian ssl key bug.