Python applications, like those written in other languages, often need to obtain random data for purposes ranging from cryptographic key generation to initialization of scientific models. For years, the standard way of getting that data has been a call to os.urandom(), which is documented to "return a string of n random bytes suitable for cryptographic use." An enhancement in Python 3.5 caused a subtle change in how os.urandom() behaves on Linux systems, leading to some long, heated discussions about how randomness should be obtained in Python programs. When the dust settles, Python benevolent dictator for life (BDFL) Guido van Rossum will have the unenviable task of choosing between two competing proposals.
Traditionally, os.urandom() has been implemented on Linux by opening /dev/urandom and reading the requested amount of data. This interface is non-blocking; it will not wait if the amount of entropy in the system's entropy pool is low. The implementation of /dev/urandom is such that the quality of the random data it returns will be high even if the entropy pool is depleted — with one possible exception. Immediately after the system boots, when the entropy pool will contain little or no entropy, /dev/urandom may return relatively predictable data. In most systems, this window of poor randomness is only open for a few seconds at most, but exceptions do exist.
In the Python 3.5.0 release, os.urandom() was changed to use the relatively new getrandom() system call on Linux. Unless it has been called with the GRND_NONBLOCK flag, getrandom() will wait, if need be, for the system entropy pool to be initialized. os.urandom() does not supply that flag, meaning that it can block if the entropy pool has not yet accumulated enough randomness. It seems like a relatively small change, with the prospect of being sure of living up to the "suitable for cryptographic use" promise in compensation. But one need only look at Python issue 26839 to see that the implications are not quite as simple as one would expect.
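The difference the GRND_NONBLOCK flag makes can be seen from Python itself. The following is a minimal sketch, assuming Linux and Python 3.6 or later, where the system call is exposed directly as os.getrandom(); the helper name system_rng_ready() is invented for illustration and is not part of any library.

```python
import os


def system_rng_ready():
    """Probe whether the kernel entropy pool has been initialized.

    Returns True if it has, False if a read would block, and None if
    the answer cannot be determined on this platform.
    (Hypothetical helper, named here only for illustration.)
    """
    if not hasattr(os, "getrandom"):
        return None  # not Linux, or a Python older than 3.6
    try:
        # With GRND_NONBLOCK, getrandom() fails instead of waiting
        # when the pool has not been initialized yet.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    except OSError:
        return None  # e.g. a kernel without the getrandom() system call
    return True


if __name__ == "__main__":
    print("entropy pool initialized:", system_rng_ready())
```

Without the flag, the same call would simply wait, which is the behavior os.urandom() picked up in 3.5.0.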
It turns out that, in some distribution configurations, Python scripts are run at the very beginning of the user-space bootstrap process. If the entropy pool is not yet ready, those scripts will block until entropy-pool initialization is complete. If there is nothing else going on, and especially if the system is booting as a virtualized guest, it may take a long time to accumulate enough entropy to proceed. In the incident that led to the bug report, the boot process simply hung for 90 seconds until systemd lost patience and killed the blocking process. That kind of behavior, created in the search for unpredictable random numbers, has quite predictable effects in the form of unhappy users.
From this sprang the bug-tracker entry referenced above, which turned into a fierce discussion on the wisdom of the API change and whether os.urandom() should return crypto-quality randomness at any cost. The discussion spilled over onto the python-dev list when 3.5 release manager Larry Hastings despaired of reaching any sort of consensus and asked Van Rossum to simply rule on the matter. The resulting thread led many participants to question whether they wanted to continue following the list at all but, in the end, it did come to some useful conclusions.
If one is doing cryptographically sensitive work early in the bootstrap process — generating an SSH host key, for example — then blocking the boot almost certainly makes sense. The consequences of the alternative — generating weak keys — can be severe. In this case, though, it turns out that such high-quality randomness was not needed. Nobody was generating keys; instead, Python was initializing its own internal random-number generator and setting up dictionary randomization to defend against hash-collision attacks. These internal calls were (inadvertently) changed when os.urandom() was changed, but there seems to be a rough consensus that they do not need blocking behavior.
So the proper fix for the observed boot hang is to do these internal initializations without blocking on the entropy pool. For Python 3.5, the os.urandom() change will also be partially reverted, in that the function will, once again, be non-blocking. It will call getrandom() with the GRND_NONBLOCK flag and, if that call fails, fall back to reading /dev/urandom as before. With these fixes in place, the blocking part of the change is effectively reverted and the immediate problem has been solved.
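The fallback logic described here can be approximated in pure Python. This is only a sketch of the idea, not CPython's actual C implementation; it assumes Linux, uses os.getrandom() (available from Python 3.6), and the function name nonblocking_random_bytes() is invented for illustration.

```python
import os


def nonblocking_random_bytes(n):
    """Return n random bytes without ever waiting on the entropy pool.

    Tries getrandom() with GRND_NONBLOCK first, then falls back to
    reading /dev/urandom, which never blocks on Linux.
    (Hypothetical helper illustrating the fallback order.)
    """
    if hasattr(os, "getrandom"):
        try:
            data = os.getrandom(n, os.GRND_NONBLOCK)
            if len(data) == n:
                return data
            # A short read is possible for large requests; fall through.
        except OSError:
            # Covers BlockingIOError (pool not ready) as well as
            # kernels that lack the system call entirely.
            pass

    # Fallback: the traditional interface, read until n bytes arrive.
    with open("/dev/urandom", "rb", buffering=0) as f:
        buf = b""
        while len(buf) < n:
            chunk = f.read(n - len(buf))
            if not chunk:
                raise OSError("unexpected end of file on /dev/urandom")
            buf += chunk
        return buf


if __name__ == "__main__":
    print(nonblocking_random_bytes(16).hex())
```

Note that the fallback path is exactly the one that may hand back relatively predictable data immediately after boot, which is why this fix addresses the hang but not the underlying argument.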
That still leaves open the issue of how os.urandom() should behave; developers who are concerned about security are adamant that it should not return data when the entropy pool is not yet ready. So there is still pressure on Van Rossum (and the community as a whole) to specify blocking behavior starting with the upcoming 3.6 release. Python's benevolent dictator seems inclined to downplay the issue:
It became clear in the discussion, though, that opposition to returning questionable randomness from os.urandom() is strong. It seems likely that, in 3.6 and later releases, os.urandom() will no longer return data drawn from an uninitialized entropy pool. The question of how it will behave is, as yet, unresolved, though. In the end, Van Rossum asked the proponents of two different approaches to write up their ideas as Python enhancement proposals (PEPs); he will then choose between the two.
The first approach, favored by Victor Stinner, is to simply make os.urandom() blocking and be done with it — after ensuring that Python itself uses non-blocking behavior during its initialization. Changing to blocking behavior is arguably an incompatible change in a longstanding Python API but, as Stinner points out: "First of all, no user complained yet that 'os.urandom()' blocks. This point is currently theoretical." As long as the problems with starting Python itself are resolved, the thinking goes, there should not be problems for other users.
The alternative comes from Nick Coghlan. With this proposal, os.urandom() will raise a BlockingIOError exception if random data cannot be had without blocking. Adding a new exception to an established API has its own hazards; no existing code will be expecting that exception, so surprising explosions might result. But, for a problem that should only be possible during the bootstrap process, Coghlan believes that this is the best approach:
This proposal also envisions adding a function to the in-development "secrets" module — secrets.wait_for_system_rng() — that would simply block until the system's entropy pool is fully initialized and ready. The small (possibly nonexistent) body of code that breaks with unhandled BlockingIOError exceptions could call this function to ensure the availability of strong random data from os.urandom().
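Code written against this proposal might look something like the sketch below. It is hypothetical on both counts: it assumes os.urandom() raises BlockingIOError as proposed, and it supplies a local stand-in for secrets.wait_for_system_rng(), a function that exists only in the proposal. The stand-in relies on os.getrandom() (Linux, Python 3.6 or later), which blocks until the pool is initialized when called without flags.

```python
import os


def wait_for_system_rng():
    """Local stand-in for the proposed secrets.wait_for_system_rng().

    Blocks until the kernel entropy pool is initialized. This is an
    assumption about how such a function could work, not its real
    implementation.
    """
    if hasattr(os, "getrandom"):
        # flags=0: getrandom() waits for pool initialization if needed.
        os.getrandom(1)


def strong_urandom(n):
    """Return n bytes from os.urandom(), waiting for the pool if the
    proposed BlockingIOError is raised. (Hypothetical helper.)"""
    try:
        return os.urandom(n)
    except BlockingIOError:
        wait_for_system_rng()
        return os.urandom(n)


if __name__ == "__main__":
    print(strong_urandom(16).hex())
```

Callers that do not care about early-boot behavior would need no changes at all; only code that can run before the pool is ready would have to handle the exception or call the waiting function up front.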
It is not clear when a decision between these two proposals will be made. It is worth noting, though, that Coghlan has indicated that he is happy enough with Stinner's proposal that he can support it should that be the one that is accepted in the end. So the discussion may have been long and painful, but the end result should be strong random data in Python in a way that the community as a whole is able to agree upon. Hopefully that means everybody can rest and prepare for the inevitable debate over whether this change should be backported to Python 2.
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 15:01 UTC (Sun) by fandingo (guest, #67019) [Link]
The os module isn't designed to do anything fancy or add complexities on top of syscalls. It's just there to make them available. The less opinion the run time injects into this sort of thing the better.
Focus this effort on the secrets module. That's where people expect them to invest the time on these thorny issues.
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 17:00 UTC (Sun) by Otus (subscriber, #67685) [Link]
Arguably, /dev/urandom itself should not behave as it does. If it weren't for strict API guarantees preventing changes, it should similarly block in early boot until sufficient entropy exists, like getrandom().
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 22:05 UTC (Sun) by SLi (subscriber, #53131) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 22:23 UTC (Sun) by josh (subscriber, #17465) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 22:53 UTC (Sun) by SLi (subscriber, #53131) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 23:17 UTC (Sun) by josh (subscriber, #17465) [Link]
That should have said: /dev/urandom never blocks; it should block only when it hasn't been seeded yet, as it works fine as long as it has been seeded. getrandom behaves the way urandom should always have worked.
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 5:57 UTC (Mon) by Otus (subscriber, #67685) [Link]
/dev/random keeps an entropy tally and can block whenever that goes low. /dev/urandom only needs initial entropy and should (but does not) only block early when it lacks even that.
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 22:14 UTC (Sun) by arjan (subscriber, #36785) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 10:21 UTC (Mon) by keeperofdakeys (subscriber, #82635) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 14:17 UTC (Mon) by hkario (subscriber, #94864) [Link]
the problem is with embedded systems and as you pointed out, when you run inside initrd image
Python's os.urandom() in the absence of entropy
Posted Jul 15, 2016 23:58 UTC (Fri) by tytso (✭ supporter ✭, #9993) [Link]
getrandom(2) is a new system call interface (now, if only glibc would get off their collective backsides and support it), so I could make it block until the entropy pool is initialized. That way, we don't break backwards compatibility. The problem is that there are userspace programs that use /dev/urandom for various non-cryptographic use cases, and we don't want to break them --- especially if it means breaking a distribution in early boot.
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 22:22 UTC (Sun) by cardoe (guest, #72234) [Link]
Something similar recently came up in Rust as well. A few programs I wrote that were started up in an early boot environment were blocking for long enough to raise the ire of systemd. On the Rust side it was nearly impossible to use any of the standard crates (think of Python packages). After talking to some of the Rust maintainers, the solution they wanted me to implement was to not cause the internal libstd bits to block, but any users of the random crate would get the blocking implementation. The result is that users of HashMap are not guaranteed to have a cryptographically secure seed. For the relevant pull request see: Rust Lang PR 33086 and the original issue: Rust Lang Issue 32953. Honestly, I know my use case is special, but it matters to me (and clearly to others, judging from this post). You don't always need cryptographically secure numbers for every case that you need a random value. And acting like the world is black and white with regards to this is what gives the security folks a bad rap.
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 22:39 UTC (Sun) by arjan (subscriber, #36785) [Link]
there is a lot of entropy in most modern systems, from the prng instructions in modern CPUs to the timestamp at the time of interrupts... yes you can argue that you don't want to trust any of that for generating long lasting keys (e.g. /dev/random), and I appreciate that a lot. But for almost everything else, throwing a lot of the "some entropy, just not provable how much" things together will be better than having people stop using random numbers (or worse, implement a custom PRNG per app) in their code.
(the provably perfect /dev/random is one that never returns anything. it's also provably the most useless one.... )
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 3:03 UTC (Mon) by mathstuf (subscriber, #69389) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 11:21 UTC (Mon) by cortana (subscriber, #24596) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 11:23 UTC (Mon) by cortana (subscriber, #24596) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 12:29 UTC (Mon) by mathstuf (subscriber, #69389) [Link]
https://www.freedesktop.org/software/systemd/man/systemd-...
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 12:32 UTC (Mon) by mathstuf (subscriber, #69389) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 23:43 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]
As for the whole random/urandom stuff - attacks on random number generators range from "impossible" to "requires a lab to pull off in a controlled situation". There's simply no reason whatsoever to use /dev/random.
Waiting _once_ until there's enough initial entropy in urandom pool makes sense, but 128 bits are plenty enough for that and you can get them quickly enough.
Python's os.urandom() in the absence of entropy
Posted Jul 10, 2016 23:59 UTC (Sun) by droundy (subscriber, #4559) [Link]
Wrong place
Posted Jul 11, 2016 5:18 UTC (Mon) by ncm (subscriber, #165) [Link]
Even if you boot from a LiveCD and so have no place to put a file, RTCs can store a hundred bits (or so) across power-cycles. Even the act of getting bits out of the RTC can generate some entropy, because of mismatched clocks.
Wrong place
Posted Jul 11, 2016 10:27 UTC (Mon) by keeperofdakeys (subscriber, #82635) [Link]
Wrong place
Posted Jul 13, 2016 20:49 UTC (Wed) by akkornel (subscriber, #75292) [Link]
cat /var/lib/random_pool_from_shutdown > /dev/urandom
That will mix data into the pool, but it does not actually credit any entropy. So when you do `cat /proc/sys/kernel/random/entropy_avail`, it won't show any increase.
If you do want to add random data to the pool _and_ credit entropy, you have to do an ioctl. The random(4) man page has the details!
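For readers who want to see what that ioctl involves, here is a rough Python sketch based on the RNDADDENTROPY request and the rand_pool_info structure documented in random(4). The numeric request value is an assumption that holds for x86 and ARM; other architectures encode ioctl numbers differently. The function names are invented for illustration, and the call needs root (CAP_SYS_ADMIN) to succeed.

```python
import fcntl
import struct

# _IOW('R', 0x03, int[2]) as encoded on x86 and ARM. This value is an
# assumption for any other architecture; check <linux/random.h>.
RNDADDENTROPY = 0x40085203


def pack_rand_pool_info(data, entropy_bits):
    """Build the struct rand_pool_info payload described in random(4):
    int entropy_count; int buf_size; followed by the buffer itself."""
    return struct.pack("ii", entropy_bits, len(data)) + data


def credit_entropy(data, entropy_bits):
    """Mix data into the pool AND credit entropy_bits of entropy.

    Requires CAP_SYS_ADMIN; only claim as many bits as the data
    genuinely contains. (Hypothetical helper.)
    """
    with open("/dev/random", "wb", buffering=0) as f:
        fcntl.ioctl(f, RNDADDENTROPY, pack_rand_pool_info(data, entropy_bits))
```

A saved seed file would be passed as data, with entropy_bits set conservatively; overstating it defeats the purpose of the accounting.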
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 7:55 UTC (Mon) by lkundrak (subscriber, #43452) [Link]
Python's os.urandom() in the absence of entropy
Posted Jul 11, 2016 21:57 UTC (Mon) by flussence (subscriber, #85566) [Link]
VirtIO RNG
Posted Jul 11, 2016 9:47 UTC (Mon) by tialaramex (subscriber, #21167) [Link]
The imaginary box in which your virtualised computer exists doesn't have any reliable source of actual random numbers. So you should suck entropy from the host, which does. As I understand it VirtIO RNG isn't in terribly good shape, but it's not a network or disk, it only needs to successfully move a few random bits from the host to kick things off in early boot, so it doesn't need to be a work of entrancing efficiency so long as it isn't full of security holes.
Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds