Cryptography is on a lot of minds these days, but even before Heartbleed (which wasn't a crypto botch, but has certainly brought attention to the topic), some folks were already looking at the open-source cryptography landscape. The OpenStack developers were among them; two project members, Jarret Raim and Paul Kehrer, came to PyCon 2014 to report on "The state of crypto in Python". They described the existing Python cryptography libraries, as well as the underlying libraries, before reporting on the development of a new Python library called, somewhat confusingly, "cryptography".
Raim is the cloud security product manager for Rackspace, while Kehrer is a software developer for Rackspace working on the Barbican key management service for OpenStack. He also "unfortunately [does] a lot of crypto" in his spare time. Raim said that it was something of a running joke in the community that crypto libraries are "created by people who make poor life choices".
There is a need in Barbican and other OpenStack components for cryptography support, so the team looked at the libraries available. OpenStack needs the standard algorithms, in an open-source form. Everything in OpenStack is Apache-v2-licensed, so a crypto library would need to be compatible with that. Also, from a security standpoint, open source means that the code can be audited. The project needs something with Python support, since that is the language OpenStack is written in, but it needs more than just support for Python 2.7. Support for both Python 3 and PyPy is also important. Since OpenStack will build atop whatever library gets chosen, it is looking for something that is well-maintained and can be relied upon for some time to come.
All of the major cryptographic libraries they looked at were written in C (or, in the case of Botan, in C++). That's because C allows low-level control that managed languages can't provide, Kehrer said. One area where that makes a big difference is in avoiding timing attacks, which requires operations that take constant time (and, ideally, constant power). The code in those C libraries is generally good and has been well-reviewed, though maybe not in the area of TLS extensions, he said with a chuckle. In the future, there is some possibility of using cryptographic libraries in Rust or Go, but that is not a real possibility now.
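The timing concern can be illustrated from Python itself. The following is a minimal sketch, not something shown in the talk: a naive comparison that returns at the first mismatching byte, next to the standard library's hmac.compare_digest(), which is implemented in C and is designed so that its running time does not depend on where the inputs differ. The function names leaky_equal and verify_tag, and the key and message values, are made up for illustration.

```python
import hashlib
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Naive comparison: returns at the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit leaks the mismatch position via timing
    return True


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check an HMAC-SHA256 tag using a constant-time comparison."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


key = b"illustrative key only"  # assumption: a real key would be random
message = b"hello"
good_tag = hmac.new(key, message, hashlib.sha256).digest()
bad_tag = bytes(len(good_tag))  # all zero bytes, same length as a real tag
```

Both functions give the same answers; the difference is only in how much the time taken reveals to an attacker who can submit many guesses.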
So they looked at seven different libraries, grading them in six different categories. The libraries were OpenSSL, Network Security Services (NSS), NaCl, Botan, Apple's CommonCrypto, Microsoft's CSP, and Libgcrypt. The criteria were: open source, cross-platform, maintained, ubiquitous, support for the standard crypto algorithms, and FIPS certification.
Neither Raim nor Kehrer was particularly fond of the FIPS versions of the libraries, with both agreeing that "you do not want FIPS". All of the libraries failed one or more of the criteria, except the dominant player: OpenSSL. Its ubiquity gave all the others a lower classification (the slide, #4, used green, yellow, and red to indicate decreasing compliance). NSS had a yellow for ubiquitous, as did NaCl and Libgcrypt, both of which lacked FIPS certification as well (which is evidently important to some OpenStack users). NaCl also lacked support for the standard algorithms.
They then looked at Python libraries that provide cryptography services. Five libraries were examined: M2Crypto, PyCrypto, pyOpenSSL, python-nss, and Botan's Python bindings; five criteria were evaluated as well: which C backend, how well maintained, Python support, reviewed, and completeness. All failed the "reviewed" category, as independent review is expensive, but it is also not critical since the actual crypto is not being done in Python "if you are doing it right", Raim said.
The breadth of Python support is one criterion where all of the libraries ran aground, as most only support Python 2.x. M2Crypto and pyOpenSSL use OpenSSL as their backend, while PyCrypto has its own C-language backend (which is a bit worrisome from a security standpoint, since it is not used in other places). The other two use NSS and Botan (not surprising, given their names). All of the libraries expect the Python programmer to have a deep understanding of how the underlying C library works; they also tend not to be Pythonic, and can crash Python easily. The biggest problem may be that they do not offer a "non-foot-gun" interface, so there is no easy way for those not steeped in cryptography and the underlying library to use them without risking a serious security hole in their code.
Those problems with the Python libraries led the team toward creating a new library. With a chuckle, Raim put up the xkcd "How Standards Proliferate" comic and asked if a new library was truly needed. The answer, it turns out, is "yes".
The hope behind the new library is that it will become the cryptographic standard for Python. It will incorporate modern algorithms and properties (e.g. AES-GCM and forward secrecy) that have been "important for a long time", Raim said, but have been ignored in alternative libraries. Cryptography will be thoroughly tested and come with "sane defaults" and a "sane API". It will support Python 2.x, Python 3.x, and PyPy. It will be well-maintained and, unlike most of the alternatives, will not have known broken options, he said.
The library is open to contributions, and not just from crypto experts, Raim said. There are lots of other areas that need attention, including packaging, documentation, API design, and so on. You definitely do not need to be "a crypto master" to help out, he said.
Kehrer then described the structure of the library. To start with, it uses the C foreign function interface (CFFI), which is supported by both CPython and PyPy, to call out to the underlying C cryptographic libraries. CFFI allows easily passing data back and forth between C and Python. It also allows developers to register objects for garbage collection.
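As a rough illustration of what CFFI usage looks like (this is not code from the cryptography project), the sketch below declares a few C library functions, calls one with Python data, and registers a C allocation for garbage collection. It assumes that the cffi package is installed and that the platform is POSIX-like, where ffi.dlopen(None) loads the standard C library.

```python
from cffi import FFI

ffi = FFI()

# Declare the C functions to be called, using ordinary C prototypes.
ffi.cdef("""
    size_t strlen(const char *s);
    void *malloc(size_t size);
    void free(void *ptr);
""")

# Assumption: POSIX-like system, where dlopen(None) gives the C library.
libc = ffi.dlopen(None)

# Python bytes can be passed directly where C expects a const char *.
length = libc.strlen(b"cryptography")

# ffi.gc() ties the C allocation to a Python object: free() is called
# once the returned pointer object is garbage collected.
buf = ffi.gc(libc.malloc(16), libc.free)
```

The same pattern, declaring prototypes and letting the garbage collector run the C cleanup function, is what makes it practical to wrap a large C API without writing a C extension module by hand.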
The cryptography library is made up of three layers. The lowest layer is the "bindings layer" that provides bindings to OpenSSL functions (though there is support for other underlying C libraries). OpenSSL has an unstable API that changes frequently, which "required a lot of ifdefs unfortunately". The pyOpenSSL project has recently switched over to using the bindings layer. That means pyOpenSSL will run on all of the Python versions that cryptography supports and the project will never have to deal with C bindings again. Kehrer suggested that M2Crypto could do the same if that project were interested.
The next layer up is called the "hazmat layer" (hazmat is slang for "hazardous materials"). When using this layer, "shooting yourself in the foot is pretty likely", he said. The project would prefer that people not use the hazmat layer (which is literally accessed via "import cryptography.hazmat"), but it has written a lot of documentation covering each function, cipher, padding mode, and so on for those who do decide to use it.
Ideally, though, users of the library shouldn't have to deal with the low-level details exposed in hazmat. That's because the "recipes layer" is the layer the project prefers people use. The recipes are designed around the idea of a "box/unbox model", where the user sends some data in to be encrypted or decrypted and gets back the data needed. Unfortunately, there is currently only one recipe available, so using the hazmat layer is required "to get anything done", Kehrer said.
The existing recipe is Fernet, which does symmetric encryption. It uses authenticated encryption, so tampering with the ciphertext is detected at decryption time. To use it, the user just generates a key to store (by calling a Fernet function), and passes that key to the encrypt and decrypt routines. Users don't have to know about initialization vectors or nonces (or what the difference is between the two). Ideas for more recipes are one area where the project needs help, he said.
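A minimal sketch of the Fernet recipe, assuming the cryptography package is installed, looks something like the following. The message and variable names are illustrative; the tampering step is added here only to show the authenticated part of the recipe.

```python
from cryptography.fernet import Fernet, InvalidToken

# Generate a key once and store it somewhere safe.
key = Fernet.generate_key()
box = Fernet(key)

# "Box" the data: no IVs, nonces, or padding modes to choose.
token = box.encrypt(b"attack at dawn")

# "Unbox" it again with the same key.
plaintext = box.decrypt(token)

# Change one character in the middle of the token; decryption should
# then fail with InvalidToken instead of returning corrupted data.
i = len(token) // 2
replacement = b"A" if token[i:i + 1] != b"A" else b"B"
tampered = token[:i] + replacement + token[i + 1:]
try:
    box.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    tamper_detected = True
```

The only secret the caller handles is the key; everything else needed for decryption travels inside the token.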
The cryptography module is designed to be "agnostic to the underlying C library", Kehrer said. It also allows multiple backends, so it can "stitch together" OpenSSL with NaCl, for example, to provide support for the ciphers and primitives from Daniel J. Bernstein in addition to those provided in OpenSSL. A backend can implement all of the APIs needed by cryptography or just one, he said, and the library will fall back to choices from other backends.
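The fallback idea can be sketched in a few lines of plain Python. This is a hypothetical illustration of the dispatch pattern only: the class and method names are invented here and are not the library's API, and hashlib stands in for the real C backends.

```python
import hashlib


class UnsupportedAlgorithm(Exception):
    """Raised when no registered backend supports the requested algorithm."""


class Sha256OnlyBackend:
    """Hypothetical backend that implements a single primitive."""

    name = "sha256-only"

    def hash_supported(self, algorithm):
        return algorithm == "sha256"

    def hash(self, algorithm, data):
        return hashlib.sha256(data).digest()


class GeneralBackend:
    """Hypothetical backend that implements many primitives."""

    name = "general"

    def hash_supported(self, algorithm):
        return algorithm in hashlib.algorithms_guaranteed

    def hash(self, algorithm, data):
        return hashlib.new(algorithm, data).digest()


class MultiBackend:
    """Dispatch each request to the first backend that supports it."""

    def __init__(self, backends):
        self._backends = list(backends)

    def select(self, algorithm):
        for backend in self._backends:
            if backend.hash_supported(algorithm):
                return backend
        raise UnsupportedAlgorithm(algorithm)

    def hash(self, algorithm, data):
        return self.select(algorithm).hash(algorithm, data)


backend = MultiBackend([Sha256OnlyBackend(), GeneralBackend()])
digest = backend.hash("sha512", b"abc")  # falls back to GeneralBackend
```

In this sketch the order of the list sets the preference, so a backend that implements only one primitive can sit in front of a more complete one.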
Testing was an important consideration for cryptography, Raim said. Currently, a "single test run" is about 66,000 tests, but that test run is repeated multiple times for different backends, platforms, and so on, so it is actually run 77 times each time a build is done. Builds are done roughly fifteen times a day, so over 500 million tests are done weekly on the code base, he said. That testing is all done using the Travis and Jenkins continuous integration tools.
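The arithmetic behind that weekly figure can be checked quickly. The sketch below takes the quoted numbers at face value and assumes exactly fifteen builds a day, seven days a week, which the talk only gave as a rough rate.

```python
tests_per_run = 66_000   # one "single test run"
runs_per_build = 77      # backends, platforms, and so on
builds_per_day = 15      # assumption: "roughly fifteen" taken as exact
days_per_week = 7        # assumption: builds happen every day

tests_per_build = tests_per_run * runs_per_build   # 5,082,000
tests_per_day = tests_per_build * builds_per_day   # 76,230,000
tests_per_week = tests_per_day * days_per_week     # 533,610,000

print(f"{tests_per_week:,} tests per week")
```

That comes to roughly 534 million tests a week, consistent with the "over 500 million" figure.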
Kehrer said that patches cannot land without tests that cover 100% of the code being added. There is also a review process that normally has three or four people looking at the code before it lands. However, that level of code coverage does not extend to the tests for the C backends.
Raim went through a laundry list of what is currently supported in cryptography (slide #12), which includes most of the ciphers, hash-based message authentication code (HMAC) algorithms, key-derivation functions, one-time password mechanisms, and so on that would be expected. He highlighted the multi-backend and OpenSSL support, Apache v2 license, 30+ contributors, and the wide Python support (2.6, 2.7, 3.2-3.4, and PyPy) of the library.
There is "lots still to do" in the future, Kehrer said. While there is RSA signing/verification support, there is no way to load keys (other than as integers) as yet, so that needs work. Digital signature algorithm (DSA) signing and verification work is needed as well. DSA signing is "unbelievably sensitive" to the quality of the random number generator used (that's how the PlayStation 3 was cracked, he said). Support for X.509 and TLS is on the drawing board as well. Both are available in OpenSSL, but the question is how to expose them in a sane API, he said.
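One well-known way that sensitivity shows up is a per-signature random value that repeats. The sketch below is a toy illustration and was not part of the talk: it uses the textbook DSA signing equations with parameters far too small for real use, and small integers stand in for message hashes. When two signatures share the same random value k, simple modular arithmetic recovers both k and the private key.

```python
# Toy DSA parameters (assumption: illustrative only, trivially breakable).
# q divides p - 1, and g generates the subgroup of order q modulo p.
p, q, g = 23, 11, 4


def inv(a):
    """Modular inverse modulo the prime q, via Fermat's little theorem."""
    return pow(a, q - 2, q)


def sign(x, k, h):
    """Textbook DSA signature of hash value h with private key x, nonce k."""
    r = pow(g, k, p) % q
    s = inv(k) * (h + x * r) % q
    return r, s


x = 7           # private key
k = 3           # per-signature random value, wrongly reused below
h1, h2 = 5, 9   # stand-ins for the hashes of two different messages

r1, s1 = sign(x, k, h1)
r2, s2 = sign(x, k, h2)   # same k, so r2 == r1

# The attacker sees only (h1, r1, s1) and (h2, r2, s2).
# Subtracting the two signing equations eliminates the private key:
#   s1 - s2 = k^-1 * (h1 - h2)  (mod q)
recovered_k = (h1 - h2) * inv(s1 - s2) % q

# With k known, either signing equation gives up the private key:
#   x = (s1 * k - h1) * r^-1  (mod q)
recovered_x = (s1 * recovered_k - h1) * inv(r1) % q
```

A random number generator that is merely predictable, rather than one that repeats outright, leads to the same result, since knowing k for a single signature is enough for the last step.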
While some might question the need for another Python cryptography library, it seems clear that a well-reviewed, maintained library that tries to avoid most of the pitfalls of crypto will be quite useful. There is a lot of momentum behind the cryptography library right now, much of which is coming from its parent project, OpenStack. Riding that momentum to get something for Python, perhaps even as a standard library some day, seems like it will be quite a boon, both for the language and for Python developers.
Video of the talk is also available for those interested.
Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license