The state of crypto in Python

By Jake Edge
April 30, 2014
PyCon 2014

Cryptography is on a lot of minds these days, but even before Heartbleed (which wasn't a crypto botch, but has certainly brought attention to the topic), some folks were already looking at the open-source cryptography landscape. The OpenStack developers were among them; two project members, Jarret Raim and Paul Kehrer, came to PyCon 2014 to report on "The state of crypto in Python". They described the existing Python cryptography libraries, as well as the underlying libraries, before reporting on the development of a new Python library called, somewhat confusingly, "cryptography".

Raim is the cloud security product manager for Rackspace, while Kehrer is a software developer for Rackspace working on the Barbican key management service for OpenStack. He also "unfortunately [does] a lot of crypto" in his spare time. Raim said that it was something of a running joke in the community that crypto libraries are "created by people who make poor life choices".

Scanning the cryptography landscape

There is a need in Barbican and other OpenStack components for cryptography support, so the team looked at the libraries available. OpenStack needs the standard algorithms, in an open-source form. Everything in OpenStack is Apache-v2-licensed, so a crypto library would need to be compatible with that. Also, from a security standpoint, open source means that the code can be audited. The project needs something with Python support, since that is the language OpenStack is written in, but it needs more than just support for Python 2.7. Support for both Python 3 and PyPy is also important. Since OpenStack will build atop whatever library gets chosen, it is looking for something that is well-maintained and can be relied upon for some time to come.

All of the major cryptographic libraries they looked at were written in C (or, in the case of Botan, in C++). That's because C allows low-level control that managed languages can't provide, Kehrer said. One area where that makes a big difference is in avoiding timing attacks, which requires operations that take constant time (and, ideally, constant power). The code in those C libraries is generally good and has been well-reviewed, though maybe not in the area of TLS extensions, he said with a chuckle. In the future, cryptographic libraries written in Rust or Go may become an option, but that is not realistic today.

So they looked at seven different libraries, grading them in six different categories. The libraries were OpenSSL, Network Security Services (NSS), NaCl, Botan, Apple's CommonCrypto, Microsoft's CSP, and Libgcrypt. The criteria were: open source, cross-platform, maintained, ubiquitous, support for the standard crypto algorithms, and FIPS certification.

Neither Raim nor Kehrer was particularly fond of the FIPS versions of the libraries, with both agreeing that "you do not want FIPS". All of the libraries failed one or more of the criteria, except the dominant player: OpenSSL. Its ubiquity gave all the others a lower classification (the slide, #4, used green, yellow, and red to indicate decreasing compliance). NSS had a yellow for ubiquitous, as did NaCl and Libgcrypt, both of which lacked FIPS certification as well (which is evidently important to some OpenStack users). NaCl also lacked support for the standard algorithms.

They then looked at Python libraries that provide cryptography services. Five libraries were examined: M2Crypto, PyCrypto, pyOpenSSL, python-nss, and Botan's Python bindings; five criteria were evaluated as well: which C backend, how well maintained, Python support, reviewed, and completeness. All failed the "reviewed" category, as independent review is expensive, but it is also not critical since the actual crypto is not being done in Python "if you are doing it right", Raim said.

The breadth of Python support is one criterion where all of the libraries ran aground, as most only support Python 2.x. M2Crypto and pyOpenSSL use OpenSSL as their backend, while PyCrypto has its own C-language backend (which is a bit worrisome from a security standpoint, since it is not used in other places). The other two use NSS and Botan (not surprising, given their names). All of the libraries expect the Python programmer to have a deep understanding of how the underlying C library works; they also tend not to be Pythonic, and can crash Python easily. The biggest problem may be that they do not offer a "non-foot-gun" interface, so there is no easy way for those not steeped in cryptography and the underlying library to use them without risking a serious security hole in their code.

A library is born

Those problems with the Python libraries led the team toward creating a new library. With a chuckle, Raim put up the xkcd "How Standards Proliferate" comic and asked if a new library was truly needed. The answer, it turns out, is "yes".

The hope behind the new library is that it will become the cryptographic standard for Python. It will incorporate modern algorithms (e.g. AES-GCM and forward security) that have been "important for a long time", Raim said, but have been ignored in alternative libraries. Cryptography will be thoroughly tested and come with "sane defaults" and a "sane API". It will support Python 2.x, Python 3.x, and PyPy. It will be well-maintained and, unlike most of the alternatives, will not have known broken options, he said.

The library is open to contributions, and not just from crypto experts, Raim said. There are lots of other areas that need attention, including packaging, documentation, API design, and so on. You definitely do not need to be "a crypto master" to help out, he said.

Kehrer then described the structure of the library. To start with, it uses the C foreign function interface (CFFI), which is supported by both CPython and PyPy, to call out to the underlying C cryptographic libraries. CFFI allows easily passing data back and forth between C and Python. It also allows developers to register objects for garbage collection.
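
As a rough illustration of what calling C through CFFI looks like, the sketch below binds three C standard library functions in CFFI's ABI mode. It is not taken from the cryptography source; it assumes the cffi package is installed and a POSIX system where ffi.dlopen(None) exposes the C library, and c_strlen and alloc_buffer are hypothetical helper names.

```python
from cffi import FFI

ffi = FFI()
# Declare the C functions we want to call, using their C prototypes.
ffi.cdef("""
    size_t strlen(const char *s);
    void *malloc(size_t size);
    void free(void *ptr);
""")
# Assumption: POSIX system; dlopen(None) gives access to the standard C library.
libc = ffi.dlopen(None)


def c_strlen(data: bytes) -> int:
    """Pass Python bytes to C and get a Python int back."""
    return libc.strlen(data)


def alloc_buffer(size: int):
    """Allocate C memory that is freed when the Python object is collected."""
    raw = libc.malloc(size)
    if raw == ffi.NULL:
        raise MemoryError("malloc failed")
    # ffi.gc() registers free() as the destructor for this pointer.
    return ffi.gc(raw, libc.free)


print(c_strlen(b"cryptography"))
buf = alloc_buffer(32)  # released automatically once buf is garbage collected
```

The ffi.gc() call is the garbage-collection registration mentioned above: the C memory is released when the returned Python object goes away, so callers do not have to remember to call free() themselves.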

The cryptography library is made up of three layers. The lowest layer is the "bindings layer" that provides bindings to OpenSSL functions (though there is support for other underlying C libraries). OpenSSL has an unstable API that changes frequently, which "required a lot of ifdefs unfortunately". PyOpenSSL has recently switched over to using the bindings layer. That means pyOpenSSL will run on all of the Python versions that cryptography supports and the project will never have to deal with C bindings again. Kehrer suggested that M2Crypto could do the same if that project was interested.

The next layer up is called the "hazmat layer" (hazmat is slang for "hazardous materials"). When using this layer, "shooting yourself in the foot is pretty likely", he said. The project would prefer that people not use the hazmat layer (which is literally accessed via "import cryptography.hazmat"), but it has written a lot of documentation covering each function, cipher, padding mode, and so on for those who do decide to use it.

Ideally, though, users of the library shouldn't have to deal with the low-level details exposed in hazmat. That's because the "recipes layer" is the layer the project prefers people use. The recipes are designed around the idea of a "box/unbox model", where the user sends some data in to be encrypted or decrypted and gets back the data needed. Unfortunately, there is currently only one recipe available, so using the hazmat layer is required "to get anything done", Kehrer said.

The existing recipe is Fernet, which does symmetric encryption. It uses authenticated encryption with associated data (AEAD). To use it, the user just generates a key to store (by calling a Fernet function), and passes that key to the encrypt and decrypt routines. Users don't have to know about initialization vectors or nonces (or what the difference is between the two). Ideas for more recipes are one area where the project needs help, he said.
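
A minimal sketch of that box/unbox usage, based on the library's documented Fernet interface, might look like the following. It assumes the cryptography package is installed; the message and variable names are illustrative only.

```python
from cryptography.fernet import Fernet, InvalidToken

# Generate a key once and store it somewhere safe; anyone holding it can
# both decrypt existing tokens and create new ones.
key = Fernet.generate_key()
box = Fernet(key)

token = box.encrypt(b"attack at dawn")  # "box": returns an opaque token
plaintext = box.decrypt(token)          # "unbox": returns the original bytes

# A token presented with the wrong key (or a tampered token) raises
# InvalidToken rather than returning garbage.
try:
    Fernet(Fernet.generate_key()).decrypt(token)
    wrong_key_rejected = False
except InvalidToken:
    wrong_key_rejected = True

print(plaintext, wrong_key_rejected)
```

Note that nothing in the calling code mentions a cipher, mode, initialization vector, or padding; the only decision left to the caller is where to keep the key.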

The cryptography module is designed to be "agnostic to the underlying C library", Kehrer said. It also allows multiple backends, so it can "stitch together" OpenSSL with NaCl, for example, to provide support for the ciphers and primitives from Daniel J. Bernstein in addition to those provided in OpenSSL. A backend can implement all of the APIs needed by cryptography or just one, he said, and the library will fall back to choices from other backends.
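
The fallback behavior can be pictured with a toy dispatcher like the one below. This is only a sketch of the idea, not the library's implementation: ToyBackend, ToyMultiBackend, and NoBackendError are hypothetical names, and hashlib functions stand in for the C libraries.

```python
import hashlib


class NoBackendError(Exception):
    """Raised when no registered backend implements the requested algorithm."""


class ToyBackend:
    """Hypothetical backend: a name plus the algorithms it can provide."""

    def __init__(self, name, algorithms):
        self.name = name
        self.algorithms = dict(algorithms)  # algorithm name -> callable

    def supports(self, algorithm):
        return algorithm in self.algorithms


class ToyMultiBackend:
    """Try each backend in order; use the first one that supports the call."""

    def __init__(self, backends):
        self.backends = list(backends)

    def run(self, algorithm, data):
        for backend in self.backends:
            if backend.supports(algorithm):
                return backend.name, backend.algorithms[algorithm](data)
        raise NoBackendError(algorithm)


# Stand-ins for two C libraries with overlapping feature sets.
openssl_like = ToyBackend("openssl-like", {
    "sha256": lambda d: hashlib.sha256(d).hexdigest(),
})
nacl_like = ToyBackend("nacl-like", {
    "sha256": lambda d: hashlib.sha256(d).hexdigest(),
    "blake2b": lambda d: hashlib.blake2b(d).hexdigest(),
})

multi = ToyMultiBackend([openssl_like, nacl_like])
print(multi.run("sha256", b"data")[0])   # served by the first backend
print(multi.run("blake2b", b"data")[0])  # falls back to the second backend
```

A backend that implements only one primitive is still useful in this arrangement, since the dispatcher consults it only for the calls it can answer.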

Testing was an important consideration for cryptography, Raim said. Currently, a "single test run" is about 66,000 tests, but that test run is repeated multiple times for different backends, platforms, and so on, so it is actually run 77 times each time a build is done. Builds are done roughly fifteen times a day, so over 500 million tests are done weekly on the code base, he said. That testing is all done using the Travis and Jenkins continuous integration tools.
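
The weekly figure follows directly from the numbers quoted in the talk, as the short calculation below shows. The only assumption added here is that builds happen on all seven days of the week.

```python
TESTS_PER_RUN = 66_000   # one "single test run"
RUNS_PER_BUILD = 77      # repeated across backends, platforms, and so on
BUILDS_PER_DAY = 15      # "roughly fifteen times a day"
DAYS_PER_WEEK = 7        # assumption: builds run every day

tests_per_build = TESTS_PER_RUN * RUNS_PER_BUILD
tests_per_week = tests_per_build * BUILDS_PER_DAY * DAYS_PER_WEEK

print(f"{tests_per_build:,} tests per build")  # 5,082,000
print(f"{tests_per_week:,} tests per week")    # 533,610,000
```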

Kehrer said that patches cannot land without tests that cover 100% of the code being added. There is also a review process that normally has three or four people looking at the code before it lands. However, that level of code coverage does not extend to the tests for the C backends.

Raim went through a laundry list of what is currently supported in cryptography (slide #12), which includes most of the ciphers, hash-based message authentication code (HMAC) algorithms, key-derivation functions, one-time password mechanisms, and so on that would be expected. He highlighted the multi-backend and OpenSSL support, Apache v2 license, 30+ contributors, and the wide Python support (2.6, 2.7, 3.2-3.4, and PyPy) of the library.

There is "lots still to do" in the future, Kehrer said. While there is RSA signing/verification support, there is no way to load keys (other than as integers) as yet, so that needs work. Digital signature algorithm (DSA) signing and verification work is needed as well. DSA signing is "unbelievably sensitive" to the quality of the random number generator used (that's how the PlayStation 3 was cracked, he said). Support for X.509 and TLS is on the drawing board as well. Both are available in OpenSSL, but the question is how to expose them in a sane API, he said.
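
To see why signing is so sensitive to the random number generator, consider what happens if the per-signature secret value k is ever used twice: two signatures are then enough to solve for the private key. The toy sketch below uses deliberately tiny, insecure DSA parameters (p = 23, q = 11, g = 4) and made-up hash values, all chosen here purely for illustration. The PlayStation 3 break involved the elliptic-curve variant of the algorithm, but the algebra is the same. It needs Python 3.8 or later for the modular inverse via pow(x, -1, q).

```python
# Toy DSA parameters (insecure): Q divides P - 1 and G has order Q modulo P.
P, Q, G = 23, 11, 4


def sign(h, x, k):
    """Textbook DSA signature of hash value h with private key x and nonce k."""
    r = pow(G, k, P) % Q
    s = (pow(k, -1, Q) * (h + x * r)) % Q
    return r, s


def recover_private_key(h1, sig1, h2, sig2):
    """Recover (k, x) from two signatures that reused the same nonce k."""
    (r, s1), (_, s2) = sig1, sig2
    # s1 - s2 = k^-1 * (h1 - h2)  (mod Q), so k = (h1 - h2) / (s1 - s2).
    k = ((h1 - h2) * pow(s1 - s2, -1, Q)) % Q
    # s1 = k^-1 * (h1 + x * r)    (mod Q), so x = (s1 * k - h1) / r.
    x = ((s1 * k - h1) * pow(r, -1, Q)) % Q
    return k, x


private_key, reused_nonce = 7, 3
sig_a = sign(5, private_key, reused_nonce)  # hash value 5 is made up
sig_b = sign(9, private_key, reused_nonce)  # hash value 9 is made up
print(recover_private_key(5, sig_a, 9, sig_b))  # (3, 7): nonce and private key
```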

While some might question the need for another Python cryptography library, it seems clear that a well-reviewed, maintained library that tries to avoid most of the pitfalls of crypto will be quite useful. There is a lot of momentum behind the cryptography library right now, much of which is coming from its parent project, OpenStack. Riding that momentum to get something for Python, perhaps even as a standard library some day, seems like it will be quite a boon, both for the language and for Python developers.

Video of the talk is also available for those interested.



The state of crypto in Python

Posted May 1, 2014 16:18 UTC (Thu) by nmav (subscriber, #34036) [Link]

Despite what their slides show, libgcrypt is FIPS140-2 certified. The only certain thing is that we don't need yet another crypto primitives library.

The state of crypto in Python

Posted May 1, 2014 18:34 UTC (Thu) by Wummel (subscriber, #7591) [Link]

I'm not sure a new library is needed either. What I sorely miss is a complete real-world example showing how to use any of those libraries in a secure manner. Especially for libraries that claim to be easy to use, this provokes insecure usage.

If a library claims to be easy to use it attracts programmers without a strong security background who most likely copy-and-paste from the examples, perhaps with one or two peeks at the documentation.
Here are some things I encountered when looking at examples of the mentioned libraries:
  • The example ignores all errors and therefore does not show how errors should be handled. This is especially true for languages with exception handling (C++, Python).
  • The example does not show how to generate, store or transfer generated keys securely. E.g. the Botan example code reads the password from a command line parameter, which is not a secure way to do it.
  • The example has a short "this is a secret" string to encrypt. Why not show how to encrypt a complete file and store the encrypted data? I am always wondering how potentially large files can/should be encrypted.

The state of crypto in Python

Posted May 2, 2014 17:28 UTC (Fri) by nmav (subscriber, #34036) [Link]

This is pretty much true. We have quite a few crypto primitives libraries (libgcrypt, botan, nettle, libcrypto/openssl), and their authors are pretty much overwhelmed by the amount of maintenance needed. Helping them to improve the documentation, or the weak spots of their libraries, would benefit all. Creating yet another library would most probably result in the current situation, only with one more library added to the list above.

The state of crypto in Python

Posted May 8, 2014 19:56 UTC (Thu) by jraim (guest, #97013) [Link]

Thanks for the data! Sorry we missed that one. I updated our slides so we get it right in the future.

It is important to note that we aren't building another primitives library. The cryptography project uses the existing primitive libraries to fulfill requests; we are only developing a common Python front end for those libraries.

The state of crypto in Python

Posted May 9, 2014 13:28 UTC (Fri) by mirabilos (subscriber, #84359) [Link]

The slides HTTP 503 at the moment, though.

The state of crypto in Python

Posted May 4, 2014 5:45 UTC (Sun) by dlang (subscriber, #313) [Link]

One factor I didn't see mentioned here is the ability to make use of hardware crypto modules if they are available.

Some CPUs include AES in hardware (close to memory-bandwidth AES processing), and there are a bunch of different public-key crypto accelerator cards and devices out there (some with FIPS certification for those that need it). I've seen SCSI-attached devices, internal cards, and network devices.

How do these different software options stack up in terms of being able to make use of such capabilities?

The state of crypto in Python

Posted May 8, 2014 19:58 UTC (Thu) by jraim (guest, #97013) [Link]

Many of the C backends that cryptography uses are capable of using hardware acceleration, OpenSSL among them. Assuming the library is compiled correctly, those features should just be used.

As to HSMs, those can be used through PKCS interfaces (something we need anyway for NSS support), so those should be usable as well.

The state of crypto in Python

Posted May 5, 2014 16:31 UTC (Mon) by Baylink (guest, #755) [Link]

> The biggest problem may be that they do not offer a "non-foot-gun" interface, so there is no easy way for those not steeped in cryptography and the underlying library to use them without risking a serious security hole in their code.

Um... Is it not the case that you *cannot* safely implement crypto in an app without a fairly deep understanding of how that crypto works?

Cause that'd be my bet...

The state of crypto in Python

Posted Jul 15, 2014 17:06 UTC (Tue) by zimmerfrei (guest, #97883) [Link]

> One area where that makes a big difference is in avoiding timing attacks, which requires operations that take constant time (and, ideally, constant power). The code in those C libraries is generally good and has been well-reviewed

The idea that a C library is inherently more robust w.r.t. timing attacks is flawed and probably gives a false sense of security. You still depend a lot on the behaviour of the compiler (e.g. the optimizations it does), the C library, the OS (e.g. thread interactions), and the underlying hardware architecture (e.g. cache, etc). Those are all factors beyond the reach of the library developer.

Besides that, I am not aware of any evidence that common FOSS C libraries (e.g. openssl, polarssl) have really been *fully* vetted as being robust against side-channel attacks. Openssl and gcrypt have passed the FIPS certifications, but that says nothing. Most libraries do have countermeasures, but typically only for the most elementary attacks (for instance, I see very little against fault attacks), and in most cases there is nothing stopping you from implementing them in higher-level languages too.

Finally, I think it is about time the FOSS community realizes that C is no longer suitable for writing system security infrastructure: it is really hard to write good C code and it is even more difficult to manually review it. We should move as much as possible of the implementations to high-level languages that are easier to review and test *and* that incorporate mechanisms against the most common security bugs (e.g. buffer overflows). Python in this case, although Rust would be by far my preference right now.

The state of crypto in Python

Posted Jul 15, 2014 18:09 UTC (Tue) by zlynx (subscriber, #2285) [Link]

First you talk about side channel attacks and compiler optimizations which might disable coded defenses.

Then you recommend writing in higher level languages? What sense does that make?

What I gather from the information you provided is that we should be writing security critical code in assembly.

The state of crypto in Python

Posted Jul 15, 2014 19:23 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Rust compiles to assembly you can review, not bytecode. It may be higher-level, but it isn't not low-level. Writing things in Python or other interpreted languages (and probably any language with a GC and no support for custom allocators as well) to avoid side-channel or timing attacks does make less sense, though.

The state of crypto in Python

Posted Jul 17, 2014 17:23 UTC (Thu) by zimmerfrei (guest, #97883) [Link]

> First you talk about side channel attacks and compiler optimizations which might disable coded defenses. Then you recommend writing in higher level languages? What sense does that make?

There are actually two concerns at play here. Let me reverse the order of my ramblings.

First, it is a fact that the large majority of incidents are due to the extensive usage of C in the implementation of security protocols. Writing secure C code is hard. Reviewing C code is even harder. It therefore makes sense to use as much as possible high-level languages, because you can benefit from all strict controls enforced by the language itself. You are also relieved from low-level tasks like memory management. Finally, you have fewer and more readable LOCs.

Second, though a numerical minority, side-channel attacks are still an important class that should not be overlooked. My claim though is that simply saying “I use C and therefore I am protected from side-channel attacks” is wrong (as it happens at 33:30 in the video). C is not such a fairy. It gives you a little more control over some types of information that may leak secrets (e.g. memory alignment), but you still have to deal with the optimization strategies that existing C compilers are capable of. BTW, the average C compiler is much more aggressive than your average bytecode compiler.

For instance, one very common timing attack concerns the comparison of two MAC tags, as explained here: http://emerose.com/timing-attacks-explained

A piece of C code that handles comparison in a somewhat time-independent way is:

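#include <stddef.h>  /* size_t */

#define TRUE 1
#define FALSE 0

/* Returns TRUE if the two buffers of length size are equal, FALSE otherwise. */
int util_cmp_const(const void *a, const void *b, const size_t size)
{
    const unsigned char *_a = (const unsigned char *) a;
    const unsigned char *_b = (const unsigned char *) b;
    unsigned char result = 0;
    size_t i;

    /* OR together the XOR of every byte pair; never exit early. */
    for (i = 0; i < size; i++) {
        result |= _a[i] ^ _b[i];
    }

    return (result ? FALSE : TRUE);
}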

Is that secure? Most probably, but you cannot exclude that some compiler will still outsmart you by leaving the loop as soon as the variable “result” becomes non-zero, because it sees you are only using it as the condition in the ternary operator. On the other hand, the same logic can be implemented in Ruby or Java or Python too, and it will be no more or less secure (see also moment 36:00 in the video, where the fact that C has short circuits too is totally neglected).
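
For comparison, the same accumulate-the-XOR logic might be written in Python roughly as follows. This is a sketch with a hypothetical name, cmp_const, and the interpreter gives no timing guarantee for it; the standard library's hmac.compare_digest (available since Python 2.7.7 and 3.3) is the function intended for this job in practice.

```python
import hmac


def cmp_const(a: bytes, b: bytes) -> bool:
    """Compare two byte strings without exiting early on the first mismatch."""
    if len(a) != len(b):
        return False  # the length is not treated as secret in this sketch
    result = 0
    for x, y in zip(a, b):
        result |= x ^ y  # accumulate differences over the whole input
    return result == 0


tag = b"\x01\x02\x03\x04"
print(cmp_const(tag, b"\x01\x02\x03\x04"))            # hand-written version
print(hmac.compare_digest(tag, b"\x01\x02\x03\x05"))  # standard library version
```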

Will assembly be more secure for that? Most probably, but you still cannot be 100% sure without measuring for real, because most CPU instructions are not executed in fixed time (it is not 1990 anymore). Even after measuring, you cannot be sure that different CPU architectures will behave in the same way.

Bottom line, I just think that:
1) A security stack should be implemented in a high-level language to the greatest possible extent (e.g. X.509 parsing; it is insane to do it in C) unless performance really bites you, in which case you can fall back to C.
2) You should be very very very wary of claiming that your implementation is robust against side-channel attacks, no matter which language you use (Ruby, C, Rust, and even assembly).

The state of crypto in Python

Posted Jul 17, 2014 17:35 UTC (Thu) by raven667 (subscriber, #5198) [Link]

The synthesis of all this seems to be that there are too many abstraction layers fighting against you to make any guarantees when it comes to side channels; you will never know enough about the system to reliably engineer around them all. You can do your best to make the attacker work hard, but there are no iron-clad guarantees.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds