Protecting Python package downloads

By Jake Edge
January 14, 2015

Python is looking at ways to protect its users from installing compromised packages from the Python Package Index (PyPI) repository. Currently, packages are downloaded using SSL/TLS encryption, which is enough to ensure package integrity between PyPI and the client, as long as PyPI itself—or some mirror or content-delivery network (CDN) server—is not compromised. But dealing with a compromise of the repository or its mirrors requires another level of security, which is what is now under discussion.

A two-part proposal has been made by three NYU researchers (Vladimir Diaz, Trishank Kuppusamy, and Justin Cappos) with assistance from Python core developer Donald Stufft. The first part takes the form of Python Enhancement Proposal (PEP) 458, which provides a mechanism to sign packages in PyPI using The Update Framework (TUF). It is largely non-controversial, partly because there is no visible impact on users or package developers.

TUF is a project by those same researchers to provide a library that can be used to handle the software-update process securely. It is designed so that the updating client can securely determine that an update is available and download a verified copy of the latest version. The intent is to place most of the work into the library, so that the software-update problem can be easily dealt with for a wide variety of projects. TUF was also mentioned as a possible solution for the problems outlined in our article last week about the state of Docker image verification.

The second piece of the proposal is contained in PEP 480. It would change the workflow for package developers, which is part of why it seems headed for the back burner until the user-interface issues can be fully considered. The problems largely come down to key management—something that is always difficult in cryptographic verification schemes.

The basic idea behind PEP 458 is that the PyPI administrators would attach TUF metadata (which includes signatures) to packages in the repository. The pip installer (which is now shipped with any recent version of Python), as well as other installers, could then be changed to use the library to look at the metadata and verify the information found therein. This would thwart a wide variety of attacks, but still leave PyPI users vulnerable to others, which is what PEP 480 (the "maximum security model") is meant to address.

The changes needed on the client side are not directly addressed in PEP 458, though there has been work done to make pip work with TUF metadata. Those changes are fairly small, largely because the TUF library handles most of the complexity. By far the biggest pieces of the change are files containing various trusted public keys; the actual download function just needed a tuf.interposition.open_url decorator placed on it.
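The appeal of the decorator approach is that verification is slipped in around the download function without touching any of its callers. The following is a minimal sketch of that general pattern. The `verified` decorator, the `TRUSTED` digest table, and the `download` function are all hypothetical; this does not reproduce what tuf.interposition.open_url actually does internally. In TUF, the trusted values would come from signed, verified metadata rather than a hard-coded dictionary.

```python
import functools
import hashlib
from typing import Callable, Dict


class UntrustedDownload(Exception):
    """Raised when a download has no trusted metadata or fails verification."""


def verified(trusted: Dict[str, str]) -> Callable:
    """Decorator factory: check each download against trusted SHA-256 digests.

    trusted maps a file name to the hex digest listed for it in (hypothetical)
    already-verified metadata.
    """
    def decorator(download: Callable[[str], bytes]) -> Callable[[str], bytes]:
        @functools.wraps(download)
        def wrapper(name: str) -> bytes:
            if name not in trusted:
                raise UntrustedDownload(f"{name} is not listed in trusted metadata")
            data = download(name)
            if hashlib.sha256(data).hexdigest() != trusted[name]:
                raise UntrustedDownload(f"{name} does not match trusted metadata")
            return data
        return wrapper
    return decorator


# Hypothetical trusted metadata, for illustration only.
TRUSTED = {"example-1.0.tar.gz": hashlib.sha256(b"package bytes").hexdigest()}


@verified(TRUSTED)
def download(name: str) -> bytes:
    # A real client would fetch the file from PyPI or a mirror here.
    return b"package bytes"
```

Callers keep calling download() as before. A file that is missing from the trusted metadata, or whose contents differ from it, raises an exception instead of being handed to the installer.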

The bigger piece of the puzzle (and most of what is contained in the PEP) is changing PyPI to handle, store, and serve the TUF metadata. That metadata is signed by various kinds of keys that are described in the "PyPI and TUF Metadata" section of the PEP. There is a "root" key that is stored offline (and its public portion is distributed with any update clients like pip) that provides the root of trust. It signs all of the top-level keys.
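Because the root public key ships with the client, everything else can be checked against it. The sketch below shows that first step under several assumptions. The root metadata is a hypothetical JSON document mapping each top-level role to a single key ID, which keeps things simple; the real TUF format is richer. Signature checking is hidden behind a hypothetical `verify` callable, because Python's standard library has no RSA or Ed25519 support.

```python
import json
from typing import Callable, Dict

# Hypothetical signature check: (key_id, payload, signature) -> bool.
# A real client would plug a cryptography library in here.
Verifier = Callable[[str, bytes, bytes], bool]


def load_root(pinned_root_key: str, payload: bytes, signature: bytes,
              verify: Verifier) -> Dict[str, str]:
    """Return root's role -> key ID map, but only if root's signature checks out.

    pinned_root_key identifies the root public key shipped with the client;
    payload is hypothetical JSON root metadata such as
    {"roles": {"targets": "...", "snapshot": "...", "timestamp": "..."}}.
    """
    if not verify(pinned_root_key, payload, signature):
        raise ValueError("root metadata is not signed by the pinned root key")
    # Parse the contents only after the signature has been verified.
    return json.loads(payload)["roles"]


def key_may_sign(roles: Dict[str, str], role: str, key_id: str) -> bool:
    """True if root vouches for key_id as the signing key for role."""
    return roles.get(role) == key_id
```

Metadata for targets, snapshot, or timestamp is then accepted only if it was signed by the key that root lists for that role.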

Another offline key is "targets", which is used to sign the metadata files for the available packages. To allow uploaded packages to be immediately available, the signing ability of the targets key is actually delegated to the online "bins" key. For scalability reasons, that key has its signing authority delegated to up to 1024 subsidiary "bin" keys that are actually used to sign package metadata.
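Spreading packages over many bin keys requires a deterministic way to decide which bin is responsible for a given file. The sketch below assumes that the bin index is taken from the leading bits of the SHA-256 digest of the file's path. The PEP and the TUF specification define the real scheme, and the "bin-N" role names here are hypothetical. The chain_of_trust() helper walks the delegations described above, from a bin key back up to root.

```python
import hashlib

NUM_BINS = 1024  # the upper bound on "bin" keys mentioned above

# Which role delegates signing authority to which (child -> parent).
DELEGATED_BY = {"bins": "targets", "targets": "root"}


def bin_for_target(target_path: str, num_bins: int = NUM_BINS) -> str:
    """Map a file path to one of num_bins hypothetical "bin-N" roles.

    Assumption: the bin index is the leading bits of the path's SHA-256 digest.
    """
    if num_bins < 1 or num_bins > 2**32 or num_bins & (num_bins - 1):
        raise ValueError("num_bins must be a power of two between 1 and 2**32")
    bits = num_bins.bit_length() - 1                  # 1024 -> 10 bits
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    leading = int.from_bytes(digest[:4], "big")       # first 32 bits of digest
    return f"bin-{leading >> (32 - bits)}"


def chain_of_trust(role: str) -> list:
    """Walk from a role up to root, the ultimate source of trust."""
    chain = [role]
    while role != "root":
        role = "bins" if role.startswith("bin-") else DELEGATED_BY[role]
        chain.append(role)
    return chain
```

Hashing the path spreads packages roughly evenly across the bins. As a result, each bin's metadata file stays small, and a client only needs the bins covering the packages it actually installs.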

The package metadata consists of sizes and hashes of each file that a client will actually download. Those values can then be verified on the client side to ensure that the proper files were downloaded. The PEP specifies SHA-256 for the hashing algorithm, but does not recommend a specific digital signature algorithm, though it assumes RSA is used. The PEP says that other algorithms could be substituted since the state of cryptography changes over time.
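On the client, the size and the hash work together. The sketch below shows a hypothetical `read_verified` helper that reads at most the trusted number of bytes while hashing incrementally. It then rejects files that are too short, too long, or have the wrong SHA-256 digest. Capping the read at the trusted length is a design choice that also keeps a hostile server from streaming data indefinitely.

```python
import hashlib
from typing import BinaryIO


class TargetMismatch(Exception):
    """Raised when a downloaded file does not match its trusted metadata."""


def read_verified(stream: BinaryIO, length: int, sha256_hex: str,
                  chunk_size: int = 64 * 1024) -> bytes:
    """Read exactly `length` bytes from stream and check their SHA-256 digest."""
    hasher = hashlib.sha256()
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise TargetMismatch("file is shorter than the trusted length")
        hasher.update(chunk)
        chunks.append(chunk)
        remaining -= len(chunk)
    # Any data beyond the trusted length is also a mismatch.
    if stream.read(1):
        raise TargetMismatch("file is longer than the trusted length")
    if hasher.hexdigest() != sha256_hex:
        raise TargetMismatch("SHA-256 digest does not match")
    return b"".join(chunks)
```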

There are also two other metadata files, each with its own key, that need to be maintained by the repository. The "snapshot" file provides information on the latest version of all the metadata files for each package, which ensures that a client gets a consistent view of the entire repository. Similarly, the "timestamp" file simply provides the latest version number for the snapshot file, so that clients get the latest snapshot even in the presence of multiple simultaneous updaters. Those files are signed with separate keys (named, unsurprisingly, snapshot and timestamp). Those keys are stored online (to allow instant availability of new packages) and are themselves signed by the root key.
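The timestamp and snapshot files let a client refuse both an inconsistent view of the repository and a rollback to older metadata. The sketch below illustrates that logic with hypothetical dictionary shapes (the real TUF file formats differ), and it assumes that signatures have already been checked. The client remembers the versions it has seen and rejects anything that goes backward.

```python
from typing import Dict, List


class MetadataError(Exception):
    """Raised when new metadata is inconsistent or older than what we trust."""


def files_to_refresh(timestamp: Dict, snapshot: Dict,
                     known_snapshot_version: int,
                     known_versions: Dict[str, int]) -> List[str]:
    """Decide which metadata files need downloading, rejecting rollbacks.

    Hypothetical shapes:
      timestamp = {"snapshot_version": 11}
      snapshot  = {"version": 11, "meta": {"bin-3.json": 3, ...}}
    """
    # The timestamp may never point at an older snapshot than one already seen.
    if timestamp["snapshot_version"] < known_snapshot_version:
        raise MetadataError("timestamp points at an older snapshot")
    # The snapshot must be exactly the one the timestamp vouches for.
    if snapshot["version"] != timestamp["snapshot_version"]:
        raise MetadataError("snapshot version does not match timestamp")
    # No individual metadata file may go backward either.
    for name, old_version in known_versions.items():
        new_version = snapshot["meta"].get(name)
        if new_version is not None and new_version < old_version:
            raise MetadataError(f"{name} was rolled back")
    # Only files whose version changed need to be fetched again.
    return sorted(name for name, version in snapshot["meta"].items()
                  if version != known_versions.get(name))
```

Because the timestamp file is tiny, a client can fetch it often and only download the snapshot, and then the changed bin files, when something has actually moved.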

The idea behind all of the different keys is to try to prevent a compromise of one (other than root, obviously) leading to the compromise of all of the different pieces of metadata. The metadata will all expire with some frequency (yearly for root or targets, daily for the others) as a way to reduce the impact of key disclosure. Unless the offline keys are disclosed, the short expiration times of the metadata will limit the window of time in which attacks can take place because the attacker cannot sign new versions without access to the offline keys. The PEP also contains an analysis of the effects of compromising individual keys and combinations of those keys.
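The expiration policy above is simple to express. The sketch below encodes the lifetimes as the article summarizes them and shows the client-side check; the function names are hypothetical, and the PEP specifies the actual values. Any role not listed (snapshot, timestamp, bins, and the individual bin roles) falls back to the daily lifetime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetimes as summarized above: yearly for root and targets,
# daily for all of the other metadata.
YEARLY = timedelta(days=365)
DAILY = timedelta(days=1)
LIFETIMES = {"root": YEARLY, "targets": YEARLY}


def expires_at(role: str, signed_at: datetime) -> datetime:
    """Expiration time for a role's metadata signed at signed_at."""
    return signed_at + LIFETIMES.get(role, DAILY)


def is_expired(expires: datetime, now: Optional[datetime] = None) -> bool:
    """A client should refuse metadata whose expiration time has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires
```

With daily lifetimes on the online metadata, a client that is being fed old, frozen metadata would start rejecting it within a day.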

In fact, the PEP contains a lot of information about TUF and how the authors recommend it be applied to PyPI. Those interested in more details should refer to that document and the TUF specification.

Ideally, packages should be end-to-end signed, so that users can ensure that the same code uploaded by a package developer is what gets installed. That requires developers to have their own keys that can be verified, distributed, revoked, and expired by PyPI. That is the subject of PEP 480, but there are lots of questions about how, exactly, that all might work. In the meantime, though, implementing PEP 458 (the "minimum security model") still protects users against malicious mirrors and CDNs once update clients start incorporating the validation.

The kinds of attacks that can be prevented are those where a compromised repository can cause the client to install malicious code. That includes installing arbitrary code controlled by the attacker or older, known-vulnerable code. TUF also prevents things like the repository specifying a dependency on a malicious or known-vulnerable package or sending a file that is not what was requested by the client.

The biggest complaint about the proposals in the Python distutils mailing list thread is that the two are being discussed at the same time. Overall, PEP 458 and the TUF security model have largely been met with approval, but PEP 480 is another story. As Nick Coghlan put it:

PEP 458 is almost certainly a solid enhancement to PyPI's overall security (assuming we can come up with an acceptable answer for external hosting). It's significantly less clear that PEP 480 is the right answer for delegating more signing authority to developer groups - for example, it may be better to come up with a federated *hosting* model, where external hosting is the answer if developers choose to use their own signing keys rather than PyPI's automated online keys. Things like the Rackspace developer program, or the emerging next generation of Docker-based Platform-as-a-Service offerings, make it easier to recommend such federated hosting models.

As a result, my perspective is that it's the UX [user experience] design concept that will make or break PEP 480 - the security model of TUF looks great to me, what gives me pause is concern over the usability and maintainability of signed uploads for "developers in a hurry".

As Coghlan noted, there is still an unresolved issue with regard to externally hosted packages that are listed at PyPI. There are a number of alternatives listed in PEP 458 to handle those kinds of packages, but one needs to be chosen in coordination with those who host those packages. That particular problem has come up before; we looked at it last May and it is the subject of PEP 470.

The confusion stemming from both PEPs being discussed at once led Stufft to propose putting PEP 480 on the back burner while PEP 458 gets polished and finalized. That was met with multiple "+1" posts as well as agreement from Diaz, the researcher who posted the PEPs and who has been fielding questions and concerns. Working out the end-to-end problem can come later.

Given that TUF has been suggested for Docker and has been prototyped for RubyGems, it would seem to be a solution that numerous projects are interested in. While TUF uses well-studied cryptographic primitives, it is not entirely clear how much vetting by the security and cryptographic communities has been done on the overall framework. Obviously the researchers have looked it over carefully, but one hopes that other, independent security folks have done or will do so as well. As we have seen over the years, it is not just cryptographic primitives that need scrutiny; the algorithms that are built atop them can be vulnerable too.
