Signing and distributing Gentoo
The compromise of Gentoo's GitHub mirror was certainly embarrassing, but its overall impact on Gentoo users was likely fairly limited. Gentoo and GitHub responded quickly and forcefully to the breach, which greatly limited the damage that could be done; the fact that it was a mirror and not the master copy of Gentoo's repositories made it relatively straightforward to recover from. But the black eye that it gave the project has led some to consider ways to make it even harder for an attacker to add malicious content to Gentoo, even if the distribution's own infrastructure were to be compromised.
Unlike other distributions, Gentoo is focused on each user building the software packages they want using the Portage software-management tool. This is done by using the emerge tool, which is the usual interface to Portage. Software "packages" are stored as ebuilds, which are sets of files that contain the information and code needed by Portage to build the software. The GitHub compromise altered the ebuilds for three packages to add malicious content so that users who pulled from those repositories would get it.
Ebuilds are stored in the /usr/portage directory on each system. That local repository is updated using emerge --sync (which uses rsync under the hood), either from Gentoo's infrastructure or one of its mirrors. Alternatively, users can use emerge-webrsync to get snapshots of the Gentoo repository, which are updated daily. Snapshots are individually signed by the Gentoo infrastructure OpenPGP keys, while the /usr/portage tree is signed by way of Manifest files that list the hash of each file in a directory. The top-level Manifest is signed by the infrastructure team, so following and verifying the chain of hashes down to a particular file (while also making sure there are no unlisted files) ensures that the right files are present in the tree.
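The chain-of-hashes check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the line format (`TYPE path size SHA512 digest`) is a simplification of what real Manifest files contain, the names `verify_tree` and `manifest_line` are hypothetical, and the OpenPGP check of the top-level Manifest is assumed to have happened separately. It is not Portage's implementation.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path) -> str:
    """Return the hex SHA-512 digest of a file's contents."""
    return hashlib.sha512(path.read_bytes()).hexdigest()


def manifest_line(kind: str, base: Path, rel_path: str) -> str:
    """Build one entry in the simplified format assumed by this sketch."""
    target = base / rel_path
    return f"{kind} {rel_path} {target.stat().st_size} SHA512 {sha512_of(target)}"


def verify_tree(directory: Path) -> None:
    """Verify `directory` against its Manifest, following sub-Manifests.

    Raises ValueError for a missing, modified, or unlisted file.
    """
    manifest = directory / "Manifest"
    covered = {manifest}   # files accounted for at this level
    delegated = []         # subdirectories checked by their own Manifest
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        kind, rel_path, size, algo, digest = line.split()
        if algo != "SHA512":
            raise ValueError(f"unsupported hash in {manifest}: {algo}")
        target = directory / rel_path
        if not target.is_file():
            raise ValueError(f"missing file: {target}")
        if target.stat().st_size != int(size) or sha512_of(target) != digest:
            raise ValueError(f"hash mismatch: {target}")
        covered.add(target)
        # A MANIFEST entry hands its directory over to the next link in the chain.
        if kind == "MANIFEST" and target.parent != directory:
            verify_tree(target.parent)
            delegated.append(target.parent)
    # Reject anything on disk that no Manifest in the chain vouches for.
    for path in directory.rglob("*"):
        if not path.is_file() or path in covered:
            continue
        if any(sub in path.parents for sub in delegated):
            continue
        raise ValueError(f"unlisted file: {path}")
```

Because each Manifest's own hash is recorded one level up, trusting the signature on the top-level file is enough to extend trust to every listed file below it.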
Another mechanism to get a Portage tree is to clone a Git repository that contains one. These Git mirrors (such as the one at GitHub) can be used to create a local /usr/portage tree by doing an emerge --sync while pointing at the clone as the Portage source. Finally, there is also the canonical Portage tree Git repository, which is somewhat less convenient to use, since it does not have everything that is needed. It needs some data repositories and for the Portage cache to be updated; those things are handled by the infrastructure team for the Git mirrors. On the other hand, all commits to the canonical tree are signed by Gentoo developers directly, so the infrastructure keys need not be trusted.
Trustless
Jason A. Donenfeld posted an idea for a "trustless infrastructure" to the gentoo-dev mailing list on July 2. The core of his suggestion is that, instead of having the Gentoo infrastructure team sign the Portage tree that the distribution provides, developers of the ebuilds would sign them directly. That way, if the infrastructure was compromised, there would be no signing keys available to be abused.
His proposal is that every file in an ebuild would be signed by the developer responsible, so that each file would have a corresponding .asc file that would be distributed with the tree as usual. He also suggested that files not end up in /usr/portage until they have had their signatures verified; instead, they should be copied into a shadow directory to do the verification, then put into /usr/portage if it succeeds. A keyring of the public keys of Gentoo developers would be created and disseminated; eventually, the corresponding private keys would hopefully be stored by the developers on some kind of hardware token.
- Signatures are made by developers, not by infra.
- Portage doesn't see any files that haven't yet been verified.
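The shadow-directory idea can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the proposal did not specify an implementation, and `gpg_verify` simply shells out to `gpg --verify` on the assumption that the developer keyring has already been imported. The verifier is passed in as a parameter so that the staging logic can be exercised without GnuPG.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def gpg_verify(path: Path, signature: Path) -> bool:
    """Check a detached signature with the gpg CLI.

    Assumes the developers' public keys are already in the default keyring.
    """
    result = subprocess.run(
        ["gpg", "--verify", str(signature), str(path)],
        capture_output=True,
    )
    return result.returncode == 0


def promote_if_verified(
    shadow: Path,
    live: Path,
    verify: Callable[[Path, Path], bool] = gpg_verify,
) -> None:
    """Copy files from the shadow directory into the live tree only if
    every one of them has a valid detached .asc signature."""
    files = [
        p for p in sorted(shadow.rglob("*"))
        if p.is_file() and p.suffix != ".asc"
    ]
    # First pass: verify everything; the live tree is not touched yet.
    for path in files:
        signature = path.with_name(path.name + ".asc")
        if not signature.is_file() or not verify(path, signature):
            raise ValueError(f"unverified file, nothing promoted: {path}")
    # Second pass: every file passed, so the files and signatures move over.
    for path in files:
        signature = path.with_name(path.name + ".asc")
        dest = live / path.relative_to(shadow)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        shutil.copy2(signature, dest.with_name(dest.name + ".asc"))
```

Verifying everything before copying anything is what keeps Portage from ever seeing an unverified file; a single bad signature leaves the live tree exactly as it was.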
The reaction to the proposal was somewhat mixed but generally on the negative side. Rich Freeman pointed out that a change of this sort would require a flag day of sorts; it could not easily be added slowly and "grow organically". But he also noted that using the existing Git signatures would provide much of what Donenfeld is looking for. Freeman also thinks that syncing using Git, rather than rsync, should be considered:
Donenfeld's first reply is a bit dismissive; it complains about the length of Freeman's reply, for example, which is not much larger than the proposal itself. Similarly, when Michał Górny asked about how the keyring would be distributed and protected, Donenfeld's reply was terse: "Same model as Arch." He did eventually elaborate on that somewhat, but it did not convince Górny:
Others also poked holes in the proposal, mostly with regard to key management. Hanno Böck posted a number of questions on key and signature management, particularly with regard to expired, revoked, and newly untrusted keys. Is there some kind of re-signing process that would have to be done? How would that be handled? He concluded:
Kristian Fiskerstrand was more pointed: "I'll say it, it is unworkable". He said that there was always going to be a need for some centralized keys to ensure the integrity of the repositories.
Ulrich Mueller also said that Donenfeld's proposal was unworkable because it would violate the Gentoo Package Manager Specification: "we cannot change that retroactively, because it would break existing implementations". Furthermore, Mueller wondered whether adding another 100,000 files to the tree made sense; it would result in 400MB of extra space on a 4KB-block filesystem, he said.
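Mueller's estimate follows from simple arithmetic, which can be checked with a few lines of Python. The sketch assumes that each signature file is smaller than one block and that the filesystem stores every file in at least one whole block (no packing of small files); the function name is made up for illustration.

```python
BLOCK_SIZE = 4 * 1024   # bytes per filesystem block (the 4KB from the estimate)
EXTRA_FILES = 100_000   # additional signature files, per the estimate


def min_disk_usage(n_files: int, block_size: int) -> int:
    """Lower bound on space used when each file occupies one whole block."""
    return n_files * block_size


total = min_disk_usage(EXTRA_FILES, BLOCK_SIZE)
print(total)              # 409600000 bytes
print(total / 10**6)      # 409.6 (decimal megabytes)
print(total / 2**20)      # 390.625 (binary mebibytes)
```

Either way of counting lands on roughly 400MB, even though the signatures themselves would hold far less data than that.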
Overall, it doesn't seem like the proposal is going anywhere, though there are elements of it that are attractive. In particular, removing the infrastructure-key bottleneck and, thus, danger from a compromise of those keys (and/or repositories) is of interest, but there is a lot of work to be done to get there. And, as always, key management is a difficult problem to solve.
Git versus rsync
In a related thread, William Hubbs picked up on Freeman's thinking and asked why Gentoo still relied on rsync rather than using Git directly. It comes down to a number of factors that Freeman summarized. Currently, doing an emerge --sync from a Git clone will leave the tree in a corrupted state if it doesn't verify. Also, rsync is more bandwidth efficient for less-frequent updates; it is not clear where the crossover point is, but he guessed Git would be more efficient if updates were done more often than weekly. There are more rsync mirrors, as well, though he is not sure that makes much of a difference in practice.
Beyond that, Freeman noted that Git history makes for more disk-space usage. He personally uses Git, and others would like to do so, but the disk-space issue makes that harder. Matt Turner said that he has set aside a 1GB partition for the tree, which works fine for the roughly 600MB needed by rsync, but not for Git. A shallow clone of the Git repository is roughly the same (around 660MB), but each pull adds to that, so without some kind of "auto-trimming", Git will grow quickly, Freeman said.
All of the key-management issues are still present for the Git tree, as well. Even though the commits are signed by the developers, those keys need to be distributed and managed over time.
The GitHub mirror compromise has clearly led to some thinking (and rethinking) within the project about its practices and how they might be improved. It is not clear that there are any real conclusions that have been reached, much less plans made, but considering the various parts of the problem is certainly to the good. One concrete thing that has come out of this incident is a Portage security page on the Gentoo wiki. It explains how to "dispel doubts regarding the security of the portage tree on my system". There are sections for each of the four ways to keep a Portage tree updated that show what needs to be trusted for each (e.g. keys, web of trust, good security practices) and how to test to ensure the integrity of the Portage tree.
Signing and distributing Gentoo
Posted Jul 11, 2018 23:35 UTC (Wed) by Antone87 (guest, #125195)
Signing and distributing Gentoo
Posted Jul 12, 2018 2:00 UTC (Thu) by vivo (subscriber, #48315)
By the way, both Gentoo and Arch are much more than signing a package; having something in common would be a pleasing move that can spare resources for other tasks.
But not this time; signing every ebuild would indeed be too resource-intensive, and Manifest files are already present and can be used for that.
Signing and distributing Gentoo
Posted Jul 12, 2018 16:02 UTC (Thu) by NightMonkey (subscriber, #23051)
Git vs Rsync
Posted Jul 12, 2018 10:18 UTC (Thu) by moltonel (subscriber, #45207)
I'm experiencing this first-hand right now: after 45min of failed attempts, I still didn't manage to update `/usr/portage` at all. With rsync (which I was using to `emerge --sync` until very recently), it doesn't throw away partial downloads between attempts, and eventually succeeds.
Git vs Rsync
Posted Jul 12, 2018 11:01 UTC (Thu) by epa (subscriber, #39769)
Git vs Rsync
Posted Jul 12, 2018 13:28 UTC (Thu) by grawity (subscriber, #80596)
AFAIK, that was initially disallowed as a cheap way to prevent someone from fetching unreferenced objects (e.g. after you push something sensitive, then force-push to undo it, but before you can run a garbage collection on the server).
But it'll be possible eventually (I'm guessing in 2.19 or 2.20); grep the commit log of git.git for "promisor objects".
Git vs Rsync
Posted Jul 16, 2018 7:39 UTC (Mon) by epa (subscriber, #39769)
Git vs Rsync
Posted Jul 16, 2018 20:04 UTC (Mon) by johill (subscriber, #25196)
Git vs Rsync
Posted Jul 17, 2018 11:26 UTC (Tue) by jond (subscriber, #37669)
Signing and distributing Gentoo
Posted Jul 12, 2018 14:19 UTC (Thu) by smitty_one_each (subscriber, #28989)
Clearly, this is a job for blockchain.
[/snark]
Signing and distributing Gentoo
Posted Jul 12, 2018 18:19 UTC (Thu) by flussence (guest, #85566)
So the existing rsync signature mechanism adds about 20k files, and looks like this:
* /Manifest: 1 file, has hashes of /Manifest.files.gz, is the only signed file in the tree
* /Manifest.files.gz: 1 file, contains the hashes of per-category Manifest.gz files, and anything else not covered
* /$category/Manifest.gz: 168 files, contains hashes of $category/metadata.xml and per-package Manifest files
* /$category/$package-name/Manifest: 19497 files, containing everything to do with the package
I'll propose a possible alternative structure here: let package owners add their own list(s) of files somewhere suitably granular, say /$category/$owner.packages.asc, and the appropriate owner gets looked up via a package's metadata.xml. The /$category/Manifest.gz would be changed to list metadata.xml files only, that is, the repo-wide signing key only asserts who's allowed to edit each package, for an extra layer of safety.
That would still be a lot of files, but possibly two orders of magnitude less than the original proposal, and also means only one file needs to be updated per commit instead of four (or O(n)!).
There's also the possible optimisation that the manifest files could be rsynced in one pass, and only the directories that changed would get updated in the second. Gentoo's current 2-pass rsync is braindead: it checks a timestamp file, then updates the entire tree, *then* verifies the manifests it got are signed and valid. They're now in the process of adding an extra workaround on top of this to prevent leaving the system in a corrupted state, instead of just doing it in the right order to begin with!
More importantly: this idea would use code that's already there and tested in production. Not saying I doubt Donenfeld's track record with security code or anything, but the existing system is *a lot* simpler and easier to debug than using raw GnuPG.
Signing and distributing Gentoo
Posted Jul 12, 2018 21:04 UTC (Thu) by zx2c4 (subscriber, #82519)
> Not saying I doubt Donenfeld's track record with security code or anything, but the existing system is *a lot* simpler and easier to debug than using raw GnuPG.
I don't think my track record really matters one way or another. The basic idea is: can developers somehow add signatures to the packages they touch directly, instead of having some potentially hacked infrastructure box handling that from a central location. My opening proposal was just the obvious simpleton one, "just sign each file!" I assume that if something to this extent moves forward, people such as yourself will have all sorts of nice optimizations on the basic core idea, in order to reduce the number of files or promote a better repository layout, etc.
Signing and distributing Gentoo
Posted Jul 17, 2018 22:56 UTC (Tue) by eternaleye (guest, #67051)
I'd recommend that any attempt should, at very least, learn from The Update Framework[1], or even adopt their architecture directly. It's a well-researched, well-designed system, with strong security properties, a very carefully designed model, and a history of successful deployment (https://theupdateframework.github.io/adoptions.html).
I _would_ argue that any design that meets a less robust security definition than The Update Framework should be replaced by it.
Gentoo is choice
Posted Jul 12, 2018 23:09 UTC (Thu) by lamawithonel (subscriber, #86149)
Gentoo is choice
Posted Jul 12, 2018 23:31 UTC (Thu) by rich0 (guest, #55509)
As the article mentions, I pointed out that right now the portage verification leaves the tree checked out even if it fails verification. However, a patch for this is already in master and will eventually be released which eliminates that particular concern long-term.
There is another downside to using git for syncing that the article doesn't mention. Currently, we distribute both binary metadata cache and the text ebuilds themselves. The syncing git repos have to contain both, which means the history in these repos contains both, which is that much more data which is entirely redundant. This will be a somewhat harder problem to solve as it would probably require splitting the metadata out of the main repo, but at the same time ensure that they're distributed in-sync. This could be done with tags, but assuming we'd want to support multiple sync points per day it would be a lot of tags. I'm not sure if git has any issues with repos with tens of thousands of tags - I'd want to look into that before proposing it...
