An update on compliance for containers
The inability to determine the contents of container images is a topic that annoys Dirk Hohndel. At last year's Legal and Licensing Workshop (LLW), he gave a presentation that highlighted the problem and some work he had been doing to combat it. At this year's LLW, he updated attendees on the progress that has been made and where he hopes things will go from here.
Hohndel is the chief open source officer at VMware, but his talk has little to do with his company. The topic comes out of his sense of annoyance with the problem; when he gets annoyed, he tends to do something about it, which sums up much of his career, he said. He has given his containers and compliance talk to both compliance and security people, because the problem is basically the same for both. When you ship a container image, what is actually in it and where does that code come from? That is obviously a license-compliance problem, but it is also a security concern; distributors of these images should want to know what is in them and recipients should want to as well.
![Dirk Hohndel](https://static.lwn.net/images/2019/llw-hohndel-sm.jpg)
When he gives his talk, the responses follow a pattern; first there is a "little embarrassed laughter", then surprise, and, finally, attendees start looking around for someone else to do something about it. He took an informal audience poll to see how many had actually created a compliance process specifically for container images. Usually he gets no hands raised, but he got a few tentative ones at LLW, which he said was the first time that had happened. Everyone is talking about containers, of course, and Kubernetes is apparently the most important technology project of this millennium, he said, but he is "deeply surprised" that the topic does not come up more often in our community.
Container images are really nothing particularly new; they are just a packaging format for software. As with Linux distribution packaging formats or those for macOS or Windows, it is a matter of ensuring that all of the dependencies for the software of interest are packaged or available with it. What container systems have done is to make it all easy; they have changed the art of packaging software from an engineering discipline into a point-and-click exercise. That greatly reduces the barriers to entry, but the "best practices"—he explicitly pointed out that he was using "air quotes" for that term—available online are terrible. In last year's talk, he gave examples of some of what ends up in these images by following those practices, he said.
Progress
So, Hohndel asked, one year later, "have we made any progress at all?" There is a new Linux Foundation (LF) project, Automated Compliance Tooling (ACT), that has two sponsors, one of which is his employer. The focus of the project is to jointly develop tools to help address these problems. The name "ACT" has "so much irony" that he is unable to even come up with a joke about it, he said.
The fact that it is an LF project gives ACT both visibility and credibility, which is great, but there are few organizations that are working on it, which is frustrating. He highlighted three projects that are part of ACT: Tern, which he talked about a bit last year, Quartermaster (or QMSTR), and FOSSology. He spoke with various other people about them at the recently held Open Source Leadership Summit, which is misnamed, he said, since it is really the annual LF membership meeting. For the first time, though, he got traction with other open-source projects in the compliance space; there were discussions of how to integrate the tools in various ways to make them more useful for everyone.
The container-management tooling that creates container images uses layers to build up the image. The layers are essentially tar files with some extra pieces to describe how they fit and they are designed to replace pieces from the lower layers—without users being aware that it is being done. From a usability perspective, "that is fantastic", he said, but for compliance and security, "not so much". People were interested and curious about the container-compliance problem, but no one seems to think it is an urgent problem that they need to immediately solve.
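The layer-replacement behavior Hohndel describes can be sketched in a few lines of Python. This is an illustrative model only, with invented file paths and contents; real images use AUFS-style `.wh.*` "whiteout" entries inside tar archives in roughly this way:

```python
# Sketch: how layered container images resolve to a final filesystem.
# Each layer is modeled as a dict of path -> content; a ".wh.<name>" entry
# (an AUFS-style "whiteout") deletes <name> from the layers below it.
# All data here is invented for illustration, not read from a real image.

def resolve_layers(layers):
    """Apply layers bottom-to-top, honoring whiteout entries."""
    rootfs = {}
    for layer in layers:
        for path, content in layer.items():
            name = path.rsplit("/", 1)[-1]
            if name.startswith(".wh."):
                # Whiteout: remove the shadowed file from lower layers.
                rootfs.pop(path.replace(".wh.", "", 1), None)
            else:
                rootfs[path] = content  # upper layers silently replace lower ones
    return rootfs

base = {"bin/busybox": "v1.30 (GPLv2)", "etc/os-release": "Alpine 3.9"}
update = {"bin/busybox": "v1.31 (GPLv2)", "etc/.wh.os-release": ""}
print(resolve_layers([base, update]))
# bin/busybox comes from the upper layer; os-release is gone entirely
```

The compliance problem follows directly from this: what you must audit is the *resolved* filesystem, not any single layer, and nothing in the format tells a user that the replacement happened.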
Trolling
Lawyers are risk averse, he said, whereas engineers will happily take risks, especially when they don't understand them—compliance and security fall into that category in his view. The lawyers in the community don't care about these problems because there has not been any enforcement action against container images. And the engineers say that the problem does not matter because the technology works. In order to combat that, what is needed is an enforcement threat, he said.
He suggested creating a "troll company" to start doing some enforcement activity. The "Trolltech" name suggestion from the audience was met with laughter, but shot down since it had already been used. Hohndel said that he was "kind of joking", but that "the line between a bad joke and the true reality is very very thin". He does not own any substantial copyright in the components of today's container images so he cannot simply do this on his own—sadly, no one is making Subsurface container images.
But it would be fairly easy for someone with bad intentions to go after container distributors for not complying with the licenses in their containers. For example, BusyBox and libraries under the GPLv3 with the runtime exception are in most containers. Enforcement of that sort might lead everyone to start using only permissively licensed code, which is not what we want, he said.
Tooling is the answer, he said. It can create an artifact that goes with the image, which contains all of the copyright notices as well as providing information on where to find the correct and corresponding source code for components that need it. That is what Tern is meant to do. It will dig into a container image, tear it apart, and provide an accurate assessment of what is inside it. For images based on APT and RPM components, Tern will create a bill of materials (BOM) that contains all of the different copyright notices, license information, including Software Package Data Exchange (SPDX) headers once that part gets upstream, and pointers to the source files. But it is also an open-source project that needs a lot more help. Unfortunately, "no one cares", he said.
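As a rough illustration of the artifact Hohndel describes (not Tern's actual output), here is how such a bill of materials might be rendered as minimal SPDX tag-value text; the package list is invented, where a real tool would extract it from the image's APT or RPM databases:

```python
# Sketch: the kind of BOM a compliance tool might emit for an image,
# rendered as minimal SPDX tag-value text. Package data is invented.

def spdx_tag_value(image, packages):
    lines = ["SPDXVersion: SPDX-2.1",
             "DataLicense: CC0-1.0",
             f"DocumentName: {image}"]
    for pkg in packages:
        lines += ["",
                  f"PackageName: {pkg['name']}",
                  f"PackageVersion: {pkg['version']}",
                  f"PackageLicenseDeclared: {pkg['license']}",
                  f"PackageDownloadLocation: {pkg['source']}"]
    return "\n".join(lines)

doc = spdx_tag_value("example/image:1.0", [
    {"name": "busybox", "version": "1.30.1", "license": "GPL-2.0-only",
     "source": "https://busybox.net/downloads/busybox-1.30.1.tar.bz2"},
])
print(doc)
```

The `PackageDownloadLocation` field is the piece that matters most for GPL compliance: it is the pointer to the corresponding source that the talk says recipients currently cannot find.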
Needs
Hohndel closed his talk with two requests for the assembled lawyers, engineers, and others interested in licensing issues who make up the attendees at LLW. First is help with the engineering projects. It is not necessary to be an LF member to contribute; these are normal open-source projects that are looking for help.
Second, he would like to get attendees' help in spreading the word about these problems. It can't only be him that is going around to conferences and talking about it. He would like to make it a topic of conversation in our community, but he would also like to get various big players in the container world to listen—and help. For example, getting Docker to take the problem seriously and for the Cloud Native Computing Foundation (CNCF), which is shepherding the Kubernetes project among others, to require that the base container images used in its projects have compliance information available.
With that, he switched to Q&A. One attendee asked about creating compliance artifacts as part of the build process, rather than after the fact. Container images should be compliant at the point they are pushed to image repositories, they said. Tern can do both, Hohndel said; it can build up its information during the creation of the image or later. It is significantly harder to do that after the fact, however.
Karen Sandler of the Software Freedom Conservancy (SFC) said that her organization does care about this problem. She recommended that people report non-compliant images to SFC. There is no need to be a rights-holder in the code involved to make a report, she said.
Another attendee said that the information should somehow be made part of the container image or, at least, accompany the image. This is one of the things that really irritates him, Hohndel said. Information about licenses was proposed for Dockerfiles at one point, but it was removed. He has been thinking that the universal identifiers created by Software Heritage would be a great way to link to the source code, but there is no way to get those into the binary image file. He believes that sidecar files are not a viable solution for end users, because those files have a tendency to get lost along the way. He also noted that there are some container-tools vendors that are touting the ability to strip "extra" information (e.g. package information) from images in order to reduce their size.
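The Software Heritage identifiers Hohndel mentions are, by design, computable offline: for file contents, the core identifier is the git-style SHA-1 of the blob. A minimal sketch:

```python
# Sketch: computing a Software Heritage identifier (SWHID) for file
# contents. For content objects the identifier is the git-style SHA-1
# of "blob <length>\0<data>", so no network access is needed.
import hashlib

def swhid_for_content(data: bytes) -> str:
    digest = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
    return f"swh:1:cnt:{digest}"

print(swhid_for_content(b"hello\n"))
# → swh:1:cnt:ce013625030ba8dba906f756967f9e9ca394464a
```

That property is what makes these identifiers attractive as source-code pointers in a BOM: anyone holding the bytes can verify the link, and the archive can serve the bytes to anyone holding the link.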
Container people are interested in scanning images for security vulnerabilities, another attendee said. They wondered if it might make sense to team up with a project like Clair. Hohndel said that container folks are interested in the BOM, but not the copyright and source code information. Tern is modular, though, so it is easy to add scanning target types; once it was able to do some of the scanning, some of the security projects started talking with Tern developers.
Engaging the major cloud players in helping to enforce compliance on the images that their customers run was another question. Hohndel said that he has talked with people from some of the cloud providers (e.g. Amazon and Google) to try to enlist their aid, but there was little enthusiasm there. When you are in the business of selling compute resources to customers, it is not in your financial interest to make that harder, he said.
The final query was about Alpine Linux, which is used as the base distribution for many container images. The attendee noted that Alpine strips all of the license files from its layer. Hohndel said that it is worse than that, in that Alpine strips all information about versions, except those actually stored in the individual binaries, from its layers. It makes it essentially impossible to comply with the licenses of the packages that it uses (e.g. BusyBox). Another attendee said that they have some contacts at Alpine and suggested that might be a path toward resolving the problem. Hohndel closed by saying that making that contact meant that his presentation was a complete success for him.
[I would like to thank the FSFE and the LLW Diamond sponsors, Intel, the Linux Foundation, and Red Hat, for their travel assistance to Barcelona for the conference.]
Index entries for this article
Conference: Free Software Legal & Licensing Workshop/2019
Posted Apr 17, 2019 3:20 UTC (Wed)
by cyphar (subscriber, #110703)
However, I'm hoping that my plans for OCIv2 images would allow for an embedded BOM -- which would help solve not only the licensing problem but also the more fundamental transparency problem (what is actually in this container).
Posted Apr 17, 2019 18:13 UTC (Wed)
by nishak (guest, #122100)
Another thing that would be useful in OCIv2 is some method of tracking the provenance of base images. Currently, that information is lost when an image is distributed.
Posted Apr 17, 2019 18:18 UTC (Wed)
by pj (subscriber, #4506)
Container images store arbitrary files... but they can't store a BOM? I suspect what you mean is they can't store a BOM _in any kind of standard way_.
Maybe there should be the moral equivalent of a `.well-known` or `META-INF` directory that can store this kind of info? As gzipped text it should be small enough for no one to care very much. All that's needed now is to agree on a standard set of info and a place to put extra/vendor-specific info.... but there are worse ways to get to a good standard than by someone influential taking a stab at it and then listening to everyone complain about what cases their attempt doesn't work for.
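The well-known-directory idea can be sketched with nothing beyond Python's standard library; the `.well-known/bom.json.gz` path and the BOM shape are hypothetical, since no image spec defines such a convention today:

```python
# Sketch of a well-known in-layer BOM location, per the suggestion above.
# The path ".well-known/bom.json.gz" is hypothetical; no spec defines it.
import gzip, io, json, tarfile

bom = {"packages": [{"name": "busybox", "version": "1.30.1",
                     "license": "GPL-2.0-only"}]}

# Producer side: write the gzipped BOM into a layer tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = gzip.compress(json.dumps(bom).encode())
    info = tarfile.TarInfo(".well-known/bom.json.gz")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Consumer side: any tool that knows the convention can pull it back out.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    raw = tar.extractfile(".well-known/bom.json.gz").read()
    recovered = json.loads(gzip.decompress(raw))
print(recovered)
```

As the reply below points out, this stores the BOM as layer *data* rather than image metadata, which is part of why the question is harder than it first looks.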
Posted Apr 18, 2019 11:34 UTC (Thu)
by cyphar (subscriber, #110703)
Well, I said there wasn't an "easy way". There also doesn't happen to be a "standard way" but that's less of an issue if you can't even store it properly. Your suggestion appears to be to put the BOM in the layer data itself but I don't feel that's the best idea in the world -- BOM should be metadata (otherwise you're now talking about parsing the layer tar archives of an image in order to get BOM metadata). Not to mention you'd add even more magic files (.wh.* was enough of a pain, personally).
For an OCI image if you wanted to store BOM as metadata you'd need to store it in an annotation, but annotations aren't descriptors (typed pointers) in OCI which means that almost no client would actually be able to get the relevant data (and even if they could, you wouldn't get the same safety benefits of descriptors). In addition, the layering element is a problem (yet again[1]) because it means that you will have a BOM for each layer when really you'd want a BOM for the runtime rootfs you are actually going to be executing.
I believe all of this can be fixed with the right improvements to the spec, and I hope I can work on improving it in OCIv2.
[1]: https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar
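The per-layer problem described above can be sketched concretely: if each layer carried its own BOM, a consumer would have to fold them together to describe the rootfs that actually runs. The `"removed"` key below is a hypothetical convention, not part of any OCI specification:

```python
# Sketch: merging hypothetical per-layer BOMs into one BOM for the
# runtime rootfs. Later layers shadow earlier ones; a "removed" entry
# (an invented convention) drops a package from lower layers.

def merge_layer_boms(layer_boms):
    merged = {}
    for bom in layer_boms:  # bottom-to-top, mirroring layer order
        for name in bom.get("removed", []):
            merged.pop(name, None)
        for pkg in bom.get("packages", []):
            merged[pkg["name"]] = pkg  # upper layer shadows lower
    return sorted(merged.values(), key=lambda p: p["name"])

base = {"packages": [{"name": "busybox", "version": "1.30.1"},
                     {"name": "musl", "version": "1.1.20"}]}
upper = {"packages": [{"name": "busybox", "version": "1.31.0"}],
         "removed": ["musl"]}
print(merge_layer_boms([base, upper]))
# only busybox 1.31.0 remains
```

Note that the merge logic mirrors the whiteout semantics of the layers themselves, which is exactly why a BOM per layer is the wrong granularity: it forces every consumer to reimplement layer resolution.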
Posted Apr 17, 2019 18:53 UTC (Wed)
by nybble41 (subscriber, #55106)
This is not my area of expertise, but my understanding is that merely making the information available counts as an "export" whether or not it's actually downloaded. I can't imagine that the responsibility for export control could be delegated to the downloader; if you have information subject to export controls then you must confirm that a prospective downloader can legally access that information *before* making it available to them.
Posted Apr 23, 2019 22:43 UTC (Tue)
by rahvin (guest, #16953)
I looked into containers for some personal stuff a couple of years ago, and I was boggled at how little information exists about what is in a container. Not only that, but containers are frequently built on other containers, which are themselves built on other containers, and so on (a prior article had an example container built from something like eight different containers). In the end there is all kinds of software of unknown versions in the container. There is no real update mechanism for all that software other than a container update, and unless you track down the whole build process and duplicate it, or do a full audit, you often have no idea what software is even inside the container, let alone what licenses, versions, or security vulnerabilities it has. They are an absolute security nightmare, in my opinion, unless you build your own container from scratch and update it yourself for any type of permanent service. Hell, even a one-off container could create a potential breach into the internal network.
I think if this project can gain traction and acceptance it will actually move toward solving that problem. A manifest, JSON, or something in the header containing a list of the software and the version of everything would do a lot to solve the security issue of "what is actually in this container", let alone the tracking issue for license and other compliance.
Posted Apr 18, 2019 9:30 UTC (Thu)
by LtWorf (subscriber, #124958)
To sell this directly from google cloud market, google has some compliance procedures in place, so for every FOSS component that is not coming as a package in one of their supported distributions (ubuntu is supported) they require some licensing information.
For software covered by the GPL or similar licenses, they require the source code to be included directly in the image itself, so that they avoid redistributing GPL-covered software and later being unable to provide the sources.
They do not allow any AGPL software to run on their machines.
Because we base on ubuntu and we package the internal proprietary software as .deb files, I started adding some license information as machine readable debian/copyright files, and then created a tool that scans all the packages that are not coming from the ubuntu repositories and makes a report that can be sent to google.
Of course this solution is very hand-made and requires the copyright files to be accurate in the first place, but it seems incorrect to say that cloud providers have no interest in this. Being redistributors, they are the ones that are responsible for providing the sources when requested, so they do have an interest in compliance.
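The scanning step described above can be sketched against the machine-readable debian/copyright format (DEP-5); the file text below is invented for illustration, and the commenter's actual tool surely differs:

```python
# Sketch: pulling declared licenses out of a machine-readable
# debian/copyright file (DEP-5 format). The sample text is invented;
# a real report would read one such file per installed package.

SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example-tool

Files: *
Copyright: 2019 Example Corp
License: GPL-2.0+

Files: contrib/*
Copyright: 2018 Someone Else
License: MIT
"""

def declared_licenses(copyright_text):
    """Collect the distinct License: values from a DEP-5 file."""
    return sorted({line.split(":", 1)[1].strip()
                   for line in copyright_text.splitlines()
                   if line.startswith("License:")})

print(declared_licenses(SAMPLE))
# → ['GPL-2.0+', 'MIT']
```

Because DEP-5 is a simple stanza-based text format, even this kind of rough extraction is enough to flag packages whose licenses need the corresponding source shipped alongside the image.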
Posted Apr 23, 2019 22:48 UTC (Tue)
by rahvin (guest, #16953)
Containers are used heavily for services with highly variable loads where they can spin up additional nodes quickly and easily. Most of the major tech companies including AWS use containers extensively behind the scenes. They are literally at the forefront of web deployments these days with all the major providers using them on their cloud platforms. People using these things professionally are just building their own containers (so they control the product) instead of downloading and doing "fire and forget" on some random crap they downloaded from a user built repository.
Posted Apr 25, 2019 8:05 UTC (Thu)
by nilsmeyer (guest, #122604)
It's very easy to get a base setup for container infrastructure running; you can get all the scripts and automation to pull up infrastructure very quickly. However, there is a very steep learning curve to actually understanding everything you have just deployed. There is also a strong lack of policy, and of enforcement of policies, because that's usually not an area that drives delivery of features. As long as the cost-benefit doesn't shift here, many organizations and individuals will continue to wing it and think everything is alright, because nothing ever actually happens, or if something does happen nobody is held accountable.
Posted Apr 18, 2019 13:28 UTC (Thu)
by civodul (guest, #58311)
I have often been citing Hohndel's previous talk as an illustration of the lack of transparency that plagues container images. It hurts not just licensing, but also security, reproducibility, and user freedom. In GNU Guix we have this `guix pack` tool that can build container images (OCI notably). The key difference with things like Dockerfiles is that it's declarative: instead of providing a series of commands to populate your image, you list precisely the packages you want in the image. Since the vast majority of Guix packages are bit-reproducible, since it can "travel back in time" (you can recreate the Guix of last month or last year, and from there rebuild the packages it provides), and since it is now backed by Software Heritage, anyone can recreate a container image and verify that they obtain the same image, bit for bit. The container image can finally be considered a build artifact, and the real source is the Guix manifest and the commit ID used to build it. I hope container image creation tools will converge towards such a declarative and traceable approach. That would be a significant improvement over the current state of affairs.
Posted Apr 18, 2019 14:15 UTC (Thu)
by cyphar (subscriber, #110703)
We have a similar tool in openSUSE (kiwi) which works in a very similar fashion (though it's all configured through XML, a sign of when it was first developed). We use it to build container images as well as other media such as ISOs and VM disk images (it also has support for a bunch of other distros, and integrates into OBS). I completely agree that the lack of widespread use of such tooling really hurts the current state of container images. Distributions figured out how to track what packages are present in ISOs 20 years ago, but with container images we are reliving the same issues, this time with tar archives (and don't get me started on the issues with tar).
Posted Apr 23, 2019 4:21 UTC (Tue)
by marcH (subscriber, #57642)
> He suggested creating a "troll company" to start doing some enforcement activity.
Thanks, that was my thought exactly. Nothing ever goes beyond the prototype stage unless money's involved one way or the other. Double whammy if patent lawyers from the eastern district of Texas[*] get "re-purposed" for that.
Same approach for security: scan, name and shame. Responsible disclosure for people making some effort, zero-day for the dangerous idiots stripping version information.
(Stupid) problem solved, let's all move back to actual software engineering. Life is short.
[*] a number of them recently put out of work by the Supreme court - and Apple
Posted Apr 28, 2019 14:03 UTC (Sun)
by rlhamil (guest, #6472)
And of course, version all this stuff, 'cause it's likely to evolve.