An update on compliance for containers
The inability to determine the contents of container images is a topic that annoys Dirk Hohndel. At last year's Legal and Licensing Workshop (LLW), he gave a presentation that highlighted the problem and some work he had been doing to combat it. At this year's LLW, he updated attendees on the progress that has been made and where he hopes things will go from here.
Hohndel is the chief open source officer at VMware, but his talk has little to do with his company. The topic comes out of his sense of annoyance with the problem; when he gets annoyed, he tends to do something about it, which sums up much of his career, he said. He has given his containers and compliance talk to both compliance and security people, because the problem is basically the same for both. When you ship a container image, what is actually in it and where does that code come from? That is obviously a license-compliance problem, but it is also a security concern; distributors of these images should want to know what is in them and recipients should want to as well.
![Dirk Hohndel](https://static.lwn.net/images/2019/llw-hohndel-sm.jpg)
When he gives his talk, the responses follow a pattern; first there is a "little embarrassed laughter", then surprise, and, finally, attendees start looking around for someone else to do something about it. He took an informal audience poll to see how many had actually created a compliance process specifically for container images. Usually he gets no hands raised, but he got a few tentative ones at LLW, which he said was the first time that had happened. Everyone is talking about containers, of course, and Kubernetes is apparently the most important technology project of this millennium, he said, but he is "deeply surprised" that the topic does not come up more often in our community.
Container images are really nothing particularly new; they are just a packaging format for software. As with Linux distribution packaging formats or those for macOS or Windows, it is a matter of ensuring that all of the dependencies for the software of interest are packaged or available with it. What container systems have done is to make it all easy; they have changed the art of packaging software from an engineering discipline into a point-and-click exercise. That greatly reduces the barriers to entry, but the "best practices"—he explicitly pointed out that he was using "air quotes" for that term—available online are terrible. In last year's talk, he gave examples of some of what ends up in these images by following those practices, he said.
Progress
So, Hohndel asked, one year later, "have we made any progress at all?" There is a new Linux Foundation (LF) project, Automated Compliance Tooling (ACT), that has two sponsors, one of which is his employer. The focus of the project is to jointly develop tools to help address these problems. The name "ACT" has "so much irony" that he is unable to even come up with a joke about it, he said.
The fact that it is an LF project gives ACT both visibility and credibility, which is great, but there are few organizations that are working on it, which is frustrating. He highlighted three projects that are part of ACT: Tern, which he talked about a bit last year, Quartermaster (or QMSTR), and FOSSology. He spoke with various other people about them at the recently held Open Source Leadership Summit, which is misnamed, he said, since it is really the annual LF membership meeting. For the first time, though, he got traction with other open-source projects in the compliance space; there were discussions of how to integrate the tools in various ways to make them more useful for everyone.
The container-management tooling that creates container images uses layers to build up the image. The layers are essentially tar files with some extra pieces to describe how they fit and they are designed to replace pieces from the lower layers—without users being aware that it is being done. From a usability perspective, "that is fantastic", he said, but for compliance and security, "not so much". People were interested and curious about the container-compliance problem, but no one seems to think it is an urgent problem that they need to immediately solve.
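The layer-replacement behavior Hohndel describes can be sketched in a few lines of Python. This is an illustrative model only, with invented file paths and contents; real images use AUFS-style `.wh.*` "whiteout" entries inside tar archives in roughly this way:

```python
# Sketch: how layered container images resolve to a final filesystem.
# Each layer is modeled as a dict of path -> content; a ".wh.<name>" entry
# (an AUFS-style "whiteout") deletes <name> from the layers below it.
# All data here is invented for illustration, not read from a real image.

def resolve_layers(layers):
    """Apply layers bottom-to-top, honoring whiteout entries."""
    rootfs = {}
    for layer in layers:
        for path, content in layer.items():
            name = path.rsplit("/", 1)[-1]
            if name.startswith(".wh."):
                # Whiteout: remove the shadowed file from lower layers.
                rootfs.pop(path.replace(".wh.", "", 1), None)
            else:
                rootfs[path] = content  # upper layers silently replace lower ones
    return rootfs

base = {"bin/busybox": "v1.30 (GPLv2)", "etc/os-release": "Alpine 3.9"}
update = {"bin/busybox": "v1.31 (GPLv2)", "etc/.wh.os-release": ""}
print(resolve_layers([base, update]))
# bin/busybox comes from the upper layer; os-release is gone entirely
```

The compliance problem follows directly from this: what you must audit is the *resolved* filesystem, not any single layer, and nothing in the format tells a user that the replacement happened.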
Trolling
Lawyers are risk averse, he said, whereas engineers will happily take risks, especially when they don't understand them—compliance and security fall into that category in his view. The lawyers in the community don't care about these problems because there has not been any enforcement action against container images. And the engineers say that the problem does not matter because the technology works. In order to combat that, what is needed is an enforcement threat, he said.
He suggested creating a "troll company" to start doing some enforcement activity. The "Trolltech" name suggestion from the audience was met with laughter, but shot down since it had already been used. Hohndel said that he was "kind of joking", but that "the line between a bad joke and the true reality is very very thin". He does not own any substantial copyright in the components of today's container images so he cannot simply do this on his own—sadly, no one is making Subsurface container images.
But it would be fairly easy for someone with bad intentions to go after container distributors for not complying with the licenses in their containers. For example, BusyBox and libraries under the GPLv3 with the runtime exception are in most containers. Enforcement of that sort might lead everyone to start using only permissively licensed code, which is not what we want, he said.
Tooling is the answer, he said. It can create an artifact that goes with the image, which contains all of the copyright notices as well as providing information on where to find the correct and corresponding source code for components that need it. That is what Tern is meant to do. It will dig into a container image, tear it apart, and provide an accurate assessment of what is inside it. For images based on APT and RPM components, Tern will create a bill of materials (BOM) that contains all of the different copyright notices, license information, including Software Package Data Exchange (SPDX) headers once that part gets upstream, and pointers to the source files. But it is also an open-source project that needs a lot more help. Unfortunately, "no one cares", he said.
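As a rough illustration of the artifact Hohndel describes (not Tern's actual output), here is how such a bill of materials might be rendered as minimal SPDX tag-value text; the package list is invented, where a real tool would extract it from the image's APT or RPM databases:

```python
# Sketch: the kind of BOM a compliance tool might emit for an image,
# rendered as minimal SPDX tag-value text. Package data is invented.

def spdx_tag_value(image, packages):
    lines = ["SPDXVersion: SPDX-2.1",
             "DataLicense: CC0-1.0",
             f"DocumentName: {image}"]
    for pkg in packages:
        lines += ["",
                  f"PackageName: {pkg['name']}",
                  f"PackageVersion: {pkg['version']}",
                  f"PackageLicenseDeclared: {pkg['license']}",
                  f"PackageDownloadLocation: {pkg['source']}"]
    return "\n".join(lines)

doc = spdx_tag_value("example/image:1.0", [
    {"name": "busybox", "version": "1.30.1", "license": "GPL-2.0-only",
     "source": "https://busybox.net/downloads/busybox-1.30.1.tar.bz2"},
])
print(doc)
```

The `PackageDownloadLocation` field is the piece that matters most for GPL compliance: it is the pointer to the corresponding source that the talk says recipients currently cannot find.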
Needs
Hohndel closed his talk with two requests for the assembled lawyers, engineers, and others interested in licensing issues who make up the attendees at LLW. First is help with the engineering projects. It is not necessary to be an LF member to contribute; these are normal open-source projects that are looking for help.
Second, he would like to get attendees' help in spreading the word about these problems. It can't only be him that is going around to conferences and talking about it. He would like to make it a topic of conversation in our community, but he would also like to get various big players in the container world to listen—and help. For example, getting Docker to take the problem seriously and for the Cloud Native Computing Foundation (CNCF), which is shepherding the Kubernetes project among others, to require that the base container images used in its projects have compliance information available.
With that, he switched to Q&A. One attendee asked about creating compliance artifacts as part of the build process, rather than after the fact. Container images should be compliant at the point they are pushed to image repositories, they said. Tern can do both, Hohndel said; it can build up its information during the creation of the image or later. It is significantly harder to do that after the fact, however.
Karen Sandler of the Software Freedom Conservancy (SFC) said that her organization does care about this problem. She recommended that people report non-compliant images to SFC. There is no need to be a rights-holder in the code involved to make a report, she said.
Another attendee said that the information should somehow be made part of the container image or, at least, accompany the image. This is one of the things that really irritates him, Hohndel said. Information about licenses was proposed for Dockerfiles at one point, but it was removed. He has been thinking that the universal identifiers created by Software Heritage would be a great way to link to the source code, but there is no way to get those into the binary image file. He believes that sidecar files are not a viable solution for end users, because those files have a tendency to get lost along the way. He also noted that there are some container-tools vendors that are touting the ability to strip "extra" information (e.g. package information) from images in order to reduce their size.
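The Software Heritage identifiers Hohndel mentions are, by design, computable offline: for file contents, the core identifier is the git-style SHA-1 of the blob. A minimal sketch:

```python
# Sketch: computing a Software Heritage identifier (SWHID) for file
# contents. For content objects the identifier is the git-style SHA-1
# of "blob <length>\0<data>", so no network access is needed.
import hashlib

def swhid_for_content(data: bytes) -> str:
    digest = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
    return f"swh:1:cnt:{digest}"

print(swhid_for_content(b"hello\n"))
# → swh:1:cnt:ce013625030ba8dba906f756967f9e9ca394464a
```

That property is what makes these identifiers attractive as source-code pointers in a BOM: anyone holding the bytes can verify the link, and the archive can serve the bytes to anyone holding the link.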
Container people are interested in scanning images for security vulnerabilities, another attendee said. They wondered if it might make sense to team up with a project like Clair. Hohndel said that container folks are interested in the BOM, but not the copyright and source code information. Tern is modular, though, so it is easy to add scanning target types; once it was able to do some of the scanning, some of the security projects started talking with Tern developers.
Engaging the major cloud players in helping to enforce compliance on the images that their customers run was another question. Hohndel said that he has talked with people from some of the cloud providers (e.g. Amazon and Google) to try to enlist their aid, but there was little enthusiasm there. When you are in the business of selling compute resources to customers, it is not in your financial interest to make that harder, he said.
The final query was about Alpine Linux, which is used as the base distribution for many container images. The attendee noted that Alpine strips all of the license files from its layer. Hohndel said that it is worse than that, in that Alpine strips all information about versions, except those actually stored in the individual binaries, from its layers. It makes it essentially impossible to comply with the licenses of the packages that it uses (e.g. BusyBox). Another attendee said that they have some contacts at Alpine and suggested that might be a path toward resolving the problem. Hohndel closed by saying that making that contact meant that his presentation was a complete success for him.
[I would like to thank the FSFE and the LLW Diamond sponsors, Intel, the Linux Foundation, and Red Hat, for their travel assistance to Barcelona for the conference.]
Index entries for this article
Conference: Free Software Legal & Licensing Workshop/2019
Posted Apr 17, 2019 3:20 UTC (Wed)
by cyphar (subscriber, #110703)
However, I'm hoping that my plans for OCIv2 images would allow for an embedded BOM -- which would help solve not only the licensing problem but also the more fundamental transparency problem (what is actually in this container).
Posted Apr 17, 2019 18:13 UTC (Wed)
by nishak (guest, #122100)
Another thing that would be useful in OCIv2 is some method of tracking the provenance of base images. Currently, that information is lost when an image is distributed.
Posted Apr 17, 2019 18:18 UTC (Wed)
by pj (subscriber, #4506)
Container images store arbitrary files... but they can't store a BOM? I suspect what you mean is they can't store a BOM _in any kind of standard way_.
Maybe there should be the moral equivalent of a `.well-known` or `META-INF` directory that can store this kind of info? As gzipped text it should be small enough for no one to care very much. All that's needed now is to agree on a standard set of info and a place to put extra/vendor-specific info.... but there are worse ways to get to a good standard than by someone influential taking a stab at it and then listening to everyone complain about what cases their attempt doesn't work for.
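The well-known-directory idea can be sketched with nothing beyond Python's standard library; the `.well-known/bom.json.gz` path and the BOM shape are hypothetical, since no image spec defines such a convention today:

```python
# Sketch of a well-known in-layer BOM location, per the suggestion above.
# The path ".well-known/bom.json.gz" is hypothetical; no spec defines it.
import gzip, io, json, tarfile

bom = {"packages": [{"name": "busybox", "version": "1.30.1",
                     "license": "GPL-2.0-only"}]}

# Producer side: write the gzipped BOM into a layer tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = gzip.compress(json.dumps(bom).encode())
    info = tarfile.TarInfo(".well-known/bom.json.gz")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Consumer side: any tool that knows the convention can pull it back out.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    raw = tar.extractfile(".well-known/bom.json.gz").read()
    recovered = json.loads(gzip.decompress(raw))
print(recovered)
```

As the reply below points out, this stores the BOM as layer *data* rather than image metadata, which is part of why the question is harder than it first looks.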
Posted Apr 18, 2019 11:34 UTC (Thu)
by cyphar (subscriber, #110703)
Well, I said there wasn't an "easy way". There also doesn't happen to be a "standard way" but that's less of an issue if you can't even store it properly. Your suggestion appears to be to put the BOM in the layer data itself but I don't feel that's the best idea in the world -- BOM should be metadata (otherwise you're now talking about parsing the layer tar archives of an image in order to get BOM metadata). Not to mention you'd add even more magic files (.wh.* was enough of a pain, personally).
For an OCI image if you wanted to store BOM as metadata you'd need to store it in an annotation, but annotations aren't descriptors (typed pointers) in OCI which means that almost no client would actually be able to get the relevant data (and even if they could, you wouldn't get the same safety benefits of descriptors). In addition, the layering element is a problem (yet again[1]) because it means that you will have a BOM for each layer when really you'd want a BOM for the runtime rootfs you are actually going to be executing.
I believe all of this can be fixed with the right improvements to the spec, and I hope I can work on improving it in OCIv2.
[1]: https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar
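The per-layer problem described above can be sketched concretely: if each layer carried its own BOM, a consumer would have to fold them together to describe the rootfs that actually runs. The `"removed"` key below is a hypothetical convention, not part of any OCI specification:

```python
# Sketch: merging hypothetical per-layer BOMs into one BOM for the
# runtime rootfs. Later layers shadow earlier ones; a "removed" entry
# (an invented convention) drops a package from lower layers.

def merge_layer_boms(layer_boms):
    merged = {}
    for bom in layer_boms:  # bottom-to-top, mirroring layer order
        for name in bom.get("removed", []):
            merged.pop(name, None)
        for pkg in bom.get("packages", []):
            merged[pkg["name"]] = pkg  # upper layer shadows lower
    return sorted(merged.values(), key=lambda p: p["name"])

base = {"packages": [{"name": "busybox", "version": "1.30.1"},
                     {"name": "musl", "version": "1.1.20"}]}
upper = {"packages": [{"name": "busybox", "version": "1.31.0"}],
         "removed": ["musl"]}
print(merge_layer_boms([base, upper]))
# only busybox 1.31.0 remains
```

Note that the merge logic mirrors the whiteout semantics of the layers themselves, which is exactly why a BOM per layer is the wrong granularity: it forces every consumer to reimplement layer resolution.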
Posted Apr 17, 2019 18:53 UTC (Wed)
by nybble41 (subscriber, #55106)
This is not my area of expertise, but my understanding is that merely making the information available counts as an "export" whether or not it's actually downloaded. I can't imagine that the responsibility for export control could be delegated to the downloader; if you have information subject to export controls then you must confirm that a prospective downloader can legally access that information *before* making it available to them.
Posted Apr 23, 2019 22:43 UTC (Tue)
by rahvin (guest, #16953)
I looked into containers for some personal stuff a couple of years ago, and I was boggled at how little information exists about what is in a container. Not only that, but containers are frequently built on other containers, which are themselves built on other containers, and so on (a prior article had an example container built from something like eight different containers). In the end there is all kinds of software of unknown versions in the container. There is no real update mechanism for all that software other than a container update, and unless you track down the whole build process and duplicate it, or do a full audit, you often have no idea what software is even inside the container, let alone what licenses, versions, or security vulnerabilities it has. They are an absolute security nightmare, in my opinion, unless you build your own container from scratch and update it yourself for any type of permanent service. Hell, even a one-off container could create a potential breach into the internal network.
I think if this project can gain traction and acceptance it will actually move toward solving that problem. A manifest, JSON, or something in the header containing a list of the software and the version of everything would do a lot to solve the security issue of "what is actually in this container", let alone the tracking issue for license and other compliance.
Posted Apr 18, 2019 9:30 UTC (Thu)
by LtWorf (subscriber, #124958)
To sell this directly from google cloud market, google has some compliance procedures in place, so for every FOSS component that is not coming as a package in one of their supported distributions (ubuntu is supported) they require some licensing information.
For software covered by the GPL or similar licenses, they require the source code to be included directly in the image itself, so that they avoid redistributing GPL-covered software and later being unable to provide the sources.
They do not allow any AGPL software to run on their machines.
Because we base on ubuntu and we package the internal proprietary software as .deb files, I started adding some license information as machine readable debian/copyright files, and then created a tool that scans all the packages that are not coming from the ubuntu repositories and makes a report that can be sent to google.
Of course this solution is very hand-made and requires the copyright files to be accurate in the first place, but it seems incorrect to say that cloud providers have no interest in this. Being redistributors, they are the ones that are responsible for providing the sources when requested, so they do have an interest in compliance.
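The scanning step described above can be sketched against the machine-readable debian/copyright format (DEP-5); the file text below is invented for illustration, and the commenter's actual tool surely differs:

```python
# Sketch: pulling declared licenses out of a machine-readable
# debian/copyright file (DEP-5 format). The sample text is invented;
# a real report would read one such file per installed package.

SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example-tool

Files: *
Copyright: 2019 Example Corp
License: GPL-2.0+

Files: contrib/*
Copyright: 2018 Someone Else
License: MIT
"""

def declared_licenses(copyright_text):
    """Collect the distinct License: values from a DEP-5 file."""
    return sorted({line.split(":", 1)[1].strip()
                   for line in copyright_text.splitlines()
                   if line.startswith("License:")})

print(declared_licenses(SAMPLE))
# → ['GPL-2.0+', 'MIT']
```

Because DEP-5 is a simple stanza-based text format, even this kind of rough extraction is enough to flag packages whose licenses need the corresponding source shipped alongside the image.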
Posted Apr 23, 2019 22:48 UTC (Tue)
by rahvin (guest, #16953)
Containers are used heavily for services with highly variable loads where they can spin up additional nodes quickly and easily. Most of the major tech companies including AWS use containers extensively behind the scenes. They are literally at the forefront of web deployments these days with all the major providers using them on their cloud platforms. People using these things professionally are just building their own containers (so they control the product) instead of downloading and doing "fire and forget" on some random crap they downloaded from a user built repository.
Posted Apr 25, 2019 8:05 UTC (Thu)
by nilsmeyer (guest, #122604)
It's very easy to get a base setup for container infrastructure running; you can get all the scripts and automation to pull up infrastructure very quickly. However, there is a very steep learning curve to actually understanding everything you have just deployed. There is also a strong lack of policy, and of enforcement of policies, because that's usually not an area that drives delivery of features. As long as the cost-benefit doesn't shift here, many organizations and individuals will continue to wing it and think everything is alright, because nothing ever actually happens, or if something does happen nobody is held accountable.
Posted Apr 18, 2019 13:28 UTC (Thu)
by civodul (guest, #58311)
I have often been citing Hohndel's previous talk as an illustration of the lack of transparency that plagues container images. It hurts not just licensing, but also security, reproducibility, and user freedom. In GNU Guix we have this `guix pack` tool that can build container images (OCI notably). The key difference with things like Dockerfiles is that it's declarative: instead of providing a series of commands to populate your image, you list precisely the packages you want in the image. Since the vast majority of Guix packages are bit-reproducible, since it can "travel back in time" (you can recreate the Guix of last month or last year, and from there rebuild the packages it provides), and since it is now backed by Software Heritage, anyone can recreate a container image and verify that they obtain the same image, bit for bit. The container image can finally be considered a build artifact, and the real source is the Guix manifest and the commit ID used to build it. I hope container image creation tools will converge towards such a declarative and traceable approach. That would be a significant improvement over the current state of affairs.
Posted Apr 18, 2019 14:15 UTC (Thu)
by cyphar (subscriber, #110703)
We have a similar tool in openSUSE (kiwi) which works in a very similar fashion (though it's all configured through XML, a sign of when it was first developed). We use it to build container images as well as other media such as ISOs and VM disk images (it also has support for a bunch of other distros, and integrates into OBS). I completely agree that the lack of widespread use of such tooling really hurts the current state of container images. Distributions figured out how to track what packages are present in ISOs 20 years ago, but with container images we are reliving the same issues, this time with tar archives (and don't get me started on the issues with tar).
Posted Apr 23, 2019 4:21 UTC (Tue)
by marcH (subscriber, #57642)
> He suggested creating a "troll company" to start doing some enforcement activity.
Thanks, that was my thought exactly. Nothing ever goes beyond the prototype stage unless money's involved one way or the other. Double whammy if patent lawyers from the eastern district of Texas[*] get "re-purposed" for that.
Same approach for security: scan, name and shame. Responsible disclosure for people making some effort, zero-day for the dangerous idiots stripping version information.
(Stupid) problem solved, let's all move back to actual software engineering. Life is short.
[*] a number of them recently put out of work by the Supreme court - and Apple
Posted Apr 28, 2019 14:03 UTC (Sun)
by rlhamil (guest, #6472)
And of course, version all this stuff, 'cause it's likely to evolve.