Containers and license compliance

By Jake Edge
May 2, 2018

Containers are, of course, all the rage these days; in fact, during his 2018 Legal and Licensing Workshop (LLW) talk, Dirk Hohndel said with a grin that he hears "containers may take off". But, while containers are easy to set up and use, license compliance for containers is "incredibly hard". He has been spending "way too much time" thinking about container compliance recently and, beyond the standard "let's go shopping" solution to hard problems, has come up with some ideas. Hohndel is a longtime member of the FOSS community who is now the chief open source officer at VMware—a company that ships some container images.

He said that he would be using Docker in his examples, but he is not picking on Docker; it is just a well-known container management system. His talk targets those who want to ship an actual container image, rather than simply a Dockerfile that a customer would build into an image. He has heard of some trying to avoid "distributing" free and open-source software that way, but is rather skeptical of that approach.

Docker "hello, world"

So he looked at the Docker equivalent of "hello, world"; he used Debian as the base and had it run the echo command for the string "Hello LLW2018". Running it in Docker gave the string as expected, but digging around under the hood was rather eye-opening. In order to make that run, the image contained 81 separate packages, "just to say 'hi'". It contains Bash, forty different libraries of various kinds including some for C++, and so on, he said. Beyond that, there is support for SELinux and audit, so the container must be "extremely secure in how it prints 'hello world'".
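One way to arrive at a package count like the one above is to read the image's package database. The following is a minimal sketch, assuming a Debian-based image whose /var/lib/dpkg/status file has been copied out of the container; the sample stanzas and the function name installed_packages are made up for illustration.

```python
# Sketch: count installed packages from the text of a dpkg status file.
# SAMPLE_STATUS is invented illustration data, not taken from a real image.

SAMPLE_STATUS = """\
Package: bash
Status: install ok installed
Version: 4.4-5

Package: libselinux1
Status: install ok installed
Version: 2.6-3+b3

Package: oldpkg
Status: deinstall ok config-files
Version: 1.0-1
"""


def installed_packages(status_text):
    """Return {package name: version} for stanzas marked as installed."""
    packages = {}
    # Stanzas in the status file are separated by blank lines.
    for stanza in status_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them here.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields.get("Status") == "install ok installed" and "Package" in fields:
            packages[fields["Package"]] = fields.get("Version", "unknown")
    return packages


pkgs = installed_packages(SAMPLE_STATUS)
for name, version in sorted(pkgs.items()):
    print(name, version)
print(len(pkgs), "packages installed")
```

This only works when the image still carries its package database, and it says nothing yet about licenses or source locations.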

[Dirk Hohndel]

In reality, most containers are far more complex, of course. For example, it is fairly common for Dockerfiles to wget a binary of gosu ("Simple Go-based setuid+setgid+setgroups+exec") to install it. This is bad from a security perspective, but worse from a compliance perspective, Hohndel said.

People do "incredibly dumb stuff" in their Dockerfiles, including adding new repositories with higher priorities than the standard distribution repositories, then doing an update. That means the standard packages might be replaced with others from elsewhere. Once again, that is a security nightmare, but it may also mean that there is no source code available and/or that the license information is missing. This is not something he made up, he said; if you look at the Docker repositories, you will see this kind of thing all over, and many people will just copy their Dockerfiles from elsewhere.

Even the standard practices are somewhat questionable. Specifying "debian:stable" as the base could change what gets built between two runs. Updating to the latest packages (e.g. using "apt-get update") is good for the security of the system, but it means that you may get different package versions every time you rebuild. Information on versions can be extracted from the package database on most builds, though there are "pico containers" that remove that database in order to save space—making it impossible to know what is present in the image.
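The practices described so far (floating base tags, extra repositories, binaries fetched at build time, and unpinned updates) can be spotted mechanically in a Dockerfile. Here is a rough sketch of such a check; the rule list, the messages, and the sample Dockerfile (including the example.org URLs) are assumptions made for illustration, not an established tool or a complete set of checks.

```python
import re

# Sketch: flag Dockerfile lines that match the patterns discussed above.
# SAMPLE_DOCKERFILE and RULES are hypothetical illustration data.

SAMPLE_DOCKERFILE = """\
FROM debian:stable
RUN echo "deb http://example.org/repo stable main" >> /etc/apt/sources.list
RUN apt-get update && apt-get install -y wget
RUN wget -O /usr/local/bin/gosu https://example.org/gosu-amd64
CMD ["echo", "Hello LLW2018"]
"""

RULES = [
    # A base image named by tag only may resolve to different content later.
    (re.compile(r"^FROM\s+(?=\S)(?!scratch\b)(?!\S+@sha256:)", re.I),
     "base image not pinned by digest"),
    (re.compile(r"\b(wget|curl)\b.*https?://", re.I),
     "downloads a file from the network at build time"),
    (re.compile(r"sources\.list|add-apt-repository", re.I),
     "adds or modifies a package repository"),
    (re.compile(r"apt-get\s+(update|upgrade|dist-upgrade)", re.I),
     "package versions depend on when the image is built"),
]


def lint(dockerfile_text):
    """Return a list of (line number, message) pairs for matching lines."""
    findings = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                findings.append((number, message))
    return findings


for number, message in lint(SAMPLE_DOCKERFILE):
    print(f"line {number}: {message}")
```

A check like this only sees the one Dockerfile it is given; anything inherited from the base image is out of its view.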

It gets worse

But it gets even worse, Hohndel said. Most people start with a Dockerfile they just find somewhere. If you look at the Dockerfile for Elasticsearch, for example, it installs gosu and uses the Dockerfile for OpenJDK 8, which in turn uses other Dockerfiles. One of those is for Debian "stretch", which also updates all of the packages.

There is a "rabbit hole" that you need to follow, Dockerfile to Dockerfile, to figure out what you are actually shipping. He searched the official Docker images and did not find a single one that follows compliance best practices. All of the Dockerfiles grab other Dockerfiles, on and on.
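Following that rabbit hole amounts to reading the FROM line of each Dockerfile and then looking up the Dockerfile of the image it names. A small sketch of that walk follows; the DOCKERFILES table and its image names are hypothetical stand-ins for Dockerfiles that, in practice, have to be located and fetched one by one.

```python
# Sketch: follow FROM lines from one Dockerfile to the next.
# DOCKERFILES maps image names to Dockerfile text (hypothetical contents).

DOCKERFILES = {
    "example/app:1.0": "FROM example/runtime:8\nRUN install-app\n",
    "example/runtime:8": "FROM example/base-os:stretch\nRUN install-runtime\n",
    "example/base-os:stretch": "FROM scratch\nADD rootfs.tar.xz /\n",
}


def base_image(dockerfile_text):
    """Return the image named by the first FROM line, or None."""
    for line in dockerfile_text.splitlines():
        words = line.split()
        if words and words[0].upper() == "FROM":
            # Skip options such as --platform=...; multi-stage builds
            # (several FROM lines) are not handled in this sketch.
            names = [w for w in words[1:] if not w.startswith("--")]
            return names[0] if names else None
    return None


def from_chain(image, dockerfiles):
    """List the images reached by following FROM lines, starting at image."""
    chain = [image]
    while image != "scratch":
        text = dockerfiles.get(image)
        if text is None:
            chain.append("(Dockerfile not available)")
            break
        image = base_image(text)
        if image is None or image in chain:
            break  # malformed Dockerfile or a cycle
        chain.append(image)
    return chain


print(" -> ".join(from_chain("example/app:1.0", DOCKERFILES)))
```

Even when the chain ends cleanly at scratch, the last step is often an opaque root filesystem tarball, so the walk identifies where to look rather than what is inside.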

No one wants to hear about these problems, Hohndel said; he has tried. He is a big fan of free software, but not really a fan of enforcement; he would rather simply fix the problems. But in order to fix these problems, people have to understand and care about compliance. He has been to KubeCon, and will be again soon, trying to educate folks about these problems. At one of the talks, he asked how many copyleft packages were in a particular Docker image, but he just got blank stares.

In the container image for an uncomplicated three-tier application, he counted 650 packages. The problem is only getting worse, he said. It is "incredibly hard" to get compliance right if it is done at build time, but it is "pretty much impossible" to do after that point. It is important to get people to understand that the complexity of what they are shipping in containers is much greater than what a few simple commands might indicate.

The problems with container images are many. It is hard to figure out which packages are included in the build. The versions of those packages, and which patches have been applied to them, are also difficult to determine. Beyond that, the licenses under which those packages are distributed are not obvious. He has seen containers that try to save space by statically linking various pieces that may not be linkable based on their licenses.
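Once a license identifier has been collected for each package, the question Hohndel asked at KubeCon (how many copyleft packages are in the image?) becomes a simple tally. The sketch below assumes SPDX-style identifiers and treats only the GPL, LGPL, and AGPL families as copyleft; the package names and their licenses are invented for illustration.

```python
from collections import Counter

# Sketch: tally packages by license family.
# LICENSES is hypothetical data; identifiers are assumed to be SPDX-style.

COPYLEFT_PREFIXES = ("GPL-", "LGPL-", "AGPL-")

LICENSES = {
    "pkg-a": "GPL-3.0-or-later",
    "pkg-b": "LGPL-2.1-or-later",
    "pkg-c": "MIT",
    "pkg-d": "BSD-3-Clause",
    "pkg-e": "unknown",
}


def is_copyleft(license_id):
    """True if the identifier belongs to one of the listed copyleft families."""
    return license_id.upper().startswith(COPYLEFT_PREFIXES)


def tally(licenses):
    """Count packages as 'copyleft', 'other', or 'unknown'."""
    counts = Counter()
    for license_id in licenses.values():
        if license_id == "unknown":
            counts["unknown"] += 1
        elif is_copyleft(license_id):
            counts["copyleft"] += 1
        else:
            counts["other"] += 1
    return counts


print(dict(tally(LICENSES)))
```

Compound license expressions and packages with several licenses are not handled here, and in practice the "unknown" bucket is likely to be the one that needs the most attention.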

The tooling that the industry has developed makes it quick and easy to throw together an image. But it also, "hopefully unintentionally", makes it easy to create a "total compliance nightmare", Hohndel said.

What should be done

Telling people to stop shipping containers is not going to work, so another approach is needed. Containers need to be built starting from a base that has known-good package versions, corresponding source code, and licenses. The anti-pattern of installing stuff from random internet locations needs to be avoided. And software developers need to be trained about the pitfalls of the container build systems, which should not be hard, but is.

Any layers that will be added on top of the base need to be tracked as well. The versions, source location, and licenses should all be stored and a source-code management system should be used to track the information over time. One way to do so is to annotate the Dockerfiles with the meta information about the packages, though creating these annotations is hard, he said.
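As an illustration of the kind of record that could be kept, the sketch below stores the version, source location, and license for each component and writes them out as sorted JSON, so that successive versions of the file produce readable diffs in a source-code management system. The Component fields, the bom_json helper, and the sample entries are assumptions for this example rather than any established format.

```python
import json
from dataclasses import asdict, dataclass

# Sketch: a per-layer record of what was added, serialized deterministically.
# Field names and sample values are hypothetical.


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    source_url: str
    license: str
    layer: str


def bom_json(components):
    """Serialize components as JSON with a stable ordering."""
    entries = sorted(
        (asdict(c) for c in components),
        key=lambda entry: (entry["layer"], entry["name"]),
    )
    return json.dumps(entries, indent=2, sort_keys=True) + "\n"


components = [
    Component("example-tool", "2.1.0", "https://example.org/src/example-tool",
              "MIT", "app"),
    Component("example-lib", "1.4.2", "https://example.org/src/example-lib",
              "LGPL-2.1-or-later", "base"),
]

print(bom_json(components), end="")
```

Because the output is sorted, adding or upgrading one component changes only a few lines of the stored file, which makes the history easier to review.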

VMware has started the Tern project to help automate the creation of a bill of materials (BOM) for a container image. It will determine what packages are present in the image from the Dockerfile, but it also understands some of the commands that are used in Dockerfiles to retrieve and install packages, so it can track those too. It is a work in progress, Hohndel said, but may be helpful for container compliance.

[I would like to thank the LLW Platinum sponsors, Intel, the Linux Foundation, and Red Hat, for their travel assistance support to Barcelona for the conference.]

Index entries for this article
Conference: Free Software Legal & Licensing Workshop/2018

Containers and license compliance

Posted May 3, 2018 1:15 UTC (Thu) by Eliot (guest, #19701) [Link] (1 responses)

Are XKCD and LWN cooperating this week? https://xkcd.com/1988

Containers and license compliance

Posted May 3, 2018 17:16 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

I would rather attribute this to awesome KubeCon+CloudNativeConf happening _right now_.

Containers and license compliance

Posted May 3, 2018 5:31 UTC (Thu) by unixbhaskar (guest, #44758) [Link] (3 responses)

Off topic: two years back, when I was fiddling with containers to earn a quick buck, I heard "container fanboys" talking about running a DNS server in containers!!!!! I was horrified. It was scary. I mean, they have lost their minds completely.

Containers and license compliance

Posted May 3, 2018 13:11 UTC (Thu) by mageta (subscriber, #89696) [Link] (1 responses)

Huh? Sorry, I don't get it. What's wrong with running internet-facing services in containers for a bit of extra isolation?

Containers and license compliance

Posted May 3, 2018 17:15 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

Or using internal DNS for service discovery inside Kubernetes cluster, for that matter?

Containers and license compliance

Posted May 4, 2018 1:18 UTC (Fri) by khim (subscriber, #9252) [Link]

DNS servers (well, bind specifically) were run in containers before containers existed! Yeah, it was called "chroot jail" back then, but most distributions supported that mode for DECADES!

Containers and license compliance

Posted May 3, 2018 11:29 UTC (Thu) by patrakov (subscriber, #97174) [Link]

I think there is some analogy here with the Java world where a similar compliance problem exists. The root is that people just don't care as long as it works.

Containers and license compliance

Posted May 3, 2018 13:29 UTC (Thu) by flussence (guest, #85566) [Link] (3 responses)

At this point, these “Web Dev” Containers seem harder, more bloated, more of a liability, and all-around worse than simply running a distro in a traditional jail/openvz or even Xen setup.

Containers and license compliance

Posted May 3, 2018 15:13 UTC (Thu) by k8to (guest, #15413) [Link] (2 responses)

I think there's tension in that "small containers" have more potential benefits, but require a lot more up-front planning and work. So there's just a steady slide towards containers being entire OS images and fairly unknown behavior in how they're put together.

Containers and license compliance

Posted May 3, 2018 18:41 UTC (Thu) by jdulaney (subscriber, #83672) [Link] (1 responses)

I build my base image myself, and then build everything on top of that. I'm pretty sure I know what is going on with my containers.

Containers and license compliance

Posted May 8, 2018 15:20 UTC (Tue) by k8to (guest, #15413) [Link]

By "small containers" I was referring to minimal containers where there is not even a base image. Maybe a couple of executables and a few config files, or perhaps dumbinit and a single server program. There's a lot of benefit to not having a whole OS image, but this doesn't seem to be on the radar of most folks.

Containers and license compliance

Posted May 3, 2018 15:17 UTC (Thu) by MatyasSelmeci (guest, #86151) [Link] (3 responses)

Docker Hub keeps the build logs around for containers. Can't you inspect those to find out exactly what got installed?

Containers and license compliance

Posted May 4, 2018 0:17 UTC (Fri) by nishak (guest, #122100) [Link] (2 responses)

You could, but for images that are built on top of other images you would have to find the page hosting that image's build log, and so on, until you come to one that is built using FROM scratch. And if all that Dockerfile had was ADD rootfs.tar.gz, and if the manifest for the build is not published, then you're SOL.
Incidentally, I haven't found any build logs hosted on Dockerhub for official images. Where do I find those?

Containers and license compliance

Posted May 4, 2018 14:52 UTC (Fri) by MatyasSelmeci (guest, #86151) [Link] (1 responses)

Ah, I see the problem then.

Also, looks like you can only see build logs for "automated build" images so that's unfortunate.

Containers and license compliance

Posted May 5, 2018 0:41 UTC (Sat) by rahvin (guest, #16953) [Link]

It's pointed out in the article that a bunch of these containers pull other containers which pull other containers, etc.. In the end you have a container that's pulling a half dozen other containers and no one even knows what's installed in the container to the point that he provides an example where there are 600+ software packages installed in a single container.

The point of the article is how hard it is to figure out whether you are complying with the licenses in such a situation, but the only thing I could think of is what a security nightmare that is. If your container image is pulling other container images, you probably can't easily track down what's even installed; even if the first container is well documented, there is no guarantee that all of the referenced containers are.

I know Docker is popular, but this is one of the things that stops me every time I think of using a Docker container; they are way too black-box for me.

Containers and license compliance

Posted May 4, 2018 16:06 UTC (Fri) by civodul (guest, #58311) [Link]

With guix pack we provide a way to provision containers in a declarative and bit-reproducible fashion: for a given commit of Guix, guix pack -f docker python python-numpy (say) always produces the same Docker image, bit-for-bit. That makes it easy to create those images and provides provenance tracking—no need to walk a whole bunch of Dockerfiles and possibly volatile repos.

I think that makes the licensing situation safer; the bits in the image don't matter much once you have an automated and reproducible way to reconstruct them.

Containers and license compliance

Posted May 5, 2018 11:42 UTC (Sat) by justincormack (subscriber, #70439) [Link]

If you are logged in, the licensing and package information is available, e.g. https://hub.docker.com/r/library/mysql/tags/8/

You can definitely make it difficult, but containers are also useful for building well designed build pipelines with metadata, but there is a big divergence between the dont cares.

If you have traceability issues with the official Docker images, please open an issue at the relevant repo in https://github.com/docker-library - they should be traceable from git commits there, and package information should not be removed. The tarballs are provided by upstream and should also be identifiable.

Containers and license compliance

Posted Oct 9, 2018 23:00 UTC (Tue) by grob (guest, #127762) [Link]

This is a subject close to my heart having just built a GPL-untainted base image over the course of the last 10 days or so after getting "the fear" - specifically of Java's GPL/Oracle proprietary license combo.

I documented my journey over here - https://medium.com/@robbie.gibbon/why-i-built-my-own-os-d.... So far as I know, my "gpl-free-base-image" is the only thing of its kind out there. It's built on mksh, the heirloom UNIX Sys V core tools code drop (nearly 20 years old!), musl libc, python, and the usual lineup of libraries (ncurses, openssl [so perhaps not 100% compliant with US cryptographic export rules at the moment], libffi, the antique Minix cawf and a few others).

My reasoning is that keeping control firmly in the user's hands and empowering them to extend the system with the GPL package versions that they want and from the distribution server of their choice is making every effort to comply with the ethos of the user-oriented freedoms aspects of the GPL. But it means that on every container initialisation, the user will likely need to bootstrap a download of glibc, openjdk, bash, GNU coretools etc. Of course depending on the needs of the application - for example a Node.JS application with node linking against Musl could probably run without any of that stuff needing to be downloaded. Anyway whilst really, really far from perfect, it does ensure that the end user retains governance, lineage and control over key components that are running in the container, and makes license compliance somewhat simpler for me as the distributor.

The best answer would be to reengineer the system to enable the user to substitute arbitrary interstitial slices of a container image, perhaps even only providing a metadata 'spec' of what services those slices need to provide.

I concluded that Docker Images as they stand are quite imperfect technology, there is simply no easy, automatic, idempotent and market ready way to build a both [[L][A]GPL in particular] license compliant and secure image with the technology as it stands today, and the traditional commercial Linux distributors, rather than stepping up to this challenge and solving these issues (perhaps even making some $$ in the process) - and in particular I'm thinking of Red Hat and SuSE - have stuck their collective heads in the sand (https://opensource.com/article/18/1/containers-gpl-and-co...) - whilst Canonical appears to have chosen to sidestep the issue and absolve themselves of any sin by making proprietary ISV sublicensing of Ubuntu nearly impossible (at least as I understand it).

Quite cool technology, but I just wish RH would step up and make an innovation here on the free software license compliance, lineage and security fronts; otherwise I think I may have to make one myself!! Of course there's always the Microsoft Windows Container runtime - well I guess they hit one out of the three big concerns...


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds