An audit container ID proposal

By Jonathan Corbet
March 29, 2018

The kernel development community has consistently resisted adding any formal notion of what a "container" is to the kernel. While the needed building blocks (namespaces, control groups, etc.) are provided, it is up to user space to assemble the pieces into the sort of container implementation it needs. This approach maximizes flexibility and makes it possible to implement a number of different container abstractions, but it also can make it hard to associate events in the kernel with the container that caused them. Audit container IDs are an attempt to fix that problem for one specific use case; they have not been universally well received in the past, but work on this mechanism continues regardless.

The audit container ID mechanism was first proposed (without an implementation) in late 2017; an earlier article summarized the discussion at that time. The idea was to attach a user-space-defined ID to all of the processes within a container; that ID would then appear in any events emitted by the audit subsystem. Thus, for example, if the auditing code logs an attempt to open a file, monitoring code in user space would be able to use the container ID in the audit event to find the container from which the attempt originated.

Richard Guy Briggs posted an implementation of the container-ID concept in mid-March. In this proposal, IDs for containers are unsigned 64-bit values; the all-ones value is reserved as a "no ID has been set" sentinel. A new file (containerid) is added to each process's /proc directory; a process's container ID can be set by writing a new value to that file. There are, however, a few restrictions on how that ID can be set:

The CAP_AUDIT_CONTROL capability is required to change this value. The necessary capability was the subject of a fair amount of discussion when the container-ID idea was first floated. The initial plan was to create a new capability for this specific purpose, but that ran into opposition. CAP_AUDIT_CONTROL exists to give access to audit filtering rules and such; extending it to cover the container ID wasn't the preferred option of the audit developers, but they seem to have accepted it in the end.
A process cannot set its own container ID; that must be done by some other process.
A process's container ID can only be set once after the process is created. This is actually implemented by allowing the change if the current container ID is either the all-ones sentinel or equal to the parent process's container ID.
A process's container ID can only be set if the process has no children or threads. The purpose of this restriction seems to be to prevent a process from circumventing the "can't set your own container ID" rule by creating a child to do it. Since the single-set rule depends on comparing against the parent's container ID, allowing that ID to be changed for processes with children could be used to circumvent that rule as well.
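
The four restrictions above can be read as a single permission check. What follows is a minimal user-space sketch of that logic, not the kernel code from the patch set: the Task class, its fields, and the may_set_container_id() function are all hypothetical names invented for illustration, and only the rules themselves come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# All-ones 64-bit value: the "no ID has been set" sentinel from the proposal.
CONTAINER_ID_UNSET = 2**64 - 1


@dataclass
class Task:
    """Hypothetical stand-in for a process; not a kernel structure."""
    pid: int
    parent: Optional["Task"] = None
    container_id: int = CONTAINER_ID_UNSET
    has_cap_audit_control: bool = False
    children: List["Task"] = field(default_factory=list)
    thread_count: int = 1  # 1 means only the main thread exists


def may_set_container_id(setter: Task, target: Task) -> bool:
    """Return True if the four restrictions would allow the write."""
    if not setter.has_cap_audit_control:
        return False  # CAP_AUDIT_CONTROL is required
    if setter is target:
        return False  # a process cannot set its own container ID
    parent_id = target.parent.container_id if target.parent else CONTAINER_ID_UNSET
    if target.container_id not in (CONTAINER_ID_UNSET, parent_id):
        return False  # the ID has already been set once
    if target.children or target.thread_count > 1:
        return False  # the target must have no children or extra threads
    return True


launcher = Task(pid=100, has_cap_audit_control=True)
worker = Task(pid=101, parent=launcher)
print(may_set_container_id(launcher, worker))  # True
```

Note how the single-set rule is expressed purely as a comparison against the sentinel and the parent's ID, which is why a target with children has to be refused: changing its ID would change what its children are compared against.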

Once a process's container ID has been set, any subsequent child processes will inherit the same ID. Otherwise, the kernel does almost nothing with this ID value, with one exception: events generated by the audit subsystem will include this ID if it has been set. The user-space tools have been patched to be able to make use of the container ID when it is present.
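
As a rough illustration of the two behaviors just described (inheritance by children, and inclusion of the ID in audit events only when one has been set), here is a small model. It is a sketch under stated assumptions: the Task class, the fork() method, the record layout, and the "contid" field name are all made up for this example and do not reflect the actual audit record format.

```python
# All-ones 64-bit value: the "no ID has been set" sentinel from the proposal.
CONTAINER_ID_UNSET = 2**64 - 1


class Task:
    """Hypothetical stand-in for a process."""

    def __init__(self, pid, container_id=CONTAINER_ID_UNSET):
        self.pid = pid
        self.container_id = container_id

    def fork(self, child_pid):
        # A child starts out with a copy of its parent's container ID.
        return Task(child_pid, self.container_id)


def format_audit_event(task, message):
    """Build an illustrative record; the field names are assumptions."""
    fields = [f"pid={task.pid}"]
    if task.container_id != CONTAINER_ID_UNSET:
        # The ID is only reported when one has actually been set.
        fields.append(f"contid={task.container_id}")
    fields.append(f"msg={message!r}")
    return " ".join(fields)


init = Task(pid=1)
contained = Task(pid=500, container_id=42)
child = contained.fork(child_pid=501)

print(format_audit_event(init, "open"))   # pid=1 msg='open'
print(format_audit_event(child, "open"))  # pid=501 contid=42 msg='open'
```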

There is an interesting intersection between container IDs and network namespaces, though. Possibly interesting events can happen in a network namespace, but some of these events can be difficult to associate with a specific container. The rejection of a packet by firewall rules would be one example. The fact that multiple containers can exist within a single network namespace complicates the picture here. To address this problem, the patch set adds a list to each network namespace tracking the container IDs of all processes running inside that namespace. When an auditable event occurs involving that namespace that cannot be tied to a specific process, all of the relevant container IDs will be emitted with the event.
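
The per-namespace bookkeeping might be modeled along the following lines. This is only a sketch: the NetNamespace class, its method names, and the record layout are hypothetical, and the use of a per-ID process count (so that an ID disappears when its last process leaves) is an assumption of this example rather than something the article describes.

```python
from collections import Counter

# All-ones 64-bit value: the "no ID has been set" sentinel from the proposal.
CONTAINER_ID_UNSET = 2**64 - 1


class NetNamespace:
    """Hypothetical model of a network namespace and its container IDs."""

    def __init__(self, name):
        self.name = name
        self._ids = Counter()  # container ID -> number of processes using it

    def add_process(self, container_id):
        if container_id != CONTAINER_ID_UNSET:
            self._ids[container_id] += 1

    def remove_process(self, container_id):
        if container_id in self._ids:
            self._ids[container_id] -= 1
            if self._ids[container_id] == 0:
                del self._ids[container_id]  # last process with this ID left

    def container_ids(self):
        return sorted(self._ids)


def audit_netns_event(ns, message):
    """Report an event with no owning process: list every relevant ID."""
    ids = ",".join(str(i) for i in ns.container_ids()) or "none"
    return f"netns={ns.name} contid={ids} msg={message!r}"


ns = NetNamespace("ns0")
for cid in (7, 7, 9):
    ns.add_process(cid)
print(audit_netns_event(ns, "packet rejected"))
# netns=ns0 contid=7,9 msg='packet rejected'
```

The point of the model is the last function: because a rejected packet cannot be attributed to one process, the event carries every container ID present in the namespace, and it is left to the monitoring code to narrow things down.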

One open question is whether the proposed ptags mechanism might not be a better solution to this problem. This patch set is essentially enabling the application of a specific tag to processes; ptags provides that capability in a more general way. It is easy enough to see why the audit developers would prefer the current path: ptags is an out-of-tree patch that, in its current form, depends on the eternally in-progress security-module stacking work. The audit container ID patches are, instead, relatively simple and could conceivably be merged in the relatively near future.

The approach that some developers find easiest is not always the one the community decides to adopt. This time around, though, the simple approach may well win out. Asking the audit developers to solve the module-stacking problem would be a tall order for even the most intransigent of kernel developers. If a version of this patch set is merged, though, it will represent in a small way the first addition of the concept of a container to the kernel; we may yet see some resistance to doing that.

Posted Mar 30, 2018 11:00 UTC (Fri) by josh (subscriber, #17465)

So, you have to have already started the process, and then set the ID from outside the process. That seems like a painful synchronization problem requiring a cooperative process regardless.

Posted Mar 30, 2018 12:03 UTC (Fri) by comio (subscriber, #115526)

I think that only the launcher (i.e. docker executable?) will have the CAP_AUDIT_CONTROL enabled. The other processes will just inherit the value during the fork/clone syscall.

Posted Mar 30, 2018 21:36 UTC (Fri) by nix (subscriber, #2304)

Also the process can't have started any children or threads yet.

Definitely this requires cooperation, or PTRACE_O_TRACEEXEC to stop it instantly (which seems like horrible overkill, but then, this whole API seems like a horrible hack to me: ptags is obviously better in every way except if you're only looking at auditing and nothing else).

Posted Apr 1, 2018 12:41 UTC (Sun) by cyphar (subscriber, #110703)

> A new file (containerid) is added to each process's /proc directory; a process's container ID can be set by writing a new value to that file.

Oh dear lord. I haven't been following the proposals as closely since they were reposted, but this is still getting really strange.

I think Casey Schaufler was right in the first review cycle[1]. If we're going to add a process tagging system, we should just add a generic one (like Jose Bollo's PTAGS) and then audit can make use of it. I can think of several things that PTAGS is useful for, while I can only think of one thing that /proc/$pid/containerid would be useful for (and it wouldn't even be useful for *container runtimes* -- only for audit).

From the original thread, the argument against (ab)using something like PTAGS for the purpose of audit was:

> We would love to have a generic kernel facility that the audit subsystem could use to identify containers, but we don't, and previous attempts have failed, so we have to create our own. [...] If a more general solution appears in the future I think we would make every effort to migrate to that; keeping this initial effort small should make that easier.

The argument was effectively that "there isn't a generic kernel facility, nobody is willing to merge one, but we need something for audit and if there was a generic facility we would use it". Surely someone pushing for an audit-specific process tagging system (called /proc/$pid/containerid even though it's specific to audit) should be enough reason for the relevant maintainers to consider something like PTAGS more seriously?

I'm sure that the PTAGS-audit integration would have some quirks (I imagine CAP_AUDIT_CONTROL will be a point of contention) but I'm sure something like "security.*" xattr namespacing would be applicable.

[1]: https://lwn.net/Articles/740765/