Designing ELF modules

By Jonathan Corbet
March 13, 2018

The bpfilter proposal posted in February included a new type of kernel module that would run as a user-space program; its purpose is to parse and translate iptables rules under the kernel's control but in a contained, non-kernel setting. These "ELF modules" were reposted for review as a standalone patch set in early March. That review has happened; it is a good example of how community involvement can improve a special-purpose patch and turn it into a more generally useful feature.

ELF modules look like ordinary kernel modules in a number of ways. They are built from source that is (probably) shipped with the kernel itself, they are compiled to a file ending in .ko, and they can be loaded into the kernel with modprobe. Rather than containing a real kernel module, though, that .ko file holds an ordinary ELF binary, as a user-space program would. When the module is "loaded", a special process resembling a kernel thread is created to run that program in user mode. The program will then provide some sort of service to the kernel that is best not run within the kernel itself.
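To make the distinction concrete: a conventional .ko file is a relocatable ELF object (type ET_REL), while an ordinary user-space program is of type ET_EXEC or ET_DYN. The sketch below shows one way a loader could tell the two apart from the ELF header alone. It is only an illustration. The names classify_image() and enum image_kind are invented here, the code assumes a 64-bit image in host byte order, and nothing in it should be read as the way the patch set actually makes this decision.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a file handed to the module loader. */
enum image_kind {
    IMAGE_NOT_ELF,
    IMAGE_KERNEL_MODULE,   /* relocatable object, as in a normal .ko */
    IMAGE_USER_PROGRAM,    /* executable, as a user-space binary would be */
    IMAGE_OTHER_ELF
};

/* Inspect the ELF header at the start of buf (assumes 64-bit, host-endian). */
static enum image_kind classify_image(const void *buf, size_t len)
{
    Elf64_Ehdr hdr;

    assert(buf != NULL);
    if (len < sizeof(hdr))
        return IMAGE_NOT_ELF;
    memcpy(&hdr, buf, sizeof(hdr));   /* copy out to avoid unaligned access */

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return IMAGE_NOT_ELF;

    switch (hdr.e_type) {
    case ET_REL:
        return IMAGE_KERNEL_MODULE;
    case ET_EXEC:
    case ET_DYN:
        return IMAGE_USER_PROGRAM;
    default:
        return IMAGE_OTHER_ELF;
    }
}
```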

In general, the community's reaction to this feature may have been expressed best by Greg Kroah-Hartman: "this is crazy stuff, but I like the idea and have no objection to it overall". ELF modules give the kernel a controlled way to run user-space helper code, and they make it easy to develop and distribute that code with the kernel itself. That latter aspect, in particular, distinguishes ELF modules from the existing "usermode helper" mechanism, which depends on programs developed and shipped separately from the kernel. It's clear that some developers see uses for this feature beyond the bpfilter subsystem, and would like for those uses to be supported as well.

Beyond rule translation

Consider, for example, one branch of the discussion where Andy Lutomirski raised concerns that the current implementation might break systems that load an ELF module during system boot. Alexei Starovoitov, the author of the patches, responded: "There is no intent to use umh modules during boot process. This is not a replacement for drivers and kernel modules". Instead, he said, this feature is aimed at one specific use: converting iptables rules to BPF programs. But some developers, including Kroah-Hartman, are clearly looking further ahead:

You are creating a very generic, new, user/kernel api that a whole bunch of people are going to want to use. Let's not hamper the ability for us all to use this right from the beginning please.

In particular, he sees uses for these modules as a way to implement USB drivers in user space, perhaps bringing some existing user-space drivers into the kernel tree in the process.

Making ELF modules serve the more general use case may require a number of changes to the patch set. As Linus Torvalds pointed out, there is a significant difference between standard kernel modules and the current implementation of ELF modules. When the process of loading a standard module completes, that module has registered itself with all of the requisite subsystems and is ready to respond to requests from the kernel or user space. The end of the loading process for an ELF module, though, only indicates that the program in the module has started executing. It may not yet be ready to answer requests or provide services and, should something go wrong in its initialization process, it may crash and never get to that point.

The answer to this problem (and a couple of others), according to Torvalds, is to make the execution of ELF modules synchronous, in that a modprobe invocation would not complete until the process that was started to run the module's code has exited. For short-duration tasks, the final exit status could reflect the success of the operation itself, which is not possible in the current implementation. For a long-running module, the code could fork and return a success status once initialization is complete, giving a clear indication that the module is ready to do its work.
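The fork-and-report pattern described here is the familiar one used by user-space daemons. A minimal sketch, in ordinary POSIX C, might look like the following. The start_service() helper and the example callbacks are hypothetical names invented for illustration, and the one-byte "ready" protocol is an assumption rather than anything proposed in the discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a worker that runs init() and then service(). The caller gets 0
 * only after the worker reports that init() succeeded, and -1 if init()
 * failed or the worker died before reporting.
 */
static int start_service(int (*init)(void), void (*service)(void))
{
    int ready[2];
    char status = 1;
    ssize_t n;
    pid_t pid;

    assert(init != NULL && service != NULL);
    if (pipe(ready) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {
        /* Worker: initialize, report the result, then keep running. */
        close(ready[0]);
        status = (init() == 0) ? 0 : 1;
        if (write(ready[1], &status, 1) != 1 || status != 0)
            _exit(EXIT_FAILURE);
        close(ready[1]);
        service();
        _exit(EXIT_SUCCESS);
    }

    /* Caller: block until the worker reports, or until the pipe closes. */
    close(ready[1]);
    do {
        n = read(ready[0], &status, 1);
    } while (n < 0 && errno == EINTR);
    close(ready[0]);

    if (n != 1 || status != 0) {
        waitpid(pid, NULL, 0);   /* reap the failed worker */
        return -1;
    }
    return 0;
}

/* Trivial example callbacks. */
static int init_ok(void) { return 0; }
static int init_fail(void) { return -1; }
static void serve_nothing(void) { }
```

A program built this way would exit with a success status as soon as start_service() returns 0, leaving the worker running. A crash during initialization shows up as end-of-file on the pipe and is reported as a failure.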

Some other changes would be required to make ELF modules suitable for other use cases. Currently there is no means of communication between the module and the kernel beyond the standard system calls. If ELF modules are to be used for tasks like driving a new device, there will need to be a way to pass control of that device to the module from the kernel, among other things. A number of these issues could apparently be handled by opening a pipe between the kernel and the module when it is launched and using it for communications between the two.
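What might travel over such a pipe was not specified in the discussion. As a purely illustrative sketch, the user-space side could frame each request as a length followed by a payload. The names send_msg(), recv_msg(), and MSG_MAX below are assumptions made up for this example, as is the wire format (a 4-byte length in host byte order).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

#define MSG_MAX 256   /* hypothetical upper bound on payload size */

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; end-of-file before that counts as an error. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte length, then the payload. */
static int send_msg(int fd, const void *payload, uint32_t len)
{
    assert(payload != NULL || len == 0);
    if (len > MSG_MAX)
        return -1;
    if (write_full(fd, &len, sizeof(len)) < 0)
        return -1;
    return write_full(fd, payload, len);
}

/* Receive one message into buf; returns the payload length or -1. */
static int recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t len;

    if (read_full(fd, &len, sizeof(len)) < 0)
        return -1;
    if (len > cap || len > MSG_MAX)
        return -1;   /* refuse oversized messages rather than truncate */
    if (read_full(fd, buf, len) < 0)
        return -1;
    return (int)len;
}
```

Passing control of a device would need more than byte streams (file-descriptor passing, for example), which is part of why the discussion treated the pipe only as a starting point.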

A trickier problem may have to do with modules that need some sort of filesystem access to operate. The access itself can be provided, but it can be difficult to write such code in a way that doesn't assume some sort of filesystem layout (the existence and contents of /dev, for example) in the underlying system. The kernel tries hard not to impose such policies on user space, and nobody would like to see that change with ELF modules.

Security concerns

Another issue that came up in the conversation is security. Kees Cook argued that there were a number of security issues with ELF modules. They run with full privileges regardless of the privilege level of the process that caused them to be loaded, and they run in the root namespace even if they were loaded in response to a request from inside a container. Most of the security concerns have been pushed aside for a simple reason: standard kernel modules run with full privileges inside the kernel itself. Even a process running as root is not as privileged as a normal kernel module, so it is unlikely that adding this feature will make the system less secure, especially if module signing is used to limit the modules that can be loaded.

One interesting exception did turn up later in the conversation, though. As Torvalds pointed out, there is a race window between the time that the module signature is checked and when the code is actually loaded into memory and executed; an attacker with the CAP_SYS_MODULE capability could exploit this window to replace the code between those two steps. That escalates the ability to run an existing, signed module into the ability to run arbitrary code as root. One way of addressing this issue would be the synchronous behavior described above. The kernel could take control of the file containing the module, marking it as non-writable, for the duration of the module's execution.

Another possible solution would be to load the code into kernel memory first, perform the check, then execute from that copy of the code. Lutomirski, in a separate part of the discussion, had suggested a mechanism where the code would be stored as a binary blob within a standard kernel module; the kernel would then execute the contents of the blob after loading the module. This approach, too, would avoid the race window described above. It would also make the ELF-module functionality work in non-modular kernels (assuming the module is built in, of course) and enable tighter integration with the rest of the kernel.
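The "copy first, then check, then use the copy" ordering is the standard cure for this sort of check-to-use race. The user-space sketch below illustrates the ordering only. load_verified() is a hypothetical helper, and toy_digest() (an FNV-1a hash) is a deliberately simple stand-in for real signature verification, which this example does not attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Toy stand-in for a signature check: 32-bit FNV-1a over the buffer. */
static uint32_t toy_digest(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Read the whole file into a private buffer and verify that buffer.
 * Later changes to the file cannot affect the verified copy. Returns
 * the copy (to be freed by the caller) or NULL on any failure.
 */
static unsigned char *load_verified(const char *path, uint32_t expected,
                                    size_t *len_out)
{
    struct stat st;
    unsigned char *copy = NULL;
    size_t done = 0;
    int fd;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0)
        goto fail;

    copy = malloc((size_t)st.st_size);
    if (copy == NULL)
        goto fail;
    while (done < (size_t)st.st_size) {
        ssize_t n = read(fd, copy + done, (size_t)st.st_size - done);
        if (n <= 0)
            goto fail;
        done += (size_t)n;
    }
    close(fd);

    /* The check runs on the bytes that will actually be used. */
    if (toy_digest(copy, done) != expected) {
        free(copy);
        return NULL;
    }
    *len_out = done;
    return copy;

fail:
    free(copy);
    close(fd);
    return NULL;
}
```

The cost of this ordering is that the whole image must be held in memory for as long as it is in use.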

The downside of these approaches is that they load the module code into kernel memory, which is not pageable. For tiny modules that would not be a problem, but ELF modules, like other kernel code, seem likely to grow over time. Lutomirski suggested that the module code could be backed by a tmpfs filesystem; Kroah-Hartman responded that it would be "tricky" but that it could be a good solution. "Micro-kernel here we come!" But no such implementation exists now.

There were few solid conclusions from the discussion, due in part, at least, to a general hostility to the changes on Starovoitov's part. Some of that is understandable; it can be frustrating to create a mechanism to solve a specific problem, only to be told that it needs to be generalized so that it is better suited to unrelated problems as well. But the kernel exists to address the entire community's problems, so this process of making features more generally useful is a vital part of the kernel's long-term success. At least some of the points raised in the discussion will need to be addressed before ELF modules can find their way into the mainline kernel.

Designing ELF modules

Posted Mar 13, 2018 21:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

Can the kernel, instead of emulating a userspace environment, just somehow drop privileges for a certain kernel thread? Just do the syscall exit and set up RW mappings for a fixed "exchange" area to communicate with other threads. For purely computational tasks like a BPF compiler it should be enough.

Designing ELF modules

Posted Mar 14, 2018 4:41 UTC (Wed) by luto (guest, #39314)

In theory, yes, but the entry code isn’t set up for this.

Designing ELF modules

Posted Mar 14, 2018 11:42 UTC (Wed) by mageta (subscriber, #89696)

> Some other changes would be required to make ELF modules suitable for other use cases. Currently there is no means of communication between the module and the kernel beyond the standard system calls. If ELF modules are to be used for tasks like driving a new device, there will need to be a way to pass control of that device to the module from the kernel, among other things. A number of these issues could apparently be handled by opening a pipe between the kernel and the module when it is launched and using it for communications between the two.

How about we define something like a stable IPC between kernel and ELF modules, and ELF modules with other ELF modules, and then strip out all but the core features in the kernel, and host them in individual ELF modules.. we could call them servers.. waaaaaait...

Designing ELF modules

Posted Mar 16, 2018 8:11 UTC (Fri) by flewellyn (subscriber, #5047)

There's a reason Greg KH said "Microkernel, here we come!"

Designing ELF modules

Posted Mar 14, 2018 15:10 UTC (Wed) by mwsealey (subscriber, #71282)

> The downside of these approaches is that they load the module code into kernel memory, which is not pageable.

Why doesn't someone work to fix that desperately 1990's behavior? Other modern OSs can page kernel memory, and they're based on code from the 1970's :]

It's not a requirement to be able to swap out ANY kernel memory location (i.e. not the kernel text or static data sections), just stuff allocated through a particular API, perhaps. In later years we might be able to back vmalloc() as pageable memory and kmalloc() too with a flag - with the march towards HSA any device that needs a swapped-out page would go through the same process of causing an exception/interrupt via an IOMMU so it can be brought back in. Someone would just have to figure out the security implications of putting data in swappable memory - I think this would be classed as more a defensive programming technique than a kernel feature, though.

Everything old is new again

Posted Mar 15, 2018 13:54 UTC (Thu) by rwmj (subscriber, #5474)

For anyone who has ever used OS-9 this concept should be very familiar.

Everything old is new again

Posted Mar 15, 2018 15:51 UTC (Thu) by dgm (subscriber, #49227)

You really mean OS-9 (https://en.wikipedia.org/wiki/OS-9) rather than Mac OS 9 or Plan 9?

Everything old is new again

Posted Mar 15, 2018 16:34 UTC (Thu) by rwmj (subscriber, #5474)

Yes I really mean OS-9.

Designing ELF modules

Posted Mar 18, 2018 6:04 UTC (Sun) by alison (subscriber, #63752)

LWN:
'ELF modules give the kernel a controlled way to run user-space helper code, and they make it easy to develop and distribute that code with the kernel itself. That latter aspect, in particular, distinguishes ELF modules from the existing "usermode helper" mechanism, which depends on programs developed and shipped separately from the kernel.'

GKH via LWN:
"sees uses for these modules as a way to implement USB drivers in user space, perhaps bringing some existing user-space drivers into the kernel tree in the process."

Hmm, presumably then user-space drivers that implement features like USB drivers would have to be GPLv2? Shipping userspace drivers with the kernel could be a win if existing drivers must be code-reviewed with usual standards before acceptance, but not at the cost of massively merging permissively licensed code into the kernel. OTOH, IIRC, device-tree compiler is permissively licensed, and no one cried foul there.

Designing ELF modules

Posted May 7, 2022 10:18 UTC (Sat) by sammythesnake (guest, #17693)

Permissive licences* allow distribution under GPLv2, so the code could remain available under BSD/MIT/whatever without further drama, provided sufficient care is taken to keep such code from becoming intricately linked to (and therefore a "derivative work" of) GPL-only code.

There's already a bunch of code in the kernel under BSD/MIT style licences via this mode of thinking.

* For suitable definitions of "permissive licences"