Building header files into the kernel
Fernandes first posted this work in January; version 5 was posted on March 20. As part of the build process, it gathers up all of the kernel's headers (the ".h" files) and a few other artifacts into a compressed tar file; that file is then built into a kernel module. If that module is loaded into the running kernel, the tar file containing the headers can be read from /proc/kheaders.tgz. This is, thus, a way of allowing applications to access the header files that were used to build whatever kernel is running at the moment.
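Reading the archive back requires nothing special from user space, since it shows up as an ordinary file. As a minimal sketch (not taken from the patch set), the C function below copies it to a regular file for later unpacking. The `/proc/kheaders.tgz` path comes from the description above; the `copy_file()` name, the `KHEADERS_PATH` macro, and the buffer size are illustrative assumptions.

```c
#include <stdio.h>

/* Path as given in the article; any destination path is the caller's choice. */
#define KHEADERS_PATH "/proc/kheaders.tgz"

/*
 * Copy the file at src to dst.
 * Returns the number of bytes copied, or -1 on any error (for example,
 * when the module is not loaded and src does not exist).
 */
long copy_file(const char *src, const char *dst)
{
    char buf[65536];
    size_t n;
    long total = 0;
    int failed = 0;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (!in)
        return -1;

    out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    /* Plain read/write loop; the archive is treated as opaque bytes. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            failed = 1;
            break;
        }
        total += (long)n;
    }
    if (ferror(in))
        failed = 1;

    fclose(in);
    if (fclose(out) != 0)
        failed = 1;

    return failed ? -1 : total;
}
```

A caller might invoke `copy_file(KHEADERS_PATH, "/tmp/kheaders.tgz")` (the destination is hypothetical) and hand the result to tar; a return value of -1 most likely means that the module has not been loaded.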
The purpose of this mechanism is to make those header files available in situations where they are otherwise unavailable. In particular, developers building kernel modules need access to this information, as do those who are building BPF programs to analyze a system's behavior. In some systems, notably Android-based devices, those header files are almost certainly not easily available. Fernandes has tried other solutions to this problem, such as BPFd, in the past, but all have fallen short. Providing headers with the kernel itself is the solution he has settled on.
Some of the initial reviews were less than entirely favorable; Christoph
Hellwig described it as "a pretty horrible idea and waste of kernel
memory" while Alexey Dobriyan said that it was "gross".
H. Peter Anvin also questioned
the memory use and suggested that the data should, at a minimum, be stored
in a swappable filesystem. Numerous others chimed in as well, describing
the work as a "hack" and saying that, rather than building the tar file
into a kernel module, it would be far more straightforward to just place
that file in the module directory where it could be read directly.
At the same time, a number of other developers have indicated that this feature
would be useful; Daniel Colascione even asked
whether it could be expanded to hold all of the kernel source.
Nobody seems to disagree with the overall objective of this work. There are times when the kernel headers are needed for development, but those headers tend to be absent on systems like Android. The disagreement is over the idea of building those headers into the kernel itself. This opposition is easy enough to understand; the kernel itself does not need that information to function, so there would have to be a strong reason indeed to sacrifice that much system memory to hold it in kernel space.
There are indeed reasons for doing so, many of which seem to come down to how Android systems are built rather than something more technical. It would be nice if Android simply had a "kernel headers" package but, as Fernandes explained, that is not really practical:
The seeming aversion to putting anything GPL-licensed into the system image rubs some developers the wrong way, but it is consistent with the GPL avoidance practiced in most of the Android system. There is another reason why putting the kernel headers there is not a complete solution, though: developers will often cross-build a kernel and ship it to a device for direct booting with the fastboot command. Any headers stored on the device itself will not match that new kernel, so they are useless at best. If the headers are built into the kernel itself, though, they will transfer to the device with that kernel and always be correct.
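The mismatch problem can be made concrete with a small check. The sketch below assumes that a header set carries a generated `utsrelease.h` containing a line like `#define UTS_RELEASE "5.0.0"` (as kernel build trees do), and compares that string with what `uname()` reports for the running kernel. The function names are hypothetical, and a matching release string is only a necessary condition: two kernels with the same release can still differ in configuration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Extract the quoted value from a line such as
 *     #define UTS_RELEASE "5.0.0"
 * into out (at most outlen - 1 characters plus the terminator).
 * Returns 0 on success, -1 if the line does not match or out is too small.
 */
int parse_uts_release(const char *line, char *out, size_t outlen)
{
    const char *key, *start, *end;
    size_t len;

    key = strstr(line, "UTS_RELEASE");
    if (!key)
        return -1;
    start = strchr(key, '"');
    if (!start)
        return -1;
    start++;
    end = strchr(start, '"');
    if (!end)
        return -1;

    len = (size_t)(end - start);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/*
 * Returns 1 if header_release equals the running kernel's release string,
 * 0 if it differs, and -1 if uname() fails.
 */
int headers_match_running_kernel(const char *header_release)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return strcmp(u.release, header_release) == 0;
}
```

Headers stored in the device's filesystem would fail this check as soon as a freshly built kernel is booted with fastboot; headers carried by the kernel itself cannot get out of sync in that way.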
Even for kernels shipped with devices, though, the "store the headers in the filesystem" solution is problematic. As Fernandes noted, the Android project does not have much control over what vendors put onto their devices or where it goes, so it would be difficult (if not impossible) to mandate the presence of the kernel headers in any sort of standard location. Android can, though, mandate that specific kernel configuration options must be set; with this patch merged, vendors could be made to ship the headers for their kernels in a place where they could always be found. Even if vendors tend to hide their kernel modules in strange places (and they are vendors, so of course they do), the user space code on any given device knows how to find and load them.
In other words, building this information into the kernel is, among other
things, a technical solution to the social problem of getting vendors to
provide that information in a consistent way. Sometimes such solutions can
be what is needed. As Colascione put
it: "here's the bottom line: without this work, doing certain
kinds of system tracing is a nightmare, and with this patch, it Just
Works". Or, as Karim Yaghmour described it:
Proponents argue that, since the information is built into a kernel module, it can be configured out (or simply not loaded) when it is not needed. Anvin worried, though, that mechanisms like this tend to grow into a mandatory role over time.
One associated question is whether providing kernel header files is the
best way to provide the needed information to user space. Steve Rostedt said
that he would rather have a table describing the kernel's structures,
including the offset, size, and type of each field. That is the information
that is actually needed much of the time, and it could be more compact than
the source code is. Colascione sympathized
with the desire for a cleaner format, but argued that it would be better to
go with what works now: "Think of the headers as encoding this
information and more and the C compiler as a magical decoder ring".
Header files also include macros, constant definitions, and other
information needed to build BPF programs.
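To give a rough idea of what a table of the sort Rostedt described might contain, here is a small user-space sketch. The `struct demo_task` structure is a made-up stand-in for a kernel structure, and the `field_desc` layout is an assumption for illustration, not a format that anybody has proposed; the point is only that offset, size, and type can be recorded per field without shipping any source.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kernel structure. */
struct demo_task {
    int pid;
    char comm[16];
    unsigned long flags;
};

/* One table row: the name, type, offset, and size of a single field. */
struct field_desc {
    const char *name;
    const char *type;
    size_t offset;
    size_t size;
};

/* Let the compiler compute offset and size so the table cannot drift. */
#define FIELD(st, member, type_str) \
    { #member, type_str, offsetof(st, member), sizeof(((st *)0)->member) }

static const struct field_desc demo_task_fields[] = {
    FIELD(struct demo_task, pid, "int"),
    FIELD(struct demo_task, comm, "char[16]"),
    FIELD(struct demo_task, flags, "unsigned long"),
};

#define NUM_FIELDS (sizeof(demo_task_fields) / sizeof(demo_task_fields[0]))

/* Look up a field by name; returns NULL if the structure has no such field. */
const struct field_desc *find_field(const char *name)
{
    size_t i;

    for (i = 0; i < NUM_FIELDS; i++)
        if (strcmp(demo_task_fields[i].name, name) == 0)
            return &demo_task_fields[i];
    return NULL;
}

/* Print the whole table, one field per line. */
void print_fields(FILE *out)
{
    size_t i;

    for (i = 0; i < NUM_FIELDS; i++)
        fprintf(out, "%-8s %-16s offset=%zu size=%zu\n",
                demo_task_fields[i].name, demo_task_fields[i].type,
                demo_task_fields[i].offset, demo_task_fields[i].size);
}
```

A table like this answers "where is this field?" compactly, but it says nothing about macros, constants, or inline functions, which is the gap in the argument for sticking with header files.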
The discussion has gone on at length, provoked anew by each new posting of
the patch set. It does not appear to have changed a lot of minds on either
side of the debate. Sooner or later, presumably as the 5.2 merge window
approaches, somebody (most likely Andrew
Morton) will have to make a decision. Given the evident
advantages from this patch set, it seems likely that Android kernels may
ship it regardless, so it may be mostly a matter of whether the mainline
follows suit.
Index entries for this article | |
---|---|
Kernel | Build system |
Kernel | Modules |
Posted Mar 21, 2019 22:26 UTC (Thu)
by xorbe (guest, #3165)
[Link] (6 responses)
Posted Mar 21, 2019 23:28 UTC (Thu)
by corbet (editor, #1)
[Link] (3 responses)
Posted Mar 21, 2019 23:34 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (1 responses)
Posted Mar 22, 2019 1:55 UTC (Fri)
by pr1268 (guest, #24648)
[Link]
You'd probably want to keep the comments. After all, there's a wealth of information in what the *human* programmer wanted when writing the header file that's not usually obvious in the code itself. I mean, if we're going to shove all this extra data into a kernel module anyway...
As for the various compression formats, why not give users a choice? I propose gzip, bz2, lzma, xz, and lzma at a minimum. After all, open source is all about choice, right? </sarcastic humor>
Seriously, though, if this whole feature is limited to a loadable module, and optional at build time, then perhaps this isn't a bad idea. If this [grows] into a mandatory role over time, then perhaps its use (or over-use) is a sign that this is a badly needed feature that should remain.
Posted Mar 22, 2019 22:41 UTC (Fri)
by jsmith45 (guest, #125263)
[Link]
Posted Mar 22, 2019 0:37 UTC (Fri)
by Tov (subscriber, #61080)
[Link] (1 responses)
Posted Mar 22, 2019 22:48 UTC (Fri)
by jsmith45 (guest, #125263)
[Link]
Posted Mar 22, 2019 1:42 UTC (Fri)
by IanKelling (subscriber, #89418)
[Link] (9 responses)
So, they have no control over what goes there, except they control the modules that go there, which means they do control what goes there.
"In a system whose functionality requires multiple *independent* parties to work together."
Except one party forces the others to build their kernels in a certain way through some contract, so they aren't independent.
What is going on with these unexplained contradictions?
Posted Mar 22, 2019 3:05 UTC (Fri)
by ndesaulniers (subscriber, #110768)
[Link]
Posted Mar 22, 2019 12:29 UTC (Fri)
by excors (subscriber, #95769)
[Link] (5 responses)
Posted Mar 22, 2019 22:11 UTC (Fri)
by NAR (subscriber, #1313)
[Link] (1 responses)
Posted Apr 1, 2019 15:37 UTC (Mon)
by neuland (guest, #126936)
[Link]
He also talks negatively in the same message about individual contributors who attempt to sue based on their copyright and organizations that facilitate that, such as the Software Freedom Conservancy.
Posted Mar 22, 2019 23:01 UTC (Fri)
by jsmith45 (guest, #125263)
[Link] (1 responses)
Posted Apr 2, 2019 9:26 UTC (Tue)
by robbe (guest, #16131)
[Link]
Posted Mar 24, 2019 1:37 UTC (Sun)
by _joel_ (subscriber, #112763)
[Link]
Posted Mar 28, 2019 5:48 UTC (Thu)
by thestinger (guest, #91827)
[Link] (1 responses)
They could come up with a scheme where they stick the headers elsewhere in the boot image, and have something like init expose access to it. The main advantage I can see with the kernel approach is that it wouldn't be only available for Android and is probably the simplest approach.
https://source.android.com/devices/bootloader/boot-image-...
Posted Mar 28, 2019 5:59 UTC (Thu)
by thestinger (guest, #91827)
[Link]
https://source.android.com/devices/architecture/dto/parti...
Unless they made a new partition with a filesystem mounted in userspace that's shipped with the boot image, there's not really a great place to put the kernel headers other than the kernel. I think that may be a better solution, although it'd be Android-specific and not portable.
Posted Mar 22, 2019 2:07 UTC (Fri)
by neilbrown (subscriber, #359)
[Link] (11 responses)
Posted Mar 22, 2019 10:02 UTC (Fri)
by mjthayer (guest, #39183)
[Link] (1 responses)
Glad to see someone else had the same idea. Might it make sense to reduce the in-kernel policy a bit by letting user space handle the tmpfs? One possible (!) implementation would be to have a device which a) lets you read out the archive and b) lets you free the in-kernel memory holding the archive once you are done with it. (Maybe that could mean putting it into __init first and copying it from there to freeable kernel memory.)
Then again, perhaps the same thing could be achieved without it ever being in kernel memory. Tagged onto the end of the kernel binary? Something similar to initramfs? I was going to write "put into initramfs", which would avoid kernel changes altogether, but the thought of bloating that even more is not appealing.
Posted Mar 23, 2019 22:12 UTC (Sat)
by Mattimo (subscriber, #129903)
[Link]
Posted Mar 22, 2019 12:33 UTC (Fri)
by excors (subscriber, #95769)
[Link] (2 responses)
I think most Android devices don't have swap. They prefer to just kill background apps whenever RAM gets low.
Posted Mar 22, 2019 15:20 UTC (Fri)
by dezgeg (subscriber, #92243)
[Link] (1 responses)
Posted Mar 22, 2019 17:45 UTC (Fri)
by excors (subscriber, #95769)
[Link]
Posted Mar 24, 2019 1:29 UTC (Sun)
by _joel_ (subscriber, #112763)
[Link] (5 responses)
Also note that on Android, we don't use disk-based swap. We have a memory-based compressed swap called ZRAM, but the archive is already compressed, so the suggested idea would provide no benefit (to us).
One thing I have thought of doing in the future is to make the /proc entry of the archive writable, so that writing an empty string into it frees the archive's allocated memory (requiring a reboot to get it back). At the moment, however, I am considering only building this as a module for production Android, and as a built-in when debugging. Once the patches make it in, and if others want to free that memory, we can cross that bridge then IMO, such as by writing an empty string into the proc entry.
Posted Mar 24, 2019 1:38 UTC (Sun)
by _joel_ (subscriber, #112763)
[Link]
Posted Mar 24, 2019 4:28 UTC (Sun)
by neilbrown (subscriber, #359)
[Link]
If you don't have real swap, this doesn't help you, but it might be a useful answer to people who complain about a waste of kernel memory, or who emphasize that it is non-swappable memory (the article mentions both).
Posted Mar 24, 2019 5:39 UTC (Sun)
by jsmith45 (guest, #125263)
[Link] (1 responses)
Joel, why even bother remounting? You can just directly use the pre-existing internal kernel mount as if it were a special form of swappable kernel memory. CONFIG_BIG_KEYS actually does that, and the bpfilter user mode helper module copies data that is baked into its module (as discardable init data) into the main internal tmpfs mount, so there is precedent for most of this.
I'll be showing code as if making the file swappable was unconditional, but it should be very obvious how to add a config option that hybridizes what patch V5 has and the below.
One word of warning: This basically requires CONFIG_TMPFS. It will also work if !CONFIG_SHMEM, because then it uses tiny-shmem (which is ramfs). However, using CONFIG_SHMEM without CONFIG_TMPFS yields a tmpfs that is excessively limited.
Now let's create the file in tmpfs, and copy the data into it:
OK. Next up, we need to change the section the baked-in data is stored in to be .init.data so it gets discarded after init.
Almost done. All that is left is to make reads of the proc file return data from the in-memory file. There are actually several ways to approach this. We could use a magic symlink to the internal file, similar to how the proc file descriptor symlinks work. But to make this act as much like a normal file as possible, the following might be easiest.
It is definitely a bit odd to have the read delegate back to vfs_read, but as far as I can tell it should work. I'd bet one of the VFS guys can come up with some improvements to this approach.
Please note that all this code is untested, and is intended to show the basic approach, although it probably works as-is, modulo any typos (since most of it was copy-pasted from existing parts of the kernel or your patch).
Hope you find this helpful, or at least interesting.
Posted Mar 26, 2019 23:53 UTC (Tue)
by quotemstr (subscriber, #45331)
[Link]
It occurs to me that we shouldn't even have these options. The kernel would be simpler to reason about if basic, fundamental, and cheap things like tmpfs and shmem (and procfs!) were hardwired to =y. I've never understood the rationale for extreme configurability. Fundamental things should always be available.
Posted Mar 26, 2019 7:34 UTC (Tue)
by minchan (subscriber, #61813)
[Link]
zRAM supports idle and/or incompressible page writeback once the admin configures a backing store.
Posted Mar 22, 2019 3:54 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (14 responses)
CTF is lightweight enough (cf DWARF) that GCC and clang could emit CTF data by default without too much hemming and hawing; we could rely on its presence and enjoy a world where we could not only statically analyze compiled objects, but also generate FFI accessors dynamically at runtime with strong type safety.
Posted Mar 22, 2019 12:21 UTC (Fri)
by adirat (subscriber, #86623)
[Link] (12 responses)
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/...
Posted Mar 22, 2019 18:37 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (1 responses)
Apparently they got it down to 1.5MB. Why are people still talking about shipping header tarballs? Is the BTF work just not well known? Since the motivating use case is BPF, which already requires a specialized tool chain, I don't see the downside.
Posted Mar 22, 2019 23:07 UTC (Fri)
by adirat (subscriber, #86623)
[Link]
Wake up people!
Posted Mar 22, 2019 18:58 UTC (Fri)
by dezgeg (subscriber, #92243)
[Link] (9 responses)
Posted Mar 22, 2019 23:31 UTC (Fri)
by ay (guest, #79347)
[Link]
Posted Mar 23, 2019 0:36 UTC (Sat)
by adirat (subscriber, #86623)
[Link] (7 responses)
Posted Mar 24, 2019 1:32 UTC (Sun)
by _joel_ (subscriber, #112763)
[Link] (6 responses)
Posted Mar 25, 2019 13:34 UTC (Mon)
by adirat (subscriber, #86623)
[Link] (5 responses)
I get it that some people are afraid of hard work or telling their managers/marketing they need more time, but you guys are burning a lot of credibility here by saying some very silly things, for example let me quote from the v4 thread exchange [1]:
> Think of the headers as encoding this information and more and the C
The C compiler as magical decoder ring? Really? It never crossed your mind to actually develop that debug info which you need? :) What's next, including GCC/LLVM itself in the kernel as a module because eBPF also needs them to compile its "restricted-C"? Then BCC and python/lua? Again: wake up, do the proper work and stop pushing bad solutions.
[1] https://lkml.org/lkml/2019/3/11/1352
Posted Mar 26, 2019 23:46 UTC (Tue)
by jsmith45 (guest, #125263)
[Link]
And parsing out the correct set of macros from that debug info may not be entirely trivial. (Some macros may have multiple values in different spots in the kernel.) Even if those issues were solved, and we had the macro data to combine with BTF it is not equivalent to headers. One thing that does not get captured is inline functions defined in the headers. I've no idea if those ever get used in BPF programs, but I could certainly imagine that at least some of them may work (e.g. if they are just abstracting over a struct access). And those are impossible to extract from any form of debugging information.
Posted Sep 14, 2021 1:13 UTC (Tue)
by nickodell (subscriber, #125165)
[Link] (3 responses)
I think you're underestimating the work involved in getting a C compiler to both emit and understand the new debug info. E.g. the preprocessor must know how to use macros from the debug info for substitution. The typechecker must know how to use function prototypes to check that a function is being called with the appropriate types. The compiler must know how to load structure definitions so that it knows what offset to access data from within a struct.
The only current data format that carries all that information is a C header file. Could you develop a new format which does not carry extraneous information like comments? Sure. But you'd be introducing an unknown number of bugs. The payoff for doing that is at most 4MB of freed memory. Approaches which load the tar file on demand, or that allow it to be paged in and out, seem much more promising.
Posted Sep 15, 2021 12:42 UTC (Wed)
by nix (subscriber, #2304)
[Link] (2 responses)
Posted Sep 15, 2021 13:30 UTC (Wed)
by rahulsundaram (subscriber, #21946)
[Link] (1 responses)
Posted Sep 15, 2021 16:49 UTC (Wed)
by nix (subscriber, #2304)
[Link]
They are both derived from the same ancestor (Solaris CTF) but have a significant number of differences by now, mostly reflecting their kernel-only versus userspace focuses (e.g. CTF supports symbol -> type lookup using the ELF symtab, which obviously makes little sense for BTF; BTF has a whole elaborate pile of relocation machinery for CO-RE, tightly tied to LLVM at present, that isn't in CTF).
Posted Mar 25, 2019 15:17 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Mar 28, 2019 7:23 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (2 responses)
Déjà vu: what about kernel *modules*? Typical deployment problem on any Linux-based OS. For Android, how does fastboot deploy kernel modules, and where to? I mean the mainline & GPL drivers, not vendor modules.
This page seems to refer to vendor modules only: https://source.android.com/devices/architecture/kernel/mo...
Posted Mar 28, 2019 13:09 UTC (Thu)
by corbet (editor, #1)
[Link] (1 responses)
Posted Mar 28, 2019 16:36 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
Do Android/fastboot/etc. support compiling mainline drivers as modules or not really?
Building header files into the kernel
There were a couple of side discussions on the best compression format; I'll freely admit to having mostly glossed over them as being a secondary issue.
Compression formats
Allow me to add to the bike-shedding...
I guess you could also remove all whitespace and comments…
Compression formats
Building header files into the kernel
Squashfs supports several compression formats and can be mounted in-place.
Building header files into the kernel
...
"Android can, though, mandate that specific kernel configuration options must be set"
Building header files into the kernel
Store the compressed image in the __init section (which is discarded somewhere in the boot sequence) and have code to copy it into a tmpfs filesystem (which can be paged out).
Maybe put /proc/config.gz there too.
(and maybe a "cowsay" binary too, just in case it is ever needed).
Building header files into the kernel
> Store the compressed image in the __init section (which is discarded somewhere in the boot sequence) and have code to copy it into a tmpfs filesystem (which can be paged out).
> Maybe put /proc/config.gz there too.
Building header files into the kernel
I don't think there is an existing way to mount this into a mount namespace, but I suspect you could add a mount option to tmpfs to say "use the pre-existing superblock named foo".
Building header files into the kernel
/* tmpfs-backed copy of the headers archive */
static struct file *mem_file;

/* defined further down; declared here so that ikheaders_init() can use it */
static const struct file_operations ikheaders_file_ops;

static int __init ikheaders_init(void)
{
	struct proc_dir_entry *entry;
	size_t kernel_headers_data_size = &kernel_headers_data_end - &kernel_headers_data;
	int err;
	ssize_t written;
	loff_t pos = 0;

	/* copy data to a new tmpfs file, so it can be swapped out */
	mem_file = shmem_kernel_file_setup("", kernel_headers_data_size, 0);
	if (IS_ERR(mem_file)) {
		err = PTR_ERR(mem_file);
		goto error_no_fput;
	}

	/* pass the start address, matching the size calculation above */
	written = kernel_write(mem_file, &kernel_headers_data, kernel_headers_data_size, &pos);
	if (written != kernel_headers_data_size) {
		err = written;
		if (err >= 0)
			err = -ENOMEM;
		goto error;
	}

	/* create the current headers file */
	entry = proc_create("kheaders.tar.xz", S_IRUGO, NULL,
			    &ikheaders_file_ops);
	if (!entry) {
		err = -ENOMEM;
		goto error;
	}

	proc_set_size(entry, kernel_headers_data_size);
	return 0;

error:
	fput(mem_file);
error_no_fput:
	return err;
}

static void __exit ikheaders_cleanup(void)
{
	remove_proc_entry("kheaders.tar.xz", NULL);
	fput(mem_file);
}

/* keep the baked-in archive in .init.data so that it is discarded after init */
asm (
" .pushsection .init.data, \"a\" \n"
/*...*/
);

/* give each opener its own struct file referring to the tmpfs copy */
static int ikheaders_open_current(struct inode *inode, struct file *file)
{
	struct file *f;

	f = dentry_open(&mem_file->f_path, O_RDONLY, current_cred());
	if (IS_ERR(f))
		return PTR_ERR(f);
	file->private_data = f;
	return 0;
}

/* reads of the proc file are delegated to the tmpfs file */
static ssize_t
ikheaders_read_current(struct file *file, char __user *buf,
		       size_t len, loff_t *offset)
{
	struct file *f = file->private_data;

	return vfs_read(f, buf, len, offset);
}

static int ikheaders_release_current(struct inode *inode, struct file *file)
{
	struct file *f = file->private_data;

	fput(f);
	return 0;
}

static const struct file_operations ikheaders_file_ops = {
	.open = ikheaders_open_current,
	.release = ikheaders_release_current,
	.read = ikheaders_read_current,
	.llseek = default_llseek,
};
Building header files into the kernel
If the data is rarely used or incompressible, it will be written back from zram's memory to the backing storage.
Building header files into the kernel
> compiler as a magical decoder ring. :-) I totally get the desire for a
> metadata format a little less messy than C code, but as a practical
> matter, a rich C-compilation pipeline already exists, and the
> debuginfo you're proposing doesn't, (...)
Building header files into the kernel
>[...]
>The C compiler as magical decoder ring? Really? It never crossed your mind to actually develop that debug info which you need? :)
Building header files into the kernel
If modules really turn out to be a problem, one can, of course, just build them directly into the kernel image.
Modules