LWN: Comments on "Shrinking the kernel with a hammer" https://lwn.net/Articles/748198/ This is a special feed containing comments posted to the individual LWN article titled "Shrinking the kernel with a hammer". en-us Wed, 03 Sep 2025 23:48:20 +0000 Wed, 03 Sep 2025 23:48:20 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Shrinking the kernel with a hammer https://lwn.net/Articles/920203/ https://lwn.net/Articles/920203/ sammythesnake <div class="FormattedComment"> [thread necromancy alert!]<br> <p> Presumably when mkcramfs compresses files/blocks it has a case to store incompressible things unmodified - marking those files/blocks for XIP might give the bulk of the benefit of a more sophisticated version for a lot less effort...<br> </div> Tue, 17 Jan 2023 10:18:37 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/757119/ https://lwn.net/Articles/757119/ meyert <div class="FormattedComment"> Wow, I started to read this article series with slight interest, but in this episode it got really cool and everything came together and made sense.<br> <p> Thanks for this article series and this cool finale. Gave me a new perspective on memory usage.<br> <p> </div> Mon, 11 Jun 2018 12:48:49 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/749052/ https://lwn.net/Articles/749052/ iq-0 <div class="FormattedComment"> <font class="QuotedText">&gt; [...] but the issue is exaggerated by the Rust ecosystem's obsession with microdependencies (there are modules which are really just one function, à la npm),</font><br> <p> The reason you have to compile a lot of crates (Rust libraries) while the thing you're building only uses a few parts of a few crates directly has to do with how coherence rules effectively cause many crates to depend on other crates in order to offer possibly relevant implementations of traits for their types or implementations of their traits on its types.<br> <p> To minimize the pain of these type/trait dependencies, and also to ease semver stability guarantees, a number of projects have extracted their basic types and/or traits into single-purpose (and thus relatively small) crates. This helps these common crates have few changes and reduces their compile times.<br> <p> The fact that the crate dependency explosion often seems worse is due to different crates being able to have different (incompatible) dependencies on different versions of the same crate. Rust often handles these issues gracefully (in many programming languages they would have been painful version conflicts), at the cost of sitting through additional crate compilations.<br> <p> But to counter that, they only get built once for a project, unless you switch compiler versions, and thus often have the effect of reducing rebuild times. First-time builds can be pretty long, but you only incur that cost occasionally. You do want to keep this in mind when configuring CI so that you cache these compiled dependencies.<br> <p> <font class="QuotedText">&gt; and the fast speed at which the Rust compiler moves.</font><br> <p> Unless you really depend on the unstable (nightly) Rust version, the compiler is normally only updated every six weeks.<br> <p> If you're using the unstable channel, you get to pick when you want to go through the bother of updating and thus recompiling everything. But I agree that that's hardly a consolation.<br> <p> <font class="QuotedText">&gt; Indeed, though as far as I know they statically link the Rust standard library. 
Despite the glibc being dynamically linked, e.g. oxipng still clocks in at 2.8M. Compare that to 86K for optipng.</font><br> <p> All Rust dependencies are, by default, statically linked, though LTO will prevent 90% of the standard library and other dependencies from being included in the final binary. A very large part of the resultant binary is debugging information (Rust's multi-versioning, types and module support has a big impact on the symbol length) and unwind information (in order to perform graceful panics as opposed to plain aborts).<br> <p> Both can be disabled and, with some effort, Rust binaries can be reasonably small. But things like monomorphization, while generating more optimized code, will almost always result in more code being generated. For most applications this usually isn't a big problem as the larger binaries don't really have a performance impact and greatly aid error message detail and debugging.<br> <p> Luckily the people working on Rust support in Debian are working on making Rust programs integrate better with their distribution philosophy (dynamic linking, separating out debug info, and putting each dependency in a dedicated package), and I really hope that a number of their requirements and solutions will find their way back to the upstream Rust project.<br> </div> Mon, 12 Mar 2018 10:48:56 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/749035/ https://lwn.net/Articles/749035/ mathstuf <div class="FormattedComment"> It is a feature for deployment, not so much for things one would expect from a distribution (i.e., development tools). One could do the same with C or C++ deployments, but it's a PITA to wrangle build systems in that stack without embedding dependencies, so "no one" does it. I suspect Rust (and not Go[1]) will get dynamic linking before C or C++ have viable "everything static" deployment solutions.<br> <p> [1] AFAICT, Go has much more of a "non-Go code doesn't exist" mentality than Rust folks do for non-Rust code.<br> </div> Sun, 11 Mar 2018 17:15:55 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/749034/ https://lwn.net/Articles/749034/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; no stable C++ ABI</font><br> <p> There isn't in the ISO standard sense, but there are de facto ABIs. GCC and MSVC declared their ABIs long ago and stick to them. The Rust compiler does not commit to any given ABI between two releases. I suspect there may be one eventually, but it's not in the same area as C++.<br> </div> Sun, 11 Mar 2018 17:10:56 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748995/ https://lwn.net/Articles/748995/ fratti <div class="FormattedComment"> I can understand that there is no stable Rust ABI; after all, there's no stable C++ ABI either, but the issue is exaggerated by the Rust ecosystem's obsession with microdependencies (there are modules which are really just one function, à la npm), and the fast speed at which the Rust compiler moves.<br> <p> <font class="QuotedText">&gt;practically all Rust Linux binaries dynamically link to glibc by default (and by design)</font><br> <p> Indeed, though as far as I know they statically link the Rust standard library. Despite the glibc being dynamically linked, e.g. oxipng still clocks in at 2.8M. 
Compare that to 86K for optipng.<br> </div> Sat, 10 Mar 2018 09:40:57 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748994/ https://lwn.net/Articles/748994/ bof <div class="FormattedComment"> "Yes, I've been told their religious beliefs state that the dynamic linker is Unsafe™."<br> <p> Recently having had openSUSE Tumbleweed's crond coredump on me until it was restarted, due to weird dynamic loading of PAM stuff which had apparently been updated, again makes me strongly sympathise with that sentiment...<br> </div> Sat, 10 Mar 2018 08:19:42 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748992/ https://lwn.net/Articles/748992/ jdub <div class="FormattedComment"> Hrm, no, practically all Rust Linux binaries dynamically link to glibc by default (and by design), and you can easily dynamically link to C ABI shared libraries. If you want to build a static executable, you have to go out of your way to use the musl target.<br> <p> There's nothing "unsafe" about dynamic linking, just the challenge of safety across C ABI boundaries (which exists for statically linked code as well) and the lack of a stable Rust ABI (which is pretty reasonable).<br> </div> Sat, 10 Mar 2018 08:13:30 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748990/ https://lwn.net/Articles/748990/ fratti <div class="FormattedComment"> Yes, I've been told their religious beliefs state that the dynamic linker is Unsafe™.<br> </div> Sat, 10 Mar 2018 06:42:41 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748984/ https://lwn.net/Articles/748984/ pabs <div class="FormattedComment"> <font class="QuotedText">&gt; static binary in need of recompilation with every dependency update.</font><br> <p> Is that considered a feature in the Rust community like it is with Go?<br> </div> Sat, 10 Mar 2018 00:48:52 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748952/ https://lwn.net/Articles/748952/ excors <div class="FormattedComment"> And don't forget Android, which often builds the Linux kernel, and most of a web browser or two, and some of Clang, and a thousand other things. I have several Android trees at about 200GB each. But a reasonable PC can still build the entire thing in under an hour, so it's not too bad really. The kernel itself is trivial.<br> </div> Fri, 09 Mar 2018 17:56:53 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748943/ https://lwn.net/Articles/748943/ fratti <div class="FormattedComment"> Something semi-related:<br> <p> <a href="https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/">https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/</a> (heading "Hash-based differential compilation")<br> </div> Fri, 09 Mar 2018 16:46:19 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748942/ https://lwn.net/Articles/748942/ fratti <div class="FormattedComment"> Not to forget things like compiling a modern web browser. Chromium needs more than 20 GiB of disk space just to build, and you'll be at it for several hours on a modern system. Sure, ccache can save you some time, but yikes, talk about bad first-contributor experiences.<br> <p> Firefox has also been getting worse now that they're using some Rust. 
I genuinely hope Rust gets its ABI stuff sorted so that we do not end up living in a world where everything is a &gt;2 MiB static binary in need of recompilation with every dependency update.<br> </div> Fri, 09 Mar 2018 16:43:00 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748809/ https://lwn.net/Articles/748809/ epa <div class="FormattedComment"> In Linuscoin the computational challenge is to start with the SHA256 of a Linux kernel image and work out the combination of build options needed to produce it.<br> </div> Thu, 08 Mar 2018 08:52:03 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748805/ https://lwn.net/Articles/748805/ epa <div class="FormattedComment"> It's a full 32-bit CPU with 32-bit registers, 32-bit address space, and 32-bit data bus; just the program counter is 26 bits. So executable code needs to be in the lower 64 mebibytes of the address space. Data doesn't have that restriction (in the instruction set architecture; I don't think any machine was built using a CPU with this instruction set and more than 16 megs of RAM).<br> <p> (The same register contained the 26-bit program counter and six flag bits. I believe this was to reduce the amount of saving and restoring needed for responding to interrupts: you could save the whole CPU state apart from the registers in a single 32-bit operation. With 64-bit CPUs I wonder whether the same technique could make a comeback: I can see the need for a huge address space for data, but surely it wouldn't be much of a hardship if executable code had to be located in the bottom 281 terabytes...)<br> </div> Thu, 08 Mar 2018 08:48:04 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748773/ https://lwn.net/Articles/748773/ anselm <blockquote><em>The Linux build process must be one of the most wasteful things you can do on a computer</em></blockquote> <p> Crypto-“currency” mining? </p> Wed, 07 Mar 2018 22:42:03 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748769/ https://lwn.net/Articles/748769/ flussence <div class="FormattedComment"> Ah, yeah. I can understand nobody wanting to maintain a 26-bit(!) arch.<br> </div> Wed, 07 Mar 2018 21:43:12 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748753/ https://lwn.net/Articles/748753/ nix Fascinating reminiscences, but... <blockquote> The Linux build process must be one of the most wasteful things you can do on a computer </blockquote> Oh God no. Compiler build processes with multiple-stage bootstrapping are the first thing that springs to mind (GCC building is *far* harder on a machine than Linux kernel building and most of it is thrown away); but then you look at new stuff like Rust, with, uh, no support for separate compilation or non-source libraries to speak of, so everything you depend on is recompiled and relinked in statically for every single thing you build... the Linux build process is nice and trim. The oddest thing it does is edit module object files in a few simple ways after building. Wed, 07 Mar 2018 18:34:38 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748663/ https://lwn.net/Articles/748663/ abufrejoval <div class="FormattedComment"> &lt;old-memories&gt;<br> I remember running Microport Unix on my 80286 (fully loaded with 640K of base RAM) with an Intel Above Board that added 1.5MB of RAM, I believe (could have been 2MB). 
It also gave me a free 8087 math co-processor and a set of disks labelled "Microsoft Windows 1.01".<br> <p> Since I dual-booted it with DOS, and the split between expanded (paged into a 64K "BIOS area" window in real mode) and extended (above the 1MB range, available only in protected mode) RAM was set via DIP switches, I allocated around 50% to each. It meant a little more than 1MB of RAM overall for Microport.<br> <p> Full UNIX (TM) file system, full multi-user (via serial ports), very much like a PDP-11 in fact, where Unix was born. The 286 had an MMU, but at segment rather than page level, again pretty much like a PDP-11. Of course UNIX System V, Release 2 didn't have 400 system calls and the kernel was statically built and linked. I did some fiddling with the serial drivers to have them support the internal queues of the UARTs, which avoided having to interrupt after every character sent or received. That's what made 115kbit possible. Also fiddled with Adaptec bus master SCSI drivers.<br> <p> Ah and it ran a DOS box, long before OS/2 ever did, one single DOS task which ran in the lower 640k by resetting the 286 CPU via the keyboard controller on timer interrupts. The BIOS would then know via a CMOS byte that the computer had in fact not just been turned on, but come back from protected mode: a glorious hack made possible by IBM for the PC-AT, so it could perform a RAM test after boot on systems which had more than 640K of RAM installed.<br> <p> For kicks I ran a CP/M emulator in the DOS box, while running a compile job on the Unix...<br> &lt;/old-memories&gt;<br> <p> &lt;other-old-memories&gt;<br> A couple of years later I had to port X11R4 to a Motorola 68020 system that ran AX, a µ-kernel OS somewhere between QNX and Mach. Basically a fixed demo system that ran a couple of X applications on a true-color HDTV display using a TI TMS34020 TIGA board. Had to make X11R4 true-color capable, too: it was only 1- and 8-bit color depth at that point.<br> <p> The MMU wasn't enabled on the 68020, there was no file system, and the µ-kernel just gave me task scheduling. So I had to write a Unix emulator, basically a library that emulated all system calls required by X11. The "file system" was a memory-mapped archive (uncompressed) that just got included into the BLOB along with everything else.<br> <p> Used a GCC 1.31 cross compiler on a Sun SPARCstation. After several months of working through the X11 source code to make it true-color and run some accelerated routines on the TIGA GPU (the TMS34020 was a full CPU that used bit instead of byte addressing for up to 16MB of RAM with 32-bit addresses!) it just worked perfectly at first launch! Without any debugging facilities I'd have been screwed if it didn't...<br> <p> Dunno what the 68020 had for RAM, but I doubt it was more than 1 or 2MB.<br> &lt;/other-old-memories&gt;<br> <p> So where it all comes together is that the process you describe is somewhat similar to turning Linux plus a payload into a Unikernel or Library OS, where everything not needed by the payload app is removed from the image.<br> <p> I sure wouldn't mind if Linux could support that out of the box, including randomization of the LTO phase for ROP protection. 
And yes, I believe the GPL is not a good match for that.<br> <p> The Linux build process must be one of the most wasteful things you can do on a computer, starting with the endless parsing of source files which motivated Google to create Go.<br> <p> I keep dreaming about an AI that can take the Linux source code and convert it into something that is a Unikernel/LibraryOS image, which is only incrementally compiled/merged where needed when you change some kernel or application code at run-time.<br> <p> I believe I'd call it Multics.<br> </div> Wed, 07 Mar 2018 02:13:18 +0000 Like it! How about a script? https://lwn.net/Articles/748635/ https://lwn.net/Articles/748635/ david.a.wheeler <div class="FormattedComment"> I'd love to see some sort of script that could take the 'current' kernel + tools like busybox and generate the final result. Basically a "starter kit" that people could diverge from.<br> <p> Bonus points: Put that in a CI environment, so that every update to the Linux kernel or busybox would create a new image, test the image, and report new sizes (including size regressions).<br> <p> </div> Tue, 06 Mar 2018 17:22:43 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748604/ https://lwn.net/Articles/748604/ farnz <p>Found it: <pre> commit 99eb8a550dbccc0e1f6c7e866fe421810e0585f6 Author: Adrian Bunk &lt;bunk@stusta.de&gt; Date: Tue Jul 31 00:38:19 2007 -0700 Remove the arm26 port The arm26 port has been in a state where it was far from even compiling for quite some time. Ian Molton agreed with the removal. </pre> Tue, 06 Mar 2018 12:51:38 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748599/ https://lwn.net/Articles/748599/ farnz <p>About 10 years ago, give or take - I don't have a git clone to hand to go spelunking, but you're looking for the removal of <tt>include/asm-arm26</tt> to see when it was deleted. Tue, 06 Mar 2018 10:23:20 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748593/ https://lwn.net/Articles/748593/ epa <div class="FormattedComment"> When was the support dropped? I know Russell King's original port was to these CPUs.<br> </div> Tue, 06 Mar 2018 07:09:53 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748580/ https://lwn.net/Articles/748580/ farnz <p>The first commercial ARM desktops (Acorn Archimedes) had either 1 MiB (A310 and A410 models) or 4 MiB RAM (A440); the 512 KiB model (A305) was announced at the same time, but shipped later, and the 2 MiB RAM model (A420) also came along later. <p>Of course, they won't run Linux now, even if you did the soldering job needed to fit 16 MiB RAM - Linux does not support ARMv2 or ARMv2a CPUs (the ARM2 and ARM3 silicon that you can fit in these machines), and you can't fit an ARMv3 or later chip (anything from the ARM6/7 silicon that the RiscPC used up to modern AArch64 chips). Mon, 05 Mar 2018 20:17:50 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748578/ https://lwn.net/Articles/748578/ flussence <div class="FormattedComment"> We can almost say “640KB is enough for Linux” :-)<br> <p> The kernel's obviously not going to run on an original IBM PC even if it's squeezed into RAM, but I wonder if there are any other fun applications of this stuff in the same vein... maybe it'd boot on old desktop ARM machines? 
Those had a whopping 2MB IIRC.<br> </div> Mon, 05 Mar 2018 19:32:12 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748444/ https://lwn.net/Articles/748444/ npitre <div class="FormattedComment"> The XIP support for CramFS available in Linux v4.15 already has the ability to work with individually selected XIP-ed pages. It's just a matter of adding a profile-based page selection mechanism to mkcramfs.<br> <p> Right now mkcramfs enables XIP only for pages that correspond to loadable ELF segments that are flagged readable and/or executable, and not writable. That could be easily extended to e.g. media files that are inherently compressed.<br> </div> Fri, 02 Mar 2018 19:55:55 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748438/ https://lwn.net/Articles/748438/ rbanffy <div class="FormattedComment"> XIP-like ideas are handy on very parallel machines (thinking Xeon Phi-like, but any other single-image box - or rack - with a lot of cores would fit). Knowing a given memory range is immutable after you're up and running would make it easy to use core-local memory without worrying about keeping it consistent across the whole machine, and with no need to go across the motherboard-side bus. It's not a problem that other cores can't modify your core-local memory, because nobody is supposed to do that anyway.<br> <p> Of course, core-local memory is useful for a whole lot of things besides that, but having to be concerned that a process stays local to a specific core makes everything more complicated. If all cores have duplicates of frequently used code in memory that can be read faster than the main system memory, all cores can spend less time memory-starved.<br> </div> Fri, 02 Mar 2018 17:29:19 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748424/ https://lwn.net/Articles/748424/ seebe <div class="FormattedComment"> <font class="QuotedText">&gt; All we need to do is enable CONFIG_XIP_KERNEL and the build system will prompt for the desired kernel physical address location in flash.</font><br> <p> Since most ARM MMU-based systems are now multiplatform in the kernel, you need to hack a line in the ARM Kconfig in order to enable CONFIG_XIP_KERNEL for MMU systems...but it will work (for some systems anyway)<br> <p> <font class="QuotedText">&gt; I later (re)discovered that the almost 10-year-old AXFS filesystem (still maintained out of tree) could have been a good fit. I had forgotten about it though, and in any case I prefer to work with mainline code.</font><br> <p> One thing that AXFS has (aside from allowing for a larger file system size) is the ability to individually select, page by page, the portions you want to have XIP-ed (uncompressed) and leave the rest compressed. This is very helpful because there are a lot of executable and const portions in a file that are simply never run, or only run once at startup or shutdown. So, by only XIP-ing the pages you will commonly use, you can reduce the Flash image size while still retaining low runtime RAM usage. 
There is a profiling tool built into the AXFS driver that will tell you which pages were used (by putting some logging code in the page fault handler), so you can record that information and then feed it back into the mkfs.axfs tool.<br> <p> Maybe at some point we can add this type of functionality into cramfs (since trying to mainline AXFS would be much more work).<br> <p> </div> Fri, 02 Mar 2018 14:45:27 +0000 Shrinking the kernel with a hammer https://lwn.net/Articles/748403/ https://lwn.net/Articles/748403/ atelszewski <div class="FormattedComment"> Hi,<br> <p> Yet again an excellent article in the series!<br> <p> My comments:<br> <font class="QuotedText">&gt; The ability to develop and update the kernel and user space independently of each other;</font><br> <p> If you mean it for the development phase, then I totally agree. It's invaluable.<br> But production systems, in my opinion, are better updated with a single firmware image<br> containing the whole system (kernel+userspace).<br> This allows for easier tracking of the actual update status of a particular system,<br> especially if you're managing a significant number of them.<br> <p> When it comes to memory usage, I think RAM is the biggest challenge as of today.<br> This opinion is based on the desire to have the PCB layout as simple as possible.<br> With the recent addition of the possibility to execute from QSPI memories, the Flash memory<br> can be extended quite easily without much wiring.<br> But RAM is the opposite. It's clunky, i.e. it requires quite a few PCB traces and microcontroller GPIOs to get started with.<br> <p> --<br> Best regards,<br> Andrzej Telszewski<br> </div> Fri, 02 Mar 2018 11:47:20 +0000