Giving bootloaders the boot with nmbl
At DevConf.cz 2024, Marta Lewandowska gave a talk on a new approach for booting Linux systems, "No more boot loader: Please use the kernel instead". The talk, available on YouTube, introduced a new project called nmbl (for "no more bootloader", pronounced "nimble"). The idea is to replace bootloaders (e.g., GNU GRUB) with a Unified Kernel Image (UKI), removing the need for a separate bootloader altogether. It is early days for nmbl; currently the project is only being tested for use with virtual machines, but the idea is compelling. If successful, nmbl could offer security, performance, and maintenance benefits compared to GRUB and other separate bootloaders.
Rationale
Longtime Linux users have seen their share of bootloaders, the software that initializes hardware and loads the operating system, over the years. In the earliest days, users might have used loadlin to boot into Linux from MS-DOS. Then there was Linux Loader (LILO). It was the popular choice for Linux distributions until the mid-2000s, when Linux distributions began switching to the GRand Unified Bootloader (GRUB), and then GRUB 2 (which has supplanted GRUB legacy, so we will just say "GRUB" after this). The SYSLINUX family of bootloaders was a popular choice for booting from floppies, CD-ROMs (ISOLINUX), network servers using PXE boot (PXELINUX), and a variety of filesystems (EXTLINUX). That is not an exhaustive list, merely a sampling of more widely used bootloaders on x86/x86_64 systems. Other platforms required their own bootloaders, of course.
Lewandowska, a quality engineer at Red Hat, started her talk with a discussion of the purpose of the bootloader and things that can go wrong. The bootloader, she said, is the first piece of software that runs and "gets everything ready for booting and getting the operating system running" and then transfers control to the kernel.
That may not sound like a big deal when booting a desktop or laptop system, she said, but it becomes much more complicated for multiple architectures, complex storage schemes, and booting over the network. All of those things have to be possible, and "all of those of you who have filed bugs with us know" it can go wrong in many ways.
On top of that, there is secure boot. The idea of secure boot, of course, is to ensure that a machine "doesn't have any malware from the beginning" she said. The chain of trust starts with hardware that will only load a trusted bootloader, which will only load a trusted kernel. Lewandowska explained briefly how this works with UEFI and Linux systems. When the system starts, its firmware will load a first-stage bootloader, called the shim, if it has a trusted signature. Then it will load a signed bootloader, in this case GRUB. Next, GRUB will provide a menu of boot options and/or allow the user to edit boot options. Then GRUB will verify the selected kernel's signature with a protocol installed by the shim. GRUB also has to load the initramfs, the initial root filesystem image used for booting the kernel, which Lewandowska said is "the biggest security hole" because it is not signed.
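The chain of trust described here can be illustrated with a toy model. The following is a minimal sketch, not how shim or GRUB are implemented: it assumes an allowlist of SHA-256 digests as a stand-in for real signature checks, and the names `boot_chain`, `digest`, and the payload bytes are hypothetical. It shows why an unverified initramfs is a gap: a modified kernel is rejected, while a modified initramfs is loaded without complaint.

```python
import hashlib


def digest(blob: bytes) -> str:
    """Return the SHA-256 hex digest that stands in for a signature here."""
    return hashlib.sha256(blob).hexdigest()


def boot_chain(stages, trusted_digests):
    """Walk the boot chain in order and return the names of loaded stages.

    `stages` is a list of (name, blob, is_verified) tuples. A stage with
    is_verified=True must have its digest in `trusted_digests`, otherwise the
    chain stops with an error. A stage with is_verified=False is loaded as-is.
    """
    loaded = []
    for name, blob, is_verified in stages:
        if is_verified and digest(blob) not in trusted_digests:
            raise ValueError(f"refusing to load untrusted stage: {name}")
        loaded.append(name)
    return loaded


# Hypothetical payloads; real images are PE binaries and a cpio archive.
shim, grub, kernel, initramfs = b"shim", b"grub", b"kernel", b"initramfs"
trusted = {digest(shim), digest(grub), digest(kernel)}

# The initramfs is marked unverified, matching the flow described above.
chain = [
    ("shim", shim, True),
    ("grub", grub, True),
    ("kernel", kernel, True),
    ("initramfs", initramfs, False),
]
```

Calling `boot_chain(chain, trusted)` loads all four stages. Replacing the kernel bytes raises `ValueError`, whereas replacing the initramfs bytes still succeeds, which is the weakness Lewandowska pointed out.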
GRUB is great, she said, but it is also complex and needs to handle a lot of functionality that is duplicated in the Linux kernel. And, of course, it has security vulnerabilities too. She showed a slide with a list of 15 CVEs for GRUB since 2021. (Slides here.) There has been only one in 2024, so far, but "believe me, more are coming" Lewandowska said. In addition to the CVEs there are plenty of regular bugs in GRUB such as filesystem, storage, and memory-allocation bugs that are difficult to solve and that those working with GRUB don't have the resources to fix. GRUB is not as actively developed as the Linux kernel is, so things are fixed more slowly. She noted that Red Hat was carrying "hundreds of downstream patches" for GRUB. "We're trying to fix all this, but it's a huge task and it goes slowly." That finally brings us to nmbl, she said.
Why nmbl
The idea for nmbl is "taking a whole bunch of things that have already existed" in Linux, adding a bit of code, and putting them together, Lewandowska said. Nmbl is delivered as a UKI: a single image in Portable Executable / Common Object File Format (PE/COFF) that bundles the kernel image and the resources needed to boot. Nmbl includes the kernel command line, an initramfs, the kernel, and a UEFI stub (using systemd-stub) wrapped up as a UEFI executable that can be run from the UEFI firmware. As a bonus, nmbl can be signed so "now the whole thing becomes secure", including the initramfs.
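Because a UKI is an ordinary PE/COFF file, its components can be inspected by reading the section table. The sketch below is an illustration using only the Python standard library; the function name `list_pe_sections` is hypothetical, and the parsing covers just enough of the PE header layout to list section names. It does not validate the image or check signatures.

```python
import struct


def list_pe_sections(image: bytes) -> list[str]:
    """Return the section names found in a PE/COFF image."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' header")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader,
    # Characteristics (20 bytes in total).
    coff = struct.unpack_from("<HHIIIHH", image, pe_offset + 4)
    section_count, optional_header_size = coff[1], coff[5]
    # The section table follows the signature, COFF header, and optional header.
    table = pe_offset + 4 + 20 + optional_header_size
    names = []
    for index in range(section_count):
        # Each section header is 40 bytes; the first 8 are the padded name.
        start = table + 40 * index
        names.append(image[start:start + 8].rstrip(b"\0").decode("ascii"))
    return names
```

Run against a UKI read from disk (for example `list_pe_sections(open(path, "rb").read())`, with `path` supplied by the reader), the result would be expected to include systemd-stub section names such as `.cmdline`, `.initrd`, and `.linux`, alongside the stub's own code and data sections.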
Most of this is already in Fedora, she said. Fedora has Dracut for generating initramfs images, and has been adding support for UKIs. (LWN has covered Fedora's plans for UKIs, and the progress toward UKI support in Fedora 40.) Nmbl also uses grub-emu to provide a GRUB-style menu that is already familiar for Linux users.
In a blog post timed to accompany the talk, she explained the advantages that nmbl might offer. First is improved boot time. Currently, two variants of nmbl are being worked on: one that provides direct booting of the desired kernel, and another that allows the user to boot into different kernels. The direct-boot option loads the same kernel used by nmbl and performs a switch root to switch from the initramfs to the user-space filesystem. The other option loads the nmbl UKI and uses grub-emu to display the menu of bootable kernels. Then it uses kexec to boot the final kernel and bring up the system. When nmbl is the target kernel, it would substantially decrease boot time since there is no need to boot a second kernel.
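The difference between the two variants can be sketched as a small decision procedure. This is only a model of the flow as described in the article, not nmbl code; `plan_boot`, `kernels_booted`, and the version strings are hypothetical names chosen for illustration.

```python
def plan_boot(nmbl_kernel: str, target_kernel: str, show_menu: bool) -> list[str]:
    """Return the ordered steps of a hypothetical nmbl-style boot."""
    steps = [f"firmware starts nmbl UKI with kernel {nmbl_kernel}"]
    if show_menu:
        steps.append("grub-emu displays the menu of bootable kernels")
    if target_kernel == nmbl_kernel:
        # Same kernel: no second kernel is needed, only a change of root.
        steps.append("switch root from the initramfs to the real root filesystem")
    else:
        # Different kernel: hand over with kexec, then boot again.
        steps.append(f"kexec into kernel {target_kernel}")
        steps.append("second kernel boots and switches to the real root filesystem")
    return steps


def kernels_booted(steps: list[str]) -> int:
    """Count kernel start-ups: the nmbl kernel plus one per kexec."""
    return 1 + sum(step.startswith("kexec") for step in steps)
```

In this model, `plan_boot("6.9", "6.9", show_menu=False)` starts one kernel, while `plan_boot("6.9", "6.8", show_menu=True)` starts two, which is where the boot-time saving for the direct path comes from.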
It will also speed up feature delivery, says Lewandowska. "Since kernel and bootloader code will no longer have to be duplicated, features will only have to be implemented once to be available in both places." In addition, she wrote that implementing features once will reduce duplicate work and worries that things like kernel filesystem drivers would change. Since the bootloader and kernel would be "one and the same", any kernel changes would be immediately available to the bootloader as well.
She also touted the idea of increased security. Including all of the initramfs into a signed binary would "considerably increase security" she wrote. On top of that, nmbl would significantly reduce the attack surface "since the new code comprising nmbl is only several hundred lines of code compared to GRUB's hundreds of thousands". Finally, Lewandowska said that since the Linux kernel has a much larger community, nmbl would receive more scrutiny than GRUB does on its own.
Early days
There is still a lot of work to be done, Lewandowska said. The next thing that nmbl developers want to do is to build nmbl with every Fedora kernel build. Another feature on the horizon is shim A/B booting, which would allow fallback to the previous nmbl kernel if the newest one fails for some reason. This would make failed upgrades easier to recover from. The proof-of-concept (POC) for nmbl right now runs on UEFI. Longer-term, the team wants to get nmbl working on other architectures that do not use UEFI as well.
Support for bare metal is also on the wish list. During the Q&A of her talk, Lewandowska said that development and testing of nmbl is primarily being done with virtual machines at the moment. It has been tested on hardware, she said, but not recently. She has posted instructions on how to do this in a virtual machine with Fedora 39, including guidance on how to generate a signing certificate and enroll the key to use secure boot with the nmbl UKI.
It will likely be years before nmbl is ready to supplant GRUB as the bootloader of choice for most Linux users, but it's an interesting approach that could have a lot of advantages if it succeeds.
| Index entries for this article | |
|---|---|
| Conference | DevConf.cz/2024 |
Posted Jul 8, 2024 19:10 UTC (Mon)
by atai (subscriber, #10977)
[Link] (1 responses)
Posted Jul 8, 2024 19:30 UTC (Mon)
by fw (subscriber, #26023)
[Link]
Some POWER systems already come with Linux in the firmware as the bootloader: https://github.com/open-power/petitboot
It's a bit odd if your firmware has a newer kernel or glibc than the operating system.
Posted Jul 8, 2024 19:17 UTC (Mon)
by epa (subscriber, #39769)
[Link] (2 responses)
Posted Jul 10, 2024 13:09 UTC (Wed)
by eru (subscriber, #2753)
[Link] (1 responses)
Posted Jul 11, 2024 3:20 UTC (Thu)
by jengelh (guest, #33263)
[Link]
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/hist...
The reason it “only work on legacy floppies -- not on IDE or USB devices” is because the loader code makes the int 13h calls with parameter DL=0x00 rather than DL=0x80.
Posted Jul 8, 2024 20:01 UTC (Mon)
by pmolloy (guest, #113754)
[Link] (3 responses)
Posted Jul 8, 2024 20:06 UTC (Mon)
by pmolloy (guest, #113754)
[Link]
Posted Jul 8, 2024 22:02 UTC (Mon)
by calvin (subscriber, #168398)
[Link] (1 responses)
Posted Jul 11, 2024 13:42 UTC (Thu)
by geert (subscriber, #98403)
[Link]
Posted Jul 8, 2024 20:57 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (36 responses)
People really should take a step back and wonder why they want a boot loader at all. Some reasons I can think of:
1. To interactively allow users to pick one of many OSes installed
Now, is the nmbl concept good at the first item above? Well, uh, kexec is highly problematic if you care about Secure Boot of other Linuxes, doesn't really work for booting Windows at all, and is completely useless if you care about Measured Boot. Maybe I am biased but all of these things matter a lot, and should be at the core of how you design the boot logic of a modern OS.
Is nmbl good at the second item above? Well, the boot loader is a kernel/initrd in this model already, and apparently one with quite some bells and whistles, with networking, complex storage, cryptography, http client, ca store and stuff (I mean, that's how I understand it, i.e. it should be able to load kernels from sources that require all that). It hence will require regular updating (as much as the 2nd stage kernel most likely, if not more often, since it probably needs ca store built in), and quite possibly will break every now and then nonetheless, because it's basically a full OS you are booting as first stage. So does nmbl allow you to select the nmbl kernel itself interactively? Nope, it certainly does not, how could it, it can't be a menu that allows selecting versions of itself. Sure, it can then chainload, but why would anyone assume that the nmbl kernel is so simple that it doesn't break as often or as seldom as the chainloaded kernel? Here's a little secret I learnt from developing an init system, and being heavily involved with making systems boot: Probably more than half of all fuckups with low level subsystems already take place at boot, at initialization: so if the nmbl that is supposed to put a frickin menu on screen is already so complex with networking, complex storage, disk encryption and so on, then yes, that's the part you actually need to have a scheme for so that the user (or better the system) can fall back to older versions to make things work again.
Is nmbl good at the third item above? Nah, not at all, not any better than the 2nd thing. In fact you just made everything worse: for automatic fallback to older versions of things you need to keep state, write somewhere that you are about to try booting some item, so that on next boot you can determine if boot failed or not and then not consider it anymore. This information you need to store as early as possible, i.e. *before* you try the first thing that is risky, that might hang. So, given that nmbl is a full kernel with complex storage/blabla, you better do it before. But you cannot do that, since the initrd has no access to any storage until it actually did its thing and found storage (yes, it could write to efi vars, but typically that's something you might not want to do on every boot, because those memory chips supposedly are not high quality enough to allow that, and firmwares are shit and have bad wear level/write strategies and so on).
So, I really don't get this idea at all. It seems this approach simply doesn't deliver anything that you actually want in a boot loader.
I mean, there are reasons why systemd-boot is what systemd-boot is: just a dumb boot menu that runs in uefi mode and basically just allows you to pick another uefi binary to chainload: it's about placing the choice of what to boot into *early* in the boot chain, before you do complex storage, networking, crypto and what not. And using what the firmware already provides for this, in a minimal fashion. And then providing an API for it for userspace, and doing boot counting and things before we go down the risky path.
Hence, I'd really suggest people to think a bit about:
1. First and foremost: how is this stuff supposed to be measured (as in TPM PCR extensions) reasonably, i.e. deterministic, stable, and predictable from userspace, so that you can bind security policies to it. How do you guarantee integrity of the boot process for SecureBoot? I wished in 2024 people would understand that this item here *cannot* be an afterthought if you want reasonable OS security. It should instead be one of the *first* things you think about, because it has implications for so much else.
I mean, don't get me wrong, I think there are many scenarios where something like systemd-boot is unnecessary. For example if you boot your OS in VMs only and hence instead of choosing between multiple versions of a kernel inside the VM you can just choose between multiple versions of the whole VM image, then why bother with systemd-boot. But for physical devices, for anything more generic, you do need something *before* you invoke your first kernel I am very sure. (And yeah, I think systemd-boot is really what you want there.)
I do appreciate that this proposal uses UKIs and systemd-stub and all, but hey, if you want a frickin boot menu, then use a fricking boot menu, and don't boot a full OS-in-a-cpio just to put some ncurses UI on screen, thank you very much.
Yeah, I guess I should put together a blog story or so one day that gives a better explanation of why systemd-boot and systemd-stub do what they do the way they do, and what implications boot integrity has on literally *everything* that comes later.
Lennart
Posted Jul 8, 2024 21:35 UTC (Mon)
by juliank (guest, #45896)
[Link] (17 responses)
- shim implements A/B boot
But in most cases you avoid actually running any extra code. And all the code paths used, kexec and all, you already need to take care of for your PCR policy, as you want to support kexec, to some extent, for enterprise use cases anyway.
Notably you get:
Like I need to be able to put my boot loader on an HTTP server, boot from that, then pick next steps based on the MAC and architecture of the device that is being booted and shit like that.
Posted Jul 8, 2024 22:04 UTC (Mon)
by proski (subscriber, #104)
[Link] (13 responses)
Posted Jul 9, 2024 6:19 UTC (Tue)
by juliank (guest, #45896)
[Link] (2 responses)
Filesystem drivers in grub are a major source of security issues, hence by moving grub into user space they all become non-critical.
Now this can either use the grub drivers in user space, so you have a barrier to the kernel, or use the kernel implementation directly.
Posted Jul 9, 2024 6:48 UTC (Tue)
by mb (subscriber, #50428)
[Link] (1 responses)
Linux has many major filesystem developers who don't think security is an issue with filesystems that must be solved with high priority.
Posted Jul 9, 2024 7:51 UTC (Tue)
by juliank (guest, #45896)
[Link]
Posted Jul 9, 2024 8:12 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (9 responses)
Hence, I think placing kernels on anything else than ESP/VFAT doesn't really make much sense to me. The thing is that:
1. VFAT might not be the hottest party in town, but it at least is relatively simple. A million times simpler than a journal file system, hence easier to write a reasonably safe driver for.
Hence, people should really get the idea out of their heads that booting from kernels on a Linux fs was a worthy goal. It's really not. You are making things unsafe with that, and just *add* to the overall complexity of the system, while throwing boot integrity out of the window.
Hence, please forget about fs drivers in grub, please forget about using linux-as-boot-loader-because-of-fs-drivers, it's a *bad* idea.
Posted Jul 10, 2024 9:57 UTC (Wed)
by epa (subscriber, #39769)
[Link] (8 responses)
Restricting yourself to a subset of the filesystem’s options is a necessity in practice, anyway — grub is not going to support the latest COW trickery or RAID volume config — but this formalizes it.
Posted Jul 10, 2024 10:37 UTC (Wed)
by juliank (guest, #45896)
[Link]
Posted Jul 18, 2024 20:43 UTC (Thu)
by mrugiero (guest, #153040)
[Link] (6 responses)
Posted Jul 26, 2024 8:51 UTC (Fri)
by epa (subscriber, #39769)
[Link] (5 responses)
Posted Jul 26, 2024 17:16 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (4 responses)
If VFAT is required for UEFI, then okay, otherwise you have Grub enforcing a requirement of a special filesystem solely for Grub. THAT is the problem here - Grub wants to work with what's available, and there is no requirement for a computer to run Windows, there is no requirement for a computer to run UEFI, therefore Grub doesn't want to require it.
Cheers,
Posted Jul 27, 2024 8:30 UTC (Sat)
by jem (subscriber, #24231)
[Link] (3 responses)
The old BIOS firmware is practically dead by now(*), so UEFI is also required for Linux; no difference from Windows there either.
(*) All "normal" x86_64 computers are shipped with UEFI firmware these days. Yes, BIOS emulation is still a thing, but it would be a silly thing for a distribution to enforce the use of it.
Posted Jul 27, 2024 20:43 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (2 responses)
That is not, actually, true.
Witness Apple hardware, which afaik uses UEFI, but does not (unless we bugger about with it) have any VFAT partitions.
The specification states that UEFI *must* understand VFAT. It does not say that it's not allowed to understand anything else.
Which means if Grub is to boot linux on an Apple system it should/needs to understand Apple's disk formats. And who's to say Grub is running on a UEFI system anyway.
That's the point - Grub does not want to demand that there is a VFAT partition, because there is a quite possible scenario that it is the only software making that demand, and it doesn't want to be in that position. (And that position is - I believe - the case for all modern Apple systems.)
Cheers,
Posted Jul 28, 2024 10:04 UTC (Sun)
by jem (subscriber, #24231)
[Link] (1 responses)
> Which means if Grub is to boot linux on an Apple system it should/needs to understand Apple's disk formats.
Not at all, my ca 2011 MacBook Air boots Linux the standard UEFI way, by loading systemd-boot and the kernel image from the EFI System Partition, which is a VFAT file system. Maybe macOS is started differently; I don't know and I don't care.
>And who's to say Grub is running on a UEFI systsm anyway.
True. I stopped using Grub about ten years ago, when I discovered that the kernel image can be loaded directly by UEFI. In this case, the over-engineered, hard to configure piece of software called Grub is relegated to being just a chooser application. There are better alternatives for this task, like systemd-boot.
Posted Jul 28, 2024 13:42 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
: -)
Cheers,
Posted Jul 9, 2024 8:05 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (2 responses)
Ah, so shim is your boot menu/arbiter between multiple kernels then and implements boot counting and stuff?
> - shim loads the UKI directly
Not sure I grok this. The job of pivoting to the root fs is the job of the initrd, so you are just describing an initrd here?
> - you can interrupt it and get a boot menu
But that boot menu is *after* the kernel is already invoked? what's that good for? It's too late! At that point, as I understand you, you already initialized a full kernel with network drivers, graphics drivers, input drivers, TPM stuff, complex storage? So what's the point in putting a menu at a place where the ship has already sailed and all the risky stuff that can go wrong already happened?
> But in most cases you avoid actually running any extra code. And all the code paths used, kexec and all, you already need to take care of for your PCR policy, as you want to support kexec, to some extent, for enterprise use cases anyway.
This does not appear to be thought through to the end. I spent quite some time on thinking how kexec + PCR measurements should actually look like in the end, and the model I came up with generally is quite different from PCR measurements at boot. i.e. I think the PCRs that we typically focus on at boot should "settle" eventually, i.e. become stable, and always be predictable so that I can lock secrets to them when I know the system came up clean. But I think that secrets handed over from one kernel to the next over kexec should probably be locked down differently, i.e. against a nonce that is specific to a transition, so that we can select explicitly what to pass over and what not pass over, in a way that things aren't replayable, and that stuff that shouldn't be passed over is definitely destroyed. Hence, my suggestion would again be: think about what you are doing here. kexec at poweron is *very* different from kexec-as-reboot when it comes to measurements, and you really should think about that first.
> Notably you get:
Sure but you only get that *in* *addition* to the ESP VFAT you have to manage anyway. You have to regularly update shim, *and* the nmbl UKI, and keep multiple copies around. And as I understand nmbl is quite complex, i.e. is supposed to contain complex storage, network drivers, network boot, ca certificates so that https works and whatnot. So you have to update it regularly, almost certainly as often as the final kernel you intend to invoke with it.
Hence congratulations: you made your problem *harder*, not easier. Now you have to update various resources in ESP VFAT *and* even more resources on a complex fs.
Moreover, how do you actually intend to guarantee the cryptographic integrity of that journaling file system? A complex file system you *have* to authenticate *before* you parse things. So this means you have to stick dm-integrity/dm-verity/dm-crypt into your nmbl kernel with all the rats tail of stuff it pulls in then, i.e. TPM, FIDO2, PKCS11, interactivity to query for passwords, and thus plymouth and hence graphics drivers, and input drivers, and nvidia firmwares and so on.
Hence, why bother? You have to do that *again* later anyway in the final kernel, and there it's the job of the initrd. So you are basically doing the nasty job of the initrd twice!
> - The familiar user experience if you use grub-emu
Humm, and is that a good thing? Who wants that?
> - The familiar configuration language if you ...
This one here is *utterly* a bad idea if you think about measurements and stuff. The fact that grub's turing complete scripting language is measured executed-line-by-executed-line is really the worst thing you can do for measurements, because it tracks too closely in the measurements what the user is doing interactively. It makes things impossible to predict ahead of time, and hence the measurements useless for unlocking secrets locally.
> - and from that you retain the ability of extensive network support and interactive configuration
The network aspect I don't get either. So in your model shim and nmbl would have to be placed on the local ESP, and the 2nd stage kernel is then placed on http or so? so what's the point of this model? You have to set up things locally *and* remotely now? That's a hybrid local+remote boot, i.e. it combines the worst of both worlds. And it's entirely redundant too, because you *already* can do something like this simpler: it's what an initrd is for after all: place the kernel+initrd (whether as UKI or not) on the local device, and then make it jump to a remote rootfs. There's no need to incorporate a 2nd UKI in any of this.
> Like I need to be able to put my boot loader on a HTTP server boot from that, then pick next steps based on the Mac and architecture of the device that is being booted and shit like that.
If you want a proper remote boot you have to use the networking caps of your firmware, i.e. uefi http boot. If you do that, then why not start the target kernel directly? why add a level of pointless indirection? And if you don't want to use uefi http boot and are fine with having local resources, then why not just put the final kernel on the local disk, and just the rootfs on the remote share?
Hence, sorry, but what you are describing really makes no sense at all to me. You are winning nothing, and just adding complexity.
Posted Jul 9, 2024 14:13 UTC (Tue)
by juliank (guest, #45896)
[Link] (1 responses)
That's I think what they're going for, I'm not sure they're implementing the boot counting idea, I don't think there's a strong need to count successful _early_ boot.
> Not sure I grok this. The job of pivoting to the root fs is the job of the initrd, so you are just describing an initrd here?
See it's reasonably clever: If you want to boot the same kernel that is in the nmbl UKI, we can just skip the whole kexec shenanigans altogether and improve boot speed. The whole boot menu thing is more or less a fallback or expert mode feature.
Personally I prefer to just create a boot entry, set boot next and reboot if you select a different kernel rather than chainload but ymmv and this is not my project :D
> But that boot menu is *after* the kernel is already invoked? what's that good for? It's too late! At that pointas I understand you already initialized a full kernel with network drivers, graphics drivers, input drivers, TPM stuff, complex storage I understand you? So what's the point in putting a menu at a place where the ship has already sailed and all the risky stuff that can go wrong already happened?
You only really need an A/B boot mechanism and essentially what the plan here is, is to put that mechanism into shim itself. Then shim can fall back to the good nmbl UKI.
Personally I have two other approaches where I either
1) have two boot entries in UEFI, and I either set BootNext to attempt switching the kernel, and record the working kernel in boot-complete.target, or
Notably that's not all the risky bits; the main regressions I see in a kernel update are usually not early boot failures but stuff like your graphic driver acting up or resume ends up failing which you only notice after potentially a long run time. Then you want to be able to potentially have more fallback options to pick from than just the A/B.
> Moreover, how do you actually intend to guarantee the crptographic integrity of that journaling file system? A complex file system you *have* to authenticated *before* you parse things
Frankly you don't need to care; the ship has sailed, your booted OS will not be verifying its cryptographic integrity anyway. And then the worst offender is systemd-gpt-auto-generator anyway, which will happily mount any untrusted partition with the right GUID over your trusted authenticated root (yes yes you can constrain it *now* on 254, but the introduction of that feature was a mess) :D
> The network aspect I don't get either. So in your model shim and nmbl would have to be placed on the local ESP, and the 2nd stage kernel is then placed on http or so? so what's the point of this model? You have to set up things locally *and* remotely now?
You could be HTTP booting a remote nmbl only. Now obviously that's no concern of RH, but if you consider a product like Canonical's MAAS, the remote boot loader then chainloads the local one (after all when the machine boots the central management server might want to reprovision it), you'd want to be able to replicate that.
You can't just pivot roots between two different userspaces, the remote boot loader of the management system may be Ubuntu but you want to install a RHEL or something.
Posted Jul 9, 2024 14:33 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link]
> That's I think what they're going for, I'm not sure they're implementing the boot counting idea, I don't think there's a strong need to count successful _early_ boot.
There's so much to fail there, for example signature checks and stuff.
> > Not sure I grok this. The job of pivoting to the root fs is the job of the initrd, so you are just describing an initrd here?
> See it's reasonable clever: If you want to boot the same kernel that is in the nmbl UKI, we can just skip the whole kexec shenanigans altogether and improve boot speed. The whole boot menu thing is more or less a fallback or expert mode feature.
But that's so wrong: kexec in itself is one of those things that are a source of problems, i.e. outside of well-known hw environments drivers are not universally ready to recover from a kexec handover. So with this "clever" approach, when something goes wrong and you revert to an older version you suddenly pull another major source of problems in your boot paths? So in an attempt to make things better you make it substantially worse?
Also, if you have A/B stuff in shim anyway, why not just use that to pick boot paths and just dump the later userspace stuff? I mean, you just reinvented sd-boot in shim, then, which while I am not a fan of I cannot claim wasn't a workable solution.
> > But that boot menu is *after* the kernel is already invoked? what's that good for? It's too late! At that pointas I understand you already initialized a full kernel with network drivers, graphics drivers, input drivers, TPM stuff, complex storage I understand you? So what's the point in putting a menu at a place where the ship has already sailed and all the risky stuff that can go wrong already happened?
> You only really need an A/B boot mechanism and essentially what the plan here is, is to put that mechanism into shim itself. Then shim can fall back to the good nmbl UKI.
So sd-boot is basically an A/B boot mechanism too, except we support not just A and B but A/B/C/D/E/F… If the core of the idea is to have that I am just wondering why bother with nmbl at all...
> Personally I have two other approaches where I either
> 1) have two boot entries in UEFI, and I either set BootNext to attempt switching the kernel, and record the working kernel in boot-complete.target, or
Most folks I talk to suggest we should avoid writing to NVRAM too regularly, and hence better do the arbitration on disk.
> Notably that's not all the risky bits; the main regressions I see in a kernel update are usually not early boot failures but stuff like your graphic driver acting up or resume ends up failing which you only notice after potentially a long run time. Then you want to be able to potentially have more fallback options to pick from then just the A/B.
As somebody who works on init systems, initrds, and so on I can tell you that a major chunk of failures is boot-time stuff. probably at least 50% of fatal issues with updates.
> > Moreover, how do you actually intend to guarantee the crptographic integrity of that journaling file system? A complex file system you *have* to authenticated *before* you parse things
> Frankly you don't need to care; the ship has sailed, your booted OS will not be verifying it's cryptographic integrity anyway.
It won't? In my world they all do. It's kinda what the systemd cabal is working towards: ensuring that stuff that is not authenticated, not measured cannot corrupt the boot process. Hence, parsing a complex fs without ensuring its integrity first somehow is an absolute and complete non-starter in our eyes.
Sorry, but in 2024 it shouldn't be good enough to design new systems that do not validate things like that.
> And then the worst offender is systemd-gpt-auto-generator anyway, which will happily mount any untrusted partition with the right GUID over your trusted authenticated root (yes yes you can constrain it *now* on 254, but the introduction of that feature was a mess) :D
Nah, it's not that simple. At MSFT there was always an LSM that ensures restrictions are made on mounts, so that we can require that the only file systems that can be mounted are either dm-crypt or dm-verity backed (or vfat). Google for "IPE LSM" for details.
Yes, s-g-a-g shouldn't even *try* to mount things like that in a secure environment (and now it doesn't anymore), but the primary line of defense really should be an LSM such as IPE here, not things like s-g-a-g image policies.
> > The network aspect I don't get either. So in your model shim and nmbl would have to be placed on the local ESP, and the 2nd stage kernel is then placed on http or so? so what's the point of this model? You have to set up things locally *and* remotely now?
> You could be HTTP booting a remote nmbl only. Now obviously that's no concern of RH, but if you consider a product like Canonical's MAAS, the remote boot loader then chainloads the local one (after all when the machine boots the central management server might want to reprovision it), you'd want to be able to replicate that.
> You can't just pivot roots between two different userspaces, the remote boot loader of the management system may be Ubuntu but you want to install a RHEL or something.
I am not sure I follow anymore. Sounds adventurous to me...
Posted Jul 8, 2024 22:44 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (9 responses)
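> 1. To interactively allow users to pick one of many OSes installed
> 2. To interactively allow users to pick one of many versions of the same OS kernel installed
> 3. To automatically (i.e. non-interactively) pick the newest version of an OS that is installed, and fall back to an older version if it fails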
However, to counter those points ...
What is the USUAL reason why MOST people want a boot loader? To boot the same OS as they always boot. In which case, nmbl suits the bill perfectly.
Okay, I started with LILO, was reluctantly forced to convert to Grub, and thought Grub2 was worse, and hate change. But isn't UEFI a boot loader? If we have to use UEFI, why not use that as your boot loader to load your 99%-of-the-time nmbl kernel?
If I want to do something else, chances are I'm a Linux user, and I thought UEFI could be configured to come up at boot and offer you a choice every time. So you can simply add your nmbl kernels to UEFI and, for nearly all users nearly all the time, it's more than enough.
Cheers,
Wol
Posted Jul 8, 2024 22:46 UTC (Mon)
by bluca (subscriber, #118303)
[Link]
Posted Jul 9, 2024 7:37 UTC (Tue)
by taladar (subscriber, #68407)
[Link]
Actually no, from what I can tell most non-tech-savvy people only ever reboot when they are forced to by an update. So they mostly boot a different system than the one they booted last time.
Okay, you could count resume from suspend-to-disk in which case that is probably not quite true anymore but I would still argue that a significant percentage of boots is into a new, updated kernel. And they do not have the ability to fix things manually when anything goes wrong with that.
Posted Jul 9, 2024 8:14 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (6 responses)
Posted Jul 9, 2024 10:45 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (5 responses)
And it seems like I may be a dinosaur in the minority now, but I also shut down my computer a lot. It is a desktop after all ...
So even if I update my kernel once a week (and I don't do it that fast!), that still means the massive majority of my boots are the same kernel every time.
Just because the main reason for a server reboot is a new kernel, doesn't mean that desktops are rebooted on the same schedule ... and they're a lot more common!
Cheers,
Wol
Posted Jul 9, 2024 11:02 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (4 responses)
So, the point I was making: if you have more than one kernel you boot into, then you need something before that kernel to control which path to boot into. And your case apparently *does* qualify for this.
Posted Jul 9, 2024 13:44 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (2 responses)
Okay, if I want to start multi-booting loads of distros then I really need something like grub or whatever not to overload the UEFI prom and trash it, but a single-distro Linux desktop?
(And yes I personally do want good control of my boot chain, because last time SUSE tried to auto-update, it trashed grub.conf and nothing would boot, but that's because I do complicated things.)
(And I notice that a fair few of the distros - as and when they update - only give you the one kernel.)
Cheers,
Wol
Posted Jul 9, 2024 13:52 UTC (Tue)
by bluca (subscriber, #118303)
[Link]
This is not just theoretical, the NIX folks showed a working demo at FOSDEM, it's a lightning talk, check it out: https://fosdem.org/2024/schedule/event/fosdem-2024-3045-a...
Posted Jul 9, 2024 14:05 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link]
On top of that, there's the widely held assumption that the memory chips that back EFI variables in this industry, as well as the write mechanism in the firmware to write them, are not of the highest quality. Because of that, people generally try to minimize writing to them. Hence, it's best to keep regular writes out of NVRAM space and do them on the HDD instead. Hence, always registering new kernels in EFI vars is something most people involved try to avoid; they focus instead on leaving NVRAM as static as possible and just updating the HDD.
Hence, altogether, outside of very specific setups (which I think VMs probably qualify as), I doubt you want to use solely UEFI BootXYZ variables as a poor man's boot menu.
(People typically have other problems with it too, i.e. that the boot menu UI configured that way is not reachable the same way in the various firmware implementations, if they have a UI for that at all.)
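For readers wondering what "registering things in EFI vars" means at the byte level, here is a small Python sketch of the payload for the `BootNext` variable as Linux exposes it through efivarfs: a 32-bit attribute mask followed by the variable data, which for `BootNext` is a 16-bit index naming a `Boot####` entry. The helper names are hypothetical. The sketch deliberately only builds the bytes and the path; it does not write to NVRAM, which is exactly the operation the comment above suggests keeping rare.

```python
import struct

# GUID of the UEFI global variable namespace (BootNext, BootOrder, Boot####).
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"

# Variable attribute bits defined by the UEFI specification.
EFI_VARIABLE_NON_VOLATILE = 0x1
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x2
EFI_VARIABLE_RUNTIME_ACCESS = 0x4


def boot_next_path():
    """Path of the BootNext variable file under efivarfs."""
    return f"/sys/firmware/efi/efivars/BootNext-{EFI_GLOBAL_GUID}"


def boot_entry_name(entry):
    """Name of the Boot#### variable an index refers to (4 upper-case hex digits)."""
    return f"Boot{entry:04X}"


def boot_next_payload(entry):
    """Bytes an efivarfs write for BootNext would carry.

    Layout: little-endian uint32 attributes, then little-endian uint16 index.
    """
    if not 0 <= entry <= 0xFFFF:
        raise ValueError("boot entry index must fit in 16 bits")
    attrs = (
        EFI_VARIABLE_NON_VOLATILE
        | EFI_VARIABLE_BOOTSERVICE_ACCESS
        | EFI_VARIABLE_RUNTIME_ACCESS
    )
    return struct.pack("<IH", attrs, entry)
```

Every such update is a write to firmware-managed flash, which is why the boot-counting approaches discussed in this thread keep their state in files on disk instead.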
Posted Jul 10, 2024 3:08 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link]
Posted Jul 9, 2024 1:50 UTC (Tue)
by Heretic_Blacksheep (guest, #169992)
[Link]
Unless the industry suddenly stops with proprietary firmware in the general computing arena, the only filesystem that's going to be universally supported on UEFI systems is the 30 year old FAT32 (and maybe NTFS, which only does Windows users any good ... in about 10 years, when MS might get around to no longer requiring a shim on FAT32, and that'll more likely be a side effect of Windows on ARM than anything else). FAT32 has no native concepts of security or file integrity, and only bare-minimum metadata accounting. That means it's impossible to fully verify or audit the integrity of the boot chain even if all you have on the EFS is a shim pointing at where the real boot chain resides and the driver code to read that file system (and don't get me started on the Bad Idea some distros have of sticking the entire chain on the EFS!).
I recognize there's more out there than PCs. I also recognize that RHEL/Fedora has to support more than just UEFI. But I also have noticed a trend of platforms converging on UEFI in general, not just traditional Intel PCs, but ARM systems as well as others. Well, that specification's bare minimum only requires FAT32 support. No one can depend on OEMs implementing anything but bare minimum.
I don't see nmbl really doing anything to fix this problem other than replacing GRUB2, which may or may not be a good idea depending on your point of view. My personal POV is that UEFI already has a fully functional boot loader/selector, so GRUB2's is largely superfluous. The other point I don't consider a Good Thing: more distros will be encouraged to stick kernel images (UKI or otherwise) on the EFS, and even worse, to require disabling SB/MB at the same time, so users are entirely dependent on FAT32 not screwing up, which it's prone to do. People will end up with subtle and non-subtle kernel corruption in that configuration. SB/MB won't fix a corrupt kernel due to incomplete storage writes, but it will at least stop the machine from booting into a subtly broken state. SB/MB will do very little to further a forensic investigation if the entire boot chain is stored on FAT32.
All of that to say I consider nmbl "aspirational" in its goals rather than anything else. It would be good to reduce the boot chain complexity to no more than is practically necessary, versus what it is today for most distributions. But without a really big paradigm shift towards open-source firmware, it will remain aspirational because it doesn't address the elephant in the room: why the intermediate shims between UEFI code and the initial kernel entry point are required.
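As an aside on the incomplete-write concern, one common mitigation (an assumption on my part, not something this comment or nmbl proposes) is to write a new image under a temporary name, flush it, compare digests, and only then rename it into place. A minimal Python sketch follows; the function names are hypothetical, and the atomicity of the final rename on a FAT-formatted partition is not something the sketch guarantees.

```python
import hashlib
import os

CHUNK = 1 << 20  # read and write in 1 MiB blocks


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def install_image(src, dst):
    """Copy src to dst via a temporary file and verify the copy's digest.

    Returns the digest of the installed image. Note that re-reading the
    temporary file may be served from the page cache, so this catches
    truncated or mangled copies, not every kind of media fault.
    """
    expected = sha256_of(src)
    tmp = dst + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        for block in iter(lambda: fin.read(CHUNK), b""):
            fout.write(block)
        fout.flush()
        os.fsync(fout.fileno())  # ask the OS to push the data to storage
    if sha256_of(tmp) != expected:
        os.remove(tmp)
        raise OSError(f"digest mismatch while installing {dst}")
    os.replace(tmp, dst)  # swap in the new image only after verification
    return expected
```

This does not replace signature checks: it only reduces the chance that a half-written image ends up under the name the firmware or shim will load.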
(Long winded way of saying, I mostly agree)
Posted Jul 9, 2024 4:50 UTC (Tue)
by kazer (subscriber, #134462)
[Link] (6 responses)
Nothing in the suggestion prevents using a different method to boot when not using nmbl: UEFI still allows selections at boot and thus could allow plain old GRUB booting. That would be the fallback you asked for. Nmbl would be nice for the fast, secure default case, which is what most booting ends up being.
Posted Jul 9, 2024 10:06 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (5 responses)
Posted Jul 9, 2024 11:11 UTC (Tue)
by kraxel (subscriber, #49444)
[Link] (4 responses)
Dropping shim.efi is not that easy because fallback.efi (part of shim) is needed on first boot. Also, doing secure boot without shim.efi requires some extra non-standard steps such as enrolling the distro secure boot certificates in the firmware. Sure, it's possible, and even easier for VMs than for physical hardware, but still an extra hurdle ...
Posted Jul 9, 2024 11:26 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (3 responses)
sd-boot does support auto-enroll btw, if you want.
Posted Jul 9, 2024 12:02 UTC (Tue)
by kraxel (subscriber, #49444)
[Link] (1 responses)
Using shim.efi -> fallback.efi and having fallback.efi create BootNNNN entries pointing to the kernels in EFI\Linux\... is one option, and the one used by the cloud images linked in the previous comment.
Using sd-boot is an option too, but right now in Fedora only without secure boot. Once https://pagure.io/releng/issue/10765 is solved (which I hope will not take another two years) I'll have a look at this + auto-enroll.
Posted Jul 9, 2024 12:16 UTC (Tue)
by bluca (subscriber, #118303)
[Link]
I wonder if I'll manage to beat Fedora by shipping a shim-trusted sd-boot in Debian first :-P Currently waiting on the Debian CA owners to create a new set of intermediate certificates, everything else is ready and waiting...
Posted Jul 9, 2024 12:05 UTC (Tue)
by kraxel (subscriber, #49444)
[Link]
Posted Jul 9, 2024 0:40 UTC (Tue)
by sub2LWN (subscriber, #134200)
[Link]
The process from the blog for maintaining a Secure Boot root of trust, removing Microsoft's keys and substituting them with some of your own, is an exciting (if daunting and tedious) prospect. Similar in scope to setting up a PGP messaging system or a local Certificate Authority (or chaining a bunch of "openssl" commands together, setting the expiry dates decades into the future, and hoping for the best).
"Most commodity PCs sold today include keys that Microsoft controls. In fact, Microsoft's keys are the only ones that are more-or-less guaranteed to be installed in your firmware, at least on desktop and laptop computers." - Rod Smith. I'd like to learn more of the lore about how this situation came about, and how Microsoft is still relied on to create things such as the signed shim.
"If you remove the Microsoft CA Certificate from your machine, you will no longer be able to boot binaries signed by Microsoft with Secure Boot enabled. Likewise with other certificates. If you strip signatures from binaries, you are removing that root of trust from those binaries." - Marta Lewandowska. This sounds like a good antidote to the feeling some systems are designed to give these days: that the user is not meant to be administering the system they happen to have purchased.
Posted Jul 9, 2024 3:18 UTC (Tue)
by Paf (subscriber, #91811)
[Link]
UEFI for multi-OS boot, and Linux boots itself from UEFI. This sounds like a great idea. Frankly any time you can replace hundreds of thousands of lines of code with hundreds, I am tempted to say the world is telling you something... You had really better get a LOT for those hundreds of thousands of lines.
Posted Jul 9, 2024 7:19 UTC (Tue)
by paulbarker (subscriber, #95785)
[Link] (3 responses)
To use this in the embedded world would require:
* extending drivers to do some sort of "partial init" when the kernel is used as a bootloader.
* making it possible to build even more tiny kernel images that can fit in limited on-chip SRAM.
* adding support for early boot tasks like initialising DRAM and relocating the image.
Adding these features to the kernel would increase the complexity and maintenance overhead in ways that I'm not sure would be welcome. So, I expect there will be a place for bootloaders like U-Boot for some time yet.
Posted Jul 9, 2024 9:17 UTC (Tue)
by mfuzzey (subscriber, #57966)
[Link]
But in the embedded world something similar already exists with u-boot "falcon mode".
Posted Jul 9, 2024 19:13 UTC (Tue)
by flussence (guest, #85566)
[Link] (1 responses)
I have an old AMD APU box, first-gen UEFI, works fine for the most part as long as you don't do anything interesting with it. The kexec() call this proposal hinges on wedges the GPU in a state with no power management and requires a full PCI reset to get it unstuck. There's no indication it broke apart from some red text in dmesg and the temperature slowly creeping up. My desktop does a similar thing with more dangerous outcomes - sometimes the GPU fan doesn't spin up after kexec.
I've had to write a bunch of manual workarounds for things like this and I'm just one end user. A project of this nature is either going to have to identify and pre-emptively address all those hardware quirks that went unnoticed before, or be prepared to weather a hell of a storm. PulseAudio exposed just a few sound card driver bugs and people are still complaining about that a decade and a half later.
Posted Jul 10, 2024 21:37 UTC (Wed)
by wsy (subscriber, #121706)
[Link]
Posted Jul 10, 2024 1:20 UTC (Wed)
by mirabilos (subscriber, #84359)
[Link]
This is multiple steps backwards, more rigid everywhere, and seemingly in favour of only the Restricted Boot theatre.
Posted Jul 12, 2024 1:24 UTC (Fri)
by gdt (subscriber, #6284)
[Link]
shall GRUB continue to be supported
History of bootloaders
LinuxBoot
>
> It started as NERF in January 2017 at Google.
>
> LinuxBoot is a Linux Foundation project and as such has a technical charter.
LinuxBoot
And before that, there was MILO on Alpha.
Why though?
2. To interactively allow users to pick one of many versions of the same OS kernel installed
3. To automatically (i.e. non-interactively) pick the newest version of an OS that is installed, and fall back to an older version if it fails
2. Where to place the selection logic for which boot path to boot, how early can you make it
3. Where to place the complex parts of finding your root disk, how late you can make it
4. How to deliver boot counting/automatic fallback, where to store the counters
5. How you get away with maintaining the least amount of code, without having to reimplement Linux' storage stack. If you have to stick some boot phase between firmware and OS kernel, how do you minimize the code footprint on that.
- shim loads the UKI directly
- the UKI pivots to the new root in the general case
- you can interrupt it and get a boot menu
- A journaling filesystem to put your kernels on, rather than notorious brat uh VFAT (thanks autocorrect)
- The familiar user experience if you use grub-emu
- The familiar configuration language if you ...
- and from that you retain the ability of extensive network support and interactive configuration
I believe the opposite is true. The shim can only use UEFI services, limiting the UKI kernel to VFAT. GRUB, on the other hand, supports some (not all) journaling filesystems, such as ext4 and xfs.
2. You cannot avoid vfat/esp anyway, it's the only thing the firmware provides you with as first stage of your boot, you *have* to use it anyway, you *have* to regularly update files in it, hence why bother with any further level of complexity?
Filesystems readable by grub
Just use the EFI partition.
Wol
>
> - shim implements A/B boot
> - the UKI pivots to the new root in the general case
> - A journaling filesystem to put your kernels on, rather than notorious brat uh VFAT (thanks autocorrect)
2) I have the old one kexec the new one and then continue with boot-complete.target successful boot marking.
> 2) I have the old one kexec the new one and then continue with boot-complete.target successful boot marking.
https://download.fedoraproject.org/pub/fedora/linux/relea...
Install Media / Bootdisks
A lovely idea
All the world's an x86
There the hardware boot ROM + u-boot SPL do the basic firmware init and then the u-boot SPL directly loads the Linux kernel (usually the final kernel).
This is so many bad ideas.
Kernel parameters
