
Poettering: The Wondrous World of Discoverable GPT Disk Images

In a lengthy blog post, Lennart Poettering describes the advantages of using the unique IDs (UUIDs) and flags from the discoverable partitions specification to label the entries in a GUID Partition Table (GPT). That information can be used to tag disk images in a self-descriptive way, so that external configuration files (such as /etc/fstab) are not needed to assemble the filesystems for the running system. Systemd can use this information in a variety of ways, including for running the image in a container: "If a disk image follows the Discoverable Partition Specification then systemd-nspawn has all it needs to just boot it up. Specifically, if you have a GPT disk image in a file foobar.raw and you want to boot it up in a container, just run systemd-nspawn -i foobar.raw -b, and that's it (you can specify a block device like /dev/sdb too if you like). It becomes easy and natural to prepare disk images that can be booted either on a physical machine, inside a virtual machine manager or inside such a container manager: the necessary meta-information is included in the image, easily accessible before actually looking into its file systems."
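The core idea is that the partition *type* GUID alone tells a conforming tool where a partition belongs, with no external configuration. A minimal illustration in Python (the GUID table here is deliberately abbreviated; the specification defines many more entries, so consult it for the full list):

```python
import uuid

# A few of the well-known partition *type* GUIDs from the Discoverable
# Partitions Specification. These are fixed, published values shared by
# every conforming disk; the spec defines many more.
DISCOVERABLE_TYPES = {
    uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b"): "/efi (EFI System Partition)",
    uuid.UUID("4f68bce3-e8cd-4db1-96e7-fbcaf984b709"): "/ (root, x86-64)",
    uuid.UUID("0657fd6d-a4ab-43c4-84e5-0933c84b4f4f"): "swap",
    uuid.UUID("0fc63daf-8483-4772-8e79-3d69d8477de4"): "(generic Linux data, not auto-mounted)",
}

def mount_hint(type_guid):
    """Return the mount point implied by a partition's type GUID,
    or None if the type carries no discoverable meaning."""
    return DISCOVERABLE_TYPES.get(uuid.UUID(type_guid))
```

This is the whole trick: given only a partition table, a tool like systemd-gpt-auto-generator can derive a mount plan without ever reading /etc/fstab.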


Painting yourself into a corner with GPT UUIDs

Posted Jun 12, 2021 11:34 UTC (Sat) by mtu (guest, #144375) [Link] (23 responses)

Maybe it's just me not trusting the blogpost's author even with a reimplementation of true(1), but I don't like it. /etc/fstab works because it's an expression of my will, not an automagic interpretation of whatever GPTs on whatever kind of drive the kernel manages to get its hands on at boot.

I mean, this idea is fine for swap, I guess—but vital filesystems? What if I'm shuffling disks with multiple bootable systems around? What if my ESP does not reside on the same disk as the system I'm booting? What if it were positively dangerous to mount some of the partitions present in my machine (because of broken filesystems, compromised environments or other reasons)? Plug in some drives with bootable systems for experimentation or rescue, and suddenly you're in a lottery of which partitions are getting booted and mounted.

Also, I can't think of a worse place to maintain mount points than my disks' partition tables. Hell, I'm glad for every month of my life I don't need to call any partition management utility and risk breaking the world. It's like using a flamethrower to cook your dinner: possible, but needlessly dangerous.

As often, the underlying worldview seems to suppose that you're running Linux on a laptop for development work, never touching the soldered-on SSD, because that's the way Apple does it.

Painting yourself into a corner with GPT UUIDs

Posted Jun 12, 2021 13:53 UTC (Sat) by lutchann (subscriber, #8872) [Link] (4 responses)

It looks like it will be a useful tool for VM images and other simple systems, but I agree it would be a disaster if distributions eventually wanted to make it mandatory for new installs.

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 13:56 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

It looks like exactly the same advantages, and problems, as automounting RAID arrays: you can mount them automatically and the knowledge of FS identity follows the FS around without your needing to do anything (which is an unambiguously good thing)... except that *because* this is true, it can happen unexpectedly when you attach it to an existing machine which already has an fs of the specified type. The way the md subsystem fixes this is with an attached hostname which can be specified at mount time, but that won't work here: the firmware has no idea of the hostname and the mounting is possibly not being done by a userspace program we control but by the firmware itself. (It will work if systemd, or another component of normal userspace, is doing the mounting, from an initramfs or something on the ESP -- but if you have that, you might just as well bake in the traditional root fs location. Mind you, this *is* strictly nicer than the traditional root fs location, so it's beneficial regardless...)

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 20:07 UTC (Mon) by intgr (subscriber, #39733) [Link] (1 responses)

> It looks like exactly the same advantages, and problems, as automounting RAID arrays
> it can happen unexpectedly when you attach it to an existing machine which already has an fs of the specified type

The GPT autodiscovery logic described in the article will only automount partitions from the physical disk that the bootloader was booted from. Am I missing something?

As long as the attached disk is a separate disk, there's no risk of unexpected mounts. RAID does not have this advantage, because it spans multiple physical disks.

Painting yourself into a corner with GPT UUIDs

Posted Jun 15, 2021 18:19 UTC (Tue) by nix (subscriber, #2304) [Link]

> The GPT autodiscovery logic described in the article will only automount partitions from the same physical disk whose bootloader was booted from. Am I missing something?

No, I was. It should even be safe if you accidentally boot from the wrong disk: a consistent set of fses will be picked regardless. (Well, as long as neither system uses /etc/fstab but only systemd mountpoint generators, which describes precisely zero systems I have ever seen, and will probably only be able to handle simple cases in any case -- but of course, that describes most systems, and almost all non-server systems. My laptop has one big fs, even if the server has dozens.)

Painting yourself into a corner with GPT UUIDs

Posted Jun 17, 2021 8:54 UTC (Thu) by roblucid (guest, #48964) [Link]

As a former sysadmin, these kinds of automatic systems seem nice until something goes wrong; then you find yourself with a system that syncs up to the wrong disk. Under pressure you have no idea why, and you have to rebuild the system part from scratch. Hopefully you maintained a clear separation of user data and can get the service back fast on a duplicate server.

The auto part doesn't help the admin much, but it means replacing OS disks requires an understanding of these developers' specs.
Lennart unfortunately in the past stubbornly insisted that /usr had to be in the same partition as / and said no one bothered with disk partitions on modern systems. OpenSUSE actually solved it simply by mounting & remounting /usr as part of the ram disk startup, which showed it was perfectly feasible to be more flexible.
I still don't see any focus on solving other people's problems in the blog, it's about defining HIS configuration data as a standard and making systemd implementation easier.

The big mistake was SysV's move away from /etc config files toward scripts and hidden state.
Instead of human-readable config tables used by a daemon, slow, inefficient scripts had to be run to change and update state. It stymied GUI admin programs because there wasn't a single source for the data they needed to manipulate.

Painting yourself into a corner with GPT UUIDs

Posted Jun 12, 2021 14:43 UTC (Sat) by sjj (guest, #2020) [Link] (6 responses)

This is an uncalled-for, bad-faith interpretation. It would be nice if people, at least here, could discuss technical matters in a more emotionally contained manner.

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 5:53 UTC (Sun) by ttuttle (subscriber, #51118) [Link] (5 responses)

Can you explain how? It seemed to me like it was worried/skeptical and presented some examples of situations where it would go haywire.

I also take a little issue with "emotionally contained" as a goal. A lot of folks are deeply into tech, and it's normal to feel strongly about something you're deeply into. It's more about what you do with those emotions.

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 15:22 UTC (Sun) by sjj (guest, #2020) [Link] (4 responses)

Starting with a personal attack that has nothing to do with the issue at hand? Trying to come up with some paranoid misinterpretation that adding some metadata in disk image files on the filesystem somehow makes the OS randomly mount and boot them? Come on.

If you can’t see a difference between feeling strongly about a technical issue and lashing out, good luck for your career.

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 20:04 UTC (Sun) by mtu (guest, #144375) [Link] (2 responses)

> Trying to come up with some paranoid misinterpretation that adding some metadata in disk image files on the filesystem somehow makes the OS randomly mount and boot them?

Nowhere in this thread is there any mention of disk image files. Perhaps in a more emotionally contained discussion, we wouldn't have these misunderstandings.

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 21:14 UTC (Sun) by sjj (guest, #2020) [Link] (1 responses)

> If a disk image follows the Discoverable Partition Specification then systemd-nspawn has all it needs to just boot it up. Specifically, if you have a GPT disk image in a file foobar.raw and you want to boot it up in a container, just run systemd-nspawn -i foobar.raw -b, and that's it […]

Quoted in the post you’re responding to.

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 12:31 UTC (Mon) by mtu (guest, #144375) [Link]

The blogpost cites eight different possible uses of the Discoverable Partitions Specification.

OP talks about "Use #2: Booting an OS image on bare-metal without /etc/fstab or kernel command line root=" and its potential pitfalls.

You appear to _think_ OP is talking about "Use #1: Running a disk image in a container", but quoting that part of the blogpost doesn't make it so.

If that's how you approach a complex topic, good luck for your career.

Painting yourself into a corner with GPT UUIDs

Posted Jun 17, 2021 11:46 UTC (Thu) by Gladrim (subscriber, #45751) [Link]

Seemed more like the OP admitting his bias than any kind of personal attack.

Painting yourself into a corner with GPT UUIDs

Posted Jun 12, 2021 17:03 UTC (Sat) by smcv (subscriber, #53363) [Link] (5 responses)

> /etc/fstab works because it's an expression of my will, not an automagic interpretation of whatever GPTs on whatever kind of drive the kernel manages to get its hands on at boot.

Sure, but as the referenced blog post points out, your firmware doesn't (can't!) read /etc/fstab to find the root filesystem. Assuming you're using EFI and a reasonably normal Linux distribution, the boot chain goes something like this:

* your firmware reads EFI variables (in NVRAM on the motherboard), the partition table of each disk, and the ESP of each disk that has one, and uses those to choose and run a bootloader (hopefully the one you expect) from the ESP (hopefully the one you expect, if more than one disk has an ESP);
* the bootloader reads grub.cfg (or equivalent if not grub), and uses that to choose and run a kernel/initramfs pair (hopefully the one you expect), possibly passing it a root= parameter (hopefully the one you expect);
* the kernel and initramfs figure out what to mount as the final root filesystem, usually based on the root= parameter from grub.cfg (hopefully up to date), or perhaps based on a copy of /etc/fstab that has been copied into the initramfs (hopefully up to date);
* the initramfs mounts the final root filesystem and pivots into it;
* now we've reached the first point in the whole process where /etc/fstab is accessible, and the system can use it to mount the remaining filesystems

Every time you change the specification for the root filesystem device in /etc/fstab, it has to be copied into either grub.cfg or the initramfs for the change to actually take effect. The same is true for any non-root filesystems that your initramfs mounts (/usr or /var maybe, if those are separate partitions in your installation).
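To make that synchronization problem concrete, here is a hypothetical checker (the function names and the minimal parsing are my own illustration, not any real tool) that compares the root device named in fstab-formatted text against the root= parameter of a kernel command line; when the two drift apart, the system boots from something other than what /etc/fstab claims:

```python
def fstab_root(fstab):
    """Return the device field of the '/' entry from fstab-formatted text,
    or None if there is no root entry."""
    for line in fstab.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            return fields[0]
    return None

def cmdline_root(cmdline):
    """Return the value of the root= parameter from a kernel command line,
    or None if it is absent."""
    for word in cmdline.split():
        if word.startswith("root="):
            return word[len("root="):]
    return None
```

If `fstab_root(...)` and `cmdline_root(...)` disagree, the copy in grub.cfg or the initramfs is stale, which is exactly the failure mode described above.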

> What if I'm shuffling disks with multiple bootable systems around? What if my ESP does not reside on the same disk as the system I'm booting? What if it were positively dangerous to mount some of the partitions present in my machine (because of broken filesystems, compromised environments or other reasons)?

Then you're going to have to be more careful, with or without this design existing. If your requirements are more complex, then you can't use a code path that makes simplifying assumptions - but that doesn't mean that it isn't useful for people whose requirements for this particular part of the system are simpler than yours.

A lot of aspects of operating system design have this pattern: there's a default, common-case or automatic path that works for most people, and if your requirements go beyond what it supports, you override it and do something more complicated or more explicit under your control. Meanwhile, I might accept the defaults for that part of the system, but want finer control over a different part of the system, one where you're happy with the defaults - different people and different installations have different requirements.

Painting yourself into a corner with GPT UUIDs

Posted Jun 12, 2021 17:59 UTC (Sat) by mtu (guest, #144375) [Link] (4 responses)

You're right about the bootloader finding the right root partition: It's a gamble in the best of circumstances anyway.

> > What if I'm shuffling disks with multiple bootable systems around? What if my ESP does not reside on the same disk as the system I'm booting? What if it were positively dangerous to mount some of the partitions present in my machine (because of broken filesystems, compromised environments or other reasons)?

> Then you're going to have to be more careful, with or without this design existing. If your requirements are more complex, then you can't use a code path that makes simplifying assumptions - but that doesn't mean that it isn't useful for people whose requirements for this particular part of the system are simpler than yours.

See, I'd contend that shuffling disks with bootable systems and/or data partitions in and out of an existing box (for diagnostics, rescue, provisioning, etc.) is a far more common occurrence than actually swapping root partitions or pointing a bootloader somewhere else. So let's suppose I'm sure which partition I'm mounting as root, but there's a bunch of other disks/partitions present for whatever reason.

_Without_ this GPT UUID automagic mounting feature, I can be sure that nothing outside of /etc/fstab gets mounted automatically (unless I've got some sort of automounting going, in which case that's my own fault). Therefore, potentially corrupt or compromised filesystems on disks that I'm not booting from are untouched until _I_ decide to access them, wearing surgical gloves and using dd, or at least mounting read-only.

_With_ GPT UUID automagic mounting, I'd have to tell the kernel (or systemd, I guess?!) to please, _please_ do not attempt to mount stuff unless it's in /etc/fstab. Failing that, my system might happily eat corrupt filesystems, or cover itself in unknown/dangerous files by automounting whatever the cat dragged in.

I don't see why that's a risk everyone is supposed to take, just so the people who brought us GNOME can "optimize" away five lines of /etc/fstab that nobody ever worries about more than once a year anyway.

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 14:01 UTC (Sun) by bluca (subscriber, #118303) [Link] (3 responses)

If you had actually read the spec and/or the blog post you are commenting on, you'd have seen that manual legacy settings still have higher precedence. So for your corner cases when you are attaching disks that are somehow both corrupted and have the right label on them, you can still have the overrides behave as expected.

> (Note, if /etc/fstab or root= exist and contain relevant information they always takes precedence over the automatic logic. This is in particular useful to tweaks thing by specifying additional mount options and such.)

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 13:58 UTC (Mon) by mgedmin (subscriber, #34497) [Link] (2 responses)

What happens if my /etc/fstab doesn't have an entry for /usr (because it's part of /), but I attach a USB drive with a GPT that says "mount this thing as /usr please"?

Does the mere existence of an /etc/fstab prevent mounting of things by magic labels? Or is there a safety check where magic-label-based things are only mounted on empty directories?

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 15:22 UTC (Mon) by mezcalero (subscriber, #45103) [Link] (1 responses)

There are multiple safety checks in place: we don't overmount populated dirs. We don't overmount dirs that already are mount points. And anything listed in /etc/fstab is also excluded from the automatic logic.

Lennart

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 15:24 UTC (Mon) by mezcalero (subscriber, #45103) [Link]

Oh, and I forgot to say: the automatic logic only looks for the root partition on the disk the ESP used for booting is on. And it looks for the other partitions only on the disk the root partition is on. Thus, if you plug in random stuff then this logic shouldn't care whatsoever.
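Taken together with the previous comment, the rules can be sketched as a single predicate. This is my own paraphrase of the rules as described in these two comments, not systemd's actual code:

```python
def may_automount(part_disk, esp_disk, root_disk, is_root_partition,
                  target_in_fstab, target_populated, target_is_mountpoint):
    """Decide whether a discovered partition may be auto-mounted.

    Rules paraphrased from the comments above:
    - anything listed in /etc/fstab is excluded from the automatic logic;
    - populated directories and existing mount points are never overmounted;
    - the root partition is only looked for on the disk holding the booted ESP;
    - all other partitions are only looked for on the root partition's disk.
    """
    if target_in_fstab or target_populated or target_is_mountpoint:
        return False
    if is_root_partition:
        return part_disk == esp_disk
    return part_disk == root_disk
```

Under these rules a randomly plugged-in disk is never touched: it is neither the ESP's disk nor the root partition's disk, so every partition on it fails the predicate.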

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 8:05 UTC (Sun) by randomguy3 (subscriber, #71063) [Link] (1 responses)

The spec does account for people swapping in and out drives. Only the same drive as the one containing the bootloader is considered for automatic mounting. So whichever drive you decide to boot from will (assuming it's set up correctly, which is always a caveat with boot configuration) have a consistent set of partitions mounted.

If you're doing something like having two bootable Linux systems on the same drive, this spec won't work for you, and you'll need to make everything a "data partition" (ie: don't use the auto-mount identifiers) and configure the old fashioned way.

Note that this is not based on me having any deep (or even prior) understanding of the spec - I followed the link to it and scanned down the table. It seems your first sentence was very self-aware.

Painting yourself into a corner with GPT UUIDs

Posted Jun 17, 2021 8:14 UTC (Thu) by roblucid (guest, #48964) [Link]

But, when migrating systems I would often boot an OS on a different disk from the boot manager.
Switching onto a newly prepared disk might mean booting the old OS partition in the case of an oversight.
The irony is that fstab(5) was designed as the mount(8) configuration file, it is by definition machine readable but convenient for a competent sysadmin.
When SysV and Solaris "improved" BSD style config files the result was a mess, it actually worsened configuration visibility by hiding state in data files. To understand a running system remotely, I'd have to run a utility to display the disk partitions.
To me Lennart is trying to optimise for an installer that is run once, or maybe never (I used to clone machines with dd(1)).
The installer creates an fstab(5); it could create the GRUB file too, and Linux mount was already able to use labels or UUIDs.

Painting yourself into a corner with GPT UUIDs

Posted Jun 13, 2021 23:51 UTC (Sun) by linuxrocks123 (subscriber, #34648) [Link] (2 responses)

> Maybe it's just me not trusting the blogpost's author even with a reimplementation of true(1)

If Poettering ever reimplemented true, the executable would be over 50MB; it would have hard dependencies on PAM, systemd, PulseAudio, GNOME, Wayland, and the Java runtime; and, when asked why true didn't work correctly, he would respond with a lengthy argument quoting Thomas Aquinas and claiming his implementation's behavior is actually and obviously correct.

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 5:46 UTC (Mon) by motk (guest, #51120) [Link]

Yeah, this is pretty obviously bogus mate. Reconsider.

Painting yourself into a corner with GPT UUIDs

Posted Jun 14, 2021 6:24 UTC (Mon) by rahvin (guest, #16953) [Link]

This post doesn't belong here.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 12, 2021 12:11 UTC (Sat) by karkhaz (subscriber, #99844) [Link] (4 responses)

This is very cool, and the blog post makes the motivation clear, with one exception: the architecture flags. I don't understand why one would want to build a "multi-arch disk image".

The only use-case I can think of is to have a USB live-image that you can use to boot and repair (or install) both your server and your Raspberry Pi, but this seems a bit silly and a waste of space. Does anybody have a sensible use-case for this?

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 12, 2021 14:45 UTC (Sat) by amacater (subscriber, #790) [Link] (1 responses)

You're using something like Debian multi-arch and want an image / a container / a space that contains x86_64 and i386 binaries and can therefore run binaries from both / can cross compile from amd64 to i386 more readily ??

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 12, 2021 16:39 UTC (Sat) by smcv (subscriber, #53363) [Link]

Debian multiarch makes it *less* necessary to have a separate root filesystem for each architecture, if anything: if you want to cross-compile s390x binaries that depend on libfoo on an amd64 system, it's usually possible to install libfoo-dev:s390x and gcc-s390x-linux-gnu alongside the amd64 system's native libfoo-dev:amd64 and gcc.

Having more than one CPU architecture's root filesystem in a single disk image does seem quite unusual to want; but if you're designing the discoverable partition scheme *anyway* for other reasons, then giving each CPU architecture a different partition type UUID costs you virtually nothing and gives you a new capability that might be useful to *someone*, so it might as well be included. So I can see why it's there: even though the benefit seems small, the cost is less than the benefit just because the cost is also so small.
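The mechanism really is that cheap: one type GUID per architecture, and a host boots whichever root partition matches. A sketch of the matching logic (the x86-64 root GUID is the well-known published value; I believe the aarch64 value is correct, but verify both against the current spec before relying on them):

```python
# Root-partition type GUIDs per CPU architecture, per the Discoverable
# Partitions Specification. Verify against the spec before use; only a
# subset is shown here.
ROOT_TYPE_BY_ARCH = {
    "x86-64": "4f68bce3-e8cd-4db1-96e7-fbcaf984b709",
    "aarch64": "b921b045-1df0-41c3-af44-4c6f280d3fae",  # believed correct; check the spec
}

def bootable_on(host_arch, image_root_types):
    """A multi-arch image is bootable on a host iff it carries a root
    partition whose type GUID matches the host's architecture."""
    wanted = ROOT_TYPE_BY_ARCH.get(host_arch)
    return wanted is not None and wanted in image_root_types
```

A "universal" image simply carries one root partition per architecture; each host picks out its own and ignores the rest.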

A couple of (rare) situations where you might benefit from having more than one discoverable root partition with different architectures:

* you're building a "universal" live/recovery image or installer, and having it be usable by everyone is more important to you than minimizing download size (I think Debian used to have hybrid powerpc/amd64 installer images targeted at Apple laptops?)
* you have an existing installation in an ARM system and now want to migrate the same disks into an x86 machine, or vice versa (I did this with a NAS box a while ago)

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 14:01 UTC (Mon) by nix (subscriber, #2304) [Link]

I think this is mostly useful so you can have one container with many concatenated arch disk images on it, and switch easily between them. I know I've wanted that in the past. Mind you, it's not at all a common case: I'm not sure why it belongs in here (which is mostly about making common cases easier), except that recording the arch of the binaries contained on a root filesystem *is* obviously a piece of useful description if you're trying to decide whether to mount that filesystem, so if you can record it without tradeoffs, why not?

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 15:32 UTC (Mon) by mezcalero (subscriber, #45103) [Link]

You can have images this way that are truly universal, i.e. ones that work the same way on x86-64 and aarch64, for example. Let's say you prep an nspawn container and want it to be deployable on both your x86-64 and your aarch64 servers. Using this you can easily do so.

Or consider a disk image you can copy to USB and then boot on any kind of system you have handy, as long as it speaks UEFI. Think about Apple desktop systems, which switched between ppc, x86-64 and now aarch64 in the not so distant past. You could relatively naturally build a single OS image that you could then boot on any of them.

But you know, I don't want to convince you that this is the ideal way to do that, or even that it's a totally worthy goal; but the logic behind it is trivial (we just use different type UUIDs for each arch), so there's nothing lost if we prep the ground for use cases like this. And at the very least you get a bit of debuggability out of this, since if you fail to make some image boot on your system you can easily figure out if it's because the arch didn't match.

Note that multi-arch is not a new concept. Debian strives to build the whole OS that way, to some degree, and Fedora's multilib kinda goes in a similar direction (though with much weaker goals).

Lennart

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 12, 2021 14:08 UTC (Sat) by flussence (guest, #85566) [Link]

This is something that'd be nice to have in uutils' mount command. I've already got properly labelled GPT tables (I was bored one day), but they're going unused because openrc just hands off to `mount -a`.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 13, 2021 2:29 UTC (Sun) by xecycle (subscriber, #140261) [Link]

I’d be interested if they have plans to extend this to LVM, or produce a similar project. As of today I don’t think there is a single filesystem able to serve all workloads (ext4 = no FICLONE, xfs = no shrinking, CoW-based btrfs/zfs = performance very sensitive to workload pattern), so I became a big fan of LVM thin provisioning; that of course has nothing to do with GPT partition types. Would be nice if they take into account DM raids altogether.

Question: Does this make UUIDs non-unique?

Posted Jun 13, 2021 9:17 UTC (Sun) by ausserirdischesindgesund (guest, #152763) [Link] (2 responses)

I have not read the whole specification, so I don't know whether the UUIDs there are examples or have to be used verbatim in their entirety. If the latter: can this produce side effects by, e.g., confusing backup software that depends on UUIDs being different between hosts, or similar scenarios?

Question: Does this make UUIDs non-unique?

Posted Jun 13, 2021 10:23 UTC (Sun) by bluca (subscriber, #118303) [Link] (1 responses)

Those listed are partition _type_ UUIDs, and have to be used verbatim - that's how autodiscovery works. What you are thinking of are partition UUIDs, which are separate and unique.
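The distinction is easy to demonstrate: two conforming disks share the same *type* GUID for, say, their /home partitions, while each partition still carries its own randomly generated, unique partition GUID. A sketch (the /home type GUID below is the value I recall from the spec; double-check it there):

```python
import uuid

# Partition *type* GUID: fixed by the spec, identical on every
# conforming disk. Value for /home as I recall it; verify in the spec.
HOME_TYPE = uuid.UUID("933ac7e1-2eb4-4f13-b844-0e14e2aef915")

# Partition GUIDs: generated once per partition at creation time,
# so they stay unique across hosts and disks.
disk_a_home = {"type": HOME_TYPE, "part_uuid": uuid.uuid4()}
disk_b_home = {"type": HOME_TYPE, "part_uuid": uuid.uuid4()}
```

Backup tools that key on the per-partition GUID are therefore unaffected; only the type field is shared.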

Question: Does this make UUIDs non-unique?

Posted Jun 14, 2021 15:49 UTC (Mon) by ausserirdischesindgesund (guest, #152763) [Link]

Makes sense, thanks!

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 13, 2021 13:12 UTC (Sun) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

This guy didn't really write that replacing human-readable configuration files with dispersed binary junk attached to other binary junk was "natural", did he? Ever seen a tree, Mr Poettering? That's something natural. Nothing implemented in software ever is; it's all artificial and principally arbitrary.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 13, 2021 13:40 UTC (Sun) by pizza (subscriber, #46) [Link]

Then by all means, replace your computer with a tree.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 8:37 UTC (Mon) by gspr (guest, #91542) [Link]

The Wiktionary entry for "natural" lists 13 meanings as an English adjective. Maybe number 4 is illuminating?

"Natural: 4. As expected; reasonable. "

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 9:02 UTC (Mon) by taladar (subscriber, #68407) [Link] (17 responses)

It is a bit confusing that they have UUIDs for certain common mount points but not for e.g. /var/log which is very common to put on a separate partition both for read-write support in an otherwise read-only system and to avoid the situation where growing log files fill up the entire root partition instead of "just" /var/log.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 10:08 UTC (Mon) by bluca (subscriber, #118303) [Link]

There's a UUID for /var: 4d21b016-b534-45c2-a9fb-5c16e091fd2d

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 11:26 UTC (Mon) by zdzichu (subscriber, #17118) [Link]

/var itself should be a read-write mount, even on R/O system images. Otherwise it's not really a /var, right?
But if you insist on getting a dedicated partition for /var/log, just do it – generate some UUID and open a Pull Request against https://github.com/systemd/systemd/blob/main/docs/DISCOVE...
No guarantee it will work, but at least it will be discussed.
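Generating a fresh candidate type GUID for such a proposal is a one-liner; any standard random (version 4) UUID will do, since type GUIDs only need to be globally unique, not derived from anything:

```python
import uuid

# A random version-4 UUID is what you'd propose as a new partition
# type GUID, e.g. for a hypothetical /var/log entry in the spec.
candidate = uuid.uuid4()
print(candidate)
```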

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 12:20 UTC (Mon) by mtu (guest, #144375) [Link] (14 responses)

It's ridiculous in any case to cement a select few paths for partition-automounting, but no others. Since /etc/fstab takes arbitrary paths, this is a step backwards.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 13:59 UTC (Mon) by anselm (subscriber, #2796) [Link] (11 responses)

Nobody (least of all Lennart Poettering) said that this was supposed to replace /etc/fstab in its full generality, which would be a silly notion. If it can streamline the 95% of applications that use only a few well-known partitions then that is already a big win. The remaining 5% can still be treated individually by hand because root= and /etc/fstab will continue to work.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 21:28 UTC (Mon) by jccleaver (guest, #127418) [Link] (10 responses)

> Nobody (least of all Lennart Poettering) said that this was supposed to replace /etc/fstab in its full generality, which would be a silly notion.

There have been a lot of silly notions coming from the author of systemd over the years that we now have to deal with on the ground. Remember when we dropped eth0 being your ethernet interface on simple VMs because Lennart was concerned about laptop wifi?

The Slippery Slope Argument is not a logical fallacy if there's plenty of slope to induce from. Hesitation is warranted here IMHO.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 21:48 UTC (Mon) by rahulsundaram (subscriber, #21946) [Link] (7 responses)

> Remember when we dropped eth0 being your ethernet interface on simple VMs because Lennart was concerned about laptop wifi?

Don't remember that. Do you have a source for this claim?

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 22:23 UTC (Mon) by anselm (subscriber, #2796) [Link] (1 responses)

Remember that for some people, Lennart Poettering is personally responsible for everything they don't like about Linux.

It's true that the “predictable interface names” approach comes from the systemd (or, really, udev) crowd. IMHO predictable network interface names are a good idea in principle, but there are those who appear to take exception to the fact that what used to be eth0 now tends to be called enp2s0 or something. These people have obviously never installed a second Ethernet card in a computer and noticed that their eth0 is now suddenly eth1 and all their routing tables and firewall rules no longer work right, and just as obviously haven't read the “predictable interface names” document to the end where it explains three different straightforward methods for making sure the scheme is not used.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 2:03 UTC (Tue) by jccleaver (guest, #127418) [Link]

> These people have obviously never installed a second Ethernet card in a computer and noticed that their eth0 is now suddenly eth1 and all their routing tables and firewall rules no longer work right

I have installed many a NIC in a physical machine, and much more rarely had to add in virtual network hardware. If I'm re-arranging hardware like that, I'm *expecting* there to be the possibility of side-effects, but most of that was avoided by using MACs properly. And on the VM side, you'd either remove hardware address affinity, or make sure you're moving the MAC around during a VMotion. It's not complicated, and the few annoyances out there were handled easily enough. For every one server with 6 NICs we had hundreds of VMs where it didn't mean anything.

> and just as obviously haven't read the “predictable interface names” document to the end where it explains three different straightforward methods for making sure the scheme is not used.

I certainly have; the justifications given are laughably rare, and the workarounds are either:
a) remove the entirety of a config (the systemd way: accept all of our nudges or go off the reservation),
b) roll all of your own names manually (if I'm doing super complex routing I'm already doing this; a single-interface VM doesn't need it), or
c) add to your kernel command line for the rest of your life (including initial installs at first)

This implementation was a needless cluster F of a system that worked fine for 99.9+% of cases, and the added complexity directly led to BZ#1391944.
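
For reference, the three opt-out methods being argued about above can be sketched roughly as follows (paths are the usual systemd locations; the MAC address and the name "lan0" are made-up examples, not anything from the thread):

```shell
# (a) Disable the scheme entirely via the kernel command line:
#     net.ifnames=0

# (b) Mask the default .link policy so udev leaves the kernel's names alone:
ln -s /dev/null /etc/systemd/network/99-default.link

# (c) Roll your own names with a custom .link file:
cat > /etc/systemd/network/10-lan0.link <<'EOF'
[Match]
MACAddress=00:11:22:33:44:55

[Link]
Name=lan0
EOF
```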

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 1:44 UTC (Tue) by jccleaver (guest, #127418) [Link] (4 responses)

> Don't remember that. Do you have a source for this claim?

https://www.freedesktop.org/wiki/Software/systemd/Predict...

The "laptop" callout was a bit of hyperbole, but laptops for developers at Starbucks were used as a justification for a variety of Fedora changes that trickled down into EL rebuilds after this time running on servers. You'll have to forgive me for thinking that the zillions of Linux VMs out there with the simplest possible ethernet interface setup should be the default use case. All of the justifications listed there are pretty laughable when you consider the collateral effect.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 2:02 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (3 responses)

> The "laptop" callout was a bit of hyperbole, but laptops for developers at Starbucks were used as a justification for a variety of Fedora changes that trickled down into EL rebuilds after this time running on servers

You admit now that the laptop claim was hyperbolic, but what you stated here still appears to be quite misleading. Laptops were not used as a justification anywhere that I am aware of. It is not mentioned in the article you linked to, nor in the feature proposals within Fedora. In fact the first naming effort had nothing to do with systemd at all: it came from Dell, was called Biosdevname, was driven entirely by server concerns, and landed in Fedora 15.

https://fedoraproject.org/wiki/Features/ConsistentNetwork...

The udev naming scheme came much later in Fedora 19 and only really took priority in Fedora 21

https://fedoraproject.org/wiki/Features/SystemdPredictabl...

Also it was driven by Kay Sievers as the udev maintainer and not Lennart.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 2:17 UTC (Tue) by jccleaver (guest, #127418) [Link] (2 responses)

> Laptops were not used a justification anywhere that I am aware of.

Laptops were the use-case cited for a number of low level infrastructure changes around this time intended to make networking better in dynamic environments. That's all fine and dandy, but server admins don't need or want this. So-called "predictable interfaces" were anything but, especially in an era when kickstart servers were the most likely provisioning method in use in the RH ecosystem.

> It came from Dell called Biosdevname and was driven entirely by server concerns and it landed in Fedora 15

I'm aware of this. Was running a Dell shop both before and after this was introduced (although mostly in EL land, not running Fedora Server for the most part). biosdevname was useful only in very specific situations, and could normally be disabled on bare-metal unless you really *were* in a dynamic networking environment. This was the exception, not the rule. Regardless, it was inapplicable to VMs.

> Also it was driven by Kay Sievers as the udev maintainer and not Lennart.

The "systemd cabal" is one and the same here for all intents and purposes.

My original point stands: A variety of "silly notions" ended up affecting everyone because they wanted to make weird special use cases easier. In the networking example, replacing the mostly-predictable "eth0" with a unique string on any given box, e.g. "enp5s0", and justifying the increased complexity on being able to swap out a blown NIC (?!) was not a win. I brought it up in my first comment because replacing existing partition mechanisms (which mostly work), or even human-readable LVM labels, with UUIDs I have to look up strikes me as a similarly unjustified normalized obfuscation. And there's a history of "optional" things getting steadily less optional.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 2:25 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

> My original point stands

I don't think it does. Going back to what you originally said:

> Remember when we dropped eth0 being your ethernet interface on simple VMs because Lennart was concerned about laptop wifi?

1) Laptop wifi was certainly not used as a rationale for the udev change, and you have conceded as much already.
2) It wasn't driven by Lennart, and you have now admitted that as well. When you call out specific names, the attribution has to be accurate; otherwise, making it personal isn't justifiable.

You might not like the change personally and it is neither here nor there but you have to be accurate about your characterization.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 16, 2021 14:53 UTC (Wed) by lsl (subscriber, #86508) [Link]

> and could normally be disabled on bare-metal unless you really *were* in a dynamic networking environment.

Not at all. Without something like biosdevname or the PredictableInterfaceNames thing, your interface names may be shuffled around at every other reboot on many, if not most, servers (basically everything with >1 NIC).

It's unnecessary for VMs with a single network interface, but then just don't use it there? eth0 is what you get by default when using the Fedora and EL cloud images, so there's nothing to do there.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 11:13 UTC (Tue) by tzafrir (subscriber, #11501) [Link]

Elsewhere in the "VM" space, Xenserver tried to tackle the same issue. They added a mechanism to keep persistent interface names for eth* network interfaces. It worked.

... by renaming them to temporary names when they appear and then renaming them back "when they all appeared" (on an explicit run of a specific program).

And this is only one of many kludges attempting to maintain persistent names for eth* interfaces.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 16, 2021 12:05 UTC (Wed) by intgr (subscriber, #39733) [Link]

> There have been a lot of silly notions coming from the author of systemd over the years that we now have to deal with on the ground.

OK you had one example. But what are the other ones?

When it comes to interface names, I sort of empathise with both sides: it seemed like a problem worth solving, and I think they did a decent job given the constraints, but I also see how it needlessly complicates configuration in places that don't need it.

I've read most articles on Lennart's blog and from my point of view nearly all of the changes he describes are improvements. Not saying perfect, but they're finally working on areas of Linux that were previously neglected for decades.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 14:06 UTC (Mon) by nix (subscriber, #2304) [Link] (1 responses)

They are common cases that are frequently needed to boot a system. Seems reasonable. Since they're just UUIDs, distros can generate more for distro-specific partitions they might need: e.g. Nix (no relation) might want /nix to have a partition type ID of its own. Because the UUID space is so sparse, they can just do it: there's no need to coordinate with anyone else until and unless they want other systems to recognize a separately-mounted /nix. (Note: I believe Nix doesn't support a separately-mounted /nix in any case: it was just a random example of an OS with a major fs tree which is not /, /usr or /var.)

There is no pretense here of including *every possible* filesystem path: that would be both pointless and impossible. Nor is there any suggestion that this is replacing /etc/fstab: it's just nailing the partition -> mount point mapping into the partition table. (You could generate pieces of /etc/fstab, or systemd mount units, directly from the partition info: indeed, I believe this is what systemd is doing these days. But that doesn't obsolete the idea of having the partition -> mount point info for other mount points in a readable form somewhere outside the partition table! You don't *have* to use this for everything: you just *can* use it for boot-critical things.)

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 14, 2021 19:14 UTC (Mon) by Pc5Y9sbv (guest, #41328) [Link]

I wonder, would the spec benefit from using a tree of normative namespace UUIDs and hash-based UUID generation, like you can produce using the Python uuid.uuid5(namespace, name) routine? I realized you'd still need a list of recognized UUIDs to be known by a particular instance of the bootloader, since the hashes aren't really reversible. But, this could be easier to manage if it were a simple list of human-readable strings or namespace IDs which can be converted into the UUIDs that will get encoded in the partition tables and bootloaders.

With hashing, you would get deterministic UUIDs for specific mount point scenarios, so you can predict and coordinate usage of UUIDs across drafts or custom builds of the tooling. Depending on how deep you want this to go, you could standardize a namespace hierarchy and hashing scheme to encode binary platform, mountpoint paths, and even mount option strings. The spec, codebase, or build-time tooling could share a human readable table of mountpoints which could be trivially extended by the community, converging on the same UUIDs for the same purposes no matter who first tries to coin a new mountpoint variation.
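
A minimal sketch of the idea, using the standard-library uuid.uuid5() mentioned above (the root namespace URL and the arch/mountpoint layering are invented here for illustration; nothing like this is in the actual spec):

```python
import uuid

# Hypothetical root namespace for the scheme -- NOT an official UUID,
# just derived from an example URL to anchor the hash tree.
SPEC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL,
                            "https://example.org/discoverable-partitions")

def type_uuid(arch: str, mountpoint: str) -> uuid.UUID:
    """Derive a deterministic partition-type UUID from human-readable
    strings: hash the architecture into a sub-namespace, then hash the
    mount point within it."""
    arch_ns = uuid.uuid5(SPEC_NAMESPACE, arch)
    return uuid.uuid5(arch_ns, mountpoint)

# Anyone hashing the same strings converges on the same UUID,
# with no central registry needed:
assert type_uuid("x86-64", "/nix") == type_uuid("x86-64", "/nix")
assert type_uuid("x86-64", "/nix") != type_uuid("arm64", "/nix")
```

As noted, the hashes aren't reversible, so tooling would still carry the human-readable table of names; the hashing only guarantees that independent implementations coin identical UUIDs for identical names.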

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 3:13 UTC (Tue) by jamesh (guest, #1159) [Link]

This kind of reminds me of the mount scheme of some pre-Fedora versions of Red Hat Linux, where volume labels would be set to preferred mount points, and the /etc/fstab file identified file systems with e.g. "LABEL=/home". While it fixed the problem of file systems changing device name when moving disks from one controller to another, it really didn't work well if you had two installs connected to the same computer, and was replaced by the UUID system most distros use today.

This spec seems to cover the common failure modes of that old scheme, so could be quite useful. Outside of containers/VMs, the behaviour of picking the newest root file system if multiple ones are found on the disk should be quite useful for transactionally updated embedded systems. All that's really needed is something to automatically revert to the old image if booting the new one fails.
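
For anyone who never saw the old scheme, the two /etc/fstab styles look like this (illustrative entries, not from any real system):

```
# Old Red Hat scheme: the filesystem label doubles as the mount point.
LABEL=/home    /home    ext3    defaults    1 2

# Successor scheme: the filesystem UUID stays unique even when two
# installs are attached to the same machine.
UUID=3e6be9de-8139-4e11-9106-a43f08d823a6    /home    ext4    defaults    1 2
```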

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 8:45 UTC (Tue) by geert (subscriber, #98403) [Link] (3 responses)

For now, this seems to have support for only a subset of architectures capable of running Linux: x86, x86-64, arm32, arm64, ia64, rv32, rv64.
No idea why ia64 made the list, while e.g. powerpc and s390 are missing.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 9:28 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

It's the architectures that can use UEFI to boot - GPT is the UEFI partition format.

PowerPC and S/390 don't boot via UEFI, hence omitted.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 9:39 UTC (Tue) by geert (subscriber, #98403) [Link] (1 responses)

CONFIG_EFI_PARTITION defaults to yes since commit 5f6f38dbb0fc8518 ("partitions: enable EFI/GPT support by default") in v3.8, under the premise that GPT is now (2013) commonly used.
Looking at the defconfigs, only some old m68k and MIPS machines, microblaze, and riscv nommu disable it explicitly.

Poettering: The Wondrous World of Discoverable GPT Disk Images

Posted Jun 15, 2021 9:43 UTC (Tue) by bluca (subscriber, #118303) [Link]

New architectures are added as needed. Nobody requested the ones you mentioned so far. If they support GPT and you need support for them, please send a PR.


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds