OpenBSD kernel address randomized link

Posted Jul 13, 2017 9:09 UTC (Thu) by farnz (subscriber, #17727)
In reply to: OpenBSD kernel address randomized link by ikm
Parent article: OpenBSD kernel address randomized link

Assuming OpenBSD implements the expected POSIX behaviours (and I'd be surprised if it didn't - this is one of the saner bits of POSIX), this is easy to arrange:

1. Boot from kernel.
2. Link new kernel into same filesystem, under the name kernel.new.
3. Use fsync and/or sync to ensure that all data is written out to disk.
4. Use rename to atomically replace kernel with kernel.new.

Throughout this process, the system stays atomically in one of two states:

kernel is the currently running kernel, not yet replaced by kernel.new. kernel.new may or may not exist, and may or may not be a bootable kernel at this point.
kernel is the newly built kernel.
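
As a rough sketch of the procedure above (this is not OpenBSD's actual relink script; the function name, the ".new" suffix handling and the choice of Python are assumptions for illustration only):

```python
import os


def replace_kernel(kernel_path: str, new_image: bytes) -> None:
    """Stage the new image next to the old one, flush it, then rename it over.

    Hypothetical helper: kernel_path is whatever file the bootloader loads.
    """
    staging = kernel_path + ".new"  # same directory, hence same filesystem

    # Write the new image under the staging name and force it out to disk.
    with open(staging, "wb") as f:
        f.write(new_image)
        f.flush()              # drain Python's userspace buffer
        os.fsync(f.fileno())   # ask the OS to write the file data out

    # os.replace() is rename(2) on POSIX: other processes see either the
    # old file or the new one under kernel_path, never a partial file.
    os.replace(staging, kernel_path)
```

Until the final call returns, kernel_path still names the old image; afterwards it names the new one, and kernel.new no longer exists.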

You can increase safety by first using link to make kernel.booted a second name for the last kernel that booted far enough to run the linker, and teaching the bootloader to try it if kernel fails to boot for any reason - that way, you get a second chance if the new kernel is mislinked due to a bug.
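
A minimal sketch of that extra step, again with hypothetical names (the bootloader side is not shown, and the ".tmp" suffix is just an assumption):

```python
import os


def remember_booted_kernel(kernel_path: str, booted_path: str) -> None:
    """Make booted_path a hard link to the file currently named kernel_path."""
    tmp = booted_path + ".tmp"  # hypothetical scratch name in the same directory
    if os.path.lexists(tmp):
        os.unlink(tmp)

    os.link(kernel_path, tmp)     # second name for the running kernel's inode
    os.replace(tmp, booted_path)  # atomically point booted_path at that inode

    # If booted_path already named the same inode, rename(2) does nothing
    # and leaves tmp behind, so tidy it up.
    if os.path.lexists(tmp):
        os.unlink(tmp)
```

Because the later rename of kernel.new over kernel only changes which inode the name kernel refers to, kernel.booted keeps pointing at the old image.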

OpenBSD kernel address randomized link

Posted Jul 13, 2017 12:07 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

Note that exactly this behaviour is likely to cause failure with the combination of XFS and GRUB2, because GRUB2 reads the filesystem without replaying the log (XFS journal), but XFS (IMHO sensibly) assumes that once something has hit the log it is committed to disk, so speeds up shutdown by not doing a full log replay until mount time. So GRUB2 reads an undefined, uncommitted God-knows-what which probably doesn't include the latest writes, and implodes.

This is, IMHO, entirely GRUB2's fault, for implementing an XFS filesystem reading driver that doesn't actually read the filesystem in the same way that XFS itself does (and *always* has: this is not a recent optimization).

OpenBSD kernel address randomized link

Posted Jul 13, 2017 13:20 UTC (Thu) by farnz (subscriber, #17727) [Link]

That sort of buggy behaviour is precisely the sort of bug I was thinking of when I said "Assuming OpenBSD implements the expected POSIX behaviours"; it sounds like the combination of GRUB2 and Linux does not implement the expected POSIX behaviours.

And I'd agree with you on the blame; while I can see why GRUB2 wouldn't want to do a log replay itself, it should at least check the log first, then the backing store if the log has no recent changes - that way, it won't read something unexpected.
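
To illustrate the "log first, then backing store" idea, here is a toy model. It does not reflect XFS's real on-disk log format or GRUB2's code; treating the log as a list of whole-block writes is purely an assumption to keep the sketch short:

```python
from typing import Dict, List, Tuple


def read_block(block_no: int,
               log: List[Tuple[int, bytes]],
               store: Dict[int, bytes]) -> bytes:
    """Return the newest contents of a block without replaying the log.

    log   -- committed but not yet replayed writes, oldest first (toy model)
    store -- the blocks as they currently sit in the backing store
    """
    # The newest log entry for a block wins, so scan from the end.
    for logged_no, data in reversed(log):
        if logged_no == block_no:
            return data
    # Nothing pending in the log: the backing store is up to date.
    return store[block_no]
```

Nothing is written in this scheme; the reader just prefers pending log contents over stale blocks.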

OpenBSD kernel address randomized link

Posted Jul 13, 2017 13:22 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (2 responses)

Hmm, this may be the wrong place to ask but

Does XFS have read-only mounts? How do they interact with this... let's call it an optimisation?

Historically it seems like the XFS log is not very big, so a correct implementation could read the entire thing into RAM, and consult it throughout all further operations. Obviously that's going to be both slow and error-prone, because it's basically taking a lot of situations where there's a happy path and splitting them into two similar but different happy paths, one of which is very difficult to test. Is that really how all bootloaders designed to work with XFS do it?

I presume that the XFS authors focused instead on, as you say, "replaying the log at mount time", converting the mount to read-write on the fly which is cool but obviously there is no promise that's _possible_ let alone a good idea.

OpenBSD kernel address randomized link

Posted Jul 14, 2017 23:55 UTC (Fri) by rahvin (guest, #16953) [Link]

Presumably the current Linux kernel drivers can adequately handle XFS in its preferred order (I would assume so as I use XFS on a partition on my home server). Given this I believe the parent poster was simply lamenting that Grub2 doesn't just use this same behavior and has implemented a driver that's not following the appropriate standard, leading to problems.

Without knowing anything about Grub's code I would say it shouldn't be that difficult to change the Grub2 behavior to the proper methodology without significant speed impacts, because the kernel's already got all that sorted out. The problem is likely that the Grub2 project isn't sexy, has limited contributors and it's not a popular issue, which is the single greatest obstacle to getting any problem addressed in open source when you can't fix it yourself. Feel free to correct me if I'm wrong.

OpenBSD kernel address randomized link

Posted Jul 17, 2017 17:32 UTC (Mon) by ikm (guest, #493) [Link]

As far as I remember ext4 remounts the filesystem read-write to replay the journal in this case (and subsequently remounts it back r/o), or refuses to mount the filesystem otherwise. I'd expect XFS to do the same, even though I'm just guessing at this point. It's just that putting journal checks throughout all hot paths of the fs code just to implement read-only mounts doesn't sound very practical. It'd also be quite brittle/error prone.

OpenBSD kernel address randomized link

Posted Jul 15, 2017 21:46 UTC (Sat) by lsl (subscriber, #86508) [Link] (1 responses)

> Throughout this process, the system stays atomically in one of two states:

AIUI, the atomicity guarantees are with respect to other processes inspecting the file system concurrently. POSIX guarantees nothing whatsoever about the state of the system following a hard system crash or power loss. Finding your kernel image replaced by a picture of Rick Astley should be perfectly fine as far as POSIX is concerned.

OpenBSD kernel address randomized link

Posted Jul 16, 2017 12:44 UTC (Sun) by farnz (subscriber, #17727) [Link]

To be completely pedantic, POSIX is fine with your system exploding in a shower of sparks after power loss or a hard crash. However, if you've implemented the concurrent access rules POSIX requires, and you've implemented the data integrity after sync rules that POSIX requires, then assuming that you've implemented any sort of reasonable behaviour after a power loss or hard system crash, and that the hardware does not fail, you'll see atomicity.

OpenBSD kernel address randomized link

Posted Jul 17, 2017 3:32 UTC (Mon) by Jonno (subscriber, #49613) [Link]

> Throughout this process, the system stays atomically in one of two states:

Actually, that is not true. After step 4 the file will be safely on disk, but the directory will not be. If power fails after you did the rename but before the directory is synced the directory listing can be anything. As far as POSIX is concerned, a directory is a file whose content is an implementation-defined mapping from filename to inode. In step 4 you are editing that content in-place. The dangers of doing this are exactly the same as the dangers of editing any other file in-place.

As a practical matter, if the directory is no larger than a single block, writes are going to be atomic (at least at the interface level, what the drive does internally is another matter), but for large directories all bets are off. This race condition cannot be entirely avoided, but can be shortened by using fsync on the directory immediately after step 4.
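
A sketch of what that might look like, assuming a platform where a directory can be opened read-only and passed to fsync (this commonly works on Linux and the BSDs, but treat it as an assumption; the function names are made up):

```python
import os


def fsync_directory(dir_path: str) -> None:
    """Ask the OS to write the directory's own contents out to disk."""
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def rename_and_sync(src: str, dst: str) -> None:
    """Rename src over dst, then immediately sync the containing directory."""
    os.replace(src, dst)  # rename(2) on POSIX
    fsync_directory(os.path.dirname(os.path.abspath(dst)))
```

This only narrows the window between the rename and the directory reaching disk; it does not make the directory update itself atomic.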