5.8 Merge window, part 1

Posted Jun 7, 2020 0:37 UTC (Sun) by darwi (subscriber, #131202)
Parent article: 5.8 Merge window, part 1

> The "pstore" mechanism, which stashes away system-state information in case of a panic, has gained a new back-end that stores data to a block device. See this commit for documentation.

This is amazing for debuggability on x86 laptops, which typically lack standardized ways of saving the kernel log on panic. I hope more and more block drivers implement the necessary hooks for this.

Posted Jun 7, 2020 1:59 UTC (Sun) by flussence (guest, #85566)

This would be a massive improvement over the current state of affairs; I've tried using the EFI backend before on a headless server and it worked great for a while, until one day it didn't and then fixing it required a screwdriver.

Posted Jun 7, 2020 13:37 UTC (Sun) by josh (subscriber, #17465)

I would be interested in hearing the details on what went wrong.

Posted Jun 8, 2020 5:46 UTC (Mon) by flussence (guest, #85566)

Extremely badly behaved AMI firmware; deleting EFI vars only hid them from view without deallocating space in nvram. After a few panics happened on one buggy kernel version it started returning -ENOSPC for efivarfs operations.

At the time I also had `installkernel` set up to update the efibootmgr entries during make install, which made everything worse - couldn't change the default boot to a known-good entry, it was stuck on the bad one.

Posted Jun 8, 2020 5:55 UTC (Mon) by amacater (subscriber, #790)

That one bricked my ThinkPad, my main machine. Luckily, I was at a tech conference among friends and a couple of EFI experts: somebody suggested a complete reflash of the BIOS, and someone else had an external USB device that could emulate a floppy drive. After one of the most panic-stricken three quarters of an hour of my life, it all worked. Many years later, it's beside me on the table, still working. Thanks, MiniDebConf Cambridge!

Posted Jun 21, 2020 22:18 UTC (Sun) by nix (subscriber, #2304)

I heard about that incident. I've never dared use pstore on non-disposable hardware since.

I suspect this is too paranoid of me, but it still feels nicer to be able to use a blockdev. (Or it would if all my disks weren't fully partitioned already. I don't imagine it would work on USB mass storage; expecting *that* to work in a panic state seems like asking a lot.)

Posted Jun 8, 2020 8:04 UTC (Mon) by mjg59 (subscriber, #23239)

The "delete from view without deallocating" approach is actually a reasonable way to avoid rewriting an entire flash block every time someone touches a variable. The problem was that the platform wouldn't trigger garbage collection until it detected it was out of space, and if you were *very* close to being out of space, the firmware would run out of space before the garbage collection code ran.
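To make that failure mode concrete, here is a toy model in C of an append-only variable store in which deletion only hides a record and garbage collection (GC) runs only when a write no longer fits. It is a sketch under stated assumptions, not how any real firmware is written: the sizes (STORE_SIZE, GC_RESERVE) are hypothetical, and the idea that GC itself needs a little free scratch space is an assumed way of modelling "running out of space before the garbage collection code ran". All names (struct var_store, store_write(), store_delete(), store_gc()) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an append-only EFI variable store. All sizes are
 * hypothetical and chosen for illustration, not taken from real firmware. */
#define STORE_SIZE 64u  /* total NVRAM bytes in the model */
#define GC_RESERVE 8u   /* assumed scratch space the GC itself needs */
#define MAX_RECS 32u

struct record {
    size_t size;
    bool live;          /* false once "deleted": hidden, not reclaimed */
};

struct var_store {
    struct record recs[MAX_RECS];
    size_t nrecs;
    size_t used;        /* bytes consumed by live AND dead records */
};

/* Bytes held by variables that are still visible. */
static size_t live_bytes(const struct var_store *s)
{
    size_t n = 0;
    for (size_t i = 0; i < s->nrecs; i++)
        if (s->recs[i].live)
            n += s->recs[i].size;
    return n;
}

/* Compact live records. In this model GC needs GC_RESERVE free bytes to
 * run, so a store that is completely full cannot clean itself up. */
static bool store_gc(struct var_store *s)
{
    if (STORE_SIZE - s->used < GC_RESERVE)
        return false;
    size_t out = 0;
    for (size_t i = 0; i < s->nrecs; i++)
        if (s->recs[i].live)
            s->recs[out++] = s->recs[i];
    s->nrecs = out;
    s->used = live_bytes(s);
    return true;
}

/* Deleting only hides the record; its bytes stay allocated. */
static void store_delete(struct var_store *s, size_t idx)
{
    assert(idx < s->nrecs);
    s->recs[idx].live = false;
}

/* Does a record of this size fit without running GC? */
static bool fits(const struct var_store *s, size_t size)
{
    return s->nrecs < MAX_RECS && STORE_SIZE - s->used >= size;
}

/* Append a record; GC is attempted only once the write no longer fits.
 * Returns the record index, or -1 for the "-ENOSPC" case. */
static int store_write(struct var_store *s, size_t size)
{
    if (!fits(s, size) && !(store_gc(s) && fits(s, size)))
        return -1;
    s->recs[s->nrecs].size = size;
    s->recs[s->nrecs].live = true;
    s->used += size;
    return (int)s->nrecs++;
}
```

In this model, writing and deleting four 16-byte records (think: repeated panic dumps) leaves no live data but a completely full store, so the next store_write() returns -1 because GC never gets room to run. Stopping one cycle earlier leaves 16 free bytes, which is enough for GC to compact the store and accept a 20-byte write.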

There's not actually a good answer here. EFI firmware updates need to be in SMM because that's the only mechanism Intel provide to allow authenticated writes to flash, and SMM can't run without all cores being in SMM, so if you do a variable update and need to rewrite an entire block of flash you're going to halt the OS for long enough that things will be unhappy. Getting into a situation where you allow the OS to make the machine unbootable obviously isn't a great answer to that (https://lore.kernel.org/patchwork/patch/300747/ is arguably more egregious in this respect), but the singular bug that actually led us to this point is an understandable one.

(The -ENOSPC behaviour is accompanied by the kernel then attempting to create a variable it knows is too big, in order to force the firmware to do a garbage collection run on next boot. If you boot via the UEFI boot stub, the kernel will do this while still in UEFI boot services, which should trigger the garbage collection before the kernel starts. As a result, this *should* now be largely invisible to users without putting systems at risk, but obviously we have no way whatsoever of knowing how firmware actually works before we try it.)
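A minimal C sketch of that policy follows. It assumes the kernel refuses writes that would eat into a safety margin and, on refusal, deliberately requests a variable one byte larger than the reported free space. It does not reproduce the kernel's actual code: MIN_RESERVE, struct fw_state, fw_set_variable() and guarded_set_variable() are hypothetical names, and the firmware's reaction (scheduling GC when a write cannot fit) is modelled as an assumption, which is exactly the part mjg59 notes nobody can know in advance.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical safety margin; the value is illustrative only. */
#define MIN_RESERVE 5120u

/* Stand-in for the firmware's view of NVRAM. */
struct fw_state {
    size_t remaining;     /* free bytes reported by the firmware */
    bool gc_requested;    /* assumed to be set when a write cannot fit */
};

/* Model of a firmware write: an oversized request fails, and we ASSUME
 * the firmware reacts by scheduling garbage collection for next boot. */
static bool fw_set_variable(struct fw_state *fw, size_t size)
{
    if (size > fw->remaining) {
        fw->gc_requested = true;
        return false;
    }
    fw->remaining -= size;
    return true;
}

/* Refuse writes that would dip into the reserve. On refusal, request a
 * variable one byte larger than the free space: it is guaranteed to fail,
 * but (by assumption) nudges the firmware into collecting garbage. */
static int guarded_set_variable(struct fw_state *fw, size_t size)
{
    if (fw->remaining < MIN_RESERVE || size > fw->remaining - MIN_RESERVE) {
        (void)fw_set_variable(fw, fw->remaining + 1);
        return -ENOSPC;
    }
    return fw_set_variable(fw, size) ? 0 : -ENOSPC;
}
```

The reserve check is written as a subtraction guarded by `remaining < MIN_RESERVE` rather than `size + MIN_RESERVE > remaining`, so a huge `size` cannot overflow and slip past the check.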

Posted Jun 7, 2020 16:35 UTC (Sun) by ebiederm (subscriber, #35028)

We have had kexec on panic for years. So this is doable without pstore.

I am just a little astonished. The last time this was tested (admittedly with more data, because people wanted a kernel core dump), using in-kernel drivers to write to a block device on a kernel panic only worked on developers' machines. Under actual real-world failure conditions, the kernel was always too compromised to write to a block device successfully (safely?).

The listed driver restrictions might be enough to make the code reliable in a crash scenario. I would love to see a report on their testing.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds