
Do not use non-core systemd

Posted Sep 25, 2025 22:15 UTC (Thu) by intgr (subscriber, #39733)
In reply to: Do not use non-core systemd by Cyberax
Parent article: An unstable Debian stable update

> The good parts include [...] home directory management.

I'm not sure; that part might be abandonware. You would think that if it were maintained, not corrupting user data would be high on the list of priorities.

A year ago I had an incident where, during a regular system shutdown, the systemd-homed resize operation took too long and was killed by a timeout. The result was a corrupted partition table inside the user home image, and I could no longer access my data.

OK, bugs happen, but what baffled me was that to this day, no systemd developers have acknowledged or reacted to the bug report, even though other users have reported being affected as well.

Thankfully, after a few hours of struggling, I managed to mount the image and recover my data. But to this day, I hold my breath every time I restart my system, hoping this bug doesn't hit again.



Do not use non-core systemd

Posted Sep 25, 2025 23:39 UTC (Thu) by smurf (subscriber, #17840) [Link] (6 responses)

I'd assume that the partition table of any image (or hard disk) is essentially read-only, in the sense that nothing writes to these areas, and thus cannot be corrupted by something that merely hangs, no matter what you do to the process controlling it.

Thus, whichever freaky bug you encountered is not reproducible. So what should the maintainers actually do, other than say "yeah it's a strange case of data corruption but if we don't see a pattern, much less a reproducer …" and leave it open?

Do not use non-core systemd

Posted Sep 25, 2025 23:57 UTC (Thu) by koverstreet (✭ supporter ✭, #4296) [Link]

Bugs without reproducers are a normal fact of life; you can't just give up any time you see one.

To start, you look at all the information you have available; in this case that'd mean looking at the corrupted partition table - hexdump if necessary - to see if you can spot any patterns or deduce anything about what happened.
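As a rough illustration of that first step, here is a minimal Python sketch that decodes the primary GPT header of a disk image and checks its signature and CRC32. It assumes the image carries a GPT label with 512-byte logical sectors; the function names and the example path are made up for illustration and are not part of any systemd tooling.

```python
import struct
import zlib

# GPT header layout (little-endian, 92 bytes) as defined by the UEFI spec.
GPT_HEADER_FORMAT = "<8sIIIIQQQQ16sQIII"
GPT_HEADER_SIZE = struct.calcsize(GPT_HEADER_FORMAT)  # 92


def read_primary_header(image_path, sector_size=512):
    """Return the raw sector at LBA 1, where the primary GPT header lives."""
    with open(image_path, "rb") as image:
        image.seek(sector_size)
        return image.read(sector_size)


def check_gpt_header(sector):
    """Decode a GPT header sector and report which sanity checks pass."""
    if len(sector) < GPT_HEADER_SIZE:
        return {"signature_ok": False, "size_ok": False, "crc_ok": False}

    (signature, _revision, header_size, stored_crc, _reserved,
     current_lba, backup_lba, _first_usable, _last_usable, _disk_guid,
     entries_lba, entry_count, entry_size, _entries_crc) = struct.unpack_from(
        GPT_HEADER_FORMAT, sector)

    size_ok = GPT_HEADER_SIZE <= header_size <= len(sector)
    crc_ok = False
    if size_ok:
        # The header CRC is computed with its own 4-byte field zeroed.
        raw = bytearray(sector[:header_size])
        raw[16:20] = b"\x00\x00\x00\x00"
        crc_ok = (zlib.crc32(bytes(raw)) & 0xFFFFFFFF) == stored_crc

    return {
        "signature_ok": signature == b"EFI PART",
        "size_ok": size_ok,
        "crc_ok": crc_ok,
        "current_lba": current_lba,
        "backup_lba": backup_lba,
        "entries_lba": entries_lba,
        "entry_count": entry_count,
        "entry_size": entry_size,
    }


# Example use (hypothetical path):
#   sector = read_primary_header("/path/to/user.home")
#   print(sector[:GPT_HEADER_SIZE].hex(" "))   # raw bytes for eyeballing
#   print(check_gpt_header(sector))
```

An intact signature with a failing CRC points at a partially rewritten header, which is a different failure from a sector that was zeroed or overwritten wholesale.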

Sometimes that can tell you all you need to know; I once saw a guy deduce just from a hexdump of some memory corruption that it was an errant memmove that shot past the address it was supposed to stop at - he spotted that everything had been shifted by 8 bytes - and then from that, found the exact line of code that caused it (an open coded memmove).
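When a known-good copy of the data exists, that kind of "everything moved by N bytes" pattern can be checked mechanically. A toy Python sketch, assuming you have both buffers in hand; `find_shift` and its threshold are hypothetical choices for illustration, not an existing tool:

```python
from typing import Optional


def find_shift(good: bytes, bad: bytes, max_shift: int = 64,
               threshold: float = 0.9) -> Optional[int]:
    """Return the forward shift (in bytes) that best explains `bad`, if any.

    A shift of k means bad[i + k] == good[i] for most i, i.e. the data
    was copied k bytes past where it should have landed.
    """
    best_shift = None
    best_ratio = 0.0
    for shift in range(1, max_shift + 1):
        shifted = bad[shift:]
        compared = min(len(good), len(shifted))
        if compared == 0:
            break
        matches = sum(1 for a, b in zip(good, shifted) if a == b)
        ratio = matches / compared
        if ratio > best_ratio:
            best_shift, best_ratio = shift, ratio
    # Only report a shift when it explains nearly all of the data.
    return best_shift if best_ratio >= threshold else None
```

This only covers the simplest case of a uniform shift over the whole buffer; real corruption often affects just a sub-range, which is where a human reading the hexdump still does better.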

If that fails to identify the bug, you take what you do know and look for places where your code is weak and could be hardened; you look for ways to make entire classes of bugs impossible. These are on-disk data structures we're talking about; we don't have practical techniques for making entire codebases in systems languages bug-free, but we absolutely can identify ways to limit damage and make sure that on-disk data structures aren't corrupted.

Do not use non-core systemd

Posted Sep 26, 2025 0:15 UTC (Fri) by intgr (subscriber, #39733) [Link] (4 responses)

> I'd assume that the partition table of any image (or hard disk) is essentially read-only

Incorrect assumption. Systemd-homed regularly resizes images/partitions to rebalance the disk space available between multiple users.

Particularly, btrfs can only be shrunk when unmounted, but the shrinking obviously needs to be done while the encryption keys are still available, so it frequently happens during system shutdown.

> something that merely hangs,

It's not a "hang". The resizing can take a few minutes even on a fast SSD if lots of data needs to be relocated.

Do not use non-core systemd

Posted Sep 26, 2025 0:44 UTC (Fri) by intelfx (subscriber, #130118) [Link] (1 responses)

> Particularly, btrfs can only be shrunk when unmounted

Without commenting on the rest of the analysis, that's not true.

Do not use non-core systemd

Posted Sep 26, 2025 7:08 UTC (Fri) by intgr (subscriber, #39733) [Link]

Oh, I must be misremembering. But regardless, that's when homed frequently does the shrinking.

Do not use non-core systemd

Posted Sep 27, 2025 10:55 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

> Incorrect assumption. Systemd-homed regularly resizes images/partitions to rebalance the disk space available between multiple users.

Ah. Thanks for the correction.

One might assume that it was an SSD that was powered down during writing. In this case all bets are off. Don't Do That.

Thus there needs to be a "work in progress, do NOT shut down" dialog/message/whatever. (Assuming that systemd doesn't time out the resize.)

Do not use non-core systemd

Posted Sep 28, 2025 1:18 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

That sounds like a job for `systemd-inhibit(1)`. See https://systemd.io/INHIBITOR_LOCKS/
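A minimal sketch of what that could look like from a script, assuming the `--what=`, `--who=`, `--why=` and `--mode=` options documented in systemd-inhibit(1). The helper names, the default `who` string, and the wrapped command are hypothetical; this is not how homed itself is implemented.

```python
import subprocess


def inhibit_argv(command, why, who="resize-helper",
                 what="shutdown:sleep", mode="block"):
    """Build an argv that runs `command` while holding an inhibitor lock."""
    return [
        "systemd-inhibit",
        f"--what={what}",   # operations to inhibit, colon-separated
        f"--who={who}",     # short name shown when listing locks
        f"--why={why}",     # human-readable reason
        f"--mode={mode}",   # "block" or "delay"
        *command,
    ]


def run_inhibited(command, why):
    """Run `command` under systemd-inhibit; the lock is dropped when it exits."""
    return subprocess.run(inhibit_argv(command, why), check=True)


# Example use (hypothetical command):
#   run_inhibited(["my-resize-tool", "--shrink"], why="Resizing home image")
```

The lock has to be taken before a shutdown is requested, so this helps for work started ahead of time rather than work that only begins once shutdown is already under way.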


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds