the self healing work continues in bcachefs
Posted Mar 29, 2025 5:09 UTC (Sat) by koverstreet (✭ supporter ✭, #4296)
Parent article: The first part of the 6.15 merge window
Posted Mar 30, 2025 22:59 UTC (Sun) by motk (subscriber, #51120)

Posted Mar 31, 2025 3:09 UTC (Mon) by koverstreet (✭ supporter ✭, #4296)
We regularly recover from extreme disaster scenarios today. I've been looking at a metadata dump where it looked like a head just skated across the platter, which created some very... particular alloc info inconsistencies - but that's been the only failure to repair in ~6 months, and I've seen logs of some good ones. So that's largely done.
Once the mount API extension happens, plus better communication between the mount helper and systemd/plymouth (because of course communicating things to the user has been getting more complicated), we'll even be able to tell the user "hey, your SSD crapped itself (X IO errors, toast btree nodes), please wait while we reconstruct btree roots/alloc/what have you, here's a progress bar"
And this stuff is pretty fast, too - the post-6.14 work dealt with backpointers check/repair. Even btree node scan, if we lose btree roots, is fast thanks to a small bitmap in the superblock.
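A minimal sketch of that bitmap idea in C (the names, region size, and layout here are hypothetical, not the actual bcachefs superblock format): the superblock keeps one bit per large device region that has ever held a btree node, so a scan for lost roots only reads the marked regions instead of the whole device.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical illustration: one bit per 1 GB region that has ever
     * held a btree node; 64 bits covers a 64 GB device in this sketch. */
    #define REGION_SHIFT 30 /* 1 GB regions */

    struct sb_btree_bitmap {
            uint64_t bits;
    };

    /* Called whenever a btree node is written at 'offset'. */
    static void mark_btree_node_region(struct sb_btree_bitmap *bm, uint64_t offset)
    {
            bm->bits |= 1ULL << (offset >> REGION_SHIFT);
    }

    /* During btree node scan, regions whose bit was never set are skipped. */
    static bool region_may_hold_btree_nodes(const struct sb_btree_bitmap *bm,
                                            uint64_t offset)
    {
            return bm->bits & (1ULL << (offset >> REGION_SHIFT));
    }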
Further off, post experimental, will be finishing off online fsck - and then we'll be able to recover from slightly absurd levels of damage in the background while your filesystem is RW. (People with huge arrays really want this).
Posted Mar 31, 2025 4:23 UTC (Mon) by jmalcolm (subscriber, #8876)

Posted Mar 31, 2025 4:57 UTC (Mon) by koverstreet (✭ supporter ✭, #4296)
Posted Apr 3, 2025 8:06 UTC (Thu) by DemiMarie (subscriber, #164188)

Posted Apr 3, 2025 13:52 UTC (Thu) by koverstreet (✭ supporter ✭, #4296)
Is self-healing always good?
Posted Apr 3, 2025 8:03 UTC (Thu) by DemiMarie (subscriber, #164188)

Is self-healing always wanted? My concerns are:

If the filesystem can’t tell if file X should be there or not, or is uncertain as to what its contents should be, I would prefer that all attempts to access X fail with something other than -ENOENT until and unless the administrator tells the filesystem to use its best guess of what the pre-corruption situation was, or X is overwritten by an operation that makes that state irrelevant. Silently returning wrong data is the worst possible outcome.

Posted Apr 3, 2025 13:59 UTC (Thu) by koverstreet (✭ supporter ✭, #4296)
There are cases where fsck will delete things, but for the most part that's only if we have another piece of metadata that says "this shouldn't exist".
e.g., an extent past the end of an inode: something went wrong with truncate.
If a reflink pointer points to a missing indirect extent, we just mark it as poisoned, so on future attempts to read from it we don't have to print out the same error. We can un-poison it if the indirect extent comes back; this guards against a transient lookup error in the reflink btree.
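Sketched in C with made-up names (the real bcachefs extent and error-reporting paths differ): a read through a poisoned reflink pointer fails with an error rather than returning wrong data, the error is only logged the first time, and the poison clears if the indirect extent reappears.

    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical reflink pointer: 'poisoned' records that the indirect
     * extent it points to was missing on a previous lookup. */
    struct reflink_ptr {
            uint64_t idx;  /* index into the reflink btree */
            bool poisoned;
    };

    /* Stand-in for walking the reflink btree. */
    extern bool indirect_extent_exists(uint64_t idx);

    static int read_via_reflink(struct reflink_ptr *p)
    {
            if (indirect_extent_exists(p->idx)) {
                    p->poisoned = false;  /* extent came back: un-poison */
                    return 0;             /* proceed with the read */
            }

            if (!p->poisoned) {
                    p->poisoned = true;   /* print the error only once */
                    /* report the missing indirect extent here */
            }
            return -EIO;                  /* fail loudly, never return junk */
    }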
For the snapshots btree, a key for a snapshot node that doesn't exist generally indicates a problem with snapshot deletion, and the key will be deleted. But we also track when a btree has lost data (topology error, IO error), and if the snapshots btree has lost data we'll instead try to reconstruct snapshot tree nodes (and also subvolume keys, etc.).
We can reconstruct inodes if the inodes btree has lost data (permissions, ownership, timestamps etc. will all be wrong, and i_size will be a bit off but you'll still have the correct file contents).
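A rough sketch of what that reconstruction could look like (hypothetical helpers, not the bcachefs implementation): a replacement inode is synthesized from the extents that still reference its inode number, so i_size is approximated by the end of the last surviving extent and everything else falls back to defaults.

    #include <stdint.h>

    /* Hypothetical minimal inode; real bcachefs inodes carry much more. */
    struct recon_inode {
            uint64_t inum;
            uint64_t i_size;
            uint32_t i_mode; /* guessed default: regular file, 0600 */
    };

    struct extent {
            uint64_t inum, offset, len;
    };

    /* Stand-in iterator over surviving extents for an inode number;
     * pass NULL to start, returns NULL when done. */
    extern const struct extent *next_extent_for(uint64_t inum,
                                                const struct extent *prev);

    static struct recon_inode reconstruct_inode(uint64_t inum)
    {
            struct recon_inode ino = { .inum = inum, .i_mode = 0100600 };
            const struct extent *e = NULL;

            /* i_size is at best the end of the last extent: a bit off,
             * as noted above, but the contents themselves survive. */
            while ((e = next_extent_for(inum, e)))
                    if (e->offset + e->len > ino.i_size)
                            ino.i_size = e->offset + e->len;

            return ino;
    }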
This topic is an area of future research, but for all practical purposes we're in good shape.
