From: "Darrick J. Wong" <djwong-AT-kernel.org>
To: djwong-AT-kernel.org, cem-AT-kernel.org
Subject: [PATCHSET v5] xfs: autonomous self healing of filesystems
Date: Mon, 12 Jan 2026 16:32:43 -0800
Message-ID: <176826412644.3493441.536177954776056129.stgit@frogsfrogsfrogs>
Cc: hch-AT-lst.de, linux-fsdevel-AT-vger.kernel.org, linux-xfs-AT-vger.kernel.org
Hi all,
This patchset builds new functionality to deliver live information about
filesystem health events to userspace. It does this by creating an
anonymous file from which userspace programs can read() events.
Events are captured by hooking various parts of XFS and iomap so that
programs can observe metadata health failures, file I/O errors, and
major changes in filesystem state (unmounts, shutdowns, etc.).
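To make the read() model concrete, here is a minimal sketch of how a
userspace program might pull fixed-size event records off a file
descriptor. Everything named demo_* is hypothetical: the record layout
below is NOT the one defined in xfs_fs.h, and the sketch assumes that
each read() returns only whole records.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical event record; not the real layout from xfs_fs.h. */
struct demo_health_event {
	uint32_t type;		/* hypothetical: what happened */
	uint32_t domain;	/* hypothetical: which part of the fs */
	uint64_t time_ns;	/* hypothetical: when it happened */
};

/*
 * Read up to @max whole events from @fd into @evs.  Returns the number
 * of events read (0 at EOF), or -1 with errno set on failure.
 * Assumption: the fd never hands back a partial record; if it does, we
 * report EIO instead of guessing.
 */
static ssize_t demo_read_events(int fd, struct demo_health_event *evs,
				size_t max)
{
	ssize_t n;

	/* Retry if a signal interrupts the read before any data arrives. */
	do {
		n = read(fd, evs, max * sizeof(*evs));
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -1;
	if ((size_t)n % sizeof(*evs) != 0) {
		errno = EIO;
		return -1;
	}
	return n / (ssize_t)sizeof(*evs);
}
```

A caller would loop on demo_read_events() and dispatch on the type
field of each record until the function returns 0 or -1.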
When an event occurs, the hook functions queue an event object to each
event anonfd for later processing. Programs must have CAP_SYS_ADMIN
to open the anonfd and there's a maximum event lag to prevent resource
overconsumption. The events themselves can be read() from the anonfd
as C structs for the xfs_healer daemon.
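One way to picture the maximum event lag is a bounded queue that
refuses new events once the reader has fallen too far behind. The
sketch below is only an illustration of that general idea: the names,
the cap of 4, and the drop-and-count policy are all assumptions, since
this cover letter does not say what the kernel does when the limit is
reached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_LAG 4		/* hypothetical cap on unread events */

struct demo_event {
	uint32_t type;		/* hypothetical event type */
};

struct demo_queue {
	struct demo_event ring[DEMO_MAX_LAG];
	size_t head;		/* index of the oldest unread event */
	size_t count;		/* number of unread events */
	uint64_t lost;		/* events dropped because the queue was full */
};

/* Queue an event; if the reader is too far behind, drop it and count it. */
static bool demo_queue_push(struct demo_queue *q, struct demo_event ev)
{
	if (q->count == DEMO_MAX_LAG) {
		q->lost++;
		return false;
	}
	q->ring[(q->head + q->count) % DEMO_MAX_LAG] = ev;
	q->count++;
	return true;
}

/* Remove the oldest unread event; returns false if the queue is empty. */
static bool demo_queue_pop(struct demo_queue *q, struct demo_event *ev)
{
	if (q->count == 0)
		return false;
	*ev = q->ring[q->head];
	q->head = (q->head + 1) % DEMO_MAX_LAG;
	q->count--;
	return true;
}
```

Counting dropped events rather than silently discarding them lets a
reader notice that it missed something and fall back to a full health
check, though whether the real code reports losses this way is not
stated here.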
In userspace, we create a new daemon program that reads the event
objects and initiates repairs automatically. The daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing. Instances of the daemon are auto-started
by a starter service that uses fanotify.
v5: add verify-media ioctl, collapse small helper funcs with only
one caller
v4: drop multiple client support so we can make direct calls into
healthmon instead of chasing pointers and doing indirect calls
v3: drag out of RFC status
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.
--D
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-l...
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfspr...
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfste...
---
Commits in this patchset:
* docs: discuss autonomous self healing in the xfs online repair design doc
* xfs: start creating infrastructure for health monitoring
* xfs: create event queuing, formatting, and discovery infrastructure
* xfs: convey filesystem unmount events to the health monitor
* xfs: convey metadata health events to the health monitor
* xfs: convey filesystem shutdown events to the health monitor
* xfs: convey externally discovered fsdax media errors to the health monitor
* xfs: convey file I/O errors to the health monitor
* xfs: allow reconfiguration of the health monitoring device
* xfs: check if an open file is on the health monitored fs
* xfs: add media verification ioctl
---
 fs/xfs/libxfs/xfs_fs.h                         |  186 +++
 fs/xfs/libxfs/xfs_health.h                     |    5
 fs/xfs/xfs_healthmon.h                         |  181 +++
 fs/xfs/xfs_mount.h                             |    4
 fs/xfs/xfs_notify_failure.h                    |    4
 fs/xfs/xfs_trace.h                             |  511 ++++++++
 .../filesystems/xfs/xfs-online-fsck-design.rst |  153 ++
 fs/xfs/Makefile                                |    7
 fs/xfs/xfs_fsops.c                             |   15
 fs/xfs/xfs_health.c                            |  124 ++
 fs/xfs/xfs_healthmon.c                         | 1257 ++++++++++++++++++++
 fs/xfs/xfs_ioctl.c                             |    7
 fs/xfs/xfs_mount.c                             |    2
 fs/xfs/xfs_notify_failure.c                    |  392 ++++++
 fs/xfs/xfs_super.c                             |   12
 fs/xfs/xfs_trace.c                             |    5
 16 files changed, 2846 insertions(+), 19 deletions(-)
 create mode 100644 fs/xfs/xfs_healthmon.h
 create mode 100644 fs/xfs/xfs_healthmon.c