
xfs: autonomous self healing of filesystems

From:  "Darrick J. Wong" <djwong-AT-kernel.org>
To:  cem-AT-kernel.org, djwong-AT-kernel.org
Subject:  [PATCHSET V2] xfs: autonomous self healing of filesystems
Date:  Wed, 22 Oct 2025 16:59:46 -0700
Message-ID:  <176117744372.1025409.2163337783918942983.stgit@frogsfrogsfrogs>
Cc:  linux-fsdevel-AT-vger.kernel.org, linux-xfs-AT-vger.kernel.org

Hi all,

This patchset builds new functionality to deliver live information about
filesystem health events to userspace.  The kernel creates an anonymous
file from which userspace programs can read() events.
Events are captured by hooking various parts of XFS and iomap so that
metadata health failures, file I/O errors, and major changes in
filesystem state (unmounts, shutdowns, etc.) can be observed by
programs.

When an event occurs, the hook functions queue an event object to each
event anonfd for later processing.  Programs must have CAP_SYS_ADMIN
to open the anonfd and there's a maximum event lag to prevent resource
overconsumption.  The events themselves can be read() from the anonfd
either as JSON objects for human readability, or as C structs for
daemons.
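
As an illustration of the queuing model described above (this is not code
from the patchset), the per-reader queue with a lag cap can be sketched in
Python.  The names EventQueue, max_lag and publish are hypothetical, and
dropping new events while counting the losses is an assumption; the cover
letter does not say how the kernel behaves once the cap is reached.

```python
from collections import deque


class EventQueue:
    """One reader's pending events, capped at max_lag entries (hypothetical)."""

    def __init__(self, max_lag):
        self.max_lag = max_lag
        self.pending = deque()
        self.lost = 0  # events dropped since the reader last drained the queue

    def push(self, event):
        # Assumption: drop rather than grow without bound when the reader lags.
        if len(self.pending) >= self.max_lag:
            self.lost += 1
            return False
        self.pending.append(event)
        return True

    def pop_all(self):
        # Hand back everything queued plus the number of dropped events,
        # then reset both so the next drain starts fresh.
        events, lost = list(self.pending), self.lost
        self.pending.clear()
        self.lost = 0
        return events, lost


def publish(queues, event):
    """Queue one event to every open reader, as a hook function might."""
    for q in queues:
        q.push(event)
```

With this shape, a slow reader costs at most max_lag queued events and
cannot delay delivery to the other readers.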

In userspace, we create a new daemon program that will read the event
objects and initiate repairs automatically.  This daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing.  It is autostarted via some udev rules.
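
A daemon's read loop over the JSON form of the events might look roughly
like the Python sketch below.  It is an illustration only, not the daemon
from this patchset.  It assumes the caller already holds an open event fd
(how that fd is obtained is not covered here), that each event is a
complete JSON object, and that objects are separated by nothing more than
optional whitespace.  The function name read_events is hypothetical.

```python
import codecs
import json
import os


def read_events(fd, chunk_size=4096):
    """Yield each JSON object read from fd until end of file."""
    decoder = json.JSONDecoder()
    # Incremental decoding copes with a multi-byte character split across reads.
    utf8 = codecs.getincrementaldecoder("utf-8")()
    buf = ""
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:
            return  # end of file; any incomplete trailing object is discarded
        buf += utf8.decode(chunk)
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # object is still incomplete; wait for more data
            yield obj
            buf = buf[end:]
```

A caller would iterate over read_events(fd) and decide per event whether
to start a repair; that policy is left out because the cover letter does
not describe it.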

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-l...

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfspr...

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfste...
---
Commits in this patchset:
 * docs: remove obsolete links in the xfs online repair documentation
 * docs: discuss autonomous self healing in the xfs online repair design doc
 * xfs: create debugfs uuid aliases
 * xfs: create hooks for monitoring health updates
 * xfs: create a filesystem shutdown hook
 * xfs: create hooks for media errors
 * iomap: report buffered read and write io errors to the filesystem
 * iomap: report directio read and write errors to callers
 * xfs: create file io error hooks
 * xfs: create a special file to pass filesystem health to userspace
 * xfs: create event queuing, formatting, and discovery infrastructure
 * xfs: report metadata health events through healthmon
 * xfs: report shutdown events through healthmon
 * xfs: report media errors through healthmon
 * xfs: report file io errors through healthmon
 * xfs: allow reconfiguration of the health monitoring device
 * xfs: validate fds against running healthmon
 * xfs: add media error reporting ioctl
 * xfs: send uevents when major filesystem events happen
---
 fs/iomap/internal.h                                |    2 
 fs/xfs/libxfs/xfs_fs.h                             |  173 ++
 fs/xfs/libxfs/xfs_health.h                         |   52 +
 fs/xfs/xfs_file.h                                  |   36 
 fs/xfs/xfs_fsops.h                                 |   14 
 fs/xfs/xfs_healthmon.h                             |  107 +
 fs/xfs/xfs_linux.h                                 |    3 
 fs/xfs/xfs_mount.h                                 |   13 
 fs/xfs/xfs_notify_failure.h                        |   44 +
 fs/xfs/xfs_super.h                                 |   13 
 fs/xfs/xfs_trace.h                                 |  404 +++++
 include/linux/fs.h                                 |    4 
 include/linux/iomap.h                              |    2 
 Documentation/filesystems/vfs.rst                  |    7 
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  336 +---
 fs/iomap/buffered-io.c                             |   27 
 fs/iomap/direct-io.c                               |    4 
 fs/iomap/ioend.c                                   |    4 
 fs/xfs/Kconfig                                     |    8 
 fs/xfs/Makefile                                    |    7 
 fs/xfs/libxfs/xfs_healthmon.schema.json            |  648 +++++++
 fs/xfs/xfs_aops.c                                  |    2 
 fs/xfs/xfs_file.c                                  |  167 ++
 fs/xfs/xfs_fsops.c                                 |   75 +
 fs/xfs/xfs_health.c                                |  269 +++
 fs/xfs/xfs_healthmon.c                             | 1741 ++++++++++++++++++++
 fs/xfs/xfs_ioctl.c                                 |    7 
 fs/xfs/xfs_notify_failure.c                        |  135 +-
 fs/xfs/xfs_super.c                                 |  109 +
 fs/xfs/xfs_trace.c                                 |    4 
 lib/seq_buf.c                                      |    1 
 31 files changed, 4173 insertions(+), 245 deletions(-)
 create mode 100644 fs/xfs/xfs_healthmon.h
 create mode 100644 fs/xfs/libxfs/xfs_healthmon.schema.json
 create mode 100644 fs/xfs/xfs_healthmon.c



