
btrfs: do not trust direct IO page at all

From:  Qu Wenruo <wqu-AT-suse.com>
To:  linux-btrfs-AT-vger.kernel.org
Subject:  [PATCH 0/2] btrfs: do not trust direct IO page at all
Date:  Tue, 21 Oct 2025 17:51:13 +1030
Message-ID:  <cover.1761030762.git.wqu@suse.com>

[CHANGELOG]
RFC->v1
- Fix a BUG() triggered by btrfs/261
  The dio bio can be backed by large folios, so the whole bio can be
  larger than PAGE_SIZE * BIO_MAX_VECS (4K * 256 for x86_64).
  In that case we are not guaranteed to be able to allocate a single
  bio covering the whole range.
  Add infrastructure to track multiple btrfs bios for the same dio bio.

- Add the patch to force STABLE_WRITE flags for all inodes
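
To make the size limit in the first changelog item concrete, here is a
minimal user-space sketch of the arithmetic. It is not code from the
patches: the model_* names are hypothetical, and it assumes each vector
of a bounce bio holds exactly one 4K page.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the text: 4K pages on x86_64, 256 vectors per bio. */
#define MODEL_PAGE_SIZE    4096u
#define MODEL_BIO_MAX_VECS 256u

/* Largest range one bio can cover if every vector holds a single page. */
static size_t model_max_bio_bytes(void)
{
	return (size_t)MODEL_PAGE_SIZE * MODEL_BIO_MAX_VECS;
}

/* Number of bios needed to cover a dio range of dio_bytes (round up). */
static size_t model_nr_bounce_bios(size_t dio_bytes)
{
	size_t max = model_max_bio_bytes();

	assert(dio_bytes > 0);
	return (dio_bytes + max - 1) / max;
}
```

Under these assumptions a single bio covers at most 1 MiB, so a dio bio
backed by large folios that spans more than that needs more than one
bio to cover it.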

There is a kernel bugzilla report mentioning that direct IO (and
certain buffered IO, which can modify the page cache during writeback
when the device has no STABLE_WRITE flag) can easily lead to RAID1
mirror content mismatch.

That report doesn't mention btrfs, and our commit 968f19c5b1b7
("btrfs: always fallback to buffered write if the inode requires
checksum") makes inodes with data checksum (the default) fall back to
buffered IO, thus avoiding modification during writeback.

However, the report still exposes that our nodatasum inodes are
affected by the same direct IO buffer modification bug. Even worse,
a nodatasum inode doesn't even set the STABLE_WRITE flag, so we're
allowed to modify the page cache even while it's still under
writeback.

This series addresses the problem by:

- Force STABLE_WRITE flags for all btrfs inodes
  So even nodatasum inodes will wait for writeback before modifying
  the page cache, so that at least the content of different mirrors
  should match.

- Use bounce pages for direct IO
  Instead of using the pages from the dio bio, always allocate our own
  pages (so no one else can modify them) to do the real IO.
  This ensures that even direct IO on nodatasum inodes results in
  stable contents across different mirrors.
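
The effect of bounce pages can be illustrated with a small user-space
model. This is only a sketch, not the implementation in the patches:
the two "mirrors" are plain memory buffers, the model_* names are
hypothetical, and the racer callback stands in for userspace modifying
its direct IO buffer while the write is in flight.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two in-memory buffers standing in for the two RAID1 mirrors. */
struct model_mirrors {
	unsigned char *m0;
	unsigned char *m1;
	size_t len;
};

/* Simulates userspace touching its buffer while IO is in flight. */
typedef void (*model_racer_fn)(unsigned char *buf, size_t len);

/* Write straight from the user buffer: the mirrors can diverge. */
static void model_write_direct(struct model_mirrors *m,
			       unsigned char *user_buf, model_racer_fn racer)
{
	assert(m && m->m0 && m->m1 && user_buf);
	memcpy(m->m0, user_buf, m->len);
	if (racer)
		racer(user_buf, m->len);
	memcpy(m->m1, user_buf, m->len);
}

/* Copy into a private bounce buffer first, then write both mirrors. */
static int model_write_bounced(struct model_mirrors *m,
			       unsigned char *user_buf, model_racer_fn racer)
{
	unsigned char *bounce;

	assert(m && m->m0 && m->m1 && user_buf);
	bounce = malloc(m->len);
	if (!bounce)
		return -1;
	memcpy(bounce, user_buf, m->len);
	memcpy(m->m0, bounce, m->len);
	if (racer)
		racer(user_buf, m->len);	/* no longer affects the IO */
	memcpy(m->m1, bounce, m->len);
	free(bounce);
	return 0;
}

/* Example racer: flip the first byte of the user buffer. */
static void model_flip_first_byte(unsigned char *buf, size_t len)
{
	if (len)
		buf[0] ^= 0xff;
}
```

In this model the direct path leaves the two mirrors with different
first bytes, while the bounced path leaves them identical. The price is
one extra allocation and copy per write.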

Qu Wenruo (2):
  btrfs: force stable writes for all inodes
  btrfs: allocate bounce pages for direct IO

 fs/btrfs/btrfs_inode.h |   5 +-
 fs/btrfs/direct-io.c   | 293 ++++++++++++++++++++++++++++++++---------
 2 files changed, 231 insertions(+), 67 deletions(-)

-- 
2.51.0



