ext4: optimize online defragment
From: Zhang Yi <yi.zhang-AT-huaweicloud.com>
To: linux-ext4-AT-vger.kernel.org
Subject: [PATCH 00/13] ext4: optimize online defragment
Date: Tue, 23 Sep 2025 09:27:10 +0800
Message-ID: <20250923012724.2378858-1-yi.zhang@huaweicloud.com>
Cc: linux-fsdevel-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org, tytso-AT-mit.edu, adilger.kernel-AT-dilger.ca, jack-AT-suse.cz, yi.zhang-AT-huawei.com, yi.zhang-AT-huaweicloud.com, libaokun1-AT-huawei.com, yukuai3-AT-huawei.com, yangerkun-AT-huawei.com
From: Zhang Yi <yi.zhang@huawei.com>

Hello!

Currently, online defragmentation in ext4 is primarily implemented
through the move extent operation in the kernel. This operation works
at PAGE_SIZE granularity, iteratively swapping extents and moving data,
which is quite inefficient. Now that ext4 supports large folios,
iterating at PAGE_SIZE granularity is no longer practical and fails to
leverage the advantages of large folios. Additionally, the current
implementation is tightly coupled with buffer_head, so it cannot be
supported once the buffered I/O path is converted to the iomap
infrastructure.

This patch set (based on 6.17-rc7) optimizes the extent-moving process,
deprecates the old move_extent_per_page() interface, and introduces a
new mext_move_extent() interface. The new interface iterates over and
copies data based on the extents of the original file instead of
PAGE_SIZE, and it supports large folios. The data processing logic
within each iteration remains largely consistent with the previous
version, with no additional optimizations or changes.

The primary objective of this patch set is to prepare for converting
the buffered I/O path for regular files to the iomap infrastructure.
These patches decouple buffer_head from the main extent-moving process,
restricting its use to the helpers mext_folio_mkwrite() and
mext_folio_mkuptodate(), which bring pages in the swapped page cache up
to date and mark them dirty. The overall coding style of the
extent-moving process aligns with the iomap infrastructure, laying the
foundation for supporting online defragmentation once the iomap
infrastructure is adopted.

Patch overview:

Patch 1: Fix an off-by-one issue.
Patch 2: Fix a minor issue related to validity checking.
Patch 3-5: Introduce a sequence counter for the mapping extent status
           tree; this also prepares for the iomap infrastructure.
Patch 6-8: Refactor the mext_check_arguments() helper function and the
           validity checking to improve code readability.
Patch 9-13: Drop move_extent_per_page() and switch to using the new
            mext_move_extent(). Additionally, add support for large
            folios.

With this patch set, the efficiency of online defragmentation on ext4
also improves in the general case. Below is a set of typical test
results obtained using the fio e4defrag ioengine in an environment with
an Intel Xeon Gold 6240 CPU, 400G of memory and an NVMe SSD device.

 [defrag]
 directory=/mnt
 filesize=400G
 buffered=1
 fadvise_hint=0
 ioengine=e4defrag
 bs=4k         # 4k,32k,128k
 donorname=test.def
 filename=test
 inplace=0
 rw=write
 overwrite=0   # 0 for unwritten extent and 1 for written extent
 numjobs=1
 iodepth=1
 runtime=30s

 [w/o]
 U 4k:    IOPS=225k,  BW=877MiB/s    # U: unwritten extent-moving
 U 32k:   IOPS=33.2k, BW=1037MiB/s
 U 128k:  IOPS=8510,  BW=1064MiB/s
 M 4k:    IOPS=19.8k, BW=77.2MiB/s   # M: written extent-moving
 M 32k:   IOPS=2502,  BW=78.2MiB/s
 M 128k:  IOPS=635,   BW=79.5MiB/s

 [w]
 U 4k:    IOPS=246k,  BW=963MiB/s
 U 32k:   IOPS=209k,  BW=6529MiB/s
 U 128k:  IOPS=146k,  BW=17.8GiB/s
 M 4k:    IOPS=19.5k, BW=76.2MiB/s
 M 32k:   IOPS=4091,  BW=128MiB/s
 M 128k:  IOPS=2814,  BW=352MiB/s

Best Regards,
Yi.
Zhang Yi (13):
  ext4: fix an off-by-one issue during moving extents
  ext4: correct the checking of quota files before moving extents
  ext4: introduce seq counter for the extent status entry
  ext4: make ext4_es_lookup_extent() pass out the extent seq counter
  ext4: pass out extent seq counter when mapping blocks
  ext4: use EXT4_B_TO_LBLK() in mext_check_arguments()
  ext4: add mext_check_validity() to do basic check
  ext4: refactor mext_check_arguments()
  ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate()
  ext4: introduce mext_move_extent()
  ext4: switch to using the new extent movement method
  ext4: add large folios support for moving extents
  ext4: add two trace points for moving extents

 fs/ext4/ext4.h              |   3 +
 fs/ext4/extents.c           |   2 +-
 fs/ext4/extents_status.c    |  27 +-
 fs/ext4/extents_status.h    |   2 +-
 fs/ext4/inode.c             |  28 +-
 fs/ext4/ioctl.c             |  10 -
 fs/ext4/move_extent.c       | 773 ++++++++++++++++--------------------
 fs/ext4/super.c             |   1 +
 include/trace/events/ext4.h |  97 ++++-
 9 files changed, 486 insertions(+), 457 deletions(-)

--
2.46.1