From: "Nirjhar Roy (IBM)" <nirjhar.roy.lists-AT-gmail.com>
To: linux-xfs-AT-vger.kernel.org
Subject: [RFC V3 0/3] xfs: Add support to shrink multiple empty AGs
Date: Mon, 20 Oct 2025 21:13:41 +0530
Message-ID: <cover.1760640936.git.nirjhar.roy.lists@gmail.com>
Cc: nirjhar.roy.lists-AT-gmail.com, ritesh.list-AT-gmail.com, ojaswin-AT-linux.ibm.com, djwong-AT-kernel.org, bfoster-AT-redhat.com, david-AT-fromorbit.com, hsiangkao-AT-linux.alibaba.com
This work is based on a previous RFC[1] by Gao Xiang and various ideas
proposed by Dave Chinner in the RFC[1].
Currently, shrink is limited to partially shrinking the last AG and
cannot go beyond that. This patch series extends the functionality
to support shrinking beyond 1 AG. However, the AGs that will be removed
have to be empty in order to prevent any loss of data.
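The emptiness precondition can be illustrated with a small userspace model. This is only a sketch: struct ag_model, ag_model_is_empty() and shrink_range_is_removable() are hypothetical names, and "empty" is simplified here to "no used blocks", whereas the real definition of an empty AG is given in patch 3/3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-AG state; not the kernel's struct xfs_perag. */
struct ag_model {
	uint32_t total_blocks;	/* size of the AG in filesystem blocks */
	uint32_t used_blocks;	/* blocks in use (simplified notion of "not empty") */
};

/* In this model an AG is removable only if nothing is allocated in it. */
static bool ag_model_is_empty(const struct ag_model *ag)
{
	return ag->used_blocks == 0;
}

/*
 * Return true if every AG in [new_agcount, old_agcount) is empty, i.e. the
 * shrink could drop all of them without losing data. The walk goes from the
 * last AG downwards, mirroring a reverse AG-number iteration.
 */
static bool shrink_range_is_removable(const struct ag_model *ags,
				      uint32_t old_agcount,
				      uint32_t new_agcount)
{
	/* Not a shrink of whole AGs, or it would remove AG 0. */
	if (new_agcount == 0 || new_agcount >= old_agcount)
		return false;

	for (uint32_t agno = old_agcount; agno-- > new_agcount; ) {
		if (!ag_model_is_empty(&ags[agno]))
			return false;
	}
	return true;
}
```

With four AGs of which only the last two are empty, this model allows a shrink to two AGs but rejects a shrink to one.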
The series begins with the re-introduction of some of the data
structures that were removed, followed by some code refactoring, and
finally the patch that implements the multi-AG shrink design.
The final patch has all the details, including the definitions of the
terminology and the overall design.
fstests are in [3].
[rfc_v2] --> v3
1) Function/macro renamings:
1.a xfs_ag_is_empty() -> xfs_perag_is_empty()
1.b xfs_ag_is_active() -> xfs_perag_is_active()
1.c xfs_shrinkfs_stablize_ags() -> xfs_shrinkfs_quiesce_ags()
1.d for_each_perag_range_reverse -> for_each_agno_range_reverse
2) Modified the commit messages for patch 3/3
2.a Modified the definition of empty AG
2.b Slightly changed the description of some of the steps in AG
quiesce/stabilization and AG deactivation.
3) Design changes:
3.a In function xfs_growfs_data_private() - call
xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, delta) instead of
manually restoring the fdblock incore counters (which were reserved
during AG deactivation) if the AG count is reduced during shrink.
3.b Introduced a new state XFS_OPSTATE_SHRINKING. This flag will be set
during start of the shrink (in xfs_growfs_data_private())
and will be cleared after the shrink process finishes/aborts.
Now, using the function xfs_is_shrinking(), we turn off the
following check in xfs_validate_ag_length():
if (bp->b_pag && seqno != mp->m_sb.sb_agcount - 1)
return __this_address;
We do the above in the following way:
if (!xfs_is_shrinking(mp) &&
bp->b_pag && seqno != mp->m_sb.sb_agcount - 1)
return __this_address;
Shrinking is a rare operation and hence the above logic makes
sense.
3.c In function xfs_perag_deactivate() - Return int instead of bool
and replace wait_event() with wait_event_killable() so that the
shrink process can be safely killed by a user. If the wait is
interrupted, the offlined AGs (if any) will be re-activated.
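The error handling described in item 3.c can be sketched in userspace as follows. This is a model under stated assumptions, not the patch's code: struct ag_state and the functions below are hypothetical, the killable wait is simulated with a flag, and -EINTR stands in for whatever error the kernel wait actually returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AG state for the model. */
struct ag_state {
	bool active;		/* AG is online */
	bool wait_interrupted;	/* simulate a fatal signal during the wait */
};

static void ag_activate(struct ag_state *ag)
{
	ag->active = true;
}

/*
 * Models a deactivate step that returns int rather than bool: 0 on success,
 * a negative errno if the wait was killed. On failure this model puts the AG
 * back online itself (an assumption made for the sketch).
 */
static int ag_deactivate(struct ag_state *ag)
{
	ag->active = false;
	if (ag->wait_interrupted) {
		ag_activate(ag);
		return -EINTR;	/* stand-in for the killable wait failing */
	}
	return 0;
}

/*
 * Take AGs [new_agcount, old_agcount) offline, last AG first. If any step is
 * interrupted, re-activate every AG that was already offlined and return the
 * error so the shrink can abort cleanly.
 */
static int shrink_deactivate_ags(struct ag_state *ags, uint32_t old_agcount,
				 uint32_t new_agcount)
{
	for (uint32_t agno = old_agcount; agno-- > new_agcount; ) {
		int error = ag_deactivate(&ags[agno]);

		if (error) {
			for (uint32_t undo = agno + 1; undo < old_agcount; undo++)
				ag_activate(&ags[undo]);
			return error;
		}
	}
	return 0;
}
```

The point of returning int is visible in the caller: the error propagates upward and the rollback loop leaves the filesystem with the same set of active AGs it started with.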
[rfc_v1] --> v2
1) Function renamings:
1.a xfs_activate_ag() -> xfs_perag_activate()
1.b xfs_deactivate_ag() -> xfs_perag_deactivate()
1.c xfs_pag_populate_cached_bufs() -> xfs_buf_cache_grab_all()
1.d xfs_buf_offline_perag_rele_cached() -> xfs_buf_cache_invalidate()
1.e xfs_extent_busy_wait_range() -> xfs_extent_busy_wait_ags()
1.f xfs_growfs_get_delta() -> xfs_growfs_compute_delta()
2) Fixed several coding style issues and typos in the code and
commit messages.
3) Introduced the for_each_perag_range_reverse() macro and used it
instead of using for loops directly.
4) Design changes:
4.a In function xfs_ag_is_empty() - Removed the
ASSERT(!xfs_ag_contains_log(mp, pag_agno(pag)));
4.b In function xfs_shrinkfs_reactivate_ags() - Replaced
if (nagcount >= oagcount) return; with ASSERT(nagcount < oagcount);
4.c In function xfs_perag_deactivate() - Add one extra step where
we manually reduce/reserve (pagf_freeblks + pagf_flcount) worth of
free datablocks from the global counters. This is necessary
in order to prevent a race where some AGs have been temporarily
offlined but the delayed allocator has already promised some bytes,
and the real extent/block allocation later fails because
the AG(s) are offline.
4.d In function xfs_perag_activate() - Add one extra step where
we restore the global free block counter which we reduced in
xfs_perag_deactivate.
4.e In function xfs_shrinkfs_deactivate_ags() -
1. Flushing the xfs_discard_wq after the log force/flush.
2. Removed the direct usage of xfs_log_quiesce(). The reason
is that xfs_log_quiesce() is expected to be called when the
caller has made sure that the log/filesystem is idle but
for shrink, we don't necessarily need the log/filesystem
to be idle.
However, we still need the checkpointing to take place,
so we do an xfs_sync_sb + AIL flush twice, similar to
what is done in xfs_log_cover().
More details are in the patch.
3. Moved the entire code of ag stabilization (after ag
offlining) into a separate function -
xfs_shrinkfs_stabilize_ags().
4.f Fixed a bug where, if the size of the new tail AG was less than
XFS_MIN_AG_BLOCKS, the shrink succeeded - the correct behavior
is to fail with -EINVAL. Thank you Ritesh[2] for pointing this out.
5) Added RBs from Darrick in patch 1/3 and patch 2/3 (after addressing his
comments).
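The free-block accounting from items 4.c and 4.d can be modeled in userspace roughly as below. All names (struct fs_model, struct pag_model and the three functions) are hypothetical, a single plain integer stands in for the global free block counter, and returning -ENOSPC when the reservation cannot be satisfied is an assumption of the sketch rather than something stated above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical global state: free data blocks across the whole filesystem. */
struct fs_model {
	int64_t fdblocks;
};

/* Hypothetical per-AG state, named after the pagf_* fields mentioned above. */
struct pag_model {
	uint32_t freeblks;	/* free blocks in this AG */
	uint32_t flcount;	/* blocks on this AG's free list */
	int64_t reserved;	/* what deactivation took from fdblocks */
};

/* Delayed allocation promises space by reserving from the global counter. */
static int fs_model_reserve(struct fs_model *fs, int64_t nblocks)
{
	if (fs->fdblocks < nblocks)
		return -ENOSPC;
	fs->fdblocks -= nblocks;
	return 0;
}

/*
 * Deactivation step: hide (freeblks + flcount) blocks from the global
 * counter so nobody is promised space that lives in an offline AG.
 */
static int pag_model_deactivate(struct fs_model *fs, struct pag_model *pag)
{
	int64_t resv = (int64_t)pag->freeblks + pag->flcount;
	int error = fs_model_reserve(fs, resv);

	if (error)
		return error;
	pag->reserved = resv;
	return 0;
}

/* Activation step: give back exactly what deactivation reserved. */
static void pag_model_activate(struct fs_model *fs, struct pag_model *pag)
{
	fs->fdblocks += pag->reserved;
	pag->reserved = 0;
}
```

In this model, once an AG holding 300 free blocks is deactivated on a filesystem with 1000 free blocks, a later reservation can be granted at most 700 blocks, all of which are backed by AGs that are still online.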
[1] https://lore.kernel.org/all/20210414195240.1802221-1-hsia...
[2] https://lore.kernel.org/all/875xfas2f6.fsf@gmail.com/
[3] https://lore.kernel.org/all/cover.1758035262.git.nirjhar....
[rfc_v1] https://lore.kernel.org/all/cover.1752746805.git.nirjhar....
[rfc_v2] https://lore.kernel.org/linux-xfs/cover.1758034274.git.ni...
Nirjhar Roy (IBM) (3):
xfs: Re-introduce xg_active_wq field in struct xfs_group
xfs: Refactoring the nagcount and delta calculation
xfs: Add support to shrink multiple empty AGs
fs/xfs/libxfs/xfs_ag.c | 191 ++++++++++++++++-
fs/xfs/libxfs/xfs_ag.h | 17 ++
fs/xfs/libxfs/xfs_alloc.c | 10 +-
fs/xfs/libxfs/xfs_group.c | 4 +-
fs/xfs/libxfs/xfs_group.h | 2 +
fs/xfs/xfs_buf.c | 78 +++++++
fs/xfs/xfs_buf.h | 1 +
fs/xfs/xfs_buf_item_recover.c | 37 ++--
fs/xfs/xfs_extent_busy.c | 30 +++
fs/xfs/xfs_extent_busy.h | 2 +
fs/xfs/xfs_fsops.c | 379 +++++++++++++++++++++++++++++++---
fs/xfs/xfs_mount.h | 3 +
fs/xfs/xfs_trans.c | 1 -
13 files changed, 701 insertions(+), 54 deletions(-)
--
2.43.5