mm: switch THP shrinker to list_lru

From:  Johannes Weiner <hannes-AT-cmpxchg.org>
To:  Andrew Morton <akpm-AT-linux-foundation.org>
Subject:  [PATCH v4 0/8] mm: switch THP shrinker to list_lru
Date:  Thu, 21 May 2026 11:02:06 -0400
Message-ID:  <20260521150330.1955924-1-hannes@cmpxchg.org>
Cc:  David Hildenbrand <david-AT-kernel.org>, Lorenzo Stoakes <ljs-AT-kernel.org>, Shakeel Butt <shakeel.butt-AT-linux.dev>, Michal Hocko <mhocko-AT-kernel.org>, Dave Chinner <david-AT-fromorbit.com>, Roman Gushchin <roman.gushchin-AT-linux.dev>, Muchun Song <muchun.song-AT-linux.dev>, Qi Zheng <qi.zheng-AT-linux.dev>, Yosry Ahmed <yosry.ahmed-AT-linux.dev>, Zi Yan <ziy-AT-nvidia.com>, "Liam R . Howlett" <liam-AT-infradead.org>, Usama Arif <usama.arif-AT-linux.dev>, Kiryl Shutsemau <kas-AT-kernel.org>, Vlastimil Babka <vbabka-AT-kernel.org>, Kairui Song <ryncsn-AT-gmail.com>, Mikhail Zaslonko <zaslonko-AT-linux.ibm.com>, Vasily Gorbik <gor-AT-linux.ibm.com>, Baolin Wang <baolin.wang-AT-linux.alibaba.com>, Barry Song <baohua-AT-kernel.org>, Dev Jain <dev.jain-AT-arm.com>, Lance Yang <lance.yang-AT-linux.dev>, Nico Pache <npache-AT-redhat.com>, Ryan Roberts <ryan.roberts-AT-arm.com>, cgroups-AT-vger.kernel.org, linux-mm-AT-kvack.org, linux-kernel-AT-vger.kernel.org

This is version 4 of switching the THP shrinker to list_lru.

Changes in v4:
- guard folio_memcg_alloc_deferred() with mem_cgroup_disabled() to fix
  NULL deref in __memcg_list_lru_alloc() when booting with
  cgroup_disable=memory (e.g., kdump capture kernel) -- reported and
  tested by Mikhail Zaslonko on s390 and x86
- flatten if (folio) branches in alloc_swap_folio() and alloc_anon_folio()
  in a prep patch so the list_lru allocation additions are a clean minimal
  diff (Lorenzo)
- folio_memcg_alloc_deferred() moved out of alloc_charge_folio() into the
  anon-only collapse_huge_page() path; collapse_file() shares that helper
  but its pages don't go on the THP shrinker queue (David)
- guard folio_memcg_alloc_deferred() with order > 1; mTHPs below order-2
  can't be queued on the deferred split list (David)
- make deferred_split_lru static, hide behind folio_memcg_alloc_deferred()
  wrapper with GFP_KERNEL (Lorenzo)
- rename l -> lru throughout huge_memory.c (Lorenzo)
- kdoc for folio_memcg_list_lru_alloc() (Lorenzo)
- list_lru_lock_irq()/unlock_irq()/add_irq() irq-disabling variants;
  use list_lru_add_irq() in deferred_split_scan() (Lorenzo)
- reorder shrinker_free() before list_lru_destroy() (Lorenzo)
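
As an illustration of the two guards mentioned in the v4 list above
(memcg disabled, and order > 1), here is a minimal userspace sketch. It
is not the code from the series: folio_model, memcg_disabled_model,
memcg_list_lru_alloc_model() and folio_alloc_deferred_model() are
hypothetical stand-ins for the kernel's folio, mem_cgroup_disabled(),
the per-memcg list_lru allocation and folio_memcg_alloc_deferred().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel's real definitions. */
struct folio_model {
	unsigned int order;	/* folio covers 2^order pages */
};

static bool memcg_disabled_model;	/* models cgroup_disable=memory */
static int lru_alloc_calls;		/* counts per-memcg list allocations */

/* Models the per-memcg list allocation that must not run without memcg. */
static int memcg_list_lru_alloc_model(const struct folio_model *folio)
{
	(void)folio;
	lru_alloc_calls++;
	return 0;
}

/*
 * Models the two guards from the v4 changelog: skip the allocation when
 * memcg is disabled, and skip it for folios that cannot be queued on the
 * deferred split list (order <= 1).
 */
static int folio_alloc_deferred_model(const struct folio_model *folio)
{
	assert(folio != NULL);

	if (memcg_disabled_model)
		return 0;
	if (folio->order <= 1)
		return 0;
	return memcg_list_lru_alloc_model(folio);
}
```

In this model, both early returns report success, so a caller would not
need to special-case either condition.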

Changes in v3:
- dedicated lockdep_key for irqsafe deferred_split_lru.lock (syzbot)
- conditional list_lru ops in __folio_freeze_and_split_unmapped() (syzbot)
- annotate runs of inscrutable false, NULL, false function arguments (David)
- rename to folio_memcg_list_lru_alloc() (David)

Changes in v2:
- explicit rcu_read_lock() in __folio_freeze_and_split_unmapped() (Usama)
- split out list_lru prep bits (Dave)

The open-coded deferred split queue has two problems: it is not
NUMA-aware when cgroups are enabled, and it complicates the callsites
that interact with it. Switching to list_lru fixes the NUMA problem and
simplifies those callsites. It also simplifies planned shrinker work.

Patches 1-4 are cleanups and small refactors in list_lru code. They're
basically independent, but make the THP shrinker conversion easier.

Patch 5 extends the list_lru API to allow the caller to control the
locking scope. The THP shrinker has private state it needs to keep
synchronized with the LRU state.
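
To illustrate what caller-controlled locking scope buys, here is a
single-threaded userspace sketch. It does not reproduce the API added by
patch 5: lru_model, item_model, lru_model_lock(), lru_model_unlock(),
lru_model_add_locked() and queue_item() are all hypothetical names, and
the "lock" is only a flag that checks the locking discipline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's list_lru API. */
struct item_model {
	struct item_model *next;
	bool queued;		/* caller-private state tied to list membership */
};

struct lru_model {
	bool locked;		/* stand-in for a spinlock, single-threaded only */
	struct item_model *head;
	long nr_items;
};

static void lru_model_lock(struct lru_model *lru)
{
	assert(!lru->locked);
	lru->locked = true;
}

static void lru_model_unlock(struct lru_model *lru)
{
	assert(lru->locked);
	lru->locked = false;
}

/* Caller must hold the lock; the helper does not take it itself. */
static void lru_model_add_locked(struct lru_model *lru, struct item_model *item)
{
	assert(lru->locked);
	item->next = lru->head;
	lru->head = item;
	lru->nr_items++;
}

/*
 * The caller owns the critical section, so the private flag and the list
 * membership change together under one lock hold.
 */
static bool queue_item(struct lru_model *lru, struct item_model *item)
{
	bool added = false;

	lru_model_lock(lru);
	if (!item->queued) {
		item->queued = true;
		lru_model_add_locked(lru, item);
		added = true;
	}
	lru_model_unlock(lru);
	return added;
}
```

If the add helper took the lock internally, the flag update would have
to happen in a separate critical section, and the two pieces of state
could be observed out of sync.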

Patch 6 extends the list_lru API with a convenience helper to do
list_lru head allocation (memcg_list_lru_alloc) when coming from a
folio. Anon THPs are instantiated in several places, and with the
folio reparenting patches pending, folio_memcg() access is now a more
delicate dance. This avoids having to replicate that dance everywhere.

Patch 7 flattens the folio allocation retry loops in alloc_swap_folio()
and alloc_anon_folio() without functional change, in preparation for
patch 8.

Patch 8 finally switches the deferred_split_queue to list_lru.

Based on mm-unstable.

 include/linux/huge_mm.h    |   7 +-
 include/linux/list_lru.h   |  68 +++++++++
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 355 ++++++++++++++-----------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   3 +
 mm/list_lru.c              | 220 ++++++++++++++++++---------
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  52 ++++---
 mm/mm_init.c               |  15 --
 11 files changed, 374 insertions(+), 376 deletions(-)



