
Separate ptdesc from struct page

From:  "Matthew Wilcox (Oracle)" <willy-AT-infradead.org>
To:  linux-mm-AT-kvack.org
Subject:  [RFC PATCH 0/7] Separate ptdesc from struct page
Date:  Mon, 20 Oct 2025 01:16:35 +0100
Message-ID:  <20251020001652.2116669-1-willy@infradead.org>
Cc:  "Matthew Wilcox (Oracle)" <willy-AT-infradead.org>, Vishal Moola <vishal.moola-AT-gmail.com>, Johannes Weiner <hannes-AT-cmpxchg.org>

With one specific configuration on x86-64, this boots and runs the fstests
test suite until it crashes in generic/108 while trying to load a module.
Obviously this isn't fit for upstreaming yet (although the first four
or five patches might be worth merging now).  I'm sending this out to
demonstrate (a) that Progress Is Being Made towards shrinking struct page
and (b) one potential implementation of alloc_pages_memdesc().

We can build on this further; I have a patch to eliminate the
separately-allocated ptl, since there's no longer a reason to keep struct
ptdesc within the sizeof(struct page).  I'm not sending it as part of
this batch to keep the patch review workload down.
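
A minimal userspace sketch of the trade-off behind the separately-allocated
ptl, not code from this series.  Every mock_* name and the lock size are
hypothetical.  It assumes the lock is too large to embed in a size-limited
descriptor (as can happen with lock debugging enabled), so the constrained
layout holds only a pointer, while the unconstrained layout embeds the lock:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a spinlock_t that is too big to embed,
 * as can happen when lock debugging is enabled. */
struct mock_lock {
	unsigned long words[4];
};

/* Size-constrained layout: only a pointer fits, so the lock itself
 * lives in a separate allocation. */
struct mock_ptdesc_split {
	unsigned long flags;
	struct mock_lock *ptl;
};

/* Unconstrained layout: the lock is embedded directly. */
struct mock_ptdesc_embedded {
	unsigned long flags;
	struct mock_lock ptl;
};

/* Returns 0 on success, -1 if the extra allocation fails. */
int mock_split_init(struct mock_ptdesc_split *pt)
{
	assert(pt != NULL);
	pt->ptl = calloc(1, sizeof(*pt->ptl));
	return pt->ptl ? 0 : -1;
}

/* Releases the separately allocated lock. */
void mock_split_fini(struct mock_ptdesc_split *pt)
{
	assert(pt != NULL);
	free(pt->ptl);
	pt->ptl = NULL;
}

/* Split layout: reaching the lock costs one extra pointer load. */
struct mock_lock *mock_split_lock(struct mock_ptdesc_split *pt)
{
	return pt->ptl;
}

/* Embedded layout: no allocation, no failure path, no pointer chase. */
struct mock_lock *mock_embedded_lock(struct mock_ptdesc_embedded *pt)
{
	return &pt->ptl;
}
```

The embedded variant is larger, which only becomes acceptable once the
descriptor no longer has to fit within sizeof(struct page).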

While working on this, I've started to suspect that pgtable_t (when not
pointing to a fraction of a page) should point to a ptdesc and not a
struct page.  That change is somewhat independent of this series and
could go before or after it.

Obviously there's a certain cost and very little benefit to applying
this patch series on its own.  We probably need to do all the memdescs
at once.  I'm going to move on to doing slab next (slab is particularly
tricky because there's a mutual recursion between needing to allocate a
struct slab for a struct page for a struct slab for a ...).  I know how
to do it; it just needs to be written down.

There's a certain amount of debugging code mixed in here (in the
later patches).  For example, we store a copy of the ptdesc pointer in
page->__folio_index, which lets me see when page->lru has overwritten
page->memdesc.  The next crash to track down is:

memdesc dead000000000122 index ffff888119a59420
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888119a59420 pfn:0x124cce
flags: 0x8000000000000000(zone=2)
raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff888119a59420 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(1)

so page->lru.prev is LIST_POISON2, while page->__folio_index is plausibly
a pointer to a struct ptdesc.  In case anybody knows off the top of
their head what's going on, it's:

RIP: 0010:collapse_large_pages.cold+0x45/0x49
Call Trace:
 <TASK>
 cpa_flush+0x1de/0x310
 change_page_attr_set_clr+0x10e/0x160
 set_memory_rox+0x46/0x50
 execmem_restore_rox+0x1d/0x30
 module_enable_text_rox+0x6d/0xb0
 load_module+0x17de/0x22a0
 init_module_from_file+0x8a/0xb0

I don't immediately see where page->lru is being used, but maybe after
I've had a good sleep, it'll come to me.
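
The shadow-copy check described above can be sketched in userspace as
follows.  This is an illustration under stated assumptions, not the
debugging code from the patches: all mock_* names are hypothetical, the
layout (memdesc overlaying lru.prev) is inferred from the dump, where the
memdesc value equals the third raw word, and the poison constants assume
the x86-64 values on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Assumed x86-64 poison values written by the kernel's list_del(). */
#define MOCK_LIST_POISON1 \
	((struct mock_list_head *)(uintptr_t)0xdead000000000100ULL)
#define MOCK_LIST_POISON2 \
	((struct mock_list_head *)(uintptr_t)0xdead000000000122ULL)

/* Mock page: memdesc shares storage with lru.prev (assumed layout). */
struct mock_page {
	unsigned long flags;
	union {
		struct mock_list_head lru;
		struct {
			uintptr_t pad;      /* overlays lru.next */
			uintptr_t memdesc;  /* overlays lru.prev */
		};
	};
	uintptr_t shadow;  /* plays the role of page->__folio_index */
};

/* Record the descriptor and keep a second copy outside the union. */
void mock_set_memdesc(struct mock_page *page, void *desc)
{
	assert(page != NULL && desc != NULL);
	page->memdesc = (uintptr_t)desc;
	page->shadow = (uintptr_t)desc;
}

/* Returns nonzero while memdesc still matches its shadow copy. */
int mock_memdesc_intact(const struct mock_page *page)
{
	return page->memdesc == page->shadow;
}

/* Insert entry right after head, as list_add() does. */
void mock_list_add(struct mock_list_head *entry, struct mock_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry and poison its pointers, as list_del() does. */
void mock_list_del(struct mock_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = MOCK_LIST_POISON1;
	entry->prev = MOCK_LIST_POISON2;
}
```

Any list operation on the lru member clobbers memdesc, and a later
mock_list_del() leaves the poison value behind while the shadow copy still
holds the original pointer, which is the pattern visible in the dump.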

Matthew Wilcox (Oracle) (7):
  mm: Use frozen pages for page tables
  mm: Account pagetable memory when allocated
  mm: Mark pagetable memory when allocated
  pgtable: Remove uses of page->lru
  x86: Call preallocate_vmalloc_pages() later
  mm: Add alloc_pages_memdesc family of APIs
  mm: Allocate ptdesc from slab

 arch/x86/mm/init_64.c    |  4 +-
 include/linux/gfp.h      | 13 ++++++
 include/linux/mm.h       | 88 ++++++++++++++++------------------------
 include/linux/mm_types.h | 75 +++++++++++++---------------------
 mm/internal.h            | 14 +++++--
 mm/memory.c              | 67 ++++++++++++++++++++++++++++++
 mm/mempolicy.c           | 28 +++++++------
 mm/mm_init.c             |  1 +
 mm/page_alloc.c          | 12 ++++--
 mm/pgtable-generic.c     | 24 +++++++----
 mm/vmalloc.c             |  2 +
 11 files changed, 198 insertions(+), 130 deletions(-)

-- 
2.47.2


