| From: | "Matthew Wilcox (Oracle)" <willy-AT-infradead.org> |
| To: | linux-mm-AT-kvack.org |
| Subject: | [RFC PATCH 0/7] Separate ptdesc from struct page |
| Date: | Mon, 20 Oct 2025 01:16:35 +0100 |
| Message-ID: | <20251020001652.2116669-1-willy@infradead.org> |
| Cc: | "Matthew Wilcox (Oracle)" <willy-AT-infradead.org>, Vishal Moola <vishal.moola-AT-gmail.com>, Johannes Weiner <hannes-AT-cmpxchg.org> |
With one specific configuration on x86-64, this boots and runs the fstests
test suite until it crashes in generic/108 while trying to load a module.
Obviously this isn't fit for upstreaming yet (although the first four
or five patches might be worth merging now). I'm sending this out to
demonstrate (a) that Progress Is Being Made towards shrinking struct page
and (b) one potential implementation of alloc_pages_memdesc().
We can build on this further; I have a patch to eliminate the
separately-allocated ptl, since struct ptdesc no longer has to fit
within sizeof(struct page). I'm not sending it as part of this batch,
to keep the patch review workload down.
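As a rough userspace illustration of what "separate from struct page"
means, here is a minimal sketch in which the page keeps only one
descriptor word and the ptdesc lives in its own allocation, so it is free
to embed its lock. Everything here is a stand-in and an assumption: the
*_model structs, the helper names, the 4-bit type tag in the low bits and
the use of aligned_alloc() in place of slab are all hypothetical and are
not taken from the series or from alloc_pages_memdesc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding: the low 4 bits of the descriptor word hold a type tag. */
#define MEMDESC_TYPE_MASK ((uintptr_t)0xf)

enum memdesc_type {
	MEMDESC_NONE = 0,
	MEMDESC_PTDESC = 1,
};

/* Simplified stand-in for struct ptdesc, allocated outside the page. */
struct ptdesc_model {
	int ptl;		/* stand-in for an embedded page table lock */
	unsigned long pt_flags;
	void *pt_mm;
};

/* Simplified stand-in for a shrunken struct page: one descriptor word. */
struct page_model {
	uintptr_t memdesc;
};

/* Pack a suitably aligned descriptor pointer and a type tag into one word. */
static uintptr_t memdesc_encode(void *desc, enum memdesc_type type)
{
	uintptr_t word = (uintptr_t)desc;

	assert((word & MEMDESC_TYPE_MASK) == 0);
	return word | (uintptr_t)type;
}

static enum memdesc_type memdesc_type_of(uintptr_t word)
{
	return (enum memdesc_type)(word & MEMDESC_TYPE_MASK);
}

/* Return the ptdesc for a page, or NULL if the page is not a page table. */
static struct ptdesc_model *page_ptdesc_model(const struct page_model *page)
{
	if (memdesc_type_of(page->memdesc) != MEMDESC_PTDESC)
		return NULL;
	return (struct ptdesc_model *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}

/* Allocate a descriptor (aligned_alloc stands in for slab) and attach it. */
static int page_attach_ptdesc(struct page_model *page)
{
	size_t size = (sizeof(struct ptdesc_model) + 15) & ~(size_t)15;
	struct ptdesc_model *pt = aligned_alloc(16, size);

	if (!pt)
		return -1;
	memset(pt, 0, size);
	page->memdesc = memdesc_encode(pt, MEMDESC_PTDESC);
	return 0;
}

static void page_detach_ptdesc(struct page_model *page)
{
	free(page_ptdesc_model(page));
	page->memdesc = 0;
}
```

Because the descriptor is no longer constrained by the size of the page
structure, adding a field such as the lock only grows ptdesc_model; the
page_model stays one word.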
While working on this, I've started to suspect that pgtable_t (when it
is not pointing to a fraction of a page) should point to a ptdesc and
not a struct page. That's a change that's somewhat independent of this
series, and could go before or after.
Obviously there's a certain cost and very little benefit to applying
this patch series on its own. We probably need to do all the memdescs
at once. I'm going to move on to doing slab next (slab is particularly
tricky because there's a mutual recursion between needing to allocate a
struct slab for a struct page for a struct slab for a ...). I know how
to do it; it just needs to be written down.
There's a certain amount of debugging code mixed in here (in the
later patches). For example, we store a copy of the ptdesc pointer in
page->__folio_index, which lets me see when page->lru has overwritten
page->memdesc. The next crash to track down is:
memdesc dead000000000122 index ffff888119a59420
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888119a59420 pfn:0x124cce
flags: 0x8000000000000000(zone=2)
raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff888119a59420 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(1)
so page->lru.prev is LIST_POISON2, while page->__folio_index is plausibly
a pointer to a struct ptdesc. In case anybody knows off the top of
their head what's going on, the backtrace is:
RIP: 0010:collapse_large_pages.cold+0x45/0x49
Call Trace:
<TASK>
cpa_flush+0x1de/0x310
change_page_attr_set_clr+0x10e/0x160
set_memory_rox+0x46/0x50
execmem_restore_rox+0x1d/0x30
module_enable_text_rox+0x6d/0xb0
load_module+0x17de/0x22a0
init_module_from_file+0x8a/0xb0
I don't immediately see where page->lru is being used, but maybe after
I've had a good sleep, it'll come to me.
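The shadow-copy trick described above can be modelled in userspace. This
is a minimal sketch under stated assumptions: page_model is a
hypothetical, simplified layout inferred from the raw dump (memdesc
sharing storage with lru.prev, the copy kept in a separate word), the
helper names are made up, and the poison constants assume the x86-64
poison base of 0xdead000000000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: x86-64 poison base, giving the list poison values below. */
#define POISON_POINTER_DELTA UINT64_C(0xdead000000000000)
#define LIST_POISON1 (POISON_POINTER_DELTA + 0x100)
#define LIST_POISON2 (POISON_POINTER_DELTA + 0x122)

struct list_head_model {
	uint64_t next;
	uint64_t prev;
};

/* Hypothetical page layout: memdesc aliases lru.prev, shadow does not. */
struct page_model {
	uint64_t flags;
	union {
		struct list_head_model lru;
		struct {
			uint64_t unused;
			uint64_t memdesc;
		};
	};
	uint64_t mapping;
	uint64_t shadow;	/* debug copy, playing the role of __folio_index */
};

static_assert(offsetof(struct page_model, memdesc) ==
	      offsetof(struct page_model, lru) +
	      offsetof(struct list_head_model, prev),
	      "memdesc must alias lru.prev in this model");

enum memdesc_state {
	MEMDESC_OK,
	MEMDESC_LIST_POISONED,
	MEMDESC_CORRUPT,
};

/* Store the descriptor and keep a copy where list code cannot reach it. */
static void page_set_memdesc(struct page_model *page, uint64_t desc)
{
	page->memdesc = desc;
	page->shadow = desc;
}

/* Compare against the copy and classify how the descriptor was damaged. */
static enum memdesc_state page_check_memdesc(const struct page_model *page)
{
	if (page->memdesc == page->shadow)
		return MEMDESC_OK;
	if (page->memdesc == LIST_POISON1 || page->memdesc == LIST_POISON2)
		return MEMDESC_LIST_POISONED;
	return MEMDESC_CORRUPT;
}
```

In this model, a write of LIST_POISON2 through page.lru.prev makes
page_check_memdesc() report MEMDESC_LIST_POISONED while the shadow word
still holds the original descriptor, which is the same shape as the dump
above.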
Matthew Wilcox (Oracle) (7):
  mm: Use frozen pages for page tables
  mm: Account pagetable memory when allocated
  mm: Mark pagetable memory when allocated
  pgtable: Remove uses of page->lru
  x86: Call preallocate_vmalloc_pages() later
  mm: Add alloc_pages_memdesc family of APIs
  mm: Allocate ptdesc from slab

 arch/x86/mm/init_64.c    |  4 +-
 include/linux/gfp.h      | 13 ++++++
 include/linux/mm.h       | 88 ++++++++++++++++------------------------
 include/linux/mm_types.h | 75 +++++++++++++---------------------
 mm/internal.h            | 14 +++++--
 mm/memory.c              | 67 ++++++++++++++++++++++++++++++
 mm/mempolicy.c           | 28 +++++++------
 mm/mm_init.c             |  1 +
 mm/page_alloc.c          | 12 ++++--
 mm/pgtable-generic.c     | 24 +++++++----
 mm/vmalloc.c             |  2 +
 11 files changed, 198 insertions(+), 130 deletions(-)

--
2.47.2