
Pulling slabs out of struct page


Posted Oct 8, 2021 21:55 UTC (Fri) by willy (subscriber, #9762)
Parent article: Pulling slabs out of struct page

I have some further thoughts indicating where I'm going at https://kernelnewbies.org/MemoryTypes

The dynamically allocated struct folio/slab/pgtable is where Kent Overstreet and Johannes Weiner want to go. It's more work, with a bigger payoff. We can collaborate on the steps along the way, since so much of the way is shared.
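For readers wondering what "dynamically allocated" descriptors could look like, here is a minimal userspace sketch. It assumes each page keeps only one word that combines a pointer to a separately allocated descriptor with a small type tag in the low bits. Every name in it (`md_t`, `md_pack`, `struct md_slab`, the 16-byte alignment) is hypothetical and chosen for illustration; none of it is taken from the kernel or from the linked page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type tags; 0 means "no descriptor". */
enum md_type { MD_NONE = 0, MD_FOLIO = 1, MD_SLAB = 2, MD_PGTABLE = 3 };

/* Assumption: descriptors are 16-byte aligned, leaving 4 low bits for the tag. */
#define MD_ALIGN     ((size_t)16)
#define MD_TYPE_MASK ((uintptr_t)(MD_ALIGN - 1))

/* The single per-page word in this toy model. */
typedef uintptr_t md_t;

/* Stand-in for a dynamically allocated slab descriptor. */
struct md_slab { unsigned int objects; };

/* Allocate a descriptor with enough alignment to carry a tag. */
static void *md_desc_alloc(size_t size)
{
    /* aligned_alloc() wants the size to be a multiple of the alignment. */
    size_t rounded = (size + MD_ALIGN - 1) / MD_ALIGN * MD_ALIGN;
    return aligned_alloc(MD_ALIGN, rounded);
}

/* Combine a descriptor pointer and its type into one word. */
static md_t md_pack(void *desc, enum md_type type)
{
    assert(((uintptr_t)desc & MD_TYPE_MASK) == 0);
    return (uintptr_t)desc | (uintptr_t)type;
}

/* Recover the type tag from the per-page word. */
static enum md_type md_type_of(md_t md)
{
    return (enum md_type)(md & MD_TYPE_MASK);
}

/* Recover the descriptor pointer from the per-page word. */
static void *md_ptr_of(md_t md)
{
    return (void *)(md & ~MD_TYPE_MASK);
}
```

A caller would check `md_type_of()` first and only then cast the result of `md_ptr_of()` to the matching descriptor type.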



Pulling slabs out of struct page

Posted Oct 9, 2021 15:40 UTC (Sat) by luto (subscriber, #39314) [Link] (10 responses)

At the risk of asking a horrible question: do we really need the ability to start with a _page_ (PFN mapped to userspace, for example) and find type information?

I don’t think we really need this. We already support, in a very limited way, non-struct-page user mappings. For lightweight operations on user memory, we can use the uaccess functions, and they inherently lock correctly against unmapping. For heavyweight operations, we can look up the VMA. This leaves things that don’t want to pay the full price of finding a VMA. Whether those really exist isn’t quite clear to me on a conceptual level, but there is certainly a lot of code that calls get_user_pages [0] and expects the result to be live until release. (And some FS code may want to do useful IO.)

I wonder if performance could be acceptable if GUP walked the VMA tree to find a refcountable object. Some interesting locking would be needed to compete with get_user_pages_fast.

[0] This is all kinds of messy. KVM does unspeakable and blatantly incorrect things to host user memory. Even the normal pattern of GUPping a page interacts in unfortunate ways with COW.

Pulling slabs out of struct page

Posted Oct 9, 2021 15:46 UTC (Sat) by willy (subscriber, #9762) [Link] (7 responses)

> At the risk of asking a horrible question: do we really need the ability to start with a _page_ (PFN mapped to userspace, for example) and find type information?

Yes. Some of the places we need this:

- GUP gets back a page and then calls set_page_dirty(). That needs to figure out whether this is a file/anon/ksm/netpool/DEVICE/... page and call the filesystem if required.

- compaction walks the memmap and needs to figure out what this memory is and whether it can be relocated.

- memory failure gets a physical address and needs to understand how to handle it.

There are more, but these should illustrate some of the problems we have to solve.
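The cases listed above share one shape: the caller has only a page frame and must learn what kind of memory it is before acting. A toy userspace model of that lookup might look like the following. The `mt_` names, the fixed-size memmap, and the rule that only anon and file pages are dirtyable and relocatable are simplifying assumptions for illustration, not the kernel's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory types; the real kernel encodes this differently. */
enum mt_type { MT_FREE, MT_ANON, MT_FILE, MT_SLAB };

/* Toy per-page descriptor. */
struct mt_page {
    enum mt_type type;
    bool dirty;
    int fs_notified;   /* counts calls into the pretend filesystem */
};

#define MT_NR_PAGES 8
static struct mt_page mt_memmap[MT_NR_PAGES];

/* Map a page frame number to its descriptor, or NULL if out of range. */
static struct mt_page *mt_pfn_to_page(size_t pfn)
{
    return pfn < MT_NR_PAGES ? &mt_memmap[pfn] : NULL;
}

/* GUP-style case: dirty a page knowing only its PFN. */
static bool mt_set_page_dirty(size_t pfn)
{
    struct mt_page *page = mt_pfn_to_page(pfn);

    if (!page)
        return false;

    switch (page->type) {
    case MT_FILE:
        page->fs_notified++;   /* file pages must tell the filesystem */
        page->dirty = true;
        return true;
    case MT_ANON:
        page->dirty = true;    /* no filesystem involved */
        return true;
    default:
        return false;          /* free or slab memory is not dirtied this way */
    }
}

/* Compaction-style case: may this page frame be moved elsewhere? */
static bool mt_can_relocate(size_t pfn)
{
    const struct mt_page *page = mt_pfn_to_page(pfn);

    return page && (page->type == MT_ANON || page->type == MT_FILE);
}
```

Neither function takes a VMA or a virtual address; the type has to be recoverable from the page frame alone, which is the point being made in the comment.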

Pulling slabs out of struct page

Posted Oct 9, 2021 16:08 UTC (Sat) by luto (subscriber, #39314) [Link] (6 responses)

> - GUP gets back a page and then calls set_page_dirty(). That needs to figure out whether this is a file/anon/ksm/netpool/DEVICE/... page and call the filesystem if required.

Is this done directly in GUP? If so, surely it could work like the fault code and look up the VMA.

> - compaction walks the memmap and needs to figure out what this memory is and whether it can be relocated.

Hmm, this one is legit.

> - memory failure gets a physical address and needs to understand how to handle it

In my dream world, the low-level memory failure / machine check code gets a virtual address and can look up a VMA or vmap area. Making this work with kmap might be interesting.

> There are more, but these should illustrate some of the problems we have to solve.

I wonder if it's possible to reduce the dependency on struct page or equivalent to the point that everything works without it except for some nice-to-have features like compaction. (I'm not saying that the colossal amount of effort involved is worthwhile.)

Pulling slabs out of struct page

Posted Oct 9, 2021 16:59 UTC (Sat) by willy (subscriber, #9762) [Link] (1 responses)

I'm really just trying to avoid the bugs we have where people look at page->mapping and the compiler can't say "this is a tail page, that doesn't do what you think it does". Everybody keeps trying to get me to solve their problems as well.

Please, just let me solve a problem, not rewrite the entire kernel.
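To make the tail-page problem concrete, here is a small userspace sketch of how a distinct type lets the compiler catch the mistake. It is an illustration under stated assumptions, not the kernel's layout: the `fo_` names are made up, and the folio is modelled as a by-value wrapper around a head-page pointer, whereas the kernel's struct folio overlays struct page.

```c
#include <assert.h>
#include <stddef.h>

#define FO_TAIL 0x1UL   /* hypothetical flag: this page is a tail page */

/* Toy page: the same storage means different things on head and tail pages. */
struct fo_page {
    unsigned long flags;
    union {
        void *mapping;          /* meaningful on head pages only */
        struct fo_page *head;   /* tail pages point back at their head */
    };
};

/* A folio is never a tail page, so mapping is always valid through it. */
struct fo_folio {
    struct fo_page *head;
};

/* The only way to obtain a folio: resolve any page to its head page. */
static struct fo_folio fo_page_folio(struct fo_page *page)
{
    assert(page != NULL);
    if (page->flags & FO_TAIL)
        page = page->head;
    return (struct fo_folio){ .head = page };
}

/*
 * Accessors take a folio, not a page. Passing a struct fo_page * here is
 * a compile-time type error, which is the check that a bare page->mapping
 * access never gets.
 */
static void *fo_folio_mapping(struct fo_folio folio)
{
    return folio.head->mapping;
}
```

With this shape, code holding a tail page cannot read `mapping` by accident; it has to call `fo_page_folio()` first, and that call does the head-page lookup.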

Pulling slabs out of struct page

Posted Oct 9, 2021 17:06 UTC (Sat) by luto (subscriber, #39314) [Link]

I don't want you to rewrite the whole kernel! I'm just contemplating how it _could_ be rewritten if someone were inclined to do so.

(Also, I do care about the KVM mess, and I don't think KVM could have dug itself into quite the hole it's in if there hadn't been a struct page to begin with for most user mappings, but fixing that needs a rewrite and a time machine.)

Pulling slabs out of struct page

Posted Oct 10, 2021 14:39 UTC (Sun) by willy (subscriber, #9762) [Link] (3 responses)

> In my dream world, the low-level memory failure / machine check code gets a virtual address and can look up a VMA or vmap area. Making this work with kmap might be interesting.

I don't think your dream world is possible. It's the same problem the page cache has with errors on writeback -- the producer might not be around any more. We might have unmapped the vmap/kmap; the user process that dirtied the cache line might have exited, or just been switched away from.

But more importantly, unless the cache is writethrough, the CPU no longer knows which virtual address(es) were used to dirty the cache line.

Pulling slabs out of struct page

Posted Oct 10, 2021 14:53 UTC (Sun) by luto (subscriber, #39314) [Link] (2 responses)

As I understand it, on Intel chips that support memory failure recovery, failed writes may not be notified at all. (I’ve at least been told this is true for the TDX style machine checks.)

And Linux’s entry code makes quite weak guarantees about recoverability of machine checks: we make a best (and pretty good) effort to recover from a fault in user code, and we try to recover from kernel code with exception table entries. If normal kernel code without an exception table entry hits a memory failure entry, forget about struct page: we may be 100% dead regardless because we have no idea how to resume execution.

If we hit a machine check with an exception handler, then we know the program counter, and we have a full register file. Figuring out the failed virtual address isn’t much of a problem even if the hardware doesn’t help.
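For anyone unfamiliar with exception table entries, the idea can be sketched in userspace as a sorted table that maps the address of an instruction allowed to fault to the address where execution should resume. The `ex_` names and the absolute-address layout are assumptions made for this sketch; the kernel's own tables are laid out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy entry: a faulting instruction address and where to resume. */
struct ex_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Binary search a table sorted by insn; NULL means "no entry for this PC". */
static const struct ex_entry *ex_search(const struct ex_entry *table,
                                        size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].insn == pc)
            return &table[mid];
        if (table[mid].insn < pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/*
 * Return the address to resume at, or 0 if the faulting PC has no entry.
 * In this model 0 stands for the "we have no idea how to resume" case.
 */
static uintptr_t ex_fixup(const struct ex_entry *table, size_t n, uintptr_t pc)
{
    const struct ex_entry *e = ex_search(table, n, pc);

    return e ? e->fixup : 0;
}
```

Only the program counter is needed for the lookup, which is why code without an entry cannot be resumed no matter what per-page metadata exists.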

Pulling slabs out of struct page

Posted Oct 10, 2021 14:57 UTC (Sun) by willy (subscriber, #9762) [Link] (1 responses)

Having the full register file doesn't matter if the store that dirtied the cache line was 10ms ago. I can't imagine how any CPU vendor would keep the register state around until the cache line moves from L3 to DRAM.

Pulling slabs out of struct page

Posted Oct 10, 2021 15:32 UTC (Sun) by luto (subscriber, #39314) [Link]

You’re assuming that the CPU will notify the OS at all when a store from L3 to DRAM fails and that the OS actually needs to do anything about it. I don’t know all the nasty details, but it may be possible (and even mandatory?) to mark the memory bad when writeback fails and deliver a fault on a subsequent read.

Pulling slabs out of struct page

Posted Oct 9, 2021 15:55 UTC (Sat) by willy (subscriber, #9762) [Link] (1 responses)

Oh, since you mentioned unspeakable things, the graphics stack does horrendous hacks, so a VMA no longer tells you anything about the page you found. It might be anon, or it might be a page that belongs to a graphics device. And now they want to do that to file mappings too.

Pulling slabs out of struct page

Posted Oct 10, 2021 11:39 UTC (Sun) by pbonzini (subscriber, #60935) [Link]

I won't deny that KVM does the unspeakable, but I think the idioms are more or less common to all users of MMU notifiers.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds