
Pulling slabs out of struct page


Posted Oct 10, 2021 14:39 UTC (Sun) by willy (subscriber, #9762)
In reply to: Pulling slabs out of struct page by luto
Parent article: Pulling slabs out of struct page

> In my dream world, the low-level memory failure / machine check code gets a virtual address and can look up a VMA or vmap area. Making this work with kmap might be interesting.

I don't think your dream world is possible. It's the same problem the page cache has with errors on writeback: the producer might not be around anymore. We might have unmapped the vmap/kmap area, and the user process that dirtied the cache line might have exited or simply been switched out.

But more importantly, unless the cache is write-through, the CPU no longer knows which virtual address(es) were used to dirty the cache line: by the time a dirty line is written back, the cache is tracking it only by its physical address.




Posted Oct 10, 2021 14:53 UTC (Sun) by luto (subscriber, #39314)

As I understand it, on Intel chips that support memory failure recovery, failed writes may not be reported at all. (At least, I’ve been told this is true for TDX-style machine checks.)

And Linux’s entry code makes quite weak guarantees about the recoverability of machine checks: we make a best (and pretty good) effort to recover from a fault in user code, and we try to recover from faults in kernel code that has exception table entries. If normal kernel code without an exception table entry hits a memory failure, forget about struct page: we may be 100% dead regardless, because we have no idea how to resume execution.

If we hit a machine check in code that has an exception handler, then we know the program counter and we have the full register file. Figuring out the failed virtual address isn’t much of a problem even if the hardware doesn’t help.


Posted Oct 10, 2021 14:57 UTC (Sun) by willy (subscriber, #9762)

Having the full register file doesn't matter if the store that dirtied the cache line happened 10 ms ago. I can't imagine any CPU vendor keeping the register state around until the cache line is written back from L3 to DRAM.


Posted Oct 10, 2021 15:32 UTC (Sun) by luto (subscriber, #39314)

You’re assuming that the CPU will notify the OS at all when a writeback from L3 to DRAM fails, and that the OS actually needs to do anything about it. I don’t know all the nasty details, but it may be possible (or even mandatory?) to mark the memory bad when writeback fails and deliver a fault on a subsequent read.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds