
CXL 2: Pooling, sharing, and I/O-memory resources

Posted May 19, 2022 18:00 UTC (Thu) by MattBBaker (guest, #28651)
Parent article: CXL 2: Pooling, sharing, and I/O-memory resources

I think a lot of people are confused about how CXL memory is going to work. Trying to munge it into memory hotplug is just going to be hopelessly broken. The problem is that the receiving system does not actually own that memory; the memory appliance does. Suppose that you have a CXL memory device with gobs of memory. The compute machines attached to it will request memory and give it back. But what happens if a $VENDOR driver pukes over kernel memory and causes a kernel panic? The exciting data that gets written back to the appliance doesn't matter so much as what the appliance does in response to the fact that the machine that is bleeding out holds some of its memory. A sane spec will have a mechanism for the appliance to say, "You are giving up this memory NOW!"

So if you want the kernel to manage this memory, there will have to be two separate concepts: memory that the machine owns and that cannot be taken away, and memory that it does not own and that is subject to being ripped away. I'm half of a mind that a better model for this is to expose remote CXL devices in /dev/ and then mmap() them like a normal 'file'.
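
To make the /dev/ plus mmap() idea concrete, here is a minimal sketch in C. It assumes the remote memory shows up as a node that supports open() and mmap(). The function names and any path passed in are hypothetical, nothing here is taken from the CXL spec or an existing driver, and a real device node might impose alignment or size constraints that this ignores.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of the node at `path` as shared, read-write memory.
 * `path` is a stand-in for a hypothetical CXL memory node under /dev/;
 * any file that supports mmap() behaves the same way for this sketch.
 * Returns the mapping, or NULL on failure.
 */
void *map_remote_memory(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}

/* Give the memory back; returns 0 on success, -1 on error. */
int unmap_remote_memory(void *p, size_t len)
{
    return munmap(p, len);
}
```

The point of this model is that the program explicitly asked for the mapping, so it knows which addresses are "borrowed" and can be written to cope with losing them, rather than having them mixed invisibly into ordinary RAM.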



CXL 2: Pooling, sharing, and I/O-memory resources

Posted May 19, 2022 22:35 UTC (Thu) by ejr (subscriber, #51652)

If memory can disappear, it doesn't remember much.

This all needs to be specified in terms of failure modes. It may be so specified; I haven't kept up.

CXL 2: Pooling, sharing, and I/O-memory resources

Posted Jun 5, 2022 8:05 UTC (Sun) by njs (subscriber, #40338)

The kernel has had at least some support for memory disappearing underneath it for a long time -- ECC RAM can say "whoops, this RAM suddenly doesn't work anymore, sorry!" and the kernel will try to recover gracefully. (E.g. it might kill the process that owns that memory but the rest of the system will keep going.)

I have no idea how well or poorly this will extend to CXL failure modes, but it's at least precedented.

