
Uses for CXL memory

Posted Apr 15, 2025 21:52 UTC (Tue) by marcH (subscriber, #57642)
In reply to: Uses for CXL memory by farnz
Parent article: Management of volatile CXL devices

> The hyperscalers love the "pool of memory that can be attached to any server in the rack" application; that's the one use case for CXL DRAM that I think is most likely to get somewhere, since...

Thanks. Then the next questions are: how far is software from this? And: is this use case big enough to sustain the whole CXL ecosystem, at least for a while? Because Small Fish Eat Big Fish, etc.

"Killer App" or just "App"? And dies young like most technologies.

> but CXL survives because it allows CPUs and GPUs/NPUs to be connected such that each device's local DRAM is visible cache-coherently to the other, making it simpler for software to get peak performance from CXL attached devices.

Mmmm... I'm afraid there could be "too much" software there! GPUs are already talking to each other using NVLink or whatever, and the relevant software frameworks already know how to manage communications without hardware-provided coherence. So, what will coherence bring to the table? Potentially better performance? Not easy to demonstrate when the raw link rate is much higher in the first place...

There's a saying that goes something like "if you have too many excuses, it's probably because none of them is good". There are many potential use cases for CXL that make sense in theory, but AFAIK there is none you can just get and leverage in production yet. We'll see.



Uses for CXL memory

Posted Apr 16, 2025 10:31 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

CXL is basically a less proprietary variant on what NVLink offers for GPU→GPU comms, and thus supports more device types (like NICs, SSDs, and memory). If CXL is dead on arrival, NVLink should also have been dead on arrival.

Instead, I expect that CXL will gradually replace PCIe as the interface of choice for GPUs, higher speed NICs, SSDs etc, since it's backwards-compatible with PCIe (so you're not cutting off part of your market by putting CXL on your device instead of PCIe), but is a net improvement if the rest of the system supports CXL. And as it's mostly the same as PCIe, it's not a significant extra cost to support CXL as well as PCIe.

And CXL memory support as needed for the hyperscaler application is there already today; this is not a case of "write software to make it happen", this is a case of "if we don't improve software, then this application is less efficient than it might be", since from the host OS's point of view, CXL memory might as well be IMC-attached DDR, just with higher latency than the IMC-attached DRAM. There's wins if software can make use of the fact that 64 GiB of RAM has lower latency than the other 192 GiB, 448 GiB or 960 GiB of RAM, but you can meet the requirement with CXL-unaware software today. In this respect, it's like NUMA; there's wins on offer if you are NUMA-aware, but you still run just fine if you're not.
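
For illustration (my sketch, not part of the comment above), here is roughly what that "NUMA-aware win" can look like from userspace with libnuma, assuming node 0 is the IMC-attached DRAM and node 1 is a CPU-less, CXL-backed node on this particular machine; CXL-unaware software simply skips this and still runs fine:

/*
 * Minimal sketch: treating CXL-backed memory as a far NUMA node with libnuma.
 * Node numbers are assumptions about one possible topology.
 *
 * Build: gcc tier_alloc.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define LOCAL_NODE 0   /* assumed IMC-attached DRAM node */
#define CXL_NODE   1   /* assumed CPU-less, CXL-backed node */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    size_t hot_sz  = 64UL << 20;   /* latency-sensitive working set */
    size_t cold_sz = 512UL << 20;  /* capacity that tolerates extra latency */

    /* Pin the hot buffer to near DRAM, park the cold buffer on the CXL node. */
    void *hot  = numa_alloc_onnode(hot_sz,  LOCAL_NODE);
    void *cold = numa_alloc_onnode(cold_sz, CXL_NODE);
    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    memset(hot, 0, hot_sz);     /* touch pages so they are actually placed */
    memset(cold, 0, cold_sz);

    printf("hot buffer on node %d, cold buffer on node %d\n",
           LOCAL_NODE, CXL_NODE);

    numa_free(hot, hot_sz);
    numa_free(cold, cold_sz);
    return 0;
}

numa_alloc_onnode() is just a thin wrapper over mmap() plus mbind(), so the same placement can be done with the raw syscalls if you prefer.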

In particular, you can support CXL memory by rebooting to add or remove it - it's a quicker version of "turn the machine off, plug in more DIMMs, turn the machine on", since you're instead doing "boot to management firmware, claim/release a CXL chunk, boot to OS". It'd be nicer if you could do that without a reboot (by hotplugging CXL memory), but that's a nice-to-have, not something needed to make this product viable.

Uses for CXL memory

Posted Apr 16, 2025 19:29 UTC (Wed) by marcH (subscriber, #57642) [Link] (1 responses)

> CXL is basically a less proprietary variant on what NVLink offers for GPU→GPU comms,

I don't know NVLink but it does not seem to offer hardware coherence. Does it?

> If CXL is dead on arrival, NVLink should also have been dead on arrival.

There are many intermediate possibilities between "dead on arrival" and "commercial success", notably: losing to the competition, "interesting idea but no thanks", "Embrace, Extend and Extinguish", etc.

> since it's backwards-compatible with PCIe

That's a big advantage, yes.

> it's not a significant extra cost to support CXL as well as PCIe.

I think it really depends on what you're looking at. From a pure hardware, CPU-development perspective, you could argue that most of the development work is done, but is it really? You only know for sure once it enters actual production, and I'm not aware of that happening yet. Moreover, "developers" tend to ignore everything outside development, notably testing and ongoing validation costs.

From a hardware _device_ perspective I'm not so sure. I guess "it depends". CXL smart NICs anyone? A lot of that stuff is obviously confidential. If CXL devices are not commercially successful, CXL support on the CPU side will "bitrot" and could die.

From a software cost perspective, this looks very far from "done": https://docs.kernel.org/driver-api/cxl/maturity-map.html

> And CXL memory support as needed for the hyperscaler application is there already today;

Is it really? Genuine question, I really don't know enough but what I see and read here and there does not give a lot of confidence. I understand there are many different use cases and this seems like the simplest one.

> In this respect, it's like NUMA; there's wins on offer if you are NUMA-aware, but you still run just fine if you're not.

Good!

Uses for CXL memory

Posted Apr 17, 2025 8:52 UTC (Thu) by farnz (subscriber, #17727) [Link]

NVLink is a brand name for multiple different (and incompatible) things. Some variants on NVLink do support cache coherency between GPUs, some don't (it depends on the generation of GPU you're using it with); the current generation does, in part because "AI" workloads need so much GPU memory that Nvidia is using NVLink to support attaching a large chunk of slightly higher latency RAM to a processing board.

And yes, CXL is basically done and ready to use if you're happy using it as "just" cache-coherent PCIe (which is what the AI accelerator world wants from it). The software stuff you've linked there is the stuff you need to do if you want to do more than cache-coherent PCIe - online reallocation of memory ownership, standardised EDAC (rather than per-board EDAC like in PCIe), multi-host support (rather than single-host), and so on. A lot of this is stuff that exists on an ad-hoc basis in various GPUs, NICs and SSDs already; the difference CXL makes is that instead of doing it differently in each driver, you're doing it in the CXL subsystem.

The specific thing that works today is booting systems with a mix of CXL and IMC memory, and rebooting to change the CXL memory configuration. That's enough for the basic hyperscaler application of "memory pool in a rack"; everything else is enhancements to make it better (e.g. being able to assign CXL memory at runtime, having shared CXL memory between two hosts in a rack and more).
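
As a rough illustration of that "works today" view (again my sketch, with the usual assumptions): on current kernels a memory-only CXL region typically shows up to the OS as a CPU-less NUMA node at a larger distance from node 0, and even CXL-unaware tooling can see it with plain libnuma:

/*
 * Minimal sketch: spotting likely CXL-backed memory from userspace.
 * "CPU-less node == CXL/far memory" is a heuristic, not a guarantee.
 *
 * Build: gcc list_nodes.c -lnuma
 */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    int max = numa_max_node();
    struct bitmask *cpus = numa_allocate_cpumask();

    for (int node = 0; node <= max; node++) {
        long long free_b;
        long long size_b = numa_node_size64(node, &free_b);
        if (size_b < 0)
            continue;          /* node not present */

        int cpuless = 0;
        if (numa_node_to_cpus(node, cpus) == 0)
            cpuless = (numa_bitmask_weight(cpus) == 0);

        printf("node %d: %lld MiB, distance from node 0: %d%s\n",
               node, size_b >> 20, numa_distance(0, node),
               cpuless ? " (CPU-less: likely CXL/far memory)" : "");
    }

    numa_free_cpumask(cpus);
    return 0;
}

numactl --hardware shows the same picture from the command line.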

