
CXL 1: Management and tiering

Posted May 14, 2022 16:01 UTC (Sat) by Paf (subscriber, #91811)
In reply to: CXL 1: Management and tiering by MattBBaker
Parent article: CXL 1: Management and tiering

As someone who’s worked in HPC and watched the shared memory machines be displaced by true clustered systems, despite the intense and remarkable engineering that went into keeping those shared memory machines coherent at scale … yeah.

So this is the thing we’re doing again this week.

That doesn’t mean it’s not worth it - those machines made a lot of sense for a while and shifting trends may make that true again - but the cost of coherency across links like these is *intense*. I wonder where and how much use this will see, what cases it will be fast enough for, etc.



CXL 1: Management and tiering

Posted May 14, 2022 20:16 UTC (Sat) by willy (subscriber, #9762) (2 responses)

I don't think we're going to see 2048-node clusters built on top of CXL. The physics just doesn't support it.

The use cases I'm seeing are:

- Memory-only devices. Sharing (and cache coherency) is handled by the CPUs that access them. Basically CXL as a replacement for the DDR bus (sketched at the end of this comment).

- GPU/similar devices. They can access memory coherently, but if you have any kind of contention between the CPU and the GPU, performance will tank. Programs are generally written to operate in phases of GPU-only and CPU-only access, but want migration handled for them.

Maybe there are other uses, but there's no getting around physics.
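
A minimal sketch of what that memory-only case looks like from userspace, assuming the CXL device shows up as a CPU-less NUMA node (node 2 here is a made-up example; check /sys/devices/system/node/ on a real system). Placement is just ordinary NUMA policy, and the kernel's tiering code can later demote or promote pages between local DRAM and the CXL pool. Build with -lnuma.

/* Sketch: bind an anonymous mapping to a hypothetical CPU-less NUMA
 * node backed by CXL-attached memory.  Node 2 is an assumption, not a
 * universal constant. */
#include <numaif.h>     /* mbind(), MPOL_BIND (libnuma headers) */
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 2      /* hypothetical CXL memory-only node */

int main(void)
{
    size_t len = 64UL << 20;            /* 64 MiB */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Restrict this range to the CXL node; hot pages may later be
     * promoted back to DRAM by the kernel's tiering code. */
    unsigned long nodemask = 1UL << CXL_NODE;
    if (mbind(buf, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0)) {
        perror("mbind");
        return 1;
    }

    memset(buf, 0, len);                /* fault the pages in on that node */
    printf("placed %zu bytes on node %d\n", len, CXL_NODE);
    munmap(buf, len);
    return 0;
}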

CXL 1: Management and tiering

Posted May 15, 2022 19:46 UTC (Sun) by Paf (subscriber, #91811) (1 response)

This makes more sense - it’s mostly a way to make the clean cases easier and well supported by hardware.

Looking it up, I’m seeing a lot of stuff about disaggregated systems, which just seems crazy. But marketing doesn’t have to match the actual intent of the main implementers.

CXL 1: Management and tiering

Posted May 15, 2022 21:58 UTC (Sun) by willy (subscriber, #9762)

Oh yes, the CXL boosters envision a future where everything becomes magically cheap. I don't believe that world will come to pass. I think the future of HPC will remain one- or two-socket CPU boxes with one or two GPUs, much more closely connected over CXL, but the scale-out interconnect isn't going away, and I doubt it will be CXL. Maybe it will; I've been wrong before.

I have no faith in disaggregated systems. You want your memory close to your [CG]PU. If you upgrade your [CG]PU, you normally want to upgrade your interconnect and memory at the same time. The only way this makes sense is if we've actually hit a limit in bandwidth and latency, and that doesn't seem to have happened yet, despite the sluggish adoption of PCIe4.

The people who claim "oh, you want a big pool of memory on the fabric behind a switch, connected to lots of front-end systems" have simply not thought about the reality of firmware updates on the switch or the memory controller. How do you schedule downtime for the N customers using the [CG]PUs? "Tuesday Mornings 7-9am Are Network Maintenance Windows" just isn't a thing any more.

