kmalloc

Posted Apr 11, 2025 3:20 UTC (Fri) by willy (subscriber, #9762)
In reply to: kmalloc by gmprice
Parent article: Management of volatile CXL devices

Remember that I've been through this stupidity once before already with 3dxp. You're not saying anything new.

And yes, the notion absolutely was that "enlightening" the interpreter / runtime for managed languages was a viable approach. I mean ... maybe? But we're back to the question of "why would I deliberately choose to use slow memory".

What I do see as a realistic workload is "This is a cheap VM that doesn't get to use DRAM". So that's a question of enlightening the hypervisor to give CXL memory to the guest.

HBM is a distraction. That's never been available for CPUs in large quantities; it's always been used as L4 cache. I see it's available in real quantities for GPUs now, but I'm not seeing it on any CPU roadmap.

And, yes, I do want the technology to die. It's the same mistake as ATM and InfiniBand.

kmalloc

Posted Apr 11, 2025 6:37 UTC (Fri) by gmprice (subscriber, #167884)

Framing the discussion as "why would I deliberately use slower memory" is quite disingenuous considering that zswap, zram, and swap all exist.

The answer is pretty clear: to avoid having to use all of those things.

There is maybe a good argument to formalize a zswap backend allocator that can consume dax-memory, and hide that from the page allocator - but there is still clear value in just exposing it as a page. It's literally DDR behind a controller, not some pseudo-nand or page-fetch interconnect.

kmalloc

Posted Apr 11, 2025 18:29 UTC (Fri) by willy (subscriber, #9762)

> It's literally DDR behind a controller, not some pseudo-nand or page-fetch interconnect.

That's actually worse than the 3dxp business case, because in the future where 3dxp had worked, the argument was that it came in VAST quantities. If I remember correctly, the pitch at the time was a 2-socket machine with 128GB of DRAM, 8TB of 3dxp and some amount of NVMe storage.

So you might reasonably want to say "Hey, give me 4TB of slow memory" because you'd designed your algorithm to be tolerant of that latency.

And then 3dxp also had the persistence argument for it; maybe there really was a use-case for storage-presented-as-memory. Again, CXL as envisaged by you doesn't provide that either. It's just DRAM behind a slow interconnect.