
kmalloc

Posted Apr 12, 2025 2:59 UTC (Sat) by hnaz (subscriber, #67104)
In reply to: kmalloc by willy
Parent article: Management of volatile CXL devices

> What economic driver will cause consumers to want to buy slower computers?

I would call it the "warm set problem". Not all data that a workload references needs the same access speed, and there is more and more data between the hottest and the coldest bits for which storage is too slow, but first-class RAM is overkill and too expensive in terms of capex and power. Compression is a pretty great solution for this, which is why it's used by pretty much every phone, laptop and tablet currently sold, and widely used by hyperscalers. It's kind of insane how far you can stretch a few gigabytes of RAM with compression, for workloads that would otherwise be dog slow or routinely OOM with nothing in between RAM and storage.
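(For reference, zswap is the in-kernel version of this idea. On a kernel built with CONFIG_ZSWAP it can be toggled and tuned at runtime through module parameters; a minimal sketch, assuming the chosen compressor is compiled in:)

```shell
# Enable the compressed swap cache at runtime
echo 1 > /sys/module/zswap/parameters/enabled

# Pick a compressor (lzo/lz4/zstd, depending on kernel config)
echo zstd > /sys/module/zswap/parameters/compressor

# Cap the compressed pool at 20% of RAM (the default)
echo 20 > /sys/module/zswap/parameters/max_pool_percent
```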

But compression is still using first-class DRAM, specced, clocked and powered for serving the hottest pages to the higher-level caches.

And folks who are worried about CXL access latencies likely won't be excited about the cycles spent on faults and decompression.

I'm not a fan of dumping the placement problem on the OS/userspace. This may kill it before any sort of market for second-tier DIMMs can establish itself. And I'm saying all this despite having been critical of CXL adoption at my company, based on the (current) implications for the software stack, the longevity of that design, and the uncertainty around sourcing hardware for that role long-term.

But I can definitely see how it's very attractive to provision less of the high-performance memory for that part of the data that doesn't really need it. Or would only need it if the alternative is compression or storage, which would waste or strand much more CPU. That's just a memory/cache hierarchy thing: the shittier level L is, the more you need of level L-1 to keep the CPU busy.

So I don't quite understand the argument to restrict it to certain types of memory, like zswap. If the latency is good enough for access AND decompression, why wouldn't it be good enough for access alone? A somewhat slower page cache seems better than the, what, thousand-fold cost of a miss. It's not just about the performance gap to first-class DRAM, but also about the gap to the next best thing.
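(The "gap to the next best thing" point is just expected-value arithmetic. A sketch with made-up but plausible latencies, purely illustrative, not measurements: a bigger-but-slower tier that lifts the hit ratio beats a smaller fast one, because the miss cost dominates.)

```python
# Back-of-envelope expected access cost. All latencies below are
# assumptions for illustration, not benchmarks.
DRAM_NS = 100        # first-class DRAM access
CXL_NS = 300         # hypothetical second-tier/CXL access
MISS_NS = 100_000    # miss serviced from fast storage

def expected_cost(hit_ratio, hit_ns, miss_ns=MISS_NS):
    """Average access latency given a hit ratio in the memory tier."""
    return hit_ratio * hit_ns + (1 - hit_ratio) * miss_ns

small_fast = expected_cost(0.90, DRAM_NS)  # 90% of accesses hit DRAM
big_slow = expected_cost(0.99, CXL_NS)     # 99% hit the larger, slower tier
print(small_fast, big_slow)                # the slower-but-bigger tier wins
```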



Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds