kmalloc

Posted Apr 11, 2025 19:45 UTC (Fri) by jedix (subscriber, #116933)
In reply to: kmalloc by gmprice
Parent article: Management of volatile CXL devices

Right now, the biggest bottleneck is getting things into memory. Superscalar execution has shifted what is expensive, time-wise. Your benchmark shows that DRAM is 50% faster than CXL. This will cause more CPU stalls.
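As a rough back-of-the-envelope illustration of the stall argument, the sketch below blends the two latencies by the share of accesses that land on the slower tier. The 100 ns DRAM figure and the reading of "50% faster" as "CXL costs 1.5x DRAM" are assumptions for illustration only, not numbers taken from the benchmark, and `blended_latency_ns` is a hypothetical helper.

```python
def blended_latency_ns(dram_ns, cxl_ratio, cxl_fraction):
    """Average access latency when a fraction of accesses hit the CXL tier.

    dram_ns      -- assumed DRAM access latency in nanoseconds (hypothetical)
    cxl_ratio    -- assumed CXL latency as a multiple of DRAM latency
    cxl_fraction -- share of accesses served from CXL, between 0.0 and 1.0
    """
    if not 0.0 <= cxl_fraction <= 1.0:
        raise ValueError("cxl_fraction must be between 0.0 and 1.0")
    cxl_ns = dram_ns * cxl_ratio
    return (1.0 - cxl_fraction) * dram_ns + cxl_fraction * cxl_ns


if __name__ == "__main__":
    # Illustrative values only: 100 ns DRAM, CXL at 1.5x that latency.
    for fraction in (0.0, 0.1, 0.25, 0.5):
        avg = blended_latency_ns(100.0, 1.5, fraction)
        print(f"{fraction:.0%} of accesses on CXL: {avg:.1f} ns average")
```

The point of the model is only that the average cost grows linearly with the share of accesses served from the slower tier; it says nothing about tail latency.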

Trying to 'not miss the boat' is like predicting the future: you don't know what's going to be needed in memory, otherwise we'd not need CXL at all. If you could solve the issue of knowing what needs to be fast vs slow, then you could just page the slow stuff out and move on. So, I think we either make the (z)swap approach work or spend the money on DRAM and not developer effort.

Look at your numbers, it's a solid win against disk. I don't think that's your point, but the tail is a _lot_ better. And that is using the swap code as it is today.

I mean, reusing your old RAM is still stupid because you will spend the money you saved on DRAM on an army of people walking the data center floor replacing dead DIMMs (and probably killing some servers by accident while they go).

But let's say you are making new shiny tech from 2007 (DDR3) or 2014 (DDR4) to use in your 2025 servers, then you can have faster zswap. If you don't feel old enough yet (I do..): DDR3 is old enough to have voted in the last election. Although DDR4 is newer, it came out 11 years ago, the same year Shake It Off topped the charts, so.. not yesterday (but, haters gonna hate).

The thing that is really starting to bother me is the amount of time we are spending trying to solve a self-made issue and burning good developers on it.

kmalloc

Posted Apr 11, 2025 19:58 UTC (Fri) by gmprice (subscriber, #167884)

> And that is using the swap code as it is today.

It's actually not. The pages are pages. The tail latency is avoided precisely because the page is still present and mapped, it's just been demoted to CXL instead of removed from the page tables.

So it's "zswap-ish", because reclaim migrates the page somewhere else - but in this case that somewhere else is another plain old page on another node.

If you enable zswap on top of this and give it the same size as your CXL tier, you'll end up with CXL consumed like zswap as you suggested - but the tail latencies will be higher (maybe not multi-millisecond, but I haven't tested this yet).
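For anyone wanting to check which of the two mechanisms discussed here is switched on for a given machine, a minimal Python sketch follows. It assumes the kernel exposes reclaim-time demotion at `/sys/kernel/mm/numa/demotion_enabled` and the zswap switch at `/sys/module/zswap/parameters/enabled`; the helper names and the `root` parameter (there only so the function can be pointed at a fake tree) are hypothetical. It reports the knob states only and makes no claim about how the two interact.

```python
from pathlib import Path

# Paths relative to the filesystem root; assumed to exist on kernels
# built with memory tiering and zswap support.
DEMOTION_KNOB = "sys/kernel/mm/numa/demotion_enabled"
ZSWAP_KNOB = "sys/module/zswap/parameters/enabled"


def read_bool_knob(root, relative_path):
    """Return True/False for a boolean knob, or None if missing/unparsable."""
    try:
        text = (Path(root) / relative_path).read_text().strip()
    except OSError:
        return None
    if text in ("1", "Y", "y", "true"):
        return True
    if text in ("0", "N", "n", "false"):
        return False
    return None


def reclaim_knobs(root="/"):
    """Collect the state of the demotion and zswap switches under root."""
    return {
        "demotion": read_bool_knob(root, DEMOTION_KNOB),
        "zswap": read_bool_knob(root, ZSWAP_KNOB),
    }


if __name__ == "__main__":
    for name, state in reclaim_knobs().items():
        label = {True: "enabled", False: "disabled", None: "unavailable"}[state]
        print(f"{name}: {label}")
```

A value of `None` means the file could not be read, which may simply indicate a kernel built without that feature.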