I don't know how common that really is. For things allocated with significant frequency, I think they should be fairly cache hot at free time (because if they are allocated frequently, they shouldn't have long lifespans).
The exception is caches, which are reclaimed when memory gets low or a watermark is reached (e.g. the inode cache, the dentry cache, etc.). However, these objects still need to be found in order to be reclaimed, usually via an LRU list, so the object gets hot when it's found and taken off the list.
OK, you could move the refcount and the LRU list into another area... but the other problem with that is that with cached objects you expect to have multiple lookups over the lifetime of the object. And if your lookups have to increment a disjoint refcount object, then you increase your cacheline footprint per lookup by effectively an entire line per object. So you trade slower lookups for faster reclaim, which could easily be a bad tradeoff if the cache is effective (which the dcache is, 99% of the time).
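To make the footprint argument concrete, here is a rough userspace sketch, not kernel code. The struct names, the 64-byte line size, and the `lines_touched` helper are all made up for illustration. It just counts how many distinct cache lines a lookup touches when it compares a key and then bumps a refcount, with the refcount either embedded in the object or kept in a separate array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size; real hardware varies */

/* Layout A (hypothetical): refcount sits in the object, next to the key. */
struct obj_embedded {
    uint64_t key;
    uint32_t refcount;
    char     payload[44];
};

/* Layout B (hypothetical): the object only holds an index into a
 * separate refcount array that lives somewhere else in memory. */
struct obj_split {
    uint64_t key;
    uint32_t id;
    char     payload[44];
};

/* Both layouts fit in one assumed cache line. */
static_assert(sizeof(struct obj_embedded) <= CACHE_LINE, "object spans lines");
static_assert(sizeof(struct obj_split) <= CACHE_LINE, "object spans lines");

/* Cache line number that an address falls into. */
static uintptr_t line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE;
}

/* Distinct lines touched by a lookup that reads the key and then
 * increments the refcount: 1 if they share a line, otherwise 2. */
static int lines_touched(const void *key_addr, const void *refcount_addr)
{
    return line_of(key_addr) == line_of(refcount_addr) ? 1 : 2;
}
```

With a line-aligned `struct obj_embedded`, `lines_touched(&o.key, &o.refcount)` comes out as 1. With the split layout and a refcount array in its own line, every lookup touches 2 lines, which is the extra line per object described above. Whether that cost matters in practice would still have to be measured.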
Posted Dec 18, 2008 21:37 UTC (Thu) by ncm (subscriber, #165)
This is the danger of armchair coding; you're probably right.
An optimized refcount scheme may take only a few bits per object, for most objects, so I was thinking one cache line might hold refcounts for a hundred objects. Also, a dirty cache line is way more expensive than a clean one (because it must be written back, and isn't a candidate for replacement until that's done), so I meant to concentrate the refcount churn and segregate it from the (commonly) mostly-unchanging objects. This helps most where you have some semblance of locality. Unfortunately, things like inode caches don't have that.
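A minimal sketch of what such packed refcounts might look like, purely as an illustration. The 4-bit width, the 64-byte line, and every name here are assumptions, not anything from an existing kernel. It is also not atomic; a real version would need atomic operations or a lock, plus some way to spill counts that outgrow their few bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE    64                               /* assumed line size */
#define BITS_PER_REF  4                                /* assumed counter width */
#define REFS_PER_LINE (CACHE_LINE * 8 / BITS_PER_REF)  /* 128 counters per line */
#define REF_MAX       ((1u << BITS_PER_REF) - 1)       /* largest storable count: 15 */

/* One cache line holding 128 packed 4-bit counters, two per byte. */
struct ref_line {
    _Alignas(CACHE_LINE) uint8_t nibbles[CACHE_LINE];
};

/* Read counter idx: even indices use the low nibble, odd the high one. */
static unsigned ref_get(const struct ref_line *l, unsigned idx)
{
    uint8_t byte = l->nibbles[idx / 2];
    return (idx & 1) ? (unsigned)(byte >> 4) : (unsigned)(byte & 0x0f);
}

/* Store val (0..REF_MAX) into counter idx, leaving its neighbour intact. */
static void ref_set(struct ref_line *l, unsigned idx, unsigned val)
{
    uint8_t *byte = &l->nibbles[idx / 2];
    if (idx & 1)
        *byte = (uint8_t)((*byte & 0x0f) | ((val & 0x0f) << 4));
    else
        *byte = (uint8_t)((*byte & 0xf0) | (val & 0x0f));
}

/* Take a reference. Returns false when the counter is already at
 * REF_MAX; the caller would then need a wider fallback counter. */
static bool ref_inc(struct ref_line *l, unsigned idx)
{
    unsigned v = ref_get(l, idx);
    if (v == REF_MAX)
        return false;
    ref_set(l, idx, v + 1);
    return true;
}

/* Drop a reference (the count must be nonzero). Returns true when the
 * count reaches zero, i.e. the object could now be freed. */
static bool ref_dec(struct ref_line *l, unsigned idx)
{
    unsigned v = ref_get(l, idx) - 1;
    ref_set(l, idx, v);
    return v == 0;
}
```

All the refcount writes for those 128 objects dirty this one line, and the objects themselves stay clean. That only pays off if the objects sharing a line tend to be touched together.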
As always, there's no substitute for measuring, but you have to build it first.