I wonder, though, about the choice to hand all objects back to the CPU that originally allocated them. In particular, suppose we have a large collection of objects freed by CPU A long after they have passed out of B's cache -- e.g., allocated on behalf of a process that has migrated to A. It might be objectively better for CPU B to work on their metadata while it remains in B's cache. Maybe it's better to re-map a fully freed page to an address range that B manages; if we know that CPU A won't be touching that page anyway, it may be unmapped from A's page map at leisure.
I wonder, too, about all these lists. Are the list links adjacent to the actual storage of the objects? Often when an object is freed it has long since passed out of the cache, and touching a list link would bring it back in. A way to avoid touching the actual object is for the metadata to be obtainable by address arithmetic, e.g. masking off the low bits to obtain the address of a page header. Some extra structure helps -- a segment of adjacent pages on (say) a megabyte or gigabyte boundary can share a common header and an array of page headers, so that the containing pages need not be touched, either, and the metadata concentrated in a few busy cache lines. Segment headers pingponging between CPUs would be a bad thing, but each segment might reserve a few cache lines for annotations by other CPUs.
A design appropriate for hundreds of CPUs and hundreds of GB is very different from one for a single CPU and under a GB. It's not clear that any existing SL*B is right for either case, or that any one can be for all. Plenty of room remains for improvement in all regimes, so consolidation seems way premature.
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds