Maybe it's a mistake to store the refcounts in the objects themselves. Very often an object is allocated and initialized and then never modified again, except for its refcount, and perhaps never even looked at again until it is freed when the owning process dies. If the allocator provided centralized storage for refcounts, they could be moved out of the objects, so that a refcount update would no longer dirty a cache line that otherwise holds read-only data. Centralizing refcounting could have other benefits too, such as letting optimizations for small counts and for small fluctuations in counts be hidden behind the allocator's interface.
Integrating refcounting would change the SLAB interface, but generic refcounting services could be added to all the allocators, providing another way for them to differentiate themselves.
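To make the idea concrete, here is a rough userspace sketch of what "refcounts stored by the allocator" might look like. It is not kernel code and not any existing SLAB API; the names rc_pool, rc_pool_alloc, rc_get and rc_put are made up for illustration, and the linear scan for a free slot is only there to keep the example short. The point is just that the counts sit in their own array, so taking or dropping a reference never writes to the object's memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy pool: objects and refcounts live in separate arrays. */
struct rc_pool {
    unsigned char *objs;     /* nobjs slots of obj_size bytes each */
    atomic_uint   *refs;     /* refs[i] = references to slot i; 0 means free */
    size_t         obj_size;
    size_t         nobjs;
};

static int rc_pool_init(struct rc_pool *p, size_t obj_size, size_t nobjs)
{
    if (obj_size == 0 || nobjs == 0)
        return -1;
    p->objs = calloc(nobjs, obj_size);
    p->refs = malloc(nobjs * sizeof(*p->refs));
    if (!p->objs || !p->refs) {
        free(p->objs);
        free(p->refs);
        return -1;
    }
    for (size_t i = 0; i < nobjs; i++)
        atomic_init(&p->refs[i], 0);
    p->obj_size = obj_size;
    p->nobjs = nobjs;
    return 0;
}

static void rc_pool_destroy(struct rc_pool *p)
{
    free(p->objs);
    free(p->refs);
}

/* Map an object pointer back to its slot index in the side table. */
static size_t rc_index(const struct rc_pool *p, const void *obj)
{
    size_t idx = (size_t)((const unsigned char *)obj - p->objs) / p->obj_size;
    assert(idx < p->nobjs);
    return idx;
}

/* Claim the first free slot; the new object starts with one reference. */
static void *rc_pool_alloc(struct rc_pool *p)
{
    for (size_t i = 0; i < p->nobjs; i++) {
        unsigned expected = 0;
        if (atomic_compare_exchange_strong(&p->refs[i], &expected, 1))
            return p->objs + i * p->obj_size;
    }
    return NULL; /* pool exhausted */
}

/* Take a reference: only the side table is written, not the object. */
static void rc_get(struct rc_pool *p, const void *obj)
{
    atomic_fetch_add(&p->refs[rc_index(p, obj)], 1);
}

/* Drop a reference; returns 1 if this was the last one and the slot is free. */
static int rc_put(struct rc_pool *p, const void *obj)
{
    unsigned old = atomic_fetch_sub(&p->refs[rc_index(p, obj)], 1);
    assert(old > 0);
    return old == 1;
}
```

A caller would do rc_pool_init(), then rc_pool_alloc() to get an object holding one reference, and pair each rc_get() with an rc_put(); the slot becomes reusable when rc_put() returns 1. Note that rc_get() and rc_put() take const pointers to the object, which is exactly the property being argued for: the object itself can stay read-only after initialization. A real implementation would also need a cheap way to get from an object to its counter, and could pack small counts more tightly than one word per object.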