See my comment above -- effectively we do allocate from rlist if the object is from the same node. Actually, what really happens is that objects from the same node but a different CPU are freed straight onto our freelist rather than our rlist -- they only get sent to rlist when our freelist is trimmed. So it's exactly as you suggest.
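To make the free path concrete, here is a rough toy model in C. It is only a sketch of the behaviour described above, not the actual allocator code: the struct and function names (cpu_cache, cache_free, cache_trim and so on) are all made up for illustration, and the trim policy (move objects owned by another CPU to rlist, count locally owned ones as released) is my assumption about what "trimmed" means here.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names come from the real allocator. */
struct object {
    struct object *next;
    int owner_cpu;              /* CPU whose slab the object came from */
    int node;                   /* NUMA node of that slab */
};

struct list {
    struct object *head;
    size_t nr;
};

struct cpu_cache {
    int cpu;
    int node;
    struct list freelist;       /* objects this CPU allocates from */
    struct list rlist;          /* objects waiting to go back to a remote owner */
};

void list_push(struct list *l, struct object *o)
{
    o->next = l->head;
    l->head = o;
    l->nr++;
}

struct object *list_pop(struct list *l)
{
    struct object *o = l->head;

    if (o) {
        l->head = o->next;
        o->next = NULL;
        l->nr--;
    }
    return o;
}

/* Free: same-node objects go straight onto the freelist, whichever CPU owns them. */
void cache_free(struct cpu_cache *c, struct object *o)
{
    if (o->node == c->node)
        list_push(&c->freelist, o);
    else
        list_push(&c->rlist, o);
}

/* Alloc: served from the freelist, so same-node remote objects get reused. */
struct object *cache_alloc(struct cpu_cache *c)
{
    return list_pop(&c->freelist);
}

/*
 * Trim (assumed policy): shrink the freelist to `keep` objects. Objects owned
 * by another CPU move to rlist; the return value counts locally owned objects
 * that a real allocator would hand back to its own slabs.
 */
size_t cache_trim(struct cpu_cache *c, size_t keep)
{
    size_t released = 0;

    while (c->freelist.nr > keep) {
        struct object *o = list_pop(&c->freelist);

        if (o->owner_cpu != c->cpu)
            list_push(&c->rlist, o);
        else
            released++;
    }
    return released;
}
```

In this model an object owned by CPU 1 on our node can be freed and reallocated by us any number of times without touching rlist; it only ends up there if cache_trim() happens to pop it.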
The issue of cleaning up rlist is interesting. There are so many ways this can be done and it is about the most difficult part of a slab allocator... No, any CPU can be cleaning its rlist at any time, and yes they might all point to a single remote CPU. That's quite unlikely and the critical section is very short, so hopefully it won't be a problem. But I don't claim to know what the best way to do it is.
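One way to keep that critical section short, sketched in C below, is to sort the rlist into per-owner chains without any lock held and then splice each chain onto the owner's incoming list in O(1) under the lock. This is just one of the many possible schemes, not a claim about what the allocator does: remote_inbox, rlist_flush, NR_CPUS and the per-owner spinlock (done here with a C11 atomic_flag) are all assumptions for the sake of the example.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4               /* hypothetical, small for illustration */

struct remote_obj {
    struct remote_obj *next;
    int owner_cpu;              /* CPU the object must be returned to */
};

struct chain {
    struct remote_obj *head, *tail;
    size_t nr;
};

/* Per-CPU list of objects handed back by other CPUs (assumed design). */
struct remote_inbox {
    atomic_flag lock;
    struct chain objs;
};

void chain_add(struct chain *c, struct remote_obj *o)
{
    o->next = NULL;
    if (c->tail)
        c->tail->next = o;
    else
        c->head = o;
    c->tail = o;
    c->nr++;
}

void inbox_init(struct remote_inbox *in)
{
    in->objs = (struct chain){ 0 };
    atomic_flag_clear(&in->lock);
}

/* The only locked region: a constant-time splice of a pre-built chain. */
void inbox_splice(struct remote_inbox *in, struct chain *c)
{
    if (!c->head)
        return;
    while (atomic_flag_test_and_set_explicit(&in->lock, memory_order_acquire))
        ;                       /* spin; several CPUs may target one owner */
    if (in->objs.tail)
        in->objs.tail->next = c->head;
    else
        in->objs.head = c->head;
    in->objs.tail = c->tail;
    in->objs.nr += c->nr;
    atomic_flag_clear_explicit(&in->lock, memory_order_release);
}

/* Flush our rlist: group by owner with no lock held, then splice per owner. */
void rlist_flush(struct chain *rlist, struct remote_inbox inboxes[NR_CPUS])
{
    struct chain per_owner[NR_CPUS] = { 0 };
    struct remote_obj *o = rlist->head, *next;

    for (; o; o = next) {
        next = o->next;
        chain_add(&per_owner[o->owner_cpu], o);
    }
    *rlist = (struct chain){ 0 };

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        inbox_splice(&inboxes[cpu], &per_owner[cpu]);
}
```

Even if every CPU flushes towards the same owner at once, each of them holds that owner's lock only for a few pointer updates, which is why contention would hopefully stay small.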
I am definitely interested in very large numbers of CPUs... so I'm hoping to be as good as or better than the other allocators here, but we'll see.