I think it's a pretty neat/clean design. Though it probably has more overhead on very large number of CPUs system than SLUB. But it's a long time ago that I read about SLUB internals to be really sure.
The big question is:
Why not allocate from 'rlist' when you're done with your own freelist? We don't have to update the metadata so it's a very cheap operation.
Pherhaps it would even be better to put items on 'rlist' to also be put on the 'freelist', so we simply allocate the least recently used item first (probably cache-hot).
Sure the remote item might be bounced back to the other CPU, but clearly the code using it doesn't seem to mind which CPU last used it and with the LRU logic it's just as likely still in our CPU cache. And if it did bounce back in te meanwhile it means we are probably dealing with a slab cache that's not that hard used (or it's usage should be optimized and this would be true for all slab allocaters).
And without having looked at the code I'd assume that only one CPU at a time is cleaning up it's 'rlist' and that usage of the 'remote_free' lock is only a "best effort" locking scheme (try_lock()-ish) because maintenance should really be a background task with minimal overhead. Though this might have problems I overlook (or is already done that way).
Just some thoughts I had reading this text, nothing really thoroughly thought through though.