It's appalling that the kernel still uses RB trees in time-critical spots. We were taught in school that (as Risible Hans used to repeat) "Trees are fast!". But on modern hardware, trees have really terrible performance characteristics. They foul caches like nothing else. A re-balancing operation dirties an unconscionable number of cache lines. (For a quick, painless education, google up the explanatory material around Judy trees.) Database designers ended up with what are we used to call "modified B* trees" for reasons that uncannily match our present working-set problems: L1 cache is a lot like core, and RAM behind an oversubscribed bus is a lot like drum, if you squint.
The lesson for us is that to improve lookup performance in the kernel, the first step is to abstract away artifacts that imply a tree: in particular, that it is lousy with pointers. Then, we can drop in different, less binary-tree-ish ordered-map implementations without need to savage all of its user subsystems each time it's improved. Eventually we might settle on a page-aware (or page-oblivious) structure that can be fast on real silicon.
After that, we can begin to relieve abstraction overhead, one user subsystem at a time. Since it's not unusual to spend 60% of map lookup time in pipeline-trashing function-pointer dispatch overhead, getting free of function pointers will be an important step.
Some subsystems might do best with a Stone Age packed-heap structure, instead.