There are indeed a number of high-performance garbage collectors, even real-time garbage collectors. However, even these impose significant overhead: the slowdown is not eliminated, but rather spread out in bite-sized chunks over time. Alternatively, high-priority tasks are permitted to preempt the garbage collector, which has its own set of advantages and disadvantages. The key point is that KMAs can be constructed without requiring garbage collection in the common case.
Less-common cases where KMA garbage collection can be helpful include: (1) when the system is flat out of memory, reclaiming blocks from the per-CPU caches lets it creep along a bit farther, perhaps giving the OOM killer a chance to do its job, (2) when shipping blocks from one NUMA node to another upon kfree(), one might use a timeout to avoid leaving blocks pending for KMAs that care about segregating memory on a NUMA basis (and DYNIX/ptx needed to care, given the huge remote latencies of the NUMA-Q hardware), (3) where real-time response is required, high-priority processes might just dump memory onto a per-CPU list and let low-priority processes clean up after them, and (4) if the Linux kernel ever supported real garbage collection. I don't expect this last case any time soon, especially given that a recent attempt to write a kernel in a garbage-collected language encountered severe GC-related overhead (http://web.cecs.pdx.edu/~mpj/pubs/plos07.html). :-)
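
To make cases (1) and (3) a bit more concrete, here is a minimal user-space sketch in C of per-CPU free lists with a separate drain step. It is an illustration only, not code from DYNIX/ptx or Linux: every name in it (NR_CPUS as used here, struct percpu_cache, cache_free(), cache_alloc(), drain_caches()) is hypothetical, and it is single-threaded, so it omits the locking or interrupt/preemption disabling that a real KMA would need.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4               /* hypothetical CPU count for this sketch */

struct block {
    struct block *next;         /* link used while the block is free */
};

struct percpu_cache {
    struct block *head;         /* singly linked list of free blocks */
    size_t count;               /* number of blocks on the list */
};

static struct percpu_cache caches[NR_CPUS];
static struct block *global_pool;   /* shared pool, refilled only by draining */
static size_t global_count;

/* Fast path: push the block onto the given CPU's list.  No scanning and
 * no collection; a high-priority task can "dump and run" (case 3). */
static void cache_free(int cpu, struct block *b)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    b->next = caches[cpu].head;
    caches[cpu].head = b;
    caches[cpu].count++;
}

/* Fast path: pop from this CPU's list, falling back to the global pool.
 * Returns NULL if both are empty, even if other CPUs hold free blocks. */
static struct block *cache_alloc(int cpu)
{
    struct block *b;

    assert(cpu >= 0 && cpu < NR_CPUS);
    b = caches[cpu].head;
    if (b) {
        caches[cpu].head = b->next;
        caches[cpu].count--;
        return b;
    }
    b = global_pool;
    if (b) {
        global_pool = b->next;
        global_count--;
    }
    return b;
}

/* Slow path: move every cached block to the global pool.  This is the
 * "garbage collection" of case 1, run only when memory is exhausted
 * (or by a low-priority task cleaning up, as in case 3).
 * Returns the number of blocks moved. */
static size_t drain_caches(void)
{
    size_t moved = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        struct block *b;

        while ((b = caches[cpu].head) != NULL) {
            caches[cpu].head = b->next;
            b->next = global_pool;
            global_pool = b;
            moved++;
        }
        caches[cpu].count = 0;
    }
    global_count += moved;
    return moved;
}
```

In this sketch an allocation on a CPU whose own list and the global pool are both empty fails even though other CPUs hold free blocks; calling drain_caches() and retrying is the out-of-memory fallback. The common-case alloc and free paths never call it.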