the problem is that by that logic, just about everything should be in the kernel.
I really believe that this is premature optimization, driven by micro-benchmarks rather than real workloads.
to restate an example I've posted before: several years ago at $work, the engineers ran a benchmark that showed a 60x performance improvement (from 30 seconds to 0.5 seconds) for a read_config() function by creating a shared memory segment and populating it, rather than reading from disk on every call.
the problem is that this benchmark was for 100,000 reads of the config file, while real-world usage showed the function executing about 50,000 times per hour across >50 machines. at 30 seconds per 100,000 reads, each call costs roughly 0.3 ms, so 50,000 calls per hour adds up to about 15 CPU seconds. that 6-month engineering project saved a total of 15 CPU seconds in a peak hour, spread across 200 cores.
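for concreteness, the change was roughly this shape (just a sketch, not the actual code; the segment name, the fixed size, the config path, and the use of POSIX shm_open/mmap are my assumptions):

    /* sketch: read the config from disk on every call vs. map a shared
       memory segment that some other process has already populated.
       link with -lrt on older glibc. */

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define CFG_SHM  "/app_config"   /* hypothetical segment name */
    #define CFG_SIZE 4096            /* hypothetical fixed size   */

    /* the "slow" version: hit the disk on every call */
    static char *read_config_from_disk(const char *path)
    {
        FILE *f = fopen(path, "r");
        if (!f)
            return NULL;
        char *buf = malloc(CFG_SIZE);
        if (!buf) {
            fclose(f);
            return NULL;
        }
        size_t n = fread(buf, 1, CFG_SIZE - 1, f);
        buf[n] = '\0';
        fclose(f);
        return buf;
    }

    /* the "fast" version: map a segment another process already filled */
    static const char *read_config_from_shm(void)
    {
        int fd = shm_open(CFG_SHM, O_RDONLY, 0);
        if (fd < 0)
            return NULL;
        const char *cfg = mmap(NULL, CFG_SIZE, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);
        return cfg == MAP_FAILED ? NULL : cfg;
    }

    int main(void)
    {
        const char *cfg = read_config_from_shm();
        if (!cfg)  /* fall back if nothing has been published yet */
            cfg = read_config_from_disk("/etc/app.conf");  /* made-up path */
        printf("%.60s\n", cfg ? cfg : "(no config)");
        return 0;
    }

and yes, the mmap version really is much faster per call; the point is that "per call savings" times "how often it's actually called" was tiny.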
sometimes "it's faster", even by a very large multiplier, just doesn't matter, because the function isn't called frequently enough.
I have some firewalls that run 'inefficient' fork-for-every-connection proxies (horribly inefficient, right?), but on a peak day these systems (on low-end hardware) still don't hit a load average of 0.2, so the inefficiency really doesn't matter.
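for anyone who hasn't seen one, the whole "horribly inefficient" pattern is basically just this (a sketch, not my actual proxy; the port and the child's work are placeholders):

    /* sketch of a fork-per-connection accept loop: one fork() per client,
       which is exactly the pattern that's supposed to be too slow. */

    #include <netinet/in.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void handle_client(int client)
    {
        /* real proxy work (connect to a backend, shuttle bytes) goes here */
        const char msg[] = "hello from a forked child\n";
        (void)write(client, msg, sizeof msg - 1);
        close(client);
    }

    int main(void)
    {
        signal(SIGCHLD, SIG_IGN);  /* don't accumulate zombie children */

        int listener = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);  /* hypothetical port */

        if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            listen(listener, 16) < 0) {
            perror("bind/listen");
            return 1;
        }

        for (;;) {
            int client = accept(listener, NULL, NULL);
            if (client < 0)
                continue;
            if (fork() == 0) {         /* child handles one connection */
                close(listener);
                handle_client(client);
                _exit(0);
            }
            close(client);             /* parent keeps accepting */
        }
    }

every connection pays for a fork(), which is exactly the overhead the 'efficient' designs avoid, but at the connection rates these boxes actually see it never shows up in the load average.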
application logging just isn't that heavy, even on a 'slow' cell phone processor.