"and after you take into account price/performance, most hardware crypto [accelerators] have marginal performance benefits; in fact, more often than not, it's a lose."
I'd mostly agree with that, personally - but there's a very big "except".
The VIA C3, VIA C7, and newer Intel Xeon processors have on-CPU hardware crypto acceleration for a variety of algorithms. My 400MHz C3 thin clients at work dramatically outperform the 2.2GHz Intel Core 2 Duo machines on many crypto tasks.
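If you want to check whether a given Linux box has this kind of on-CPU acceleration, a rough sketch along these lines should do. It assumes the usual /proc/cpuinfo flag names: "aes" for Intel's AES-NI, and the "*_en" flags (ace_en, ace2_en, phe_en, pmm_en, rng_en) for VIA PadLock units that are present and enabled. The function name crypto_flags is just made up for illustration.

```python
# Sketch: list the crypto-related CPU flags Linux advertises in /proc/cpuinfo.
# Assumption: flag names follow the kernel's usual x86 naming
# ("aes" for AES-NI; "*_en" for enabled VIA PadLock units).
AES_NI_FLAG = "aes"
PADLOCK_FLAGS = {"ace_en", "ace2_en", "phe_en", "pmm_en", "rng_en"}


def crypto_flags(cpuinfo_text):
    """Return the set of known crypto flags found in cpuinfo-style text."""
    wanted = PADLOCK_FLAGS | {AES_NI_FLAG}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines matter; there is one per logical CPU.
        if sep and key.strip() == "flags":
            found |= set(value.split()) & wanted
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sorted(crypto_flags(f.read())) or "no crypto flags found")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

A flag showing up only tells you the instructions exist, not that your crypto library actually uses them, so it's still worth benchmarking the real workload.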
Personally, I don't care about isolating the keys from the apps using them for my use cases - if I did, I'd be using the TPM or smart cards. I can see how it'd potentially be useful, but I'm not convinced that keeping the keys in kernel memory is much better than keeping them in a separate user-space process.
(For that matter, even dedicated key-isolation crypto hardware like smart cards has proven vulnerable to power and timing attacks. No key isolation is perfect - the question is whether moving the keys into the kernel is better _enough_ to justify the time/effort/complexity cost.)