I think you're missing the point here.
L1 cache is, as I understand it, nearly as fast as the CPU's registers, and the L1 caches that
replaced that 16 KB cache have only gotten faster over the years.
Today's L2 and L3 caches are *huge*, and they are likely as fast as, if not faster than, that
1993 Pentium's 16 KB cache.
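One common way to check claims like these on your own machine is a pointer-chasing loop, where every load depends on the previous one, so the time per step approximates load latency. This is a minimal C sketch under my own assumptions. The function names `build_cycle` and `chase_ns` are hypothetical. It uses Sattolo's algorithm to link the array into one random cycle so hardware prefetchers cannot guess the next address, and it uses standard `clock()`, which is coarse, so it needs many steps per measurement.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Sattolo's algorithm: turns next[] into a single random cycle through all
   n slots, so following next[] from any index visits every slot once before
   returning. rand() is used only to keep the sketch short. */
void build_cycle(size_t *next, size_t n)
{
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i-1]: Sattolo, not Fisher-Yates */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follows the chain for `steps` dependent loads and returns the average
   time per load in nanoseconds, measured with clock() (CPU time). The final
   index is stored in *sink so the compiler cannot drop the loop. */
double chase_ns(const size_t *next, size_t steps, size_t *sink)
{
    size_t p = 0;
    if (steps == 0) {
        *sink = p;
        return 0.0;
    }
    clock_t c0 = clock();
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    clock_t c1 = clock();
    *sink = p;
    return (double)(c1 - c0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Running this with working sets from a few KB up to well past the L3 size and plotting nanoseconds per load against size usually shows a step at each cache level and a much larger jump once the data spills into system RAM. The exact figures depend on the CPU, so none are claimed here.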
And all on-chip cache is vastly faster than system RAM. Look at benchmarks of cache-optimized
algorithms: when they process the data in cache-sized blocks instead of streaming over the
entire data set, performance improves a hundredfold in some cases.
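Matrix transpose is a standard textbook illustration of that kind of blocking (tiling). This is a minimal C sketch, not code from any particular benchmark. The function names and the `block` parameter are hypothetical, and the best block size is an assumption you would have to tune per machine. Both functions produce the same result; the blocked one only changes the order in which elements are visited so that the tiles being read and written stay in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Naive transpose of an n x n row-major matrix: src is read row by row,
   but dst is written column by column, so for large n each write lands
   n doubles away from the previous one and touches a new cache line. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: walks the matrix in block x block tiles so the
   current tile of src and the matching tile of dst can both stay cached
   while they are used. block is a tuning parameter (an assumption). */
void transpose_blocked(const double *src, double *dst, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block)
        for (size_t jj = 0; jj < n; jj += block) {
            /* Clamp tile edges so n need not be a multiple of block. */
            size_t i_end = ii + block < n ? ii + block : n;
            size_t j_end = jj + block < n ? jj + block : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

To see the effect, time both functions on a matrix much larger than the last-level cache. How large the speedup is depends on the CPU, the matrix size, and the block size.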