My guess is that the "better CPU" features Linus refers to are things like the AMD K10's nested paging ("Rapid Virtualization Indexing"), and maybe also a large enough L2 TLB (although the latter helps more than just virtualization).
Larger pages also help non-virtualized applications that perform lots of memory accesses with low spatial locality over a large enough footprint. An extreme case would be walking through memory with a 4160-byte stride (4096 + 64, assuming 4 KiB pages and 64-byte cache lines): every step consumes a new TLB entry and a new cache line. Once you have touched more pages than there are TLB entries, but still fewer lines than the cache holds (on AMD K10: 48 L1 TLB entries, 512 L2 TLB entries, 1024 L1 cache lines, 8192 L2 cache lines), you can start over from the beginning, and you will have a workload that hits the cache and misses the TLB all the time.
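To make the arithmetic concrete, here is a rough sketch that counts how many distinct pages and cache lines such a strided walk touches. It is only a model, not a benchmark: the 4 KiB page size and 64-byte line size are assumptions, the function names are made up for illustration, and it ignores associativity and replacement policy entirely.

```python
PAGE_SIZE = 4096                  # assumed 4 KiB base pages
LINE_SIZE = 64                    # assumed 64-byte cache lines
STRIDE = PAGE_SIZE + LINE_SIZE    # 4160 bytes, the stride from the comment


def footprint(steps, stride=STRIDE, page_size=PAGE_SIZE, line_size=LINE_SIZE):
    """Return (distinct pages, distinct cache lines) touched by `steps` accesses."""
    addrs = [i * stride for i in range(steps)]
    pages = {a // page_size for a in addrs}
    lines = {a // line_size for a in addrs}
    return len(pages), len(lines)


def hits_cache_misses_tlb(steps, tlb_entries, cache_lines):
    """True if the walk needs more pages than the TLB has entries,
    while its lines still fit in the cache (capacity only, no associativity)."""
    pages, lines = footprint(steps)
    return pages > tlb_entries and lines <= cache_lines


if __name__ == "__main__":
    # K10 figures quoted above: 512 L2 TLB entries, 1024 L1 cache lines.
    print(footprint(600))
    print(hits_cache_misses_tlb(600, tlb_entries=512, cache_lines=1024))
```

With the K10 numbers, any walk of more than 512 but at most 1024 steps falls in the window for the L1 cache (or up to 8192 steps for the L2 cache). With 2 MiB pages the same walk would cover far fewer pages, which is the point about larger pages.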
Copyright © 2017, Eklektix, Inc.