The biggest problem on the small side is transparent usage of large TLBs, the idea being something akin to Andrea's CONFIG_PAGE_SHIFT, but relative to the TLB size whilst maintaining normal PAGE_SIZE'ed PTEs. One thing that was tossed about at kernel summit was the idea of having the VM provide base page and range hints for contiguous page frames, which the TLB miss handler could then optimize for on software-loaded TLBs (many embedded systems, where TLBs are very small, for example). Namely, for some extra performance hit in the architecture-specific hot path, we gain the ability to cut off linear page faults directly, rather than speculatively (this is an important distinction between this approach and the Rice superpages, as well as the approaches used by HP-UX and IRIX).
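To make the hinting idea a bit more concrete, here is a minimal sketch of how a software TLB miss handler might use a base page and range hint. Everything in it is an assumption for illustration: `struct range_hint`, `pick_tlb_shift()`, the 4kB base page and the 1MB/64kB/4kB entry sizes are hypothetical and not taken from any existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12 /* assumed 4kB base pages */

/*
 * Hypothetical hint from the VM: a run of contiguous virtual pages
 * backed by contiguous page frames.
 */
struct range_hint {
    unsigned long base_vpn; /* first virtual page number in the run */
    unsigned long base_pfn; /* first page frame number in the run */
    unsigned long nr_pages; /* run length, in PAGE_SIZE pages */
};

/* Hypothetical TLB entry sizes as shifts, largest first: 1MB, 64kB, 4kB. */
static const unsigned int tlb_shifts[] = { 20, 16, PAGE_SHIFT };

/*
 * Pick the largest TLB entry that contains the faulting page, is
 * naturally aligned both virtually and physically, and lies entirely
 * inside the hinted run. Falls back to a single PAGE_SIZE entry.
 */
static unsigned int pick_tlb_shift(const struct range_hint *h,
                                   unsigned long fault_vpn)
{
    size_t i;

    /* The faulting page must belong to the hinted run. */
    assert(fault_vpn >= h->base_vpn &&
           fault_vpn - h->base_vpn < h->nr_pages);

    for (i = 0; i < sizeof(tlb_shifts) / sizeof(tlb_shifts[0]); i++) {
        /* Number of PAGE_SIZE pages covered by one entry of this size. */
        unsigned long span = 1UL << (tlb_shifts[i] - PAGE_SHIFT);
        /* Naturally aligned virtual start of the candidate entry. */
        unsigned long start_vpn = fault_vpn & ~(span - 1);
        unsigned long start_pfn;

        if (start_vpn < h->base_vpn)
            continue; /* entry would begin before the run */
        if (start_vpn - h->base_vpn + span > h->nr_pages)
            continue; /* entry would run past the end of the run */

        start_pfn = h->base_pfn + (start_vpn - h->base_vpn);
        if (start_pfn & (span - 1))
            continue; /* physical side is not naturally aligned */

        return tlb_shifts[i];
    }
    return PAGE_SHIFT;
}
```

The PTEs stay PAGE_SIZE'ed throughout; only the entry loaded into the TLB grows. The extra checks are the "performance hit in the hot path" being traded for fewer linear faults across the run.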
The other issue is that the d-cache does grow, and the TLB doesn't always scale accordingly. For shared-library-heavy and multi-threaded apps, folks love to toss on copious amounts of slower cache, to the point where there's insufficient TLB coverage to make it out of cache, and thus thrashing ensues when small pages are used. On ia64 the answer to this is always to bump up PAGE_SIZE, where 64kB tends to be a requirement to make it out of cache (and these are _huge_ TLBs!). On embedded, where the TLBs are orders of magnitude smaller and consistently under pressure, bumping up the page size is simply not an option. We don't want a large page size; we want a large TLB entry size that can span multiple pages in order to reduce the amount of application time we waste on linear faulting.
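The coverage argument comes down to simple arithmetic: TLB reach is the number of entries times the size each entry maps. A rough sketch follows, in which `tlb_reach()` and `tlb_covers_cache()` are hypothetical helpers and the entry counts and cache sizes in the note below are made-up illustrative numbers rather than figures for any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space the TLB can map at once (its "reach"). */
static uint64_t tlb_reach(unsigned int entries, unsigned int entry_shift)
{
    assert(entry_shift < 32); /* keep the shift well inside 64 bits */
    return (uint64_t)entries << entry_shift;
}

/* Non-zero if the TLB can map at least as much memory as the cache holds. */
static int tlb_covers_cache(unsigned int entries, unsigned int entry_shift,
                            uint64_t cache_bytes)
{
    return tlb_reach(entries, entry_shift) >= cache_bytes;
}
```

With an assumed 64-entry TLB, 4kB entries give 256kB of reach, which falls short of a 1MB cache, while 64kB entries give 4MB and cover it comfortably, without touching PAGE_SIZE.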
I brought this up at kernel summit, and Linus supported the idea of VM hinting for page ranges, so it will be interesting to see where this work goes. Not only will such things tie in with Christoph's work, they also operate under the assumption that we're not fragmented out of the box. Thus, there's also a dependence on Mel's work, especially if one is to consider ways to passively provide hints during page reclaim or the like.
It is worth differentiating between large pages and large TLBs. Large pages on embedded, outside of application-specific use (i.e., hugetlbfs), are generally undesirable. The general embedded case is usually reasonably large memory apertures (relative to TLB and PAGE_SIZE), especially in peripheral space, followed by a combination of many small files and some very big ones. The places where we have explicit control over the TLB size (i.e., ioremap()) are already handled by the architectures that care, so in terms of transparency, it's simply anonymous and file-backed pages where VM hinting is helpful. Background scanning is mentioned from time to time, but is unrealistic for these applications, since the system is usually doing run-time power management as well.
The picture today is certainly much less bleak than it was even just a year ago, but there is still a lot of work to be done.