Large pages, large blocks, and large problems

Posted Sep 19, 2007 23:42 UTC (Wed) by socket (guest, #43)
Parent article: Large pages, large blocks, and large problems

Memory management is not something I know much about - relevant classes I took at the university were limited to pre-386 assembly and digital logic, so I'm not even sure that my understanding of a "page" is accurate.

Can someone recommend reading materials (preferably free, but don't rule something really good out just on that account) on this stuff?

And otherwise, I'm curious how the proposed changes would affect the work of people who are trying to get Linux to scale down to smaller devices?


Large pages, large blocks, and large problems

Posted Sep 20, 2007 0:49 UTC (Thu) by drag (subscriber, #31333)

> And otherwise, I'm curious how the proposed changes would affect the work of people who are trying to get Linux to scale down to smaller devices?

I dunno. All of this stuff is over my head, but I expect it would have anywhere from no effect to a generally positive one.

I know that a popular use for Linux is in those little embedded 'NAS' controllers. You know, those things running a little ARM processor or something similarly lightweight, where you can shove in 3 or 4 SATA drives, and which cost around 100-200 bucks or so.

I know that for Gigabit-speed networks and faster interconnects, one of the major performance problems is that they are still using the very small MTUs originally developed for 10Mbit/s networks. 'Jumbo frames' are where you take the small 1500-byte MTU and bump it up to 9500 bytes or even higher. This leads to significantly fewer interrupts being generated by the controller and much less TCP overhead. If all your hosts and network hardware support it, you can really get very significant network performance improvements, sometimes 2x the performance at half the CPU usage.
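To put rough numbers on the interrupt argument, here is a minimal sketch that estimates how many full-sized frames per second a saturated link carries at a given MTU. It assumes the usual 38 bytes of per-frame wire overhead (preamble, header, FCS, inter-frame gap) and one interrupt per frame, which ignores interrupt coalescing; the function name and the 9000-byte jumbo MTU are my own choices for illustration.

```python
# Per-frame bytes on the wire besides the payload (assumed standard Ethernet):
# 8 preamble+SFD, 14 header, 4 FCS, 12 inter-frame gap.
WIRE_OVERHEAD = 8 + 14 + 4 + 12


def frames_per_second(link_bps, mtu):
    """Full-sized frames per second on a saturated link (hypothetical helper)."""
    bits_per_frame = (mtu + WIRE_OVERHEAD) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    gigabit = 1_000_000_000
    for mtu in (1500, 9000):  # 9000 is an assumed, commonly used jumbo MTU
        print(mtu, round(frames_per_second(gigabit, mtu)))
```

Under these assumptions a gigabit link moves roughly 81,000 frames per second at a 1500-byte MTU and roughly 14,000 at 9000 bytes, so the per-frame work drops by close to a factor of six.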

Then if you take that further and use large packets together with large disk blocks, so that once you strip away the Ethernet frame and TCP header information the packet payload is the same size as a disk block, then I suppose you can reduce overhead and increase performance even more.

All in all this would allow people to use slower/cheaper processors and still have them perform better. Cheaper, faster embedded Linux devices.

Of course this is all very idealized. Lots of switches and NICs don't support jumbo frames, most people will still use Windows with SMB, which is just naturally slow, and most people don't have the ability to configure the network in this way even if they know how. Plus, I don't know if the sorts of CPUs these devices use would even support those large memory page sizes.

Oh well.

> Can someone recommend reading materials (preferably free, but don't rule something really good out just on that account) on this stuff?

Ever checked out http://kernelnewbies.org/
or http://www.linux-books.us/linux_general_0014.php ?

Large pages, large blocks, and large problems

Posted Sep 20, 2007 1:58 UTC (Thu) by sayler (guest, #3164)

In general, I agree with what you say, but keep in mind that Ethernet frames are inherently variable in size, that is, you can have 1500, 1501, 1502, ... byte frames and the transmission time will increase nearly linearly.

We have much coarser choices for page sizes. Even on Alpha (which apparently did a good job here), the page size choices were something like 8k * 8^N, where N ran between 0 and 3.
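A small sketch of how coarse that is, assuming the Alpha sizes are an 8 KB base page grouped by a hint of 8^N pages (my reading of the granularity-hint scheme; the helper names are made up for illustration):

```python
ALPHA_BASE_PAGE = 8 * 1024  # assumed 8 KB base page


def alpha_page_sizes(max_hint=3):
    """Sizes reachable when hint N groups 8**N base pages (assumption)."""
    return [ALPHA_BASE_PAGE * 8 ** n for n in range(max_hint + 1)]


def smallest_fit(nbytes, sizes):
    """Smallest available size that holds nbytes, or None if none does."""
    for size in sorted(sizes):
        if size >= nbytes:
            return size
    return None


if __name__ == "__main__":
    sizes = alpha_page_sizes()
    print(sizes)
    print(smallest_fit(70_000, sizes))
```

With only four choices, a 70,000-byte object that just misses the 64 KB size has to jump all the way to 512 KB, whereas an Ethernet frame can grow one byte at a time.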

There is some other somewhat interesting data here: http://lists.freebsd.org/pipermail/freebsd-hackers/2003-O... showing measured {i,d}tlb size for various page sizes on various uArchs.

Large pages, large blocks, and large problems

Posted Sep 21, 2007 15:16 UTC (Fri) by jamesh (guest, #1159)

It is true that Ethernet frames are variable in size, but the standard also specifies a maximum payload size of 1500 bytes, as the grandparent post says. You need to have some upper limit in order to make hardware that can reliably store and forward packets (as a switch would need to do when forwarding a packet to a slower network).

Ethernet frames larger than 1500 bytes are non-standard and commonly known as "jumbo frames". And as you can guess, they'll only work if all the hardware involved in the link supports the larger frames.

Large pages, large blocks, and large problems

Posted Sep 20, 2007 14:04 UTC (Thu) by lethal (subscriber, #36121)

The biggest problem on the small side is transparent usage of large TLBs, the idea being something akin to Andrea's CONFIG_PAGE_SHIFT but relative to the TLB size whilst maintaining normal PAGE_SIZE'ed PTEs. One thing that was tossed about at kernel summit was the idea of having the VM provide base page and range hints for contiguous page frames, which could be optimized for in the TLB miss handler for software-loaded TLBs (many embedded systems, where TLBs are very small, for example). Namely, for some extra performance hit in the architecture-specific hot path, we have the ability to cut off linear page faults directly, rather than speculatively (this is an important distinction between this approach and the Rice superpages as well as the approaches used by HP-UX and IRIX).
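As a toy model only (not how any real miss handler is written), the effect of a range hint on linear faulting can be sketched by counting misses for a sequential walk over physically contiguous pages, where each loaded TLB entry covers an aligned run of `span_pages` base pages. The function name and parameters are hypothetical, and eviction is ignored because a linear walk never revisits a page.

```python
def count_linear_misses(num_pages, span_pages):
    """TLB misses for one sequential pass over num_pages contiguous pages.

    span_pages is how many aligned base pages one TLB entry covers;
    span_pages=1 models ordinary PAGE_SIZE entries, larger values model
    an entry loaded from a range hint (toy assumption).
    """
    misses = 0
    covered_until = 0  # first page index not covered by the last entry loaded
    for page in range(num_pages):
        if page >= covered_until:
            misses += 1
            covered_until = (page // span_pages + 1) * span_pages
    return misses


if __name__ == "__main__":
    print(count_linear_misses(256, 1))
    print(count_linear_misses(256, 16))
```

In this model, walking 256 pages takes 256 misses with single-page entries and 16 when each entry spans 16 pages, while the PTEs themselves stay PAGE_SIZE'ed.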

The other issue is that the d-cache does grow, and the TLB doesn't always scale accordingly. For heavy shared library and multi-threading apps, folks love to toss on copious amounts of slower cache, to the point where there's insufficient TLB coverage to make it out of cache, and thus, thrashing ensues when small pages are used. On ia64 the answer to this is always to bump up PAGE_SIZE, where 64kB tends to be a requirement to make it out of cache (and these are _huge_ TLBs!). On embedded where the TLBs are orders of magnitude smaller and consistently under pressure, bumping up the page size is simply not an option. We don't want a large page size, we want a large TLB entry size that can span multiple pages in order to reduce the amount of application time we waste on linear faulting.
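The coverage argument is simple arithmetic, sketched below with made-up numbers: TLB reach is the entry count times the size each entry maps, and trouble starts when that reach is smaller than the cache. The 64-entry TLB and 1 MB cache are assumptions for illustration, not figures for any particular part.

```python
def tlb_reach(entries, entry_size):
    """Bytes of address space the TLB can map at once."""
    return entries * entry_size


def covers_cache(entries, entry_size, cache_bytes):
    """True if the TLB can map at least as much memory as the cache holds."""
    return tlb_reach(entries, entry_size) >= cache_bytes


if __name__ == "__main__":
    cache = 1 << 20  # hypothetical 1 MB cache
    for entry_size in (4 * 1024, 64 * 1024):
        print(entry_size, tlb_reach(64, entry_size), covers_cache(64, entry_size, cache))
```

With 64 entries, 4 KB mappings reach only 256 KB, while 64 KB mappings reach 4 MB. The same reach can come from a bigger PAGE_SIZE or from larger TLB entries spanning several small pages, which is the distinction being drawn here.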

I brought this up at kernel summit, and Linus supported the idea of VM hinting for page ranges, so it will be interesting to see where this work goes. Not only will such things tie in with Christoph's work, it also operates under the assumption that we're not fragmented out of the box, too. Thus, there's also a dependence on Mel's work, especially if one is to consider ways to passively provide hints during page reclaim or so.

It is worth differentiating between large pages and large TLBs. Large pages on embedded outside of application-specific use (i.e., hugetlbfs) are generally undesirable. The general embedded case is usually reasonably large memory apertures (relative to TLB and PAGE_SIZE), especially in peripheral space, followed by a combination of many small files and some very big ones. The places where we have explicit control over the TLB size (i.e., ioremap()) are already handled by the architectures that care, so in terms of transparency, it's simply anonymous and file-backed pages where VM hinting is helpful. Background scanning is mentioned from time to time, but is unrealistic for these applications since the system is usually also doing run-time power management.

The picture today is certainly much less bleak than it was even just a year ago, but there is still a lot of work to be done.

Large pages, large blocks, and large problems

Posted Sep 23, 2007 23:23 UTC (Sun) by man_ls (subscriber, #15091)

I found Henson's article here on LWN quite good: KHB: Transparent support for large pages. The referred paper by Navarro (cited here too) is a joy to read, maybe due to the contribution from Alan Cox.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds