
Why add anything?

Posted Mar 30, 2007 6:16 UTC (Fri) by ncm (subscriber, #165)
Parent article: Application-friendly kernel interfaces

Why should this need a file system, or a device, or a library at all?

It should suffice to call mmap() and ask for an anonymous chunk of 16M, and the kernel can simply recognize that a huge (hugetlb) page would serve, and use it. If, later, the process unmaps pages within it, the remaining pages can be switched over to the regular mapping scheme; most processes won't. Then it would be easy, safe, and backward-compatible for libc to switch malloc over to allocating hugetlb chunks by default, benefiting everybody.

I would also like to see a flag added to mmap() to require that the mapped block be aligned to match its size; e.g. ask for 16M and the bottom 24 bits of the returned address are 0. (Anybody else remember when 68K chips shipped with only 24 address pins, and Apple stuck annotations in the top 8 bits of addresses because the hardware ignored those bits?)



Why add anything?

Posted Mar 30, 2007 12:59 UTC (Fri) by mjr (guest, #6979)

I'm wondering the same myself. I'm not much for low level hacking, but I fail to see what benefits one would reap from yet another interface.

If a separate interface is really necessary for some reason, I'd put the same functionality behind regular libc malloc(); it already uses brk() for small allocations and mmap() for large ones, I believe, so it could just as well do extra-large allocations via the hugetlb API. (Putting this in malloc instead of mmap would get rid of the partial-munmap issue on the libc end.)

Why add anything?

Posted Mar 30, 2007 23:19 UTC (Fri) by giraffedata (subscriber, #1954)

I can see the value of using mmap() for this, but I don't think you want mmap() guessing based on the size of the request what page size is best.

It's quite possible that 16M of memory will consist of 100 scattered 4K pages of working set and the rest rarely used or even vacant. You wouldn't want to page the whole 16M in and out in that case.
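
To put numbers on that, assuming the whole region is backed by a single 16M page that must be paged in and out as one unit:

```latex
$$
\begin{aligned}
\text{working set} &= 100 \times 4\,\text{KiB} = 400\,\text{KiB} \\
\frac{\text{paged unit}}{\text{working set}} &= \frac{16\,\text{MiB}}{400\,\text{KiB}} = \frac{16384\,\text{KiB}}{400\,\text{KiB}} \approx 41
\end{aligned}
$$
```

So every page-out or page-in would move roughly 41 times the data actually in use.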

Page granularity seems like a perfectly sensible parameter of an mmap, though.

Why add anything?

Posted Apr 5, 2007 14:15 UTC (Thu) by farnz (guest, #17727)

It might be worth looking at Linux-mm.org's page on huge pages. In particular, there's a link to an LWN article on transparent use of huge pages. The "holy grail" is very definitely transparent use, so that whenever possible, all applications gain; anything that makes it easier to move that way is helpful.

One thought: if your mmap parameter is simply a hint that the block will be used at a particular granularity, it's easy to implement. Current mmap callers set the parameter to 1 byte (no granularity needed), unless mmapping hugetlbfs pages, in which case they set it to (e.g.) 16M. The kernel then just rounds up to the next highest available page size when possible, or down if not.

Why add anything?

Posted Apr 5, 2007 18:57 UTC (Thu) by joib (guest, #8541)

One problem is that the number of large-page TLB entries is quite limited. E.g. on current Opterons, while you have a 512-entry data TLB for the normal 4K pages, for the 2M large pages you only have 8 entries. So if you have a loop kernel reading/writing from more than 8 big arrays, you're going to have TLB thrashing.

I would presume that for non-HPC applications these non-streaming, irregular access patterns are even more common. Supposedly, though, AMD is fixing this issue with the upcoming 'Barcelona' by having 128 2M TLB entries, and additionally supporting 1G pages (I don't know how many TLB entries for those).

For comparison, the Intel Woodcrest has 256 4K and 32(?) 2M TLB entries.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds