
On the proper use of vmalloc()

As those who have looked at kernel programming at all have noticed, there are two basic memory allocation modes in Linux. One of them, which ultimately comes down to get_free_pages(), allocates one or more physically contiguous pages that lie in the kernel's main virtual address space (except, of course, for high memory pages). Most other memory allocation mechanisms, including the slab allocator and kmalloc(), are built on top of get_free_pages(). In the other corner is vmalloc(), which allocates virtually contiguous (but physically dispersed) pages in a separate virtual address space. vmalloc() is relatively slow, but it can perform large allocations that look contiguous to the kernel. It is thus used, for example, to allocate space for the code of loadable modules.

Erik Jacobson recently found the limits of kmalloc() while querying /proc/interrupts on a very large system. The code implementing /proc/interrupts attempts to allocate a buffer for its output, and the size of that buffer depends on the number of processors on the system. On big systems, the required buffer is large and the allocation fails. So Erik submitted a fix that uses vmalloc() to allocate the memory instead.

Linus didn't like it. He pointed out that the seq_file interface should be used instead. Indeed, /proc/interrupts fits naturally into the sort of output seq_file is intended to create, and doing things that way can eliminate the need to allocate a large buffer at all. But Linus also clarified his thoughts on when vmalloc() should be used:

There are basically no valid new uses of it. There's a few valid legacy users (I think the file descriptor array), and there are some drivers that use it (which is crap, but drivers are drivers), and it's _really_ valid only for modules. Nothing else.

That should be sufficiently clear for most readers; perhaps an entry on vmalloc() needs to be added to the coding style document.

There are a few reasons for this stance. Every call to vmalloc() requires page table tweaking and translation lookaside buffer (TLB) flushes, so it will be slow. Space from vmalloc() lies outside the regular kernel range, which (on most architectures) is mapped with large pages so that a few TLB entries cover it; vmalloc() space is mapped one small page at a time, so extra TLB slots are required to access it. And, on many architectures, the amount of virtual space set aside for vmalloc() is relatively small. For all of these reasons, use of vmalloc() is discouraged, and patches containing vmalloc() calls are increasingly unlikely to make it into the kernel.




Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds