The 2.5 development series has brought relatively few changes to the way
device drivers will allocate and manage memory. In fact, most drivers
should work with no changes in this regard. There are a few improvements
that have been made, however, that are worth a mention. These include some
changes to page allocation, and the new "mempool" interface. Note that the
allocation and management of per-CPU data is described in a separate article.
The old <linux/malloc.h> include file is gone; it is now necessary to
include <linux/slab.h> instead.
The GFP_BUFFER allocation flag is gone (it was actually removed in
2.4.6). That will bother few people, since almost nobody used it. There
are two new flags which have replaced it: GFP_NOIO and
GFP_NOFS. The GFP_NOIO flag allows sleeping, but no I/O
operations will be started to help satisfy the request. GFP_NOFS
is a bit less restrictive; some I/O operations can be started (writing to a
swap area, for example), but no filesystem operations will be performed.
For reference, here is the full set of allocation flags, from the most
restrictive to the least:
- GFP_ATOMIC: a high-priority allocation which will not sleep;
this is the flag to use in interrupt handlers and other non-blocking
situations.
- GFP_NOIO: blocking is possible, but no I/O operations will be
started.
- GFP_NOFS: no filesystem operations will be performed.
- GFP_KERNEL: a regular, blocking allocation.
- GFP_USER: a blocking allocation for user-space pages.
- GFP_HIGHUSER: for allocating user-space pages where high
memory may be used.
The __GFP_DMA and __GFP_HIGHMEM flags still exist and may
be added to the above to direct an allocation to a particular memory zone.
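As a quick illustration, a driver might choose flags according to the context it is running in; this sketch uses hypothetical function names, but the flags themselves are as described above:

```
#include <linux/slab.h>

/* Called from an interrupt handler: must not sleep. */
static void *get_buffer_atomic(size_t size)
{
	return kmalloc(size, GFP_ATOMIC);
}

/* Called from process context: a normal, blocking allocation,
 * directed at the DMA zone (e.g. for an ISA device). */
static void *get_dma_buffer(size_t size)
{
	return kmalloc(size, GFP_KERNEL | __GFP_DMA);
}
```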
In addition, 2.5.69 added some new modifiers:
- __GFP_REPEAT: this flag tells the page allocator to "try harder,"
repeating failed allocation attempts if need be. Allocations can still
fail, but failure should be less likely.
- __GFP_NOFAIL: try even harder; allocations with this flag must not
fail. Needless to say, such an allocation could take a long time to
satisfy.
- __GFP_NORETRY: failed allocations should not be retried; instead, a
failure status will be returned to the caller immediately.
The __GFP_NOFAIL flag is sure to be tempting to programmers who
would rather not code failure paths, but that temptation should be resisted
most of the time. Only allocations which truly cannot be allowed to fail
should use this flag.
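For code which can live without the memory, __GFP_NORETRY with an explicit failure path is the friendlier approach; a minimal sketch (the function and its purpose are hypothetical):

```
#include <linux/slab.h>

/* This memory is nice to have, but the driver can work without it,
 * so fail quickly rather than pressuring the allocator. */
static void *get_optional_cache(size_t size)
{
	void *cache = kmalloc(size, GFP_KERNEL | __GFP_NORETRY);

	if (!cache)
		return NULL;	/* the caller copes without the cache */
	return cache;
}
```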
For page-level allocations, the alloc_pages()
functions (and variants) exist as always. They
are now defined in <linux/gfp.h>, however, and there
are a few new ones as well. On NUMA systems, the allocator will do
its best to allocate pages on the same node as the caller. To explicitly
allocate pages on a different NUMA node, use:
    struct page *alloc_pages_node(int node_id,
                                  unsigned int gfp_mask,
                                  unsigned int order);
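A simple use of the page-level interface might look like this sketch, which grabs two contiguous pages on the local node and frees them afterward (the surrounding error-handling convention is assumed):

```
#include <linux/gfp.h>
#include <linux/mm.h>

struct page *pages;

/* Two (2^1) contiguous pages from whichever node is local. */
pages = alloc_pages(GFP_KERNEL, 1);
if (!pages)
	return -ENOMEM;
/* ... use the pages ... */
__free_pages(pages, 1);
```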
The memory allocator now distinguishes between "hot" and "cold" pages. A
hot page is one that is likely to be represented in the processor's cache;
cold pages, instead, must be fetched from RAM. In general, it is
preferable to use hot pages whenever possible, since they are already
cached. Even if the page is to be overwritten immediately (usually the
case with memory allocations, after all), hot pages are better -
overwriting them will not push some other, perhaps useful, data from the
cache. So alloc_pages() and friends will return hot pages when
they are available.
On occasion, however, a cold page is preferable. In particular, pages
which will be overwritten via a DMA read from a device might as well be
cold, since their cache data will be invalidated anyway. In this sort of
situation, the __GFP_COLD flag should be passed into the page allocator.
Of course, this whole scheme depends on the memory allocator knowing which
pages are likely to be hot. Normally, order-zero allocations (i.e. single
pages) are assumed to be hot. If you know the state of a page you are
freeing, you can tell the allocator explicitly with one of the following:
void free_hot_page(struct page *page);
void free_cold_page(struct page *page);
These functions only work with order-zero allocations; the hot/cold status
of larger blocks is not tracked.
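A driver setting up a buffer for a DMA read might thus allocate and free its page as in this sketch:

```
#include <linux/gfp.h>

/* The device will DMA into this page, invalidating any cached
 * contents anyway, so ask for a cold page. */
struct page *page = alloc_pages(GFP_KERNEL | __GFP_COLD, 0);

/* ... perform the DMA transfer ... */

/* The page was just written by the device, so it is almost
 * certainly not in the processor cache; say so when freeing. */
free_cold_page(page);
```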
Memory pools were one of the very first changes in the 2.5 series - they
were added to 2.5.1 to support the new block I/O layer. The purpose of
mempools is to help out in situations where a memory allocation must
succeed, but sleeping is not an option. To that end, mempools pre-allocate
a pool of memory and reserve it until it is needed. Mempools make life
easier in some situations, but they should be used with restraint; each
mempool takes a chunk of kernel memory out of circulation and raises the
minimum amount of memory the kernel needs to run effectively.
To work with mempools, your code should include
<linux/mempool.h>. A mempool is created with
    mempool_t *mempool_create(int min_nr,
                              mempool_alloc_t *alloc_fn,
                              mempool_free_t *free_fn,
                              void *pool_data);

min_nr is the minimum number of pre-allocated objects that
the mempool tries to keep around. The mempool defers the actual allocation
and deallocation of objects to user-supplied routines, which have the
following prototypes:

    typedef void *(mempool_alloc_t)(int gfp_mask, void *pool_data);
    typedef void (mempool_free_t)(void *element, void *pool_data);
The allocation function should take care not to sleep unless
__GFP_WAIT is set in the given gfp_mask. In all of the
above cases, pool_data is a private pointer that may be used by
the allocation and deallocation functions.
Creators of mempools will often want to use the slab allocator to
do the actual object allocation and deallocation. To do that, create the
slab, pass it in to mempool_create() as the pool_data
value, and give mempool_alloc_slab and mempool_free_slab
as the allocation and deallocation functions.
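Putting those pieces together, creation of a slab-backed mempool might look like this sketch (the cache name, object type, and minimum count are invented for the example):

```
#include <linux/slab.h>
#include <linux/mempool.h>

static kmem_cache_t *my_cache;
static mempool_t *my_pool;

/* Create a slab cache for the objects, then build a mempool
 * on top of it using the stock slab helpers. */
my_cache = kmem_cache_create("my_objects", sizeof(struct my_object),
                             0, 0, NULL, NULL);
my_pool = mempool_create(8,	/* keep at least eight objects around */
                         mempool_alloc_slab, mempool_free_slab,
                         my_cache);
```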
A mempool may be returned to the system by passing it to
mempool_destroy(). You must have returned all items to the pool
before destroying it, or the mempool code will get upset and oops the
system.
Allocating and freeing objects from the mempool is done with:
void *mempool_alloc(mempool_t *pool, int gfp_mask);
void mempool_free(void *element, mempool_t *pool);
mempool_alloc() will first call the pool's allocation function to
satisfy the request; the pre-allocated pool will only be used if the
allocation function fails. The allocation may sleep if the given
gfp_mask allows it; it can also fail if memory is tight and the
preallocated pool has been exhausted.
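In use, the pool then behaves much like any other allocator; a sketch, assuming the my_pool and struct my_object names from above:

```
struct my_object *obj;

/* GFP_ATOMIC: safe to call from interrupt context; the
 * pre-allocated reserve backs us up if memory is tight. */
obj = mempool_alloc(my_pool, GFP_ATOMIC);
if (!obj)
	return;		/* memory tight and the reserve is exhausted */
/* ... use the object ... */
mempool_free(obj, my_pool);
```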
Finally, a pool can be resized, if necessary, with:
int mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);
This function will change the size of the pre-allocated pool, using the
given gfp_mask to allocate more memory if need be. Note that, as
of 2.5.60, mempool_resize() is disabled in the source, since
nobody is actually using it.