On not getting burned by kmap_atomic()

"High memory" on a Linux system is, by definition, memory that is not normally mapped into the kernel's virtual address space. It is a mechanism that enables 32-bit architectures to make use of more physical memory than would otherwise be possible. When the kernel needs to directly manipulate the contents of a high-memory page, it must explicitly create a virtual address for it. The traditional functions for creating and removing those addresses are:

    void *kmap(struct page *page);
    void kunmap(struct page *page);

These functions work as intended, but they can be expensive to use. The virtual address space they use is limited, and shared across all processors. As a result, each kmap() and kunmap() invocation requires a global TLB flush. Often, however, high memory does not need to be mapped for long periods of time, and does not need to be shared across processors. To improve performance in such situations, the notion of an "atomic kmap" was added:

    void *kmap_atomic(struct page *page, enum km_type type);
    void kunmap_atomic(void *address, enum km_type type);

Atomic kmaps use a very small set of predefined virtual "slots," which are not shared across processors. The type argument specifies which slot is to be used, with the callers taking responsibility for not stepping on each other's toes. Slots are dedicated to specific purposes - two for code called in user context, two for interrupt handlers, two for page table management, etc. In practice, it all works out; conflicts over atomic kmap slots don't happen.
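The per-processor slot arrangement can be pictured with a small userspace model. This is only a sketch: the slot names, NR_CPUS_MODEL, slot_index(), and check_slots_are_distinct() are hypothetical stand-ins invented for illustration, and the indexing arithmetic is an assumption about how a slot could be chosen, not a copy of the kernel's code.

```c
#include <assert.h>

/* Hypothetical slot types, named after the purposes listed above. */
enum km_slot_model {
    KM_MODEL_USER0,
    KM_MODEL_USER1,
    KM_MODEL_IRQ0,
    KM_MODEL_IRQ1,
    KM_MODEL_PTE0,
    KM_MODEL_PTE1,
    KM_MODEL_NR          /* number of slot types per processor */
};

#define NR_CPUS_MODEL 4  /* assumed processor count for the model */

/* Each (cpu, type) pair owns one slot; no slot is shared across CPUs. */
static int slot_index(int cpu, enum km_slot_model type)
{
    return (int)type + KM_MODEL_NR * cpu;
}

/* Check that no two (cpu, type) pairs land on the same slot. */
static void check_slots_are_distinct(void)
{
    int seen[NR_CPUS_MODEL * KM_MODEL_NR] = {0};

    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
        for (int type = 0; type < KM_MODEL_NR; type++) {
            int idx = slot_index(cpu, (enum km_slot_model)type);

            assert(idx >= 0 && idx < NR_CPUS_MODEL * KM_MODEL_NR);
            assert(!seen[idx]);
            seen[idx] = 1;
        }
    }
}
```

Because a slot is private to one processor, two CPUs can use the same slot type at the same time without interfering. Two callers on the same CPU using the same type would collide, which is why callers must agree on who owns which type.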

Another problem has come up, however, and that has led to a small change in the prototypes of the atomic kmap functions in the -mm kernel. The regular kmap functions have a symmetrical interface in that both take a struct page * argument. kunmap_atomic(), instead, takes a void * argument - the kernel virtual address to be unmapped. It is a common mistake, however, to pass in the associated struct page pointer instead. Since the argument type is void *, the compiler does not complain, and the discovery of the problem does not come until (possibly much) later.

The solution is straightforward: redefine the function as follows:

    char *kmap_atomic(struct page *page, enum km_type type);
    void kunmap_atomic(char *address, enum km_type type);

With this change, the compiler will issue a warning whenever somebody tries to pass a struct page pointer to kunmap_atomic().
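The difference between the two prototypes can be seen in a userspace sketch. Everything here (struct page_model, map_model(), unmap_old(), unmap_new(), demo_unmap_calls()) is a hypothetical stand-in rather than kernel code; the only point is how a C compiler treats a void * parameter versus a char * parameter.

```c
#include <assert.h>

/* Hypothetical stand-in for struct page. */
struct page_model { char data[64]; };

static char mapping[64];           /* pretend kernel virtual address */
static const void *last_unmapped;  /* records what the caller passed in */

static char *map_model(struct page_model *page)
{
    (void)page;
    return mapping;
}

/* Old-style prototype: any object pointer converts to void * silently. */
static void unmap_old(void *address)
{
    last_unmapped = address;
}

/* New-style prototype: a struct pointer no longer converts implicitly. */
static void unmap_new(char *address)
{
    last_unmapped = address;
}

static void demo_unmap_calls(void)
{
    struct page_model page;
    char *vaddr = map_model(&page);

    /* The mistake: compiles without complaint, yet passes the wrong pointer. */
    unmap_old(&page);
    assert(last_unmapped == (const void *)&page);

    /* unmap_new(&page);  <- would draw an incompatible-pointer diagnostic */

    /* The correct call passes the address returned by the mapping function. */
    unmap_new(vaddr);
    assert(last_unmapped == (const void *)mapping);
}
```

The commented-out call is the case the new prototype is meant to catch. An explicit cast to char * would silence the diagnostic again.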

The patch has generated a surprising number of follow-on fixes, mostly to suppress warnings caused by the change. Many kunmap_atomic() calls now explicitly cast the address argument to the char * type. In the end, though, the result should be one more potential mistake which can be caught before it burns somebody - as long as programmers don't "fix" warnings by casting struct page pointers.



char *?

Posted Nov 18, 2004 7:35 UTC (Thu) by Ross (subscriber, #4065)

So virtual addresses are now a pointer to a char, though they will probably
not be used as such and will require casting to another pointer type. Yuck!
That's like the old K&R way of writing generic pointers which was rightfully
considered to be evil. Isn't there a better way?

On not getting burned by kmap_atomic()

Posted Nov 18, 2004 14:31 UTC (Thu) by meuh (subscriber, #22042)

    int kmap_atomic(struct page *page, enum km_type type, void **address);
    void kunmap_atomic(struct page *page, enum km_type type, void *address);

Symmetrical, but more text to type; strictness has a cost.
The main drawback is the higher stack usage for the arguments.

On not getting burned by kmap_atomic()

Posted Nov 18, 2004 18:05 UTC (Thu) by iabervon (subscriber, #722)

So how come kunmap() takes the page, and kunmap_atomic() doesn't? Is there a good reason it wouldn't work due to the difference created by the _atomic?

Another reason for kmap_atomic()

Posted Nov 19, 2004 19:55 UTC (Fri) by giraffedata (subscriber, #1954)

The article misses the real purpose of kmap_atomic(), which is evident in its name. kmap() can sleep, waiting for a virtual address range to be available. Some callers aren't able to sleep where they call kmap().

kmap_atomic() uses reserve pools of virtual address ranges (page table entries) so that it is always atomic (or fails immediately if the reserve pool is empty). The reason there are multiple pools (chosen by kmap_atomic()'s second argument) is to avoid deadlock. A kmap succeeding is sometimes a prerequisite to page table entries getting freed up.

Example: Someone does a kmap. System needs to swap out a high memory page to free up a page table entry for the kmap. The swap device can't access high memory, so the device driver has to copy the page to a low memory page "bounce buffer". To do that, it has to kmap both pages. The one reserved bounce buffer PTE slot keeps this from causing a deadlock.

Since these kmappers are always using the same small set of page table entries, it makes an ideal place for a performance improvement with per-cpu page table entries.

Another reason for kmap_atomic()

Posted Feb 22, 2007 8:50 UTC (Thu) by pbreuer (guest, #43542)

Are you SURE kmap can sleep? This is also said by Rubini in LDD (http://www.xml.com/ldd/chapter/book/ch13.html): "kmap returns a kernel virtual address for any page in the system. For low-memory pages, it just returns the logical address of the page; for high-memory pages, kmap creates a special mapping. Mappings created with kmap should always be freed with kunmap; a limited number of such mappings is available, so it is better not to hold on to them for too long. kmap calls are additive, so if two or more functions both call kmap on the same page the right thing happens. Note also that kmap can sleep if no mappings are available."

However, I see no way it can. In kernel 2.6.17, for example, in highmem.h

    static inline void *kmap(struct page *page) { return page_address(page); }

and page_address() is defined in mm.h:

    #define page_address(page) ((page)->virtual)

or

    #define page_address(page) \
        __va( (((page) - page_zone(page)->zone_mem_map) << PAGE_SHIFT) \
              + page_zone(page)->zone_start_paddr)

OK, so you think page_zone() might sleep? No. That's also in mm.h:

    static inline zone_t *page_zone(struct page *page)
    {
        return zone_table[page->flags >> ZONE_SHIFT];
    }

so I don't see a sleep.

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds