December 4, 2007
This article was contributed by Daniel Drake
When developing kernel code, it is usually important to consider
constraints and requirements of architectures other than your
own. Otherwise, your code may not be portable to other architectures, as I
recently discovered when an unaligned memory access bug was reported
in a driver which I develop. Not having much familiarity with the concepts
of unaligned memory access, I set out to research the topic and complete my
understanding of the issues.
Certain architectures require memory
accesses to meet specific alignment criteria; accesses that do not are
illegal. The exact criteria that determine whether an access is suitably
aligned depend upon the address being accessed and the number of bytes
involved in the transaction, and vary from architecture to architecture.
Kernel code is typically written to obey natural alignment
constraints, a scheme that is sufficiently strict to ensure portability to
all supported architectures. Natural alignment requires every N-byte
access to be made at a memory address that is a multiple of N. We can express
this in terms of the modulus operator: addr % N must be
zero. Some examples:
- Accessing 4 bytes of memory from address 0x10004 is aligned (0x10004 % 4 = 0).
- Accessing 4 bytes of memory from address 0x10005 is unaligned (0x10005 % 4 = 1).
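The modulus rule above can be written as a small check. This is only an illustrative sketch: is_naturally_aligned() is a hypothetical helper name, not a kernel function, and it treats the address as a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true when an n-byte
 * access starting at addr satisfies natural alignment, i.e. addr % n == 0.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t n)
{
    assert(n != 0);  /* a zero-byte access has no meaningful alignment */
    return addr % n == 0;
}
```

With this helper, is_naturally_aligned(0x10004, 4) is true and is_naturally_aligned(0x10005, 4) is false, matching the two examples above. A single-byte access (n = 1) is aligned at any address.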
The phrase "memory access" is quite vague; the context here is
assembly-level instructions which read or write a number of bytes to or
from memory (e.g. movb, movw, movl in x86 assembly). It is relatively
easy to relate these to C statements. For example, the instructions
generated when the following code is compiled would likely include a
single instruction that accesses two bytes (16 bits) of data from memory:
void example_func(unsigned char *data) {
	u16 value = *((u16 *) data);
	[...]
}
The effects of unaligned access vary from architecture to
architecture. On architectures such as ARM32 and Alpha, a processor
exception is raised when an unaligned access occurs, and the kernel is able
to catch the exception and correct the memory access (at large cost to
performance). Other architectures raise processor exceptions but the
exceptions do not provide enough information for the access to be
corrected. Some architectures that are not capable of unaligned access do
not even raise an exception when an unaligned access happens; instead, they
just perform a different memory access from the one that was requested and
silently return the wrong answer.
Some architectures are capable of performing unaligned accesses without
having to raise bus errors or processor exceptions, i386 and x86_64 being
some common examples. Even so, unaligned accesses can degrade performance
on these systems, as Andi Kleen explains:
On Opteron the typical cost of a
misaligned access is a single cycle and some possible penalty to load-store
forwarding. On Intel it is a bit worse, but not all that much. Unless you
do a lot of accesses of it in a loop it's not really worth something caring
about too much.
At the end of the day, if you write code that causes unaligned accesses
then your software will not work on some systems. This applies to both
kernel-space and userspace code.
The theory is relatively easy to get to grips with, but how does this apply
to real code? After all, when you allocate a variable on the stack, you
have no control over its address. You don't get to control the addresses
used to pass function parameters, or the addresses returned by the memory
allocation functions. Fortunately, the compiler understands the alignment
constraints of your architecture and will handle the common cases just
fine; it will align your variables and parameters to suitable boundaries,
and it will even insert padding inside structures to ensure the access to
members is suitably aligned. Even when using the GCC-specific packed
attribute (which tells GCC not to insert padding), GCC will
transparently insert extra instructions to ensure that standard accesses to
potentially unaligned structure members do not violate alignment
constraints (at a cost to performance).
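The padding behavior described above can be observed with offsetof(). The following is a minimal sketch assuming GCC or a compatible compiler; the structure names and helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ordinary structure: the compiler pads after 'tag' so 'value' is aligned. */
struct padded {
    uint8_t  tag;
    uint32_t value;
};

/* Packed structure (GCC-specific attribute): no padding is inserted. */
struct squeezed {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

static size_t padded_value_offset(void)
{
    return offsetof(struct padded, value);
}

static size_t squeezed_value_offset(void)
{
    return offsetof(struct squeezed, value);
}

/*
 * A standard member access through the packed type. The compiler knows
 * 'value' may be unaligned and emits whatever instructions are needed.
 */
static uint32_t read_squeezed_value(const struct squeezed *s)
{
    return s->value;
}
```

In the packed structure, value sits at offset 1, which is not a multiple of four, yet read_squeezed_value() remains safe because the access goes through the packed type. Taking the address of the member and dereferencing it as a plain uint32_t pointer would lose that protection.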
In order to illustrate a situation that might cause unaligned memory
access, consider the example_func() implementation from
above. The first line of the function accesses two bytes (16 bits) of data
from a memory address passed in as a function parameter; however, we do not
have any other information about this address. If the data
parameter points to an odd address (as opposed to even), for example
0x10005, then we end up with an unaligned access. The main
places where you will potentially run into unaligned accesses are when
accessing multiple bytes of data (in a single transaction) from a pointer,
and when casting variables to types of increased lengths.
Conceptually, the way to avoid unaligned access is to use byte-wise memory
access because accessing single bytes of memory cannot violate alignment
constraints. For example, for a little-endian system we could replace the
example_func() implementation with the following:
void fixed_example_func(unsigned char *data) {
	u16 value = data[0] | (data[1] << 8);
	[...]
}
memcpy() is another possible alternative in the general case,
as long as either the source or destination is a pointer to an 8-bit data
type (i.e. char). Inside the kernel, two macros are provided
which simplify unaligned accesses: get_unaligned() and
put_unaligned(). It is worth noting that using any of these
solutions is significantly slower than accessing aligned memory, so it is
wise to completely avoid unaligned access where possible.
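Outside the kernel, where get_unaligned() is not available, the byte-wise and memcpy() approaches described above might look like the following sketch. The function names load_u16_unaligned() and load_le16() are hypothetical, and standard C fixed-width types stand in for the kernel's u16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 16-bit value in host byte order from a possibly unaligned
 * address. memcpy() copies byte by byte as far as the language is
 * concerned, so no alignment constraint is violated.
 */
static uint16_t load_u16_unaligned(const unsigned char *p)
{
    uint16_t value;

    assert(p != NULL);
    memcpy(&value, p, sizeof(value));
    return value;
}

/*
 * Read a 16-bit little-endian value from a possibly unaligned address.
 * The result is the same on little-endian and big-endian hosts because
 * the bytes are combined arithmetically.
 */
static uint16_t load_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

For a buffer holding the bytes 0x34, 0x12 at an odd address, load_le16() returns 0x1234 on any host, while load_u16_unaligned() returns the value as the host's own byte order interprets those two bytes.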
Another option is to simply document the fact that
example_func() requires a 16-bit-aligned data parameter, and
rely on the call sites to ensure this or simply not use the
function. Linux's optimized routine for comparing two Ethernet addresses
(compare_ether_addr()) is a real-life example of this: the
addresses must be 16-bit-aligned.
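To illustrate the idea of pushing the alignment requirement onto callers, here is a sketch of a six-byte address comparison done as three 16-bit accesses. It is not the kernel's implementation of compare_ether_addr(); the name ether_addr_differs() is hypothetical, and the parameters are declared as uint16_t pointers (an assumption made for this sketch) so that the 16-bit alignment requirement is visible in the prototype.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compare two 6-byte addresses using three 16-bit
 * loads each. Callers must pass 16-bit-aligned storage; declaring the
 * parameters as uint16_t pointers documents that requirement.
 *
 * Returns true if the addresses differ, false if they are equal.
 */
static bool ether_addr_differs(const uint16_t *a, const uint16_t *b)
{
    assert(a != NULL && b != NULL);
    /* XOR is zero only for equal words; OR the three results together. */
    return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
}
```

The design trade-off is that the function itself stays fast and simple, while every call site takes on the responsibility of supplying suitably aligned addresses.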
I have applied my newfound knowledge to the task of writing some kernel
documentation, which covers this topic in more detail. If you want to learn
more, you may want to read the most recent
revision (as of this writing) of the document. Additionally, the initial
revision of the document generated a lot of interesting discussion, but
be aware that the initial attempt contained some mistakes. Finally, chapter
11 of Linux Device Drivers
touches upon this topic.
I'd like to thank everyone who helped me improve my understanding of
unaligned access, as this article would not have been possible without
their assistance.