
The hazards of 32/64-bit compatibility

By Jake Edge
September 22, 2010

A kernel bug that was found, and fixed, in 2007 has recently reared its head again. Unfortunately, the bug was reintroduced in 2008, leaving a rather large pile of kernel versions that are vulnerable to a local privilege escalation on x86_64 systems. Though it may be difficult to build, a regression test suite for the kernel might be able to detect these kinds of problems before they are released to the world.

Two semi-related bugs are currently in circulation, which is causing a bit of confusion. One was originally CVE-2007-4573, and was reintroduced in a cleanup patch in June 2008. The reintroduced vulnerability has been tagged as CVE-2010-3301 (though the CVE entry is simply reserved at the time of this writing). Ben Hawkes found a somewhat similar vulnerability, which also exploits system calls from 32-bit binaries on 64-bit x86 systems, and that led him to discover the reintroduction of CVE-2007-4573.

There are numerous pitfalls when trying to handle 32-bit binaries making system calls on 64-bit systems. Linux has a set of functions to handle the differences in arguments and calling conventions between 32-bit and 64-bit system calls, but that code has always been tricky to get right. What we are seeing today are two instances where it wasn't done correctly, and the consequences of that can be dire.

The 2007 problem stemmed from a mismatch between the use of the %eax 32-bit register to store the system call number (which is used as an index into the syscall table) and the use of the %rax 64-bit register (which contains %eax as its low 32 bits) to do the indexing. In the "normal" system call path, %eax was zero-extended before the 32-bit system call number from user space was stored, but there was a second path into that code where the upper 32 bits in %rax were not cleared.
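The mismatch can be illustrated with a small user-space model. This is only a sketch, not the kernel's entry code: the table size `MODEL_NR_SYSCALLS` and both function names are hypothetical, chosen to show a bounds check that looks at the low 32 bits while the index uses all 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table size, for illustration only. */
#define MODEL_NR_SYSCALLS 8u

/* Models the flawed validity test: only the low 32 bits (%eax) are compared. */
static bool model_check_eax_only(uint64_t rax)
{
    return (uint32_t)rax < MODEL_NR_SYSCALLS;
}

/* Models the table lookup: the full 64-bit register (%rax) is the index. */
static uint64_t model_table_index(uint64_t rax)
{
    return rax;
}
```

With a register value of 0x100000003, the check sees 3 and passes, while the index that would be used is far beyond the end of the table.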

The ptrace() system call has the facility to make other system calls (using the PTRACE_SYSCALL request type) and also gives a user the ability to set register values. An attacker could set the upper 32 bits of %rax to a value of their choosing, make a system call with a seemingly valid index (in %eax) and end up indexing somewhere outside of the syscall table. By arranging to have exploit code at the designated location, the attacker can get the kernel to run their code.

The ptrace() path was fixed by Andi Kleen in September 2007 by ensuring that %eax (and other registers) were zero-extended. But the zero-extension of %eax was removed by Roland McGrath's cleanup patch in June 2008. When Hawkes and Robert Swiecki recently noticed the problem, they had little difficulty in modifying an exploit from 2007 to get a root shell on recent kernels.

CVE-2010-3301 was resolved by a pair of patches. McGrath put the zero-extension of the %eax register back into the ptrace path, while H. Peter Anvin made the validity test of the system call number look at the entire %rax register. Either would be sufficient to close the current hole, but Anvin's patch will prevent any new paths into the system call entry code from running afoul of this problem in the future.
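The two approaches can be modeled in user-space C. This is a sketch under assumptions, not the actual patches (which change assembly in the entry path); `FIX_NR_SYSCALLS` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table size, for illustration only. */
#define FIX_NR_SYSCALLS 8u

/* First approach: clear the upper 32 bits so %rax holds only the %eax value. */
static uint64_t fix_zero_extend(uint64_t rax)
{
    return (uint64_t)(uint32_t)rax;
}

/* Second approach: validate the whole 64-bit register before indexing. */
static bool fix_check_full_rax(uint64_t rax)
{
    return rax < FIX_NR_SYSCALLS;
}
```

The second check does not depend on how the register was filled in, which is why it also covers entry paths that do not exist yet.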

The fact that the old exploit was useful implies that someone could have written a test case in 2007 that might have detected the reintroduction of the problem. A suite of such regression tests, run regularly against the mainline, would be quite useful as a way to reduce regressions, both for normal bugs as well as for security holes. Not all kernel bugs will be amenable to that kind of testing, but, for those that are, it seems like an idea worth pursuing.

The other problem that Hawkes found (CVE-2010-3081, also just reserved) is that the compat_alloc_user_space() function did not check that the pointer it returns is actually a valid user-space pointer. That routine is used to allocate some stack space for massaging 32-bit data into its 64-bit equivalent before making a system call. Hawkes found two places (and believes there are others) where the lack of an access_ok() call in that path could be exploited to allow attackers to write to kernel memory.

One of those was in a video4linux ioctl(), but the more easily exploited spot was in the IP multicast getsockopt() call. It uses a 32-bit unsigned length parameter provided by user space that can be used to confuse compat_alloc_user_space() into returning a pointer into kernel memory. The compat_mc_getsockopt() call then writes user-supplied values using those pointers. That can be fairly easily turned into an exploit as Hawkes noted:

> This path allows an attacker to write a chosen value to anywhere within the top 31 bits of the kernel address space. In practice, this seems to be more than enough for exploitation. My proof of concept overwrote the interrupt descriptor table, but it's likely there are other good options too.
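How a user-supplied length can push the returned pointer out of user space can be seen in a simplified model. This is a sketch only: it assumes the allocation amounts to subtracting the length from the user stack pointer, and `MODEL_USER_TOP`, the function name, and the sample values are hypothetical rather than taken from the kernel.

```c
#include <stdint.h>

/* Assumption: highest valid user-space address in this model. */
#define MODEL_USER_TOP UINT64_C(0x00007ffffffff000)

/*
 * Models an unchecked allocation: carve len bytes below the user
 * stack pointer and return the result without validating it.
 */
static uint64_t model_alloc_unchecked(uint64_t user_sp, uint32_t len)
{
    /* Unsigned subtraction wraps modulo 2^64 when len exceeds user_sp. */
    return user_sp - (uint64_t)len;
}
```

With a 32-bit stack pointer of 0xffd00000 and a length of 0xfffffff0, the subtraction wraps and yields 0xffffffffffd00010, an address near the top of the 64-bit address space rather than anywhere in user space.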

Anvin patched compat_alloc_user_space() so that it always does the access_ok() check. That should take care of the two problem spots that Hawkes found as well as any others that are lurking out there. But there have been a whole lot of kernels released with one or both of these bugs, and there have been other bugs associated with 64-bit/32-bit compatibility. It is a part of the kernel that Hawkes calls "a little bit scary":

> Not just because it's an increased attack surface versus having purely 32-bit or purely 64-bit modes, but because of the type of input processing that has to be performed by any such compatibility layer. It invariably involves a significant amount of subtle bit wrangling between 32/64-bit values, using primitives that I'd argue most programmers aren't normally exposed to. The possibility of misuse and abuse is very real.
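The effect of always validating the pointer in compat_alloc_user_space() can be modeled in the same spirit. This is a sketch under assumptions, not Anvin's patch: `CHK_USER_TOP`, `chk_range_ok()` (a stand-in for what access_ok() does), and `chk_alloc()` are hypothetical names, and returning 0 stands in for failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: highest valid user-space address in this model. */
#define CHK_USER_TOP UINT64_C(0x00007ffffffff000)

/* Stand-in for a range check: [addr, addr + len) must lie below the limit. */
static bool chk_range_ok(uint64_t addr, uint64_t len)
{
    return len <= CHK_USER_TOP && addr <= CHK_USER_TOP - len;
}

/* Models a checked allocation: returns 0 when the result is not user memory. */
static uint64_t chk_alloc(uint64_t user_sp, uint64_t len)
{
    uint64_t ptr = user_sp - len;   /* may wrap around, as before */

    if (!chk_range_ok(ptr, len))
        return 0;                   /* reject instead of handing out the pointer */
    return ptr;
}
```

Because the check sits in the allocator itself, callers that compute a bad length get a failure back instead of a pointer into kernel memory.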

Perhaps 32-bit compatibility for x86_64 kernels would be a good starting point for regression testing. Some enterprise distributions were not affected by CVE-2010-3301 because of the ancient kernels (like RHEL's 2.6.18) they are based on, but the code vulnerable to CVE-2010-3081 was backported into RHEL 5, which required that kernel to be updated. The interests of distribution vendors would be well-served by better (or any) regression testing, so a project of that sort would be quite welcome. The vendors may already be running some tests internally, but regression testing is just the kind of project that would benefit from some cross-distribution collaboration.

It should also be noted that a posting to the full-disclosure mailing list claims that the vulnerability in compat_mc_getsockopt() has been known for nearly two-and-a-half years by black (or at least gray) hats. According to the post, it was noticed when the vulnerability was introduced in April 2008. Certainly some people are following the commit stream to try to find these kinds of vulnerabilities; it would be good if the kernel had a team of white hats doing the same.



The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 11:18 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

This may be a silly question (especially at this stage), but how hard would it be to do more of the 32bit/64bit compatibility in user space? Like say, would a special C library for mixed systems work? I realise of course that anyone doing their own system calls would probably also need some fixing, but people who do that sort of thing should be capable of handling it.

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 13:46 UTC (Thu) by nelhage (subscriber, #59579) [Link]

In order to run 32-bit code, the userspace process has to run in 32-bit compatibility mode at the hardware level -- this is a setting on x86_64 processors to temporarily emulate 32-bit processors. So the kernel needs at least that much support, since I believe entering and exiting compatibility mode is a privileged operation.

You could potentially imagine doing the syscall 32/64 marshalling in user space, but you have the problem that unless you do something really clever and/or scary, that marshalling has to run in 32-bit mode because of the above fact, even though it's trying to talk to 64-bit kernel interfaces. Since the 64-bit kernel interfaces rely on using e.g. the full 64-bit registers, this is probably impossible.

You could perhaps have some new interface that accepts 64-bit data, but in a form accessible to this 32-bit shim layer, but there would almost certainly be a performance cost associated.

There are other problems, too, such as this shim having to know about virtually every ioctl in the kernel, something that would be nigh-impossible to keep up to date.

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 15:01 UTC (Thu) by avik (guest, #704) [Link]

Switching modes is not a privileged operation; all it takes is a far jump (setting up the segment for the jump _is_ a privileged operation).

AFAIK the Windows kernel does not support 32-bit entries at all; the compatibility code is completely in userspace.

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 16:01 UTC (Thu) by nelhage (subscriber, #59579) [Link]

Interesting. So the kernel could conceivably set up a segment so that libc could jump into a 64-bit compat shim before making syscalls.

The problem of libc having to know about all these random ioctl calls so that it can marshal their parameters between 32-bit and 64-bit mode still exists, though, and I don't see a good solution there.

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 16:07 UTC (Thu) by avik (guest, #704) [Link]

This marshalling needs to exist anyway, the only question is whether in the kernel or userspace.

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 17:03 UTC (Thu) by nelhage (subscriber, #59579) [Link]

Right, but the kernel has to know the layout and structure of all of these structs, anyways, since it has to extract the data to use it. So it's only a little more work, comparatively, to also have 32-bit parse code.

Whereas currently, libc doesn't have to know anything about ioctl formats, it just passes a pointer along. And so if you compile a new kernel module that has some random new ioctl()s, and install the corresponding user programs, everything works. But if libc has to do the marshaling, I also need to update my libc, which is much harder.

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 17:35 UTC (Thu) by avik (guest, #704) [Link]

The trick is that the kernel provides the compatibility library, not libc.

(Userspace vs kernel) != (libc vs kernel)

64-bit pure distros?

Posted Sep 23, 2010 14:09 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

These days it seems to me that everything I run is 64-bit. My two proprietary software concessions (video driver, and flash) are 64-bit. Would it be feasible for popular distros to offer 64-bit pure systems, and configure out this compat layer from the kernel? I'd imagine that, build-wise, it would be pretty much equivalent to the normal 64+32 distro, except with a different kernel image and the package manager just wouldn't install the 32-bit binaries.

64-bit pure distros?

Posted Sep 23, 2010 15:26 UTC (Thu) by dag- (subscriber, #30207) [Link]

I was considering the same thing. Is it possible to disable the 32-bit compatibility mode by booting with some kernel boot parameter (or manipulating /proc after boot)?

64-bit pure distros?

Posted Sep 23, 2010 18:37 UTC (Thu) by dtlin (✭ supporter ✭, #36537) [Link]

This has been brought up before. See the first part of http://lwn.net/Articles/405955/, which instructs the kernel to execute everything looking like a 32-bit ELF using /bin/echo (which obviously doesn't execute much).

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 16:57 UTC (Thu) by nix (subscriber, #2304) [Link]

> This path allows an attacker to write a chosen value to anywhere within the top 31 bits of the kernel address space.

So, is that the top four bytes of the kernel address space, or every 8-billionth byte throughout the address space?

(I suspect what is meant is 'to anywhere within the top half'? Perhaps?)

(I am definitely being too pedantic.)

The hazards of 32/64-bit compatibility

Posted Sep 23, 2010 17:06 UTC (Thu) by nelhage (subscriber, #59579) [Link]

What is meant is the top ~2 billion (2^31) addresses. That is to say, any address accessible via a 31-bit offset from the top of the kernel address space.

The expression "top 31 bits of kernel address space" is a bit jargony, but I suspect most kernel developers would get what it means without thinking too hard.

The hazards of 32/64-bit compatibility

Posted Oct 3, 2010 23:00 UTC (Sun) by nix (subscriber, #2304) [Link]

I thought I knew what it meant, too, but the more I thought about it the more the meaning slipped away from me.

The hazards of 32/64-bit compatibility

Posted Sep 24, 2010 14:45 UTC (Fri) by price (guest, #59790) [Link]

The expression "N bits of address space" is used in the kernel tree itself in places like Documentation/x86/x86_64/mm.txt. It's well understood to mean "a region of address space 2^N bytes wide", but is more concise.
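The arithmetic behind the phrase can be written out in a few lines of C. This is only an illustrative sketch: it assumes a 64-bit address space whose last address is 2^64 - 1, and both function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of a region described as "n bits of address space" (n < 64). */
static uint64_t region_bytes(unsigned n)
{
    return UINT64_C(1) << n;
}

/* Lowest address of the top 2^n bytes of a 64-bit address space (n < 64). */
static uint64_t top_region_start(unsigned n)
{
    return UINT64_MAX - (region_bytes(n) - 1);
}
```

For n = 31 the region is 2,147,483,648 bytes wide and starts at 0xffffffff80000000, which is the range reachable with a 31-bit offset from the top.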

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds