
Two topics in user-space access

By Jonathan Corbet
March 5, 2019
Kernel code must often access data that is stored in user space. Most of the time, this access is uneventful, but it is not without its dangers and cannot be done without exercising due care. A couple of recent discussions have made it clear that this care is not always being taken, and that not all kernel developers fully understand how user-space access should be performed. The good news is that kernel developers are currently working on a set of changes to make user-space access safer in the future.

User-space and kernel addresses

The kernel provides a whole set of functions that allow kernel-space code to access user data. Naturally, these functions have to handle all of the possible things that might happen, including data that has been paged out to disk or addresses that don't point to any valid data at all. In the latter case, functions like copy_from_user() will report a failure (copy_from_user() itself returns the number of bytes that could not be copied), which the caller usually passes back to user space as -EFAULT. The faulty application, which is certainly checking for error returns from system calls, can then do the right thing.

Unpleasant things can happen, though, if the address passed in from user space points to kernel data. If the kernel actually dereferences such an address, it could allow an attacker to get at data that should be protected. The access_ok() function exists to prevent this from happening, but it can't work if kernel developers forget to call it before passing an address to low-level user-space access functions like __copy_from_user() (the higher-level functions call access_ok() internally). This particular omission has led to some severe vulnerabilities in the past.
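
As a rough illustration of the kind of range check that access_ok() performs, here is a small user-space model. It is a sketch only, not the kernel's implementation: the name model_access_ok() and the constant MODEL_USER_LIMIT are invented for this example, and the limit is assumed to be the classic 32-bit x86 default of 0xc0000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: user space ends at the 32-bit x86 3G mark. */
#define MODEL_USER_LIMIT UINT32_C(0xc0000000)

/* The model only makes sense if some addresses remain above the limit. */
static_assert(MODEL_USER_LIMIT < UINT32_MAX, "limit must leave kernel space");

/*
 * Hypothetical stand-in for an access_ok()-style check: the whole range
 * [addr, addr + size) must lie below the user-space limit.  The sum
 * addr + size is never computed, so a huge size cannot wrap around
 * and sneak past the comparison.
 */
static bool model_access_ok(uint32_t addr, uint32_t size)
{
    if (size > MODEL_USER_LIMIT)
        return false;
    return addr <= MODEL_USER_LIMIT - size;
}
```

The point of the model is that the check looks at the whole range rather than just the starting address; a low-level accessor that skips it will happily follow whatever address it is given.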

This problem was, until recently, aggravated by the fact that, if an attacker tried to exploit a missing-access_ok() vulnerability using a kernel-space address that turned out to be invalid, the kernel would helpfully return -EFAULT. That would allow attackers to probe the kernel's address space at leisure until the target data structures had been found. Back in August 2018, Jann Horn added a check to catch this case and cause a kernel oops when it happens; attackers with access to a missing-access_ok() vulnerability were deprived of the ability to quietly dig around in kernel space, but there were some other, unexpected consequences as well.

As reported by Changbin Du, kernel probes ("kprobes") can be configured to access strings at any address — in either user or kernel space. The chances of such probes seeing invalid addresses are relatively high and, after Horn's patch, they would cause a kernel oops. Linus Torvalds pulled the suggested fix, but objected to the idea that a single function in kprobes (or anywhere else in the kernel) could accept both user-space and kernel addresses and manage to tell them apart.

On most architectures supported by Linux, it is usually relatively easy to distinguish user-space addresses from kernel-space addresses; that is because the two are confined to different parts of the overall address space. On 32-bit x86 systems, the default was for user space to own addresses below 0xc0000000, with the kernel owning everything above that point. Among other things, this layout improves performance by avoiding the need to change page tables when switching between user and kernel mode. But there is nothing that requires the address space to be laid out that way. A classic example is the "4G:4G" mode for x86, which gave the entire 32-bit address space to user space, then switched page tables on entry into the kernel so that the kernel, too, had the full address space.

When something like 4G:4G is in effect, the same address can be meaningful in both user and kernel space, but will point to different data. There is, at that point, no way to reliably distinguish the two types of addresses just by looking at them. There are other environments where the address spaces can overlap in this way, and defensive technologies like kernel page-table isolation are pushing even plain x86 systems in that direction. As a result, any attempt to handle both user-space and kernel addresses without knowing which they are is going to end in grief sooner or later. That explains why Torvalds became so unhappy at any attempt to do so.
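
The overlap can be made concrete with a toy model in which each address space is just a lookup table. Everything here is hypothetical (the struct names, model_read(), and the addresses chosen); real page tables are far more involved. The sketch only shows that the numeric value of an address says nothing about which data it names once the two spaces are allowed to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One toy "page-table entry": a virtual address and the byte found there. */
struct model_mapping {
    uint32_t vaddr;
    char data;
};

/* A toy address space is nothing more than a list of such entries. */
struct model_address_space {
    const struct model_mapping *maps;
    size_t nr_maps;
};

/* Look up vaddr in one address space; returns 0 on success, -1 if unmapped. */
static int model_read(const struct model_address_space *as, uint32_t vaddr,
                      char *out)
{
    assert(as != NULL && out != NULL);
    for (size_t i = 0; i < as->nr_maps; i++) {
        if (as->maps[i].vaddr == vaddr) {
            *out = as->maps[i].data;
            return 0;
        }
    }
    return -1;
}

/* The same address, 0xd0000000, is mapped in both spaces to different data. */
static const struct model_mapping model_user_maps[] = {
    { UINT32_C(0xd0000000), 'U' },
};
static const struct model_mapping model_kernel_maps[] = {
    { UINT32_C(0xd0000000), 'K' },
    { UINT32_C(0xe0000000), 'k' },
};
static const struct model_address_space model_user_space = {
    model_user_maps, 1
};
static const struct model_address_space model_kernel_space = {
    model_kernel_maps, 2
};
```

Reading 0xd0000000 through model_user_space and through model_kernel_space succeeds in both cases but yields different bytes, so a function handed only the number cannot know which result the caller wanted.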

The solution for kprobes will be to require accesses to specify whether they are meant for user space or kernel space. To that end, Masami Hiramatsu has been working on a patch set to add a new set of accessors for user-space data. Once those are added, and after some time has passed, it's likely that the current accessors will be changed to work with kernel-space data only.

Kprobes are not the only place where addresses have been mixed up in this way; it turns out that BPF programs will call bpf_probe_read() with either type of address and expect it to work. Changing that, Alexei Starovoitov said, could break existing user code. Torvalds responded, though, that: "It appears that people haven't understood that kernel and user addresses are distinct, and may have written programs that are fundamentally buggy". He would like to see such uses start to fail on the x86 architecture sometime soon so that users will fix their code before something more unpleasant happens.

The solution here will be similar to what is being done with kprobes. Two new functions (with names like bpf_user_read() and bpf_kernel_read()) will be introduced, and developers will be strongly encouraged to convert their code over to them. Eventually, bpf_probe_read() will go away entirely. But, as Torvalds noted, that will not be happening in the immediate future: "It's really a 'long-term we really need to fix this', where the only question is how soon 'long-term' is".
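
One way to picture the split into separate accessors is a user-space sketch in which user and kernel addresses are distinct C types, so the caller must state which kind of address is being read. This is not the proposed kernel or BPF API; model_user_read(), model_kernel_read(), the wrapper types, and the two backing arrays are all assumptions made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MEM_SIZE 16

/* Two separate toy memories; an "address" is an offset into one of them. */
static const char model_user_mem[MODEL_MEM_SIZE] = "user-data";
static const char model_kernel_mem[MODEL_MEM_SIZE] = "kernel-data";

/* Distinct wrapper types: the compiler rejects passing one for the other. */
typedef struct { size_t off; } model_uaddr_t;
typedef struct { size_t off; } model_kaddr_t;

/* Shared bounds-checked copy; fails with -EFAULT on an out-of-range read. */
static int model_copy(void *dst, const char *mem, size_t off, size_t len)
{
    assert(dst != NULL);
    if (off > MODEL_MEM_SIZE || len > MODEL_MEM_SIZE - off)
        return -EFAULT;
    memcpy(dst, mem + off, len);
    return 0;
}

/* Reads that can only ever touch the toy user memory. */
static int model_user_read(void *dst, model_uaddr_t src, size_t len)
{
    return model_copy(dst, model_user_mem, src.off, len);
}

/* Reads that can only ever touch the toy kernel memory. */
static int model_kernel_read(void *dst, model_kaddr_t src, size_t len)
{
    return model_copy(dst, model_kernel_mem, src.off, len);
}
```

With this arrangement, offset 0 means two different things depending on which function is called, and handing a model_kaddr_t to model_user_read() is a compile-time error rather than a run-time guess.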

Keeping user space walled off

While the kernel must often access user space, unpleasant things can happen when the kernel does so accidentally. Many types of attacks depend on getting the kernel to read data (or execute code) that is located in user space and under the attacker's control. To prevent such things from happening, processor vendors have implemented features to prevent the kernel from accessing user-space pages from random places. Intel's supervisor-mode access prevention (SMAP) and Arm's privileged access never (PAN) mechanisms are examples of this type of feature; when this protection is available, the kernel tries to make use of it.

This protection must, of course, be removed whenever the kernel legitimately needs to get at user-space memory. For the most part, this is handled within the user-space access functions themselves, but there are cases where higher-level code may need to disable user-space access protection. If nothing else, the instructions to enable and disable protection are expensive, so code that performs a series of accesses can be sped up by just disabling protection once for the entire series. This is managed with calls to functions like user_access_begin() and user_access_end().
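
The cost argument can be sketched with a user-space model that simply counts how often protection is toggled. The functions below are stand-ins with invented names (model_user_access_begin() and friends); they do not touch any hardware state and do not mirror the real kernel signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool model_uaccess_open;     /* true while "protection" is disabled */
static unsigned int model_toggles;  /* number of enable/disable operations */

/* Stand-in for disabling user-space access protection. */
static void model_user_access_begin(void)
{
    assert(!model_uaccess_open);    /* the model does not allow nesting */
    model_uaccess_open = true;
    model_toggles++;
}

/* Stand-in for re-enabling user-space access protection. */
static void model_user_access_end(void)
{
    assert(model_uaccess_open);
    model_uaccess_open = false;
    model_toggles++;
}

/* Copy one element, opening and closing the access window itself. */
static void model_get_one(int *dst, const int *src)
{
    model_user_access_begin();
    *dst = *src;
    model_user_access_end();
}

/* Element-by-element copy: returns the number of toggles it caused. */
static unsigned int model_copy_each(int *dst, const int *src, size_t n)
{
    unsigned int before = model_toggles;

    for (size_t i = 0; i < n; i++)
        model_get_one(&dst[i], &src[i]);
    return model_toggles - before;
}

/* Batched copy: one window around the whole series of accesses. */
static unsigned int model_copy_batched(int *dst, const int *src, size_t n)
{
    unsigned int before = model_toggles;

    model_user_access_begin();
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    model_user_access_end();
    return model_toggles - before;
}
```

For n elements, the first variant toggles protection 2n times while the batched variant toggles it twice, which is the saving that motivates opening the window once for a whole series.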

The code that runs with user-space access protection disabled should be as short as possible. The more code that runs, the bigger the chance that it could contain an exploitable bug. But there is another hazard to be aware of: a call to schedule() could result in another process taking over the processor — with user-space access protection still disabled. Once that happens, there is no knowing when protection could be enabled again or how much buggy code might be executed in the meantime.

The desire to prevent this situation is why user_access_begin() comes with a special rule: users should call no other functions while user-space access prevention is disabled. But, as Peter Zijlstra noted, this rule is "currently unenforced and (therefore obviously) violated". That seems likely to change, though, as a result of his patch set enhancing the objtool utility with the ability to identify (and complain about) function calls in these sections of code. Functions known to be safe to call can be specially marked; the functions that perform user-space access are about the only obvious candidates for this annotation.
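
The rule being checked can be modeled in a few lines. The sketch below is not objtool and makes no claim about how the patch set works internally; it scans a made-up, straight-line list of operations (real code has branches that a real tool must follow) and reports the first call to a function not marked as safe while the access window is open. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three kinds of operation this toy checker understands. */
enum model_op_kind {
    MODEL_OP_BEGIN,   /* stand-in for user_access_begin() */
    MODEL_OP_END,     /* stand-in for user_access_end() */
    MODEL_OP_CALL     /* a call to some other function */
};

struct model_op {
    enum model_op_kind kind;
    bool uaccess_safe;  /* for calls only: marked safe inside the window? */
};

/*
 * Scan a straight-line sequence of operations.  Returns the index of the
 * first call to an unmarked function made while the window is open, or -1
 * if the sequence follows the rule.
 */
static int model_check(const struct model_op *ops, size_t n)
{
    bool open = false;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (ops[i].kind) {
        case MODEL_OP_BEGIN:
            open = true;
            break;
        case MODEL_OP_END:
            open = false;
            break;
        case MODEL_OP_CALL:
            if (open && !ops[i].uaccess_safe)
                return (int)i;
            break;
        }
    }
    return -1;
}
```

A sequence such as begin, safe call, unmarked call, end is flagged at the unmarked call, while the same unmarked call placed after the end passes.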

Both of these cases show that user-space access is trickier and less well understood than many developers expect. A couple of long-time kernel developers (at least) were surprised to learn that any particular address can be valid (but mapped differently) in both kernel and user space. It seems, though, that at least some of these problems can be addressed with better APIs and better tools.

Index entries for this article
Kernel: copy_*_user()
Kernel: Security/Security technologies



Two topics in user-space access

Posted Mar 5, 2019 19:40 UTC (Tue) by nix (subscriber, #2304) [Link]

It's terribly easy to get this wrong. DTrace got it right from year zero in the early 2000s, with copyin() functions the user had to call explicitly to get data from userspace, because the SPARC has multiple architectural address spaces, and Solaris used them as you'd expect to separate userspace and kernelspace. But because for a long time x86 used only one address space... things regressed in the Linux x86 port, and it was only when I upgraded to a server with SMAP that things started failing anywhere I could see so I had a chance to fix it and put in appropriate SMAP-disabling instructions in the right places.

The real problem isn't that these accessors are needed -- it's that so many arches have no way to make the lack of them obvious. There should always be a way, even an expensive way that nobody would ever run when not debugging, to have your tests explode if this sort of thing isn't being dealt with right. But right now it's down to finding hardware that has the feature, and that's tricky when it's hard to even figure out before you buy a machine whether it has SMAP or not: x86 is improving, but even today there are x86 CPUs on sale that don't have SMAP :( and most of the installed base doesn't, no doubt including most of the installed base of developers trying to work with stuff like this.

Two topics in user-space access

Posted Mar 6, 2019 2:12 UTC (Wed) by klossner (subscriber, #30046) [Link] (1 responses)

> A couple of long-time kernel developers (at least) were surprised to learn that any particular address can be valid (but mapped differently) in both kernel and user space.

Those of us who are really long-term developers expect this thanks to our experience with 16-bit machines. On a PDP-11/45 running Unix, there was no way that kernel and userspace could have shared a 64KB address space.

Two topics in user-space access

Posted Mar 18, 2019 21:51 UTC (Mon) by valarauca (guest, #109490) [Link]

>Those of us who are really long-term developers expect this thanks to our experience with 16-bit machines. On a PDP-11/45 running Unix, there was no way that kernel and userspace could have shared a 64KB address space.

They actually did. This is where `setbrk` came from, as it described where your program ended, and where the kernel started.

It should also be noted that there was no memory protection, so when tasks were switched, the whole "userland" address space (south of setbrk) was written to disk, and another chunk of memory was loaded and jumped to.

IBM's System/360 was the first computer to do virtual memory in hardware.

Two topics in user-space access

Posted Mar 6, 2019 3:39 UTC (Wed) by unixbhaskar (guest, #44758) [Link]

Having two different and distinct calls makes sense for the eBPF stuff. Access and end should demarcate the session within it.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds