A canary for timer-expiration functions
A bug that allows an attacker to overwrite a function pointer in the kernel opens up a relatively easy way to compromise the kernel, doubly so if the attacker simply needs to wait for the kernel to use the compromised pointer. There are various techniques that can be used to protect kernel function pointers that are set at either compile or initialization time, but there are some pointers that are routinely set as the kernel runs; timer-expiration functions are a good example. An RFC patch posted to the kernel-hardening mailing list would add a way to detect that those function pointers have been changed in an unexpected way and to stop the kernel from executing that code.
The patch from Kees Cook is targeting a class of vulnerabilities that arrange to overwrite the function field in struct timer_list. That field is the function that will be called when the timer expires and it conveniently (from an attacker's perspective) passes the next field in that structure (data) to the function. So an attacker who finds a way to overwrite function can likely overwrite data as well, leading to a fairly straightforward way to call some code of interest and to pass it an argument. As Cook put it: "This provides attackers with a ROP-like primitive for performing limited kernel function calls without needing all the prerequisites to stage a ROP attack."
Exploits
In the patch, he pointed to two recent exploits that used this technique. The first was described by its discoverer, Philip Pettersson, in December 2016. It uses an AF_PACKET socket (for raw socket handling as used by tools like tcpdump) and manipulates the version of the packet-socket API requested using setsockopt(). By changing from TPACKET_V3 to TPACKET_V1 at just the right time (to take advantage of a race condition), his demonstration exploit will cause the memory containing a timer_list to be freed without deleting the timer.
So the timer object will be used by the kernel after it has been freed. By arranging for that memory to be reallocated somewhere that the attacker can write to (Pettersson mentions using the add_key() system call to do so), function and data can be overwritten. In the example, it actually does that twice, first to change the vsyscall table from read-only to read-write, then to register a world-writable sysctl (/proc/sys/hack) that sets the path of the modprobe executable. It arranges that the exploit program gets run as modprobe (as root), which leads to the execution of a root shell.
The second recent exploit was the subject of a lengthy Project Zero blog post in May by Andrey Konovalov, who discovered the flaw using the syzkaller fuzzer. It uses a heap buffer overflow in the AF_PACKET code. By arranging the heap appropriately and sending a packet with the contents of interest (the exploit code, effectively), it will place the code into memory. But that memory is user-space memory, and the Intel supervisor mode access protection (SMAP) and supervisor mode execution protection (SMEP) features will prevent the kernel from directly accessing or executing code there. So Konovalov used the same technique as Pettersson to simply disable those protections by calling native_write_cr4() to change the CR4 register bits as the expiration function of a socket timer.
Once that is done, it sets up a new compromised socket and ring buffer combination that turns a transmit function pointer into a pointer to a commit_creds(prepare_kernel_cred(0)) call in user space. Then simply transmitting a packet using the socket invokes the code, which gives the current process root privileges.
It is interesting to note that both of these vulnerabilities can be exploited by non-privileged users on distributions (e.g. Fedora, Ubuntu) where user namespaces are enabled and unrestricted. Both require the CAP_NET_RAW capability to create packet sockets, which can be acquired by unprivileged users by creating a new user namespace. While the problem is not directly attributable to the user namespace code itself, it does further highlight the dangers of expanding user privileges that namespaces provide. Both Pettersson and Konovalov warn against allowing unprivileged users to create user namespaces.
Both also avoid kernel address-space layout randomization (KASLR), SMAP, and SMEP protections. Pettersson's exploit uses hardcoded offsets for the calls of interest to avoid KASLR, while Konovalov reads dmesg to pluck out the kernel's text address. SMAP/SMEP are either bypassed by using kernel memory directly (Pettersson) or by explicitly disabling the features (Konovalov).
A possible fix
Cook's patch would add a canary field to struct timer_list just prior to the function field. When a timer is initialized, the canary would be set to a value calculated by XORing the addresses of the timer and the function, along with a random number that only the kernel would know. The idea is that the canary value would also be overwritten if the function pointer is. So, before calling the function when the timer expires, the canary would be recalculated and compared with the stored value; if they differ, the function pointer has been changed and will not be called. A warning will be logged as well.
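The check described above can be sketched in user-space C. This is only an illustration of the XOR scheme as the article describes it, not code from Cook's patch: the struct, the `sketch_` helper names, and the fixed secret are all assumptions made for the example (a real kernel would pick the secret randomly at boot).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct timer_list; only function and data
 * come from the article, everything else is an assumption. */
struct sketch_timer {
    uintptr_t canary;                    /* sits just before function */
    void (*function)(unsigned long);
    unsigned long data;
};

/* Assumption: fixed here for the sketch; meant to be a boot-time random value. */
static const uintptr_t sketch_secret = (uintptr_t)0x5ca1ab1e0ddba11ULL;

/* Callbacks used to exercise the sketch. */
static unsigned long last_seen;
static void legit_cb(unsigned long d) { last_seen = d; }
static void other_cb(unsigned long d) { last_seen = d + 1000; }

/* XOR of the timer's address, the callback's address, and the secret. */
static uintptr_t sketch_canary(const struct sketch_timer *t)
{
    return (uintptr_t)t ^ (uintptr_t)t->function ^ sketch_secret;
}

/* Initialize the timer and record the canary for its current callback. */
static void sketch_setup_timer(struct sketch_timer *t,
                               void (*fn)(unsigned long), unsigned long data)
{
    assert(fn != NULL);
    t->function = fn;
    t->data = data;
    t->canary = sketch_canary(t);
}

/* Returns 1 if the callback ran, 0 if the canary check rejected it. */
static int sketch_expire_timer(struct sketch_timer *t)
{
    if (t->canary != sketch_canary(t)) {
        fprintf(stderr, "timer %p: canary mismatch, not calling function\n",
                (void *)t);
        return 0;
    }
    t->function(t->data);
    return 1;
}
```

Setting up a timer with `legit_cb` and expiring it runs the callback; overwriting `function` with `other_cb` afterward, or copying the structure to a different address, makes the recomputed value differ from the stored one, so the call is refused. Note that, as described, the value covers the function pointer and the timer's address but not the data field.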
Unfortunately, Cook soon realized that his patch was incomplete. He had addressed timers that were set up using the setup_timer_*() macros and the add_timer() function, but missed many static timer initializations that use DEFINE_TIMER(). He promised a revised version of the patch to handle that case.
But it turns out that will require some extensive refactoring of the timer code, Cook said in response to an email query. That is a bigger job than he expected, but it does provide a nice cleanup. He may also have to weaken the canary for the static timers, as he noted in the patch followup. As with many patch sets that change code across the tree, getting something like that into the mainline may be difficult. Cook outlined some of the problems he and others have encountered trying to do so in a ksummit-discuss thread back in June.
As the two exploits showed, though, the problem is real. Some kind of solution that would simply eliminate that class of vulnerabilities would be welcome. Whether Cook's canary can be that solution remains to be seen, however.
Posted Aug 16, 2017 18:46 UTC (Wed)
by josh (subscriber, #17465)
[Link] (1 responses)
Posted Aug 16, 2017 23:36 UTC (Wed)
by kees (subscriber, #27264)
[Link]
In a perfect world, we could just identify all timer function callbacks by a common class and use CFI[2] to isolate timers to only calling timer callbacks. Until we have CFI, though, this seems like a nice change to make. (The refactoring is large but mostly mechanical, and would seem to have stand-alone benefit.)
[1] http://www.openwall.com/lists/kernel-hardening/2017/08/16/18
Posted Aug 17, 2017 10:04 UTC (Thu)
by epa (subscriber, #39769)
[Link] (9 responses)
An uglier approach, but perhaps easier to transition to, is to make a big static array (in a piece of generated C code) containing the address of every function in the kernel. Then you replace function pointers with an index into this array. Now an attacker can jump to an arbitrary kernel function, but not to arbitrary addresses. A small refinement is to only store the needed functions in the array -- kernel functions which aren't currently referenced by function pointers don't need to appear.
Would it be possible to use sparse or another static analyser to automate converting function pointer code to this style?
In user space, could it have benefits too?
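A minimal user-space sketch of the index-into-a-table idea follows. The callback names, the enum, and `sketch_call_by_id()` are hypothetical, chosen only for illustration; nothing here is generated from a real kernel.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_fn)(unsigned long);

/* Two stand-in callbacks; they only bump a counter so the effect is visible. */
static unsigned long sketch_hits;
static void blink_led(unsigned long data)   { sketch_hits += data; }
static void flush_queue(unsigned long data) { sketch_hits += 2 * data; }

/* Callers store one of these small integers instead of a raw pointer. */
enum sketch_fn_id { FN_BLINK_LED, FN_FLUSH_QUEUE, FN_COUNT };

/* The table is const, so it can live in read-only memory. */
static const sketch_fn sketch_fn_table[FN_COUNT] = {
    [FN_BLINK_LED]   = blink_led,
    [FN_FLUSH_QUEUE] = flush_queue,
};

/* Returns 0 after calling the selected function, -1 if id is out of range. */
static int sketch_call_by_id(unsigned int id, unsigned long data)
{
    if (id >= FN_COUNT)
        return -1;
    assert(sketch_fn_table[id] != NULL);
    sketch_fn_table[id](data);
    return 0;
}
```

An overwritten index can still select a different entry in the table, but an out-of-range value such as 99 is rejected with a single comparison rather than being treated as an address.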
Posted Aug 17, 2017 10:30 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (6 responses)
So you're going to forbid out-of-tree modules?
Posted Aug 20, 2017 9:26 UTC (Sun)
by jzbiciak (guest, #5246)
[Link] (4 responses)
You could, in principle, allow modules to register their entry points as well. Perhaps structure it as a tuple, with <module,entry> pairs, with only the module part being determined dynamically at module-load time. Indeed, we implemented something like this in a small secure-kernel + secure-module API that I helped develop for an embedded processor. Each module defined its set of entry points, and received a load-time assigned module ID. The module ID did require an extra level of indirection to find the entry point table for the module, so that was one downside.
Posted Aug 20, 2017 9:35 UTC (Sun)
by Sesse (subscriber, #53779)
[Link] (3 responses)
Posted Aug 20, 2017 9:47 UTC (Sun)
by jzbiciak (guest, #5246)
[Link] (2 responses)
It's still fixed tables of addresses, converted to small, easily-validated numbers. I can validate that the module ID and index are both in-range with a single comparison each. The tables themselves can live in read-only memory as well. This limits the attacker to only being able to select among the fixed list of entry points determined at compile time, while still permitting loadable modules. Now, if someone could modprobe evil.ko, then yeah, they can subvert this. But if they can do that, there are much shorter, more obvious paths to running their code.
Posted Aug 20, 2017 9:53 UTC (Sun)
by Sesse (subscriber, #53779)
[Link] (1 responses)
Posted Aug 20, 2017 10:16 UTC (Sun)
by jzbiciak (guest, #5246)
[Link]
I'm saying that each module provides its own table in an rodata type of section that can be mapped read-only. The only dynamic run-time bit, then, is a module ID, which indexes a writable table of pointers to read-only tables of function pointers. All of the tables of function pointers at least are read-only. Something like this very rough sketch:
Now, since module_fxn_table_ptrs only needs modification at module load time, you could imagine spending the cost of establishing a writeable mapping to it during module load, and then discarding that writeable mapping once the module's table is registered. That means you'd only have read-only tables for these function pointers under normal circumstances, with a small window during module load where a writeable mapping exists.
Of course, that idea assumes we don't have a full MMU to work with (as was the case on that embedded processor). We do have a full MMU at our disposal. So here's an even better idea that keeps every table read-only the entire time.
Put the function pointer tables in their own dedicated multiple-of-the-page sized section. Let's call it .rodata.fxn_ptr. The kernel maps its own .rodata.fxn_ptr read-only at some virtual address determined at runtime (ASLR). As modules get loaded, map their .rodata.fxn_ptr pages directly after the kernel's, also read-only. The module ID now just becomes "how many pages to skip to get to the start of my module's table." That also removes the indirection. When a module gets unloaded, unmap its table.
Now, you do have a new resource to manage. But, if you force everyone into no more than, say, 8K, that still gives you 1024 entry points on a 64-bit machine.
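The arithmetic behind that last layout can be sketched as follows. This assumes every table, the kernel's included, gets the same fixed 8K budget and that tables are laid out back to back; the `SKETCH_` names and `sketch_slot_offset()` are hypothetical and only model the offset calculation, not any actual mapping.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TABLE_BYTES 8192u    /* the "say, 8K" budget per table */
#define SKETCH_SLOTS (SKETCH_TABLE_BYTES / sizeof(void *))
#define SKETCH_BAD_SLOT ((size_t)-1)

/* Byte offset, from the start of the kernel's table, of slot `entry` in
 * table `module_id` (0 is the kernel itself). `loaded_tables` is how many
 * tables are currently mapped. Returns SKETCH_BAD_SLOT if either the
 * module ID or the entry index is out of range. */
static size_t sketch_slot_offset(unsigned int module_id, unsigned int entry,
                                 unsigned int loaded_tables)
{
    /* Sanity check: the budget must hold a whole number of pointers. */
    assert(SKETCH_TABLE_BYTES % sizeof(void *) == 0);

    if (module_id >= loaded_tables || entry >= SKETCH_SLOTS)
        return SKETCH_BAD_SLOT;
    return (size_t)module_id * SKETCH_TABLE_BYTES
         + (size_t)entry * sizeof(void *);
}
```

With 8-byte pointers, `SKETCH_SLOTS` works out to 8192 / 8 = 1024, matching the figure above, and both range checks remain a single comparison each.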
Posted Aug 21, 2017 15:45 UTC (Mon)
by josh (subscriber, #17465)
[Link]
There are already config options in the kernel that you might have to change if you want to build out-of-tree modules, such as not pruning away unused symbols that no in-tree module uses. This would simply be another such option.
Posted Aug 17, 2017 11:00 UTC (Thu)
by NAR (subscriber, #1313)
[Link] (1 responses)
I'm afraid it takes only one exploit writer to develop a technique using only the available functions to defeat this mechanism...
Posted Aug 18, 2017 7:30 UTC (Fri)
by epa (subscriber, #39769)
[Link]
[2] http://www.openwall.com/lists/kernel-hardening/2017/08/05/1
```c
/* This part lives in the kernel */
void *const kernel_fxn_ptrs[] = { ... }; /* this is read-only */
void *const *module_fxn_table_ptrs[MAX_MODULES]; /* this is read-write */

/* This part lives in each module */
void *const module_fxn_ptrs[] = { ... }; /* this is read-only */
```
the most an attacker can do, if they manage to overwrite the enum value, is to cause control to jump to a different one of the functions in the set.
