Cook: Security things in Linux v5.3

[Posted November 15, 2019 by corbet]

Kees Cook catches up with the security improvements in the 5.3 kernel. "In recent exploits, one of the steps for making the attacker’s life easier is to disable CPU protections like Supervisor Mode Access (and Execute) Prevention (SMAP and SMEP) by finding a way to write to CPU control registers to disable these features. For example, CR4 controls SMAP and SMEP, where disabling those would let an attacker access and execute userspace memory from kernel code again, opening up the attack to much greater flexibility. CR0 controls Write Protect (WP), which when disabled would allow an attacker to write to read-only memory like the kernel code itself. Attacks have been using the kernel’s CR4 and CR0 writing functions to make these changes (since it’s easier to gain that level of execute control), but now the kernel will attempt to 'pin' sensitive bits in CR4 and CR0 to avoid them getting disabled. This forces attacks to do more work to enact such register changes going forward."
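As a rough illustration of what "pinning" means, here is a minimal user-space sketch. It assumes plain variables (`fake_cr4`, `fake_cr0`) standing in for the real control registers, because ordinary code cannot execute a `mov` to CR4 or CR0. The helper names `sketch_write_cr4()` and `sketch_write_cr0()` are hypothetical, and this is not the kernel's actual `native_write_cr4()`. Only the bit positions are architectural: SMEP is bit 20 and SMAP is bit 21 of CR4, and WP is bit 16 of CR0.

```c
#include <stdio.h>

/* Architectural x86 bit positions. */
#define CR4_SMEP (1UL << 20)
#define CR4_SMAP (1UL << 21)
#define CR0_WP   (1UL << 16)

/* Hypothetical stand-ins for the real control registers. */
static unsigned long fake_cr4;
static unsigned long fake_cr0;

/* Bits that must never be cleared once the system has booted. */
static const unsigned long cr4_pinned_bits = CR4_SMEP | CR4_SMAP;
static const unsigned long cr0_pinned_bits = CR0_WP;

/*
 * Hypothetical pinned write: any pinned bit the caller tried to clear is
 * forced back on before the "register" is updated. The bits that went
 * missing are reported on stderr and returned (0 if none).
 */
static unsigned long sketch_write_cr4(unsigned long val)
{
    unsigned long missing = cr4_pinned_bits & ~val;

    fake_cr4 = val | cr4_pinned_bits;
    if (missing)
        fprintf(stderr, "CR4 bits went missing: %lx\n", missing);
    return missing;
}

static unsigned long sketch_write_cr0(unsigned long val)
{
    unsigned long missing = cr0_pinned_bits & ~val;

    fake_cr0 = val | cr0_pinned_bits;
    if (missing)
        fprintf(stderr, "CR0 bits went missing: %lx\n", missing);
    return missing;
}
```

For example, an attacker-controlled call such as `sketch_write_cr4(0)` returns the SMEP|SMAP mask and still leaves both bits set in `fake_cr4`. The sketch only guards this one write path. Code that executes its own `mov` to the register bypasses it entirely.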

Modify CR4 and CR0 directly using mov

Posted Nov 15, 2019 14:49 UTC (Fri) by zainryan (guest, #131584)

What if the attacker directly executes "mov %0,%%cr4" instead of calling a function like native_write_cr4? Then there is really no way to "pin" the bits, right?

Modify CR4 and CR0 directly using mov

Posted Nov 15, 2019 15:16 UTC (Fri) by farnz (subscriber, #17727)

If the attacker can execute arbitrary code in ring 0, they have full control of the kernel, and nothing done in kernel code can protect against them.

The point of pinning here is that there are techniques like ROP that allow you to call your choice of kernel code with your choice of arguments. By teaching native_write_cr4 to never unset some security-relevant bits, you've forced the attacker to find a harder route to arbitrary code execution than just "call native_write_cr4 to disable SMEP and then branch to my formerly userspace code". This, in turn, makes it less likely that any given bug will be actively exploited - it becomes harder to write a reliable exploit - and gives the kernel developers more time between "this bug permits ROP" and "this bug is being exploited in the wild".

Cook: Security things in Linux v5.3

Posted Nov 18, 2019 1:05 UTC (Mon) by NYKevin (subscriber, #129325)

> Gustavo A.R. Silva landed the last handful of implicit fallthrough fixes left in the kernel, which allows for -Wimplicit-fallthrough to be globally enabled for all kernel builds. This will keep any new instances of this bad code pattern from entering the kernel again. With several hundred implicit fallthroughs identified and fixed, something like 1 in 10 were missing breaks, which is way higher than I was expecting, making this work even more well justified.

That is a frightening statistic. Thanks to Mr. Silva for fixing those bugs!
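To make the distinction concrete, here is a small hypothetical example. The permission levels and function names are invented for illustration. With GCC 7 or later, `-Wimplicit-fallthrough` warns about the first function, where a `break` was simply forgotten, but not about the second, where each fallthrough is marked as intentional with `__attribute__((fallthrough));`. Kernel code of the v5.3 era marked intentional cases with `/* fall through */` comments instead, which GCC also recognizes at the warning's default level.

```c
/* Hypothetical permission levels and bits, for illustration only. */
enum level { LEVEL_READ, LEVEL_WRITE, LEVEL_ADMIN };

#define PERM_READ  0x1
#define PERM_WRITE 0x2
#define PERM_ADMIN 0x4

/* Buggy: LEVEL_READ is meant to yield PERM_READ alone. */
static int perms_buggy(enum level l)
{
    int perms = 0;

    switch (l) {
    case LEVEL_READ:
        perms = PERM_READ;
        /* BUG: missing break; the compiler warns about this line */
    case LEVEL_WRITE:
        perms = PERM_READ | PERM_WRITE;
        break;
    case LEVEL_ADMIN:
        perms = PERM_READ | PERM_WRITE | PERM_ADMIN;
        break;
    }
    return perms;
}

/* Intentional: each level also receives the permissions below it. */
static int perms_cumulative(enum level l)
{
    int perms = 0;

    switch (l) {
    case LEVEL_ADMIN:
        perms |= PERM_ADMIN;
        __attribute__((fallthrough));
    case LEVEL_WRITE:
        perms |= PERM_WRITE;
        __attribute__((fallthrough));
    case LEVEL_READ:
        perms |= PERM_READ;
        break;
    }
    return perms;
}
```

In the buggy version, a read-only caller silently gains write permission. This is the kind of error the annotation-or-warning rule is meant to surface at build time.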

Cook: Security things in Linux v5.3

Posted Nov 19, 2019 19:25 UTC (Tue) by rweikusat2 (subscriber, #117920)

The Linux 5.3-rc3 kernel (the most recent kernel source I have here) contains 265,350 switch cases. If "several hundred" of these lacked a break, that would be at most 999, i.e. 0.38%, and at least 200, i.e. 0.08%. Assuming that "something like 1 in 10" really means "1 in 10" and not "fewer than 1 in 10", 90% of these were correct. Put differently, relative to the total number of switch cases, between 0.038% and 0.0008% of the switch cases in the tree were erroneously missing a break. Further, the fact that these missing breaks hadn't already been fixed strongly suggests that these were all switch cases which were never tested; in other words, they were essentially dead code.

I'm purposely not adding an interpretation here, just some missing facts and a very probable conjecture.

Cook: Security things in Linux v5.3

Posted Nov 19, 2019 19:38 UTC (Tue) by rweikusat2 (subscriber, #117920)

The correct lower bound for the wrong cases is 0.008%, not 0.0008%.
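The bounds discussed in the two comments above can be checked with a few lines of C. The inputs come from the thread: 265,350 switch cases, 200 to 999 fallthroughs, and a 1-in-10 rate of genuinely missing breaks. The helper names are made up for illustration.

```c
/* Inputs quoted in the thread. */
#define TOTAL_CASES   265350.0 /* switch cases in 5.3-rc3 */
#define MISSING_RATIO 0.1      /* "something like 1 in 10" */

/* Percentage of all switch cases that n cases represent. */
static double pct_of_cases(double n)
{
    return 100.0 * n / TOTAL_CASES;
}

/*
 * Percentage of all switch cases that were genuinely missing a break,
 * given n implicit fallthroughs found in total.
 */
static double pct_missing_break(double n)
{
    return pct_of_cases(n * MISSING_RATIO);
}
```

With these inputs, `pct_of_cases(200)` is about 0.075 and `pct_of_cases(999)` about 0.376. The missing-break share therefore falls between about 0.0075% (`pct_missing_break(200)`), roughly 0.008%, and about 0.038% (`pct_missing_break(999)`).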

Cook: Security things in Linux v5.3

Posted Nov 21, 2019 7:09 UTC (Thu) by bosyber (guest, #84963)

Still, a single mistake can be enough under the right (or wrong) circumstances if it can be leveraged. So a small percentage of a large number mainly shows how hard it is to make things secure. That said, your comment does add some useful context to these numbers, thanks.

Cook: Security things in Linux v5.3

Posted Nov 21, 2019 18:59 UTC (Thu) by rweikusat2 (subscriber, #117920)

The kernel has obviously improved because these bugs were fixed. But calling this "a frightening statistic" is IMHO seriously overstating it: chances are that more critical errors were fixed in the 'ordinary' way during the same period. There are also some more general conclusions to be drawn here:

- the kernel (and probably "code written in C" in general) would hugely benefit from adding a proper multiway conditional to the language, one whose cases don't fall through by default. For the 'large' estimate (999), the kernel source would contain 264,351 "break;" lines with a total size of about 1.51M bytes. That's a lot of text people wouldn't have to type.
- contrary to popular myth, the overwhelming majority of switch cases, with or without a "break;", are correct.
- code review could obviously be improved, as none of these missing breaks should ever have made it into the tree.
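The 1.51M figure above can be checked as follows. This assumes each line is counted as just the six bytes of the token `break;`, with no indentation or newline, and that "M" means MiB (2^20 bytes). Both assumptions are mine and are not stated in the comment. The helper names are hypothetical.

```c
#include <string.h>

/* Inputs from the thread. */
#define TOTAL_CASES  265350UL
#define FALLTHROUGHS 999UL /* the "large" estimate */

/* Cases that end in an explicit break under the large estimate. */
static unsigned long break_lines(void)
{
    return TOTAL_CASES - FALLTHROUGHS;
}

/* Bytes spent on the bare token "break;", ignoring indentation and newlines. */
static unsigned long break_bytes(void)
{
    return break_lines() * strlen("break;");
}
```

Under these assumptions, `break_lines()` gives 264,351 and `break_bytes()` gives 1,586,106 bytes, which is about 1.51 MiB. Counting the trailing newline as well would raise the total to about 1.85 million bytes.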

Cook: Security things in Linux v5.3

Posted Nov 22, 2019 6:53 UTC (Fri) by roc (subscriber, #30627)

> Further, the fact that these missing breaks hadn't already been fixed strongly suggests that these were all switch-cases which were never tested, IOW, that they were essentially dead code.

"Essentially dead" is poorly defined, and I think fuzzy thinking in this area lures people into a false sense of security.

Just because a path is rarely taken in real-world usage or in tests doesn't necessarily mean it's hard for an attacker to cause that path to be taken. For example, a lot of error-handling code is easy to trigger but won't be triggered unless someone specifically targets it.
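A hypothetical sketch of that point follows. The handler, the command codes, and the `MAX_LEN` limit are all made up. Well-formed queries never reach the missing `break`, so everyday use and ordinary tests look fine. Any caller who deliberately sends an oversized query falls through into the privileged case.

```c
#include <stddef.h>

/* Hypothetical command codes and size limit for a made-up handler. */
enum cmd { CMD_QUERY, CMD_RESET };
#define MAX_LEN 64

/*
 * Returns 0 on success or -1 on error; *did_reset reports whether the
 * privileged reset action ran.
 */
static int handle_request(enum cmd c, size_t len, int *did_reset)
{
    int ret = 0;

    *did_reset = 0;
    switch (c) {
    case CMD_QUERY:
        if (len <= MAX_LEN)
            break;      /* normal path: exercised by everyday use */
        ret = -1;       /* error path: rarely hit unless provoked */
        /* BUG: missing break, so an oversized query runs the reset */
    case CMD_RESET:
        *did_reset = 1; /* privileged action */
        break;
    }
    return ret;
}
```

Here a query of 16 bytes behaves correctly, while a query of 65 bytes both reports an error and performs the reset. The buggy path is "rarely taken" in practice, yet trivial to reach on purpose.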

Cook: Security things in Linux v5.3

Posted Nov 22, 2019 19:02 UTC (Fri) by rweikusat2 (subscriber, #117920)

I was writing about kernel runtime failures caused by coding errors, i.e., not about security but about safety. As my statement doesn't make any sense when interpreted the way you have chosen to interpret it, the possibility that this interpretation could be wrong should have suggested itself.