Impedance matching for BPF and LSM

By Jake Edge
February 26, 2020

The "kernel runtime security instrumentation" (KRSI) patch set has been making the rounds over the past few months; the idea is to use the Linux security module (LSM) hooks as a way to detect, and potentially deflect, active attacks against a running system. It does so by allowing BPF programs to be attached to the LSM hooks. That has caused some concern in the past about exposing the security hooks as external kernel APIs, which makes them potentially subject to the "don't break user space" edict. But there has been no real objection to the goals of KRSI. The fourth version of the patch set was posted by KP Singh on February 20; the concerns raised this time are about its impact on the LSM infrastructure.

The main change Singh made from the previous version effectively removed KRSI from the standard LSM calling mechanisms by using BPF "fexit" (for function exit) trampolines on the LSM hooks. That trampoline can efficiently call any BPF programs that have been placed on the hook without the overhead associated with the normal LSM path; in particular, it avoids the cost of the retpoline mitigation for the Spectre hardware vulnerability. The KRSI hooks are enabled by static keys, which means they have zero cost when they are not being used. But it does mean that KRSI looks less like a normal LSM as Singh acknowledged: "Since the code does not deal with security_hook_heads anymore, it goes from 'being a BPF LSM' to 'BPF program attachment to LSM hooks'."

Casey Schaufler, who has done a lot of work on the LSM infrastructure over the last few years, objected to making KRSI a special case, however: "You aren't doing anything so special that the general mechanism won't work." Singh agreed that the standard LSM approach would work for KRSI, "but the cost of an indirect call for all the hooks, even those that are completely unused is not really acceptable for KRSI’s use cases".

Kees Cook focused on the performance issue, noting that making calls for each of the hooks, even when there is nothing to do, is common for LSMs. Of the 230 hooks defined in the LSM interface, only SELinux uses more than half (202), Smack is next (108), and the rest use less than a hundred—several use four or less. He would like to see some numbers on the performance gain from using static keys to disable hooks that are not required. It might make sense to use that mechanism for all of LSM, he said. Singh agreed that it would be useful to have some performance numbers; "I will do some analysis and come back with the numbers." It is, after all, a bit difficult to have a discussion about improving performance without some data to guide the decision-making.

There are several intertwined pieces to the disagreement. The LSM infrastructure has generally not been seen as a performance bottleneck, at least until the advent of KRSI; instead, over recent years, the focus has been on generalizing that infrastructure to support arbitrary stacking of multiple LSMs in a running system. That has improved the performance of handling calls into multiple hooks (when more than one LSM defines one for a given operation) over the previous mechanism, but it was not geared to the high-performance requirements that KRSI is trying to bring to the LSM world.

In addition, the stacking work has made it so that LSMs can be stacked in any order; each defined hook for a given operation is called in order, the first that denies the operation "wins" and the operation fails without calling any others. That infrastructure is in place, but KRSI upends it to a certain extent. KRSI comes from the BPF world, so the list traversal and indirect calls used by the LSM infrastructure are seen as performance bottlenecks. KRSI always places itself (conceptually) last on the list and uses the BPF trampoline to avoid that overhead. That makes it a special case, unlike the other "normal" stackable LSMs, but that may be a case of a premature optimization, as Cook noted.

BPF developer Alexei Starovoitov does not see it as "premature" at all, however. "I'm convinced that avoiding the cost of retpoline in critical path is a requirement for any new infrastructure in the kernel." He thought that the LSM infrastructure should consider using static keys to enable its hooks, and that the mechanism employed by KRSI should be used there:

Just compiling with CONFIG_SECURITY adds "if (hlist_empty)" check for every hook. Some of those hooks are in critical path. This load+cmp can be avoided with static_key optimization. I think it's worth doing.

[...] I really like that KRSI costs absolutely zero when it's not enabled. Attaching BPF prog to one hook preserves zero cost for all other hooks. And when one hook is BPF powered it's using direct call instead of super expensive retpoline.

But the insistence on treating KRSI differently than the other LSMs means that perhaps KRSI should go its own way—or work on improving the LSM infrastructure as a whole. Schaufler said that he had not "gotten that memo" on avoiding retpolines and that the LSM infrastructure is not new. He is interested in looking at using static keys, but is concerned that the mechanism is too specific to use in the general case, where multiple LSMs can register hooks to be called:

I admit to being unfamiliar with the static_key implementation, but if it would work for a list of hooks rather than a singe hook, I'm all ears.

The new piece is not KRSI per se, Singh said, but the ability to attach BPF programs to the security hooks is new. There are techniques available to make that have zero cost, so it makes sense to use them:

There are other tracing / attachment [mechanisms] in the kernel which provide similar [guarantees] (using static keys and direct calls) and it seems regressive for KRSI to not leverage these known patterns and sacrifice performance [especially] in hotpaths. This provides users to use KRSI alongside other LSMs without paying extra cost for all the possible hooks.

[...] My analogy here is that if every tracepoint in the kernel were of the type:

if (trace_foo_enabled) // <-- Overhead here, solved with static key
   trace_foo(a);  // <-- retpoline overhead, solved with fexit trampolines

It would be very hard to justify enabling them on a production system, and the same can be said for BPF and KRSI.

The difficulty is that the LSM interface came about under a different set of constraints, Schaufler said. Those constraints have changed over time and the infrastructure is being worked on to improve its performance, but it still needs to work with the existing LSMs:

The LSM mechanism is not zero overhead. It never has been. That's why you can compile it out. You get added value at a price. You get the ability to use SELinux and KRSI together at a price. If that's unacceptable you can go the route of seccomp, which doesn't use LSM for many of the same reasons you're on about.

When LSM was introduced it was expected to be used by the lunatic fringe people with government mandated security requirements. Today it has a much greater general application. That's why I'm in the process of bringing it up to modern use case requirements. Performance is much more important now than it was before the use of LSM became popular.

[...] If BPF and KRSI are that performance critical you shouldn't be tying them to LSM, which is known to have overhead. If you can't accept the LSM overhead, get out of the LSM. Or, help us fix the LSM infrastructure to make its overhead closer to zero. Whether you believe it or not, a lot of work has gone into keeping the LSM overhead as small as possible while remaining sufficiently general to perform its function.

The goal of eliminating the retpoline overhead is reasonable, Cook said, but the LSM world has not yet needed to do so. "I think it's a desirable goal, to be sure, but this does appear to be an early optimization." He noted there is something of an impedance mismatch; the BPF developers do not want to see any performance hit associated with BPF, but the LSM developers "do not want any new special cases in LSM stacking". So he suggested adding a "slow" KRSI that used the LSM stacking infrastructure as it is today, followed by work to optimize that calling path.

But Starovoitov thought that KRSI should perhaps just go its own way. He proposed changing the BPF program type from BPF_PROG_TYPE_LSM to BPF_PROG_TYPE_OVERRIDE_RETURN and moving KRSI completely out of the LSM world: "I don't see anything in LSM infra that KRSI can reuse." He does not see a slow KRSI as an option and suggested that perhaps the LSM interface and the new BPF program type should be made mutually exclusive at kernel build time:

It may seem as a downside that it will force a choice on kernel users. Either they build the kernel with CONFIG_SECURITY and their choice of LSMs or build the kernel with CONFIG_BPF_OVERRIDE_RETURN and use BPF_PROG_TYPE_OVERRIDE_RETURN programs to enforce any kind of policy. I think it's a pro not a con.

There are, of course, lots of users of the LSM interface, including most distributions, so it might difficult to go that route, Schaufler said. But Singh noted that the users of a BPF_PROG_TYPE_OVERRIDE_RETURN feature may be highly performance-sensitive such that they already disable LSMs "because of the current performance characteristics". But Singh did think that BPF_PROG_TYPE_OVERRIDE_RETURN might be useful on its own, separate from the KRSI work; he plans to split that out into its own patch set. He agreed with Cook's approach, as well, and plans to re-spin the KRSI patches to use the standard LSM approach as a starting point; "we can follow-up on performance".

The clash here was the classic speed versus generality tradeoff that pops up in the kernel (and elsewhere) with some frequency. The BPF developers are laser-focused on their "baby"—and squeezing every last drop of performance out of it—but there is a wider world in kernel-land, some parts of which have different requirements. It would seem that a reasonable compromise has been found here. Preserving the generality of the LSM approach while gaining the performance improvements that the BPF developers have been working on would be a win for both, really—taking the kernel and its users along for the ride.

Index entries for this article
Kernel	BPF/Security
Kernel	Security/Security modules

Impedance matching for BPF and LSM

Posted Feb 26, 2020 23:27 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (20 responses)

> When LSM was introduced it was expected to be used by the lunatic fringe people with government mandated security requirements. Today it has a much greater general application.
Uh no. SELinux is still pretty much in the same niche.

Impedance matching for BPF and LSM

Posted Feb 26, 2020 23:54 UTC (Wed) by TheJH (subscriber, #101155) [Link] (19 responses)

SELinux runs on every (non-ancient) Android phone.

Impedance matching for BPF and LSM

Posted Feb 26, 2020 23:55 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (17 responses)

Thank you for confirming my post. Modern Android OS is about as far from regular Unix as it can get.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 5:11 UTC (Thu) by re:fi.64 (subscriber, #132628) [Link]

Your post was saying it was still in the same niche. Regardless of Android's status, it's still using SELinux, this it's pretty easy to say it has acquired widespread use and more notably on consumer devices.

Even then, RHEL uses SELinux by default and has widespread use in enterprise.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 10:47 UTC (Thu) by beagnach (guest, #32987) [Link] (15 responses)

Using the word "niche" to describe a technology with an install base in the billions doesn't really feel quite right.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 18:01 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

Yet Android is a niche system, it's not a general-purpose operating system. It's specifically designed for one particular use-case with all the design decisions being guided by it.

So as a result, Android now looks almost nothing like Unix.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 18:20 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (12 responses)

>Yet Android is a niche system, it's not a general-purpose operating system

You would have to discount Android, Chrome OS, RHEL/CentOS, Fedora, CoreOS and several others

There are more mobile/tablet/chromebook users using their devices for all sorts of things. I think that would qualify as general purpose anyway

Impedance matching for BPF and LSM

Posted Feb 27, 2020 19:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

Again, these are all niche users, very much in line with "crazy NSA-like tailored environments" where a small cabal of engineers produces a package and end-users are not supposed to tinker with it.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 19:50 UTC (Thu) by pizza (subscriber, #46) [Link] (3 responses)

Android's SELinux-enabled "niche" has between one and two orders of magnitude larger deployment than every other use of the Linux kernel combined.

(If anything, "general purpose UNIX-like Linux" is the actual niche use case these days..)
("niche" does not mean "

Impedance matching for BPF and LSM

Posted Feb 27, 2020 19:53 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Sure. As I said, Android or CoreOS are basically examples of NSA-like crazy environments. They are specifically designed to be inflexible with as few tuning knobs accessible by end-users (or application developers) as possible.

It's no wonder that SELinux can work within these environments.

SELinux still fails within environments that require flexibility or extensibility.

Impedance matching for BPF and LSM

Posted Feb 29, 2020 12:31 UTC (Sat) by cpitrat (subscriber, #116459) [Link]

Reading this thread gives me the impression that you talked about niche usage not knowing it was widely used and are now calling anything that uses it a niche usage just to avoid admitting you're wrong. I may be wrong of course, but your definition of niche usage seems very unusual. I'd say your usage of niche is a niche usage.

Impedance matching for BPF and LSM

Posted Feb 29, 2020 20:45 UTC (Sat) by zlynx (guest, #2285) [Link]

> SELinux still fails within environments that require flexibility or extensibility.

Only with administrators who can't be bothered to learn how it works.

This reminds me of PHP web developers who can't be bothered to learn Unix file permissions and mark everything chmod 777.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 20:13 UTC (Thu) by SEJeff (guest, #51588) [Link] (6 responses)

Redhat Enterprise Linux nice and "crazy NSA-like tailored environments"? As is my laptop currently running Fedora? I'm a longtime fan of your comments, but this is a bit much Cyberax.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 20:18 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

I've yet to see Enterprise RedHat with SELinux in enforcing mode. I've only heard that they exist somewhere.

Many RedHat forks (Amazon Linux, Scientific Linux) also pretty much ignore SELinux and barely test it.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 20:39 UTC (Thu) by mohg (guest, #114025) [Link] (1 responses)

I've used Scientific Linux (6, 7) and CentOS (8) with SELinux enforcing (on 7 and 8; can't remember about 6) for 6+ years. Works fine for me. I find it a well documented and implented feature.

As a binary rebuild of RHEL, Scientifix Linux supports whatever the equivalent RHEL does.
I have no idea in what sense it could be said to "pretty much ignore SELinux".

Impedance matching for BPF and LSM

Posted Feb 27, 2020 20:46 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> As a binary rebuild of RHEL, Scientifix Linux supports whatever the equivalent RHEL does.
> I have no idea in what sense it could be said to "pretty much ignore SELinux".
The problem is that SL doesn't do anything with SELinux. If you use it as a RHEL rebuild it works just as RHEL.

However, plenty of software doesn't support it. Like SUN (RIP) Grid Engine forks, or good old Hadoop.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 20:59 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

>I've yet to see Enterprise RedHat with SELinux in enforcing mode. I've only heard that they exist somewhere.

I have worked in multiple large enterprises which had SELinux in enforcing mode. I am not sure what this argument is about

Impedance matching for BPF and LSM

Posted Feb 29, 2020 2:14 UTC (Sat) by Rudd-O (guest, #61155) [Link]

All my machines run Fedora, and all run in enforcing mode.

Perhaps the "niche" is only on your mind, brah.

Impedance matching for BPF and LSM

Posted Mar 3, 2020 17:45 UTC (Tue) by frostsnow (subscriber, #114957) [Link]

As a counter to all the "we use Linux in enforcing mode" comments, at my current position we systematically disable SELinux & in order to not run into arcane permission issues.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 22:34 UTC (Thu) by beagnach (guest, #32987) [Link]

> Android is a niche system, it's not a general-purpose operating system.

Again... "niche" and billions just don't seem to fit together. If you're wanting to contrast with "general purpose" why not just use the term "special purpose"?

Sorry to be nit-picky but I find your use of that term in this context quite jarring.

Impedance matching for BPF and LSM

Posted Feb 27, 2020 17:04 UTC (Thu) by theonewolf (guest, #118690) [Link]

SELinux is also a core piece of Red Hat Enterprise CoreOS and the OpenShift distribution of Kubernetes.

That makes it being used in Microsoft Azure (OpenShift offering) and other places where OpenShift is being deployed (AWS, on premise).