Security requirements for new kernel features

By Jonathan Corbet
July 28, 2022

The relatively new io_uring subsystem has changed the way asynchronous I/O is done on Linux systems and improved performance significantly. It has also, however, begun to run up a record of disagreements with the kernel's security community. A recent discussion about security hooks for the new uring_cmd mechanism shows how easily requirements can be overlooked in a complex system with no overall supervision.

Most of the operations that can be performed within io_uring follow the usual I/O patterns — open a file, read data, write data, and so on. These operations are the same regardless of the underlying device or filesystem that is doing the work. There always seems to be a need for something special and device-specific, though, and io_uring is no exception. For the kernel as a whole, device-specific operations are made available via ioctl() calls. That system call, however, has built up a reputation as a dumping ground for poorly thought-out features, and there is little desire to see its usage spread.

In early 2021, io_uring maintainer Jens Axboe floated an idea for a command passthrough mechanism that would be specific to io_uring. A year and some later, that idea has evolved into uring_cmd, which was pulled into the mainline during the 5.19 merge window. There is a new io_uring operation that, in turn, causes an invocation of the underlying device or filesystem's uring_cmd() file_operations function. The actual operation to be performed is passed through to that function with no interpretation in the io_uring layer. The first user is the NVMe driver, which provides a direct passthrough operation.
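
The layering described above can be illustrated with a toy user-space model in Python. This is not kernel code: the `FileOps` class, the `submit_passthrough()` helper, the `toy_nvme_uring_cmd()` handler, and the command bytes are all hypothetical names invented for this sketch. The one point it tries to capture is that the io_uring layer hands the command payload to the file's uring_cmd handler without looking inside it.

```python
import errno
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileOps:
    """Toy stand-in for a file_operations table (hypothetical model)."""
    name: str
    # Handler for passthrough commands; None means "not supported".
    uring_cmd: Optional[Callable[[bytes], int]] = None


def toy_nvme_uring_cmd(cmd: bytes) -> int:
    """Hypothetical driver handler: only the driver decodes the payload."""
    opcode = cmd[0]
    # Pretend opcode 0x02 is the only command this toy driver understands.
    return 0 if opcode == 0x02 else -errno.EINVAL


def submit_passthrough(fops: FileOps, cmd: bytes) -> int:
    """Toy io_uring layer: forwards the opaque payload, never parses it."""
    if fops.uring_cmd is None:
        return -errno.EOPNOTSUPP  # toy choice for "no handler provided"
    return fops.uring_cmd(cmd)


nvme = FileOps("toy-nvme", uring_cmd=toy_nvme_uring_cmd)
plain = FileOps("toy-regular-file")

result_known = submit_passthrough(nvme, bytes([0x02, 0, 0, 0]))
result_unknown = submit_passthrough(nvme, bytes([0x7F]))
result_unsupported = submit_passthrough(plain, bytes([0x02]))
```

Because `submit_passthrough()` never inspects `cmd`, anything sitting at that layer, a security hook included, sees only an opaque blob unless it carries driver-specific knowledge of its own.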

Missing security hooks

Just over one year ago, there was a bit of a disagreement after the developers of the kernel's Linux Security Module (LSM) and auditing subsystems figured out that there were no security or auditing hooks in all of that new io_uring code. That put io_uring operations outside the control of any security module that a given system might be running and made those operations invisible to auditing. Those gaps were filled in, but not before the security developers expressed their unhappiness about how io_uring had been designed and merged without thought for LSM and audit support.

Given that, one might expect that the addition of a new feature like uring_cmd would have seen more involvement from the security community. To an extent, that happened; Luis Chamberlain posted a patch adding LSM support back in March. In short, it added a new security_uring_async_cmd() hook that would be called before passing a command through to the underlying code; it could examine that command and decide whether to allow or deny the operation. There were some disagreements over how well this would work; in particular, Casey Schaufler complained that security modules would have to gain an understanding of every device-specific command, which clearly would not scale well. The conversation wound down shortly thereafter.
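
Schaufler's scaling objection can be made concrete with a small Python model. It is only a sketch under stated assumptions: the hook signature, the subsystem names, and the one-byte "opcode" encoding are hypothetical and do not reflect the interface of the actual patch. It contrasts a module that cannot decode commands, and so can only allow or deny a whole subsystem, with one that has device-specific knowledge built in.

```python
import errno
from typing import Callable, List, Set

# A hook sees the target subsystem and the raw, device-specific command.
Hook = Callable[[str, bytes], int]


def coarse_policy(allowed_subsystems: Set[str]) -> Hook:
    """Hypothetical module that cannot decode commands, so it can only
    allow or deny an entire subsystem."""
    def hook(subsystem: str, cmd: bytes) -> int:
        return 0 if subsystem in allowed_subsystems else -errno.EACCES
    return hook


def toy_nvme_aware_policy(subsystem: str, cmd: bytes) -> int:
    """Hypothetical module with driver knowledge baked in: it has to know
    that, in this toy encoding, opcode 0x01 is destructive."""
    if subsystem == "toy-nvme" and cmd[:1] == b"\x01":
        return -errno.EACCES
    return 0


def passthrough_with_hooks(hooks: List[Hook], subsystem: str, cmd: bytes,
                           handler: Callable[[bytes], int]) -> int:
    """Call every hook before the driver handler; the first denial wins."""
    for hook in hooks:
        rc = hook(subsystem, cmd)
        if rc != 0:
            return rc
    return handler(cmd)


def accept_all(cmd: bytes) -> int:
    """Stand-in for the driver's own handler."""
    return 0


coarse = [coarse_policy({"toy-nvme"})]
fine = [toy_nvme_aware_policy]

coarse_read = passthrough_with_hooks(coarse, "toy-nvme", b"\x02", accept_all)
coarse_write = passthrough_with_hooks(coarse, "toy-nvme", b"\x01", accept_all)
fine_read = passthrough_with_hooks(fine, "toy-nvme", b"\x02", accept_all)
fine_write = passthrough_with_hooks(fine, "toy-nvme", b"\x01", accept_all)
```

In this model the coarse policy lets the destructive command through, since it has no way to tell it apart from the harmless one, while the fine-grained policy only works because it duplicates knowledge that belongs to the driver. Multiply that by every subsystem that gains a passthrough handler and the scaling concern becomes apparent.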

When the new feature was pushed into the mainline, there was no LSM support included with it. On July 13, Chamberlain reposted his patch adding the new security hook. Schaufler was equally unimpressed this time around:

You're passing the complexity of uring-cmd directly into each and every security module. SELinux, AppArmor, Smack, BPF and every other LSM now needs to know the gory details of everything that might be in any arbitrary subsystem so that it can make a wild guess about what to do. And I thought ioctl was hard to deal with.

SELinux and audit maintainer Paul Moore agreed with that assessment. The end result, he said, was that security modules would be unable to distinguish between low-level operations, so they would end up simply enabling all io_uring passthrough commands for any given subsystem or none of them; "I think we can all agree that is not a good idea". He later acknowledged that there does not appear to be a better solution at hand and merging Chamberlain's patch looked like the only path forward: "Without any cooperation from the io_uring developers, that is likely what we will have to do". The current plan appears to be to get Chamberlain's patch into the mainline during the next merge window, with backports to the stable kernels to be done thereafter.

Grumpiness

This particular problem appears to be solved, albeit in a way that is less than satisfying to the security community. A better solution may materialize in the future, though providing a way to control access to device-specific functionality in a general way is a hard problem. But a harder problem may be addressing the residual grumpiness in the security community and preventing such problems from recurring in the future. As Moore put it:

I feel that expressing frustration about the LSMs being routinely left out of the discussion when new functionality is added to the kernel is a reasonable response; especially when one considers the history of this particular situation.

For his part, Axboe acknowledged that the security concerns should not have been allowed to fall through the cracks, but he didn't necessarily offer a lot of hope for changes in the future:

I guess it's just somewhat lack of interest, since most of us don't have to deal with anything that uses LSM. And then it mostly just gets in the way and adds overhead, both from a runtime and maintainability point of view, which further reduces the motivation.

Even when the motivation is there, mistakes can happen. Kernel development is a complex business. A lot of effort has gone into making the kernel sufficiently modular that developers need not worry about what is happening in the rest of the system, but there are limits to how far that process can go.

For example, developers must be aware of locking and the locking requirements of subsystems they call into or things may go badly wrong. Memory must be handled according to the constraints placed on the memory-management subsystem, and developers creating complex caches may have to implement shrinkers to release memory on demand. CPU hotplug affects many subsystems and must be taken into account. The same is true of power-management events. Changes to the user-space API can create unhappiness years later. Inattention to latency constraints may create trouble in realtime applications. A failure to properly document a subsystem will make life harder for developers and users — but they are all used to that by now.

And, of course, a failure to provide proper security hooks will hobble the ability of administrators to control process behavior by way of LSM policies.

The fact that developers do not always succeed in keeping all of these constraints in mind — and consequently make mistakes — is unsurprising. Catching such omissions is one of the reasons for the existence of the kernel's sometimes tiresome review process. But nothing ensures that a given change will be properly reviewed by, for example, a developer who understands the needs of Linux security modules, and there is little that forces the suggestions from any such review to be heeded.

So important things will occasionally fall through the cracks, and it is not clear that much can be done to improve the situation. It would be wonderful if more companies would pay developers to spend more time reviewing patches to provide, as an example, an overall security-oriented eye on code heading into the mainline, but that does not appear to be the world that we are living in. Attempts to impose requirements with a more bureaucratic process would mostly create friction and lead to the distribution of more out-of-tree (and severely unreviewed) code.

The best path toward improvement may be, as Axboe put it, "one subsystem being aware of another one's needs". Working toward that goal — and the ability to fix mistakes in the stable kernels when they do happen — seems to work reasonably well most of the time.

Security requirements for new kernel features

Posted Jul 28, 2022 14:51 UTC (Thu) by khuey (guest, #158560) [Link] (2 responses)

How is the situation here different from the situation with ioctls? Wouldn't security modules need to grok (or alternatively just ignore) every random ioctl command too?

Security requirements for new kernel features

Posted Jul 28, 2022 15:02 UTC (Thu) by magfr (subscriber, #16052) [Link]

I guess it is a question of an old and well known horror versus an entirely new horror.

In a perfect world all of the security stuff would be unnecessary but the world is sadly not perfect.

Security requirements for new kernel features

Posted Jul 28, 2022 22:51 UTC (Thu) by cschaufler (subscriber, #126555) [Link]

Security modules do have to deal with the hideousness of ioctls. SELinux introduces a sophisticated set of classes for them, but it's still somewhat wonky. Smack relies on the correct use of ioctl command conventions (_IOC) by the driver implementations, even though the reliability of that is at best questionable. Neither is especially satisfactory. That's one reason there's a flap over io_uring_cmd. The collective community has had the opportunity to learn the lesson. It's disappointing that we have to have this brouhaha over and over.
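
For readers unfamiliar with the _IOC conventions mentioned in the comment above, the following Python sketch decodes an ioctl command number into its fields. It assumes the bit layout of the kernel's generic ioctl encoding (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction); some architectures use different field widths, so treat the constants as an assumption. The two sample values are the x86 numbers for EVIOCGVERSION and TCGETS; `decode_ioctl()` is a hypothetical helper written for this illustration.

```python
# Field widths from the kernel's generic ioctl encoding (an assumption:
# some architectures, such as powerpc and mips, use different widths).
NR_BITS, TYPE_BITS, SIZE_BITS, DIR_BITS = 8, 8, 14, 2
NR_SHIFT = 0
TYPE_SHIFT = NR_SHIFT + NR_BITS
SIZE_SHIFT = TYPE_SHIFT + TYPE_BITS
DIR_SHIFT = SIZE_SHIFT + SIZE_BITS

# Direction is expressed from the point of view of user space.
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def field(cmd: int, shift: int, bits: int) -> int:
    """Extract a bit field of the given width starting at `shift`."""
    return (cmd >> shift) & ((1 << bits) - 1)


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl command number into its conventional fields."""
    return {
        "dir": DIRECTIONS[field(cmd, DIR_SHIFT, DIR_BITS)],
        "size": field(cmd, SIZE_SHIFT, SIZE_BITS),
        "type": chr(field(cmd, TYPE_SHIFT, TYPE_BITS)),
        "nr": field(cmd, NR_SHIFT, NR_BITS),
    }


EVIOCGVERSION = 0x80044501  # _IOR('E', 0x01, int): follows the convention
TCGETS = 0x5401             # x86 value; predates the convention

modern = decode_ioctl(EVIOCGVERSION)
legacy = decode_ioctl(TCGETS)
print(modern)
print(legacy)
```

The second value hints at why relying on the convention is fragile: TCGETS copies terminal settings out to user space, yet its number encodes no direction and a size of zero, so a policy that trusts the encoded fields would misjudge it.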

Performance impact

Posted Jul 29, 2022 8:24 UTC (Fri) by zdzichu (subscriber, #17118) [Link] (23 responses)

io_uring, apart from being a nice async interface, seems to be all about raw speed. It is achieving millions of IOPS per core. What if we add LSM hooks, audit hooks, etc. and performance goes down? In a few years someone will come up with "io_uring lite" without those speed bumps.

Performance impact

Posted Jul 29, 2022 13:01 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (22 responses)

I think that would be better solved by asking those that want "ricer car" speeds to just compile a kernel without any LSM hooks whatsoever.

Performance impact

Posted Jul 29, 2022 15:17 UTC (Fri) by jhoblitt (subscriber, #77733) [Link]

I suspect a more common response would be for LSM policies applied to containers to block io_uring syscalls.

Performance impact

Posted Jul 29, 2022 18:14 UTC (Fri) by josh (subscriber, #17465) [Link] (19 responses)

Please don't denigrate people who don't want the overhead of things like the audit subsystem. It is the job of the security subsystems to add zero overhead for people who aren't using them.

Performance impact

Posted Jul 30, 2022 20:52 UTC (Sat) by andresfreund (subscriber, #69562) [Link]

For production use, building a custom kernel is only a good decision at a decent scale. The overhead of various unused features in common distribution kernels is a problem.

Performance impact

Posted Aug 4, 2022 21:44 UTC (Thu) by cschaufler (subscriber, #126555) [Link] (17 responses)

Yes, and it's the job of the IO subsystems to allow for security enforcement for the systems that do want them. When LSM was introduced the additional restrictions provided were only used by a handful of government and affiliated agencies. Today the system that doesn't use security modules is an odd duck indeed. I seriously doubt you have any idea just how much of the work that goes into a security facility is focused on making sure that it performs well for those who don't know they want it or think they know they don't want it. Unfortunately, this often results in security features that are slower than they should be because they can't be properly integrated. This adds to the Common Wisdom that security impacts performance.

I cherish the memory of the Unix system that ran a sophisticated management program five to ten times faster when audit was enabled than when it wasn't. When the characteristics of disparate sub-systems provide mutual benefit it's a wonderful thing. You'll never know that can happen if you don't at least try.

Performance impact

Posted Aug 5, 2022 1:28 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

> When LSM was introduced the additional restrictions provided were only used by a handful of government and affiliated agencies.

And it's pretty much used in these situations now. SELinux is useful if you are a giant corp with a huge development staff that is OK with torturing themselves by writing SELinux policies.

> Today the system that doesn't use security modules is an odd duck indeed.

Like, pretty much all classic desktops? I've yet to see a developer with "serious" LSMs like SELinux turned on.

I think, some serious soul-searching on the side of LSM developers is in order.

Performance impact

Posted Aug 5, 2022 8:47 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

Dunno about the Red Hat side of things, but I'm pretty certain SUSE (SLES, OpenSUSE) comes with SELinux enabled and functional.

The OP said that Android comes with SELinux switched on.

I think you're forgetting that (a) your "classic desktop" is actually a niche use case for Linux, and (b) even then, all of the "big boys" - RH, Ubuntu, SUSE - probably do have SELinux switched on. It's just not that visible ...

My system is gentoo - of course I haven't enabled it. But as Linux goes, gentoo and stuff like that is very much the minority ...

Cheers,
Wol

Performance impact

Posted Aug 9, 2022 12:58 UTC (Tue) by anton (subscriber, #25547) [Link]

All I could find says that Ubuntu does not have SELinux enabled by default. You apparently don't count Debian among the big boys, but it does not have SELinux enabled by default, either.

Performance impact

Posted Aug 5, 2022 9:03 UTC (Fri) by mw_skieske (guest, #144003) [Link] (1 responses)

> Like, pretty much all classic desktops? I've yet to see a developer with "serious" LSMs like SELinux turned on.

Hi there!

Did you know that every Fedora desktop has, in fact, SELinux in enforcing mode?

❯ getenforce
Enforcing

kind regards

People who don't do SELinux are just lazy.

Performance impact

Posted Aug 5, 2022 12:39 UTC (Fri) by corbet (editor, #1) [Link]

That last line could really have been done without; there is no need to insult people you disagree with on something like this. Please don't do that here.

Performance impact

Posted Aug 5, 2022 13:33 UTC (Fri) by pizza (subscriber, #46) [Link]

> Like, pretty much all classic desktops? I've yet to see a developer with "serious" LSMs like SELinux turned on.

Fedora and RHEL, at least, have SELinux on out of the box.

All but one of my Linux installations have SELinux enabled (the exception is a heavily-used snowflake shell server whose install predates this SELinux stuff), including my two daily-use desktops (and several others I am responsible for).

Additionally, nearly every Android device out there relies on SELinux to enforce app isolation.

Performance impact

Posted Aug 8, 2022 22:43 UTC (Mon) by cschaufler (subscriber, #126555) [Link] (10 responses)

As others have pointed out, all Red Hat desktops ship with SELinux full-up enabled. Ubuntu ships with AppArmor. I personally have systems running with both SELinux and AppArmor. Some with Smack and AppArmor, too. And there's every cloud provider, every phone and almost every IoT device. When we search our souls it's not about whether we should do a better job of getting out of the way, it's about how we can provide more of the features developers are screaming for and still maintain performance. If you are content with access controls from the 1970s I'm fine with that. But that puts you as far outside the mainstream as Bell & LaPadula was in 1984.

Performance impact

Posted Aug 8, 2022 22:57 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

I have yet to personally see a person with SELinux enabled. I don't see Fedora desktops often, but all the people I know who use them have SELinux disabled because they work on low-level stuff.

AppArmor in Ubuntu is a bit more sane, because it doesn't require crazy labelling and impenetrable policies.

> And there's every cloud provider

Not every. And I know how one large provider works internally (a couple of years outdated, but I doubt it has changed much).

Heck, here's what EC2 offers for their own supported in-house distribution:

> [ec2-user@ip-172-31-0-166 ~]$ getenforce
> Disabled

> When we search our souls it's not about whether we should do a better job of getting out of the way, it's about how we can provide more of the features developers are screaming for and still maintain performance.

How long did it take to build stackable LSMs? For a decade the inability to run multiple LSMs made anything but SELinux/AppArmor impractical.

Sorry. But right now LSMs are just an impediment that most people try to wave away so it won't bother them. Large companies like Google have time and money to invest in getting it into shape, sure. But that's a far cry from being a useful and productive feature. Unlike cgroups or namespaces that are widely accepted by developers.

Performance impact

Posted Aug 9, 2022 16:16 UTC (Tue) by cschaufler (subscriber, #126555) [Link] (7 responses)

Stacking LSMs could have been completed a decade ago had it not been for some of the design choices forced upon the security module developers to ensure that performance impact on systems that don't use LSMs is minimized. I understand and appreciate that in whatever subset of the Linux development community you reside LSM is not considered useful. I personally have little interest in the device driver infrastructure, which many developers consider most critical. I am concerned that work I do in LSM does not interfere with device drivers *to the extent possible*. If you need to blame CAP_SYS_ADMIN on somebody, I'm probably the best target. The Linux kernel does lots of things for lots of reasons. I have no idea what you work on, or why, but I'm willing to wager a refreshing beverage that we could have made security a much bigger pain than it is now.

Performance impact

Posted Aug 9, 2022 20:15 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> I understand and appreciate that in whatever subset of the Linux development community you reside LSM is not considered useful.

It's not that it's not useful; additional mitigations are great. It's that the amount of effort that needs to be expended to make use of SELinux is just not comparable with the amount of protection it provides. I long ago tried to make sense of policies and to create my own toy policies, but failed miserably. TOMOYO is rigorously undocumented and I haven't touched Smack because it doesn't even look in any way "simplified".

AppArmor is a bit better, since it at least doesn't require labelling across all of the filesystem which is nothing but security theater compared to just using paths. Its policies are also easier to understand.

One feature that I really personally would have liked is an ability to use LSMs to _grant_ permissions instead of taking them away.

Performance impact

Posted Aug 9, 2022 20:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

And yeah, at this point I'd just prefer a well-tailored ad-hoc solution like pledge()/unveil() in OpenBSD to the generalized LSM system that Linux has.

Authoritative hooks

Posted Aug 9, 2022 20:29 UTC (Tue) by corbet (editor, #1) [Link] (4 responses)

It seems you have one point of agreement with Casey, anyway: at one point, at least, he too wanted authoritative hooks in the LSM subsystem.

Authoritative hooks

Posted Aug 10, 2022 21:39 UTC (Wed) by cschaufler (subscriber, #126555) [Link] (3 responses)

Had we adopted authoritative LSM hooks the landscape would be very different indeed. Stacking of modules would have been impossible. What would happen if module A said "yes" and module B said "no"? You'd have to define some sort of pecking order for the modules, which wouldn't make each module authoritative now, would it? What we could do is refactor the traditional Linux discretionary controls into an LSM and insert it at the front of the list. You could then implement POSIX ACLs in an LSM, replacing the mode-bit-only hooks with ACL-cognizant ones. To forgo DAC all you would have to do is drop that module from the list. Now I suppose one might only want to drop certain of the controls (e.g. signal delivery) and not the whole set. That's solvable, but hideous. Too much "fine granularity" for my taste.

Authoritative hooks

Posted Aug 11, 2022 0:09 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Had we adopted authoritative LSM hooks the landscape would be very different indeed. Stacking of modules would have been impossible. What would happen if module A said "yes" and module B said "no"?

Various systems (like IAM policies in AWS or ACLs in Windows) typically consider "Deny" to be a veto on any allowing ACLs/policies.

Authoritative hooks

Posted Aug 11, 2022 17:50 UTC (Thu) by cschaufler (subscriber, #126555) [Link] (1 responses)

This is exactly the "bail on fail" model of permissive hooks that we have today. What you can't do is what you had asked for, which is to provide a mechanism for a hook to grant access instead of denying it as would occur otherwise. We could make it possible, but that would have -- wait for it -- performance impact. :)
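
The "bail on fail" evaluation discussed in this subthread can be sketched as a toy Python model. It is a simplification under stated assumptions: the `restrictive_stack()` function, the request strings, and the two example modules are hypothetical and bear no relation to the kernel's data structures. It shows that any denial acts as a veto, and that a module's "yes" cannot override an earlier denial by the discretionary checks.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]  # returns True to allow the request


def restrictive_stack(dac: Check, modules: List[Tuple[str, Check]],
                      request: str) -> Tuple[bool, str]:
    """Toy "bail on fail" evaluation: stop at the first denial.

    Returns (allowed, name of the check that decided the outcome).
    """
    if not dac(request):
        return False, "dac"
    for name, check in modules:
        if not check(request):
            return False, name
    return True, "all"


def dac_owner_only(request: str) -> bool:
    """Hypothetical discretionary check: only the owner may proceed."""
    return request.startswith("owner:")


def module_a(request: str) -> bool:
    """Hypothetical module that says "yes" to everything."""
    return True


def module_b(request: str) -> bool:
    """Hypothetical module that says "no" to anything named secret."""
    return not request.endswith("/secret")


stack = [("A", module_a), ("B", module_b)]

ok = restrictive_stack(dac_owner_only, stack, "owner:/tmp/file")
vetoed = restrictive_stack(dac_owner_only, stack, "owner:/srv/secret")
no_grant = restrictive_stack(dac_owner_only, stack, "other:/tmp/file")
```

In the third case module A would have said "yes", but it is never consulted: the discretionary check has already denied the request. Letting a module grant that access would mean evaluating the modules even after a denial, which is one way to read the remark about performance impact.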

Authoritative hooks

Posted Aug 11, 2022 18:50 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> What you can't do is what you had asked for, which is to provide a mechanism for a hook to grant access instead of denying it as would occur otherwise.

That would actually help and make the time investment in SELinux worthwhile, as it would open up _new_ possibilities. Performance impact is another question, and it'd be interesting to see if removing the DAC entirely in favor of MAC would help.

Performance impact

Posted Aug 23, 2022 7:26 UTC (Tue) by daenzer (subscriber, #7050) [Link]

> I don't see Fedora desktops often, but all the people I know that are using them have SELinux disabled because they work on low-level stuff.

I've been working on the graphics stack (mostly between Mesa / mutter / Xwayland) as part of the Red Hat desktop group for 3 years. In this time, I've never had to disable SELinux on Fedora. AFAIK my colleagues are leaving it enabled as well.

There can sometimes be minor SELinux related issues when upgrading to a new beta release, but those are usually quickly fixed.

It seems to me what you think you know about this is hearsay and/or outdated.

Performance impact

Posted Jul 30, 2022 10:14 UTC (Sat) by Wol (subscriber, #4433) [Link]

I don't know anything about io_uring, but I would have thought a "simple" fix was adding a security module pointer to the uring itself. If that contains a pointer, that is the "god" uring security monitor. Any io_uring call must register its security module with god, because, if god has been so configured, "no security module, no run ...".

That way, people who don't want the hassle/overhead just don't bother registering god with io_uring. People who are paranoid, or need accounting, or whatever, configure god to reject calls it doesn't know about (and the writers of said calls will quickly get bug reports saying "your io_uring call doesn't work - missing security module").

And if this is added *quickly*, before io_uring gets too embedded, it means that "no security module no run" is a realistic option. The later it gets left, the harder it gets to turn that on without all hell breaking loose ...

Cheers,
Wol
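
The registration idea proposed in the comment above could be modeled roughly as follows. This Python sketch is purely illustrative: `ToyMonitor`, `submit()`, the operation names, and the choice of error codes are hypothetical, and nothing like this exists in io_uring. It assumes a ring with an optional monitor that, in strict mode, rejects any operation that has not registered a security handler.

```python
import errno
from typing import Callable, Dict, Optional

Handler = Callable[[bytes], bool]  # returns True to allow the command


class ToyMonitor:
    """Hypothetical per-ring security monitor (the comment's "god")."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.handlers: Dict[str, Handler] = {}

    def register(self, op_name: str, handler: Handler) -> None:
        """An operation registers its security handler with the monitor."""
        self.handlers[op_name] = handler

    def check(self, op_name: str, cmd: bytes) -> int:
        handler = self.handlers.get(op_name)
        if handler is None:
            # "No security module, no run" when configured as strict.
            return -errno.EPERM if self.strict else 0
        return 0 if handler(cmd) else -errno.EACCES


def submit(monitor: Optional[ToyMonitor], op_name: str, cmd: bytes) -> int:
    """Toy submission path: no monitor configured means no checks at all."""
    if monitor is None:
        return 0
    return monitor.check(op_name, cmd)


strict = ToyMonitor(strict=True)
strict.register("toy-read", lambda cmd: True)

unmonitored = submit(None, "toy-passthrough", b"\x01")
registered = submit(strict, "toy-read", b"\x01")
unregistered = submit(strict, "toy-passthrough", b"\x01")
```

In this model the systems that configure no monitor pay only for a single `None` check, while a strict monitor turns a missing handler into an immediate, visible failure for whoever added the operation.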

Security requirements for new kernel features

Posted Aug 15, 2022 18:03 UTC (Mon) by jezuch (subscriber, #52988) [Link]

> For example, developers must be aware of locking and the locking requirements of subsystems they call into or things may go badly wrong. Memory must be handled according to the constraints placed on the memory-management subsystem, and developers creating complex caches may have to implement shrinkers to release memory on demand. CPU hotplug affects many subsystems and must be taken into account. The same is true of power-management events. Changes to the user-space API can create unhappiness years later. Inattention to latency constraints may create trouble in realtime applications. A failure to properly document a subsystem will make life harder for developers and users — but they are all used to that by now.

> And, of course, a failure to provide proper security hooks will hobble the ability of administrators to control process behavior by way of LSM policies.

My $DAYJOB recently introduced a checklist in the pull request template. It pertains mostly to release notes and documentation, but I imagine it could at least help here. Of course people will ignore it, misjudge the requirements, etc., but maybe in the case of the bigger pull requests someone will insist on it being at least seriously considered.