BSD-style securelevel comes to Linux — again

By Jonathan Corbet
September 11, 2013
Most of the hand-wringing over the UEFI secure boot mechanism has long passed; those who want to run Linux on systems with secure boot enabled are, for the most part, able to do so. Things are quiet enough that one might be tempted to believe that the problem is entirely solved. As it happens, though, the core patches that implement the lockdown that some developers think is necessary for proper secure boot support still have not made their way into the mainline. The developer behind that work is still trying to get it merged though; in the process, he has brought back an old idea that was last rejected in 1998.

By Matthew Garrett's reading of the secure boot requirements, a system running in secure boot mode must not allow any user to change the running kernel; not even root is empowered to do so. Just over one year ago, Matthew posted a set of patches that implemented the necessary restrictions. In secure boot mode (as defined by the absence of a new capability called, at that time, CAP_SECURE_FIRMWARE), the kernel would not allow the loading of unsigned kernel modules, direct access to I/O ports or I/O memory, or, most controversially, use of the kexec_load() system call to reboot directly into a new kernel. As one might expect, not everybody liked this type of restriction, which flies in the face of the longstanding Unix tradition of giving root enough rope to shoot itself in the foot.

So there were discussions around various aspects of these patches, but one of the biggest problems only came to light later. It seems that there is a fundamental flaw in the capability model: it is nearly impossible to add new capability bits without risking problems with applications that do not know about the new bits. In particular:

  • Some capability-aware applications work in a whitelist-oriented mode, turning off every capability that they do not think they need. In essence, such an application simply sets the capability mask to zero, then sets the bits corresponding to the capabilities it wants. If a new bit is added controlling functionality that such an application uses, it will unknowingly disable a necessary capability and cease to work properly. From the point of view of users of this application, this kind of change constitutes an incompatible ABI change.

  • Other applications work in a blacklist-oriented mode, turning off only those capabilities that are known not to be needed. If some sort of security-related functionality is put behind a new bit that is unknown to this kind of application, that application will leave the capability enabled. That, in turn, could make the application insecure.

In this case, the biggest risk is that whitelist-style applications would inadvertently turn off CAP_SECURE_FIRMWARE, essentially putting themselves into secure boot mode even if the system as a whole is not running in that mode. That could cause things to break in mysterious ways. What it comes down to is that, if one is designing a capability-based system, one really must come up with the full list of needed capabilities at the outset. Back in 1998, when capabilities for Linux were being hashed out, nobody had UEFI secure boot in mind. So there is no relevant capability bit available, and adding one now is not really an option.
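The two failure modes can be illustrated with plain bitmasks. What follows is a toy model only, not real capability code: the bit positions and the `CAP_NEW` name are invented for illustration, and no kernel interface is involved.

```python
# Toy model of capability masks; bit values are illustrative, not the kernel's.
CAP_NET_BIND = 1 << 0
CAP_SYS_MODULE = 1 << 1
CAP_NEW = 1 << 2  # hypothetical bit introduced by a later kernel

OLD_KERNEL_ALL = CAP_NET_BIND | CAP_SYS_MODULE
NEW_KERNEL_ALL = OLD_KERNEL_ALL | CAP_NEW


def whitelist_app(inherited: int) -> int:
    """Keep only the bits the application knows it needs."""
    wanted = CAP_NET_BIND
    return inherited & wanted


def blacklist_app(inherited: int) -> int:
    """Clear only the bits the application knows it does not need."""
    unwanted = CAP_SYS_MODULE
    return inherited & ~unwanted


if __name__ == "__main__":
    w = whitelist_app(NEW_KERNEL_ALL)
    b = blacklist_app(NEW_KERNEL_ALL)
    print("whitelist keeps CAP_NEW:", bool(w & CAP_NEW))
    print("blacklist keeps CAP_NEW:", bool(b & CAP_NEW))
```

Neither application was written with `CAP_NEW` in mind, yet the whitelist-style one ends up without it and the blacklist-style one ends up holding it. Whether either outcome is harmful depends on what the new bit guards.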

More recently, Matthew posted a new patch set that eliminates the new capability. Instead, all of the secure boot restrictions were tied to the existing flag controlling whether unsigned kernel modules can be loaded. Matthew's reasoning was that the restriction on module loading exists to prevent the loading of arbitrary code into the running kernel, so it made sense to lock down any other functionality that might make it possible to evade that restriction. Other developers disagreed, though, saying that they needed the ability to restrict module loading while still allowing other functionality — kexec_load() in particular — to be used normally. After some discussion, Matthew backed down and withdrew the patches.

Eventually he came back with what he called his final attempt at providing a kernel lockdown facility that wasn't tied to the secure boot mechanism itself. This time around, we have a new file at /sys/kernel/security/securelevel (that is, in securityfs) that accepts any of three values. If it is set to zero (the default), everything works as it always has, with no new restrictions. Setting it to one invokes "secure mode," in which all of the restrictions related to secure boot go into effect. Secure mode is also irrevocable; once it has been enabled, it cannot be disabled (short of compromising the kernel, at which point the battle is already lost). There is also an interesting "permanently insecure" mode obtained by setting securelevel to -1; the system's behavior is the same as with a setting of zero, but it is no longer possible to change the security level.
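The transitions between the three values can be sketched as a small state machine. This is a model of the behavior described above, not the patch itself; the class and method names are invented, and rejecting every write once the level has left zero (including a rewrite of the same value) is an assumption.

```python
class Securelevel:
    """Toy model of the three-valued securelevel knob (names are hypothetical)."""

    ALLOWED = (-1, 0, 1)

    def __init__(self) -> None:
        self.level = 0  # default: no new restrictions

    def write(self, value: int) -> None:
        if value not in self.ALLOWED:
            raise ValueError(f"unsupported securelevel: {value}")
        if self.level != 0:
            # Both 1 (secure) and -1 (permanently insecure) are final.
            # Assumption: even rewriting the current value is refused.
            raise PermissionError("securelevel can no longer be changed")
        self.level = value

    @property
    def restricted(self) -> bool:
        """True only in secure mode; -1 behaves like 0 in this respect."""
        return self.level == 1
```

In this model the only state from which anything can change is zero, which is why early boot code would want to write the final value before any other process gets a chance to.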

In the UEFI secure boot setting, the bootstrap code would take pains to set securelevel to one before allowing any processes to run. That helps to avoid race conditions where the system is subverted before the lockdown can be applied.

Some readers will, by now, have recognized that "securelevel" looks an awful lot like the BSD functionality that goes by the same name; it was clearly patterned after BSD's version. Amusingly, this is not the first time that securelevel has been considered for Linux; there was an extensive discussion on the subject in early 1998, when Alan Cox was pushing strongly for a securelevel feature. At that time, Linus rejected the feature because he had something much better in mind: capabilities. As is usually the case, Linus won out, and Linux got capabilities instead of securelevel.

More than fifteen years later, it seems that we might just end up with both mechanisms. Thus far, Matthew's latest patch set has not resulted in many screams of agony, so it might just pass review this time — though, at this point, it is almost certainly too late for 3.12. Meanwhile, Vivek Goyal has posted the first version of a signed kexec patch set that would limit kexec_load() to signed images. That would allow some useful features (kdump, for example) to continue to work properly in the secure boot environment without leaving kexec_load() completely open. That, too, will make the secure boot restrictions a bit more palatable and increase their chances of being merged.



BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 19:17 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

>What it comes down to is that, if one is designing a capability-based system, one really must come up with the full list of needed capabilities at the outset.

That's what always bugs me with Linux capabilities - they are not. They are actually _roles_, not capabilities.

In a true capability-based system there's no question about the full list of caps, an application MUST use caps to work with resources. I.e. process must use "start_process_cap" object to start new processes and it would fail if it can't get access to it. It also makes little sense for processes themselves to drop caps, because they won't have anything unnecessary in their environment already (in a properly set system).

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 20:11 UTC (Wed) by geofft (subscriber, #59789) [Link]

This is complicated because securing ring 0 is something of a new hardware nuance / functionality that didn't hugely make sense on old hardware -- anyone with the ability to write to the boot disk as fsuid 0 used to be able to get their code in kernelspace trivially, so that security distinction didn't make sense, but now it does.

If you wanted to be really conservative, then "load_modules_for_secure_boot" would be a totally different capability from "load_modules", and the latter wouldn't do anything if the machine and kernel were booted in secure-boot mode. Same with "access_io_ports_for_secure_boot", "kexec_load_for_secure_boot", etc. But then you've broken all userspace at once because there are new capabilities to match new hardware functionality. (Which is sensible -- if the capability system was designed before e.g. USB, then maybe "access_usb" is a new capability.)

Technically you haven't broken userspace, since old userspace still works in non-secure-boot mode and fails only with secure boot enabled, but in practice nobody is going to be pleased with that argument.

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 21:09 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

The point of a true cap-based system is that the environment itself takes care of capabilities.

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 21:11 UTC (Wed) by geofft (subscriber, #59789) [Link]

I'm not sure I follow -- can you expand on what you mean by "the environment itself"?

Do you mean that the hardware/firmware platform is supposed to provide enumerated capabilities to the OS?

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 21:17 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

No. A capability-based OS would simply not provide required caps to programs that don't need them.

Also, the capability-based security literature means quite a different thing by 'capabilities'. The actual example of capabilities in Linux are not security cap. bits, but file handles. A program can securely transfer them, use them and can't forge them.

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 21:26 UTC (Wed) by geofft (subscriber, #59789) [Link]

Yeah, I'm using "capability" here in the research-literature sense, not in the Linux sense.

I think "don't provide capabilities to programs that don't need them" is so underspecified as to not be useful. Let's take the USB example -- say a process is the USB driver on some awesome microkernel architecture. Then USB 3 shows up, and something in the USB 3 spec means that several users want to be more careful about what can speak to USB 3 host controllers (maybe it interacts with power consumption), but several other users also don't care. Should the USB capability -- the ability to drive any USB 1 or 2 host controller on the system -- also grant access to a USB 3 host controller? "Yes" means that you've lost some of the security promise of a capability architecture; "no" means that the users who don't care complain about breaking userspace.

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 22:51 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Nope. USB3 controller would have its own capability, so only processes that need it would be able to get access to it.

If USB3 needs some special handling then this logic would be encapsulated in some kind of server process.

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 23:01 UTC (Wed) by geofft (subscriber, #59789) [Link]

Yes, but now you took an OS where you could plug in a flashdrive and have it work, changed hardware, and now that no longer works without modifications to userspace. By making USB3 a separate capability, you've broken userspace.

(Or so goes the argument against adding a new Linux-style capability for Secure Boot.)

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 23:13 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Sure, but these modifications will be confined to policies. End-user programs won't have to be modified in any way.

BSD-style securelevel comes to Linux — again

Posted Sep 12, 2013 20:22 UTC (Thu) by zooko (subscriber, #2589) [Link]

Sigh. I wonder how much damage has been done by Linux using the word "capabilities" for their non-capabilities access control scheme?

"It seems that there is a fundamental flaw in the capability model: it is nearly impossible to add new capability bits without risking problems with applications that do not know about the new bits."

If you mean Linux's non-capabilities "capabilities", then yes! Your article succinctly explains the fundamental problem with them. If you mean real capabilities, then no! Real capability systems do not have this problem.

Blame POSIX not Linux

Posted Sep 12, 2013 22:26 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

The terminology problem is from POSIX, not Linux. There was a POSIX group ("POSIX Security Extensions") that defined a draft spec that used the term "capabilities" for something completely different than what many other people called capabilities. Linux implemented that draft POSIX spec, and thus uses its terminology.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 1:33 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

I've always rejected the concept of a program dropping privileges, and I run systems in which untrusted programs never do that and instead receive an environment with only the capabilities they need. It's based on regular Linux kernel capabilities, though: the program capexec sets the privileges (capabilities, uid, gid) of a new process and then execs the untrusted program. Process 1 has all capabilities, but system configuration determines what lesser capabilities all the other processes have.

Whether a program chooses its own capabilities or some OS facility establishes them, it seems to me the issue of changing the capabilities in future kernel releases is the same. If you make a certain capability bit give less privilege in Release 2 than it did in Release 1, you'll have trouble. If you never do, you can't ever tighten security.

And sometimes, it's a matter of opinion whether a certain capability bit is more powerful in Release 2 because the set of things that are possible in the two releases is different.

BSD-style securelevel comes to Linux — again

Posted Sep 19, 2013 21:29 UTC (Thu) by mm7323 (guest, #87386) [Link]

>I've always rejected the concept of a program dropping privileges, and I run systems in which untrusted programs never do that and instead receive an environment with only the capabilities they need.

Being able to drop caps can be useful to programs which may start up, perform some privileged actions, then drop the caps that aren't needed.

The simplest example would be a server process that wished to bind to a low port (<1024) using CAP_NET_BIND_SERVICE. Once the bind() is done, the capability can be dropped, but the already obtained file descriptor for the socket stands and can still be used.

There are other ways this could be done, but using libcap to drop capabilities at the right time is straightforward to implement.

BSD-style securelevel comes to Linux — again

Posted Sep 19, 2013 22:48 UTC (Thu) by hummassa (subscriber, #307) [Link]

Why not
chown nginx.nginx /dev/port/80
and then starting nginx as user nginx, group nginx? no privileges used, only at install-time...

BSD-style securelevel comes to Linux — again

Posted Sep 20, 2013 0:41 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

because you would have to create the whole infrastructure for /dev/port/80 first. and if you can get agreement from everyone on how that would work, you will have done something very impressive.

BSD-style securelevel comes to Linux — again

Posted Sep 20, 2013 11:55 UTC (Fri) by cortana (subscriber, #24596) [Link]

Perhaps I'm naïve in hoping that systemd's tmpfiles.d mechanism can be adopted more widely.

echo c /dev/port/80 0700 nginx nginx - maj:min > /etc/tmpfiles.d/nginx.conf

BSD-style securelevel comes to Linux — again

Posted Sep 20, 2013 2:18 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Well, the cases I can think of that this scheme misses as-is are: what interface for port 80? TCP or UDP? Which IP addresses? Then how to persist that information (udev I assume). That's a lot of bikesheds.

BSD-style securelevel comes to Linux — again

Posted Sep 20, 2013 1:46 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

>The simplest example would be a server process that wished to bind to a low port (<1024) using CAP_NET_BIND_SERVICE. Once the bind() is done, the capability can be dropped, but the already obtained file descriptor for the socket stands and can still be used.

That's exactly what I object to and what I don't do on my system. Instead, a privileged program dedicated to binding sockets runs with CAP_NET_BIND_SERVICE and binds the socket, then execs the server program. The server program inherits the file descriptor, but no capabilities.

That way, I don't have to trust the server program to use CAP_NET_BIND_SERVICE properly, and drop it properly. I only have to trust the one binder program, which does the job for everyone and is very stable.

(And besides, I don't like having all that duplicate socket setup code in every server program - another good reason to have a separate program dedicated to that).
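The binder arrangement described in this comment can be sketched roughly as follows. This is an illustration, not the commenter's actual program: the function names are invented, the child is spawned with `subprocess` rather than exec'd in place, and the privilege handling itself (how the binder obtains CAP_NET_BIND_SERVICE and how the server ends up without it) is deliberately left out. The `pass_fds` argument is POSIX-only.

```python
import socket
import subprocess
import sys


def bind_listener(host: str, port: int) -> socket.socket:
    """Create a listening TCP socket; the only step that may need privilege."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def launch_server(sock: socket.socket, argv: list[str], **popen_kwargs) -> subprocess.Popen:
    """Start the server program with the bound descriptor kept open.

    The descriptor number is appended to argv so the child can find it;
    that calling convention is an assumption made for this sketch.
    """
    fd = sock.fileno()
    return subprocess.Popen(argv + [str(fd)], pass_fds=(fd,), **popen_kwargs)
```

A server started this way would rebuild its socket with `socket.socket(fileno=int(sys.argv[-1]))` and call `accept()` on it, never calling `bind()` itself.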

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 21:15 UTC (Wed) by dashesy (subscriber, #74652) [Link]

Overall the secure boot seems to be useful to machine owners. I just wish the highest security mode was called "OWNER" so that later if one is locked in a Linux system, he knows he is not the real owner, but maybe just on a lease. It would be even cooler if one had to write the name of the owner to the sysfs file to enable this mode, so anyone could cat /sys/owner

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 21:57 UTC (Wed) by bronson (subscriber, #4806) [Link]

On most machines I picture:

$ cat /sys/owner
wheel

Seems highly unlikely that even 0.01% of users would bother changing it from the default...?

BSD-style securelevel comes to Linux — again

Posted Sep 11, 2013 22:11 UTC (Wed) by dashesy (subscriber, #74652) [Link]

Since only owners (and not even root) can change hypothetical /sys/owner a user can change it if she is the owner. If you buy a phone, and cat /sys/owner shows Random Vendor, and cannot change that value, then you have just leased the phone, but at least you know this up front. It would be interesting to buy a car with IVI, and look at its owner.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 1:17 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

You seem to be saying if one isn't running with full privileges, then one is not the owner of the system (someone else is), and that misses the point of secure boot.

Secure boot is about saying, "I own this system, but don't let me modify my kernel." Reason: someone might trick me into trying to modify the kernel against my interests. Or I could be walking in my sleep.

It's like a werewolf chaining himself up at sunset on a full moon night.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 1:28 UTC (Fri) by dashesy (subscriber, #74652) [Link]

As long as there is any way to own the device, you are the owner. If it requires soldering (or connecting BIOS to flash programmer) though, that does not count.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 2:08 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

You lost me in the circular definition: anyone who is capable of owning is the owner. In normal English, anyone who actually does own is the owner. This appears to parse as, "the owner is a person who is capable of being the owner."

So who is the person identified in the sysfs file? The person who owns or the person who is capable of owning (there could be many or none, I guess). Likewise, does the OWNER security mode mean programs have the privileges of owning or just are capable of getting them?

What it seems to come around to is that the highest security mode has to be called something other than OWNER in order for it to make any sense for a person to choose to run in a lower mode -- and that choice does make sense.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 17:19 UTC (Fri) by rsidd (subscriber, #2582) [Link]

I think you mean "pwn", not "own" :) In normal English, owners are not the same as superusers or sysadmins or vendors.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 17:25 UTC (Fri) by dashesy (subscriber, #74652) [Link]

Good point :)

Well for me I own a machine if I can do whatever I want with it (of course as long as it does not hurt others). Maybe I should have phrased it this way: I do not own a system if I cannot change /sys/owner name.

BSD-style securelevel comes to Linux — again

Posted Sep 12, 2013 12:56 UTC (Thu) by nsheed (subscriber, #5151) [Link]

"...giving root enough rope to shoot itself in the foot" - what kind of Rube Goldberg api call is that then ?

BSD-style securelevel comes to Linux — again

Posted Sep 12, 2013 13:55 UTC (Thu) by fuhchee (subscriber, #40059) [Link]

(Hey, don't ruin the joke by belabouring it!)

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 3:41 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

Actually, what several of us (including HPA) in the thread were calling for was not for this to use the existing Linux capabilities (i.e. something tied to a running process, or filesystem point)

but instead to be using capabilities in the general sense, a bitmask that enables/disables things feature by feature for the entire system.

This doesn't suffer from the major nightmare that the current per-process capabilities system does.

It would also allow for a system to be locked down MORE than what Matthew is looking for, allowing the lockdown to be used by more people.

for example, you may want a lockdown capability that disabled ALL module loading after a specific point in the boot process

or one that completely disabled the ability to mount a device.

such lockdown capabilities would be very useful to have on a voting machine for example.

Yes, Matthew does have a point in that there is the possibility of someone adding a "do_evil()" syscall and a corresponding "prevent_do_evil" lockdown capability. In such a case, someone trying to run a locked down machine with a new kernel, but an old userspace would inadvertently allow the do_evil() call.

But if someone is really trying to run a locked down system, why would they be upgrading the kernel without upgrading the corresponding userspace? as long as triggering unknown lockdown capabilities doesn't cause an error, the new userspace will run just fine on the old kernel.
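Both halves of this argument can be modeled with a one-way, system-wide flag set. The sketch below is purely illustrative: the flag names (including `NO_DO_EVIL`, after the hypothetical syscall above) and the `Lockdown` class are invented, and silently ignoring unknown flags is the behavior the comment asks for rather than anything an existing kernel provides.

```python
# Hypothetical lockdown flags; setting a flag removes a feature system-wide.
NO_MODULE_LOAD = 1 << 0
NO_MOUNT = 1 << 1
NO_DO_EVIL = 1 << 2  # only known to the "new" kernel in this model


class Lockdown:
    """Model of a kernel's one-way lockdown mask."""

    def __init__(self, known_flags: int) -> None:
        self.known = known_flags
        self.active = 0

    def lock(self, requested: int) -> int:
        """Enable the requested restrictions this kernel understands.

        Unknown bits are ignored rather than rejected, and are returned so
        the caller can log them. Flags are never cleared once set.
        """
        self.active |= requested & self.known
        return requested & ~self.known


old_kernel = Lockdown(NO_MODULE_LOAD | NO_MOUNT)
new_kernel = Lockdown(NO_MODULE_LOAD | NO_MOUNT | NO_DO_EVIL)

# New userspace on an old kernel: the unknown flag is skipped without error.
ignored = old_kernel.lock(NO_MODULE_LOAD | NO_MOUNT | NO_DO_EVIL)

# Old userspace on a new kernel: do_evil() stays permitted.
new_kernel.lock(NO_MODULE_LOAD | NO_MOUNT)
```

The second case is the one Matthew worries about: nothing fails, but the newest restriction is simply never requested.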

The idea that there is a one-size-fits-all definition of what a securelevel locked down system should consist of is just faulty. Different people with different use cases will want different amounts of lockdown. what's very reasonable for one person is completely wrong for another.

David Lang

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 8:13 UTC (Fri) by ernest (subscriber, #2355) [Link]

>>But if someone is really trying to run a locked down system, why would they be upgrading the kernel without upgrading the corresponding userspace? as long as triggering unknown lockdown capabilities doesn't cause an error, the new userspace will run just fine on the old kernel.

There can be many good reasons for this of course: Upgrade of some hardware in the old but otherwise perfectly fine system? maybe something broke down but can only be replaced by something too new for the current kernel.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 8:56 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

there are good reasons to upgrade the kernel, but are there good reasons to upgrade the kernel without being willing to upgrade anything else?

remember, this isn't the home user we are talking about here, this is someone who is trying to lock down the system in a way that even root can't change it.

anyone going to that much effort isn't going to be randomly upgrading one component.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 19:36 UTC (Fri) by khim (subscriber, #9252) [Link]

>there are good reasons to upgrade the kernel, but are there good reasons to upgrade the kernel without being willing to upgrade anything else?

Depends on your definition of “anything”. The kernel is often upgraded if you need/want to support new hardware capabilities. Sometimes you then need to upgrade some low-level components (things like modprobe), but you don't expect to change the setup of the whole system just because you've installed a new CPU and want to use AVX512 in your programs.

BSD-style securelevel comes to Linux — again

Posted Sep 14, 2013 1:26 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

remember that we are not talking about POSIX capabilities that your programs all need to know about.

We are talking about features that you can turn off to lock your machine down (protecting it even from root)

I would expect that there will be one program to do this, and it will probably be executed exactly once per boot cycle. (unless it's a developers machine)

So saying that if you upgrade the kernel and are trying to lock down the machine, you need to check for new lockdown flags that may have been introduced and decide if you want them doesn't seem at all unreasonable to me. In fact, it sounds like what would happen anyway with anyone competent doing a kernel upgrade: you would check new kernel compile options to see if something new pops up that may be a problem.

Look at the namespace features for a perfect example.

BSD-style securelevel comes to Linux — again

Posted Sep 13, 2013 22:38 UTC (Fri) by hallyn (subscriber, #22558) [Link]

"... an old idea that was last rejected in 1998"

False.

https://lkml.org/lkml/2006/8/2/180

Need Capabilities + Incapabilities

Posted Sep 21, 2013 8:21 UTC (Sat) by ldo (subscriber, #40946) [Link]

Seems to me the answer to the issue of forward/backward compatibility when changing capability bits is to have two parts to the capability mask:
  • A set of bits like the present one, where each set bit gives the process some ability. Existing bits should never be overloaded to add new abilities, as that could compromise the security of existing applications; instead, new abilities require new capability bits to enable them. Correspondingly, no new restrictions should be imposed on existing capability bits, to avoid breaking the functionality of existing applications; a particular bit, once defined, will always refer to the same set of abilities.
  • A new set of incapability bits. These start out set to 1 for every process. As new security restrictions need to be added to the capability system, new bits can be assigned here that, when cleared, impose those restrictions. Leaving those bits set (the default) means the restrictions are not imposed on the process.

So the convention for applications is

  • whitelist the capability bits, and
  • blacklist the incapability bits.

That is, start out with the capability mask all 0, and the incapability mask all 1; set all the bits in the former for things that you know you need, and clear all the bits in the latter for things that you know you don’t need. Leave everything you don’t know about in its default state; 0 for capabilities, 1 for incapabilities. This will ensure maximum compatibility in the face of changes to the security model in the future.
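The convention proposed here can be written down as a pair of masks. This is a sketch of the proposal only: no such incapability mask exists in the kernel, and the names and the 64-bit width are assumptions.

```python
from dataclasses import dataclass

MASK_WIDTH = 64  # assumed width of each mask
ALL_ONES = (1 << MASK_WIDTH) - 1


@dataclass(frozen=True)
class Masks:
    capabilities: int    # a set bit grants an ability
    incapabilities: int  # a cleared bit imposes a restriction


def start_state() -> Masks:
    """Defaults from the proposal: capabilities all 0, incapabilities all 1."""
    return Masks(capabilities=0, incapabilities=ALL_ONES)


def configure(need: int, do_not_need: int) -> Masks:
    """Whitelist the capability bits and blacklist the incapability bits."""
    state = start_state()
    return Masks(
        capabilities=state.capabilities | need,
        incapabilities=state.incapabilities & ~do_not_need,
    )
```

An application that calls `configure()` with only the bits it knows about leaves every unknown capability bit at 0 and every unknown incapability bit at 1, so a bit defined later grants it nothing new and restricts nothing it already relied on.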

Thoughts?

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds