LSM stacking and the future

Posted Nov 25, 2019 22:14 UTC (Mon) by vadim (subscriber, #35271)
In reply to: LSM stacking and the future by Cyberax
Parent article: LSM stacking and the future

> It's academy-inspired uber-complicated all-powerful monsters that are inherently complicated.

Because interactions are complicated. Eg, I want Apache to listen on port 80, but not on port 22 under any circumstance. I want Apache to serve files from my home directory, but not my GPG keys.

Then you have messes like PAM, which are pretty tricky to secure.

> It's not like we haven't seen alternatives. OpenBSD has very practical and extremely useful unveil()/pledge() support, for example. Which is STILL impossible to express completely in Linux even with an unholy brew of eBPF and SELinux.

Linux has seccomp, and while helpful it's a blunt and problematic instrument. For instance, trouble comes when somebody makes a new version of open(), and now there's a new syscall that's not in the allow list, yet is being used by glibc. Things like that.
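The failure mode described here can be sketched with a toy model. This is only an illustration in Python, not the real seccomp API: the allow list, the syscall traces, and the function names are all made up for the example, and the "libc upgrade" is a hypothetical one in which open() starts being implemented with openat.

```python
# Toy model of an allow-list syscall filter; this is NOT the real seccomp API.
ALLOWED = frozenset({"read", "write", "open", "close", "exit_group"})


def filter_syscall(name, allowed=ALLOWED):
    """Return "ALLOW" for listed syscalls and "KILL" for everything else."""
    return "ALLOW" if name in allowed else "KILL"


def run(trace, allowed=ALLOWED):
    """Replay syscall names until the filter rejects one; return what got through."""
    completed = []
    for name in trace:
        if filter_syscall(name, allowed) == "KILL":
            break
        completed.append(name)
    return completed


# Hypothetical traces of the same program before and after a libc upgrade
# that implements open() with the openat syscall instead.
old_libc = ["open", "read", "close", "exit_group"]
new_libc = ["openat", "read", "close", "exit_group"]

print(run(old_libc))  # every call passes the filter
print(run(new_libc))  # rejected at the first call, so nothing completes
```

The program's source never changed, yet the filter now kills it, which is the maintenance burden being described.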

But more importantly, this completely misses the point. The point of something like SELinux isn't that Apache politely declares what it will do and won't, but that I, being the sysadmin, am the one authority on the system, and Apache doesn't get any say in anything.

> No. My point would be to set permissions to 600 (or even 000) and then use LSMs to grant additional access. If one then turns off LSM they lose access.

What's the point? You can already make it impossible to turn an LSM off, since they're controlled by things like files and syscalls, which can be disabled.

LSM stacking and the future

Posted Nov 25, 2019 23:01 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

> Because interactions are complicated. Eg, I want Apache to listen on port 80, but not on port 22 under any circumstance. I want Apache to serve files from my home directory, but not my GPG keys.
pledge()/unveil() do both just fine in practice.

> Linux has seccomp, and while helpful it's a blunt and problematic instrument. For instance, trouble comes when somebody makes a new version of open(), and now there's a new syscall that's not in the allow list, yet is being used by glibc. Things like that.
The problem is, even with all its brokenness, seccomp still can not express full pledge()/unveil() semantics.

This is an entirely self-inflicted issue. A simple targeted security subsystem that would just do what pledge() does would help immensely. It won't be uber-flexible or NSA-Flask-compatible, and it would require extensions on a case-by-case basis, sure. But it also would be much more usable.

> But more importantly, this completely misses the point. The point of something like SELinux isn't that Apache politely declares what it will do and won't, but that I, being the sysadmin, am the one authority on the system, and Apache doesn't get any say in anything.
In reality this doesn't matter much, since you're likely using Apache from the distro-provided package with a distro-provided policy. So putting the permissions inside Apache "namespace" doesn't really matter.

> What's the point? You can already make it impossible to turn an LSM off, since they're controlled by things like files and syscalls, which can be disabled.
You're missing the point. If users or application developers see SELinux interfering with their work, they simply turn SELinux off instead of fixing whatever is wrong. There's no downside to doing this as LSMs fail open.

The ONLY way to fix this in the long term is to make LSMs mandatory.

LSM stacking and the future

Posted Nov 26, 2019 0:12 UTC (Tue) by vadim (subscriber, #35271)

> pledge()/unveil() do both just fine in practice.

I'm not a user of *BSD, how do you implement those policies with pledge/unveil?

> This is an entirely self-inflicted issue. A simple targeted security subsystem that would just do what pledge() does would help immensely.

Sure, improvements can be made.

> In reality this doesn't matter much, since you're likely using Apache from the distro-provided package with a distro-provided policy. So putting the permissions inside Apache "namespace" doesn't really matter.

It matters because:

1. I can modify the policy without touching the source code.
2. If something sets its own policy, the possibility exists of subverting security before the policy can be applied because there is a point before the policy is set.
3. An application's own author isn't necessarily the best person to be in charge of knowing what it should or should not be doing.
4. Tools like 'sandbox' that sandbox arbitrary applications.

> The ONLY way to fix this in the long term is to make LSMs mandatory.

Ohh. I finally get it.

That's a pointless waste of time. You can't fix willful stupidity by technical measures, it never worked and never will. If somebody wants to disable security, they will do so. People will disable it, not compile it, patch the kernel, choose another distribution, run everything as root, whatever.

LSM stacking and the future

Posted Nov 26, 2019 0:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

> I'm not a user of *BSD, how do you implement those policies with pledge/unveil?
I don't use OpenBSD but I installed it in a VM just to check this.

You do:
1. unveil() directories that you want to be readable, this will automatically make everything else closed off.
2. Open port 80 as a superuser, pass the socket to Apache. This actually can be done by systemd without any SELinux.
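The socket-passing step can be sketched from the daemon's side. The following is a minimal Python illustration of the environment-variable convention used by systemd socket activation (LISTEN_PID, LISTEN_FDS, descriptors starting at 3); the function name is made up for this example, and a real server would normally use its own or a library's implementation.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd convention


def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptor numbers announced via LISTEN_PID/LISTEN_FDS."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the sockets were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


# Example with a hand-built environment: two sockets were passed in.
print(inherited_listen_fds({"LISTEN_PID": "4242", "LISTEN_FDS": "2"}, pid=4242))
```

Each returned number could then be wrapped with socket.socket(fileno=fd), so the daemon itself never needs the privilege to bind the port.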

> 1. I can modify the policy without touching the source code.
How often does this actually happen? Fedora should try to gather stats. I haven't seen it done once in my experience.

> 2. If something sets its own policy, the possibility exists of subverting security before the policy can be applied because there is a point before the policy is set.
We have systemd for that. The wrapper code to set policy fits with it perfectly. Heck, it's already being used to allow rootless daemons listening on <1024 ports.

> 3. An application's own author isn't necessarily the best person to be in charge of knowing what it should or should not be doing.
Realistically neither is the policy writer.

> 4. Tools like 'sandbox' that sandbox arbitrary applications.
unveil()/pledge() them from wrapper scripts. Add more pledges as needed on a case-by-case basis.

rootless <1024

Posted Nov 26, 2019 8:04 UTC (Tue) by zdzichu (subscriber, #17118)

By the way, is the rule of requiring root for lower ports even sensible today? Seems like cargo culting.

rootless <1024

Posted Nov 26, 2019 8:08 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

Correct. It makes no sense at all in the modern world. Up until recently (before the advent of systemd) it had actual _negative_ security as it forced all kinds of programs to be launched as root only to listen on a privileged port.

These days it also can be worked around using ambient caps acquired in a helper wrapper (regular caps are lost on exec).

rootless <1024

Posted Nov 26, 2019 14:27 UTC (Tue) by jem (subscriber, #24231)

Isn't the point of requiring root for ports less than 1024 that they can be trusted to some degree? So you can say ssh some-well-known-host, and rely on the fact that some random joker with an ordinary account on the host hasn't discovered that port 22 is free and started his or her own password stealer.

rootless <1024

Posted Nov 26, 2019 14:53 UTC (Tue) by vadim (subscriber, #35271)

That's not very much security, since most any random joker can run a VM with a sshd on port 22, and then get people to connect there by say, messing with DNS, registering domain names that are off by one character, and such things.

Also due to said VMs the scenario of people being given shell accounts is becoming rarer by the day, anyway.

Also there's plenty of important stuff on ports > 1024, such as administrative consoles like Cockpit on port 9090. So if you've got user access, there's nothing much preventing you from putting up a fake Cockpit page of your own.

rootless <1024

Posted Nov 26, 2019 19:57 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

That was the idea way back then, but in practice it's not really relevant anymore. Moreover, it resulted in daemons like MySQL or Postgres actually standardizing on ports >1024 to avoid doing the open-the-socket-then-drop-privs dance.

Had there been something like systemd from the start (heck, even a better designed inetd) then this might have turned out differently.

Even for ports <1024 you shouldn't really trust them implicitly.

rootless <1024

Posted Nov 26, 2019 21:24 UTC (Tue) by rodgerd (guest, #58896)

Maybe in 1982, when you could still count the number of Unix machines in the world, and have a reasonable chance of knowing who was configuring them.

Since Unix became popular in, oh, the mid-nineties, it's been a toxic heritage that causes more harm than good, leading programs to run as root simply because they wanted a well-known port, while providing absolutely no security benefit whatsoever.

This is a classic example where mindless adherence to "Unix tradition" has caused more harm than good, all for a lack of critical thinking.

rootless <1024

Posted Nov 28, 2019 2:49 UTC (Thu) by flussence (guest, #85566)

Its security value over the internet is nil, especially in the face of things like win32, but in a closed system (loopback or authenticated VPN) it still provides some assurance that the other end is what you think it is.

LSM stacking and the future

Posted Nov 26, 2019 8:24 UTC (Tue) by vadim (subscriber, #35271)

> 1. unveil() directories that you want to be readable, this will automatically make everything else closed off.

I see. Well, that's far worse than SELinux.

One big problem I see is that you can only sandbox once. Unveil requires blocking off further changes at the end. So you both can't confine further something in an already confined environment, and can't expand the confinement either.

The first is problematic because now your rule set encompasses anything that could possibly be called by the main process. Eg, you can confine Apache to /home/user/public_html, but what if you call a CGI that reads something in /home/user/.cgi? Now you need to allow that, and you can't give the permission to that particular CGI because once Apache or its wrapper finished with the pledge stuff, it's set in stone. So you give that permission to Apache, adding it to a heap of stuff that Apache can do, because something that it calls needs that. Hardly pretty, very manageable, or very secure.

The second is problematic because you make things that operate with above normal permissions impossible. Eg, think about tools like ping that execute with more privileges than their caller, but that are coded in a way that their usage is safe. Think for instance of a CGI calling scp. You must now allow Apache access to your ssh keys, which makes it able to serve them to anyone who succeeds in tricking Apache into doing it.

Also this would seem not to allow for new users to be created, unless one can pledge("/home/*/foo").

> 2. Open port 80 as a superuser, pass the socket to Apache. This actually can be done by systemd without any SELinux.

Sure, if you have cooperation from the program, in that it can work with a socket passed on stdin at all. And if you need more than one of those, you now need systemd support in that program. And what about port 8080?

Let's see what we're up to by now:

1. Write a wrapper that will forbid Apache from making listen() calls, and unveil() anything needed.
2. Write a systemd service that will listen on 80 and 8080, and pass those sockets to Apache
3. Ensure Apache is happy with not being able to listen to anything but what is passed to it from systemd
4. Ensure Apache can get multiple sockets from systemd
5. Ensure that neither Apache nor anything it calls will ever try to unveil anything, because that won't work.
6. Ensure that either anything Apache calls is fine with the pledge() being made, or that it's okay for the pledge() being rescinded on exec (there goes our listen() security!)
7. Accept that adding new users will require completely shutting down and restarting Apache

I don't know, this doesn't look particularly elegant to me. Lots of potential trouble already, and we've not even done much yet!

> How often this actually happens? Fedora should try to gather stats. I haven't seen it done once in my experience.

Anybody using setroubleshoot is effectively doing it.

> We have systemd for that. The wrapper code to set policy fits with it perfectly. Heck, it's already being used to allow rootless daemons listening on <1024 ports.

Again, the point is confinement, not allowing formerly root-only things safely. I don't see why a thing should be able to open ports >= 1024 without my permission.

> Realistically neither is the policy writer.

There's no perfection for sure, but at least the policy's writer is ideally an uninvolved third party who will ask useful questions like "Why does it want to do that?". Because if the developer of a thing is up to no good, or just not concerned about security, then clearly we benefit from an outside opinion.

LSM stacking and the future

Posted Nov 26, 2019 20:53 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

> The first is problematic because now your rule set encompasses anything that could possibly be called by the main process.
Nope. You can have multiple wrappers that run multiple Apache copies. As I said, I'm interested in reliable _practical_ solutions that just work.

Now let's see what you need to do in SELinux to do the same: Apache listening on port 80 and serving the ~/public directory while denying access to everything else. It's a simple task, right?

First, you need to create a label. Let's call it apache_file_t. And add it to ~/public. This will have an unfortunate side-effect of disabling user_home_t label on it, so if you have policies targeted for user_home_t then they might need an adjustment. For example, your backup utility might _lose_ access to ~/public if its policy just says "allow user_home_t read".

OK. From here on, files created in ~/public will have the apache_file_t label. However, since "file is an object blah-blah" if you move a file into ~/public it will NOT be automatically accessible. You need to remember to relabel it. The reverse is also true, if you move a file from ~/public it will still retain its labels and remain accessible.
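The difference between label-based and path-based access after a move can be shown with a toy model. This Python sketch is an illustration only: the class, the helper functions, and the /home/user paths are invented for the example, and the single-label check is a drastic simplification of real SELinux policy.

```python
# Toy model: label-based access (SELinux-like) versus path-based access
# (unveil-like). All names here are illustrative.
class LabeledFS:
    def __init__(self, dir_labels):
        self.dir_labels = dict(dir_labels)  # directory -> label given to new files
        self.files = {}                     # path -> label carried by the file

    def create(self, path):
        directory = path.rsplit("/", 1)[0]
        self.files[path] = self.dir_labels[directory]

    def move(self, src, dst):
        # A move keeps whatever label the file already carries.
        self.files[dst] = self.files.pop(src)


def label_allows(fs, path, allowed_label="apache_file_t"):
    return fs.files[path] == allowed_label


def path_allows(path, unveiled="/home/user/public"):
    return path == unveiled or path.startswith(unveiled + "/")


fs = LabeledFS({"/home/user": "user_home_t",
                "/home/user/public": "apache_file_t"})

# A file moved INTO the public directory keeps its old label.
fs.create("/home/user/notes.txt")
fs.move("/home/user/notes.txt", "/home/user/public/notes.txt")
moved_in = "/home/user/public/notes.txt"
print(label_allows(fs, moved_in), path_allows(moved_in))

# A file moved OUT of the public directory keeps its public label.
fs.create("/home/user/public/index.html")
fs.move("/home/user/public/index.html", "/home/user/index.html")
moved_out = "/home/user/index.html"
print(label_allows(fs, moved_out), path_allows(moved_out))
```

The two checks disagree in both directions, which is the behavior being complained about here; whether that is a defect or the intended design is exactly what the thread disputes.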

But wait, there's more! SELinux can only take away rights. Typically home directories are set to 770 mode, so that they are accessible only for their users and user groups. So you need to make sure Apache is in the same group as yourself.

But OK, let's move on to listening on port 80. SELinux can... do nothing! It's only used to restrict access, not to grant it. So you have to start Apache as root and then let it drop privs. SELinux does allow taking away most of root's capabilities, so that's fine.

Now suppose that SELinux is turned off. Suddenly your home directory becomes accessible for Apache, which is in the same user group as your home directory. Whoops. And Apache is also started as root.

Let's compare with unveil(). You need to add access for ~/public, so you write a helper wrapper that does unveil() for that directory. Nothing else is affected, you don't need to modify your backup utility's policy. And unveil() can't be turned off, it's a core kernel feature.

LSM stacking and the future

Posted Nov 26, 2019 22:03 UTC (Tue) by vadim (subscriber, #35271)

> Nope. You can have multiple wrappers that run multiple Apache copies.

No, I'm not talking about multiple Apache copies. I'm talking about Apache calling other binaries. That is, a situation where you have:

Wrapper -> Apache -> CGI_1
                  -> CGI_2
                  -> CGI_3

What I'm saying is that you have several problems there:

1. The Wrapper makes it impossible for any of its children (Apache, or Apache's children) to pledge/unveil anything, because pledge/unveil work by listing what you will do, and closing off the rest, after which the functionality is closed off to any children. This means that if anything wants to drop privileges further, now it can't. If it thinks that's an error, it won't run. Otherwise it'll run with more privileges than it needs, and the Wrapper is actually compromising the security of it.
2. This system means that you need to pledge/unveil everything Apache or any of its children might ever want, and grant that access to that Apache instance and every child. Which means Wrapper must pledge/unveil everything Apache, CGI_1, CGI_2, and CGI_3 at once. You can't allow things for Apache and deny them to the CGIs, or lockdown each CGI in its own particular way... unless you skip on locking down Apache of course.
3. For pledge() specifically you can drop the lockdown on exec, but of course that now means the CGIs are free to do whatever they want.
4. It's also an inflexible system in that it requires a full restart to change what you unveil. You must either unveil a subdirectory under which anything will be accessible regardless of what it is, or if you are selective, you only get to do it once in Wrapper, after which it's set in stone and requires a full restart of the Apache instance.

> As I said, I'm interested in reliable _practical_ solutions that just work.

And I'm explaining why it's not very practical in practice.

That's not a bug, that's a feature. I mean that 100% seriously. SELinux doesn't work on paths, and isn't supposed to. This is exactly the behavior I want my system to have.

Of course it can be turned off, what nonsense is that? "Core" nothing. It didn't exist once upon a time, so just install an older kernel. Or just hack it up. This looks like a promising place for a "return 0". Or perhaps here. Took me about 10 minutes and I never even touched BSD.

Besides which, look at that lovely BYPASSUNVEIL constant. And oh dear, there's a hardcoded list of bypassed rules right in the kernel source.

LSM stacking and the future

Posted Nov 26, 2019 22:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

> 1. This means that if anything wants to drop privileges further, now it can't. If it thinks that's an error, it won't run. Otherwise it'll run with more privileges than it needs, and the Wrapper is actually compromising the security of it.
Nothing stops you from making unveil() nestable. Each successful invocation can further reduce the access. I think that's how pledge() works as well.

> 2. This system means that you need to pledge/unveil everything Apache or any of its children might ever want, and grant that access to that Apache instance and every child.
Sure. So does SELinux. Just at the labeling phase and the policy creation phase. I'm assuming that Apache simply runs the CGI scripts.

> 3. For pledge() specifically you can drop the lockdown on exec, but of course that now means the CGIs are free to do whatever they want.
Uh? Nope. pledge() is inherited across exec() calls.

> 4. It's also an inflexible system in that it requires a full restart to change what you unveil.
So does SELinux. You can't change labels of a running process.

> And I'm explaining why it's not very practical in practice
Well, no you have not.

> That's not a bug, that's a feature. I mean that 100% seriously. SELinux doesn't work on paths, and isn't supposed to.
And that's why it's dumb and is turned off in most cases.

> Of course it can be turned off, what nonsense is that? "Core" nothing. It didn't exist once upon a time, so just install an older kernel.
Nope. unveil() can't be turned off. You need to replace the kernel and reboot the system. Running unveil() on an older kernel also results in -ENOSYS.

Meanwhile, SELinux can be turned off with one command.

Want to convince me? Show me a simple script that does what you're proposing: creates a public directory and runs Apache with access to it. No need for CGIs. I'll show the corresponding unveil/pledge based wrapper.

LSM stacking and the future

Posted Nov 27, 2019 1:19 UTC (Wed) by vadim (subscriber, #35271)

> Nothing stops you from making unveil() nestable. Each successful invocation can further reduce the access.

Sure does: the interface. What unveil() does is first to forbid everything, then allow whatever you pass to unveil.

This means that if you don't block off unveil after making your list of exceptions, a child process or an exploit could just unveil("/") and unblock everything.

> I think that's how pledge() works as well.

pledge() has two modes:

1. Pass on the restrictions to the child. Great, unless your child can't work with those. So if you block something major, you're going to have a hard time exec()ing much after that.
2. Remove all restrictions from the child. Which means you restricted yourself, but your child can do whatever it wants.

> Sure. So does SELinux. Just at the labeling phase and the policy creation phase. I'm assuming that Apache simply runs the CGI scripts.

Nope! See, SELinux has the concept of transition rules: https://danwalsh.livejournal.com/23944.html

Which means, I can do this:

1. Confine apache, so that it can only do apache things.
2. Confine CGI, so that it can only do CGI things.
3. Write an apache -> CGI transition rule. Which means CGI rules don't pollute my Apache rules, and the CGI doesn't get to listen on ports.

This means I can have a setup where every piece is locked down to be able to do no more than it's supposed to.
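The three steps above can be sketched as a toy model. This Python illustration is not SELinux policy syntax: httpd_t is borrowed as a familiar domain name, while cgi_t, cgi_exec_t, and the action strings are simplified stand-ins invented for the example.

```python
# Toy model of domain transitions; not real SELinux policy syntax.
ALLOW = {
    "httpd_t": {"listen:80", "read:web_content", "exec:cgi_exec_t"},
    "cgi_t": {"read:web_content", "read:cgi_data"},
}

# (current domain, type of the executed file) -> domain of the new process
TRANSITIONS = {("httpd_t", "cgi_exec_t"): "cgi_t"}


def exec_domain(domain, file_type):
    """Domain after exec: follow a transition rule if one exists, else stay put."""
    return TRANSITIONS.get((domain, file_type), domain)


def allowed(domain, action):
    return action in ALLOW.get(domain, set())


cgi_domain = exec_domain("httpd_t", "cgi_exec_t")
print(cgi_domain)
print(allowed(cgi_domain, "listen:80"))     # the CGI cannot listen on ports
print(allowed("httpd_t", "read:cgi_data"))  # Apache does not gain the CGI's access
print(allowed(cgi_domain, "read:cgi_data"))
```

Each domain keeps its own rule set, and the transition table is the only place where the two are connected.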

> So does SELinux. You can't change labels of a running process.

But you can change the labels of files on disk, which means for instance I can take a running libvirt, and give it a disk image on a removable drive. All I need to do is to label it, and it works. I don't need to bring libvirt down and all my VMs with it, so that it can have /mnt/external added to its allowed paths list.

> Meanwhile, SELinux can be turned off with one command.

Which can be disabled with SELinux itself, if you want to. After that, reboot time.

> Want to convince me? Show me a simple script that does what you're proposing: creates a public directory and runs Apache with access to it. No need for CGIs. I'll show the corresponding unveil/pledge based wrapper.

setsebool -P httpd_enable_homedirs 1
chcon -R -t httpd_sys_content_t ~user/public_html

LSM stacking and the future

Posted Nov 27, 2019 1:23 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

> This means that if you don't block off unveil after making your list of exceptions, a child process or an exploit could just unveil("/") and unblock everything.
Uhh, no? unveil("/") will simply return -EPERM. So for example, you can only call unveil("~/public/www") if the parent unveiled("~/public").

LSM stacking and the future

Posted Nov 27, 2019 10:49 UTC (Wed) by vadim (subscriber, #35271)

Hmm, interesting. Are you saying unveil works differently after a fork()? Eg, as I understand it:

// the whole filesystem is available at the start

unveil("/tmp", "r"); // now only /tmp is visible
unveil("/var", "r"); // now I can see both /tmp and /var

Are you saying the second statement will fail if I insert a fork() (perhaps with an exec) in the middle?

LSM stacking and the future

Posted Nov 27, 2019 11:48 UTC (Wed) by johill (subscriber, #25196)

I think you have to call unveil(NULL, NULL) to "stop" the ability to unveil more, but typically you would of course do that since it's otherwise useless?

LSM stacking and the future

Posted Nov 27, 2019 12:11 UTC (Wed) by vadim (subscriber, #35271)

And that's exactly the point I'm making:

unveil is a nice, handy mechanism. But it doesn't nest well. Since unveil builds a list of what you want to allow, you need to lock it up with unveil(NULL, NULL). Once you do so, any further unveil(), whether under a currently locked directory or not, fails.

This means it's not a good thing for things that could nest. Sample scenario:

We have a "convert_image" program that does some conversion. We secure it with unveil to ensure it doesn't touch anything it's not supposed to, if say, libjpeg happens to have an exploit. Great. It works the way it should from the commandline.

Now that we have a well protected tool, we can call it from Apache and not worry much. Wonderful!

But, let's suppose that since it's so awesome, we've now applied unveil to apache too, which calls convert_image through a CGI. apache calls unveil(NULL, NULL) as it should, and eventually runs convert_image. At that point, one of two things happens:

A. convert_image notices it can't secure itself and refuses to work
B. convert_image ignores the failure and plows ahead, allowing an exploit to work within what Apache is allowed to do.

So, while an interesting tool, it's a limited one, with gotchas like the above.
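The scenario above can be replayed with a toy model. This Python sketch only mimics the semantics as described in this comment (a list of allowed paths, a lock, and failure of any later unveil); the class and its methods are invented for the illustration, and the assumption that a child simply inherits the locked state is part of the model, not a statement about OpenBSD's behavior.

```python
import copy


class UnveilState:
    """Toy model of one process's unveil state."""

    def __init__(self):
        self.paths = None    # None means the whole filesystem is visible
        self.locked = False

    def unveil(self, path=None):
        if self.locked:
            raise PermissionError("unveil is locked")  # models EPERM
        if path is None:
            self.locked = True  # models unveil(NULL, NULL)
            return
        if self.paths is None:
            self.paths = set()  # the first call hides everything else
        self.paths.add(path.rstrip("/") or "/")

    def can_see(self, path):
        if self.paths is None:
            return True
        return any(p == "/" or path == p or path.startswith(p + "/")
                   for p in self.paths)

    def spawn(self):
        """Child process; ASSUMED here to inherit the parent's state."""
        return copy.deepcopy(self)


apache = UnveilState()
apache.unveil("/var/www")
apache.unveil()  # lock

convert_image = apache.spawn()
try:
    convert_image.unveil("/var/www/uploads")  # tries to confine itself further
    outcome = "confined further"
except PermissionError:
    outcome = "cannot confine itself"  # case A stops here, case B carries on

print(outcome)
print(convert_image.can_see("/var/www/private.txt"))  # wider than it wanted
print(convert_image.can_see("/etc/passwd"))
```

In this model the child ends up with exactly Apache's view of the filesystem, neither more nor less, which is the gotcha the comment describes.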