LSM stacking and the future
Posted Nov 25, 2019 22:14 UTC (Mon) by vadim (subscriber, #35271) In reply to: LSM stacking and the future by Cyberax
Parent article: LSM stacking and the future
Because interactions are complicated. E.g., I want Apache to listen on port 80, but not on port 22 under any circumstance. I want Apache to serve files from my home directory, but not my GPG keys.
Then you have messes like PAM, which are pretty tricky to secure.
> It's not like we haven't seen alternatives. OpenBSD has very practical and extremely useful unveil()/pledge() support, for example. Which is STILL impossible to express completely in Linux even with unholy brew of eBPF and SELinux.
Linux has seccomp, and while helpful it's a blunt and problematic instrument. For instance, trouble comes when somebody makes a new version of open(), and now there's a new syscall that's not in the allow list, yet is being used by glibc. Things like that.
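To make the allow-list problem concrete, here is a minimal toy model in Python. It is not the seccomp API: the ALLOWED set and the filter_syscall() helper are made up for illustration, and the idea of a libc switching to openat2() is a hypothetical stand-in for "a new version of open()".

```python
# Toy model of a syscall allow list. This is NOT the real seccomp API;
# ALLOWED and filter_syscall() are hypothetical names for illustration.
ALLOWED = {"read", "write", "close", "openat"}


def filter_syscall(name, allowed=ALLOWED):
    """Return "ALLOW" for listed syscalls and "KILL" for anything else."""
    return "ALLOW" if name in allowed else "KILL"


# A libc that opens files through openat() passes the filter.
print("openat  ->", filter_syscall("openat"))

# Hypothetically, a newer libc starts using a newer variant of the call.
# The policy was never updated, so an ordinary file open is now rejected.
print("openat2 ->", filter_syscall("openat2"))
```

The policy itself did not change; only the library underneath the application did, which is why this kind of breakage tends to surprise people.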
But more importantly, this completely misses the point. The point of something like SELinux isn't that Apache politely declares what it will do and won't, but that I, being the sysadmin, am the one authority on the system, and Apache doesn't get any say in anything.
> No. My point would be to set permissions to 600 (or even 000) and then use LSMs to grant additional access. If one then turns off LSM they lose access.
What's the point? You can already make it impossible to turn an LSM off, since they're controlled by things like files and syscalls, which can be disabled.
Posted Nov 25, 2019 23:01 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
> Linux has seccomp, and while helpful it's a blunt and problematic instrument. For instance, trouble comes when somebody makes a new version of open(), and now there's a new syscall that's not in the allow list, yet is being used by glibc. Things like that.
This is an entirely self-inflicted issue. A simple targeted security subsystem that would just do what pledge() does would help immensely. It won't be uber-flexible NSA-Flask-compatible, and it would require extensions on a case-by-case basis, sure. But it also would be much more usable.
> But more importantly, this completely misses the point. The point of something like SELinux isn't that Apache politely declares what it will do and won't, but that I, being the sysadmin, am the one authority on the system, and Apache doesn't get any say in anything.
> What's the point? You can already make it impossible to turn an LSM off, since they're controlled by things like files and syscalls, which can be disabled.
The ONLY way to fix this in the long term is to make LSMs mandatory.
Posted Nov 26, 2019 0:12 UTC (Tue)
by vadim (subscriber, #35271)
I'm not a user of *BSD, how do you implement those policies with pledge/unveil?
> This is an entirely self-inflicted issue. A simple targeted security subsystem that would just do what pledge() does would help immensely.
Sure, improvements can be made.
> In reality this doesn't matter much, since you're likely using Apache from the distro-provided package with a distro-provided policy. So putting the permissions inside Apache "namespace" doesn't really matter.
It matters because:
1. I can modify the policy without touching the source code.
> The ONLY way to fix this in the long term is to make LSMs mandatory.
Ohh. I finally get it.
That's a pointless waste of time. You can't fix willful stupidity by technical measures, it never worked and never will. If somebody wants to disable security, they will do so. People will disable it, not compile it, patch the kernel, choose another distribution, run everything as root, whatever.
Posted Nov 26, 2019 0:26 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
You do:
> 1. I can modify the policy without touching the source code.
> 2. If something sets its own policy, the possibility exists of subverting security before the policy can be applied because there is a point before the policy is set.
> 3. An application's own author isn't necessarily the best person to be in charge of knowing what it should or not be doing.
> 4. Tools like 'sandbox' that sandbox arbitrary applications.
Posted Nov 26, 2019 8:04 UTC (Tue)
by zdzichu (subscriber, #17118)
Posted Nov 26, 2019 8:08 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
These days it also can be worked around using ambient caps acquired in a helper wrapper (regular caps are lost on exec).
Posted Nov 26, 2019 14:27 UTC (Tue)
by jem (subscriber, #24231)
Posted Nov 26, 2019 14:53 UTC (Tue)
by vadim (subscriber, #35271)
Also due to said VMs the scenario of people being given shell accounts is becoming rarer by the day, anyway.
Also there's plenty of important stuff on ports > 1024, such as administrative consoles like Cockpit on port 9090. So if you've got user access, there's nothing much preventing you from putting up a fake Cockpit page of your own.
Posted Nov 26, 2019 19:57 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Had there been something like systemd from the start (heck, even a better designed inetd) then this might have turned out differently.
Even for ports <1024 you shouldn't really trust them implicitly.
Posted Nov 26, 2019 21:24 UTC (Tue)
by rodgerd (guest, #58896)
Since Unix became popular in, oh, the mid-nineties, it's been a toxic heritage that causes more harm than good, leading programs to run as root simply because they wanted a well-known port, while providing absolutely no security benefit whatsoever.
This is a classic example where mindless adherence to "Unix tradition" has caused more harm than good, all for a lack of critical thinking.
Posted Nov 28, 2019 2:49 UTC (Thu)
by flussence (guest, #85566)
Posted Nov 26, 2019 8:24 UTC (Tue)
by vadim (subscriber, #35271)
I see. Well, that's far worse than SELinux.
One big problem I see is that you can only sandbox once. Unveil requires blocking off further changes at the end. So you both can't confine further something in an already confined environment, and can't expand the confinement either.
The first is problematic because now your rule set encompasses anything that could possibly be called by the main process. E.g., you can confine Apache to /home/user/public_html, but what if you call a CGI that reads something in /home/user/.cgi? Now you need to allow that, and you can't give the permission to that particular CGI because once Apache or its wrapper finished with the pledge stuff, it's set in stone. So you give that permission to Apache, adding it to a heap of stuff that Apache can do, because something that it calls needs that. Hardly pretty, very manageable, or very secure.
The second is problematic because you make things that operate with above normal permissions impossible. E.g., think about tools like ping that execute with more privileges than their caller, but that are coded in a way that their usage is safe. Think for instance of a CGI calling scp. You must now allow Apache access to your ssh keys, which makes it able to serve them to anyone who succeeds in tricking Apache into doing it.
Also this would seem not to allow for new users to be created, unless one can unveil("/home/*/foo").
> 2. Open port 80 as a superuser, pass the socket to Apache. This actually can be done by systemd without any SELinux.
Sure, if you have cooperation from the program, in that it can work on a socket passed on stdin at all. And if you need more than one of those, now you need systemd support in that program. And what about port 8080?
Let's see what we're up to by now:
1. Write a wrapper that will forbid Apache from making listen() calls, and unveil() anything needed.
I don't know, this doesn't look particularly elegant to me. Lots of potential trouble already, and we've not even done much yet!
> How often this actually happens? Fedora should try to gather stats. I haven't seen it done once in my experience.
Anybody using setroubleshoot is effectively doing it.
> We have systemd for that. The wrapper code to set policy fits with it perfectly. Heck, it's already being used to allow rootless daemons listening on <1024 ports.
Again, the point is confinement, not allowing formerly root-only things safely. I don't see why a thing should be able to open ports >= 1024 without my permission.
> Realistically neither is the policy writer.
There's no perfection for sure, but at least the policy's writer is ideally an uninvolved third party who will ask useful questions like "Why does it want to do that?". Because if the developer of a thing is up to no good, or just not concerned about security, then clearly we benefit from an outside opinion.
Posted Nov 26, 2019 20:53 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Now let's see what you need to do in SELinux to do the same: Apache listening on port 80 and serving the ~/public directory while denying access to everything else. It's a simple task, right?
First, you need to create a label. Let's call it apache_file_t. And add it to ~/public. This will have an unfortunate side-effect of disabling user_home_t label on it, so if you have policies targeted for user_home_t then they might need an adjustment. For example, your backup utility might _lose_ access to ~/public if its policy just says "allow user_home_t read".
OK. From here on, files created in ~/public will have the apache_file_t label. However, since "file is an object blah-blah" if you move a file into ~/public it will NOT be automatically accessible. You need to remember to relabel it. The reverse is also true, if you move a file from ~/public it will still retain its labels and remain accessible.
But wait, there's more! SELinux can only take away rights. Typically home directories are set to 770 mode, so that they are accessible only for their users and user groups. So you need to make sure Apache is in the same group as yourself.
But OK, let's move on to listening on port 80. SELinux can... do nothing! It's only used to restrict access, not to grant it. So you have to start Apache as root and then let it drop privs. SELinux does allow taking away most of root's capabilities, so that's fine.
Now suppose that SELinux is turned off. Suddenly your home directory becomes accessible for Apache, which is in the same user group as your home directory. Whoops. And Apache is also started as root.
Let's compare with unveil(). You need to add access for ~/public, so you write a helper wrapper that does unveil() for that directory. Nothing else is affected, you don't need to modify your backup utility's policy. And unveil() can't be turned off, it's a core kernel feature.
Posted Nov 26, 2019 22:03 UTC (Tue)
by vadim (subscriber, #35271)
No, I'm not talking about multiple Apache copies. I'm talking about Apache calling other binaries. That is, a situation where you have:
What I'm saying is that you have several problems there:
And I'm explaining why it's not very practical in practice.
That's not a bug, that's a feature. I mean that 100% seriously. SELinux doesn't work on paths, and isn't supposed to. This is exactly the behavior I want my system to have.
Of course it can be turned off, what nonsense is that? "Core" nothing. It didn't exist once upon a time, so just install an older kernel. Or just hack it up. This looks like a promising place for a "return 0". Or perhaps here. Took me about 10 minutes and I never even touched BSD.
Besides which, look at that lovely BYPASSUNVEIL constant. And oh dear, there's a hardcoded list of bypassed rules right in the kernel source.
Posted Nov 26, 2019 22:47 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
> 2. This system means that you need to pledge/unveil everything Apache or any of its children might ever want, and grant that access to that Apache instance and every child.
> 3. For pledge() specifically you can drop the lockdown on exec, but of course that now means the CGIs are free to do whatever they want.
> 4. It's also an inflexible system in that it requires a full restart to change what you unveil.
> And I'm explaining why it's not very practical in practice
> That's not a bug, that's a feature. I mean that 100% seriously. SELinux doesn't work on paths, and isn't supposed to.
> Of course it can be turned off, what nonsense is that? "Core" nothing. It didn't exist once upon a time, so just install an older kernel.
Meanwhile, SELinux can be turned off with one command.
Want to convince me? Show me a simple script that does what you're proposing: creates a public directory and runs Apache with access to it. No need for CGIs. I'll show the corresponding unveil/pledge based wrapper.
Posted Nov 27, 2019 1:19 UTC (Wed)
by vadim (subscriber, #35271)
Sure does: the interface. What unveil() does is first to forbid everything, then allow whatever you pass to unveil.
This means that if you don't block off unveil after making your list of exceptions, a child process or an exploit could just unveil("/") and unblock everything.
> I think that's how pledge() works as well.
pledge() has two modes:
1. Pass on the restrictions to the child. Great, unless your child can't work with those. So if you block something major, you're going to have a hard time exec()ing much after that.
> Sure. So does SELinux. Just at the labeling phase and the policy creation phase. I'm assuming that Apache simply runs the CGI scripts.
Nope! See, SELinux has the concept of transition rules: https://danwalsh.livejournal.com/23944.html
Which means, I can do this:
1. Confine apache, so that it can only do apache things.
This means I can have a setup where every piece is locked down to be able to do no more than it's supposed to.
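A toy model may help show what a transition rule buys you: Apache and the CGI are confined separately, and a rule switches the process from one domain to the other at exec time. This is plain Python, not SELinux policy language, and every name in it (apache_t, cgi_t, cgi_exec_t and so on) is made up for illustration rather than taken from a shipped policy.

```python
# Toy type-enforcement model. All domain and type names are hypothetical.
# Each entry reads: (process domain, label of the target, permission).
ALLOW = {
    ("apache_t", "web_content_t", "read"),
    ("apache_t", "http_port_t", "listen"),
    ("apache_t", "cgi_exec_t", "execute"),
    ("cgi_t", "cgi_data_t", "read"),
}

# (calling domain, label of the executed file) -> domain of the new process
TRANSITIONS = {("apache_t", "cgi_exec_t"): "cgi_t"}


def allowed(domain, target_type, perm):
    """Anything not explicitly listed is denied."""
    return (domain, target_type, perm) in ALLOW


def domain_after_exec(domain, file_type):
    """A matching transition rule switches domains; otherwise the domain is kept."""
    return TRANSITIONS.get((domain, file_type), domain)


child = domain_after_exec("apache_t", "cgi_exec_t")
print("CGI runs as:", child)
print("CGI may read its data:", allowed(child, "cgi_data_t", "read"))
print("CGI may listen on the HTTP port:", allowed(child, "http_port_t", "listen"))
print("Apache may read the CGI's data:", allowed("apache_t", "cgi_data_t", "read"))
```

In this sketch the CGI's permissions never have to be added to Apache's rule set, and the CGI does not inherit Apache's right to listen on a port.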
> So does SELinux. You can't change labels of a running process.
But you can change the labels of files on disk, which means for instance I can take a running libvirt, and give it a disk image on a removable drive. All I need to do is to label it, and it works. I don't need to bring libvirt down and all my VMs with it, so that it can have /mnt/external added to its allowed paths list.
> Meanwhile, SELinux can be turned off with one command.
Which can be disabled with SELinux itself, if you want to. After that, reboot time.
> Want to convince me? Show me a simple script that does what you're proposing: creates a public directory and runs Apache with access to it. No need for CGIs. I'll show the corresponding unveil/pledge based wrapper.
setsebool -P httpd_enable_homedirs 1
Posted Nov 27, 2019 1:23 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
Posted Nov 27, 2019 10:49 UTC (Wed)
by vadim (subscriber, #35271)
// the whole filesystem is available at the start
unveil("/tmp", "r"); // now only /tmp is visible
Are you saying the second statement will fail if I insert a fork() (perhaps with an exec) in the middle?
Posted Nov 27, 2019 11:48 UTC (Wed)
by johill (subscriber, #25196)
Posted Nov 27, 2019 12:11 UTC (Wed)
by vadim (subscriber, #35271)
unveil is a nice, handy mechanism. But it doesn't nest well. Since unveil builds a list of what you want to allow, you need to lock it up with unveil(NULL, NULL). Once you do so, any further unveil(), whether under a currently locked directory or not, fails.
This means it's not a good thing for things that could nest. Sample scenario:
We have a "convert_image" program that does some conversion. We secure it with unveil to ensure it doesn't touch anything it's not supposed to, if say, libjpeg happens to have an exploit. Great. It works the way it should from the commandline.
Now that we have a well protected tool, we can call it from Apache and not worry much. Wonderful!
But, let's suppose that since it's so awesome, we've now applied unveil to apache too, which calls convert_image through a CGI. apache calls unveil(NULL, NULL) as it should, and eventually runs convert_image. At that point, one of two things happens:
A. convert_image notices it can't secure itself and refuses to work
So, while an interesting tool, it's a limited one, with gotchas like the above.
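The nesting gotcha can be sketched with a small Python model. It only models the behaviour as described in this comment (an allow list that is locked by a final call, after which every further call fails); it is not the OpenBSD system call. The UnveilModel class is hypothetical, and the sketch assumes the called program inherits the caller's locked state.

```python
import errno


class UnveilModel:
    """Toy model of the unveil() behaviour described in this thread."""

    def __init__(self):
        self.paths = None    # None: nothing unveiled yet, everything is visible
        self.locked = False

    def unveil(self, path=None):
        """Add a visible path; calling with no path locks the list."""
        if self.locked:
            return -errno.EPERM          # every call after the lock fails
        if path is None:
            self.locked = True
            return 0
        if self.paths is None:
            self.paths = set()           # first call hides everything else
        self.paths.add(path.rstrip("/") or "/")
        return 0

    def visible(self, path):
        if self.paths is None:
            return True
        return any(p == "/" or path == p or path.startswith(p + "/")
                   for p in self.paths)


# Apache (or its wrapper) unveils the public directory and locks the list.
state = UnveilModel()
state.unveil("/home/user/public_html")
state.unveil()

# convert_image, assumed to inherit that state, now tries to confine itself.
result = state.unveil("/tmp/images")
print("convert_image unveil failed:", result == -errno.EPERM)
```

At this point convert_image has to choose between refusing to run and carrying on with whatever Apache was allowed to see.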
LSM stacking and the future
pledge()/unveil() do both just fine in practice.
The problem is, even with all its brokenness, seccomp still can not express full pledge()/unveil() semantics.
In reality this doesn't matter much, since you're likely using Apache from the distro-provided package with a distro-provided policy. So putting the permissions inside Apache "namespace" doesn't really matter.
You're missing the point. If users or application developers see SELinux interfering with their work, they simply turn SELinux off instead of fixing whatever is wrong. There's no downside to doing this as LSMs fail open.
LSM stacking and the future
2. If something sets its own policy, the possibility exists of subverting security before the policy can be applied because there is a point before the policy is set.
3. An application's own author isn't necessarily the best person to be in charge of knowing what it should or not be doing.
4. Tools like 'sandbox' that sandbox arbitrary applications.
LSM stacking and the future
I don't use OpenBSD but I installed it in a VM just to check this.
1. unveil() directories that you want to be readable, this will automatically make everything else closed off.
2. Open port 80 as a superuser, pass the socket to Apache. This actually can be done by systemd without any SELinux.
How often this actually happens? Fedora should try to gather stats. I haven't seen it done once in my experience.
We have systemd for that. The wrapper code to set policy fits with it perfectly. Heck, it's already being used to allow rootless daemons listening on <1024 ports.
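For reference, the receiving side of systemd socket passing is small. The sketch below follows the environment-variable convention documented for sd_listen_fds(3), where LISTEN_PID and LISTEN_FDS are set and descriptors start at 3. The helper names inherited_fds() and inherited_sockets() are made up here, and whether a given daemon actually supports this is a separate question.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the sd_listen_fds(3) convention


def inherited_fds(environ=None, pid=None):
    """Return the descriptor numbers passed by the service manager, or []."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables are only meant for the process named in LISTEN_PID.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []   # variables missing or malformed: nothing was passed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def inherited_sockets():
    """Wrap each inherited descriptor in a socket object; no bind() is needed."""
    return [socket.socket(fileno=fd) for fd in inherited_fds()]
```

A daemon started this way would call inherited_sockets() and go straight to accept() on the result, so it never needs the privilege to bind port 80 itself.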
Realistically neither is the policy writer.
unveil()/pledge() them from wrapper scripts. Add more pledges as needed on a case-by-case basis.
rootless <1024
Isn't the point of requiring root for ports less than 1024 that they can be trusted to some degree? So you can say ssh some-well-known-host, and rely on the fact that some random joker with an ordinary account on the host hasn't discovered that port 22 is free and started his or her own password stealer.
LSM stacking and the future
2. Write a systemd service that will listen on 80 and 8080, and pass those sockets to Apache
3. Ensure Apache is happy with not being able to listen to anything but what is passed to it from systemd
4. Ensure Apache can get multiple sockets from systemd
5. Ensure that neither Apache nor anything it calls will ever try to unveil anything, because that won't work.
6. Ensure that either anything Apache calls is fine with the pledge() being made, or that it's okay for the pledge() being rescinded on exec (there goes our listen() security!)
7. Accept that adding new users will require completely shutting down and restarting Apache
LSM stacking and the future
Nope. You can have multiple wrappers that run multiple Apache copies. As I said, I'm interested in reliable _practical_ solutions that just work.
LSM stacking and the future
Nope. You can have multiple wrappers that run multiple Apache copies.
Wrapper -> Apache -> CGI_1
                  -> CGI_2
                  -> CGI_3
As I said, I'm interested in reliable _practical_ solutions that just work.
OK. From here on, files created in ~/public will have the apache_file_t label. However, since "file is an object blah-blah" if you move a file into ~/public it will NOT be automatically accessible. You need to remember to relabel it. The reverse is also true, if you move a file from ~/public it will still retain its labels and remain accessible.
Let's compare with unveil(). You need to add access for ~/public, so you write a helper wrapper that does unveil() for that directory. Nothing else is affected, you don't need to modify your backup utility's policy. And unveil() can't be turned off, it's a core kernel feature.
LSM stacking and the future
Nothing stops you from making unveil() nestable. Each successful invocation can further reduce the access. I think that's how pledge() works as well.
Sure. So does SELinux. Just at the labeling phase and the policy creation phase. I'm assuming that Apache simply runs the CGI scripts.
Uh? Nope. pledge() is inherited across exec() calls.
So does SELinux. You can't change labels of a running process.
Well, no you have not.
And that's why it's dumb and is turned off in most cases.
Nope. unveil() can't be turned off. You need to replace the kernel and reboot the system. Running unveil() on an older kernel also results in -ENOSYS.
LSM stacking and the future
2. Remove all restrictions from the child. Which means you restricted yourself, but your child can do whatever it wants.
2. Confine CGI, so that it can only do CGI things.
3. Write an apache -> CGI transition rule. Which means CGI rules don't pollute my Apache rules, and the CGI doesn't get to listen on ports.
chcon -R -t httpd_sys_content_t ~user/public_html
LSM stacking and the future
Uhh, no? unveil("/") will simply return -EPERM. So for example, you can only call unveil("~/public/www") if the parent unveiled("~/public").
LSM stacking and the future
unveil("/var", "r"); // now I can see both /tmp and /var
LSM stacking and the future
B. convert_image ignores the failure and plows ahead, allowing an exploit to work within what Apache is allowed to do.