Shuttleworth: Losing graciously

Posted Feb 19, 2014 18:55 UTC (Wed) by fandingo (guest, #67019)
In reply to: Shuttleworth: Losing graciously by paulj
Parent article: Shuttleworth: Losing graciously

I think the primary reason why it was moved out of the kernel is that the policies authorizing access are not simple and may not follow traditional methods. Here are a couple of situations where non-traditional policies may be desired

* Only allow PID X to manage cgroup subtree /A/B/C.
* Processes in subtree /A/B/C/ may move their processes in /D/E/F but should not have any further control.

Traditional file permissions and even ACLs are incapable of dealing with either situation. In the first, case there's no way to grant that access without allowing all processing owned by that same user/group from controlling /A/B/C. In the second situation, there's no way to allow a one-way move.

Yeah, we can start adding more files to the cgroups to attempt finer control, but now things get messy, and it's likely the who permission issue goes out the window. That's just two random policies that a system may want to enforce. To do #1 you would likely have to give one of U, G, or O +w to the subtree for that user, but that's a broken definition because hidden kernel policy will revoke writes by other processes, even though they have the proper U or G or fall into O. In the end, a cgroup would spout many files to cover all sorts of policy combinations, and the kernel developers will be left with a monstrosity that can't be fixed because next time the same cries about API will arise.

A filesystem hierarchy is not suited to this complexity. It will have to be contorted into all kinds ways where the permissions scheme (even with ACLs) doesn't match traditional behavior.

It's far preferable to take the LSM approach. The kernel provides the primitives throughout the kernel subsystems and talks to another module to establish and enforce policy. It's true that LSM modules are kernel modules, but they remain separate from the LSM code. (I wouldn't have a problem with a cgroup manager living as a kernel module, but I don't see any inherent benefit either.)

Shuttleworth: Losing graciously

Posted Feb 19, 2014 20:36 UTC (Wed) by HelloWorld (guest, #56129) [Link] (3 responses)

> * Only allow PID X to manage cgroup subtree /A/B/C.
> * Processes in subtree /A/B/C/ may move their processes in /D/E/F but should not have any further control.
>
> Traditional file permissions and even ACLs are incapable of dealing with either situation.
But similar restrictions might also make sense for other kinds of hierarchically organised objects. So why not generalise the existing access control mechanisms to allow for things like that instead of inventing something new?

Shuttleworth: Losing graciously

Posted Feb 19, 2014 22:28 UTC (Wed) by fandingo (guest, #67019) [Link] (1 responses)

The only existing mechanism that could possibly be used would be an LSM. That probably wouldn't be a bad approach.

I don't think that there's any way that traditional permissions, even with ACLs, could be massaged into giving the necessary flexibility and clean interfaces.

Shuttleworth: Losing graciously

Posted Feb 19, 2014 22:46 UTC (Wed) by dlang (guest, #313) [Link]

the thing is, existing LSMs know how to deal with permissions to filesystem objects. SELinux and AppArmor work on the existing cgroups interfaces today (as Cyberax has noted).

Plus there is the entire extended ACL structure thats available (but very seldom used because it's not needed)

On Linux, permissions for filesystem objects have not been limited to the unix wrx bits for a long time.

Shuttleworth: Losing graciously

Posted Feb 19, 2014 22:49 UTC (Wed) by vonbrand (subscriber, #4458) [Link]

So the solution to the problem that the API isn't well known/standard is to create another totally new, in practice untested, "general hierarchical security model" to be applied across the board to anything with a hierarchical structure. That sounds much, much harder to do right to me.

Shuttleworth: Losing graciously

Posted Feb 19, 2014 23:00 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

>* Only allow PID X to manage cgroup subtree /A/B/C.
>* Processes in subtree /A/B/C/ may move their processes in /D/E/F but should not have any further control.
Then they should talk to a some kind of privileged program that can do this. Traditional UNIX used suid programs for that, and it totally makes sense to use something like cgmanager/systemd for this.

However, such situations are not really normal. In particular, changing levels in cgroups hierarchy is not a trivial operation - new subtree might have limits that the subtree which is being moved already exceeds.

Shuttleworth: Losing graciously

Posted Feb 19, 2014 23:32 UTC (Wed) by fandingo (guest, #67019) [Link] (3 responses)

> However, such situations are not really normal. In particular, changing levels in cgroups hierarchy is not a trivial operation - new subtree might have limits that the subtree which is being moved already exceeds.

That's really a question of policy, though. If the policy says that some processes need to move to /D/E/F, then they need to go there, regardless of what the resource controllers say. (I'd argue that the process should be moved first, and then the resource controller terminates processes to get back into proper configuration. I don't think that it is acceptable to leave a process in the wrong the subtree.)

It's worth noting that systemd-login, which is not PID 1, does the second action today on user login.

> Then they should talk to a some kind of privileged program that can do this. Traditional UNIX used suid programs for that, and it totally makes sense to use something like cgmanager/systemd for this.

If there are acknowledged shortcomings of cgroupfs, shouldn't the API be changed to support all reasonable actions? Why should the kernel keep interfaces that clearly have shortcomings that cannot be resolved without massive API incompatibilities?

Shuttleworth: Losing graciously

Posted Feb 19, 2014 23:40 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> That's really a question of policy, though. If the policy says that some processes need to move to /D/E/F, then they need to go there, regardless of what the resource controllers say.
You policy might say that a process can use 100G of RAM, but that's not going to help you if you only have 500Mb. Right now if you try to do this trick the kernel simply kills the over-limit processes.

> If there are acknowledged shortcomings of cgroupfs, shouldn't the API be changed to support all reasonable actions? Why should the kernel keep interfaces that clearly have shortcomings that cannot be resolved without massive API incompatibilities?
Like filesystem interface? Perhaps we should switch to DBUS instead of using open()?

Shuttleworth: Losing graciously

Posted Feb 19, 2014 23:43 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

backwards compatibility would be a reason for keeping poor interfaces, but if you are going to break them, then you need to do so once, not multiple times.

And the new systemd API is just that, a systemd API, by definition it doesn't deal with use cases that don't use systemd.

and the currently proposed single-writer API in known to not support all use cases, so why should it replace the existing API?

Shuttleworth: Losing graciously

Posted Feb 19, 2014 23:59 UTC (Wed) by fandingo (guest, #67019) [Link]

> and the currently proposed single-writer API in known to not support all use cases, so why should it replace the existing API?

You are talking about the Google multi-hierarchy complaint, right? The cgroups maintainers are on record as saying that this is not reasonable, and they intend to eliminate it.

> backwards compatibility would be a reason for keeping poor interfaces

The cgroups developers believe them to be broken, not poor. Plus, they get in the way of fixing cgroups since cgroupfs leaks too many implementation details.