Control groups - The conclusion

Posted Aug 14, 2014 7:00 UTC (Thu) by pixelpapst (guest, #55301)
Parent article: Control groups, part 7: To unity and beyond

Well, I certainly arrived at a similar conclusion regarding grouping based on session management primitives, but your articles also were quite skillfully crafted to lead me there. :)

What I'm missing a bit is the discussion of how the in-kernel data structures might / should evolve from here. To me it seems pretty obvious that, once the unified hierarchy can be assumed at compile time (possibly with a CONFIG_CGROUP_UNIFIED_ONLY flag), the process' (task group leader's) data structure would gain a
    struct cgroup *cgrp;
element, and the horrible-to-traverse struct cgrp_cset_link would hopefully become a thing of the past. Or, if your caveat on separate hierarchies for accounting (upward) and control (downward) proves important enough, two
    struct cgroup *cgrp_acc;
    struct cgroup *cgrp_ctl;
elements. In any case, getting that horrid NxM data structure out of possibly hot paths can only be a good thing.
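To make the contrast concrete, here is a rough userspace model in C. The types are simplified stand-ins loosely named after the kernel's struct cgroup, struct css_set and struct cgrp_cset_link; they are not the real kernel definitions, and task_multi / task_unified are hypothetical. It models only the css_set-to-cgroup direction of the links: with several hierarchies, finding a task's cgroup in one of them means walking a list, while with a unified hierarchy it is a single pointer load.

```c
#include <stddef.h>

/* Simplified stand-in for a cgroup; not the kernel's definition. */
struct cgroup {
    int hierarchy_id;      /* which hierarchy this cgroup belongs to */
    const char *path;
};

/* Stand-in for one css_set -> cgroup link (one per hierarchy). */
struct cset_link {
    struct cgroup *cgrp;
    struct cset_link *next;   /* next link of the same css_set */
};

struct css_set {
    struct cset_link *links;
};

/* Hypothetical task views: multi-hierarchy vs. unified-only. */
struct task_multi {
    struct css_set *cset;
};

struct task_unified {
    struct cgroup *cgrp;
};

/* Multi-hierarchy lookup: walk the links until the hierarchy matches. */
struct cgroup *cgroup_in_hierarchy(const struct task_multi *t, int hierarchy_id)
{
    for (struct cset_link *l = t->cset->links; l != NULL; l = l->next)
        if (l->cgrp->hierarchy_id == hierarchy_id)
            return l->cgrp;
    return NULL;   /* task has no cgroup in that hierarchy */
}

/* Unified lookup: one pointer dereference, no traversal. */
struct cgroup *task_cgroup(const struct task_unified *t)
{
    return t->cgrp;
}
```

The split accounting/control variant would simply be two such pointers in struct task_unified.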

I also expect more controllers to move their accounting or control data into struct cgroup itself. I agree that the cgroup.controllers / cgroup.subtree_control interface is suboptimal (I would have preferred them to be the same file at least). I'm not quite sure how this "entrenches the current structure", though, or what you would have done differently here.

Evolving our cgroups in the direction of your hypothetical hgroups could then start with the kernel automatically creating a new cgroup instance on setsid() and setpgid() (or possibly moving the process to a different cgroup in the case of the latter). Later a struct cgroup could even be embedded in the session/pgroup data structures, while still ensuring that processes can be moved between cgroups. I'm not sure whether in such a world you'd still create the set_domainid() system call, or whether the "upper layers" of the cgroups tree would just be handled by systemd using traditional cgroup knobs.
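A toy model of the auto-grouping idea, assuming a hypothetical kernel that creates a new group whenever a process calls setsid(). Everything here (struct group_table, new_group(), model_setsid(), move_to_group()) is invented for illustration; only the setsid() rules it mimics come from POSIX: the caller becomes leader of a new session and a new process group, and the call fails (EPERM) if the caller is already a process group leader. The model checks only that one EPERM condition.

```c
#include <stddef.h>

#define MAX_GROUPS 16

struct group {
    int id;
    int session_id;   /* session that triggered creation, or -1 */
};

struct process {
    int pid;
    int pgid;
    int sid;
    struct group *grp;
};

struct group_table {
    struct group groups[MAX_GROUPS];
    int count;
};

/* Allocate a new group from a fixed-size table; NULL when full. */
struct group *new_group(struct group_table *t, int session_id)
{
    if (t->count >= MAX_GROUPS)
        return NULL;
    struct group *g = &t->groups[t->count];
    g->id = t->count;
    g->session_id = session_id;
    t->count++;
    return g;
}

/* Hypothetical setsid(): start a new session and, in this design,
 * place the caller in a freshly created group. */
int model_setsid(struct group_table *t, struct process *p)
{
    if (p->pgid == p->pid)
        return -1;    /* real setsid() fails with EPERM here */
    struct group *g = new_group(t, p->pid);
    if (g == NULL)
        return -1;
    p->sid = p->pid;
    p->pgid = p->pid;
    p->grp = g;
    return 0;
}

/* Explicit moves stay possible, like writing a PID to cgroup.procs. */
void move_to_group(struct process *p, struct group *g)
{
    p->grp = g;
}
```

The point of the move_to_group() escape hatch is that automatic placement only sets a default; it doesn't take away the ability to reorganise processes afterwards.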

You briefly touch on the cost of scheduling in such a multi-level world. Constraining the scheduler to one fixed level of the hierarchy is certainly one way to approach this, but it might be efficient enough for the scheduler to work on just two levels: non-leaf groups (the most specific group that still has scheduling enabled) and leaves (processes / threads). The usual caveats about carefully considering cache-line bouncing apply.
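One way to picture the two-level scheme is an idealized proportional-share calculation: a task's CPU share is its group's fraction of the total weight of groups that have runnable tasks, multiplied by the task's fraction of its group's weight. This is a steady-state sketch in the spirit of CFS group weights, not how the real scheduler computes anything (it ignores per-CPU runqueues and load tracking); struct sched_task and task_share() are hypothetical, and the code assumes nonzero weights and valid indices.

```c
#include <stddef.h>

struct sched_task {
    unsigned weight;   /* per-task weight, e.g. derived from nice */
    size_t group;      /* index of the task's non-leaf group */
};

/* Idealized CPU share of task i under a two-level proportional-share model. */
double task_share(const unsigned *group_weight, size_t ngroups,
                  const struct sched_task *tasks, size_t ntasks, size_t i)
{
    unsigned long active_group_sum = 0;
    unsigned long same_group_sum = 0;
    size_t g = tasks[i].group;

    /* Level 1: sum the weights of groups with at least one runnable task. */
    for (size_t k = 0; k < ngroups; k++) {
        for (size_t j = 0; j < ntasks; j++) {
            if (tasks[j].group == k) {
                active_group_sum += group_weight[k];
                break;   /* count each runnable group once */
            }
        }
    }

    /* Level 2: sum the weights of tasks in task i's group. */
    for (size_t j = 0; j < ntasks; j++)
        if (tasks[j].group == g)
            same_group_sum += tasks[j].weight;

    return ((double)group_weight[g] / active_group_sum) *
           ((double)tasks[i].weight / same_group_sum);
}
```

For example, with two groups of equal weight, a lone task in one group gets half the CPU while two equal tasks in the other get a quarter each; a third group with no runnable tasks does not dilute either.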

Control groups - The conclusion

Posted Aug 15, 2014 9:11 UTC (Fri) by neilbrown (subscriber, #359)

> What I'm missing a bit is the discussion of how the in-kernel data structures might / should evolve from here.

The omission of "should" was deliberate. "Should" is a question of large-scale project engineering, which I don't feel qualified to comment on. That is quite different from analysing code and code history and finding and describing patterns, which I do feel qualified to do.

"might" I thought I was quite clear on: copy autogroups. Leave cgroups alone and create something that provides the same functionality using a better model and is active only on processes in 'root' cgroups.

If it were up to me (which it isn't) I would freeze development of the cgroups filesystem interface as of 3.15 and start adding all the important functionality to Linux in more idiomatic ways. Once the functionality was available in a "sane" way, I would encourage major users to migrate to the new (presumably better) API and eventually deprecate cgroups.