Shuttleworth: Losing graciously

Posted Feb 19, 2014 18:48 UTC (Wed) by smurf (subscriber, #17840)
In reply to: Shuttleworth: Losing graciously by HelloWorld
Parent article: Shuttleworth: Losing graciously

> The reason devices are exposed as “files” in /dev is precisely that
> one can do things like access control just as if they were proper files.

"It behaves like a plain file" doesn't work for quite a few device nodes, and most Linux subsystems are not controlled by echo: You don't emit sound by "cat rhapsody.wav >/dev/snd" these days, and you don't resize a LVM partition by "echo 10TB >/sys/devices/virtual/block/volgroup/master/varlog/size".

This is Linux. This is not Plan 9 where you can open a TCP connection with mkdir. cgroupfs is fine for introspection, but control? that always seemed a bit unnatural to me.

Besides, pragmatically, a sensible "cgroupctl"-style program will have a --help option and a manpage. To me that seems a lot more useful than traipsing around in cgroupfs and wondering which magic mkdir+echo+mv combo I need to invoke to limit my disk copy program's memory usage.
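
For readers who have not done it by hand, the combo in question looks roughly like the following on a cgroup v1 memory hierarchy. This is a minimal sketch rather than a recommended tool: the function name `limit_memory` and the group name `copyjob` are made up for illustration, and `/sys/fs/cgroup/memory` is only assumed to be where the v1 memory controller is mounted. `memory.limit_in_bytes` and `cgroup.procs` are the v1 control file names.

```python
import os


def limit_memory(hierarchy, group, pid, limit_bytes):
    """Create a cgroup under `hierarchy`, cap its memory, and move `pid` into it.

    `hierarchy` is assumed to be a mounted cgroup v1 memory hierarchy,
    but any writable directory works for experimenting.
    """
    path = os.path.join(hierarchy, group)

    # The "mkdir" step: on cgroupfs, creating a directory creates the cgroup.
    os.makedirs(path, exist_ok=True)

    # The "echo" steps: each control file takes a decimal number as text.
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as f:
        f.write(f"{limit_bytes}\n")
    with open(os.path.join(path, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")

    return path
```

Against a real hierarchy the call would be something like `limit_memory("/sys/fs/cgroup/memory", "copyjob", os.getpid(), 512 * 1024 * 1024)`, run with sufficient privileges. Pointed at a scratch directory it just creates two ordinary files, which is enough to see what gets written where.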

Posted Feb 19, 2014 20:22 UTC (Wed) by HelloWorld (guest, #56129)

> "It behaves like a plain file" doesn't work for quite a few device nodes,
So what? Just because you can't read(2) or write(2) to some device nodes doesn't mean you need to use another interface for things like poll(2) or chmod(2). Stop thinking about the “file system” and start thinking about a general hierarchical namespace for all kinds of objects. This is where we are today with files, sockets, fifos, device files, etc. It's only natural to extend that further.

> This is Linux. This is not Plan 9 where you can open a TCP connection with mkdir.
Uh, I know this is Linux and not Plan 9. How is that supposed to be an argument? We should learn from Plan 9 instead of taking that kind of “us vs. them” stance.

> cgroupfs is fine for introspection, but control? that always seemed a bit unnatural to me.
And to me it seems unnatural that access control for cgroups is supposed to be done through a completely different mechanism than access control to files or devices. Though I agree with you that the current cgroups API isn't ideal. For one thing, I think the natural thing is to use
ln /proc/42 /sys/fs/cgroup/yaddah/cgroup.procs
and not
echo 42 > /sys/fs/cgroup/yaddah/cgroup.procs
to add processes to a cgroup.

Posted Feb 19, 2014 21:17 UTC (Wed) by smurf (subscriber, #17840)

> And to me it seems unnatural that access control for cgroups
> is supposed to be done through a completely different mechanism
> than access control to files or devices

I strongly suspect that the main reason for that is that you're used to it.

> ln /proc/42 /sys/fs/cgroup/yaddah/cgroup.procs

Linking.
Across file systems.
Yeah, right.

Sorry, but this is the point where I stop responding to you.

Posted Feb 19, 2014 21:44 UTC (Wed) by HelloWorld (guest, #56129)

> Linking.
> Across file systems.
> Yeah, right.
So what? It's not allowed for conventional file systems because it doesn't make sense there. It does make sense for this case, so there's no reason for it not to be allowed.
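
For context, here is a small sketch of what link(2) does on a conventional file system today. The helper names `try_link` and `demo` are made up for illustration. It only exercises the directory case, which Linux documents as failing with EPERM; the cross-file-system case is documented as EXDEV but cannot be shown portably inside one temporary directory, so it is only mentioned in a comment.

```python
import errno
import os
import tempfile


def try_link(src, dst):
    """Attempt a hard link; return None on success, else the errno name."""
    try:
        os.link(src, dst)
        return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))


def demo():
    """Hard-link a regular file and a directory inside one scratch directory."""
    base = tempfile.mkdtemp()

    regular = os.path.join(base, "file")
    open(regular, "w").close()
    directory = os.path.join(base, "dir")
    os.mkdir(directory)

    # Same file system, regular file: expected to succeed.
    file_result = try_link(regular, os.path.join(base, "file2"))

    # Same file system, directory: refused (EPERM on Linux, per link(2)).
    dir_result = try_link(directory, os.path.join(base, "dir2"))

    # Not shown: a link whose source and destination live on different
    # mounted file systems is refused with EXDEV.
    return file_result, dir_result
```

Something like `ln /proc/42 /sys/fs/cgroup/yaddah/cgroup.procs` would run into both restrictions at once, so the proposal amounts to giving link(2) new semantics for these particular file systems.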

> Sorry, but this is the point where I stop responding to you.
You're acting as if I had somehow offended you. I haven't.

Posted Feb 19, 2014 22:06 UTC (Wed) by mathstuf (subscriber, #69389)

Not only are you linking across filesystems (how would one find out that it is hardlinked elsewhere?), but you're hardlinking a directory. When process 42 ends, does the "hardlink" disappear? If not (as one might expect of hardlinks), does a new process with PID 42 get put there? The /sys and /proc filesystems are already pretty magical, but that magic is limited to read and write (AFAIK), not who knows how many other syscalls as well. Really, even echoing the PID to a file is racy. I'd much rather have something like a procfd to use here.

These behaviors you're asking for are quite different than the usual semantics these tools imply. Sure, filesystems and cgroups are both hierarchical, but there is such a thing as stretching a metaphor too far. To make a meta-metaphor: Should we abandon databases and just use spreadsheets instead? Vice versa? They're both "just" grids of data cells.

Posted Feb 20, 2014 0:13 UTC (Thu) by HelloWorld (guest, #56129)

Alright, you have a point. Using link(2) is probably not a good idea.

Posted Feb 20, 2014 2:08 UTC (Thu) by MrWim (subscriber, #47432)

rename() might be, though. AFAIU all pids have to appear in the cgroup tree so to put a pid in a cgroup you have to remove it from another. You would need permissions for both cgroups and it happens atomically.
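
A rough sketch of the rename() idea against a mock directory tree follows. The layout is hypothetical: real cgroupfs lists pids inside `cgroup.procs` and has no per-pid entries to rename, and the helper names `move_pid` and `groups_containing` are made up. The point is only that a single rename(2) removes the entry from one directory and adds it to the other in one step, so the pid is never in zero or two groups.

```python
import os


def move_pid(tree, pid, src_group, dst_group):
    """Move the (hypothetical) per-pid entry from one group directory to another."""
    src = os.path.join(tree, src_group, str(pid))
    dst = os.path.join(tree, dst_group, str(pid))
    # rename(2) needs write permission on both parent directories and,
    # within one file system, replaces the old name with the new one atomically.
    os.rename(src, dst)


def groups_containing(tree, pid):
    """Return the sorted names of all group directories holding an entry for `pid`."""
    return sorted(
        group
        for group in os.listdir(tree)
        if os.path.exists(os.path.join(tree, group, str(pid)))
    )
```

With two group directories `a` and `b` under a scratch `tree` and an entry `a/42`, calling `move_pid(tree, 42, "a", "b")` leaves `groups_containing(tree, 42)` reporting only `b`.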

Posted Feb 20, 2014 15:20 UTC (Thu) by HelloWorld (guest, #56129)

rename() was my first thought. But that would remove the process from the /proc directory, and that doesn't really make sense, does it?

Posted Feb 20, 2014 15:23 UTC (Thu) by mathstuf (subscriber, #69389)

I think the suggestion was to move the pid from one cgroup directory to another, not from /proc.

Posted Feb 20, 2014 17:34 UTC (Thu) by HelloWorld (guest, #56129)

Well, that would work, but then how do you move a process that isn't a member of any cgroup into one?

Posted Feb 20, 2014 18:15 UTC (Thu) by MrWim (subscriber, #47432)

That's what I meant by "all pids have to appear in the cgroup tree so to put a pid in a cgroup you have to remove it from another". My assumption is that there cannot be a process which isn't a member of any cgroup. If init starts in a cgroup, its children end up in the same cgroup, and there's no way of unlink()ing pids from the cgroup tree, then you're guaranteed that every process is in the tree.

In that setup you can't steal other users' processes and put them in your subtree; you can only move pids around in the trees you own. You can then use whichever cgroup manager you desire in your subtree. Containers work while still only co-operating with the kernel, rather than having to communicate with other user-space programs running outside.
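
The rules described above can be written down as a toy in-memory model. This is a sketch of the proposal only, not of how the kernel behaves: the class `CgroupTree`, its methods, and the special-casing of a `root` user are all assumptions made for illustration. Process exit is left out on purpose, since the invariant concerns live processes.

```python
class CgroupTree:
    """Toy model: every pid is in exactly one cgroup, and there is no unlink."""

    def __init__(self, owners, root="/"):
        self.owners = dict(owners)  # cgroup path -> owning user
        self.member = {}            # pid -> cgroup path
        self.root = root

    def spawn(self, pid, parent=None):
        # A child starts in its parent's cgroup; the first process starts in root.
        self.member[pid] = self.member[parent] if parent is not None else self.root

    def move(self, user, pid, dst):
        """Rename-style move: the caller must own both the source and destination."""
        if dst not in self.owners:
            raise KeyError(dst)
        src = self.member[pid]
        if user != "root" and (self.owners[src] != user or self.owners[dst] != user):
            raise PermissionError(f"{user} may not move {pid} from {src} to {dst}")
        # Single assignment: the pid is never in zero or two groups.
        self.member[pid] = dst
```

Because `spawn` and `move` are the only ways to change membership, a pid that has been spawned always maps to exactly one cgroup, and a user who owns only their own subtree cannot pull in a process from anywhere else.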