The kdbuswreck

By Jonathan Corbet
April 22, 2015

Few readers will have failed to notice by now that the attempted merging of the kdbus interprocess communication system into the 4.1 kernel has failed to go as well as its proponents would have liked. As of this writing, the discussion continues and nothing has been merged. This article constitutes an attempt to derive a bit of light from the massive amounts of heat that have been generated so far, with a specific focus on the issue of metadata and capabilities.

Some corrections have been applied to this article; the old text remains and is ~~struck through~~.

Some observers have portrayed the opposition to kdbus as a front in the systemd wars, the intent being to obstruct its merging and set back the perceived systemd agenda. There have been a few messages mentioning systemd and expressing a lack of trust in its developers, but that has been the smallest part of the conversation; it can be safely disregarded. That is not where the serious objections come from.

As was mentioned last week, there is a certain level of discomfort with the core aspect of the design of kdbus: that it implements the D-Bus protocol. Some developers would rather not see kdbus in the kernel at all; others wish that it were an add-on to a more generic messaging solution. With regard to the D-Bus design, this message from Havoc Pennington, one of the original designers of D-Bus, is worth a read. In short: he acknowledges that D-Bus is not perfect, but asserts that it does incorporate a lot of lessons from previous attempts and, as a result, it has been successful.

The most specific advocate of a more general messaging solution is arguably Alan Cox. His latest suggestion would appear to be to go back to the old AF_BUS approach; this patch implemented something D-Bus-like over sockets, but was rejected by the networking maintainers. Alan thinks it's worth another try, given that the kernel already has almost everything that is needed. There have been few signs, though, that the kdbus developers are in the mood to drop their work and attempt to resurrect an approach that has already failed once to get into the kernel.

Metadata and capabilities

The fiercest bone of contention, though, would appear to be a topic that has come up before: the passing of process-specific metadata with messages. In particular, developers led by Andy Lutomirski have continued to assert that kdbus should not attach information about a sending process's capabilities and command line to messages as they cross the bus.

The purpose of the transmission of capabilities, in particular, is to enable privileged processes on the bus to carry out actions at the request of another process on the bus — if that other process has the requisite capabilities. The plans for systemd involve allowing processes to request actions like changing the system time, tweaking the network configuration, or rebooting the system over the bus; the requested action will ~~only~~ be carried out if the requester has CAP_SYS_TIME, CAP_NET_ADMIN, or CAP_SYS_BOOT, respectively.

The kdbus developers point out that one process can learn about another process's capabilities now by reading files in /proc. There's a little problem, though: reading from /proc is subject to race conditions. A process could request a privileged action over D-Bus, then quickly use exec() to run a setuid binary. If the exec() happens before the receiving process gets around to reading /proc, that process will see the new binary's elevated privileges and allow something that the original caller should not have been able to do. So capability-based authentication is not much used in current systems. One of the many appealing features of kdbus is that it makes such capability checks safe; the kernel can guarantee that the capabilities it transmits with the message are what the sending process held when the message was sent.
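The race described above can be made concrete with a small sketch. It assumes that the receiving daemon makes its decision by parsing the `CapEff` line of `/proc/<pid>/status`; the helper names (`parse_cap_eff`, `has_cap`, `read_cap_eff`) are illustrative and are not taken from systemd or kdbus. The capability bit numbers are those defined in the kernel's `linux/capability.h`.

```python
# Capability bit numbers from linux/capability.h.
CAP_NET_ADMIN = 12
CAP_SYS_BOOT = 22
CAP_SYS_TIME = 25


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Return True if bit number `cap` is set in the capability mask."""
    return bool(mask & (1 << cap))


def read_cap_eff(pid):
    """Read the capability mask of a running process (Linux only).

    This reports whatever the process holds at the moment of the read.
    If the requester has called exec() on a setuid binary since sending
    its request, the mask seen here belongs to the new binary, which is
    exactly the race the article describes.
    """
    with open(f"/proc/{pid}/status") as f:
        return parse_cap_eff(f.read())
```

A daemon that calls `read_cap_eff(sender_pid)` after receiving a request is checking the process as it is now, not as it was when the message was sent. Capturing the mask in the kernel at send time removes that window.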

Andy (and others) have a number of objections to this approach, starting with the ~~fact~~ assertion that capabilities are meant to be interpreted by the kernel, not by user space. By adding these features, user-space developers are said to be violating the layering of the system while broadening the meaning of the relevant capabilities — and they are generally seen as being overly broad already. As an example, CAP_SYS_BOOT gives the ability to call the reboot() system call and immediately reboot the system. Systemd will respond to a reboot request (from a process with CAP_SYS_BOOT) over D-Bus, however, by initiating a clean reboot, unmounting filesystems, shutting down services, etc. Those are actions that CAP_SYS_BOOT would not enable on its own. Eric Biederman was quick to suggest that this extension of the CAP_SYS_BOOT capability could be helpful to an attacker.

Andy also pointed out that the set of capabilities is determined by the kernel source. They can never be extended, so they will limit the expressiveness of authentication mechanisms using kdbus. It would be better, he said, to have a separate, capability-like mechanism implemented in user space that could be extended as the need for new privileges is encountered.

Then there is an interesting little problem in the intersection of capabilities and user namespaces. If a process connects to D-Bus, then moves into its own user namespace, it will appear to have all available capabilities. That would allow the capability checks to be bypassed entirely. This particular problem was fixed in kdbus some time ago by simply dropping the capability metadata when a message crosses a user-namespace boundary. But that fix comes at a cost: now the capability checks do not work at all for processes in user namespaces. The capability-based authentication mechanism, in other words, falls apart on a system where user namespaces are being used for containerization. Systemd maintainer Lennart Poettering doesn't see this limitation as a problem ~~because user namespaces are not (yet) heavily used~~, but others may well disagree with this assessment.
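The cost of that fix can be illustrated with a toy model. This is not kdbus code: the `Metadata` type, the `deliver_metadata()` helper, and the use of plain integers to stand in for user namespaces are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Optional

CAP_SYS_BOOT = 22  # bit number from linux/capability.h


@dataclass(frozen=True)
class Metadata:
    sender_name: str
    caps: Optional[int]  # effective capability mask, or None if withheld


def deliver_metadata(meta, sender_userns, receiver_userns):
    """Model of the fix: drop capability data at a namespace boundary."""
    if sender_userns == receiver_userns:
        return meta
    # Capabilities held inside another user namespace say nothing about
    # privilege in the receiver's namespace, so they are not passed on.
    return Metadata(sender_name=meta.sender_name, caps=None)


def may_reboot(meta):
    """A capability-only check, as a receiving daemon might apply it."""
    return meta.caps is not None and bool(meta.caps & (1 << CAP_SYS_BOOT))
```

In this model a sender holding every capability inside its own namespace can no longer fool the receiver, but a legitimately privileged sender in a container also fails a capability-only check, so some other form of authorization is needed there.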

Eric pointed out that there is a capability translation mechanism that could be used to properly transmit capabilities across namespace boundaries. But he also complains that passing capabilities leaks information about sending processes and is thus a security problem in its own right. Linus was not particularly sympathetic to that particular concern, but others, Andy and Alan included, feel that a process should explicitly indicate that it intends to perform an action requiring a specific capability before any such information should be sent.

Finally, though it hasn't been said explicitly, there is the simple fact that most kernel developers see capabilities as a failed experiment. There is no shortage of developers who would like to see them removed from the kernel altogether. That cannot be done — too many tiresome problems with applications breaking and such — but this feeling does lead to resistance to code that seems to expand the role of capabilities further.

Lennart, though, maintains (in the message linked above) that capabilities do have their value and that capability checks are better than an all-or-nothing check for root privileges. He is not thrilled with the suggestion that kdbus should ~~support~~ implement a new user-space privilege mechanism, saying that "we are not really in the business in designing comprehensive new access control systems that can be used for in-kernel and in-userspace subsystems." There seems to be little inclination to consider alternatives (especially those that do not actually exist) at this point.

And that seems to be the core of the impasse. Andy believes that this use of capabilities is dangerous, extending their meaning and bringing in a bunch of security-related code for little real benefit. The kdbus designers, instead, see metadata attachment as a useful tool for the implementation of sandboxing and privilege-separation schemes, and they are unwilling to drop it. Both positions seem firmly entrenched at this point, so it may well come down to what Linus decides to do. He has, for the most part, stayed out of the discussion, but in one message he indicated that most of the capability-related worries don't concern him that much. So he may yet pull kdbus into the kernel, though it would not be entirely surprising if it had to wait one more development cycle first.

The kdbuswreck

Posted Apr 22, 2015 20:18 UTC (Wed) by josh (subscriber, #17465) [Link] (17 responses)

I'm wondering if, *even if* the kdbus developers want the capability-passing mechanism, they could simply split that into a separate patch, pushing and advocating for it separately. Then at least the core of kdbus would be in the kernel, and the capability handling could come later.

The kdbuswreck

Posted Apr 22, 2015 23:49 UTC (Wed) by ncm (guest, #165) [Link] (15 responses)

This does not look at all like a system ready to pull.

This sort of system should be using tickets. On login, the process group leader would be issued all the tickets it needs, which can then be communicated over any (secure) medium to child processes, and thence to services. No ticket, no service. The kernel's role, then, is just to deliver ticketed requests: packets, in other words. Tickets can be transacted myriad ways to arrange just the services needed and authorized, and nothing more. E.g. a service might require a ticket which no single other process has, but that two processes together can construct for the occasion.

Capabilities, or any fixed list, will always be the worser of too broad and not broad enough.

The kdbuswreck

Posted Apr 23, 2015 8:41 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

You may well be right (Kerberos, anybody?), but waiting another year or more for a comprehensive ticket subsystem to materialize ^w be developed and debugged seems rather unproductive.

kdbus doesn't add anything to the way systemd handles capabilities, except remove the race condition inherent in checking for them. Passing them through kdbus also doesn't add any privacy concerns because, quite frankly, the knowledge of whether a particular process has a particular capability is not a security hole; call me somewhat dumb, but offhand I can't think of a way to make it into one.

Rejecting kdbus just because it uses caps is thus somewhat disingenuous.

The kdbuswreck

Posted Apr 23, 2015 22:31 UTC (Thu) by luto (guest, #39314) [Link]

Not quite. Systemd doesn't use caps for dbus on non-kdbus systems.

The kdbuswreck

Posted Apr 23, 2015 14:41 UTC (Thu) by drag (guest, #31333) [Link] (12 responses)

Tickets sounds awesome to me.

In the most basic mode, systemd could pass tickets to launched processes through tags in service files. For backwards compatibility with dbus, systemd could then examine the capabilities of a process and assign a set of tickets based on that.

Later on, a more advanced way of gaining tickets could be developed, where the process itself is able to negotiate with a systemd-related 'ticket granting service' daemon, as in a Kerberos-style system.

The upside of this from the kernel's perspective is that the tickets themselves are meaningless in terms of kernel privileges. They don't mean anything at all to the kernel in terms of what system calls the process can make or anything like that. All kdbus does is provide a simple way to share ticket information. Then it really is just metadata, relevant only to systemd and systemd-related privileged daemons.

They would be read-only from an individual process's perspective, with a simple /proc/<pid>/ktickets file that would list them for userspace.

To avoid issues with 'nested' operating system namespaces, an associated hash can be tied to the tickets so that a process can have its privileges associated with a specific container, avoiding issues when that information leaks out into parent containers or hosting namespaces.

From a userspace perspective this ticket system would be superior to using capabilities because it would offer a much larger amount of flexibility.

Say, for example:

I am writing an embedded system for controlling a hot air balloon. It would consist of a USB device attached to a Linux laptop that would provide various telemetry data and other information about the state of the balloon to a 'balloon management daemon'.

For the UI it would be a simple python application running on the user's desktop and it would use kdbus to communicate with the daemon. I am interested in allowing other people to write their own UIs, but I want to make sure that some potentially malicious program or non-privileged user account won't be able to fool around with the balloon.

So if we were using a ticket system I could provide a service file (or whatever) that would essentially say: "If the user is part of the 'balloon' group and the process was launched by this service, then assign the 'balloon-priv:<host hash>:<time stamp>' ticket to the process." Users and programs can easily check if they are getting the tickets correctly by checking the /proc/<pid>/ktickets file. Kdbus itself wouldn't depend on that file, of course... it's just informational, to show what kdbus metadata gets provided along with dbus messages.

...

If I was trying to depend on a capabilities system then what sort of capability would I want to use?

I think that if we see capabilities being used then you would have all sorts of crazy attempts by user space programs to overload capabilities and make them represent weird privileges that they were never intended to represent.... Pure-metadata tickets would allow flexibility and let userspace evolve and change without forcing the kernel developers to make difficult choices about breaking backwards compatibility.

The kdbuswreck

Posted Apr 23, 2015 15:46 UTC (Thu) by fandingo (guest, #67019) [Link] (11 responses)

Kdbus already includes a seclabel that can easily fulfill this "ticketing" functionality. LSMs can evaluate this metadata and enforce policy.

People are getting way too carried away with the importance of the capability metadata. It's simply not used for much.

The kdbuswreck

Posted Apr 23, 2015 17:06 UTC (Thu) by drag (guest, #31333) [Link] (8 responses)

> LSMs can evaluate this metadata and enforce policy.

I thought this is about IPC and a way to provide an authentication method so that privileged daemons can carry out tasks on behalf of non-privileged applications?

What would forcing people to program LSMs accomplish here?

The kdbuswreck

Posted Apr 23, 2015 21:24 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

SELinux labels could be [mis]used as tickets. They are traditionally used only to restrict access, but that's just a policy decision.

The kdbuswreck

Posted Apr 23, 2015 22:31 UTC (Thu) by drag (guest, #31333) [Link] (4 responses)

*shrug*

I like the idea of just having a generic metadata that isn't tied into anything else and allow user space to decide the format and how it's interpreted. Just let the kernel provide the mechanism and not be responsible directing the policy (besides a very simple 'is Y allowed to listen to X message bus').

Tying other kernel security interfaces into it as part of the IPC information packet seems like it would be a mistake. Especially since those LSMs or capabilities (or whatever) are not necessarily appropriate for every case where daemons have to make a choice in how to respond to process requests.

At least that way it seems that kdbus working in concert with systemd would have a way to maintain backwards compatibility without having to hard code that into the kernel for all of eternity. Which seems to me what Andy is shooting for here.

The kdbuswreck

Posted Apr 23, 2015 23:04 UTC (Thu) by jspaleta (subscriber, #50639) [Link] (3 responses)

i'm not seeing how kdbus mandates caps usage for all of eternity.

Here's my understanding of how things work right now with the current patches:
the receiver decides if it needs to have caps info or not;
the sender decides if it wants to send over caps.

If they agree, receiver and sender get to talk via the bus. If they don't, the bus doesn't relay messages.

If in the future every single receiver and sender on the bus decided they no longer needed to care about caps they can just stop asking for that metadata to be sent over the bus. Right?
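The two-sided opt-in can be modeled as a set intersection. The sketch below is a simplification under stated assumptions: the item names (`"creds"`, `"caps"`, `"cmdline"`) and the `attached_metadata()` helper are hypothetical, and it only models which items end up attached, not whether the bus relays a message when the two sides disagree.

```python
def attached_metadata(sender_allows, receiver_wants, available):
    """Return only the metadata items that both sides have opted into.

    sender_allows:  item names the sender permits to be attached
    receiver_wants: item names the receiver has asked for
    available:      mapping of item name to the value captured at send time
    """
    agreed = set(sender_allows) & set(receiver_wants)
    return {name: value for name, value in available.items() if name in agreed}
```

For example, with a sender that allows only `"creds"` and a receiver that asks for `"creds"` and `"caps"`, the capability mask is never attached. If no endpoint asks for `"caps"` any more, that item simply stops appearing on the bus, which is the scenario raised in the question above.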

The kdbuswreck

Posted Apr 23, 2015 23:23 UTC (Thu) by dlang (guest, #313) [Link] (2 responses)

If the kernel API says that it passes caps, it is _really_ hard to change that later without breaking something in userspace that depends on it.

This is one of the big reasons people are unhappy with kdbus, it locks a lot of dbus specific policies into the kernel API

The claim is that the only thing that should talk to it is libdbus, but we've already seen that such a policy doesn't work against the userspace dbus, so why would it work against kdbus?

The kdbuswreck

Posted Apr 23, 2015 23:39 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

DBUS is extensible, so in future it's certainly possible to pass additional capabilities via custom methods. The old userspace simply won't be able to use it, but that's not a big deal.

It all seems like a storm in a teacup, the current capabilities are used only for very special actions like rebooting or manipulation of system services. Basically, they are more-or-less a direct replacement of "if (sender_uid == 0)".

The kdbuswreck

Posted Apr 30, 2015 10:17 UTC (Thu) by metux-its (guest, #102293) [Link]

Actually, I never understood why tiny side cases like reboots need all that complexity.

Anybody noticed that we already have per-process namespaces ?
Oh, and we've got file permission flags since aeons.

Why not just giving things like the shutdown/reboot service their
own communication channel (ie. socket), which is only made available
to certain users or processes ? Either via perms or mounts, or by
some key authentication ?

Anybody had a look at Plan9 ?
It really could be so simple ...

The kdbuswreck

Posted Apr 25, 2015 1:06 UTC (Sat) by wahern (subscriber, #37304) [Link] (1 responses)

I'm not that familiar with SELinux, so please excuse my ignorance.

1) I thought the only way to attach a label to a resource is by tagging a file or other specific resource in the first instance. Similar to POSIX capabilities. So, for example, you attach labels to an executable file. Or you attach labels to a port. But how do you attach labels to ad hoc resources? Presumably you could create an anonymous file using open(O_TMPFILE) or memfd_create. But don't you need some kind of privilege to create new labels? And how do you attach a label to a resource when your process isn't already so labeled? Would systemd have to re-execute itself every time a new privilege was defined, e.g. from a package update.

2) I thought the purpose of SELinux was to prevent processes from acquiring access to resources when the labels don't match up. So, for example, if systemd creates a new file descriptor, attaches the label "reboot" to it, how can it pass it to another process that isn't already tagged with the label, "reboot"? The reboot-privileged executable will have been invoked from a process that probably shouldn't have that capability.

Basically, the only difference I can see between SELinux and POSIX capabilities in this scenario is that SELinux has a much larger namespace for defining ad hoc capabilities, as well as a way more sophisticated way to group capabilities (i.e. roles). AFAICT neither solution works anything like Kerberos or Capsicum, in the sense of capabilities that you can freely (but explicitly) pass to other processes, including unrelated processes.

Most importantly, the latter are much more friendly from a programming perspective. As a general matter, the developer is the most knowledgeable person--compared to a package maintainer, or system administrator--when it comes to defining, exchanging, and executing capabilities. It's the only practical way to achieve fine-grained separation of privileges, especially ad hoc privileges unrelated to specific, pre-existing global resources. Solutions like SELinux can then be used to further restrict privilege--in as much as they're identifiable system resources--based on local policy.

The kdbuswreck

Posted Apr 25, 2015 2:03 UTC (Sat) by fandingo (guest, #67019) [Link]

> But don't you need some kind of privilege to create new labels?

Yes, that's defined in the SELinux policy. This is exactly what QEMU with sVirt does.

> And how do you attach a label to a resource when your process isn't already so labeled?

The ability to set MLS levels is controlled by selinux policy. It's the domain transition that allows setexeccon(3).

> Would systemd have to re-execute itself every time a new privilege was defined, e.g. from a package update.

I'm not exactly sure what you mean here, but it shouldn't. Obviously, if systemd got new code to enable it to do something new, then that part of systemd would need to be restarted. However, if that code was previously there, then no. SELinux policy would have blocked it in the past, but the package update to selinux-policy would be loaded into the kernel and would then allow that action to succeed.

> So, for example, if systemd creates a new file descriptor, attaches the label "reboot" to it, how can it pass it to another process that isn't already tagged with the label, "reboot"?

Why would it do this? It doesn't make any sense. SELinux would prevent that process from issuing any syscalls with that FD. The process would need to be allowed to access this resource through selinux-policy with the "reboot" label or be able to do a domain transition (relabel).

> Basically, the only difference I can see between SELinux and POSIX capabilities in this scenario is that SELinux has a much larger namespace for defining ad hoc capabilities, as well as a way more sophisticated way to group capabilities (i.e. roles). AFAICT neither solution works anything like Kerberos or Capsicum, in the sense of capabilities that you can freely (but explicitly) pass to other processes, including unrelated processes.

I think that you're missing the policy part of SELinux, and the allowed transitions that can be programmed.

> Most importantly, the latter are much more friendly from a programming perspective. As a general matter, the developer is the most knowledgeable person--compared to a package maintainer, or system administrator--when it comes to defining, exchanging, and executing capabilities.

Then ship a SELinux policy with your program. If the developer knows what resources, sharing, and transitions are required, she just needs to define it in the policy.

It's certainly possible in SELinux to allow arbitrarily defined syscalls on open FDs, but restrict the opening of new ones, which I think is what you're getting at. However, that's normally how SELinux policies are defined.

The kdbuswreck

Posted Apr 24, 2015 13:21 UTC (Fri) by ncm (guest, #165) [Link] (1 responses)

Every time I read "caps aren't used for much", it reads "this is not an appropriate thing to shove into the kernel". Stuff in the kernel should be stuff that *is* used for much, _and_ that can't practically be done any other way.

Not being used for much, too, indicates that switching it over to a ticketed service would not be a big job.

The kdbuswreck

Posted Apr 24, 2015 14:55 UTC (Fri) by fandingo (guest, #67019) [Link]

I'm not sure what you mean by "stuff" and "this." Are you complaining about caps, kdbus, or kdbus' use of caps?

Caps *are* in the kernel, and with the slavish devotion to supporting things forever, they'll be there for the foreseeable future. The horse is out of the barn. (That being said, the problem with caps is more implementation than design. If you want some sophisticated policy system, that's never what caps were designed to do.)

Kdbus will definitely be used for a ton of stuff.

> Not being used for much, too, indicates that switching it over to a ticketed service would not be a big job.

I still don't understand what this is supposed to mean. This metadata is attached to kdbus messages because it cannot be provided in an atomic, attestable manner otherwise. A Kerberos-like ticketing system doesn't need any of that. It's the sender that provides the ticket directly to the SS. Even if the kernel were the AS -- which doesn't make sense -- the kernel doesn't need to insert metadata all over the place.

That's the fundamental problem with this ticketing idea: It is neither based on data that the kernel has nor data that is useful to the kernel -- only to the sender, receiver, and authenticator.

I seriously don't understand where this ticketing idea originates or particularly how it relates to the issue at hand. It's like a kid in a candy store who starts yelling about wanting a pony. Umm, I guess it could be nice, but that doesn't help answer the question of whether he wants chocolates or gummies.

The kdbuswreck

Posted May 13, 2015 21:43 UTC (Wed) by shentino (guest, #76459) [Link]

Don't unix domain sockets support passing process credentials as out of band data?

If I remember the man page right you can pass pid, uid, gid, and open fds
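For reference, here is a minimal sketch of that mechanism on Linux, using the `SO_PASSCRED` socket option and the `SCM_CREDENTIALS` ancillary message type described in unix(7). The helper names `recv_with_creds()` and `demo()` are made up for this example. Note that the credentials structure carries only a pid, uid, and gid; it does not include capabilities.

```python
import os
import socket
import struct

# struct ucred from unix(7): pid_t pid; uid_t uid; gid_t gid.
UCRED = struct.Struct("iII")


def recv_with_creds(sock):
    """Receive one message plus the sender credentials the kernel attached."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        1024, socket.CMSG_SPACE(UCRED.size)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return data, UCRED.unpack(cdata[:UCRED.size])
    return data, None


def demo():
    """Send a message to ourselves and return (data, (pid, uid, gid))."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # The receiver opts in; the kernel then fills in the sender's
        # credentials as they are at the time of the send.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        sender.send(b"set-time request")
        return recv_with_creds(receiver)
    finally:
        receiver.close()
        sender.close()
```

Open file descriptors travel over the same socket type with a different ancillary message, `SCM_RIGHTS`.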

The kdbuswreck

Posted Apr 22, 2015 20:33 UTC (Wed) by dlang (guest, #313) [Link] (7 responses)

Earlier in the process Linus did ask for examples of where the performance improvement claimed by kdbus actually matters.

The answers seemed to be that it doesn't for anything currently using dbus, but they want to allow other things (like streaming video) to use dbus, and that needs the increased performance.

That branch of the thread faded away a bit, but I don't think it was actually settled.

The kdbuswreck

Posted Apr 22, 2015 22:38 UTC (Wed) by markhahn (guest, #32393) [Link] (4 responses)

Like the dodo isn't "settled". Since the answer amounts to "nothing reasonable needs the performance"...

The kdbuswreck

Posted Apr 23, 2015 8:49 UTC (Thu) by smurf (subscriber, #17840) [Link] (3 responses)

So what would you consider to be reasonable? I'd regard the ability to securely pass large amounts of data without copying them at all, let alone multiple times, to be reasonable enough in itself.

The kdbuswreck

Posted Apr 23, 2015 11:25 UTC (Thu) by mstefani (guest, #31644) [Link] (2 responses)

Isn't that functionality already in? kdbus uses memfd for that and memfd was uncontroversial and generic.

The kdbuswreck

Posted Apr 30, 2015 10:23 UTC (Thu) by metux-its (guest, #102293) [Link] (1 responses)

And even memfd isn't required for that.

Anybody heard of mmap() ? ;-o

The kdbuswreck

Posted Apr 30, 2015 20:00 UTC (Thu) by cesarb (subscriber, #6266) [Link]

> And even memfd isn't required for that. Anybody heard of mmap() ?

The point of using a memfd is that it can be sealed, so there's no risk of a TOCTOU vulnerability in the receiver.

With mmap(), the sender can change what the receiver sees while the receiver is looking at it.

The kdbuswreck

Posted Apr 22, 2015 23:42 UTC (Wed) by BenHutchings (subscriber, #37955) [Link] (1 responses)

I believe the predecessor of kdbus, AF_BUS, was implemented for Genivi and is being used in real IVI systems that have a high D-Bus message rate.

The kdbuswreck

Posted Apr 23, 2015 12:41 UTC (Thu) by daniels (subscriber, #16193) [Link]

> I believe the predecessor of kdbus, AF_BUS, was implemented for Genivi and is being used in real IVI systems that have a high D-Bus message rate.

Yes, and provided very real speedups. Greg linked these a couple of times.

The kdbuswreck

Posted Apr 22, 2015 21:24 UTC (Wed) by mezcalero (subscriber, #45103) [Link] (18 responses)

Jon, there are a number of errors in this text:

- "... the requested action will only be carried out if the requester has CAP_SYS_TIME, CAP_NET_ADMIN, or CAP_SYS_BOOT, respectively." -- this is simply incorrect. Nobody suggested something like this. Having these caps should be *sufficient* to trigger the operations, but not *mandatory*. That's quite a difference.

- "... starting with the fact that capabilities are meant to be interpreted by the kernel, not by user space" -- that's hardly a "fact", that's merely an opinion.

- The part about "... Lennart Poettering doesn't see this limitation as a problem because user namespaces are not (yet) heavily used..." is pretty bogus, I never said that. Yes, I don't see that as a limitation, but certainly not because userns weren't used; it is simply the right thing that processes in one user namespace should not have rights in any other.

- "... feel that a process should explicitly indicate that it intends to perform an action requiring a specific capability before any such information should be sent..." -- this in fact has been implemented already after the first review round of the patches. And this has been mentioned in the various threads many times. Attaching creds is opt-in from both sides: the sender and the receiver of a message. Only if *both* sides allow/want the data it is actually attached.

- "Lennart .., is not thrilled with the suggestion that kdbus should support a user-space privilege mechanism" makes no sense. I never said anything like that, and systemd already supports a userspace authorization framework just fine, and uses it for most of its bus calls. That's what PolicyKit is.

Also, I don't think calling kdbus a "wreck" is appropriate at all.

The kdbuswreck

Posted Apr 22, 2015 21:46 UTC (Wed) by corbet (editor, #1) [Link] (17 responses)

Sigh. This always seems so hard.

> "... the requested action will only be carried out if the requester has CAP_SYS_TIME, CAP_NET_ADMIN, or CAP_SYS_BOOT, respectively." -- this is simply incorrect. Nobody suggested something like this. Having these caps should be *sufficient* to trigger the operations, but not *mandatory*. That's quite a difference.

Well, then, I'm genuinely confused. If you don't need the capability, why bother checking for it? If you're saying you're doing some other check (user running on the desktop, say), well, I didn't quite catch that. But I said "if", not "iff", so I can claim to have gotten it right :)

> "... starting with the fact that capabilities are meant to be interpreted by the kernel, not by user space" -- that's hardly a "fact", that's merely an opinion.

Fine, it's an opinion, could have been expressed better. Obviously not everybody feels that way.

> The part about "... Lennart Poettering doesn't see this limitation as a problem because user namespaces are not (yet) heavily used..." is pretty bogus, I never said that

The message from you linked in the article starts with you saying "I have seen no use of userns for sandboxing normal daemons so far. I have seen tons of daemons using caps for such sandboxing." Obviously you think that should have been interpreted some other way?

> "... feel that a process should explicitly indicate that it intends to perform an action requiring a specific capability before any such information should be sent..." -- this in fact has been implemented already after the first review round of the patches. And this has been mentioned in the various threads many times. Attaching creds is opt-in from both sides: the sender and the receiver of a message. Only if *both* sides allow/want the data it is actually attached.

As you know, the "optional" nature of this is currently not universally believed. See the message from Andy linked at the point you stopped quoting.

> "Lennart .., is not thrilled with the suggestion that kdbus should support a user-space privilege mechanism" makes no sense. I never said anything like that

Well, I quoted what you said. In retrospect it would have been better if I'd said "implement a new" instead of "support". They were suggesting you make something new and independent of capabilities; you clearly didn't like that idea, not entirely unreasonably, IMO.

> Also, I don't think calling kdbus a "wreck" is appropriate at all.

...and I never did that. The title refers to the discussion, not the technology. If you think I see kdbus that way maybe you should reread what I've written, I don't think it was that unclear.

If you think the article was an unfair description of the disagreement I am genuinely sorry. I put a lot of time into trying to let all points of view shine through; it was not easy! And honestly, I don't think it was that far off...

The kdbuswreck

Posted Apr 22, 2015 22:12 UTC (Wed) by mezcalero (subscriber, #45103) [Link] (15 responses)

Reply to your first reply:

"Well, then, I'm genuinely confused. If you don't need the capability, why bother checking for it? If you're saying you're doing some other check (user running on the desktop, say), well, I didn't quite catch that. But I said "if", not "iff", so I can claim to have gotten it right :) "

Well, there are multiple ways things can be authorized. Here's an example: logind will allow you to kill all processes belonging to a specific user session either if you have CAP_KILL, or if your user id matches the session's user. Neither of these security checks is mandatory individually, but having one of them is sufficient. That's the exact same way the kernel makes its permission checks on CAP_KILL. This isn't an algorithm we invented, that's *HOW THESE THINGS WORK*!
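
For readers who prefer code, the either/or check described above can be sketched roughly as follows. This is only an illustration: the `Caller` type and `may_kill_session` function are hypothetical names, not logind's actual code, and the capability is modelled as a plain string (`CAP_KILL` is the kernel's name for the signal-sending capability).

```python
from dataclasses import dataclass

# The kernel's name for the capability that bypasses signal permission checks.
# Modelled here as a plain string purely for illustration.
CAP_KILL = "CAP_KILL"


@dataclass(frozen=True)
class Caller:
    """Hypothetical stand-in for the credentials attached to a request."""
    uid: int
    caps: frozenset = frozenset()


def may_kill_session(caller: Caller, session_uid: int) -> bool:
    """Either check is sufficient; neither is mandatory on its own."""
    return CAP_KILL in caller.caps or caller.uid == session_uid


owner = Caller(uid=1000)                                  # owns the session, no caps
admin_tool = Caller(uid=1001, caps=frozenset({CAP_KILL})) # other user, has the cap
stranger = Caller(uid=1002)                               # neither
```

Here `owner` and `admin_tool` are both allowed to kill a session owned by uid 1000, while `stranger` is not.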

And no, you can *not* claim you got this right, you did not. You wrote "only".

Reply to your third reply:

"The message from you linked in the article starts with you saying "I have seen no use of userns for sandboxing normal daemons so far. I have seen tons of daemons using caps for such sandboxing." Obviously you think that should have been interpreted some other way?

The issue I have is that you connected "Lennart Poettering doesn't see this limitation as a problem" and "user namespaces are not (yet) heavily used" with that little word "because". I said both of these things, but I never said that one was because of the other. That's something you incorrectly made up.

Reply to your fourth reply:

"As you know, the "optional" nature of this is currently not universally believed. See the message from Andy linked at the point you stopped quoting."

Oh well, if you don't believe what the kdbus folks say, how about actually *checking* the kdbus code? It's all open, for review. Also why would you assume that the kdbus developers are dishonest about this?

Reply to your fifth reply:

"Well, I quoted what you said. In retrospect it would have been better if I'd said "implement a new" instead of "support". They were suggesting you make something new and independent of capabilities, you clearly didn't like that idea — not entirely unreasonably, IMO."

There are two things you changed from what I said. In the mail you linked I said "...comprehensive new access control systems that can be used for in-kernel and in-userspace subsystems". First as you noticed by now, I said "new". Secondly, I said "comprehensive ... access control system ... for in-kernel and in-userspace subsystems", the emphasis being on *both* in-kernel and in-userspace here: caps can be that. PK cannot, it is userspace-only, and will never make sense in the kernel and it shouldn't have to.

Reply to your sixth reply:

You called this "kdbuswreck", not "kdbus discussion wreck" or similar. You know exactly how this works: people read the title and skip over the text, and "kdbus" and "wreck" is all that'll be stuck.

Anyway, please be more careful next time.

The kdbuswreck

Posted Apr 22, 2015 22:24 UTC (Wed) by corbet (editor, #1) [Link] (5 responses)

Oh well, if you don't believe what the kdbus folks say, how about actually *checking* the kdbus code? It's all open, for review. Also why would you assume that the kdbus developers are dishonest about this?

Oh come on, now you are just looking for trouble. Who said anything about dishonesty?

From Andy:

But I don't believe that for a second. AFAICS sd-bus (maybe the primary implementation) will always set that flag if for no other reason than that it *doesn't know* when the client is trying to assert a capability. So we'd be giving users a gun which is, in practice, only ever pointed at the users' feet.

He's not calling anybody dishonest either. He's saying the optionality at one level of the code is unlikely to make it through to real-world use. I believe you knew this.

With regard to the title...perhaps it was a bad choice, but "buswreck" (or "trainwreck") is a fairly common English term for an unfortunate situation. I still believe that you have to stretch pretty hard to say that "The kdbuswreck" (note "the") somehow refers to the code. And I'm somewhat amused by your statement that people read only my titles and not the actual text...

The kdbuswreck

Posted Apr 22, 2015 22:31 UTC (Wed) by branden (guest, #7029) [Link] (2 responses)

Next time, dear editor, just call it a clusterf*ck. :-|

The kdbuswreck

Posted Apr 23, 2015 4:21 UTC (Thu) by bronson (subscriber, #4806) [Link] (1 responses)

"The kdbust" has a more hopeless ring to it. :)

The kdbuswreck

Posted Apr 24, 2015 13:24 UTC (Fri) by ncm (guest, #165) [Link]

It would most precisely be called a "dust-up".

The kdbuswreck

Posted Apr 22, 2015 22:56 UTC (Wed) by mezcalero (subscriber, #45103) [Link]

To state this clearly: sd-bus allows overriding of both creds masks. By default, though, the receiving mask sets uid/pid/selinux label/caps, since that is what is necessary for basic authentication. The sending mask allows all bits. If you choose to deviate from this, you can freely set other masks; note, though, that if you suppress the creds necessary for authorization, all services that want to authorize will deny access to you, but I figure that's hardly surprising.
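
A rough way to picture the two masks: a credential item is attached only when both the sending and the receiving mask have its bit set, which is a bitwise AND. The sketch below assumes that model; the flag names and values are invented for illustration and are not the real kdbus or sd-bus constants.

```python
from enum import IntFlag


class Cred(IntFlag):
    """Hypothetical credential bits; not the real kdbus flag values."""
    UID = 1
    PID = 2
    SELINUX = 4
    CAPS = 8
    CMDLINE = 16  # an extra item, so "all bits" differs from the default


# Defaults as described in the comment: receivers ask for uid/pid/label/caps,
# senders allow everything.
DEFAULT_RECV = Cred.UID | Cred.PID | Cred.SELINUX | Cred.CAPS
SEND_ALL = DEFAULT_RECV | Cred.CMDLINE


def attached(send_mask: Cred, recv_mask: Cred) -> Cred:
    """Only items that both sides opted into are attached to the message."""
    return send_mask & recv_mask


default_result = attached(SEND_ALL, DEFAULT_RECV)

# A sender that suppresses caps: the receiver no longer sees them, so a
# service that authorizes on caps would have to deny the request.
restricted_result = attached(Cred.UID | Cred.PID, DEFAULT_RECV)
```

Whether real-world clients ever deviate from the default sending mask is exactly the point under dispute in this thread; the sketch only shows the mechanism.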

The kdbuswreck

Posted Apr 23, 2015 8:22 UTC (Thu) by edomaur (subscriber, #14520) [Link]

Well, I agree with Lennart, before reading the article, I assumed that it was, in fact, about the codebase.

The kdbuswreck

Posted Apr 22, 2015 22:37 UTC (Wed) by corbet (editor, #1) [Link] (8 responses)

Just for the record, I have made a few tweaks to the article in response to these complaints.

The kdbuswreck

Posted Apr 22, 2015 22:51 UTC (Wed) by mezcalero (subscriber, #45103) [Link]

Thank you very much, much appreciated!

The kdbuswreck

Posted Apr 23, 2015 1:57 UTC (Thu) by JdGordy (subscriber, #70103) [Link] (2 responses)

Reading the article after the corrections, I had no way of knowing if those strikethroughs were actual corrections or sarcastic jabs. You probably want to make it clear at the start (or better yet, just remove the corrected bits?)

Corrections

Posted Apr 23, 2015 2:03 UTC (Thu) by corbet (editor, #1) [Link] (1 responses)

Our policy is to not make silent changes to published articles for anything other than trivial typo fixes; an article shouldn't quietly mutate after it has been put out there. So the old stuff remains, even though I'd be happy to see it go. I did stick in a note up front noting that corrections have been made, though.

Corrections

Posted Apr 25, 2015 16:13 UTC (Sat) by Trelane (subscriber, #56877) [Link]

I saw this and find it greatly refreshing. This sort of transparency ought to be more prevalent in the fourth and fifth estates.

Your policy is fantastic. Thank you for it.

The kdbuswreck

Posted Apr 23, 2015 8:06 UTC (Thu) by speedster1 (guest, #8143) [Link] (3 responses)

IMO this was another excellent article on a complicated topic, and the fact that certain details could be improved by someone intimately involved with the effort being discussed does not imply otherwise.

LWN rocks!

Posted Apr 23, 2015 9:22 UTC (Thu) by rvfh (guest, #31018) [Link] (2 responses)

Isn't this the only place where we can actually talk to the editor and have explanations and even corrections made?

I have read so much bullsh*t on 'respected' newspaper sites where even commenting is useless because so many people need to send stupid comments without thinking for just one second, that seeing Jon listening, explaining and even fixing his already excellent article is like a breath of fresh air.

I really like LWN.

LWN rocks!

Posted Apr 23, 2015 20:40 UTC (Thu) by a9db0 (subscriber, #2181) [Link] (1 responses)

I second this.

Thank you, Jon, first for continuing to address thorny issues and make sense of them, and second for being responsive to your readers.

Dave

LWN rocks!

Posted Apr 25, 2015 12:19 UTC (Sat) by louai (guest, #58033) [Link]

LWN is awesome. Thank you very much indeed for all the hard work!

For what it's worth, I think the article is balanced and very informative.

Louai

The kdbuswreck

Posted Apr 22, 2015 22:13 UTC (Wed) by jjmarin (guest, #53201) [Link]

I agree that "The kdbuswreck" is a misleading title, IMHO, I think the title should be more precise to convey the general meaning of the article, maybe something like "The kdbus debate wreck"... anyway, I'm sure there must be a much better suggestion for the title :-)

The kdbuswreck

Posted Apr 22, 2015 22:29 UTC (Wed) by branden (guest, #7029) [Link]

All this hyperventilating over the article title strikes me as disingenuous, or at best implausibly unrealistic.

Why?

Let us imagine that kdbus is merged.

It is fantastically implausible that kdbus will never be found to have a bug that crashes the kernel.

The odds are middling to good that it would warrant an LWN article when it did.

Thus, people would be able to speak with all justice, and Mr. Corbet would be able to thoroughly appropriately title his article, something like:

"The kdbus crash"

Think of it this way--like ObamaCare, maybe you're getting your bad PR out of the way early.

The kdbuswreck

Posted Apr 22, 2015 22:38 UTC (Wed) by fandingo (guest, #67019) [Link] (45 responses)

I tend to believe that capabilities are such a mess because no one uses them. If they were used more, they might actually be more consistent across the kernel since users would actually want to do something besides use the CAP_SYS_ADMIN bludgeon.

I suppose one "hack" that could be made is to say that it's a kdbus_capability, and they just so happen to correspond to the kernel capabilities. In the future, these kdbus capabilities could diverge from the kernel ones if there are compelling policy metadata that the kernel should deliver, or even better, kernel capabilities are fixed (from both a policy perspective and implementation throughout the kernel) and kdbus just continues to mirror.

> Eric Biederman was quick to suggest that this extension of the CAP_SYS_BOOT capability could be helpful to an attacker.

>> You mean all I need to do to get around all of the logging servers is
capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program
that doesn't run as root so that it can only reboot the system? HeHeHe
So I can just trigger a clean reboot wait for journald, auditd, and
syslog all to shut down and then do evil things to the machine without
having to worry about erasing forensic evidence?

Supposing that an attacker gets CAP_SYS_BOOT, how exactly does the attacker "wait for journald, auditd, and syslog all to shut down and then do evil things"? It's too dependent on improper shutdown order where logging services are stopped before other services. Additionally, I'm somewhat skeptical that a process has CAP_SYS_BOOT *and* has access to any worthwhile stuff while at the same time not having a more useful cap to exploit. There's already the same vulnerability, albeit with a shorter window: execute malicious code and issue reboot() before logging is durably written somewhere.

Kernel capabilities are fundamentally in such a sorry state because they've never been useful. It's clear that no one is going to step up and magically start improving them in the hopes that others will make use of them. Therefore, either scrap them entirely or start using them, expose the warts, and fix them.

The kdbuswreck

Posted Apr 23, 2015 0:00 UTC (Thu) by ncm (guest, #165) [Link] (8 responses)

They can't be scrapped, and can't be fixed. To me that means "stay away. Far away".

The kdbuswreck

Posted Apr 23, 2015 5:14 UTC (Thu) by hrogge (guest, #100012) [Link] (7 responses)

The problem is that I have yet to see anyone suggesting how a solution to the problem could look like.

Is there any good concept how the API of a "permission" system to replace capabilities could look like? Maybe even one that could be initialized through the capabilities API so there is a graceful fallback?

Saying "capabilities is bad, don't use them" is not really helpful without a good suggestion how to replace them.

The kdbuswreck

Posted Apr 23, 2015 11:31 UTC (Thu) by HIGHGuY (subscriber, #62277) [Link]

Could capabilities be translated to BPF filters on syscalls? Probably inheritance of these filters may be missing...

The kdbuswreck

Posted Apr 23, 2015 12:34 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

From what I've seen, Capsicum[1] appears to be something that would be much better.

[1]http://www.cl.cam.ac.uk/research/security/capsicum/

The kdbuswreck

Posted Apr 23, 2015 13:33 UTC (Thu) by justincormack (subscriber, #70439) [Link]

Yes Capsicum is along the right lines, you could send file descriptors embodying the capabilities. I don't think it has enough capabilities for some of the things being discussed, but sending file descriptors is the right way...

The kdbuswreck

Posted Apr 23, 2015 23:38 UTC (Thu) by neilbrown (subscriber, #359) [Link]

> The problem is that I have yet to see anyone suggesting how a solution to the problem could look like.

What is "the problem" - specifically? Once you have that clearly stated, the solution probably isn't far away (and it probably involves file descriptors - most good solutions do).

> Maybe even one that could be initialized through the capabilities API so there is a graceful fallback?

I think the capabilities API could be part of the problem, not part of the solution.

Consider Superman. He can jump without flying, can look without burning holes, can listen without hearing every mouse's footstep. His super-powers only take effect when he wants them to.
If "capabilities" don't need to be explicitly activated every time they are used, then they are really just "defaults". And default super-powers can cause a mess.

The kdbuswreck

Posted Apr 30, 2015 10:44 UTC (Thu) by metux-its (guest, #102293) [Link] (2 responses)

At this point, I'd rather raise the question: what's the _actual_ problem to solve? Some practical real-world use cases?

Things like allowing certain (otherwise unprivileged) processes or users to trigger a shutdown (via the init system, of course) can be easily done with traditional unix mechanisms. No need for caps, nor dbus.

The kdbuswreck

Posted Apr 30, 2015 14:29 UTC (Thu) by ksandstr (guest, #60862) [Link] (1 responses)

Simply put, the idea with capabilities was that a process that has root privileges shouldn't be able to do all that root can, but instead just a narrow slice thereof. This restricts setuid binaries to privileges required for their stated purpose, and nothing else. The desired upshot is a limitation of the damage from successful compromise of setuid binaries and (restrictable) processes otherwise running as root.

It likely didn't help that academia at the time was still mildly abuzz with capability-based this and capability-based that, and that the relevant research papers would read like exercises in ontological wank -- for example, calling a process' knowledge of a path name a "capability" as it makes the process capable of accessing that entry (or discovering that it cannot). While that way of looking at things does account for things like forking (which implicitly copies data such as pathnames), it has precious little to do with the split-root capability mechanism of Linux besides having a word in common and an application in the field of access control.

Historically, then, a "capability" can mean basically everything, which makes it a good word for marketing towards the uncritical and unwary much like "the cloud". [Imagine a snarky remark wrt implied corporate braindamage in systemd here.]

The kdbuswreck

Posted Apr 30, 2015 18:25 UTC (Thu) by ms_43 (subscriber, #99293) [Link]

You should not confuse POSIX.1e capabilities, as implemented by Linux, with the capabilities described in security research literature for many years, which are quite precisely defined (and I really wonder why the POSIX committee used that term).

Linux also has *those* capabilities (in a very limited form), they are just called "file descriptors".

The closest you're going to get to a capability-based security model with a traditional UNIX-like kernel is Capsicum.

http://lwn.net/Articles/482858/

(Insert standard rant about kids these days thinking that "operating system" is a synonym for UNIX)

The kdbuswreck

Posted Apr 23, 2015 8:55 UTC (Thu) by kentonv (subscriber, #92073) [Link] (35 responses)

Linux's capabilities actually have almost nothing to do with true "capability-based security" e.g. as implemented by Capsicum, Cap'n Proto / Sandstorm.io, the E programming language, Google Caja, etc. At Sandstorm, in order to disambiguate, we've taken to calling Linux/POSIX capabilities "crapabilities" instead.

Crapabilities are broken because, among other things:

* Crapabilities are hopelessly inexpressive, largely because they designate verbs rather than nouns. Choosing a random example, CAP_KILL lets me send signals to any process. Probably in any case where this might be useful, what would be *more* useful would be the ability to designate *specific* processes which I'm allowed to signal, or the specific signals I'm allowed to send.

* Due in no small part to the previous point, most individual crapabilities seem to end up opening a trivial privilege escalation to full root, and thus don't offer much actual protection compared to full root. Those that don't at least tend to allow trivial DoS attacks.

* The design goes out of its way to make it difficult to delegate crapabilities between processes or programs. E.g. if you aren't UID zero, you basically can't delegate a crapability through exec() (unless the system admin blesses all binaries involved, which in practice is often impractical, especially when, say, you want to invoke the shell to execute a command or script, so now you need to bless /bin/sh). This severely limits how systems can be designed -- you can't have a bunch of loosely-coupled programs that exec() each other in the unix tradition; you instead must have one monolithic binary. These harsh restrictions do not provide any actual security benefit -- a program which wishes to maliciously leak its crapabilities can always arrange to listen on a socket and execute operations requested by anyone.

* Crapabilities are "ambient authority". It is difficult for a program to specify exactly when it intends to exercise a crapability and when it doesn't, and as a result it tends to be easy to trick programs into exercising them at the wrong time, commonly known as a "confused deputy attack". It sounds like kdbus is likely to exacerbate this, by allowing a process's capabilities to unexpectedly affect the way other processes react to its dbus requests.

The alternative proposal is simple: represent capabilities (the real kind) as file descriptors. A privileged process could open file descriptors representing specific capabilities, like, say, the ability to send signals to some process. It could then delegate power by transmitting said file descriptors via the usual means -- parent->child inheritance, SCM_RIGHTS, etc. To exercise such a capability, the user must explicitly pass the capability file descriptor to whatever system call exercises it. (Think of the *at() (openat(), etc.) system calls, which (sort of) do this for the file system.)
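
As a small illustration of the `*at()` pattern mentioned above, the sketch below opens a file relative to a directory file descriptor rather than by an absolute path, so the directory FD acts as the thing being delegated. It assumes a Linux or similar system where Python's `os.open` supports `dir_fd`; the helper name `read_via_dirfd` is made up for the example. As the "(sort of)" hints, plain `openat()` does not stop a caller from escaping with `..` or absolute paths; enforcing that is part of what Capsicum adds.

```python
import os
import tempfile


def read_via_dirfd(dir_fd: int, name: str) -> bytes:
    """Open `name` relative to the directory FD the caller was handed.

    The function never sees the directory's path; holding `dir_fd` is what
    lets it reach the file (this maps to openat(2) underneath).
    """
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "note.txt"), "wb") as f:
        f.write(b"hello")

    # The "privileged" side opens the directory and hands over only the FD.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        data = read_via_dirfd(dir_fd, "note.txt")
    finally:
        os.close(dir_fd)
```

The same FD could equally be inherited by a child process or sent over a Unix socket with SCM_RIGHTS, which is the delegation step the comment describes.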

The Capsicum project is aiming to implement this vision, and is doing it right. Capsicum is already in FreeBSD and should be accepted into Linux as well. (I'm not affiliated with Capsicum, but I am the lead dev of Sandstorm which is built on the same principles.)

The kdbuswreck

Posted Apr 24, 2015 14:04 UTC (Fri) by meuh (guest, #22042) [Link] (34 responses)

The file descriptors as capabilities is an interesting design ... but how could it replace the 'setuid' binaries (which can be replaced with file based capabilities) ?

The kdbuswreck

Posted Apr 24, 2015 17:32 UTC (Fri) by cesarb (subscriber, #6266) [Link] (33 responses)

> The file descriptors as capabilities is an interesting design ... but how could it replace the 'setuid' binaries (which can be replaced with file based capabilities) ?

Setuid binaries could be replaced by services. For instance, instead of a setuid-root "passwd" executable, have a non-setuid executable which talks to a "passwd" service running as root.

In this example, the "passwd" service could have only the capabilities it needs (for instance, read-write access to /etc/shadow), and the non-setuid executable could also have only the capabilities it needs (for instance, read-write access to its tty). The service could be spawned on-demand by a system-wide process launcher.

The kdbuswreck

Posted Apr 24, 2015 18:29 UTC (Fri) by kentonv (subscriber, #92073) [Link] (31 responses)

Yep, that works.

Another, somewhat more radical approach: The user, when they log in, could receive an "account management" capability. This capability isn't implemented by the kernel; it's just a unix socket FD to the account management service, which implements some network protocol with operations like "change password". This socket is specific to the user; the management service assumes that any messages received on it have the full authority of the user, without needing to explicitly check credentials. This socket is never linked into the filesystem, but is created as a socketpair and then passed to the user's login process as, say, FD 3. In theory, the user would then even be able to decide which processes that they run should have access to this capability and which shouldn't, by deciding to pass the cap through or not.
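
A toy version of that idea, kept in a single process so it runs as-is: the service holds one end of a `socketpair()` that it created for a specific user, and treats anything arriving on it as carrying that user's authority. The request string, the `serve_one` helper and the user name are all invented for the example; a real service would speak a proper protocol.

```python
import socket


def serve_one(service_end: socket.socket, user: str) -> None:
    """Handle a single request arriving on a per-user socket.

    There is deliberately no credential check: the socket was created for
    `user`, so possession of the peer end *is* the authorization.
    """
    request = service_end.recv(1024).decode()
    if request == "change-password":
        reply = f"password changed for {user}"
    else:
        reply = "unknown operation"
    service_end.sendall(reply.encode())


# Never bound to a filesystem path; only reachable by whoever holds user_end.
service_end, user_end = socket.socketpair()

user_end.sendall(b"change-password")   # the user's process makes a request
serve_one(service_end, "alice")        # the service answers on alice's socket
reply = user_end.recv(1024).decode()

service_end.close()
user_end.close()
```

In a real setup the user's end would be handed to the login process as an inherited descriptor (for instance via the `pass_fds` argument of `subprocess.Popen`), and the user could choose which children get it.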

Of course, at this point we're talking about a very different world from the status quo. It's unlikely that we'll rewrite all our tools to work this way anytime soon. But Capsicum is a step in the right direction, whereas crapabilities are not.

The kdbuswreck

Posted Apr 24, 2015 19:14 UTC (Fri) by fandingo (guest, #67019) [Link] (25 responses)

Or, you know, just use the already existing DBus session bus with polkit that does the same thing.

That also sidesteps the problem of how programs talk to the login process to send privileged requests to services.

The kdbuswreck

Posted Apr 25, 2015 0:17 UTC (Sat) by kentonv (subscriber, #92073) [Link] (24 responses)

As I understand it, passing FDs over dbus is a common thing to do, and that's great. That can easily extend to capsicum-style capabilities.

What I'm arguing against is expanding the use of crapabilities, as kdbus does. If the status quo doesn't already depend on crapability passing in this way then let's not add it now; let's create designs based on FD passing instead.

(I also object to dbus being awfully singleton-y with global namespaces and such, but that ship obviously sailed long ago, so maybe it's not useful to argue now. But: http://www.object-oriented-security.org/lets-argue/single...)

The kdbuswreck

Posted Apr 25, 2015 1:09 UTC (Sat) by fandingo (guest, #67019) [Link] (23 responses)

> As I understand it, passing FDs over dbus is a common thing to do

While it will continue to be possible with kdbus, it will probably become less common. The main reason to pass a traditional FD is performance, which was a major problem with userspace DBus due to memory copying routinely up to 11 times between calls. In the future, I expect services to prefer passing data using memfd* rather than handing over a FD (>512KiB was experimentally determined to be a good default threshold). That way the service can not only control the syscall operations but also validate individual operations and the data therein.

* Of course memfd behave as FDs.

> What I'm arguing against is expanding the use of crapabilities, as kdbus does.

I disagree that it's a material expansion. It just allows those same capabilities to use systemd tools rather than going directly to the kernel with syscalls. For example, a user with CAP_SYS_BOOT (i.e. with access to run a program with that cap) can essentially panic the kernel, triggering a reboot. Systemd allows that same capability to be utilized for an orderly shutdown via systemd. The orderly shutdown case -- at least in my opinion and the kdbus/systemd developers' -- is probably what most people consider natural. Same thing with CAP_KILL. A user automatically can kill things in their session from systemd policy, but this allows a user to kill units through systemd (i.e. systemctl kill FOO). Again, that's totally natural for a system using systemd; otherwise, users would have to manually kill each process in a service and be unable to deal with systemd unit automatic restart, unless they were uid == 0, which would allow them to run `systemctl stop/kill FOO`.

The way that systemd will use kdbus' capability metadata allows for essentially the same control that the syscalls allowed to go through systemd's more featureful equivalents.

> (I also object to dbus being awfully singleton-y with global namespaces and such, but that ship obviously sailed long ago, so maybe it's not useful to argue now. But: http://www.object-oriented-security.org/lets-argue/single...)

I don't really see how DBus qualifies. I guess you can only have one resource occupy a distinguished name (i.e. org.foo.bar), but it's difficult to envision how that could possibly work differently. Furthermore, your idea of passing a FD to the logon process seems like it introduces a much "worse" singleton. Perhaps I just don't understand the argument in that link.

Fundamentally, I don't see how the combination of privileged executors authorized via polkit, memfd, and an LSM doesn't offer the same functionality as all 8 components of capsicum. Perhaps you could explain something that could be done with capsicum that cannot be done with a combination of what I mentioned.

The kdbuswreck

Posted Apr 25, 2015 6:28 UTC (Sat) by kentonv (subscriber, #92073) [Link] (22 responses)

> I disagree that it's a material expansion.

I defer to Andy Lutomirski's arguments on LKML, since he's already said many of the same things I would say.

> I guess you can only have one resource occupy a distinguished name (i.e. org.foo.bar)

Yes, that's essentially the problem.

> it's difficult to envision how that could possibly work differently.

You don't really want "the" org.foo.bar, you want "an" org.foo.bar. Multiple applications should be able to export objects implementing the org.foo.bar interface and the user should be able to choose which one to use for each app (or choose none, i.e. disallow access).

> Furthermore, your idea of passing a FD to the logon process seems like it introduces a much "worse" singleton.

Not at all. The whole point is, any process can implement the "account management" interface for itself, and pass its own implementation down to children. This lets you do all kinds of magical things that are hard or infeasible today, like:
- Sandboxing: Just wrap the capability with an implementation that blocks or mocks out requests that you want to disallow.
- Testability: You can run an app against a mock capability instead of the real one for testing purposes.
- Monitoring/auditing: See what apps are doing by injecting an interceptor that logs requests.
- Composability: Apps can be composed on top of different back-ends to produce novel functionality. Like maybe instead of managing local users, you want to manage users on your remote server, but you want to use a GUI app that was written only with local users in mind. No problem, just swap the local cap for the remote one and it works. No need to go edit the GUI app to support a different kind of back-end.
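
To make the wrapping idea in the list above concrete, here is a minimal sketch in which the "capability" is just an object with a `change_password` method, and wrappers are further objects with the same shape. Every class and function name here is hypothetical; in the design being discussed the capability would be a socket or file descriptor rather than a Python object.

```python
class LocalAccounts:
    """Hypothetical real back-end: stores passwords in memory."""

    def __init__(self):
        self.passwords = {}

    def change_password(self, user, new_password):
        self.passwords[user] = new_password
        return "ok"


class Audited:
    """Monitoring: log every request, then forward it unchanged."""

    def __init__(self, inner, log):
        self.inner = inner
        self.log = log

    def change_password(self, user, new_password):
        self.log.append(("change_password", user))
        return self.inner.change_password(user, new_password)


class ReadOnly:
    """Sandboxing: same interface, but the operation is refused."""

    def change_password(self, user, new_password):
        raise PermissionError("password changes are disabled in this sandbox")


def app(accounts):
    """The application only knows the interface, not which back-end it has."""
    return accounts.change_password("alice", "s3cret")


backend = LocalAccounts()
audit_log = []
result = app(Audited(backend, audit_log))   # real back-end, with auditing

try:
    app(ReadOnly())                          # same app, sandboxed
    sandbox_blocked = False
except PermissionError:
    sandbox_blocked = True
```

The application code is identical in both runs; only the object handed to it changes, which is the property the testability and composability points rely on.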

> Perhaps you could explain something that could be done with capsicum that cannot be done with a combination of what I mentioned.

In addition to the above, expressing security policies in terms of capabilities is generally easier and less error-prone than expressing them in terms of ACLs or policy files. This is hard to prove in the space of an LWN post, but after working with them for a while you won't want to go back.

The kdbuswreck

Posted Apr 25, 2015 8:08 UTC (Sat) by cortana (subscriber, #24596) [Link] (21 responses)

I believe you've misunderstood the D-Bus model. You are confusing addresses with interfaces. This is understandable since in simple interfaces the same string is used to identify both!

In order to communicate with a peer, you need to talk to an address. For instance, "org.freedesktop.ColorManager". This is a well-known name, that the client would use to identify *who* it wants to talk to. The dbus-daemon has a set of rules that determine which processes are allowed to take ownership of a well-known name. This ensures that you are really talking to colord, and not an imposter. These are currently defined by files in /etc/dbus-1/system.d; I don't know what happens to them in the brave new kdbus world.

Once you've connected to an address, you can obtain a list of objects exported by the peer. Simple services will just export one object, such as "/org/freedesktop/ColorManager"; by convention these are similar to the address above, but with a slash instead of a period used to separate components.

More complex interfaces will export additional objects, for instance /org/freedesktop/ColorManager/devices/printer1, /org/freedesktop/ColorManager/devices/profiles/icc_{hexstring} and so on. In colord's case these are used to represent different devices and colour profiles on the system.

Having selected an object to talk to, for instance, /org/freedesktop/ColorManager, you now choose an *interface*. Even in the case of the simplest service, that just exports a single object, that object will export multiple interfaces. In our example we see 'org.freedesktop.DBus.Introspectable', 'org.freedesktop.DBus.Peer' and 'org.freedesktop.DBus.Properties' and finally 'org.freedesktop.ColorManager'.

Now, *these* are the interfaces that you mistook for addresses earlier. All the methods & properties supported by an object are accessed via one of these interfaces. So in order to find a list of the colour sensing devices attached to my computer, I would call the GetSensors() method of the 'org.freedesktop.ColorManager' interface of the /org/freedesktop/ColorManager object exported by the 'org.freedesktop.ColorManager' peer. Whereas if I wanted to see which interfaces an object implements, along with their associated methods and properties, I would call the Introspect() method of the 'org.freedesktop.DBus.Introspectable' interface of the same object. Those three 'org.freedesktop.DBus.*' interfaces, by the way, are supported by every object on the bus.

So, if you wanted to communicate with a special-purpose peer for testing, you would tell your code to connect to a different *address*, e.g. ":1.340" (these are non-well-known addresses that are assigned when a client connects to the dbus-daemon), but the same object and interface.
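
The address / object path / interface / method layering can be modelled with nested dictionaries, as in the toy sketch below. It does not use any D-Bus library; `make_peer`, `call`, and the sensor names are invented for illustration, while the names, paths and the GetSensors() method are the ones from the colord example above.

```python
def make_peer(sensors):
    """Build a toy peer: object path -> interface -> method -> callable."""
    return {
        "/org/freedesktop/ColorManager": {
            "org.freedesktop.ColorManager": {
                "GetSensors": lambda: list(sensors),
            },
        },
    }


# Toy bus: address -> peer. One well-known name and one unique name.
bus = {
    "org.freedesktop.ColorManager": make_peer(["sensor0"]),  # the "real" peer
    ":1.340": make_peer([]),                                 # a test peer
}


def call(bus, address, path, interface, method):
    """Resolve all four levels in order, then invoke the method."""
    return bus[address][path][interface][method]()


OBJECT = "/org/freedesktop/ColorManager"
INTERFACE = "org.freedesktop.ColorManager"

real = call(bus, "org.freedesktop.ColorManager", OBJECT, INTERFACE, "GetSensors")
# Only the address changes to reach the test peer; object and interface stay.
test = call(bus, ":1.340", OBJECT, INTERFACE, "GetSensors")
```

Note that the first three arguments of the "real" call are nearly the same string, which is the source of the confusion between addresses and interfaces.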

Hope that makes sense; if not then please install the dbus introspection tool "d-feet" and click around for a couple of minutes, things should become obvious then. For command-line introspection I usually use the 'qdbus' program, but it's much quicker to navigate the peers on a bus with d-feet, so start there.

Personally I think the D-Bus model is a little over-complex, and the fact that simple services use the same name for each of their address, object path _and_ interface makes things appear more obscure than they really are, but we are where we are and D-Bus has gained a lot of adoption over the last 10 years, having replaced the IPC mechanisms previously used by both GNOME and KDE.

That said, I really do wish there was a dbus(7) man page that explained the above in simple terms that would be useful for busy sysadmins and users curious about the internals of their system. :)

The kdbuswreck

Posted Apr 25, 2015 12:27 UTC (Sat) by lsl (subscriber, #86508) [Link] (12 responses)

> Having selected an object to talk to, for instance, /org/freedesktop/ColorManager, you now choose an *interface*

That seems backwards to me. Why would I even care what object I talk to? I just want *some* object that implements the interface I need.

So when calling methods on the 'org.freedesktop.ColorManager' interface those get dispatched to an implementation that makes sense according to local system configuration, say colord, KolorManager or whatever the user set up for this.

Is it possible to sanely use dbus this way? I mean, I can certainly enumerate the bus and search for something that implements the wanted interface but that doesn't seem reasonable.

So let's take a step back here. How would one implement the concept "I want this functionality but I don't care who provides it" in dbus? Are interfaces a red herring here and I better look at well-known names? What those resolve to is up to system configuration, right? So is this the point where it is commonly decided what program will handle my requests regarding color management? Whatever owns the name?

The kdbuswreck

Posted Apr 25, 2015 13:34 UTC (Sat) by mchapman (subscriber, #66589) [Link] (5 responses)

> How would one implement the concept "I want this functionality but I don't care who provides it" in dbus?

I would say that is *exactly* what D-Bus provides now. There is nothing in D-Bus's policy configuration that locks a service to a particular binary. A service name can be claimed by any process that matches that service's policy (for system services this is typically just a check that the connection was authenticated as root). Of course, only one D-Bus connection can own a service name at any particular time.

Service activation is a bit different, as D-Bus (or systemd, if it's doing the activation) needs to know which binary to launch when the service is requested. I suppose you could use something like alternatives to cater for multiple implementations of that service.

The kdbuswreck

Posted Apr 25, 2015 15:17 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

> Service activation is a bit different, as D-Bus (or systemd, if it's doing the activation) needs to know which binary to launch when the service is requested. I suppose you could use something like alternatives to cater for multiple implementations of that service.

There should already be examples for this at the session level with the different NetworkManager UIs, kwallet vs. gnome-keyring, etc.

The kdbuswreck

Posted Apr 27, 2015 12:16 UTC (Mon) by javispedro (guest, #83660) [Link] (3 responses)

> I would say that is *exactly* what D-Bus provides now. There is nothing in D-Bus's policy configuration that locks a service to a particular binary. A service name can be claimed by any process that matches that service's policy (for system services this is typically just a check that the connection was authenticated as root). Of course, only one D-Bus connection can own a service name at any particular time.

You actually _cannot_ do what is being asked with current D-Bus, and it is one of my major gripes with it (which is why I prefer anything else over it). The "using the service name as what every other IPC would call interface name" idea works well until you realize you cannot register more than one service with the same name. Thus, you start seeing ugly tricks such as what MPRIS does, which require grepping over the service names, etc.

The kdbuswreck

Posted Apr 27, 2015 12:54 UTC (Mon) by mchapman (subscriber, #66589) [Link] (2 responses)

> The "using the service name as what every other IPC would call interface name" idea works well until you realize you cannot register more than one service with the same name. Thus, you start seeing ugly tricks such as what MPRIS does, which require grepping over the service names, etc.

I don't think it's necessarily an ugly trick. The difference between getting a list of service names with some prefix and getting a list of services providing some object (assuming this were even possible with D-Bus) is mostly superficial.

If efficiency is the problem, getting a list of service names for things like MPRIS could be optimized by extending the protocol slightly, e.g. by having org.freedesktop.DBus.ListNames take a prefix as an argument.

But allowing a service name to be owned by at most one connection is essential for lsl's use case. You can't sanely dispatch "to an implementation that makes sense according to local system configuration" if more than one such implementation is on the bus at the same time.

The kdbuswreck

Posted Apr 27, 2015 13:28 UTC (Mon) by MrWim (subscriber, #47432) [Link]

> I don't think it's necessarily an ugly trick. The difference between getting a list of service names with some prefix and getting a list of services providing some object (assuming this were even possible with D-Bus) is mostly superficial.

> If efficiency is the problem, getting a list of service names for things like MPRIS could be optimized by extending the protocol slightly, e.g. by having org.freedesktop.DBus.ListNames take a prefix as an argument.

Indeed, this is the purpose of the arg0namespace match rule. You register for NameOwnerChanged events with some prefix, call ListNames (or ListActivatableNames) filtering on the prefix and then you can efficiently and asynchronously keep your local list of remote names up-to-date.
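The bookkeeping described here can be sketched without a live bus. The Python below is a model under stated assumptions: the class and function names are invented for illustration, the `org.mpris.MediaPlayer2` prefix is the namespace MPRIS players use, and the `ListNames` reply and `NameOwnerChanged` signals are fed in by hand instead of arriving from a D-Bus library.

```python
def match_rule(namespace):
    """Match rule for NameOwnerChanged signals about names in `namespace`."""
    return (
        "type='signal',sender='org.freedesktop.DBus',"
        "interface='org.freedesktop.DBus',member='NameOwnerChanged',"
        "arg0namespace='%s'" % namespace
    )


def in_namespace(name, namespace):
    """True if `name` equals `namespace` or lies below it at a dot boundary."""
    return name == namespace or name.startswith(namespace + ".")


class NameTracker:
    """Keeps a local set of owned bus names under one namespace."""

    def __init__(self, namespace, initial_names):
        self.namespace = namespace
        # Seed from a ListNames reply, dropping unrelated names.
        self.names = {n for n in initial_names if in_namespace(n, namespace)}

    def on_name_owner_changed(self, name, old_owner, new_owner):
        """Apply one NameOwnerChanged(name, old_owner, new_owner) signal."""
        if not in_namespace(name, self.namespace):
            return
        if new_owner:
            self.names.add(name)      # name acquired, or changed hands
        else:
            self.names.discard(name)  # empty new_owner: name released


tracker = NameTracker(
    "org.mpris.MediaPlayer2",
    ["org.freedesktop.DBus", ":1.7", "org.mpris.MediaPlayer2.vlc"],
)
tracker.on_name_owner_changed("org.mpris.MediaPlayer2.mpv", "", ":1.42")
tracker.on_name_owner_changed("org.mpris.MediaPlayer2.vlc", ":1.7", "")
print(match_rule(tracker.namespace))
print(sorted(tracker.names))
```

Registering the match rule before calling `ListNames`, as described above, means a name that appears in between is seen at least once and never missed.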

The kdbuswreck

Posted Apr 28, 2015 8:10 UTC (Tue) by javispedro (guest, #83660) [Link]

It introduces a bunch of additional problems. For example, what if you want the same process to expose more than one instance of the service? You will be hit by the fact that despite having multiple service names you still have one object namespace only... and no way to set up different policies, etc.

The MPRIS trick obviously works, but it puts the design of DBus upside down.

The kdbuswreck

Posted Apr 27, 2015 6:45 UTC (Mon) by krake (guest, #55996) [Link]

> That seems backwards to me. Why would I even care what object I talk to? I just want *some* object that implements the interface I need.

It will depend on the type of service, i.e. if there is some object related context.

For example, a service which provides functionality on a set of real world objects will expose these objects again as a set of D-Bus objects.
It makes it easier for programmers on both sides (service and clients) if there is a one-to-one mapping, e.g. NetworkManager exposing each network device as a separate object.

For a service that provides only one interface on one object, the convention seems to be to use the same name parts for the well-known connection name, the object path and the interface name (with respective separator characters).
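As a small illustration of that convention, the object path and interface name can be derived mechanically from the well-known name. The function name below is invented for this sketch, and nothing in D-Bus enforces the mapping.

```python
def conventional_names(well_known_name):
    """Derive the conventional object path and interface from a bus name."""
    # Dots separate name elements; slashes separate path elements.
    object_path = "/" + well_known_name.replace(".", "/")
    interface = well_known_name
    return well_known_name, object_path, interface


name, path, interface = conventional_names("org.freedesktop.ColorManager")
print(name, path, interface)
```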

The kdbuswreck

Posted Apr 27, 2015 7:30 UTC (Mon) by cortana (subscriber, #24596) [Link] (2 responses)

> That seems backwards to me. Why would I even care what object I talk to? I just want *some* object that implements the interface I need.

If you're saving passwords then you want to be sure that the 'org.freedesktop.secrets' address has not been taken by a password-stealing program.

The kdbuswreck

Posted Apr 27, 2015 9:06 UTC (Mon) by mchapman (subscriber, #66589) [Link] (1 responses)

> If you're saving passwords then you want to be sure that the 'org.freedesktop.secrets' address has not been taken by a password-stealing program.

That seems like a completely orthogonal problem to me.

I'm going to reiterate what I said in my other post: D-Bus *already provides* the ability for a client to talk to "any object that implements a particular interface": simply replace the word "object" with "service" and "interface" with "object".

The kdbuswreck

Posted Apr 27, 2015 9:16 UTC (Mon) by mchapman (subscriber, #66589) [Link]

> I'm going to reiterate what I said in my other post: D-Bus *already provides* the ability for a client to talk to "any object that implements a particular interface": simply replace the word "object" with "service" and "interface" with "object".

Meh, I screwed that comment up. I should have said: simply replace the word "object" with "connection" and "interface" with "service".

That is, a D-Bus client does not care what connection provides a particular service; it relies on bus policy for that to be authorized appropriately.

That being said, I have the feeling there is very little stopping some malicious piece of software from killing off gnome-keyring-daemon, say, and grabbing the org.freedesktop.secrets bus name before GNOME has a chance to restart the daemon.

The kdbuswreck

Posted Apr 27, 2015 15:17 UTC (Mon) by hp (guest, #5220) [Link] (1 responses)

> That seems backwards to me. Why would I even care what object I talk to? I just want *some* object that implements the interface I need.

An interface is implemented by N objects, so for example an interface might be implemented by each open document in a word processor. I would say you do not want "some document that implements the Document interface" when you call `org.whatever.Document.Delete()`, you want the specific document you plan to delete :-)

*Services* are generally pluggable - i.e. the entire word processor application, could implement a set of objects (each with a set of interfaces) conforming to some sort of standard, potentially, and then you could interop with whichever word processor owns a certain `org.whatever.WordProcessor` service, or something.

Well-known name: like a DNS entry, a way to find an entire *program* to talk to (service locator)

Object path: equivalent to a pointer ... a specific instance of an object in the "object-oriented programming" sense of object

Interface: means same thing as in Java (set of methods on an object instance)

The fact that some programs have only one object instance with only one interface, in no way means that these are redundant.

Yes you can write a program in Java that only contains `class MyProgram` and `static MyProgram theInstanceOfMyProgram = new MyProgram()`.

This does not mean that Java should _only_ provide support for singleton objects!
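The three levels can be modelled in a few lines of Python. This is a toy, not a D-Bus binding: the bus is a plain dictionary, and the object paths and the `call` helper are made up for illustration.

```python
class Document:
    """One object implementing the hypothetical Document interface."""

    INTERFACES = ("org.whatever.Document",)

    def __init__(self, title):
        self.title = title
        self.deleted = False

    def Delete(self):
        self.deleted = True


# One service (a whole program) exporting several objects, keyed by path.
word_processor = {
    "/org/whatever/documents/1": Document("report"),
    "/org/whatever/documents/2": Document("letter"),
}

# The bus maps a well-known name to whichever program currently owns it.
bus = {"org.whatever.WordProcessor": word_processor}


def call(bus, name, path, interface, method):
    """Resolve name -> program, path -> object, interface -> method."""
    obj = bus[name][path]
    if interface not in obj.INTERFACES:
        raise LookupError("%s does not implement %s" % (path, interface))
    return getattr(obj, method)()


# Delete one specific document, not "some" document.
call(bus, "org.whatever.WordProcessor", "/org/whatever/documents/2",
     "org.whatever.Document", "Delete")
print([(d.title, d.deleted) for d in word_processor.values()])
```

Swapping in a different word processor means replacing the value stored under the well-known name; callers keep using the same name, paths and interface.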

The kdbuswreck

Posted Apr 27, 2015 18:59 UTC (Mon) by lsl (subscriber, #86508) [Link]

> An interface is implemented by N objects, so for example an interface might be implemented by each open document in a word processor. I would say you do not want "some document that implements the Document interface" when you call `org.whatever.Document.Delete()`, you want the specific document you plan to delete :-)

Ah ok, thanks. Didn't think about it that way. For most of the stuff on my local system bus it wouldn't make a difference: it doesn't matter who tells me the hostname or who is going to set the timezone. But then there's logind (which I missed the last time), where it in fact matters whose session is going to be terminated.

The kdbuswreck

Posted Apr 25, 2015 19:55 UTC (Sat) by kentonv (subscriber, #92073) [Link] (7 responses)

Sorry, I don't think I've expressed my point very clearly.

When I say "singleton" what I essentially mean is "an object addressed by a global well-known name or path". The problems that I have with singletons are not fixed by saying "ok, you can have a list of objects with different names" -- all those objects are still singletons.

For example, the path "/org/freedesktop/ColorManager/devices/printer1" refers to the *same* device regardless of who is calling. The problem with this is that it means the calling code decides which printer to connect to. That's bad because:

1. It's probably the user, not the app, that knows best which printer to connect to. So now the app needs to implement a picker dialog. Many apps will skip this and just hard-code the first object. (In practice you don't usually see this problem for printers, but you *do* see it for, say, audio output devices. Every app that plays audio should be asking me which device to use, but, sadly, they do not. I must choose a system-wide default device, and I cannot easily have different apps playing to different speakers. Yes, some systems support advanced configuration of audio sources and destinations within the audio control panel, but my point is that we should have this kind of configurability for all resources.)

2. The app necessarily has the ability to enumerate the devices and connect to all of them. For security reasons, it would be better if the app *only* had the ability to connect to the device that I, as the user, chose for it to access. Traditionally desktop systems have made the unfortunate assumption that I trust all my apps to wield all the power of my user account, but I'd really prefer that each of my apps runs in a sandbox with only the power it needs to do its job.

What I want is for apps to make requests like "I need something that implements org.freedesktop.AudioOutput" (or whatever interface), and then the system displays a dialog to the *user* asking which device or service to use. The app only ever receives access to the device the user chooses, and the app can be used in a broader range of use cases without burdening the app developer with implementing the requisite configurability.

The kdbuswreck

Posted Apr 25, 2015 20:38 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

For example, the path "/org/freedesktop/ColorManager/devices/printer1" refers to the *same* file regardless of who is opening it. The problem with this is that it means the calling code decides which file to open. That's bad because:

1. It's probably the user, not the app, that knows best which file to open. So now the app needs to implement a picker dialog. Many apps will skip this and just hard-code the first file. (In practice you don't usually see this problem for files, but you *do* see it for, say, audio output devices. Every app that plays audio should be asking me which device to use, but, sadly, they do not. I must choose a system-wide default device, and I cannot easily have different apps playing to different speakers. Yes, some systems support advanced configuration of audio sources and destinations within the audio control panel, but my point is that we should have this kind of configurability for all resources.)

2. The app necessarily has the ability to enumerate the files and open all of them. For security reasons, it would be better if the app *only* had the ability to open the file that I, as the user, chose for it to access. Traditionally desktop systems have made the unfortunate assumption that I trust all my apps to wield all the power of my user account, but I'd really prefer that each of my apps runs in a sandbox with only the power it needs to do its job.
...

> What I want is for apps to make requests like "I need something that implements org.freedesktop.AudioOutput" (or whatever interface), and then the system displays a dialog to the *user* asking which device or service to use.
Try to watch DBUS with a sniffer. Now imagine that you have to MANUALLY select each and every endpoint.

The kdbuswreck

Posted Apr 25, 2015 21:03 UTC (Sat) by kentonv (subscriber, #92073) [Link] (5 responses)

> Try to watch DBUS with a sniffer. Now imagine that you have to MANUALLY select each and every endpoint.

Yes obviously what I'm describing can't be dropped on top of the existing set of dbus endpoints and just work. Lots of stuff would need to be redesigned and organized differently. It is possible to design such an environment and have it work well (CapDesk did it, and Sandstorm.io is doing it), but I don't honestly expect today's dbus-using desktop environments to entirely switch anytime soon. Still, it's useful for people to understand the ideal in order to guide improvements to what we have today.

The kdbuswreck

Posted Apr 25, 2015 22:42 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

I still have no idea how your design will be any different from the current one. You still need to have well-known starting points for lookups.

The kdbuswreck

Posted Apr 26, 2015 20:17 UTC (Sun) by luto (guest, #39314) [Link] (1 responses)

Not really. Yes, something needs to create a starting point, but that something could just be whatever creates the resource in the first place.

For example, gdm or logind could start my shell with access to an object implementing the "find a printer" interface. Programs that inherit access to that object would use it.

Sandboxed programs, on the other hand, might get access to a different "find a printer" interface that behaves differently.

dbus can do this right now. On my Fedora 21 system, my shell and everything it starts has access to a standard implementation of a lot of these things. It looks like:

DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-qB3T8DFwej,guid=1453a3565ca58487e6a024fe5538ad89

Too bad that doesn't seem to apply to the system bus.

The kdbuswreck

Posted Apr 27, 2015 4:12 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

There's nothing really special about the system bus. It's possible to override it for each application.

But as I understand, it was designed to be a globally visible namespace with access being controlled by PolKit.

The kdbuswreck

Posted Apr 26, 2015 20:19 UTC (Sun) by kentonv (subscriber, #92073) [Link]

The "starting point" would be a file descriptor inherited from the parent process. Today we have "standard input", "standard output", and "standard error"; now imagine adding "standard desktop" which is a socket that implements some protocol to talk to the desktop session.

This file descriptor would support a bunch of standard functionality that all apps need (like, opening windows, or raising notifications, etc., but NOT things where the user might want to choose the resource used or deny it for security reasons, like printers or audio devices or connected OAuth accounts). It would also have a way to say "I need an object (file descriptor) implementing protocol X", and that's when the user is prompted to choose which object to use. Once the user chooses something, the app can save a long-term token representing that choice and re-request the same object later using that token.

Yes, this "standard desktop" FD is similar to the dbus session bus, except that I can decide which programs that I run are allowed to access it, and I can mock it out, audit usage, sandbox, etc., as described previously, and I can make choices about individual resources accessed by any app.
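The inheritance part of this idea can be tried today with a socketpair and explicit FD passing at process start. The Python sketch below assumes a Unix system. The environment variable name and the one-line "protocol" are invented for illustration, and an environment variable is used where a fixed, well-known FD number could serve equally well.

```python
import os
import socket
import subprocess
import sys

# Hypothetical variable name; nothing in the thread defines one.
ENV_VAR = "STANDARD_DESKTOP_FD"

# The "app": finds its inherited socket and sends one request on it.
CHILD = r"""
import os, socket
fd = int(os.environ["STANDARD_DESKTOP_FD"])
desktop = socket.socket(fileno=fd)
desktop.sendall(b"open-window")
desktop.close()
"""


def run_with_desktop_fd():
    """Start a child that inherits one end of a socketpair; return its request."""
    session_end, app_end = socket.socketpair()
    env = dict(os.environ)
    env[ENV_VAR] = str(app_end.fileno())
    # Only descriptors listed in pass_fds (plus stdio) survive into the child.
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=env,
        pass_fds=[app_end.fileno()],
    )
    app_end.close()  # the parent keeps only its own end
    chunks = []
    while True:
        data = session_end.recv(1024)
        if not data:  # child closed its end
            break
        chunks.append(data)
    proc.wait()
    session_end.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_with_desktop_fd())
```

A child started without `pass_fds` simply has no such descriptor, which is the "decide which programs are allowed to access it" property.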

The kdbuswreck

Posted Apr 27, 2015 12:13 UTC (Mon) by javispedro (guest, #83660) [Link]

Yes, but you can make symlinks ;)

The kdbuswreck

Posted Apr 24, 2015 23:39 UTC (Fri) by cesarb (subscriber, #6266) [Link] (2 responses)

> This socket is never linked into the filesystem, but is created as a socketpair and then passed to the user's login process as, say, FD 3. In theory, the user would then even be able to decide which processes that they run should have access to this capability and which shouldn't, by deciding to pass the cap through or not.
> Of course, at this point we're talking about a very different world from the status quo. It's unlikely that we'll rewrite all our tools to work this way anytime soon.

As a Gedankenexperiment, here's a simple way to implement that without having to rewrite all our tools:

On the kernel, create a "capability table" next to the "fd table", with the exact same lifetime rules (so, for instance, CLONE_FILES also shares the "capability table"). Create two new flags for dup3(), DUPFD_FROM_CAP and DUPFD_TO_CAP, which mean that, respectively, the oldfd or the newfd parameter refer to the "capability table" instead of the "fd table". That's all that needs to be changed in the kernel.

The user, when they log in, receives the "account management" capability in some slot of the capability table, plus an environment variable telling it which slot has that "account management" capability. Unmodified programs will not touch either the "capability table" or the environment variable, so both will be inherited by every program in the user's session.

The "passwd" program, then, would look for that environment variable, get the slot number from it, and pass that slot number to dup3() with DUPFD_FROM_CAP, to copy the FD to the "fd table". It can then talk normally to the "account management" service on the other side of the socket.
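Since the extended dup3() does not exist, the thought experiment can only be modelled in user space. The following Python toy mimics the two tables and the two flags. The flag values, class name and environment variable name are all invented, and the "descriptors" are plain strings.

```python
import errno

# Made-up flag values for the hypothetical dup3() extension.
DUPFD_FROM_CAP = 0x1  # oldfd names a slot in the capability table
DUPFD_TO_CAP = 0x2    # newfd names a slot in the capability table


class ToyProcess:
    """User-space model of a process with an fd table and a capability table."""

    def __init__(self, env=None):
        self.fd_table = {}
        self.cap_table = {}
        self.env = dict(env or {})

    def dup3(self, oldfd, newfd, flags=0):
        """Copy a descriptor, choosing source and target table by flag."""
        source = self.cap_table if flags & DUPFD_FROM_CAP else self.fd_table
        target = self.cap_table if flags & DUPFD_TO_CAP else self.fd_table
        if oldfd not in source:
            raise OSError(errno.EBADF, "bad source descriptor")
        target[newfd] = source[oldfd]
        return newfd

    def spawn(self):
        """Child inherits both tables and the environment."""
        child = ToyProcess(self.env)
        child.fd_table = dict(self.fd_table)
        child.cap_table = dict(self.cap_table)
        return child


# Login: open the service socket, stash it in capability slot 0, close the fd.
login = ToyProcess()
login.fd_table[3] = "socket to account-management service"
login.dup3(3, 0, DUPFD_TO_CAP)
del login.fd_table[3]
login.env["ACCOUNT_MGMT_CAP_SLOT"] = "0"  # hypothetical variable name

# An unmodified shell in between touches neither table nor variable.
shell = login.spawn()

# passwd: look up the slot and copy the capability back into the fd table.
passwd = shell.spawn()
slot = int(passwd.env["ACCOUNT_MGMT_CAP_SLOT"])
fd = passwd.dup3(slot, 3, DUPFD_FROM_CAP)
print(passwd.fd_table[fd])
```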

The kdbuswreck

Posted Apr 25, 2015 0:08 UTC (Sat) by kentonv (subscriber, #92073) [Link] (1 responses)

Hmm, I think the main thing I'm worried about is not that capability FDs will be indiscriminately closed by existing tools (e.g. ones that explicitly close all FDs above 2 before exec), but rather the opposite: that it will be too easy to inherit capabilities when you don't intend to. That gets back to "ambient authority". Really, the capability should only be passed on to processes that need it. This means that intermediate apps (e.g. shells) need good ways to explicitly control when to pass capability FDs without getting overly verbose. Today's shells are actually not that bad at this, but some further improvement would probably be desirable, and desktop environments are another matter.

I think putting capabilities in a separate table would actually be a step backwards, in that all the existing tools that work with FDs would not work with these. That means you couldn't manipulate them in bash, you couldn't pass them over unix domain sockets, etc.

The kdbuswreck

Posted Apr 25, 2015 0:52 UTC (Sat) by cesarb (subscriber, #6266) [Link]

> I think putting capabilities in a separate table would actually be a step backwards, in that all the existing tools that work with FDs would not work with these. That means you couldn't manipulate them in bash, you couldn't pass them over unix domain sockets, etc.

They are still FDs, just stashed away in a "shadow" table. They could be easily passed over unix domain sockets (dup3 to the normal table, send it over, and close the copy in the normal table).
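Passing an ordinary FD over a unix domain socket is available from Python 3.9 on Unix through `socket.send_fds()` and `socket.recv_fds()`. As a minimal sketch, both ends live in one process here to keep it short, and the helper names are made up.

```python
import os
import socket


def send_one_fd(channel, fd):
    """Send `fd` over a unix socket, with one byte of ordinary payload."""
    socket.send_fds(channel, [b"x"], [fd])


def recv_one_fd(channel):
    """Receive one descriptor; the kernel installs a new copy on this side."""
    data, fds, msg_flags, address = socket.recv_fds(channel, 1, 1)
    return fds[0]


left, right = socket.socketpair()
read_end, write_end = os.pipe()

send_one_fd(left, write_end)
os.close(write_end)            # the sender drops its own copy

received = recv_one_fd(right)  # a different number, same open pipe
os.write(received, b"hello")
os.close(received)

message = os.read(read_end, 5)
print(message)

os.close(read_end)
left.close()
right.close()
```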

I agree that hiding them makes them impossible to manipulate with unmodified tools like bash, but that's sort of the point of the thought experiment: a way to add capability FDs without interfering with anything which doesn't use them, thus allowing for gradual introduction of the feature. It's only a thought experiment, after all.

There's precedent for that sort of trickery: I recall seeing some discussion here on LWN about a way to allow libraries to open FDs without interfering with or being interfered by the application (IIRC, the proposal was to stash them as directly-allocated high-numbered FDs, instead of using the lowest available slot).

The kdbuswreck

Posted Apr 30, 2015 10:54 UTC (Thu) by metux-its (guest, #102293) [Link] (1 responses)

Well, extending the FD approach a little bit:

* make the FDs/sockets/... appear in the process' filesystem
(using per-process namespaces)
* separate services by security domains (so, choose the granularity of
the service operations in a way that you are either allowed to talk
to the service or not)
* let processes pass these fd's selectively to others
* instead of sockets (streams), use directory trees (like in /sys)
* add a simple but generic remote file system for that

Finally, you'll have something like Plan9 or Inferno ...

The kdbuswreck

Posted Apr 30, 2015 17:04 UTC (Thu) by kentonv (subscriber, #92073) [Link]

> * make the FDs/sockets/... appear in the process' filesystem (using per-process namespaces)

Why? All this does is potentially create new security holes: if you can trick the app into opening an arbitrary file, you can now make it open one of its own FDs too, possibly bypassing chroot environments, etc. (This is in fact already possible through /proc/self, of course.)

> * separate services by security domains (so, choose the granularity of the service operations in a way that you are either allowed to talk to the service or not)

No, that's the opposite of capability-based security. This is access control lists, with which it's notoriously difficult to express complex security policies because as the ACLs become more granular the maintenance burden skyrockets.

In capability-based security, you simply give a process capabilities (file descriptors) for the resources it needs to do its job and not for things it doesn't need. Essentially, you can define new "security domains" on-the-fly by specifying a set of capabilities.

> * let processes pass these fd's selectively to others

Unix domain sockets!

> * instead of sockets (streams), use directory trees (like in /sys)

If the goal is to put everything in one directory tree, then, again, you're creating a global namespace which creates risk of confused deputy attacks. OTOH, if you are allowed to have lots of different directory trees where you can give someone access to a particular tree by passing them a file descriptor and using openat() style calls, great. But I think trying to shoehorn arbitrary interfaces into looking like directories tends to create ugly interfaces. I want data types and function calls, not strings and read/write.
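The openat() style of access mentioned here is exposed in Python through the `dir_fd` parameter of `os.open()`. A short sketch using a temporary directory follows; the helper name is made up. Note that plain openat() by itself does not stop `..` or absolute paths from escaping the tree, so this shows the calling convention, not a sandbox.

```python
import os
import tempfile


def read_below(dir_fd, relative_path):
    """Open a file relative to a directory descriptor, not a global path."""
    fd = os.open(relative_path, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "greeting"), "wb") as f:
        f.write(b"hi")

    # The directory descriptor is the thing that would be handed to a peer.
    tree = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        content = read_below(tree, "greeting")
    finally:
        os.close(tree)

print(content)
```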

> * add a simple but generic remote file system for that

???

> Finally, you'll have something like Plan9 or Inferno ...

Sorry, they don't sound like what I want at all.

The kdbuswreck

Posted Apr 25, 2015 1:52 UTC (Sat) by wahern (subscriber, #37304) [Link]

You've basically just described the way BSDAuth works, as used on OpenBSD. Except BSDAuth uses setuid and setgid executables, and it doesn't need a separate root-privileged daemon. Because PAM modules all run in the same process, that process must execute as root to satisfy the needs of every module. Whereas BSDAuth uses a separate executable as its "module", which communicates using a simple protocol over a pipe. Each module can have minimal privileges. Some modules are setuid root, others are setgid to a group with write permissions to the specific database. To be able to execute any of the login modules, you only need to be in the auth group. So if I wanted to write a web application that was able to check passwords using the system authentication, I only need to put the process (the HTTP daemon, or a separate process) into the auth group. Everything else is hidden behind a very simple C API.

$ sudo ls -ld /usr/libexec/auth/
drwxr-x---  2 root  auth  512 Apr 22 15:00 /usr/libexec/auth/

$ sudo ls -l /usr/libexec/auth/                                      
total 380
-r-xr-sr-x  4 root  _token   14.8K Aug  7  2014 login_activ
-r-sr-xr-x  1 root  auth     19.5K Aug  7  2014 login_chpass
-r-xr-sr-x  4 root  _token   14.8K Aug  7  2014 login_crypto
-r-sr-xr-x  1 root  auth     15.1K Aug  7  2014 login_lchpass
-r-sr-xr-x  1 root  auth     10.1K Aug  7  2014 login_passwd
-r-xr-sr-x  1 root  _radius  14.5K Aug  7  2014 login_radius
-r-xr-xr-x  1 root  auth      9.9K Aug  7  2014 login_reject
-r-xr-sr-x  1 root  auth     10.0K Aug  7  2014 login_skey
-r-xr-sr-x  4 root  _token   14.8K Aug  7  2014 login_snk
-r-sr-xr-x  1 root  auth     18.8K Aug  7  2014 login_tis
-r-xr-sr-x  4 root  _token   14.8K Aug  7  2014 login_token
-r-xr-sr-x  1 root  auth     20.5K Aug  7  2014 login_yubikey
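For readers less used to the `s` in those mode strings, here is a small Python sketch using the standard `stat` module. The numeric modes are reconstructed from the listing above and not read from a real OpenBSD system, and the helper name is made up.

```python
import stat


def special_bits(mode):
    """Name the privilege-changing bits set in a file mode."""
    bits = []
    if mode & stat.S_ISUID:
        bits.append("setuid")
    if mode & stat.S_ISGID:
        bits.append("setgid")
    return bits


# r-x for everyone, plus setuid (login_passwd) or setgid (login_radius).
login_passwd = stat.S_IFREG | stat.S_ISUID | 0o555
login_radius = stat.S_IFREG | stat.S_ISGID | 0o555

for name, mode in (("login_passwd", login_passwd),
                   ("login_radius", login_radius)):
    print(name, stat.filemode(mode), special_bits(mode))
```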

I don't disagree with you in principle. I only mean to point out that existing, buzzword-deficient mechanisms still provide much low-hanging fruit that could be better applied to achieve least privilege. I have my fingers crossed that support for Capsicum is eventually merged into the kernel. The recent process descriptor merge (is it still pending?) made me a tad less pessimistic.

Now that I think about it, the same BSDAuth scheme could be applied in the case of kdbus. Because POSIX capabilities are defined by tagging the executable, why not simply make the executable setgid to a group, such as "_reboot". The GID of a process can be queried in a race-free manner using existing credential passing IPC features. Using supplementary groups and small helper executables, you could grant sets of ad hoc capabilities to executables--invoke the helper to make the request, allowing it to inherit your socket. The helper is setuid or setgid to a role which identifies a particular privilege, and invocation is restricted to processes based on group (effective or supplementary). (The group owner of the executable could be different from the groups allowed to execute it by placing the executable in restricted directories, similar to /usr/libexec/auth in BSDAuth.)
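On Linux, one existing credential-passing feature of this kind is the `SO_PEERCRED` socket option on unix domain sockets. The Python sketch below is Linux-specific and uses invented helper names. It queries the peer of a socketpair, whose recorded credentials are those of the creating process, so it can be run without a second program.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) recorded by the kernel for the socket's peer."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


def peer_has_role(sock, role_gid):
    """True if the peer's GID is the one that stands for the role."""
    pid, uid, gid = peer_credentials(sock)
    return gid == role_gid


ours, theirs = socket.socketpair()
pid, uid, gid = peer_credentials(ours)
print(pid, uid, gid, peer_has_role(ours, os.getegid()))
ours.close()
theirs.close()
```

A service applying the scheme above would compare the reported GID against the GID of a role group such as the "_reboot" example; supplementary groups are not part of this structure, which is the limitation the next paragraph addresses.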

Alternatively, you could expand the notion of setgid so that you could initialize all the supplementary groups of an executable based on extended attributes of the file. And you could extend credential passing to include supplementary groups. Solaris and OS X already do this! ucred_getgroups on Solaris and getsockopt(LOCAL_PEEREID) on OS X return the supplementary group list along with the effective UID and GID.

Frankly, I don't think such a scheme is all that ugly, especially considering that the counter argument to the concern with overloading of the semantics of kernel capabilities is that in practice the usage scenarios are few and relatively simple. That implies a scheme similar to BSDAuth likewise wouldn't get out-of-hand. It doesn't necessarily require any kernel changes, and is portable to boot!

The impasse in the kdbus discussion: Did we learn nothing from AF_UNIX attempt?

Posted Apr 23, 2015 0:09 UTC (Thu) by jspaleta (subscriber, #50639) [Link] (2 responses)

Championing the AF_UNIX approach now.. seems more than a little quixotic.. considering the documented history of the previous attempt in 2012 to make it work. I don't understand how it could be considered now, when it was allowed to stall out in 2012.

references:
https://lwn.net/Articles/482523/
and
https://lwn.net/Articles/504722/

What would be useful for me is trying to get my head around how the objections from the AF_UNIX based socket approach overlap with the current objections. What particular objections from the previous discussion has the new approach solved, what objections are entirely new, and what objections have persisted from one attempt to another.

I do find it interesting that I see Havoc showing up in this discussion again, basically repeating his personal testimony concerning design factors that I saw him talk about in 2012.
ref: http://lwn.net/Articles/505235/

Makes me wonder: are we just seeing different objections now from people who were not actively involved in the merge proposal review of the previous effort? Different eyeballs now bringing different ideal solutions into the discussion?

Naively, if AF_UNIX approach was at all workable, and had support from those reviewing the patches, I would have thought it would have been beaten into shape in 2012-2013 when there was active interest in seeing that approach merged. I'm not saying championing now is deliberately gaming the system, but it seems like the AF_UNIX based approach was beaten to death already and it seems pretty counter productive and downright inhumane to go beat that particular dead horse any more.

-jef

The impasse in the kdbus discussion: Did we learn nothing from AF_UNIX attempt?

Posted Apr 23, 2015 12:36 UTC (Thu) by daniels (subscriber, #16193) [Link] (1 responses)

> Naively, if AF_UNIX approach was at all workable, and had support from those reviewing the patches

It didn't, and I don't see how it ever would: http://thread.gmane.org/gmane.linux.kernel/1255575

David made it pretty clear that he doesn't feel the kernel has any role providing a socket subsystem which provides a multicast subscription model or in-order/guaranteed delivery, and suggested using multicast UDP instead. Which might almost work (substantial overhead to reassemble notwithstanding) if it supported fd passing, which it obviously doesn't.

His other suggestion was turn D-Bus into a network-capable protocol. Again, where that leaves fd passing is anyone's guess.

So it's pretty clear that nothing even resembling a general IPC system which has enough benefit to be usable for D-Bus will ever make it through net/. And here we are.

If you were some totally different IPC system with totally different requirements, you might stand a chance of being able to use a lossy, out-of-order, multicast protocol though.
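
For anyone unfamiliar with the fd passing mentioned above: on Linux it is done by sending `SCM_RIGHTS` ancillary data over an AF_UNIX socket, and UDP has no equivalent. What follows is only a minimal sketch, assuming Python 3.9 or later for `socket.send_fds` and `socket.recv_fds`; the helper name `pass_fd_demo` is made up for illustration and has nothing to do with D-Bus or kdbus code.

```python
import os
import socket


def pass_fd_demo(payload: bytes = b"hello") -> bytes:
    """Pass a pipe's read end over an AF_UNIX socket pair, then read via the received copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data next to one ordinary byte.
        socket.send_fds(left, [b"x"], [read_end])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        received_fd = fds[0]  # a new descriptor referring to the same pipe
        os.write(write_end, payload)
        try:
            return os.read(received_fd, len(payload))
        finally:
            os.close(received_fd)
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()
```

The receiver ends up with its own descriptor for the same open pipe, which is the property a datagram protocol between hosts cannot offer.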

The impasse in the kdbus discussion: Did we learn nothing from AF_UNIX attempt?

Posted Apr 25, 2015 17:26 UTC (Sat) by ploxiln (subscriber, #58395) [Link]

From that posting:

"The first approach was to create a new AF_DBUS socket address family and
move the routing logic of the D-bus daemon to the kernel. The motivations behind
that approach and the thread of the patches post can be found in [1] and [2].

The feedback was that having D-bus specific code in the kernel is a bad
idea so the second approach was to implement multicast Unix domain sockets so
clients can directly send messages to peers bypassing the D-bus daemon."

So now that kernel developers are trying to fend off what amounts to a lot *more* "D-bus specific code in the kernel", AF_BUS is a lot more appealing. If two years ago they said "... and if not this, we're going to get in a D-bus specific monstrosity via GregKH", and the core kernel devs believed it, they might have put a lot more pressure on DaveM to let something minimal through.

But now, D-bus proponents are unlikely to let go of their perfect-fit subsystem, the result of a lot of work, and which seemed so close to getting in.

The kdbuswreck

Posted Apr 23, 2015 1:01 UTC (Thu) by roskegg (subscriber, #105) [Link] (7 responses)

Wish they'd start over with the 9P protocol.

The kdbuswreck

Posted Apr 23, 2015 12:40 UTC (Thu) by daniels (subscriber, #16193) [Link] (6 responses)

> Wish they'd start over with the 9P protocol.

As a new IPC system, maybe that's a good idea. As something which can accelerate all existing uses of D-Bus, it's 100% irrelevant.

The kdbuswreck

Posted Apr 23, 2015 18:56 UTC (Thu) by roskegg (subscriber, #105) [Link] (5 responses)

That is right. I'd like to see D-Bus phased out entirely and replaced with 9P.

The kdbuswreck

Posted Apr 23, 2015 20:06 UTC (Thu) by HelloWorld (guest, #56129) [Link] (1 responses)

…and a pony.

The kdbuswreck

Posted Apr 25, 2015 14:03 UTC (Sat) by jeff@uclinux.org (guest, #8024) [Link]

"Pony"

Except that an (admittedly application-specific) 9P implementation is already in the kernel, it works very well, and it is incredibly useful given how simple it is.

The kdbuswreck

Posted Apr 24, 2015 10:54 UTC (Fri) by dgm (subscriber, #49227) [Link]

An article comparing both would be highly interesting.

The kdbuswreck

Posted Apr 24, 2015 12:12 UTC (Fri) by anselm (subscriber, #2796) [Link]

Good luck selling that idea to everyone who's using DBus now. We're looking forward to your libdbus clone that maps everything to 9P!

The kdbuswreck

Posted Apr 28, 2015 1:13 UTC (Tue) by bronson (subscriber, #4806) [Link]

Maybe 9P failed to learn the WebDAV lesson? If you design a protocol that can be used for everything, nobody will use it.

Credential passing

Posted Apr 23, 2015 11:40 UTC (Thu) by hmh (subscriber, #3838) [Link]

There's probably some good technical reason why the idea below wouldn't work well (since it is too obvious to not have been proposed/implemented already), but at first glance it looks like the kernel should encapsulate entirely all process-related credential passing.

At message submission time, it would "attach" the full set of credentials (which actually depend on the security models active in the kernel, e.g. capabilities, SELinux contexts, kernel-keyring-assisted crypto signatures, etc). Maybe allow a flag that signals "no credentials need to be sent" (or the inverse). This closes most (if not all) race windows around credential passing.

At message receiving time, the kernel would check the credentials of the receiving process, and if it has the appropriate ones (security modules might want to filter this, for example), and the message also has the credentials required by the receiving side, deliver it.

At no moment is the full, raw, credential set exposed to userspace. Not even for querying. Thus, the details of the credential set do not become stable kernel/userspace ABI.

The devil is in providing a generic ("functional") set of credentials that the receive side can request to be checked (by the kernel) against the message. THIS set of functional/generic credentials would become a stable kernel/userspace ABI, of course. It has a possibly steep cost, but it can deliver better usability of the whole interface, it avoids a layering violation, and it seems to be amenable to the security model of the kernel (use of security hooks to implement different security modules, etc).

Meh, this possibility has likely already been dissected in the monster thread. Will have to read it now.
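
To make the idea above a bit more concrete, here is a toy user-space model of it. Everything in it is hypothetical: the `ToyBus` class, the credential names, and the choice to drop messages that fail the check are assumptions for illustration, not anything kdbus or the kernel actually implements.

```python
from collections import deque


class ToyBus:
    """Toy stand-in for the kernel side of the proposal; not a real kdbus interface."""

    def __init__(self):
        self._queue = deque()

    def send(self, sender_creds, payload):
        # Credentials are snapshotted at submission time, so later changes
        # to the sender cannot race with delivery.
        self._queue.append((frozenset(sender_creds), payload))

    def receive(self, required):
        """Return the next payload whose sender held every credential in
        `required`. Messages that fail the check are dropped (an assumption
        of this sketch). Returns None when the queue is empty. The credential
        set itself is never handed to the receiver."""
        required = frozenset(required)
        while self._queue:
            creds, payload = self._queue.popleft()
            if required <= creds:
                return payload
        return None
```

The receiver only ever names the functional credentials it wants checked; it gets a payload or nothing, so the raw credential set never becomes part of the interface.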

The kdbuswreck

Posted Apr 24, 2015 3:36 UTC (Fri) by skissane (subscriber, #38675) [Link] (1 responses)

So if the major bone of contention is whether to transmit metadata about the message sender with the kdbus message, how about this approach:
1) Make sure the kdbus user space-kernel API handles message metadata in an extensible way
2) Merge kdbus without providing any message sender metadata
3) Separately, create a patch to add the sender metadata back in, pursue merging that separately

It looks to me like the kdbus patch would still be useful even without this metadata functionality, so it would make sense to try to get kdbus w/o metadata merged now, then look at adding metadata later.
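
One common way to get that kind of extensibility is a type-length-value layout where receivers skip item types they do not know. The sketch below only illustrates the general technique; the header format, the item type numbers, and the function names are all hypothetical and are not taken from the kdbus patches.

```python
import struct

# Hypothetical item header: 16-bit type followed by 16-bit payload length.
ITEM_HEADER = struct.Struct("<HH")

ITEM_PAYLOAD = 1      # hypothetical: the message body
ITEM_SENDER_PID = 2   # hypothetical: metadata item added by a later patch


def pack_items(items):
    """Serialize (type, bytes) pairs into one buffer."""
    out = bytearray()
    for item_type, data in items:
        out += ITEM_HEADER.pack(item_type, len(data)) + data
    return bytes(out)


def parse_items(buf, known_types):
    """Return {type: data} for known item types, silently skipping the rest."""
    parsed = {}
    offset = 0
    while offset < len(buf):
        item_type, length = ITEM_HEADER.unpack_from(buf, offset)
        offset += ITEM_HEADER.size
        if offset + length > len(buf):
            raise ValueError("truncated item")
        if item_type in known_types:
            parsed[item_type] = buf[offset:offset + length]
        offset += length
    return parsed
```

A receiver written before the metadata item existed would call `parse_items(buf, {ITEM_PAYLOAD})` and keep working when a newer sender starts appending `ITEM_SENDER_PID`.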

The kdbuswreck

Posted Apr 24, 2015 6:56 UTC (Fri) by HelloWorld (guest, #56129) [Link]

http://thread.gmane.org/gmane.linux.kernel/1930358/focus=...

The kdbuswreck

Posted Apr 27, 2015 3:44 UTC (Mon) by bandrami (guest, #94229) [Link] (13 responses)

Man, if only the Kernel team had had the foresight to include things like signals, message queues, semaphores, and shared memory in the first place, we wouldn't need to add a message bus system.

The kdbuswreck

Posted Apr 27, 2015 11:24 UTC (Mon) by zyga (subscriber, #81533) [Link] (12 responses)

Man, if only human beings would agree on one specific way to use all of those features so that applications have an interoperable way of talking to each other. If only that specification got widely implemented and got massive usage in all environments. Then we could see if we could put some of that into the kernel to keep the one process from having to be the central point of contention. If only someone would have proposed some patches that implement this in the kernel.

The kdbuswreck

Posted Apr 27, 2015 20:44 UTC (Mon) by flussence (guest, #85566) [Link]

There's no need for sarcasm; your point is agreeable. X11 *has* improved drastically since most of the X server's functions were moved into the kernel.

The kdbuswreck

Posted Apr 28, 2015 0:04 UTC (Tue) by luto (guest, #39314) [Link] (10 responses)

One thing I've learned about software development: never assume that your performance sucks for the reason that you think it sucks. It sure seems obvious that dbus is slow because there's a single process that's a central point of contention. Too bad this doesn't appear to be the case [1] [2].

I can think of a couple reasons that the kernel might be slower than it ought to be for workloads like dbus-daemon. I fixed one of them in 3.16 (it affected me, too). All of this stuff is so far down in the noise, though, that I don't think it's even worth trying to optimize any of the kernel's part yet.

[1] http://lkml.kernel.org/g/CA+55aFxRa3mwL-17hUuUGpjCeGJXseG...
[2] http://lkml.kernel.org/g/CALCETrWLTLqZ0pioOEHakd_S+h=F1X2...

The kdbuswreck

Posted Apr 28, 2015 0:35 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

The first rule of optimization: measure first and find your bottleneck.

The kdbuswreck

Posted Apr 28, 2015 0:46 UTC (Tue) by jspaleta (subscriber, #50639) [Link]

I thought the first rule of optimization was not to talk about optimization.

The kdbuswreck

Posted Apr 28, 2015 6:42 UTC (Tue) by zyga (subscriber, #81533) [Link] (4 responses)

The one thing that I think you may be missing is that kdbus-based dbus doesn't do much at all in the userspace daemon. It is the current design that puts all of the overhead in the one userspace process. With the kernel-based version, half of the overhead is removed outright (A->server->B->server->A becomes A->B->A in the common case).

Secondly, AFAIR, the current dbus daemon gets penalized by fair kernel scheduling. That issue goes away with kdbus. Lastly, I think that it's pretty clear that kdbus unlocks a whole new level of performance with code based on memfd that current dbus doesn't use.

Still, the threads you've referenced are interesting and I need to read more into them to understand how kdbus-based changes apply to them.
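
The "half of the overhead" figure above is just hop counting, which can be written out as a trivial sketch. The route lists and the `hops` helper are illustrative only; they count message transfers and say nothing about measured latency.

```python
def hops(route):
    """Number of message transfers along a route given as a list of endpoints."""
    return len(route) - 1


# Synchronous method call and reply, relayed through a central daemon.
VIA_DAEMON = ["A", "daemon", "B", "daemon", "A"]
# The same call and reply when peers exchange messages directly.
DIRECT = ["A", "B", "A"]

# Fraction of transfers that remain in the direct case.
RATIO = hops(DIRECT) / hops(VIA_DAEMON)
```

Here `hops(VIA_DAEMON)` is 4 and `hops(DIRECT)` is 2, so the direct route needs half the transfers.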

The kdbuswreck

Posted Apr 28, 2015 7:07 UTC (Tue) by luto (guest, #39314) [Link]

I realize that a dbus-like design (central daemon relaying messages) will probably take a performance hit due to context switches and copies. However, there's no reason that a synchronous method call should need 15 context switches, nor is there any reason that dbus couldn't use memfd for large messages.

Regardless, this particular dbus benchmark is so incredibly slow that none of this explains it, and kdbus is apparently only twice as fast. I'm not sure what the problem is, but it's not the scheduler or the fact that there's a central daemon.

IOW, yes, kdbus is in principle twice as fast as a dbus-like design. But dbus is several hundred times slower than it should be. Let's fix that first before quibbling over the other factor of two by moving some or all of it into the kernel.

The kdbuswreck

Posted Apr 28, 2015 14:15 UTC (Tue) by granquet (guest, #60931) [Link]

>Secondly, AFAIR, the current dbus daemon gets penalized by fair kernel scheduling. That issue goes away with kdbus. Lastly I think that it's pretty clear that kdbus unlocks a whole new level of performance with code based on memfd that current dbus doesn't use.

Yes, I concur here.
The switch to CFS broke some use cases at the place I was working at that time.

But those use cases were probably a bit stupid ;)

The kdbuswreck

Posted Apr 30, 2015 16:01 UTC (Thu) by ksandstr (guest, #60862) [Link] (1 responses)

>Secondly, AFAIR, the current dbus daemon gets penalized by fair kernel scheduling. That issue goes away with kdbus.

The issue should've gone away when priority inheritance was mooted for AF_UNIX to support lower latency in Xorg: the scheduler should've been altered to also select the previous process' IPC peer ("partner") to run until the client's wakeup condition was satisfied, and then return to the client immediately. This would've made a closed wait over AF_UNIX equivalent to a syscall, some thousands of clock cycles notwithstanding.

It's my opinion that a transitive form of partner scheduling and priority inheritance would've made a userspace DBus daemon near-transparent from a performance point of view, were a sufficient "counts as partner call" boundary possible to distinguish from the many states and forms of I/O sleep found in Unix. However today, instead of a relatively simple and well-defined primitive behaviour (and perhaps a tiny control API to manage it), we have 10_000 lines of lennartware being pushed for inclusion -- and not in staging like Android's "binder", either.

And before someone else in our little peanut gallery chimes in about priority inheritance: while that is necessary for a well-performing IPC architecture, it's an insufficient solution to the whole of the latency issue because, rather than re-using the abstract scheduling decision that made a client process run in the first place, it only elevates the recipient's priority. A scheduler may well schedule an unrelated process in the server's (elevated) priority band, for example. The inheritance mechanism's interactions with scheduling quanta (the server's? the client's? at what priority? for how long?) and its teardown conditions have also remained poorly defined, which suggests that these issues just flat-out aren't being considered.

Finally, to not call "lennartware" without justification, and based on the considerations above, it's my prediction that if kdbus is merged, there'll be a span of two to six years immediately afterward at the end of which that which remains of kdbus-2015 will not be a net loss to its applications anymore, as with PulseAudio and Avahi before that.

The kdbuswreck

Posted May 4, 2015 7:58 UTC (Mon) by dgm (subscriber, #49227) [Link]

> the scheduler should've been altered to also select the previous process' IPC peer ("partner") to run until the client's wakeup condition was satisfied, and then return to the client immediately. This would've made a closed wait over AF_UNIX equivalent to a syscall, some thousands of clock cycles notwithstanding.

Hear! Hear!

This has the potential to make requesting services from a daemon (any daemon) much more efficient. Everything from web servers to desktop environments could benefit. Just think about how many daemons are constantly running on any typical desktop (answer: dozens!).

One has to wonder why something like this doesn't exist yet.

The kdbuswreck

Posted Apr 28, 2015 8:23 UTC (Tue) by paulj (subscriber, #341) [Link] (2 responses)

I wish LWN had a "+1 Awesome" button.

Kdbus looks like the mother of all premature optimisation from this.

The kdbuswreck

Posted Apr 29, 2015 16:35 UTC (Wed) by Uraeus (guest, #33755) [Link] (1 responses)

If that is your takeaway from this article I think you probably suffer from confirmation bias :)

The kdbuswreck

Posted Apr 29, 2015 16:57 UTC (Wed) by paulj (subscriber, #341) [Link]

Quite possible. ;)

To be honest, I'd prefer if this was done with something more generic, i.e. multi-listener AF_UNIX-like and whatever new SCM_CRED stuff is needed to support authentication, so that it could benefit not just DBus but also whichever IPC system ends up replacing DBus.

Attaching and exposing kernel capabilities to sockets by default in a new API definitely sounds scary!
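
For comparison, existing AF_UNIX sockets on Linux already expose a small, fixed credential set (pid, uid, gid) for the peer through the `SO_PEERCRED` socket option. A minimal sketch, Linux only; the helper name `peer_credentials` is made up for illustration, and this shows today's mechanism, not whatever new credential passing a more generic design would need.

```python
import os
import socket
import struct


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket (Linux only)."""
    fmt = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def demo():
    """Query the peer credentials on a socket pair created by this process."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        return peer_credentials(a)
    finally:
        a.close()
        b.close()
```

On a socket pair the reported peer is the process that created the pair, so `demo()` returns the caller's own pid and effective uid and gid. Note how little is exposed here compared with a full capability set.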