Varlink: a protocol for IPC
One of the motivations behind projects like kdbus and bus1, both of which have fallen short of mainline inclusion, is to have an interprocess communication (IPC) mechanism available early in the boot process. The D-Bus IPC mechanism has a daemon that cannot be started until filesystems are mounted and the like, but what if the early boot process wants to perform IPC? A new project, varlink, was recently announced; it aims to provide IPC from early boot onward, though it does not really address the longtime D-Bus performance complaints that also served as motivation for kdbus and bus1.
The announcement came from Harald Hoyer, but he credited Kay Sievers and Lars Karlitski with much of the work. At its core, varlink is simply a JSON-based protocol that can be used to exchange messages over any connection-oriented transport. No kernel "special sauce" (such as kdbus or bus1) is needed to support it as TCP or Unix-domain sockets will provide the necessary functionality. The messages can be used as a kind of remote procedure call (RPC) using an API defined in an interface file.
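As a rough illustration of what such an exchange could look like on the wire, the Python sketch below encodes a call as a JSON object and splits a received byte stream back into messages. The NUL-byte terminator and the "method"/"parameters" keys are assumptions based on the published protocol description rather than anything stated in the announcement, and the helper names encode_call() and split_messages() are hypothetical.

```python
import json

TERMINATOR = b"\0"  # assumed message delimiter


def encode_call(method, parameters=None):
    """Serialize one call as a JSON object followed by the terminator."""
    message = {"method": method, "parameters": parameters or {}}
    return json.dumps(message).encode("utf-8") + TERMINATOR


def split_messages(buffer):
    """Return (decoded complete messages, leftover bytes) for a buffer."""
    *complete, rest = buffer.split(TERMINATOR)
    return [json.loads(chunk) for chunk in complete], rest


# One complete call followed by the start of a second, incomplete message.
wire = encode_call("com.redhat.system.accounts.GetAccountByName",
                   {"name": "root"})
messages, rest = split_messages(wire + b'{"partial"')
```

Because json.dumps() escapes control characters, a raw NUL byte cannot appear inside an encoded message, which is what makes a simple split on the terminator workable in this sketch.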
One of the foundations of varlink is simplicity. As outlined on the "ideals" page, the protocol is "not specifically optimized for anything else but ease-of-use and maintainability". To that end, interface definitions are text files, readable by both machines and humans, that describe the services a varlink endpoint will provide. The interface files are meant to be self-documenting and can be retrieved using the GetInterfaceDescription() method of the varlink service interface (org.varlink.service). As Hoyer describes, they are human-readable so that the interfaces can be discussed widely.
Hoyer shows a simple example that gets information from the /etc/passwd file:
    interface com.redhat.system.accounts

    type Account (
      name: string,
      uid: int,
      gid: int,
      full_name: string,
      home: string,
      shell: string
    )

    method GetAccounts() -> (accounts: Account[])
    method GetAccountByUid(uid: int) -> (account: Account)
    method GetAccountByName(name: string) -> (account: Account)
    method AddAccount(account: Account) -> (account: Account)

    error AccountNotFound ()
    error AccountCreationFailed (field: string)
All it takes is four lines of Python to retrieve and print the information for the "root" user (for example). There is also a varlink command-line tool (written in C) that can be used to make varlink calls. Bindings for other languages (C, JavaScript, Go, Java, and Rust) are also available, though some are just a proof of concept at this point.
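To make the request and reply shapes concrete, the following self-contained Python sketch dispatches calls for two of the methods in the interface above against an in-memory table. It is not the four-line client that Hoyer shows and it does not use the varlink Python module; handle_call(), ACCOUNTS, and the sample data are hypothetical, and the "error"/"parameters" reply keys and the MethodNotFound error name are assumptions about the wire format.

```python
INTERFACE = "com.redhat.system.accounts"

# Hypothetical sample data standing in for /etc/passwd.
ACCOUNTS = [
    {"name": "root", "uid": 0, "gid": 0, "full_name": "root",
     "home": "/root", "shell": "/bin/bash"},
]


def get_accounts():
    return {"accounts": ACCOUNTS}


def get_account_by_name(name):
    for account in ACCOUNTS:
        if account["name"] == name:
            return {"account": account}
    raise LookupError("AccountNotFound")


METHODS = {
    INTERFACE + ".GetAccounts": get_accounts,
    INTERFACE + ".GetAccountByName": get_account_by_name,
}


def handle_call(request):
    """Map one decoded call message to a reply message."""
    handler = METHODS.get(request["method"])
    if handler is None:
        # Assumed name for the generic "no such method" error.
        return {"error": "org.varlink.service.MethodNotFound",
                "parameters": {"method": request["method"]}}
    try:
        return {"parameters": handler(**request.get("parameters", {}))}
    except LookupError as exc:
        # Interface-specific errors are qualified with the interface name.
        return {"error": f"{INTERFACE}.{exc.args[0]}", "parameters": {}}
```

A call such as handle_call({"method": INTERFACE + ".GetAccountByName", "parameters": {"name": "root"}}) returns the matching Account record, while an unknown name produces the AccountNotFound error declared in the interface.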
As described so far, there is still a missing piece. Some service must provide a way to resolve names like "com.redhat.system.accounts" to a Uniform Resource Identifier (URI) corresponding to the running service. If the service is known, but is not running, something needs to start it. Both of those tasks can be handled by the varlink resolver.
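The two jobs described here, name lookup and on-demand start, can be sketched as a small registry. This is a toy model under stated assumptions, not the varlink resolver's implementation; the Resolver class, the address string, and the start callback are all hypothetical.

```python
class Resolver:
    """Toy registry mapping interface names to service addresses."""

    def __init__(self):
        # interface name -> {"address", "start" callback, "running" flag}
        self._services = {}

    def register(self, interface, address, start):
        self._services[interface] = {
            "address": address, "start": start, "running": False,
        }

    def resolve(self, interface):
        """Return the address for an interface, starting the service if needed."""
        entry = self._services.get(interface)
        if entry is None:
            raise KeyError(f"unknown interface: {interface}")
        if not entry["running"]:
            entry["start"]()          # activate the service on first use
            entry["running"] = True
        return entry["address"]


started = []
resolver = Resolver()
resolver.register("com.redhat.system.accounts",
                  "unix:/run/example/accounts",      # hypothetical address
                  lambda: started.append("accounts"))
address = resolver.resolve("com.redhat.system.accounts")
```

A client would then connect to the returned address and issue its calls there; resolving the same name a second time does not start the service again.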
Unlike other protocols, such as D-Bus, varlink makes no provision for sending things like file descriptors. It is simply for sending simple data types (numbers, strings, arrays, etc.). That means the messages can be transparently proxied or redirected elsewhere for servicing. As the ideals statement notes: "Varlink should be free of any side-effects of local APIs. All interactions need to be simple messages on a network, not carrying things like file descriptors or references to locally stored files."
Varlink is available in a GitHub repository under the Apache 2.0 license.
As part of the announcement, Hoyer makes a sweeping claim about the current API to a Linux system: it could all be replaced with varlink-based interfaces. In that statement, he includes kernel interfaces, such as ioctl() and other system calls, procfs, and sysfs; the Linux command-line interface; and various IPC mechanisms including D-Bus and Protobuf. There is a kernel module that allows varlink interfaces to be added to the kernel, but it is a little hard to see the kernel API being replaced, even if it was deemed desirable. It would be decades (if not longer) before the existing kernel interfaces could be removed, which would make for a maintenance headache at minimum.
Hoyer does wryly note the classic xkcd standards proliferation comic: "Of course varlink is the 15th xkcd standard here".
As nice as it might be to have a single, standard interface mechanism throughout the Linux system, that's not a likely outcome. However, varlink does seem like it may have its uses. One would guess that, rather than have each early boot daemon have "fallback IPC via unix domain sockets with its own homegrown protocol", it may make sense for (some) distributions to move to varlink. Given that the developers are from Red Hat, Fedora would seem like a plausible starting place.
Varlink is a fairly simple way to gather needed information or request that certain services be performed, though it doesn't offer the kinds of guarantees that D-Bus is supposed to provide, or the increased performance that folks have been clamoring for. The amount of churn throughout the Linux ecosystem to support it "everywhere" would be enormous and the benefits to doing so are not obvious. As they say, however, the future is unwritten.
Posted Jan 4, 2018 8:13 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (4 responses)
Posted Jan 4, 2018 13:25 UTC (Thu)
by flussence (guest, #85566)
[Link] (3 responses)
Posted Jan 4, 2018 15:54 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (2 responses)
Posted Jan 5, 2018 3:49 UTC (Fri)
by dskoll (subscriber, #1630)
[Link] (1 responses)
I hadn't heard of MessagePack before. Thanks for pointing me at it.
Posted Jan 6, 2018 14:36 UTC (Sat)
by flussence (guest, #85566)
[Link]
Something like MessagePack would be a fine choice too, but I think having everything purely as text would ease adoption in casual scripting. This protocol already looks a lot more bash-friendly than DBus ever did.
Posted Jan 4, 2018 14:20 UTC (Thu)
by paulj (subscriber, #341)
[Link] (4 responses)
Implicit length protocols require recipients to chew on the messages (which may arrive slowly), and only be able to decide if they are too big once "almost too much" has been parsed. Explicit, up-front length fields mean the recipient can make the "too big, sorry" decision /before/ having to spend time parsing. And mean the recipient doesn't have to set buffers aside to parse a message that, ultimately, may have to be rejected, while a malicious client very slowly sends message parts on.
BSON, the binary version of JSON, added length fields at the front. DBUS has up-front determinable lengths. HTTP 1 did not (and.. Slowloris), but QUIC / HTTP 2 fix that.
Length fields up front - no good protocol should be without one. ;)
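A minimal Python sketch of the up-front length check described in this comment follows. The 4-byte big-endian prefix, the MAX_MESSAGE limit, and the function names are illustrative assumptions; they are not taken from varlink, BSON, or D-Bus.

```python
import io
import struct

MAX_MESSAGE = 64 * 1024  # hypothetical per-message limit


def frame(payload):
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def read_frame(stream):
    """Read one frame, rejecting oversized ones before reading the body."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack(">I", header)
    if length > MAX_MESSAGE:
        # Decided from the header alone: no buffer is set aside for the body.
        raise ValueError(f"message of {length} bytes exceeds limit")
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("truncated message body")
    return body
```

The receiver can refuse a message after reading only four bytes, whereas a delimiter-terminated format has to keep buffering until the terminator arrives or a limit is hit.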
Posted Jan 4, 2018 21:13 UTC (Thu)
by ballombe (subscriber, #9523)
[Link] (3 responses)
Posted Jan 6, 2018 14:52 UTC (Sat)
by paulj (subscriber, #341)
[Link]
Posted Jan 6, 2018 15:04 UTC (Sat)
by flussence (guest, #85566)
[Link]
Posted Jan 11, 2018 13:43 UTC (Thu)
by oldtomas (guest, #72579)
[Link]
Especially the Elias omega code, which manages to walk all the turtles down to the bottom.
Fascinating :-)
Posted Jan 4, 2018 21:59 UTC (Thu)
by cesarb (subscriber, #6266)
[Link]
Posted Jan 4, 2018 22:13 UTC (Thu)
by noxxi (subscriber, #4994)
[Link] (4 responses)
I think we had better look back to some old and proven technologies like the lean and unambiguous XDR representation (i.e. NFS etc.) instead of using all this fancy JSON, XML or similar.
Posted Jan 6, 2018 14:57 UTC (Sat)
by paulj (subscriber, #341)
[Link] (1 responses)
Posted Jan 6, 2018 15:56 UTC (Sat)
by smurf (subscriber, #17840)
[Link]
WRT use for scripting: you need a reasonable parser anyway. Generating not-well-defined messages (SQL injection is boring, let's do Varlink injection attacks instead!), dissecting replies with regexp matches, or similar nonsense should not even be possible.
Posted Jan 13, 2018 12:42 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (1 responses)
Bear in mind JSON's predecessors date back to the SIXTYs - it's been around a long long time. And - barring re-invention of the wheel - a LOT of the problems have been worked out.
Numbers? Why do you want to restrict your choice of numbers to what will fit in a byte, or word, or very-long-word, or whatever? Lists? JSON etc handles lists much better than binary (and even in binary you've got to parse a list ...)
As for parsing, you just say "be strict in what you receive". If it's not what you're expecting, it gets dumped as a security risk, no questions asked.
Binary is a pretty naff protocol for handling data, be it structured or especially unstructured. You just need to be paranoid about formatting, and if the spec says "be paranoid", take it from there!
Cheers,
Posted Jan 14, 2018 7:34 UTC (Sun)
by paulj (subscriber, #341)
[Link]
Now, you can argue it is silly and arbitrary to be limited by the fact that computers' native number types have fixed-widths and specific formats, and (in particular) that there are two very fundamentally different number forms in computers. However, that doesn't get rid of the fact that those forms exist and that programmers might want to precisely transfer data between them. With JSON, I have 0 guarantee that the recipient will get the *exact* same number that I send, nor that the recipient will interpret it as I intend it. Unless either:
1. I control both ends, and I can be 100% sure there will be no "JSON speakers" I do not control in between the 2 ends. But then, I'm *not using JSON* but my own carefully controlled subset of JSON.
2. I wrap numbers in another object, with additional properties to specify the attributes I need to control. But then, I'm *not using JSON* but my own extended types on top of JSON.
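The precision concern can be demonstrated in a few lines of Python. The sketch assumes a receiver that stores every JSON number as an IEEE 754 double, as JavaScript-style parsers do, which is emulated here with parse_int=float; the "type"/"value" wrapper is a made-up convention illustrating option 2.

```python
import json

big = 2**53 + 1                      # not exactly representable as a double
text = json.dumps({"value": big})

# Python's own parser keeps integers exact ...
exact = json.loads(text)["value"]

# ... but a parser that maps all numbers to doubles silently rounds.
as_double = json.loads(text, parse_int=float)["value"]

# Option 2: carry the digits as a string plus an explicit type tag, so
# even a double-only parser cannot alter the value in transit.
wrapped = json.dumps({"value": {"type": "uint64", "value": str(big)}})
restored = int(json.loads(wrapped, parse_int=float)["value"]["value"])
```

Here exact equals the original integer, as_double does not, and restored does, but only because both ends agree on the extra wrapper.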
Posted Jan 5, 2018 5:21 UTC (Fri)
by Fowl (subscriber, #65667)
[Link]
"now you have n+1 standards" indeed.
Posted Jan 5, 2018 5:47 UTC (Fri)
by alison (subscriber, #63752)
[Link] (2 responses)
Posted Jan 8, 2018 12:35 UTC (Mon)
by vrfy (guest, #13362)
[Link] (1 responses)
Varlink does not support anything like signals or broadcasts. Varlink is a simple subscription model where services know about their clients and only do the work the individual client asks for; no messages are prepared and transmitted that nobody might be listening to.
Posted Jan 12, 2018 16:44 UTC (Fri)
by HelloWorld (guest, #56129)
[Link]
Posted Jan 12, 2018 16:47 UTC (Fri)
by HelloWorld (guest, #56129)
[Link] (17 responses)
Posted Jan 12, 2018 18:40 UTC (Fri)
by kreijack (guest, #43513)
[Link] (16 responses)
For example, if you try to restart dbus, some dbus clients go crazy (in my case the X session is restarted!!!!). And this is a lot better than some months ago, when I was unable to log in anymore after a dbus restart.
Posted Jan 12, 2018 19:49 UTC (Fri)
by HelloWorld (guest, #56129)
[Link]
No, why? You can just start a new dbus-daemon from the root fs. Afaics handing over control would only be needed if connections established during early boot need to persist after pivot_root is called and the real rootfs/init is started. That may or may not be useful, but either way, using varlink in early boot and D-Bus later on doesn't achieve that.
And actually, serializing the state to the disk won't do it as you also need to keep the file descriptors open. You'd have to establish an AF_UNIX socket between initramfs/init and rootfs/init and pass the file descriptors through using SCM_RIGHTS (and since you have a socket already, you might as well use it to pass all the other state through as well, saving all the kinds of trouble associated with writing to the disk, like permissions or quota problems or whatever).
Anyway, the point is, if you need IPC connections to persist after pivot_root, why not just extend dbus-daemon (or dbus-broker) in that way?
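The descriptor hand-over described in this comment can be sketched in Python, which wraps SCM_RIGHTS in socket.send_fds() and socket.recv_fds() (Python 3.9 or later, Unix only). A single process plays both daemons here for brevity, and the state dictionary, the function names, and the pipe standing in for a client connection are all hypothetical.

```python
import json
import os
import socket


def send_state(sock, state, fds):
    """Send JSON state plus open file descriptors in a single message."""
    socket.send_fds(sock, [json.dumps(state).encode("utf-8")], fds)


def receive_state(sock, max_fds=16):
    """Receive the state and the duplicated descriptors from the peer."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 65536, max_fds)
    return json.loads(data), fds


# Stand-ins for the initramfs daemon and the rootfs daemon.
old_daemon, new_daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
client_read, client_write = os.pipe()      # stands in for a client connection

send_state(old_daemon, {"names": ["org.example.Service"]}, [client_write])
state, (inherited_fd,) = receive_state(new_daemon)

# The receiving side can keep talking to the client through its own copy.
os.write(inherited_fd, b"still connected")
received = os.read(client_read, 64)
```

The received descriptor is a new number referring to the same open file, so the sending side can close its copy and exit without the client noticing.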
Posted Jan 13, 2018 11:59 UTC (Sat)
by smurf (subscriber, #17840)
[Link] (14 responses)
So effing what? If even systemd can do that (and its state is a whole lot more complex than dbus's), the dbus daemon can be taught to do it too.
Also, you don't need to save anything to disk. Create a pipe, fork, clear the close-on-exec flag on all connections, exec the new master. The child serializes and writes its state to the pipe and terminates. The new copy reads the state, then re-sets the close-on-exec flags, and broadcasts a "hey, I'm all new on $NEW_ROOT, you should probably do the same" signal to whatever services want it.
This is hardly rocket science.
Posted Jan 13, 2018 14:37 UTC (Sat)
by HelloWorld (guest, #56129)
[Link] (13 responses)
Posted Jan 13, 2018 21:06 UTC (Sat)
by nix (subscriber, #2304)
[Link] (6 responses)
Posted Jan 14, 2018 0:59 UTC (Sun)
by njs (subscriber, #40338)
[Link]
It doesn't make much difference though – even if you want to make the old and new dbus daemons "different services" from systemd's point of view, you can still pass the state and file descriptors between them through a socket.
Posted Jan 14, 2018 8:45 UTC (Sun)
by smurf (subscriber, #17840)
[Link]
Anyway, it doesn't make much sense to fork off a child and pass your state to the program it exec()s. Much better to fork off a child to hold the current state and exec() the new code from the parent.
However, systemd doesn't have a way to tell it about already-running services (other than saved internal state which doesn't apply here) and I assume there's not much point in adding that kind of feature, so instead of fork/exec the new daemon can just tell the old one to send its state and terminate.
Posted Jan 14, 2018 22:09 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (3 responses)
Posted Jan 15, 2018 17:30 UTC (Mon)
by nix (subscriber, #2304)
[Link] (2 responses)
Posted Jan 15, 2018 18:15 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Jan 15, 2018 21:57 UTC (Mon)
by smurf (subscriber, #17840)
[Link]
Posted Jan 14, 2018 8:29 UTC (Sun)
by Jandar (subscriber, #85683)
[Link] (5 responses)
> so you can't just fork from the initramfs' dbus-daemon.
The fork is only a depot holding the old state. The initrd pid1 execing into the post pivot pid1 only changes the executable while being the same process. Where is the problem?
If the .service files in initrd have other content than the files post pivot, so what? Changes of .services files must be dealt with at other times also.
Posted Jan 14, 2018 22:05 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (4 responses)
Posted Jan 15, 2018 6:55 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (1 responses)
so making dbus-daemon use cgroups can be sorted out by the initramfs.
Cheers,
Posted Jan 16, 2018 1:56 UTC (Tue)
by HelloWorld (guest, #56129)
[Link]
Posted Jan 15, 2018 7:48 UTC (Mon)
by smurf (subscriber, #17840)
[Link] (1 responses)
Posted Jan 16, 2018 1:52 UTC (Tue)
by HelloWorld (guest, #56129)
[Link]
Posted Jan 14, 2018 22:02 UTC (Sun)
by areilly (subscriber, #87829)
[Link]
There is mention that API stability is a virtue of the proposal, and yet there is no mention of a versioning mechanism.
This seems to be a system management interface without any mention of authentication, identity or credentials. Perhaps those are buried in the example implementations. Certainly unix-domain sockets have permission mechanisms, but the idea of network remoting suggests that there must be IDs and authentication and encryption at the connection level, and I didn't spot discussion about that.
What do you need to communicate to a running daemon besides "HUP: your config file has changed"? Well, that's in general. Certainly there are examples of daemons that have whole complicated control interfaces of their own (I'm looking at you, zfs). Seems like a 15th-standard though.
At least, if the suggestion of replacing the syscall interface with varlink ever happens (after extending it to pass file descriptors, presumably), the performance problems caused by Meltdown will be a happy, distant memory.
Wol
https://lwn.net/Articles/580194/
What exactly is the point here??
Because, after the boot, when "initramfs/init" starts "rootfs/init", "initramfs/dbus" should hand control over to "rootfs/dbus"; however, dbus is not capable of restarting itself. To do that, it would need to serialize its state to disk, restart itself, and reload the previously saved state.
It's not that easy. System daemons are supposed to be started by systemd so that all the settings in the .service file are applied, they're run in a cgroup etc., so you can't just fork from the initramfs' dbus-daemon.
That said, it's probably possible to make that work anyway, it just makes things a little more complicated.
> System daemons are supposed to be started by systemd so that all the settings in the .service file are applied, they're run in a cgroup etc., so you can't just fork from the initramfs' dbus-daemon.
I thought the whole point of cgroup containment was that systemd was not fooled by daemons doing that sort of thing. Sure, it's preferred if they don't daemonize, but even nondaemonizing systemd-ready daemons can presumably do this stuff by forking, passing stdin/stdout/stderr to the new child as part of its state, then terminating. Does systemd really care that its child has died if the stdin/out/err pointing to the daemon are still open and there are still processes in the daemon's cgroup? It seems to me it probably shouldn't (or there should be another class of daemons for which this is true).
Everything else can be ignored until the new daemon has read the serialized state and takes over.
Wol
Systemd also sets up syscall filtering, private /tmp and all sorts of other stuff. Merely setting up cgroups won't cut it.
> That said, it's probably possible to make that work anyway, it just makes things a little more complicated.
Thanks for confirming.
