Varlink: a protocol for IPC

By Jake Edge
January 3, 2018

One of the motivations behind projects like kdbus and bus1, both of which have fallen short of mainline inclusion, is to have an interprocess communication (IPC) mechanism available early in the boot process. The D-Bus IPC mechanism has a daemon that cannot be started until filesystems are mounted and the like, but what if the early boot process wants to perform IPC? A new project, varlink, was recently announced; it aims to provide IPC from early boot onward, though it does not really address the longtime D-Bus performance complaints that also served as motivation for kdbus and bus1.

The announcement came from Harald Hoyer, but he credited Kay Sievers and Lars Karlitski with much of the work. At its core, varlink is simply a JSON-based protocol that can be used to exchange messages over any connection-oriented transport. No kernel "special sauce" (such as kdbus or bus1) is needed to support it, since TCP or Unix-domain sockets provide the necessary functionality. The messages can be used as a kind of remote procedure call (RPC) using an API defined in an interface file.

One of the foundations of varlink is simplicity. As outlined on the "ideals" page, the protocol is "not specifically optimized for anything else but ease-of-use and maintainability". To that end, interface definitions are text files, readable by both machines and humans, that describe the services a varlink endpoint will provide. The interface files are meant to be self-documenting and can be retrieved using the GetInterfaceDescription() method of the varlink service interface (org.varlink.service). As Hoyer describes, they are human-readable so that the interfaces can be discussed widely:

They are human readable and can be even discussed amongst people, which are not developing the implementation. They enable a "checks and balance" system for product management, customers, quality engineering and software developers. Interface stability and backwards compatibility can be enforced easily.

Hoyer shows a simple example that gets information from the /etc/passwd file:

    interface com.redhat.system.accounts

    type Account (
      name: string,
      uid: int,
      gid: int,
      full_name: string,
      home: string,
      shell: string
    )

    method GetAccounts() -> (accounts: Account[])

    method GetAccountByUid(uid: int) -> (account: Account)

    method GetAccountByName(name: string) -> (account: Account)

    method AddAccount(account: Account) -> (account: Account)

    error AccountNotFound ()

    error AccountCreationFailed (field: string)

All it takes is four lines of Python to retrieve and print the information for the "root" user (for example). There is also a varlink command-line tool (written in C) that can be used to make varlink calls. Bindings for other languages (C, JavaScript, Go, Java, and Rust) are also available, though some are just a proof of concept at this point.
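To make the message exchange concrete, here is a minimal sketch of what a call to GetAccountByName() might look like on the wire. It is not the four-line client mentioned above and does not use the varlink Python bindings. It assumes the framing noted in the comments below (one JSON object per message, terminated by a NUL byte) and assumes that calls carry "method" and "parameters" members while replies carry "parameters". The service side is a hard-coded stand-in and the account values are made-up sample data.

```python
import json
import socket


def encode_message(message: dict) -> bytes:
    """Serialize one message: a JSON object followed by a NUL byte (assumed framing)."""
    return json.dumps(message).encode("utf-8") + b"\0"


def read_message(sock: socket.socket) -> dict:
    """Read bytes until the NUL terminator, then parse the JSON object."""
    buf = bytearray()
    while True:
        byte = sock.recv(1)
        if not byte:
            raise ConnectionError("peer closed before a full message arrived")
        if byte == b"\0":
            return json.loads(buf.decode("utf-8"))
        buf += byte


# A socket pair stands in for the connection between client and service.
client, server = socket.socketpair()
with client, server:
    client.sendall(encode_message({
        "method": "com.redhat.system.accounts.GetAccountByName",
        "parameters": {"name": "root"},
    }))
    request = read_message(server)

    # Stand-in service: a canned reply, not a real /etc/passwd lookup.
    server.sendall(encode_message({
        "parameters": {"account": {
            "name": "root", "uid": 0, "gid": 0,
            "full_name": "root", "home": "/root", "shell": "/bin/bash",
        }},
    }))
    reply = read_message(client)

print(reply["parameters"]["account"])
```

Reading one byte at a time is slow but keeps the sketch correct when several messages are queued on the same connection; a real client would buffer.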

As described so far, there is still a missing piece. Some service must provide a way to resolve names like "com.redhat.system.accounts" to a Uniform Resource Identifier (URI) corresponding to the running service. If the service is known, but is not running, something needs to start it. Both of those tasks can be handled by the varlink resolver.

Unlike other protocols, such as D-Bus, varlink makes no provision for sending things like file descriptors. It is meant only for sending simple data types (numbers, strings, arrays, etc.). That means the messages can be transparently proxied or redirected elsewhere for servicing. As the ideals statement notes: "Varlink should be free of any side-effects of local APIs. All interactions need to be simple messages on a network, not carrying things like file descriptors or references to locally stored files."

Varlink is available in a GitHub repository under the Apache 2.0 license.

As part of the announcement, Hoyer makes a sweeping claim about the current API to a Linux system: it could all be replaced with varlink-based interfaces. In that statement, he includes kernel interfaces, such as ioctl() and other system calls, procfs, and sysfs; the Linux command-line interface; and various IPC mechanisms including D-Bus and Protobuf. There is a kernel module that allows varlink interfaces to be added to the kernel, but it is a little hard to see the kernel API being replaced, even if that were deemed desirable. It would be decades (if not longer) before the existing kernel interfaces could be removed, which would make for a maintenance headache at minimum.

Hoyer does wryly note the classic xkcd standards proliferation comic: "Of course varlink is the 15th xkcd standard here".

As nice as it might be to have a single, standard interface mechanism throughout the Linux system, that's not a likely outcome. However, varlink does seem like it may have its uses. One would guess that, rather than have each early boot daemon have "fallback IPC via unix domain sockets with its own homegrown protocol", it may make sense for (some) distributions to move to varlink. Given that the developers are from Red Hat, Fedora would seem like a plausible starting place.

Varlink is a fairly simple way to gather needed information or request that certain services be performed, though it provides neither the kinds of guarantees that D-Bus is supposed to provide nor the increased performance that folks have been clamoring for. The amount of churn throughout the Linux ecosystem to support it "everywhere" would be enormous and the benefits to doing so are not obvious. As they say, however, the future is unwritten.




Varlink: a protocol for IPC

Posted Jan 4, 2018 8:13 UTC (Thu) by smurf (subscriber, #17840) [Link] (4 responses)

If messages are terminated with a NUL byte, the protocol is binary. So why not go the whole way and use msgpack instead?

Varlink: a protocol for IPC

Posted Jan 4, 2018 13:25 UTC (Thu) by flussence (guest, #85566) [Link] (3 responses)

Why not go all the way in the other direction and use DJB's netstrings format?

Varlink: a protocol for IPC

Posted Jan 4, 2018 15:54 UTC (Thu) by smurf (subscriber, #17840) [Link] (2 responses)

You'd first have to standardize how to encode non-string data (lists vs. dicts, numbers as opposed to strings consisting of digits, boolean, NULL, …).

Varlink: a protocol for IPC

Posted Jan 5, 2018 3:49 UTC (Fri) by dskoll (subscriber, #1630) [Link] (1 responses)

I think the OP meant netstrings whose payload is JSON.

I hadn't heard of MessagePack before. Thanks for pointing me at it.

Varlink: a protocol for IPC

Posted Jan 6, 2018 14:36 UTC (Sat) by flussence (guest, #85566) [Link]

Yes, that's what I meant. NUL-separated JSON is a bit silly, netstrings looks like a nice medium between that and the other extreme — using JSON as the framing format, a la XMPP/YAML.

Something like MessagePack would be a fine choice too, but I think having everything purely as text would ease adoption in casual scripting. This protocol already looks a lot more bash-friendly than DBus ever did.

Varlink: a protocol for IPC

Posted Jan 4, 2018 14:20 UTC (Thu) by paulj (subscriber, #341) [Link] (4 responses)

General purpose protocols should have a length field near the start of the message. Otherwise they're a pain to parse, in terms of buffer allocation.

Implicit length protocols require recipients to chew on the messages (which may arrive slowly), and only be able to decide if they are too big once "almost too much" has been parsed. Explicit, up-front length fields mean the recipient can make the "too big, sorry" decision /before/ having to spend time parsing. And mean the recipient doesn't have to set buffers aside to parse a message that, ultimately, may have to be rejected, while a malicious client very slowly sends message parts on.

BSON, the binary version of JSON, added length fields at the front. DBUS has up-front determinable lengths. HTTP 1 did not (and... Slowloris), but QUIC / HTTP 2 fixes that.

Length fields up front - no good protocol should be without one. ;)
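The argument for up-front lengths can be sketched with the netstring framing mentioned earlier in the thread (`<length>:<payload>,`). This is only an illustration, not anything varlink does: the function names and the `MessageTooLarge` exception are made up for the example, and it works on an in-memory buffer rather than a socket. The point is that the size check happens as soon as the prefix is parsed, before any payload has to be buffered or examined.

```python
class MessageTooLarge(Exception):
    """Raised when the declared length exceeds what the receiver accepts."""


def encode_netstring(payload: bytes) -> bytes:
    """Frame a payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes, max_len: int) -> tuple[bytes, bytes]:
    """Return (payload, remaining bytes), rejecting oversized frames early."""
    header, sep, rest = data.partition(b":")
    if not sep or not header.isdigit():
        raise ValueError("malformed length prefix")
    length = int(header)
    # The "too big, sorry" decision needs only the prefix, not the payload.
    if length > max_len:
        raise MessageTooLarge(f"{length} bytes declared, limit is {max_len}")
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("truncated or unterminated netstring")
    return rest[:length], rest[length + 1:]


frame = encode_netstring(b'{"method":"x"}')
payload, remainder = decode_netstring(frame, max_len=1024)
print(frame, payload, remainder)
```

With NUL-terminated framing, by contrast, a receiver can only enforce a limit by counting bytes as they trickle in.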

Varlink: a protocol for IPC

Posted Jan 4, 2018 21:13 UTC (Thu) by ballombe (subscriber, #9523) [Link] (3 responses)

To be future-proof, a protocol should start with the bit length of the length field, in unary!

Varlink: a protocol for IPC

Posted Jan 6, 2018 14:52 UTC (Sat) by paulj (subscriber, #341) [Link]

I was going to reply and say thanks for the joke, but that'd obviously be silly. Then I remembered the TCP window scale option, the BGP extended-attribute-length option, and ... So, you're actually right. A scale would be more space efficient / future-proof than just a "length of length" field though.

Varlink: a protocol for IPC

Posted Jan 6, 2018 15:04 UTC (Sat) by flussence (guest, #85566) [Link]

You jest but that's a surprisingly common thing to do, e.g. UTF-8 and WebSockets both use modified versions of the idea (with hard upper bounds on length for sanity).

Varlink: a protocol for IPC

Posted Jan 11, 2018 13:43 UTC (Thu) by oldtomas (guest, #72579) [Link]

You jest, but have a look at the various Elias codes: https://en.wikipedia.org/wiki/Elias_coding

Especially the Elias omega code, which manages to walk all the turtles down to the bottom.

Fascinating :-)

Varlink: a protocol for IPC

Posted Jan 4, 2018 21:59 UTC (Thu) by cesarb (subscriber, #6266) [Link]

JSON is the new XML, apparently. First the on-disk LUKS2 header, now this.

Varlink: a protocol for IPC

Posted Jan 4, 2018 22:13 UTC (Thu) by noxxi (subscriber, #4994) [Link] (4 responses)

Why would somebody use a text based protocol like JSON (or XML or YAML or ...) to transport messages which have a clearly defined structure with strongly typed members? Apart from the overhead in memory and probably also processing time one introduces several ambiguities in interpretation this way: what happens if a member is used twice in JSON, what happens if the declared type does not match the value (like expected integer, got string), what happens if value is out of the expected range (integer larger than 64 bit) etc. A clearly defined binary format with automatically generated packers and unpackers based on the interface description does not have these problems. Also, security requires unambiguous interpretation using a simple instead of complex implementation.

I think we should rather look back to some old and proven technologies like the lean and unambiguous XDR representation (e.g. NFS) instead of using all this fancy JSON, XML or similar.
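The duplicate-member ambiguity is easy to reproduce. The sketch below uses only Python's standard json module; `reject_duplicates` and `strict_loads` are hypothetical helper names, not part of any varlink binding. By default the parser silently keeps the last value for a repeated member, while the strict variant refuses the message.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects with a repeated member name."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate member {key!r}")
        seen[key] = value
    return seen


def strict_loads(text: str):
    """Parse JSON, rejecting duplicate members instead of picking one."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


ambiguous = '{"uid": 0, "uid": 1000}'

# The default parser accepts the message and keeps the last value.
lenient = json.loads(ambiguous)

try:
    strict_loads(ambiguous)
    strict_result = "accepted"
except ValueError as exc:
    strict_result = f"rejected: {exc}"

print(lenient, strict_result)
```

Other JSON parsers may keep the first value or behave differently again, which is exactly the kind of disagreement between implementations described above.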

Varlink: a protocol for IPC

Posted Jan 6, 2018 14:57 UTC (Sat) by paulj (subscriber, #341) [Link] (1 responses)

The number type in JSON is extremely vaguely defined, relative to the common computer storage formats.

Varlink: a protocol for IPC

Posted Jan 6, 2018 15:56 UTC (Sat) by smurf (subscriber, #17840) [Link]

Right. Another reason to use msgpack instead.

WRT use for scripting: you need a reasonable parser anyway. Generating not-well-defined messages (SQL injection is boring, let's do Varlink injection attacks instead!), dissecting replies with regexp matches, or similar nonsense should not even be possible.

Varlink: a protocol for IPC

Posted Jan 13, 2018 12:42 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

> Why would somebody use a text based protocol like JSON (or XML or YAML or ...) to transport messages which have a clearly defined structure with strongly typed members?

Bear in mind JSON's predecessors date back to the SIXTIES - it's been around a long long time. And - barring re-invention of the wheel - a LOT of the problems have been worked out.

Numbers? Why do you want to restrict your choice of numbers to what will fit in a byte, or word, or very-long-word, or whatever? Lists? JSON etc handles lists much better than binary (and even in binary you've got to parse a list ...)

As for parsing, you just say "be strict in what you receive". If it's not what you're expecting, it gets dumped as a security risk, no questions asked.

Binary is a pretty naff protocol for handling data, be it structured or especially unstructured. You just need to be paranoid about formatting, and if the spec says "be paranoid", take it from there!

Cheers,
Wol

Varlink: a protocol for IPC

Posted Jan 14, 2018 7:34 UTC (Sun) by paulj (subscriber, #341) [Link]

If I want to transfer precise state of computer representation of numbers, JSON is terrible, because it doesn't specify a canonical form for its number type.

Now, you can argue it is silly and arbitrary to be limited by the fact that computers' native number types have fixed-widths and specific formats, and (in particular) that there are two very fundamentally different number forms in computers. However, that doesn't get rid of the fact that those forms exist and that programmers might want to precisely transfer data between them. With JSON, I have 0 guarantee that the recipient will get the *exact* same number that I send, nor that the recipient will interpret it as I intend it. Unless either:

1. I control both ends, and I can be 100% sure there will be no "JSON speakers" I do not control in between the 2 ends. But then, I'm *not using JSON* but my own carefully controlled subset of JSON.

2. I wrap numbers in another object, with additional properties to specify the attributes I need to control. But then, I'm *not using JSON* but my own extended types on top of JSON.
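A small illustration of the concern, using only Python's standard json module: the same JSON text yields different values depending on whether the consumer maps JSON numbers to arbitrary-precision integers (Python's default) or to IEEE 754 doubles (as JavaScript does; simulated here with `parse_int=float`).

```python
import json

text = "9007199254740993"  # 2**53 + 1, a perfectly valid JSON number

# Python's default: JSON integers become arbitrary-precision ints.
as_int = json.loads(text)

# A consumer that stores every number as a double cannot represent this value.
as_double = json.loads(text, parse_int=float)

print(as_int, int(as_double), as_int == int(as_double))
```

Both consumers are conforming JSON readers, yet they disagree about what number was sent.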

Varlink: a protocol for IPC

Posted Jan 5, 2018 5:21 UTC (Fri) by Fowl (subscriber, #65667) [Link]

gRPC comes to mind...

"now you have n+1 standards" indeed.

Varlink: a protocol for IPC

Posted Jan 5, 2018 5:47 UTC (Fri) by alison (subscriber, #63752) [Link] (2 responses)

The previous in-kernel DBUS replacements all included a method to broadcast IPC messages without copying. That was for many of us the most attractive feature. Based on this report, varlink does not support broadcast IPC?

Varlink: a protocol for IPC

Posted Jan 8, 2018 12:35 UTC (Mon) by vrfy (guest, #13362) [Link] (1 responses)

With kdbus, all messages were copied once by the sender to every receiving peer. There was no mode without copying.

Varlink does not support anything like signals or broadcasts. Varlink is a simple subscription model where services know about their clients and only do the work the individual client asks for; no messages are prepared and transmitted that nobody might be listening to.

Varlink: a protocol for IPC

Posted Jan 12, 2018 16:44 UTC (Fri) by HelloWorld (guest, #56129) [Link]

kdbus did have zero-copy messaging, it just turned out that copying once is faster for small messages.
https://lwn.net/Articles/580194/

What exactly is the point here??

Posted Jan 12, 2018 16:47 UTC (Fri) by HelloWorld (guest, #56129) [Link] (17 responses)

Why can D-Bus not be used? If dbus-daemon (or dbus-broker) can't be started without a mounted file system, why not just fix that? Why not start it from the initramfs?

What exactly is the point here??

Posted Jan 12, 2018 18:40 UTC (Fri) by kreijack (guest, #43513) [Link] (16 responses)

> Why not start it from the initramfs?
Because, after the boot, when the "initramfs/init" starts "rootfs/init", "initramfs/dbus" should give the control to "rootfs/dbus"; however dbus is not capable of restarting itself; in order to do that, it should be capable to serialize its state on the disk, restart itself and reload the previously saved state.

For example, if you try to restart dbus, some dbus clients go crazy (in my case the X session is restarted !!!!). And this is a lot better than some months ago, when I was unable to log in anymore after a dbus restart.

What exactly is the point here??

Posted Jan 12, 2018 19:49 UTC (Fri) by HelloWorld (guest, #56129) [Link]

> Because, after the boot, when the "initramfs/init" starts "rootfs/init", "initramfs/dbus" should give the control to "rootfs/dbus"; however dbus is not capable of restarting itself; in order to do that, it should be capable to serialize its state on the disk, restart itself and reload the previously saved state.

No, why? You can just start a new dbus-daemon from the root fs. Afaics handing over control would only be needed if connections established during early boot need to persist after pivot_root is called and the real rootfs/init is started. That may or may not be useful, but either way, using varlink in early boot and D-Bus later on doesn't achieve that.

And actually, serializing the state to the disk won't do it as you also need to keep the file descriptors open. You'd have to establish an AF_UNIX socket between initramfs/init and rootfs/init and pass the file descriptors through using SCM_RIGHTS (and since you have a socket already, you might as well use it to pass all the other state through as well, saving all the kinds of trouble associated with writing to the disk, like permissions or quota problems or whatever).
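The descriptor-passing step can be sketched in a few lines. This is an illustration only, not how dbus-daemon works: it assumes Python 3.9 or later on a Unix system (for `socket.send_fds()` and `socket.recv_fds()`, which wrap SCM_RIGHTS), runs both "daemons" in one process for brevity, and uses a pipe as a stand-in for a client connection. The `hand_over` name and the JSON state blob are made up for the example.

```python
import os
import socket


def hand_over(state: bytes, fd: int) -> tuple[bytes, int]:
    """Send serialized state plus one open descriptor across an AF_UNIX socket."""
    old_daemon, new_daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with old_daemon, new_daemon:
        # The descriptor travels as SCM_RIGHTS ancillary data next to the payload.
        socket.send_fds(old_daemon, [state], [fd])
        data, fds, _flags, _addr = socket.recv_fds(new_daemon, 4096, 1)
    return data, fds[0]


# A pipe stands in for a client connection held by the old daemon.
read_end, write_end = os.pipe()
state, inherited = hand_over(b'{"clients": 1}', write_end)

os.close(write_end)                 # the "old daemon" drops its copy
os.write(inherited, b"still open")  # the received descriptor still works
received = os.read(read_end, 10)

os.close(inherited)
os.close(read_end)
print(state, received)
```

The received descriptor is a new reference to the same open file, so the connection survives even after the sender closes its copy. No state ever touches the disk.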

Anyway, the point is, if you need IPC connections to persist after pivot_root, why not just extend dbus-daemon (or dbus-broker) in that way?

What exactly is the point here??

Posted Jan 13, 2018 11:59 UTC (Sat) by smurf (subscriber, #17840) [Link] (14 responses)

> in order to do that, it should be capable to serialize its state on the disk, restart itself and reload the previously saved state.

So effing what? If even systemd can do that (and its state is a whole lot more complex than dbus's), the dbus daemon can be taught to do it too.

Also, you don't need to save anything to disk. Create a pipe, fork, clear the close-on-exec flag on all connections, exec the new master. The child serializes and writes its state to the pipe and terminates. The new copy reads the state, then re-sets the close-on-exec flags, and broadcasts a "hey, I'm all new on $NEW_ROOT, you should probably do the same" signal to whatever services want it.

This is hardly rocket science.

What exactly is the point here??

Posted Jan 13, 2018 14:37 UTC (Sat) by HelloWorld (guest, #56129) [Link] (13 responses)

> Also, you don't need to save anything to disk. Create a pipe, fork, clear the close-on-exec flag on all connections, exec the new master. The child serializes and writes its state to the pipe and terminates. The new copy reads the state, then re-sets the close-on-exec flags, and broadcasts a "hey, I'm all new on $NEW_ROOT, you should probably do the same" signal to whatever services want it.
It's not that easy. System daemons are supposed to be started by systemd so that all the settings in the .service file are applied, they're run in a cgroup etc., so you can't just fork from the initramfs' dbus-daemon.
That said, it's probably possible to make that work anyway, it just makes things a little more complicated.

What exactly is the point here??

Posted Jan 13, 2018 21:06 UTC (Sat) by nix (subscriber, #2304) [Link] (6 responses)

> System daemons are supposed to be started by systemd so that all the settings in the .service file are applied, they're run in a cgroup etc., so you can't just fork from the initramfs' dbus-daemon.
I thought the whole point of cgroup containment was that systemd was not fooled by daemons doing that sort of thing. Sure, it's preferred if they don't daemonize, but even nondaemonizing systemd-ready daemons can presumably do this stuff by forking, passing stdin/stdout/stderr to the new child as part of its state, then terminating. Does systemd really care that its child has died if the stdin/out/err pointing to the daemon are still open and there are still processes in the daemon's cgroup? It seems to me it probably shouldn't (or there should be another class of daemons for which this is true).

What exactly is the point here??

Posted Jan 14, 2018 0:59 UTC (Sun) by njs (subscriber, #40338) [Link]

I guess the issue would be if after you get your real root, you look at /etc/systemd/system/dbus.service and discover that there are some new settings that are supposed to be applied to the new dbus daemon, that weren't used for the initramfs's dbus daemon.

It doesn't make much difference though – even if you want to make the old and new dbus daemons "different services" from systemd's point of view, you can still pass the state and file descriptors between them through a socket.

What exactly is the point here??

Posted Jan 14, 2018 8:45 UTC (Sun) by smurf (subscriber, #17840) [Link]

That depends on how you configure the systemd service.

Anyway, it doesn't make much sense to fork off a child and pass your state to the program it exec()s. Much better to fork off a child to hold the current state and exec() the new code from the parent.

However, systemd doesn't have a way to tell it about already-running services (other than saved internal state which doesn't apply here) and I assume there's not much point in adding that kind of feature, so instead of fork/exec the new daemon can just tell the old one to send its state and terminate.

What exactly is the point here??

Posted Jan 14, 2018 22:09 UTC (Sun) by HelloWorld (guest, #56129) [Link] (3 responses)

You can't really blame systemd for stuff that happens before it's even started, like in the initramfs.

What exactly is the point here??

Posted Jan 15, 2018 17:30 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

systemd often runs in the initramfs as well. I'm just wondering why having the daemon serialize under it would be such a problem (I mean, it does much the same thing to get out from under the initramfs libc's purview).

What exactly is the point here??

Posted Jan 15, 2018 18:15 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

It will always be fragile with various races during handover. It'd be much better if DBUS supported reconnection internally in the protocol...

What exactly is the point here??

Posted Jan 15, 2018 21:57 UTC (Mon) by smurf (subscriber, #17840) [Link]

Why should there be any races during handover? It doesn't use threads AFAIK.
Everything else can be ignored until the new daemon has read the serialized state and takes over.

What exactly is the point here??

Posted Jan 14, 2018 8:29 UTC (Sun) by Jandar (subscriber, #85683) [Link] (5 responses)

> > The child serializes and writes its state to the pipe and terminates.

> so you can't just fork from the initramfs' dbus-daemon.

The fork is only a depot holding the old state. The initrd pid1 execing into the post pivot pid1 only changes the executable while being the same process. Where is the problem?

If the .service files in initrd have other content than the files post pivot, so what? Changes of .services files must be dealt with at other times also.

What exactly is the point here??

Posted Jan 14, 2018 22:05 UTC (Sun) by HelloWorld (guest, #56129) [Link] (4 responses)

I already wrote what the problem is: “System daemons are supposed to be started by systemd so that all the settings in the .service file are applied, they're run in a cgroup etc.”. This is not the case if you start dbus-daemon from an initramfs (because the initramfs stuff runs before systemd is even started), and I don't see how your comment addresses that at all.

What exactly is the point here??

Posted Jan 15, 2018 6:55 UTC (Mon) by Wol (subscriber, #4433) [Link] (1 responses)

Bear in mind that cgroups are nothing to do with systemd, they're a kernel facility that systemd uses ...

so making dbus-daemon use cgroups can be sorted out by the initramfs.

Cheers,
Wol

What exactly is the point here??

Posted Jan 16, 2018 1:56 UTC (Tue) by HelloWorld (guest, #56129) [Link]

> so making dbus-daemon use cgroups can be sorted out by the initramfs.
Systemd also sets up syscall filtering, private /tmp and all sorts of other stuff. Merely setting up cgroups won't cut it.

What exactly is the point here??

Posted Jan 15, 2018 7:48 UTC (Mon) by smurf (subscriber, #17840) [Link] (1 responses)

That's immaterial. Simply start the new dbus daemon from systemd. When it finds an old one still running it could just connect and ask for its state.

What exactly is the point here??

Posted Jan 16, 2018 1:52 UTC (Tue) by HelloWorld (guest, #56129) [Link]

Which is exactly what I said:
> That said, it's probably possible to make that work anyway, it just makes things a little more complicated.
Thanks for confirming.

Varlink: a protocol for IPC

Posted Jan 14, 2018 22:02 UTC (Sun) by areilly (subscriber, #87829) [Link]

Perhaps I'm too old and set in my ways to understand the motivations for this proposal. How does replacing the getpwent() API with a JSON parser and socket library, and an API that embeds the name "redhat" in it improve the world?

There is mention that API stability is a virtue of the proposal, and yet there is no mention of a versioning mechanism.

This seems to be a system management interface without any mention of authentication, identity or credentials. Perhaps those are buried in the example implementations. Certainly unix-domain sockets have permission mechanisms, but the idea of network remoting suggests that there must be IDs and authentication and encryption at the connection level, and I didn't spot discussion about that.

What do you need to communicate to a running daemon besides "HUP: your config file has changed"? Well, that's in general. Certainly there are examples of daemons that have whole complicated control interfaces of their own (I'm looking at you, zfs). Seems like a 15th-standard though.

At least, if the suggestion of replacing the syscall interface with varlink ever happens (after extending it to pass file descriptors, presumably), the performance problems caused by Meltdown will be a happy, distant memory.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds