
Upstreaming multipath TCP

By Jonathan Corbet
September 26, 2019

The multipath TCP (MPTCP) protocol, along with the Linux implementation of it, has been under development for a solid decade; MPTCP offers a number of advantages for devices that have more than one network interface available. Despite having been widely deployed, though, MPTCP is still not supported by the upstream Linux kernel. At the 2019 Linux Plumbers Conference, Matthieu Baerts and Mat Martineau discussed the current state of the Linux MPTCP implementation and what will be required to get it into the mainline kernel.

MPTCP, described by RFC 6824, is built around one fundamental idea: allowing a single network connection to exchange data over multiple physical paths. One obvious use case is a phone handset, which has both WiFi and broadband interfaces. Being able to use both at the same time would give the device greater bandwidth, but also greater redundancy — a connection could continue uninterrupted despite changes to individual paths.

Apple added MPTCP support in 2013, mostly to enable easier failover between paths. The "walk-out" use case, where a user working over WiFi leaves the building and must switch to cellular broadband, is prominent here. Others, including Samsung and LG, have patched in MPTCP support to gain access to greater bandwidth. MPTCP is also being added to residential network gateways, which have both DSL and LTE interfaces, again for greater bandwidth. The 5G standards also include MPTCP.

The Linux implementation was started in March 2009; over the following ten years it has reached version 0.95. It is already used by millions, Baerts said, but it is not in a condition where it can be upstreamed. The MPTCP developers are working to change that, but there are a number of constraints they have to work under, the first being that the addition of MPTCP cannot affect the existing TCP stack. In particular, there can be no performance regressions, and no increase in code size if MPTCP is disabled. The protocol itself is strictly opt-in; applications must ask for it explicitly. The plan is to proceed in small steps, merging a minimal feature set first.

The MPTCP protocol

Baerts then launched into a quick overview of the MPTCP protocol which, he said, looks as much like vanilla TCP as possible. Due to the proliferation of network middleboxes that refuse to pass traffic they don't recognize, adding a new protocol to the open Internet is a difficult task. There are two approaches that can work. One is the QUIC way, with all of the protocol details hidden from middleboxes; the other, taken by MPTCP, is to look just like an existing protocol. So the "subflows" used to carry traffic over a specific path are just basic TCP connections.

The new protocol does have to carry some information to tie those connections together, though. That is done in a few ways, starting with the "data sequence number", which is uniform across all of the subflows. All MPTCP signaling is carried in a new TCP option; specific option subtypes exchanged in SYN packets indicate MPTCP capability or add a connection to an existing MPTCP session. There is some extra signaling to, for example, announce the availability of additional addresses. Receive windows across the TCP connections have to be coupled to provide a single window at the MPTCP level.
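As a reader's sketch (not code from the patch set), the signaling described above maps onto the option subtypes defined by RFC 6824, all carried under a single TCP option kind:

    /* Illustrative only: constants taken from RFC 6824. All MPTCP
     * signaling shares TCP option kind 30 and is distinguished by a
     * four-bit subtype in the option payload. */
    #define TCPOPT_MPTCP 30

    enum mptcp_option_subtype {
        MPTCP_MP_CAPABLE   = 0x0, /* negotiate MPTCP in the initial SYN */
        MPTCP_MP_JOIN      = 0x1, /* attach a new subflow to a session */
        MPTCP_DSS          = 0x2, /* data sequence signal: maps subflow
                                     bytes onto the shared data
                                     sequence number */
        MPTCP_ADD_ADDR     = 0x3, /* announce an additional address */
        MPTCP_REMOVE_ADDR  = 0x4, /* withdraw an address */
        MPTCP_MP_PRIO      = 0x5, /* change a subflow's priority */
        MPTCP_MP_FAIL      = 0x6, /* signal a checksum failure */
        MPTCP_MP_FASTCLOSE = 0x7, /* abruptly close the whole session */
    };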

As it turns out, there are two versions of the MPTCP protocol: RFC 6824 and the newer RFC 6824bis draft. The latter has been submitted for publication, and is the version that the MPTCP v1.0 patch will support. The modifications in the newer draft make it easier to implement, and 5G is using that version of the standard as well, so it is to be expected that all users will switch over relatively quickly. Baerts asked whether focusing on just RFC 6824bis would be acceptable to the networking developers; Eric Dumazet replied that it would be fine.

Getting it upstream

There was a patch set sent to the lists in June. It creates a new protocol type (IPPROTO_MPTCP) that applications can use to select multipath. Baerts noted that this patch set does not yet support IPv6; adding that support should not be hard, but the MPTCP developers don't want to focus on it now. A member of the audience quickly suggested that the group should do the opposite: implement IPv6 only, and add IPv4 support later. Dave Miller said that the basic functionality needs to be submitted at the beginning, and that includes IPv6 support. We are, he said, way past the point of allowing IPv4-specific protocol implementations.
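In application terms, the opt-in looks roughly like the sketch below; the numeric value of IPPROTO_MPTCP and the fall-back-to-TCP logic are assumptions layered on top of the posted patches, not part of them:

    /* Minimal sketch of the proposed opt-in API. IPPROTO_MPTCP's value
     * is assumed here; use the kernel's uapi header once merged. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <errno.h>

    #ifndef IPPROTO_MPTCP
    #define IPPROTO_MPTCP 262
    #endif

    int open_stream_socket(void)
    {
        int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_MPTCP);

        /* Kernels without MPTCP reject the unknown protocol;
         * fall back to plain TCP in that case. */
        if (fd < 0 && (errno == EPROTONOSUPPORT || errno == EINVAL))
            fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
        return fd;
    }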

The developers are also uncertain about how to set socket options on MPTCP subflows. They don't want to settle on an API at this point, Baerts said, so there will be no user-space access to subflows for now.

Then, there is the question of who should be able to create MPTCP sockets. The first implementation will not be fully hardened; there have been no fuzzing efforts yet, for example. The plan is to include a sysctl knob to control access to the feature; it will be off by default and specific to each network namespace. Miller shot down that idea as well; if the code is to be accepted, he said, it should be functional. It should not be off by default, and in any case there are too many knobs in the system already. Alexei Starovoitov said that the receive path is more worrisome anyway, and that can't be controlled by a sysctl knob. Apple had a remote security issue that could be used to jailbreak phones; he would like to avoid that in Linux.

Use cases

The initial use case for MPTCP, Martineau said, is the server role. The path management issues are far simpler; clients, instead, have the complexity of multiple interfaces to deal with. Some of the requisite low-level pieces have already been upstreamed. He mentioned SKB extensions in particular; they are a way of attaching additional information to packets in the kernel without bloating the (already too large) sk_buff structure. SKB extensions have already been used to remove a couple of unrelated pointers from that structure, making things better overall.
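The SKB-extension API itself is already upstream; the sketch below shows the shape of its use. SKB_EXT_MPTCP and struct mptcp_ext are placeholders standing in for what the MPTCP series would register, not identifiers that exist today:

    /* Kernel-side sketch of the SKB extension API. skb_ext_add() is
     * the real interface; the extension ID and payload struct are
     * assumed names for what MPTCP would register. */
    #include <linux/skbuff.h>
    #include <linux/errno.h>

    struct mptcp_ext {
        u64 data_seq;        /* MPTCP-level data sequence number */
    };

    static int attach_mptcp_mapping(struct sk_buff *skb, u64 data_seq)
    {
        struct mptcp_ext *ext = skb_ext_add(skb, SKB_EXT_MPTCP);

        if (!ext)            /* extension allocation can fail */
            return -ENOMEM;
        ext->data_seq = data_seq;
        return 0;
    }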

Once the initial merge has happened, work on adding more features can begin. Moving beyond the server use case is high on the list at that point. There is also a need to support a user-space path manager that can add and remove paths while applying whatever policy the system administrator might configure. Work on the user-space side can be found on GitHub.

Another area for future development is a packet scheduler that can decide which path should be used for each packet. It could be configured to optimize for throughput, latency, or redundancy. This is relatively simple to do on the server side, where it is mostly a matter of acting on requests from the peer on each MPTCP connection. The kernel will feature a "basic" scheduler; the addition of a BPF hook for more complex cases seems nearly inevitable.

As mentioned above, MPTCP will be entirely opt-in; it will not be used unless an application explicitly requests it. But, naturally, there are users who want their unmodified binary programs to start using MPTCP once it's available. There is a working, if inelegant, solution to this problem. A new control-group hook allows the installation of a BPF program that runs when a program calls socket(); it can change the requested protocol to IPPROTO_MPTCP and the calling application will be none the wiser.
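What such a BPF program might look like is sketched below. This is hypothetical: it assumes the new hook makes the protocol field of the socket context writable, which is the crux of the approach described in the talk, and the IPPROTO_MPTCP value is again a placeholder:

    /* Hypothetical sketch of the cgroup BPF approach described above.
     * It assumes the new hook lets a cgroup/sock program rewrite the
     * protocol chosen at socket() time. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define SOCK_STREAM   1
    #define IPPROTO_TCP   6
    #define IPPROTO_MPTCP 262   /* assumed value */

    SEC("cgroup/sock_create")
    int mptcpify(struct bpf_sock *ctx)
    {
        /* Rewrite only plain stream sockets asking for TCP. */
        if (ctx->type == SOCK_STREAM && ctx->protocol == IPPROTO_TCP)
            ctx->protocol = IPPROTO_MPTCP;
        return 1;   /* 1 = allow the socket to be created */
    }

    char _license[] SEC("license") = "GPL";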

The "break before make" feature would allow the establishment of an MPTCP connection that initially has no subflows at all. It may be useful for cases like switching between multiple access points. This feature will be added if the demand for it materializes. There will eventually be a need to be able to set socket options on subflows; this will have to be handled carefully since a number of them could interfere with ordinary TCP.

Finally, Martineau mentioned the problem of in-kernel TLS support. Since an MPTCP socket is not a plain TCP socket, the kernel's existing TLS upper-level protocol (ULP) support cannot attach to it. With enough work, support could be added, and it would still be possible to split TLS data across flows. That becomes hard, though, when hardware acceleration is involved; it's not clear what the solution there will be.
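For reference, this is how in-kernel TLS attaches to a plain TCP socket today; it is this ULP step that currently has no MPTCP equivalent:

    /* How kernel TLS is enabled on a plain TCP socket: the TLS "upper
     * level protocol" is selected with setsockopt(), after which the
     * keys negotiated in user space are handed to the kernel. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    #ifndef TCP_ULP
    #define TCP_ULP 31
    #endif

    static int enable_ktls(int fd)
    {
        /* Must be called on an established TCP connection, after the
         * TLS handshake has completed in user space. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")))
            return -1;

        /* Key material (struct tls12_crypto_info_* from <linux/tls.h>)
         * would then be installed with setsockopt(fd, SOL_TLS, TLS_TX, ...). */
        return 0;
    }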

[Your editor thanks the Linux Foundation, LWN's travel sponsor, for supporting travel to this event.]

Index entries for this article
Kernel: Networking/Protocols
Conference: Linux Plumbers Conference/2019



Upstreaming multipath TCP

Posted Sep 27, 2019 7:22 UTC (Fri) by kevincox (guest, #93938)

I wonder if multipath TCP can improve using TCP with an anycast address. Right now the problem is that if different packets of a TCP stream hit different servers sharing an IP, the connection breaks. This is currently solved with fancy routing to create session stickiness.

I imagine with multipath TCP it would be possible to receive the connection on the anycast address then quickly fail over to a flow on an IP address that is specific to the host the packet landed on.

It probably doesn't completely remove the need for sticky routing; however, it should reduce the (already rare) case where the session is rerouted outside of your sticky routing domain (for example, landing on a different PoP).

Upstreaming multipath TCP

Posted Sep 27, 2019 17:42 UTC (Fri) by raven667 (subscriber, #5198)

I dunno if this would make sense to implement in the OS kernel, but you could build this exact behavior into an application by having an anycast service that just returns the IP (and port?) you want the client to load balance to, followed by an MPTCP connection to that service. One could probably make that into a client library and server daemon so that it would be easy to integrate into any software that wanted to behave this way. The OS provides all the primitives necessary to make this work, and there probably isn't any benefit in abstracting it away into the kernel as opposed to having user space control all the knobs and build on top.

Upstreaming multipath TCP

Posted Sep 27, 2019 17:47 UTC (Fri) by kevincox (guest, #93938)

You can definitely do it this way. DNS is commonly used as that anycast service. However, there are a number of reasons this isn't quite ideal, including staleness and added latency.

Doing it via MPTCP means that you don't have any added latency; the tradeoff is that there is no guarantee the "connection" won't get migrated to a different target before the "handover" to the server IP.

Upstreaming multipath TCP

Posted Sep 27, 2019 20:07 UTC (Fri) by obonaventure (guest, #108715)

You are right, Multipath TCP can be tuned to better support load-balancers and anycast. The trick for anycast is very simple. The client sends a SYN to the anycast address. It reaches one of the servers that replies and returns its regular IP address using the ADD_ADDR option supported by Multipath TCP. The client can either create a new subflow towards the server's real IP address or wait until routing changes break the initial subflow.

This technique was proposed and evaluated in Making Multipath TCP friendlier to Load Balancers and Anycast, see https://inl.info.ucl.ac.be/publications/making-multipath-...
It also works for load balancers and has been included in RFC 6824bis.

Upstreaming multipath TCP

Posted Sep 29, 2019 13:33 UTC (Sun) by Herve5 (subscriber, #115399)

Now I dream of the day someone will add ping tunnels as an extra alternate path, turning us connected *forever* even if s o m e t i m e s _ v e r y _ s l o w . . .

Upstreaming multipath TCP

Posted Oct 3, 2019 18:25 UTC (Thu) by dps (guest, #5725)

IMHO the simplest method of making regular applications use multipath TCP without recompiling them might be an LD_PRELOAD object which automagically replaces regular TCP with the multipath version. An object like this could be smart enough not to use multipath TCP when it is inappropriate, for example for connections to the local host. This would not require any kernel support and would work for almost all applications.

Some very high performance network products feature LD_PRELOAD objects which make regular applications exploit the hardware, often in user space. Applications include stock trading, where the fact that c is finite matters. This hardware is seriously expensive because high-stakes bond trading and supercomputers can both justify expensive hardware.
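Such an interposer could look roughly like the following sketch; the IPPROTO_MPTCP constant is the one proposed in the article, and its value here is an assumption:

    /* Minimal LD_PRELOAD sketch: interpose socket() and swap TCP for
     * MPTCP. Real code would also want an exclusion list (e.g. for
     * loopback-only services) and a fallback if MPTCP is unavailable.
     * Build: gcc -shared -fPIC -o mptcpify.so mptcpify.c -ldl */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef IPPROTO_MPTCP
    #define IPPROTO_MPTCP 262   /* assumed value */
    #endif

    int socket(int domain, int type, int protocol)
    {
        static int (*real_socket)(int, int, int);

        if (!real_socket)
            real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");

        /* Upgrade plain TCP stream sockets to MPTCP; mask off
         * SOCK_NONBLOCK/SOCK_CLOEXEC flags when checking the type. */
        if ((domain == AF_INET || domain == AF_INET6) &&
            (type & 0xf) == SOCK_STREAM &&
            (protocol == 0 || protocol == IPPROTO_TCP))
            protocol = IPPROTO_MPTCP;

        return real_socket(domain, type, protocol);
    }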

Upstreaming multipath TCP

Posted Oct 3, 2019 20:15 UTC (Thu) by rand0m$tring (guest, #125230)

tcp is dead. non-encrypted connections are dead (perhaps besides intra-datacenter).

i feel the best course of action must be to fold all these wonderful efforts into QUIC. no?

Upstreaming multipath TCP

Posted Oct 3, 2019 23:03 UTC (Thu) by flussence (guest, #85566)

TCP seems pretty alive to me, as I was able to read your post sent via it.

Upstreaming multipath TCP

Posted Oct 4, 2019 13:07 UTC (Fri) by foom (subscriber, #14868)

And you can still write new software in COBOL, too. Doesn't mean it's not dead.

Upstreaming multipath TCP

Posted Oct 5, 2019 3:17 UTC (Sat) by flussence (guest, #85566)

Software that works tends to outlive CADT fads.

Upstreaming multipath TCP

Posted Oct 7, 2019 10:22 UTC (Mon) by kevincox (guest, #93938)

TCP is very much alive.

QUIC is a very cool protocol and effectively supports multipath out of the box, since IP and port aren't used to identify connections. In fact, roaming works a lot better than with MPTCP: you don't need to announce your new IP and port before breaking the old connection, which means you can migrate between two WiFi access points (or similar) even when there is never an overlap where you are connected to both networks.

However, TCP is far from dead, and this work is still very useful until that changes (if it ever does).


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds