
Time-based packet transmission

By Jonathan Corbet
March 8, 2018
Normally, when an application sends data over the network, it wants that data to be transmitted as quickly as possible; the kernel's network stack tries to oblige. But there are applications that need their packets to be transmitted within specific time windows. This behavior can be approximated in user space now, but a better solution is in the works in the form of the time-based packet transmission patch set.

There are a number of situations where outgoing data should not necessarily be transmitted immediately. One example would be any sort of isochronous data stream — an audio or video stream, maybe — where each packet of data is relevant at a specific point in time. For such streams, transmitting ahead of time and buffering at the receiving side generally works well enough. But realtime control applications can be less flexible. Commands for factory-floor or automotive systems, for example, should be transmitted within a narrow period of time. Realtime applications can wait until the window opens before queuing data for transmission, of course, but any sort of latency that creeps in (due to high network activity, for example) may then cause the data to be transmitted too late.

Naturally, the network-standards community has been working on solutions for this particular problem; one of them is called P802.1Qbv. Should that name prove to be a mouthful, there is the more concise alternative: "Standard for Local and Metropolitan Area Networks-Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks Amendment: Enhancements for Scheduled Traffic". It defines a mechanism for the draining of queues of packets such that each packet is transmitted by its specific deadline. When P802.1Qbv is in use, applications can queue packets whenever they are ready, but those packets will not actually hit the wire until their deadline approaches.

The patch set implementing time-based transmission on Linux has a few separate components to it. The first is an API addition to allow applications to request this behavior. That is done by setting the new SO_TXTIME option with the setsockopt() system call. Packets intended for timed transmission should be sent with sendmsg(), with a control-message header (of type SCM_TXTIME) indicating the transmission deadline as a 64-bit nanoseconds value.
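
In application terms, enabling the option might look something like the sketch below. It assumes that SO_TXTIME is a SOL_SOCKET-level option taking a simple integer enable and that the constant comes from headers carrying the patch set; the final form of the option could well differ.

    /*
     * Hedged sketch: turn on time-based transmission for a socket.
     * SO_TXTIME is assumed to be provided by patched kernel headers
     * and to take a plain integer enable.
     */
    #include <sys/socket.h>
    #include <stdio.h>

    int enable_txtime(int sock)
    {
        int on = 1;

        if (setsockopt(sock, SOL_SOCKET, SO_TXTIME, &on, sizeof(on)) < 0) {
            perror("setsockopt(SO_TXTIME)");
            return -1;
        }
        return 0;
    }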

There are a couple of other control-message parameters that can be set with sendmsg(). SCM_DROP_IF_LATE instructs the network stack to simply drop a packet if, for some reason, it cannot be transmitted by the given deadline. The SCM_CLOCKID message can be used to specify which clock should be used for packet timing; the default is CLOCK_MONOTONIC. This parameter does not appear to actually be used in the current implementation, though, with one small exception described below.
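
Putting those pieces together, a send path using these control messages might look roughly like the following sketch. The control-message level and the payload layouts (a 64-bit nanosecond deadline, a clockid_t, and a one-byte drop flag) are assumptions based on the descriptions above, not a transcription of the patches themselves.

    /*
     * Hedged sketch: queue a packet for transmission at txtime_ns.
     * The SCM_* constants are assumed to come from patched headers;
     * payload layouts are guesses based on the patch descriptions.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <string.h>
    #include <stdint.h>
    #include <time.h>

    static ssize_t send_at(int sock, const void *data, size_t len,
                           uint64_t txtime_ns, clockid_t clkid,
                           uint8_t drop_if_late)
    {
        char cbuf[CMSG_SPACE(sizeof(uint64_t)) +
                  CMSG_SPACE(sizeof(clockid_t)) +
                  CMSG_SPACE(sizeof(uint8_t))] = { 0 };
        struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
        struct msghdr msg = {
            .msg_iov = &iov,
            .msg_iovlen = 1,
            .msg_control = cbuf,
            .msg_controllen = sizeof(cbuf),
        };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

        /* Transmission deadline, in nanoseconds on the chosen clock. */
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_TXTIME;
        cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
        memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));

        /* Which clock the deadline refers to. */
        cm = CMSG_NXTHDR(&msg, cm);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_CLOCKID;
        cm->cmsg_len = CMSG_LEN(sizeof(clkid));
        memcpy(CMSG_DATA(cm), &clkid, sizeof(clkid));

        /* Ask the stack to drop the packet rather than send it late. */
        cm = CMSG_NXTHDR(&msg, cm);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_DROP_IF_LATE;
        cm->cmsg_len = CMSG_LEN(sizeof(drop_if_late));
        memcpy(CMSG_DATA(cm), &drop_if_late, sizeof(drop_if_late));

        return sendmsg(sock, &msg, 0);
    }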

These changes to the core network stack allow the specification of time-based behavior, but the core itself does not implement that behavior. That, instead, is an add-on feature. One way to get it is with the tbs queuing discipline, which is also part of the patch set. It can be configured to use time-based scheduling on a specific queue, with a couple of additional parameters. Here, too, the clock ID can be specified; if the clock ID also appears in individual packets, the two must match or the packets will be dropped. There is also a delta parameter to configure how far in advance of the deadline each packet should be sent to the network interface for transmission. This parameter and the deadline for each packet thus define the window in which the packet should hit the wire.

The delta and the SCM_DROP_IF_LATE flag can be combined to obtain two distinct behaviors. If the flag is set and delta is reasonably large, the semantics are that the packet must be transmitted before the given deadline. Conversely, with a small (or zero) delta and SCM_DROP_IF_LATE not set, the semantics are that the packet should not be transmitted before the given deadline.
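
That interplay between delta and SCM_DROP_IF_LATE can be made concrete with a toy model of the dequeue policy just described. This is only an illustration of the behavior as described above, not the actual tbs code; all times are nanoseconds on the qdisc's configured clock.

    #include <stdbool.h>
    #include <stdint.h>

    enum txtime_action { TX_HOLD, TX_SEND, TX_DROP };

    static enum txtime_action txtime_decide(uint64_t now, uint64_t txtime,
                                            uint64_t delta, bool drop_if_late)
    {
        /* The window [txtime - delta, txtime] has not opened yet. */
        if (txtime > delta && now < txtime - delta)
            return TX_HOLD;

        /* The deadline has passed: drop, or (without SCM_DROP_IF_LATE)
         * send anyway -- the "not before the deadline" mode. */
        if (now >= txtime)
            return drop_if_late ? TX_DROP : TX_SEND;

        /* Inside the window: hand the packet to the interface, leaving
         * up to delta nanoseconds for it to hit the wire by txtime. */
        return TX_SEND;
    }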

The tbs queuing discipline, by itself, is a "best-effort" implementation, since there is still the possibility that packets could be delayed after tbs releases them to the interface. The real intent behind P802.1Qbv, however, appears to be implementation in the network adapters themselves. If the adapter is aware of packet deadlines, it can schedule its own transmission activities to ensure that the packets hit the wire at the right time.

The tbs queuing discipline thus supports offloading time-based transmission to the hardware; the patch set includes an implementation for the Intel igb Ethernet driver. In a full offload scenario, the delta and clock-ID parameters are not used; instead, all deadlines are assumed to be relative to the clock running within the adapter itself, so the adapter takes full responsibility for packet timing. If those parameters are specified, instead, tbs will sort the packets and send them to the interface at the beginning of the transmission window, with the interface still taking responsibility for getting them out before the deadline passes. Since this mode uses both a kernel-based clock and the adapter's own clock, the two must be running in sync or the results will not be as desired.

The patch set is now in its third revision; the initial version was posted by Richard Cochran, but it is now being posted by Jesus Sanchez-Palencia, who has made a number of changes and added the hardware offload capability. There is still some disagreement over how the API should work and, in particular, whether the ability to specify different clocks is really needed. Storing a clock ID with each packet makes the network stack's sk_buff structure larger, which is something that the networking developers have been resisting strongly for some time now. Working that out is likely to take at least one more revision, so it is not clear whether this patch set will be ready in time for the 4.17 merge window.

Index entries for this article
Kernel/Networking
Kernel/Realtime



Why CLOCK_MONOTONIC?

Posted Mar 9, 2018 17:23 UTC (Fri) by glenn (subscriber, #102223) [Link] (7 responses)

What are the motivations for using CLOCK_MONOTONIC as the default? From the outside, it seems like a poor choice. Consider: Received packets are timestamped with CLOCK_REALTIME (see SO_TIMESTAMP), but sent packets are assigned CLOCK_MONOTONIC deadlines. Seems like a recipe for confusion. (As far as I know, there is no socket option that lets one request CLOCK_MONOTONIC receive-time timestamps.) Moreover, since CLOCK_REALTIME can be easily synchronized to a network clock with PTP, a default of CLOCK_REALTIME feels more natural/useful.

Why CLOCK_MONOTONIC?

Posted Mar 9, 2018 17:30 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (4 responses)

Doesn't CLOCK_REALTIME jump around in daylight savings time changes?

Why CLOCK_MONOTONIC?

Posted Mar 9, 2018 20:13 UTC (Fri) by wahern (subscriber, #37304) [Link] (3 responses)

A Unix timestamp (i.e. CLOCK_REALTIME) is always GMT0, so it's not affected by daylight savings. But a "POSIX second" is not the same thing as an SI second, as POSIX says there are _exactly_ 86400 seconds per calendar day. That cheat makes calendar arithmetic incredibly simple. But when there's a leap second in UTC, one or more "POSIX seconds" need to be stretched. Thus, if there's a leap second within (or near) the interval between two Unix timestamps, the difference doesn't reflect the number of elapsed SI seconds. Similarly, depending on the skewing algorithm, a single Unix timestamp could represent two SI seconds.

For many engineering use cases what people normally would want is CLOCK_TAI. But, AFAIU, CLOCK_TAI can go backwards if the sysadmin or faulty hardware demands it, so often CLOCK_MONOTONIC is the safest choice to avoid weird arithmetic errors (as opposed to errors from poor accuracy or precision).

There's a movement to remove leap seconds from UTC so that UTC becomes a fixed offset from TAI. IMO that's short-sighted. It doesn't really improve things much as a practical matter (see TAI vs monotonic, above). Nor even as a theoretical matter (see special relativity). Anyhow, if we wanted to ignore the inherent complexity of time synchronization we may as well jump straight to BCT (https://en.wikipedia.org/wiki/Barycentric_Coordinate_Time)

Why CLOCK_MONOTONIC?

Posted Mar 9, 2018 20:50 UTC (Fri) by k8to (guest, #15413) [Link] (2 responses)

Note of course that faulty hardware *could* cause CLOCK_MONOTONIC to misbehave, but you'd need a whole new level of faulty hardware for this, probably the kind where all software on the system will be randomly crashing.

Why CLOCK_MONOTONIC?

Posted Mar 9, 2018 20:58 UTC (Fri) by vadim (subscriber, #35271) [Link] (1 responses)

Long ago I had a computer where time randomly jumped backwards by a second or two then resumed ticking forwards. Caused a lot of very confusing problems and baffled the hell out of me.

I'm wondering if you might know what it could have been. It was a dual CPU Athlon MP.

Why CLOCK_MONOTONIC?

Posted Mar 9, 2018 22:58 UTC (Fri) by zlynx (guest, #2285) [Link]

I seem to remember that some of these non-Intel SMP systems had problems with TSC synchronization and switching CPUs. Linux did patch it eventually. I think?

Anyway, if a program was using TSC, an AMD system might run TSC at different rates on different CPUs since TSC was actually pegged to the CPU's clock rate. Whereas on Intel TSC was a virtual clock. No matter the clock rate, TSC ran at the same speed.

I could only find this: https://github.com/Psychtoolbox-3/Psychtoolbox-3/wiki/FAQ...

Why CLOCK_MONOTONIC?

Posted Mar 10, 2018 8:17 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

Received packets get stamped with REALTIME because that makes sense for time keeping protocols like NTP.
Transmitted packets get tracked with MONOTONIC because that makes sense with a video stream. You don't want your stream to stop when NTP adjusts your clock.

Why CLOCK_MONOTONIC?

Posted Mar 10, 2018 20:03 UTC (Sat) by glenn (subscriber, #102223) [Link]

I’ve come around to using CLOCK_REALTIME because so much infrastructure has been built around it; specifically, PTP on Linux. However, I would like to have another clock besides CLOCK_REALTIME and CLOCK_MONOTONIC that could be synced against PTP, read cheaply from userspace, and timestamp received packets.

There are use cases beyond measuring data-center network latency and multimedia. I work on a real-time robotics platform where some network-connected sensors are synchronized with PTP, and timestamp sensor measurements with their PTP-synced clock. A Linux box in this system has to synchronize its CLOCK_REALTIME against the PTP clock in order to reason about these timestamps against its own local timestamps (reading a NIC's PTP-synced clock is far too expensive). Although the PTP clock need not be synced against a time close to UTC (I only want to be able to reason about the relative age of sensor readings), I must use a UTC-based time if I want to be able to make use of certificates and have reasonable filesystem timestamps. Moreover, having CLOCK_REALTIME synced against PTP provides a mechanism to timestamp received packets with a PTP-synced clock, which helps integrate network-connected sensors that do not support PTP. This all works, but I don't like the reliance on CLOCK_REALTIME. Anyone with superuser access can come along and manually slam the clock, and that action would cause the whole house of cards to collapse. This is not a vulnerability that I like to have in a robotic system.

Time-based packet transmission

Posted Mar 9, 2018 19:25 UTC (Fri) by vcgomes (subscriber, #51281) [Link]

It seems that there was a mistake in the report by Jonathan (great article, by the way): the default clock is CLOCK_REALTIME (which is defined as zero). Perhaps the code is a bit at fault here, in that it initializes CLOCK_MONOTONIC first in the array of "get time functions". Will fix for the next version.

Security evaluation - covert data exfiltration?

Posted Mar 10, 2018 15:11 UTC (Sat) by iam.TJ (guest, #56644) [Link] (1 responses)

I wonder if this has been evaluated from a security viewpoint?

Being able to control the timing of packets would provide a covert exfiltration channel that could leak information by controlling the gap-length.

I'd guess it'd be pretty easy to implement morse-code over this channel :)

Security evaluation - covert data exfiltration?

Posted Mar 12, 2018 18:14 UTC (Mon) by raven667 (subscriber, #5198) [Link]

There are probably a near-infinite number of clever ways to exfiltrate data on a sufficiently advanced computer, if it is to do anything useful and talk to other computers at all. Trying to prevent them all ahead of time is probably a fool's errand.

Time-based packet transmission

Posted Mar 22, 2018 14:43 UTC (Thu) by bns (guest, #97378) [Link]

Interesting work out of CERN as well:

https://en.wikipedia.org/wiki/The_White_Rabbit_Project

Time-based packet transmission

Posted Aug 28, 2018 12:01 UTC (Tue) by nyrahul (guest, #119310) [Link]

Is it possible to get some sort of feedback in user space when a packet is dropped in the kernel because it was too late to send? Apps could make use of this feedback to slow down (in cases where that is possible).


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds