The congestion-notification conflict
Network congestion is a fact of life; when it occurs, the only useful response is to get senders of traffic to slow down. Many governments place traffic signals on the on-ramps to major highways in congestion-prone areas in an attempt to limit traffic entering and to keep things flowing. Network traffic can benefit from similar controls, but the placement of traffic signals at every entry point to the net is impractical. So network protocols must rely on other types of signals to learn when they should reduce their transmission rate.
Protocols like TCP, unfortunately, were not designed with such signals, so congestion-control algorithms have been built to use the one signal that is always reliably delivered: dropped packets. But dropping packets on the floor can make things worse (it forces the data to be transmitted again), it introduces delays and, by the time it happens, congestion is already occurring. It would be better to inform senders of congestion in a less heavy-handed manner, before that congestion becomes a problem.
ECN and its discontents
The explicit congestion notification (ECN) protocol, standardized in 2001, was an attempt to improve the situation by informing senders of congestion without dropping packets. ECN repurposed two bits in the IP header; for reasons that will become clear below, it is worth looking at how those bits are interpreted:
Bits | Meaning |
---|---|
00 | Transport is not ECN-capable |
01 | ECN-capable, ECT(1) |
10 | ECN-capable, ECT(0) |
11 | Congestion experienced |
The ECT(0) and ECT(1) values have the same meaning; a sender sets one of them to indicate that the endpoints of the connection are ECN-capable. In practice all implementations use ECT(0); attempts to give a separate meaning to ECT(1) have not gained traction. ECN famously broke the Internet because many routers would drop packets with either of those bits set; that delayed its adoption for years.
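For readers who want to see those two bits concretely, here is a minimal Python sketch of decoding them. It relies on the fact that the ECN field occupies the two low-order bits of the IPv4 TOS byte (or IPv6 traffic-class byte); the names `Ecn`, `ecn_codepoint()`, and `mark_congestion()` are invented for this illustration and do not come from any real networking library.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """The four ECN codepoints (names are illustrative)."""
    NOT_ECT = 0b00  # transport is not ECN-capable
    ECT_1 = 0b01    # ECN-capable, ECT(1)
    ECT_0 = 0b10    # ECN-capable, ECT(0)
    CE = 0b11       # congestion experienced


def ecn_codepoint(tos: int) -> Ecn:
    """Return the ECN codepoint held in the low two bits of a TOS byte."""
    return Ecn(tos & 0b11)


def mark_congestion(tos: int) -> int:
    """Router-side sketch: set CE on an ECN-capable packet.

    A packet from a transport that is not ECN-capable is returned
    unchanged; a real router would have to drop it to signal congestion.
    """
    if ecn_codepoint(tos) is Ecn.NOT_ECT:
        return tos
    return tos | 0b11
```

Note that marking turns either ECT value into the same "congestion experienced" pattern, so a receiver seeing CE cannot tell which ECT value the sender originally chose.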
ECN has improved the situation, but not enough; it suffers from a couple of significant problems. One is that a "congestion experienced" signal still arrives too late; congestion is already happening and the router is pleading for help. It is also still a heavy hammer; the RFC requires that a congestion-experienced signal be treated as if a packet had been dropped, so congestion-control algorithms respond by severely reducing their transmission rates, then working back up. That can reduce the throughput of a connection (and increase its latency) more than is needed to resolve the problem.
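To get a feel for how heavy that hammer is, consider a simplified Reno-style sender that halves its congestion window on any congestion signal and then grows it by one segment per round trip. This is a toy model under those stated assumptions, not the behavior of any particular kernel; the function names are hypothetical.

```python
def next_cwnd(cwnd: int, congested: bool) -> int:
    """Window (in segments) for the next round trip under a Reno-style rule."""
    if congested:
        return max(cwnd // 2, 1)  # loss or CE mark: multiplicative decrease
    return cwnd + 1               # otherwise: additive increase


def rtts_to_recover(cwnd: int) -> int:
    """Round trips needed to regain `cwnd` after a single congestion signal."""
    target, rtts = cwnd, 0
    cwnd = next_cwnd(cwnd, congested=True)
    while cwnd < target:
        cwnd = next_cwnd(cwnd, congested=False)
        rtts += 1
    return rtts
```

In this model a connection running with a 100-segment window needs 50 round trips to get back to where it was after one mark, even if the congestion that caused the mark was brief.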
As networks get faster, the demands for lower latencies grow, and as bufferbloat-reduction efforts reduce the amount of queue space available on routers, congestion control needs to become a bit more nuanced. There is widespread agreement on that point. How that nuance should happen is a matter of rather less agreement.
L4S
One attempt at improving congestion control has been developed, mostly slowly and mostly in private, by various industry players; it is called Low Latency, Low Loss, Scalable Throughput, or L4S. The core idea seems to be to replace ECN with a more flexible signal built into a higher-level protocol; data center TCP (DCTCP) is one example. DCTCP acknowledgment packets can include information on how much queuing space is available; senders can use the information to keep the queue full without overflowing it. Linux has supported DCTCP since the 3.18 release.
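As a rough illustration of what a more flexible signal can look like, the sketch below follows the sender-side rule from the DCTCP specification (RFC 8257): the sender keeps a running estimate, alpha, of the fraction of acknowledged data that carried a congestion mark, and shrinks its window in proportion to that estimate rather than always halving it. The class and method names are invented here, window growth is left out entirely, and counting in packets rather than bytes is a simplification.

```python
class DctcpSenderSketch:
    """Toy model of DCTCP's proportional window reduction."""

    def __init__(self, cwnd: float, alpha: float = 1.0, g: float = 1 / 16):
        self.cwnd = cwnd    # congestion window, in segments
        self.alpha = alpha  # estimated fraction of marked data
        self.g = g          # weight given to the newest observation

    def on_window_acked(self, acked: int, marked: int) -> None:
        """Update state once per window of acknowledged packets."""
        fraction = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * fraction
        if marked:
            # Scale the reduction by the extent of the marking.
            self.cwnd = max(self.cwnd * (1 - self.alpha / 2), 1.0)
```

When every packet is marked for a sustained period, alpha approaches 1 and the reduction approaches the classic halving; when only a few packets are marked, the sender backs off only slightly.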
The problem with something like DCTCP is that it must work with the active queue-management algorithms running on all of the routers between the two endpoints. Those algorithms see all traffic passing through the router, not just the DCTCP traffic. The proponents of L4S seem to want a sort of privileged treatment for suitably clueful protocols so that they can get low-latency treatment through the router without having to contend with what the L4S draft terms "classic TCP".
To bring that about, L4S redefines the ECT(1) value described above to indicate "this packet is using better congestion notification". Routers would then create two separate queues; a fast one for the L4S traffic, and a slower one for the "classic" traffic. That differentiation can, on its own, raise some eyebrows, but the queue-management algorithm needs to be evaluated as a whole to see what its broader effects, including on fairness, would really be.
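The classification step described above is simple enough to sketch. The following Python fragment sorts packets into two queues based only on whether the ECN field holds ECT(1); it says nothing about how the two queues are scheduled or marked, which is where the real complexity (and the fairness question) lives. `DualQueueSketch` is a hypothetical name, not code from the DUALPI posting.

```python
from collections import deque

ECN_MASK = 0b11
ECT_1 = 0b01  # the codepoint L4S would claim as its identifier


class DualQueueSketch:
    """Two FIFO queues, selected by the ECN field of each packet."""

    def __init__(self) -> None:
        self.fast = deque()     # traffic claiming the newer congestion response
        self.classic = deque()  # everything else

    def enqueue(self, tos: int, payload: bytes) -> str:
        """Queue a packet and report which queue it landed in."""
        if (tos & ECN_MASK) == ECT_1:
            self.fast.append(payload)
            return "fast"
        self.classic.append(payload)
        return "classic"
```

Note that the router in this sketch simply takes the sender's word for it: any packet arriving with ECT(1) set lands in the fast queue.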
DCTCP is not seen as being entirely safe for use outside of protected environments like data centers. For wider deployment, the intent has been to create a new TCP congestion-control algorithm called "TCP Prague". The L4S portion would then be implemented with a queue-management algorithm called "DUALPI", so named because it maintains the two independent queues described above. Both of these modules have been vaporware until recently: a repository with TCP Prague showed up on March 12 (no attempt has yet been made to submit it for the mainline), and DUALPI was posted to the netdev list on March 11.
SCE
The alternative, pushed by longtime bufferbloat fighters Jonathan Morton and Dave Täht, along with UDP creator David Reed, is called some congestion experienced, or SCE. It is a rather simpler proposal, intended to provide a "congestion is imminent" signal that, once again, is less heavy-handed; it places a higher priority on compatibility with existing TCP congestion-control implementations, though.
SCE also makes use of the ECT(1) value to encode the "some congestion" signal. The full congestion-experienced value would retain its current meaning, with protocols expected to treat it as being equivalent to a dropped packet. The SCE signal, instead, should be interpreted this way:
The code implementing SCE is also quite new; it showed up in the fq_codel_fast repository on March 14. It's worth noting that this proposal does not give intermediate routers a way of knowing whether either endpoint is capable of responding to SCE signals or not. There is, perhaps, an implicit assumption that, once SCE is supported by Linux and FreeBSD, it will quickly become omnipresent.
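The difference between the two signals, as seen by a sender, might be sketched as follows. The 0.5 factor for a full congestion-experienced mark reflects the "treat it like a dropped packet" rule; the 0.95 factor for SCE is purely a placeholder chosen for this illustration and is not taken from the SCE draft. The function and constant names are likewise hypothetical.

```python
CE_BACKOFF = 0.5    # classic response: the same as for a dropped packet
SCE_BACKOFF = 0.95  # placeholder gentle response; NOT from the SCE draft


def respond(cwnd: float, signal: str, sce_aware: bool = True) -> float:
    """Return the new congestion window after seeing `signal`.

    `signal` is "CE", "SCE", or "none". A sender that predates SCE
    gives ECT(1) no special meaning, so it ignores the SCE signal.
    """
    if signal == "CE":
        return max(cwnd * CE_BACKOFF, 1.0)
    if signal == "SCE" and sce_aware:
        return max(cwnd * SCE_BACKOFF, 1.0)
    return cwnd
```

The `sce_aware=False` case shows the compatibility argument in miniature: an older sender carries on as before and only reacts once the router escalates to a full congestion-experienced mark or a drop.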
Which is better?
These two proposals are clearly incompatible with each other; each places its own interpretation on the ECT(1) value and would be confused by the other. The SCE side argues that its use of that value is fully compatible with existing deployments, while the L4S proposal turns it over to private use by suitably anointed protocols that are not compatible with existing congestion-control algorithms. L4S proponents argue that the dual-queue architecture is necessary to achieve their latency objectives; SCE seems more focused on fixing the endpoints.
It looks like a fairly typical battle between a protocol pushed by the largest Internet service providers, and one with a rather more grass-roots origin. There is, however, another important thing to know about L4S: Alcatel-Lucent claims a patent on the dual-queue algorithm. The company has generously offered to make that patent available under "fair, reasonable, and non-discriminatory" terms; such terms are, of course, highly discriminatory against free software implementations. They make it impossible to merge the affected code into a GPL-licensed kernel.
As is the case with many patents, the quality of this one is not universally recognized. Bob Briscoe, one of the developers of L4S, claims loudly that there is prior art for the claims in the Alcatel-Lucent patent and that it should never have been issued. The patent unfortunately exists, though; as long as Alcatel-Lucent continues to claim it, the code cannot become part of the Linux kernel. If L4S becomes the IETF-anointed standard, and if industry adoption follows, Linux could find itself out in the cold.
The disagreement over these protocols reflects a difference in approach between developers and their associated industries. It is subject to all of the usual technical and political maneuverings; the process could be unpleasant to watch as it plays out. One could argue that the Linux community could happily let it play out and simply merge the winner; one might also argue that SCE better matches the values that have shaped our network stack in general. The assertion of that patent, though, raises the stakes considerably; it would not be good for Linux to find itself unable to play with other high-performance network stacks. As long as the patent remains, the technical choice is easy.
Index entries for this article | |
---|---|
Kernel | Networking/Congestion control |
Posted Mar 22, 2019 19:00 UTC (Fri)
by mm7323 (subscriber, #87386)
Unless ISPs are going to filter at ingress points (and so break the end-to-end congestion management anyway), surely everyone has to agree and implement the same rules or it won't work - regardless of patents or politics.
Posted Mar 22, 2019 21:30 UTC (Fri)
by roc (subscriber, #30627)
The same thing that stops you from disabling TCP congestion control or using a congestion-oblivious UDP transport. I.e. nothing, until you become a big enough problem that ISPs start blocking your traffic.
Posted Mar 23, 2019 9:52 UTC (Sat)
by farnz (subscriber, #17727)
Which is a strike against L4S, in my book; if I can cheat by just setting a bit in my headers, what is to stop me from doing so while my usage is small? And following on from that, when I'm a big enough problem, if I'm also a legitimate service (say Netflix or YouTube), I'm also effectively immune to blocking - block me, and I'll be able to allege that it's because I'm a big competitor to your services, not because I'm cheating.
Similar issues apply to global deployment of DiffServ - you can't stop users setting their traffic to an EF codepoint or even an AF codepoint, and absent a global agreement on shaping that should apply to such codepoints, you end up having to either block EF/AF traffic, or treat it as no better than DF traffic.
This is where SCE has an advantage - the endpoint setting the markers can request worse treatment, but not better.
Posted Mar 23, 2019 12:35 UTC (Sat)
by corbet (editor, #1)
Posted Mar 28, 2019 6:44 UTC (Thu)
by marcH (subscriber, #57642)
There is actually something stopping yourself from entirely disabling any form of congestion control: you'll be hurting not just the others but your own connections/sockets too.
As is often the case, one key thing missing in this discussion is the definition of, and boundary between, "my" (headers/connections/sockets/...) and others'; wherever that boundary is, I doubt "my" describes just one socket.
Posted Mar 23, 2019 6:50 UTC (Sat)
by flussence (guest, #85566)
Posted Mar 24, 2019 19:57 UTC (Sun)
by sourcejedi (guest, #45153)
That is, SCE still wants support in the "transport layer" - such as TCP - to effectively echo back the SCE markings to the sender. Just like TCP with "classic ECN" (RFC 3168), or DCTCP. Otherwise, how would the sender of the packet notice that it needed to slow down?
I don't know why SCE was posted as an IETF draft without referencing "Accurate ECN". "Accurate ECN" aka AccECN, is an experimental TCP draft, which it says is required to implement L4S for TCP. It has been worked on since 2016. "Accurate ECN" was discussed on the bufferbloat.net lists during that time.
I think the AccECN TCP option is generic enough that SCE *could* use it unchanged. Although there is some limitation. The AccECN TCP option is optional and some middleboxes strip it (or block it? something like that anyway). AccECN has a fallback, which I think is only able to echo the ECN CE codepoint. L4S will still be able to work in the fallback mode. SCE congestion signals (ECN ECT(1) codepoint) would not get through in the fallback mode.
"Accurate ECN" also says it can be used to implement the "classic ECN" response to congestion. (I.e. 1 signal per RTT, which the sender treats as if a packet was dropped, i.e. halves its transmission rate. If I understand correctly). I don't think it explains exactly how this is possible, but it is in good faith and I guess it is as simple as it sounds.
SCE does suggest that instead of echoing the SCE signal to the sender, it is also possible to handle it "entirely by the receiver". The submitted draft-00 suggests the receiver could tweak the receive window. OTOH, when this was politely challenged, Dave Taht said "I kind of wish we'd cut that from the draft".
https://lists.bufferbloat.net/pipermail/bloat/2019-March/...
Posted Mar 29, 2019 0:56 UTC (Fri)
by flussence (guest, #85566)
• SCE adds information in the IP header that higher layers in the network stack may use to back off from congestion (like BBR/FQ?)
Posted Mar 29, 2019 21:32 UTC (Fri)
by moeller0 (guest, #131181)
L4S now proposes to use the same codepoint to let flows promise the AQM that they will respond differently to CE marks than TCP-friendly flows. Specifically these flows promise not to interpret a CE mark as a strong signal to scale back their sending rate, but will look at the ratio of unmarked and CE-marked packets to get a pulse-modulated signal of the AQM's load/congestion state.
So in both cases the idea is to send a graded congestion signal so the endpoints can try to respond with more finesse than current TCPs with the binary congestion signal can. The main difference is how backward compatibility is handled. IMHO the SCE approach seems cleaner and more evolutionary than trying to press the ECT(1) codepoint into service as an identifier, as in the L4S approach endpoints never know whether a CE mark comes from an L4S AQM or from a traditional one due to the fact that ECT(1) will only identify packets that have not yet encountered congestion unambiguously...
Posted Mar 29, 2019 22:25 UTC (Fri)
by sourcejedi (guest, #45153)
In both L4S and SCE, routers only operate on the IP header. They do not inspect or alter the IP payload, including TCP headers. The "higher layers" only run on the endpoints. L4S does not violate the normal layering rules in that sense.
Posted Mar 23, 2019 12:31 UTC (Sat)
by mfuzzey (subscriber, #57966)
Is it really realistic that any network protocol not supported by Linux could be adopted at any Internet-significant scale?
Posted Mar 23, 2019 20:53 UTC (Sat)
by jg (guest, #17537)
Posted Mar 23, 2019 20:58 UTC (Sat)
by jg (guest, #17537)
Posted Mar 23, 2019 22:23 UTC (Sat)
by pizza (subscriber, #46)
Posted Mar 24, 2019 12:33 UTC (Sun)
by jch (guest, #51929)
I don't think it's a generation gap, Jim, but a cultural gap that crosses generations.
Posted Mar 25, 2019 17:30 UTC (Mon)
by jg (guest, #17537)
But there is some correlation with age; UNIX was not FOSS, and I see correlation with gray hairs (of which I have more than a few). If my career had wandered off into other directions, I would not have the perspective I have.
And I've spent years working in the IETF, for better or worse.
Posted Mar 24, 2019 15:42 UTC (Sun)
by sourcejedi (guest, #45153)
There is a claim elsewhere in the discussion that an L4S-compliant AQM can be implemented by fq_codel. This would use a different mechanism. It would also allow using L4S, without losing the benefits of fq-codel if you are not using L4S.
https://lists.bufferbloat.net/pipermail/bloat/2019-March/... -- Bob Briscoe
There is some confusion about whether the current L4S has been worded incorrectly, so that it technically prohibits an AQM based on fq_codel, and whether it would actually work fine. If you need more clarity on that *then ask about it*.
https://mailarchive.ietf.org/arch/msg/tsvwg/LBZ24KXwK13DH... -- Greg White
Yes, it would be really nice to *also* rule-in Linux routers to be able to run an L4S AQM in regimes where AQM is still possible in pure software, but flow-queuing is not.
But I don't think that deserves knee-jerking as hard as this article does.
And that's a narrower window of regimes than you might assume. I think the Bufferbloat.NET Official Position (TM) was that the main performance problem for slow home routers is often "shaping", which is not part of fq_codel. So sch_fq_codel can work in many places where sch_codel or sch_pie do. I can't find a reference about upper limits for fq_codel (as opposed to shaping), but I expect the bloat mailing list would be able to find some statements if you need them.
TL;DR for the patent to really be a problem for you, I think you would need to be running something like sch_codel or sch_pie at the moment, and not be able to switch to sch_fq_codel. Is there really anyone doing that?
I Am Not A Lawyer. None Of This Is Legal Advice. But Seriously, Did You Find Any Source Involved In the Linux Side (Dave Taht et al) Who Is As Unhappy About The *Patent* As This Article Reads?
I only skimmed the threads to find a few messages to read, based on author name. At the moment I see the most unhappiness coming from the LWN article. This is mostly re-inforcing my opinion about the amount of salt I need to take with that source :-P. (But I thank them for being good about providing links to the relevant posts.)
Posted Mar 24, 2019 17:07 UTC (Sun)
by sourcejedi (guest, #45153)
https://mailarchive.ietf.org/arch/msg/tsvwg/Okd59BLGs8ZIg... -- Dave Taht
I'm not sure how to parse it. It says the 2016 "Coupled AQM" release was patent-infested and hence unsafe to review, let alone implement in Linux. But that the "DUALPI2 AQM upstream version" (released 2019?) was "possibly worth reviewing by experts".
However the DUALPI2 commit message links directly to draft-ietf-tsvwg-aqm-dualq-coupled . And the description seems to match the discussion about the patent.
I can understand anger about it being really hard to develop and test interoperability, if understanding the patented details endangers your ability to contribute to implementations of GPL'd AQMs. (Aggravated damages due to "wilful infringement?"). Although that's not how I understood things based on previous LWN articles. The patents are out there and you're going to be stuck fighting them anyway; you need to know the details in order to avoid infringing. That can be a very useful strategy, and then when you implement an alternative *that they had not already considered*, they can't then patent that alternative. (Details apply, e.g. see "patent pending").
-- Patent details! Maybe don't read further if you're not allowed to read short quotes about patent details? --
"[the base probability for the classic queue] is squared to compensate the squareroot rate equation of Reno/Cubic" sounds exactly like -
"The only claim that I could not find prior art for (in the original
However, until just now, I had not noticed that Al-Lu has
Posted Mar 25, 2019 17:02 UTC (Mon)
by fw (subscriber, #26023)
Posted May 13, 2020 18:41 UTC (Wed)
by rodgerd (guest, #58896)
Given their track record opposing efforts to hold VMware and other alleged GPL violators to account, I imagine the Linux Foundation will quickly endorse this idea.
Posted Mar 24, 2019 17:28 UTC (Sun)
by ajb (subscriber, #9694)
I think it's premature to decide that SCE "better matches the values" of the Linux networking stack, however. As I understand it, in L4S, the dual-q mechanism is a transition mechanism, and the end state is that everyone - except for old stacks - ends up in the second, low-latency, bucket. Whereupon the behaviour of the system is controlled by the end-hosts - very much how the Linux networking community would have it. Linux - if, and it's a big if, the patent issue is solved - would move to the second bucket very fast, given how quickly Linux development goes.
SCE, however, appears to rely on per-flow queueing (FQ) being implemented at any hop that could end up being the bottleneck. FQ is fine if it's on your home router and thus under your control. But if it's implemented everywhere, then the network, not the end clients, is controlling the behaviour of the system.
The bufferbloat guys are pushing for FQ to be implemented everywhere. So far, it's not happened because the fast paths of ISP-class switches are mostly implemented in ASICs, where queues are a limited resource. If it were to happen, you can be sure that the facility to weight those queues according to the commercial policy of the ISPs would also be implemented. Despite FQ coming from the grass-roots, the difficulty implementing it at the ISP level may well be, paradoxically, something the open community should be grateful for.
Posted Mar 24, 2019 18:05 UTC (Sun)
by mtaht (subscriber, #11087)
Posted Mar 28, 2019 6:51 UTC (Thu)
by marcH (subscriber, #57642)
Wow, speaking of grey hair *you* must have a lot of it; you're clearly from some pre- social media era :-)
Posted Mar 29, 2019 12:31 UTC (Fri)
by mtaht (subscriber, #11087)
Posted Mar 25, 2019 8:16 UTC (Mon)
by mtaht (subscriber, #11087)
https://lists.bufferbloat.net/pipermail/ecn-sane/2019-Mar...
(and see the explosion of traffic on that list for far, far more debate and detail)
It helps to have read the ecn-sane charter:
https://www.bufferbloat.net/projects/ecn-sane/wiki/
and rules of operation: https://www.bufferbloat.net/projects/ecn-sane/wiki/rules/
before actually joining the list.
Anyone that has issues with ecn-sane's rules or charter is welcome to submit a pull request against the https://github.com/tohojo/bufferbloat-net repo.
People are also welcome to attempt to join the DOCSIS consortium, under their rules. Or the ietf, under their rules.
Posted Mar 27, 2019 17:49 UTC (Wed)
by ajb (subscriber, #9694)
Posted Mar 30, 2019 15:41 UTC (Sat)
by mtaht (subscriber, #11087)
Jonathan's talk on SCE at tsvwg is now up at: https://www.youtube.com/watch?v=JQmWyr0JDJM&t=1h3m50s
And the cable industry's talk is prior to that.
Posted Mar 27, 2019 18:11 UTC (Wed)
by ajb (subscriber, #9694)
All the non-dependent claims in the patent (that is, claims 1, 14 and 22) seem to assume use of the 'proportional-integral controller' (an object from control theory which is used in AQMs). But the dualQ L4S draft (draft-briscoe-tsvwg-aqm-dualq-coupled) gives, in Appendix B, an alternative to this called 'Curvy RED'.
(All the claims of a patent are either non-dependent or depend ultimately on one of the non-dependent claims, so when all the non-dependent claims rely on something then it is essential to the patent as a whole.)
In email on the TSVWG list, Bob Briscoe, one of the developers of L4S, clarifies here that the dualQ part of L4S, to which this patent applies, is intended to be a framework into which any AQM could be dropped.
Accordingly, it seems to me much less likely than it would first appear that Linux could be locked out of high-perf networking if L4S is the winner in this debate.
Note that I am not a lawyer, and it would be good if others could read the patent and post if they concur or not with my reading of it.
It is still less than ideal that the DUALPI source, which is posted for evaluation of L4S and for adoption into Linux, has this patent hanging over it. The developers might be well advised to post a version which avoids the PI controller - unless they believe that ALu can be persuaded to licence the patent freely to OSS.
Posted Apr 28, 2019 20:17 UTC (Sun)
by randyqx (guest, #131682)
Posted May 13, 2020 15:58 UTC (Wed)
by mtaht (subscriber, #11087)
https://mailarchive.ietf.org/arch/msg/tsvwg/rXWRHAyGOuu_q...
Apple and Google have joined the L4S coalition, and the competing SCE team has done an admirable job of analyzing and benchmarking the code quality and claims thus far: https://github.com/heistp/sce-l4s-ect1#key-findings
The latest major bug (fixed) in the l4s tcp prague code is documented there, but the analysis of what the fix means needs to be regression tested more fully against the existing benchmarks.
One bugfix involving all the tunnels in the world for incorrectly decapsulating ect(1) marked packets just went into linux-stable. I don't really understand the scope of the problems this bug had, but it bothers me.
Anyway, I *do not* encourage any newcomers to the tsvwg mailing list to actually vote, as these issues are very complex and I do not want to be accused of stuffing the room.
But I do wish that more network engineers that understood congestion
https://mailarchive.ietf.org/arch/msg/tsvwg/uVuFcicR6i7ev...
Posted May 13, 2020 16:32 UTC (Wed)
by mtaht (subscriber, #11087)
https://www.youtube.com/watch?v=KowAEqHpftI
sometimes ya just gotta laugh.
My guess is that cheating would work less well than one might like. One of the big objectives behind L4S is low latency; that means tiny queues in the fast lane and tight congestion-control loops. If you don't implement the congestion-control side, you're probably going to overflow the queues and end up retransmitting a lot of dropped packets.
Cheating
I really like the cleverness of SCE. It's basically pulse-density modulation encoding, simple enough anyone can understand it. More importantly, downstreams can understand it without a full protocol stack dissector, which the other scheme seems to require. Isn't the whole point of modern networking techniques to get heavy lifting *out* of the hot path?
be left to additional Internet Drafts yet to be submitted."
• In L4S the higher layers already have the congestion information through some means, those set the bit in the IP layer as an extension signal, and it's up to receivers to understand that bit and introspect the rest of the packet to enqueue it properly?
The challenge is that an AQM needs to know how flows are going to react to CE-marks to send the appropriate signal to each flow, otherwise the flows responding more mildly to CEs will suppress the ones with a strong response.
Given the number of Linux based servers, networking devices and Android phones I would have thought not.
EU filing) was a very specific claim about using a square root for the
coupling. The Linux implementation runs this the other way round so that
it only has to do a squaring. So I figured we were safe from that.
retrospectively re-written the claims in the US patent and in the EU
patent application to claim this the other way round - as a squaring."
I took a look at the patent in question (at least the US version) and noticed the following, which gives some hope that the patent may not be as much of a blocker as this article suggests:
Another ietf vote on the fate of the ect(1) bit is on
control to any extent, were paying attention to these events. In my case, I've mostly been trying to get all the participants on the mailing list to look at the
aftereffects on videoconferencing in particular, on asymmetric home networks (e.g. 100/10, 200/10 mbit stuff), of either set of proposals vs the state of the art (without any ecn support at all). There's packet captures and flent data here:
Another ietf vote on the fate of the ect(1) bit is on
any way this goes down, is that of a very beleaguered QA manager, and for those of you that haven't enjoyed this video... here it is: