Gettys: Mitigations and Solutions of Bufferbloat
Once tuned, Linux's latency (and the router's latency) can be really nice even under high load (even if I've not tried hard to get to the theoretical minimums). But un-tuned, I can get many seconds of latency out of both Linux home routers and my laptop, just by heading to some part of my house where my wireless signal strength is low (I have several chimneys that make this trivial). By walking around or obstructing your wireless router, you should easily be able to reproduce bufferbloat in either your router or in your laptop, depending on which direction you saturate.
Posted Dec 14, 2010 0:52 UTC (Tue)
by arjan (subscriber, #36785)
[Link] (14 responses)
.... won't this kind of thing risk a "buffer war" where we work around the ISP, the ISP then changes their setup to work around our workaround etc etc etc ....
none of it for the better of the internet as a whole
Posted Dec 14, 2010 2:20 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (10 responses)
Whether ISPs have a net incentive to oversize their buffers and whether they (or their hardware providers) actually do so are entirely independent questions.
If the latency is excessive, the ISP is liable to be perceived as sluggish relative to the competition, no matter what the top line bandwidth is. DSL providers are major offenders here, due to default adoption of high latency forward error correction. Link level acknowledgement and retransmission would probably be much more effective.
Posted Dec 14, 2010 3:34 UTC (Tue)
by jg (guest, #17537)
[Link] (8 responses)
I honestly don't know what is better or worse for bufferbloat at the moment; your mileage will vary as netalyzr has shown.
Posted Dec 14, 2010 3:57 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (7 responses)
Posted Dec 14, 2010 4:26 UTC (Tue)
by jg (guest, #17537)
[Link] (6 responses)
All the tests the equipment vendors have to meet have been on bandwidth; occasionally you'll see latency specs, but *never* will you find latency under load (which is where you can spot bufferbloat).
So any time a hardware vendor is struggling to meet one of these specs set by an ISP, and can paper over a problem by adding buffering, they have done so. Or they do so just because they have the memory lying around unused; you can't buy memory chips small enough for proper buffering anymore, by even an order of magnitude.
And so everything is getting sized for the maximum theoretical bandwidth, plus increasing amounts of overestimation from using observed RTTs rather than computed RTTs, with a large fudge factor for papering over bugs thrown in for good measure.
So you'll find the performance of a DOCSIS 3 modem may be worse when used on a DOCSIS 2 plant, because to meet the much higher bandwidth requirements, and to paper over their bugs, vendors have typically made the buffers in recent hardware yet bigger than in the previous generation. And so on: we get more bandwidth, we get more buffers.
There is no single right answer for the "proper" amount of buffering.
Until we are (including public tests) measuring latency under load routinely, we'll never get this problem fixed. Real solutions that always "just work" will require AQM in many circumstances (e.g. 802.11 wireless).
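The AQM jg calls for can be sketched in a few lines; the classic example is RED. This toy model (class name and all parameters are illustrative, not taken from any real implementation) drops packets probabilistically as the *average* queue length grows, signalling senders to slow down long before the buffer is actually full:

```python
import random

class REDQueue:
    """Toy Random Early Detection sketch: drop probability rises
    linearly as the EWMA of queue length moves between two thresholds,
    so senders get congestion signals before the buffer fills."""

    def __init__(self, capacity=100, min_th=5, max_th=15, max_p=0.1, weight=0.2):
        self.queue = []
        self.capacity = capacity
        self.min_th, self.max_th = min_th, max_th
        self.max_p = max_p      # drop probability reached at max_th
        self.weight = weight    # EWMA smoothing factor
        self.avg = 0.0          # smoothed queue length

    def enqueue(self, pkt):
        # Update the exponentially weighted moving average of queue depth.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if len(self.queue) >= self.capacity or self.avg >= self.max_th:
            return False        # forced drop: queue (on average) too deep
        if self.avg >= self.min_th:
            # Early random drop with probability growing from 0 to max_p.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() < p:
                return False
        self.queue.append(pkt)
        return True
```

With no draining at all, a flood of arrivals is cut off once the average queue depth crosses the upper threshold, which is exactly the "keep the standing queue short" behaviour a FIFO with a big buffer lacks.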
Posted Dec 14, 2010 4:30 UTC (Tue)
by jg (guest, #17537)
[Link]
Posted Dec 14, 2010 4:45 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (2 responses)
There are a few other things that can be done. For example, TCP currently has a packet loss feedback loop but not a latency feedback loop. Latency feedback cannot solve the problem entirely, but a TCP endpoint can be modified to detect increasing queue delays in the bottleneck router by paying attention to the latency, and then modulate the transmit rate accordingly when the latency rises above its lowest observed value.
If this practice were widespread across implementations of TCP and other comparable transport protocols, the problem would largely go away, whether or not access providers reduce their maximum queue lengths. Of course the latter is a much easier place to get started; convincing everyone to enable ECN comes next after that.
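A sketch of what such a latency feedback loop might look like, in the spirit of delay-based schemes such as TCP Vegas (the class name, thresholds, and step sizes are purely illustrative):

```python
class DelayBasedSender:
    """Toy delay-based rate control: treat the lowest RTT ever seen as
    the queue-free baseline, estimate how many of our packets are
    sitting in queues, and back off before any loss occurs."""

    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd
        self.base_rtt = float("inf")   # lowest RTT observed so far

    def on_rtt_sample(self, rtt):
        self.base_rtt = min(self.base_rtt, rtt)
        # Packets estimated to be queued = cwnd * (1 - base_rtt/rtt).
        queued = self.cwnd * (1 - self.base_rtt / rtt)
        if queued < 1:
            self.cwnd += 1                      # queue empty: probe upward
        elif queued > 3:
            self.cwnd = max(2.0, self.cwnd - 1) # queue building: back off
        return self.cwnd
```

The key property is that the control signal is rising delay rather than packet loss, so the sender stops adding to a bloated buffer instead of filling it to the brim.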
Posted Dec 14, 2010 8:10 UTC (Tue)
by ajb (subscriber, #9694)
[Link]
http://bobbriscoe.net/projects/netsvc_i-f/chirp_pfldnet10...
Posted Dec 15, 2010 21:34 UTC (Wed)
by schabi (guest, #14079)
[Link]
I'm not convinced that this would really fix the problem. The buffers are "global" to several hosts with several connections from each, and there are UDP-based protocols, too.
So if one host implements it, it must take the "global" latency of all its connections into account. And as soon as there are other hosts sharing the same buffer, they will just refill the buffers on their own, and the well-behaving hosts will starve.
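The starvation effect described above is easy to demonstrate with a toy discrete-time model (entirely illustrative): a "polite" sender that stops transmitting whenever the shared queue is deep is quickly crowded out by a greedy sender that keeps it full.

```python
def share_of_polite_sender(rounds=1000, capacity=20):
    """Two senders share one bottleneck buffer; one packet drains per
    round. The polite sender only transmits when the queue is shallow,
    the greedy one always transmits. Returns (polite_sent, greedy_sent)
    counts of packets actually delivered."""
    queue = []                          # shared bottleneck buffer
    polite_sent = greedy_sent = 0
    for _ in range(rounds):
        if len(queue) < 5:              # polite: back off if queue is deep
            queue.append("p")
        queue.append("g")               # greedy: always sends
        queue[:] = queue[:capacity]     # tail drop at buffer capacity
        if queue:
            pkt = queue.pop(0)          # bottleneck drains one packet/round
            if pkt == "p":
                polite_sent += 1
            else:
                greedy_sent += 1
    return polite_sent, greedy_sent
```

Once the greedy sender has driven the standing queue past the polite sender's threshold, the polite sender goes silent permanently and the greedy one takes essentially the whole link, which is the fairness problem with unilateral delay-based backoff.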
Posted Dec 15, 2010 12:39 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (1 responses)
No, but I think there is a small, universal and reasonable range, say between 1 and 10 ms, give or take. Anything outside this range should be eradicated without even thinking about it.
- below 1 ms you might empty the queue too often and leave the pipe underutilized.
- above 10 ms you risk Jim Gettys and gamers pointing the finger at you.
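The arithmetic behind such a range is simple: queueing delay is just queue depth divided by link rate. Two helper functions (hypothetical names) make the point that a typical 256 KiB buffer on a 1 Mbit/s uplink is over two *seconds* of standing queue:

```python
def buffer_bytes_for_delay(link_bps: float, target_delay_s: float) -> int:
    """Bytes of queue that correspond to a target queueing delay."""
    return int(link_bps * target_delay_s / 8)

def queueing_delay_s(queue_bytes: int, link_bps: float) -> float:
    """Delay a full buffer of this size adds at this link rate."""
    return queue_bytes * 8 / link_bps

# 10 ms at 1 Mbit/s is less than one full-size Ethernet frame of buffering,
buffer_bytes_for_delay(1_000_000, 0.010)     # 1250 bytes
# while a common 256 KiB device buffer at the same rate is seconds of delay.
queueing_delay_s(256 * 1024, 1_000_000)      # about 2.1 seconds
```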
By the way, I do not like Ted Ts'o's "SMM" justification as you do. It is not TCP's problem if some motherboards/systems out there "go catatonic for a few milliseconds at a time". TCP is all about timings. If the system cannot react in a timely manner then TCP is not going to work well, period. Bufferbloat workarounds do more harm than good, as you highlighted.
PS: also note that socket buffers can be as large as they want since they are outside the scope of TCP's congestion avoidance. A lot of people tend to confuse TCP (receiver's) window with TCP congestion window (on the sender). They are not related: the former protects the receiver's socket buffers while the latter protects the network.
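The distinction is visible from user space: the receiver's socket buffer (which bounds the advertised window) is tunable per socket, while the sender's congestion window is kernel-internal state with no socket option to set it. A minimal illustration:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The receive buffer (and hence the advertised TCP window) is ours to size;
# it protects this host's memory, not the network.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # request ~1 MiB
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
# (On Linux the kernel reports roughly double the requested value, to
# account for bookkeeping overhead.)
# There is no corresponding setsockopt for the congestion window: the
# sending kernel grows and shrinks cwnd on its own, and that is the
# mechanism that protects the network.
s.close()
```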
Posted Dec 15, 2010 16:28 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
> - above 10 ms you risk Jim Gettys and gamers pointing the finger at you.
This poor attempt at humour should not hide that there are hard, "scientific" facts backing up this range, notably speed of light and "human latency". The same "human latency" has a similar influence on the constants hardcoded in the schedulers of operating systems.
Posted Dec 14, 2010 20:55 UTC (Tue)
by ajb (subscriber, #9694)
[Link]
Posted Dec 14, 2010 4:40 UTC (Tue)
by jg (guest, #17537)
[Link]
Unfortunately, we've managed to forget both the NSFnet collapse and the solutions that were devised to avoid its reappearance (e.g. RED; and now I find people are messing with slow start).
Last comment from me tonight: the ISPs have missed this just as much as the equipment vendors, OS developers, or device driver writers have. We're all in glass houses. Much as we all like good conspiracy theories, I don't believe in one in this case.
Posted Dec 14, 2010 5:19 UTC (Tue)
by njs (subscriber, #40338)
[Link] (1 responses)
So say I throttle my upstream, in return for latency. Why in the world would my ISP respond to this by throttling their connection *even more*, throwing away even more throughput in return for *worse* latency? That makes no sense. ISPs, like you say, have been trained to maximize throughput.
Which could still turn out okay, I guess -- if mainstream OSes were to start doing dynamic traffic shaping (i.e., detect bufferbloat and then shape outgoing traffic to maintain latency) and telling their users "hey, so maybe your ISP told you that you got X Mb/s but actually you get Y Mb/s", then ISPs might start fixing their routers. It would certainly be a lot cheaper for them than any *other* way of increasing (effective) throughput, like laying new cable.
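The dynamic shaping described above could be as simple as a control loop that trades rate for latency; a toy sketch (the target RTT and back-off constants are invented for illustration, not from any real OS):

```python
class AdaptiveShaper:
    """Toy latency-driven egress shaper: cut the shaping rate whenever
    measured RTT climbs above a target, and slowly probe back up when
    latency is good, converging just below the bloated bottleneck."""

    def __init__(self, rate_bps, target_rtt_s=0.030):
        self.rate_bps = rate_bps
        self.target_rtt_s = target_rtt_s

    def on_rtt_sample(self, rtt_s):
        if rtt_s > self.target_rtt_s:
            self.rate_bps *= 0.9    # queue is building somewhere: back off
        else:
            self.rate_bps *= 1.01   # latency fine: probe toward link rate
        return self.rate_bps
```

The steady-state rate this converges to is exactly the "actually you get Y Mb/s" figure the OS could report to the user.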
Posted Dec 15, 2010 13:00 UTC (Wed)
by jg (guest, #17537)
[Link]
The ISPs have a wide range of provisioned bandwidths they sell, and even fixed-size buffers need to be sized differently for each (and won't work nearly as well as AQM). They may not even have a way to communicate semi-sane upper bounds on buffering to the CPE devices (e.g. DOCSIS has no such protocol), so there are standards actions required to do this.
Now, there is *nothing* to prevent vendors from implementing AQM in these devices; it wouldn't even come at much cost or power consumption these days. In some technologies, such as cable, the CPE market is even pretty competitive. I hope entrepreneurial companies/people will see the opportunity and build VOIP/gamer broadband gear.
There are some mitigation strategies, as I've pointed out even without such devices yet available. That's action we can take immediately and should do so. I hope in the long term, it becomes entirely unnecessary.
Posted Dec 14, 2010 8:22 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (7 responses)
(See http://conferences.sigcomm.org/sigcomm/2010/papers/sigcom...)
Certainly in a WAN environment you have more room for doing creative things at the end points, but it is hard to see how any endpoint only solution is going to outperform a good implementation of ECN, mostly due to the overhead of implementing any reasonable TCP compatible alternative without the information advantage that explicit feedback brings. On the other hand, an endpoint only solution might see much wider deployment...
Posted Dec 14, 2010 16:16 UTC (Tue)
by jg (guest, #17537)
[Link] (6 responses)
And many endpoints never get updated.
What's the current penetration of Windows XP? Still > 50%. Unless Microsoft decides to bundle TCP changes into security updates, it strikes me as very unlikely that updated TCP stacks will deploy faster than replacing the offending hardware.
Phones are a different market: they still do get updates frequently (at least smart phones). So there one could imagine it.
But what has worried me about the "mess with TCP" approach is game theory: if modifying it for everyone's benefit disadvantages the user of that device relative to others, it's going to be a hard sell to deploy. We see the opposite phenomenon today with browsers, which have been upping the number of connections used at once. This is bad for the network and for other users sharing any link, but beneficial to the end users, so it happens anyway.
Posted Dec 14, 2010 17:18 UTC (Tue)
by butlerm (subscriber, #13312)
[Link]
Posted Dec 14, 2010 17:22 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (3 responses)
Posted Dec 14, 2010 17:31 UTC (Tue)
by jg (guest, #17537)
[Link] (2 responses)
While they now care about latency (at long last) I don't think they observe what it does to others around them.
Posted Dec 15, 2010 0:57 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted Dec 15, 2010 17:16 UTC (Wed)
by jg (guest, #17537)
[Link]
Posted Dec 16, 2010 14:10 UTC (Thu)
by cesarb (subscriber, #6266)
[Link]
Not if it disadvantages the user of that device relative to other devices owned by the same user. In my experience, bufferbloat is often specific to your link: either in your router/modem/wireless, or in a bandwidth limiter somewhere at your ISP.
If messing with TCP allows me to do a full-speed Bittorrent download (60 connections by default on the Transmission client which is the default AFAIK at least on Fedora and Ubuntu), without having to throttle it on the client to 90% of the link speed, and still having low latency and good browsing speeds on the other devices I have at home, then it is a good thing. Even if it makes my Bittorrent download/upload marginally slower. Probably even if it completely pauses the download while other devices are using the link (web browsing traffic is bursty, so this also would not affect download speeds much).
