Gettys: Mitigations and Solutions of Bufferbloat
Once tuned, Linux's latency (and the router's latency) can be really nice even under high load (even if I've not tried hard to get to the theoretical minimums). But un-tuned, I can get many seconds of latency out of both Linux home routers and my laptop, just by heading to some part of my house where my wireless signal strength is low (I have several chimneys that make this trivial). By walking around or obstructing your wireless router, you should easily be able to reproduce bufferbloat in either your router or in your laptop, depending on which direction you saturate.
Posted Dec 14, 2010 0:52 UTC (Tue)
by arjan (subscriber, #36785)
[Link] (14 responses)
.... won't this kind of thing risk a "buffer war" where we work around the ISP, the ISP then changes their setup to work around our workaround etc etc etc ....
none of it for the better of the internet as a whole
Posted Dec 14, 2010 2:20 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (10 responses)
Whether ISPs have a net incentive to oversize their buffers and whether they (or their hardware providers) actually do so are entirely independent questions.
If the latency is excessive, the ISP is liable to be perceived as sluggish relative to the competition, no matter what the top line bandwidth is. DSL providers are major offenders here, due to default adoption of high latency forward error correction. Link level acknowledgement and retransmission would probably be much more effective.
Posted Dec 14, 2010 3:34 UTC (Tue)
by jg (guest, #17537)
[Link] (8 responses)
I honestly don't know what is better or worse for bufferbloat at the moment; your mileage will vary as netalyzr has shown.
Posted Dec 14, 2010 3:57 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (7 responses)
Posted Dec 14, 2010 4:26 UTC (Tue)
by jg (guest, #17537)
[Link] (6 responses)
All the tests the equipment vendors have to meet have been on bandwidth; occasionally you'll see latency specs, but *never* will you find latency under load (which is where you can spot bufferbloat).
So any time a hardware vendor is struggling to meet one of these specs set by an ISP, and can paper over a problem by adding buffering, they have done so. Or they do so just because they have the memory lying around unused; you can't buy memory chips small enough for proper buffering anymore, by even an order of magnitude.
And so everything is getting sized for the maximum theoretical bandwidth, plus increasing amounts of overestimation from using observed RTTs rather than computed RTTs, with a large fudge factor for papering over bugs thrown in for good measure.
So you'll find the performance of a DOCSIS 3 modem may be worse when used on a DOCSIS 2 plant, because to meet the much higher bandwidth requirements, and to paper over their bugs, vendors have typically made the buffers in recent hardware yet bigger than in the previous generation. And so on: we get more bandwidth, we get more buffers.
There is no single right answer for the "proper" amount of buffering.
Until we are (including public tests) measuring latency under load routinely, we'll never get this problem fixed. Real solutions that always "just work" will require AQM in many circumstances (e.g. 802.11 wireless).
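The AQM jg calls for can be sketched in a few lines; the classic example is RED. This toy model (class name and all parameters are illustrative, not taken from any real implementation) drops packets probabilistically as the *average* queue length grows, signalling senders to slow down long before the buffer is actually full:

```python
import random

class REDQueue:
    """Toy Random Early Detection sketch: drop probability rises
    linearly as the EWMA of queue length moves between two thresholds,
    so senders get congestion signals before the buffer fills."""

    def __init__(self, capacity=100, min_th=5, max_th=15, max_p=0.1, weight=0.2):
        self.queue = []
        self.capacity = capacity
        self.min_th, self.max_th = min_th, max_th
        self.max_p = max_p      # drop probability reached at max_th
        self.weight = weight    # EWMA smoothing factor
        self.avg = 0.0          # smoothed queue length

    def enqueue(self, pkt):
        # Update the exponentially weighted moving average of queue depth.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if len(self.queue) >= self.capacity or self.avg >= self.max_th:
            return False        # forced drop: queue (on average) too deep
        if self.avg >= self.min_th:
            # Early random drop with probability growing from 0 to max_p.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() < p:
                return False
        self.queue.append(pkt)
        return True
```

With no draining at all, a flood of arrivals is cut off once the average queue depth crosses the upper threshold, which is exactly the "keep the standing queue short" behaviour a FIFO with a big buffer lacks.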
Posted Dec 14, 2010 4:30 UTC (Tue)
by jg (guest, #17537)
[Link]
Posted Dec 14, 2010 4:45 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (2 responses)
There are a few other things that can be done. For example, TCP currently has a packet loss feedback loop but not a latency feedback loop. Latency feedback cannot solve the problem entirely, but a TCP endpoint can be modified to detect increasing queue delays in the bottleneck router by paying attention to the latency, and then modulate the transmit rate accordingly when the latency rises above its lowest observed value.
If this practice were widespread across implementations of TCP and other comparable transport protocols, the problem would largely go away, whether or not access providers reduce their maximum queue lengths. Of course the latter is a much easier place to get started; convincing everyone to enable ECN comes next after that.
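A sketch of what such a latency feedback loop might look like, in the spirit of delay-based schemes such as TCP Vegas (the class name, thresholds, and step sizes are purely illustrative):

```python
class DelayBasedSender:
    """Toy delay-based rate control: treat the lowest RTT ever seen as
    the queue-free baseline, estimate how many of our packets are
    sitting in queues, and back off before any loss occurs."""

    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd
        self.base_rtt = float("inf")   # lowest RTT observed so far

    def on_rtt_sample(self, rtt):
        self.base_rtt = min(self.base_rtt, rtt)
        # Packets estimated to be queued = cwnd * (1 - base_rtt/rtt).
        queued = self.cwnd * (1 - self.base_rtt / rtt)
        if queued < 1:
            self.cwnd += 1                      # queue empty: probe upward
        elif queued > 3:
            self.cwnd = max(2.0, self.cwnd - 1) # queue building: back off
        return self.cwnd
```

The key property is that the control signal is rising delay rather than packet loss, so the sender stops adding to a bloated buffer instead of filling it to the brim.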
Posted Dec 14, 2010 8:10 UTC (Tue)
by ajb (subscriber, #9694)
[Link]
http://bobbriscoe.net/projects/netsvc_i-f/chirp_pfldnet10...
Posted Dec 15, 2010 21:34 UTC (Wed)
by schabi (guest, #14079)
[Link]
I'm not convinced that this would really fix the problem. The buffers are "global" to several hosts with several connections from each, and there are UDP-based protocols, too.
So if one host implements it, it must take the "global" latency of all its connections into account. And as soon as there are other hosts sharing the same buffer, they will just refill the buffers on their own, and the well-behaving hosts will starve.
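The starvation effect described above is easy to demonstrate with a toy discrete-time model (entirely illustrative): a "polite" sender that stops transmitting whenever the shared queue is deep is quickly crowded out by a greedy sender that keeps it full.

```python
def share_of_polite_sender(rounds=1000, capacity=20):
    """Two senders share one bottleneck buffer; one packet drains per
    round. The polite sender only transmits when the queue is shallow,
    the greedy one always transmits. Returns (polite_sent, greedy_sent)
    counts of packets actually delivered."""
    queue = []                          # shared bottleneck buffer
    polite_sent = greedy_sent = 0
    for _ in range(rounds):
        if len(queue) < 5:              # polite: back off if queue is deep
            queue.append("p")
        queue.append("g")               # greedy: always sends
        queue[:] = queue[:capacity]     # tail drop at buffer capacity
        if queue:
            pkt = queue.pop(0)          # bottleneck drains one packet/round
            if pkt == "p":
                polite_sent += 1
            else:
                greedy_sent += 1
    return polite_sent, greedy_sent
```

Once the greedy sender has driven the standing queue past the polite sender's threshold, the polite sender goes silent permanently and the greedy one takes essentially the whole link, which is the fairness problem with unilateral delay-based backoff.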
Posted Dec 15, 2010 12:39 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (1 responses)
No, but I think there is a small, universal and reasonable range, say between 1 and 10 ms, give or take. Anything outside this range should be eradicated without even thinking about it.
- below 1 ms you might empty the queue too often and leave the pipe underutilized.
- above 10 ms you risk Jim Gettys and gamers pointing the finger at you.
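The arithmetic behind such a range is simple: queueing delay is just queue depth divided by link rate. Two helper functions (hypothetical names) make the point that a typical 256 KiB buffer on a 1 Mbit/s uplink is over two *seconds* of standing queue:

```python
def buffer_bytes_for_delay(link_bps: float, target_delay_s: float) -> int:
    """Bytes of queue that correspond to a target queueing delay."""
    return int(link_bps * target_delay_s / 8)

def queueing_delay_s(queue_bytes: int, link_bps: float) -> float:
    """Delay a full buffer of this size adds at this link rate."""
    return queue_bytes * 8 / link_bps

# 10 ms at 1 Mbit/s is less than one full-size Ethernet frame of buffering,
buffer_bytes_for_delay(1_000_000, 0.010)     # 1250 bytes
# while a common 256 KiB device buffer at the same rate is seconds of delay.
queueing_delay_s(256 * 1024, 1_000_000)      # about 2.1 seconds
```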
By the way, I do not like Ted Ts'o's "SMM" justification as you do. It is not TCP's problem if some motherboards/systems out there "go catatonic for a few milliseconds at a time". TCP is all about timings. If the system cannot react in a timely manner then TCP is not going to work well, period. Bufferbloat workarounds do more harm than good, as you highlighted.
PS: also note that socket buffers can be as large as they want since they are outside the scope of TCP's congestion avoidance. A lot of people tend to confuse TCP (receiver's) window with TCP congestion window (on the sender). They are not related: the former protects the receiver's socket buffers while the latter protects the network.
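The distinction is visible from user space: the receiver's socket buffer (which bounds the advertised window) is tunable per socket, while the sender's congestion window is kernel-internal state with no socket option to set it. A minimal illustration:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The receive buffer (and hence the advertised TCP window) is ours to size;
# it protects this host's memory, not the network.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # request ~1 MiB
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
# (On Linux the kernel reports roughly double the requested value, to
# account for bookkeeping overhead.)
# There is no corresponding setsockopt for the congestion window: the
# sending kernel grows and shrinks cwnd on its own, and that is the
# mechanism that protects the network.
s.close()
```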
Posted Dec 15, 2010 16:28 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
> - above 10 ms you risk Jim Gettys and gamers pointing the finger at you.
This poor attempt at humour should not hide that there are hard, "scientific" facts backing up this range, notably speed of light and "human latency". The same "human latency" has a similar influence on the constants hardcoded in the schedulers of operating systems.
Posted Dec 14, 2010 20:55 UTC (Tue)
by ajb (subscriber, #9694)
[Link]
Posted Dec 14, 2010 4:40 UTC (Tue)
by jg (guest, #17537)
[Link]
Unfortunately, we've managed to forget both the NSFnet collapse and the solutions that were devised to avoid its reappearance (e.g. RED; and now I find people are messing with slow start).
Last comment from me tonight: the ISPs have missed this just as much as the equipment vendors, OS developers, or device driver writers have. We're all in glass houses. Much as we all like good conspiracy theories, I don't believe in one in this case.
Posted Dec 14, 2010 5:19 UTC (Tue)
by njs (subscriber, #40338)
[Link] (1 responses)
So say I throttle my upstream, in return for latency. Why in the world would my ISP respond to this by throttling their connection *even more*, throwing away even more throughput in return for *worse* latency? That makes no sense. ISPs, like you say, have been trained to maximize throughput.
Which could still turn out okay, I guess -- if mainstream OSes were to start doing dynamic traffic shaping (i.e., detect bufferbloat and then shape outgoing traffic to maintain latency) and telling their users "hey, so maybe your ISP told you that you got X Mb/s but actually you get Y Mb/s", then ISPs might start fixing their routers. It would certainly be a lot cheaper for them than any *other* way of increasing (effective) throughput, like laying new cable.
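The dynamic shaping described above could be as simple as a control loop that trades rate for latency; a toy sketch (the target RTT and back-off constants are invented for illustration, not from any real OS):

```python
class AdaptiveShaper:
    """Toy latency-driven egress shaper: cut the shaping rate whenever
    measured RTT climbs above a target, and slowly probe back up when
    latency is good, converging just below the bloated bottleneck."""

    def __init__(self, rate_bps, target_rtt_s=0.030):
        self.rate_bps = rate_bps
        self.target_rtt_s = target_rtt_s

    def on_rtt_sample(self, rtt_s):
        if rtt_s > self.target_rtt_s:
            self.rate_bps *= 0.9    # queue is building somewhere: back off
        else:
            self.rate_bps *= 1.01   # latency fine: probe toward link rate
        return self.rate_bps
```

The steady-state rate this converges to is exactly the "actually you get Y Mb/s" figure the OS could report to the user.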
Posted Dec 15, 2010 13:00 UTC (Wed)
by jg (guest, #17537)
[Link]
The ISPs have a wide range of provisioned bandwidths they sell, and even fixed-size buffers need to be sized differently for each (and won't work nearly as well as AQM). They may not even have a way to communicate semi-sane upper bounds on buffering to the CPE devices (e.g. DOCSIS has no such protocol), so there are standards actions required to do this.
Now, there is *nothing* to prevent vendors from implementing AQM in these devices; it wouldn't even come at much cost or power consumption these days. In some technologies, such as cable, the CPE market is even pretty competitive. I hope entrepreneurial companies/people will see the opportunity and build VOIP/gamer broadband gear.
There are some mitigation strategies, as I've pointed out even without such devices yet available. That's action we can take immediately and should do so. I hope in the long term, it becomes entirely unnecessary.
Posted Dec 14, 2010 8:22 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (7 responses)
(See http://conferences.sigcomm.org/sigcomm/2010/papers/sigcom...)
Certainly in a WAN environment you have more room for doing creative things at the end points, but it is hard to see how any endpoint only solution is going to outperform a good implementation of ECN, mostly due to the overhead of implementing any reasonable TCP compatible alternative without the information advantage that explicit feedback brings. On the other hand, an endpoint only solution might see much wider deployment...
Posted Dec 14, 2010 16:16 UTC (Tue)
by jg (guest, #17537)
[Link] (6 responses)
And many endpoints never get updated.
What's the current penetration of Windows XP? Still > 50%. Unless Microsoft decides to bundle TCP changes into security updates, it strikes me as very unlikely that updated TCP stacks will deploy faster than replacing the offending hardware.
Phones are a different market: they still do get updates frequently (at least smart phones). So there one could imagine it.
But what has worried me about the "mess with TCP" approach is game theory: if modifying it for everyone's benefit disadvantages the user of that device relative to others, it's going to be a hard sell to deploy. We see the opposite phenomenon today with browsers, which have been upping the number of connections used at once. This is bad for the network and for other users sharing any link, but beneficial to the end users, so it happens anyway.
Posted Dec 14, 2010 17:18 UTC (Tue)
by butlerm (subscriber, #13312)
[Link]
Posted Dec 14, 2010 17:22 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (3 responses)
Posted Dec 14, 2010 17:31 UTC (Tue)
by jg (guest, #17537)
[Link] (2 responses)
While they now care about latency (at long last) I don't think they observe what it does to others around them.
Posted Dec 15, 2010 0:57 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted Dec 15, 2010 17:16 UTC (Wed)
by jg (guest, #17537)
[Link]
Posted Dec 16, 2010 14:10 UTC (Thu)
by cesarb (subscriber, #6266)
[Link]
Not if it disadvantages the user of that device relative to other devices owned by the same user. In my experience, bufferbloat is often specific to your link: either in your router/modem/wireless, or in a bandwidth limiter somewhere at your ISP.
If messing with TCP allows me to do a full-speed Bittorrent download (60 connections by default on the Transmission client which is the default AFAIK at least on Fedora and Ubuntu), without having to throttle it on the client to 90% of the link speed, and still having low latency and good browsing speeds on the other devices I have at home, then it is a good thing. Even if it makes my Bittorrent download/upload marginally slower. Probably even if it completely pauses the download while other devices are using the link (web browsing traffic is bursty, so this also would not affect download speeds much).
