> First, it would be great to have some experimental data showing
> improvement of the problem. i.e, show, on a home network or cell
> network, how forcing some packets to be dropped would change latencies/RTT.
The data is well known for core Internet routers: the problem is that classic AQM algorithms don't function at all properly in the face of variable bandwidth. While there are problems in the core of the network when people have not enabled and configured RED, the big problems I know we have at the moment are at the edge, where we have variable bandwidth (both in broadband, e.g. PowerBoost) and wireless, which is by its nature variable.
So your sane request is actually harder than it seems.
Having said that, I am working on a demo (and some data) I'll put up sometime soon in video form; we can use bandwidth shaping to get the buffers out of the broadband connection (sacrificing bandwidth and PowerBoost), and the difference is actually quite dramatic.
> Second, some insight as to why AQM is (1) hard and (2) not deployed would
> be useful.
Some of this is the nature of the beast; we (now) have to adapt over a huge dynamic range, of variable bandwidth and number of flows.
I have more space in the long paper on this topic.
AQM is *not* easy.
As to why it isn't universally enabled where it is available: this goes back to the fact that classic RED requires tuning, and if you get the tuning wrong, it can hurt you. So some ISPs run with RED in their core networks, and some do not.
But AQM typically isn't available *at all* in your broadband connection, your home router, and in your host, even if we had an algorithm that we knew worked.
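To make the tuning burden concrete, here is a minimal sketch of the classic RED drop decision. It is simplified (it leaves out the count-based spacing of drops and the "gentle" variant), and the threshold values are hypothetical, chosen only for illustration. The point is that there are four knobs (`min_th`, `max_th`, `max_p`, and the averaging `weight`), and values that suit one bandwidth and flow count can be wrong for another.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED drop/mark probability for a given average queue length.

    Simplified sketch: no count-based drop spacing, no "gentle" mode.
    """
    if avg_queue < min_th:
        return 0.0          # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0          # queue is long: always drop
    # Between the thresholds the probability rises linearly up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def update_average(avg_queue, current_queue, weight):
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg_queue + weight * current_queue


# Hypothetical thresholds, in packets; not recommended values.
MIN_TH, MAX_TH, MAX_P = 5, 15, 0.1

for avg in (2, 10, 20):
    print(avg, red_drop_probability(avg, MIN_TH, MAX_TH, MAX_P))
```

Thresholds expressed in packets correspond to very different queueing delays on a 1 Mbit/s link and a 1 Gbit/s link, which is one reason a single fixed configuration struggles when bandwidth varies.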
> Is it really just that it's a burden on ISP's?
It is a burden to ISP's. They get the service calls when you suffer.
So much so that by last spring, the cable industry added a change to the DOCSIS spec to allow them to control buffering, so that sometime next calendar year, clueful operators can reduce the overbuffering to something semi-sane. It won't be what AQM would give them, but it should reduce the problem by an order of magnitude.
> You'd think if
> it helped their customers they'd put engineering resources on it.
> I've heard that it's partially a prisoner's dilemma - if you turn
> AQM/RED off and others on the same pipe use it, you get better
> performance. And packet drops can't be manipulated like this,
> so TCP uses the measure that's harder to fake. True?
No, I don't believe so.
> Third, a working implementation of useful AQM would be nice. It's a year
> or so from initial report, and I get that it's hard, but all we've
> gotten in this article is "hold onto your pants, Van Jacobson is
> working on it".
And Kathie Nichols and some others. But it's hard and knowing your AQM really works is even harder once you think you have an algorithm. We (internet folks in general) already got the solution wrong once.
> I think the main problem is the disconnect - either bufferbloat is a
> terrible disaster, in which every available TCP architect should care
> about it and be working on it (and they're not), or it's a less general
> problem with workarounds and engineers and academics can afford to spend
> a year or more tinkering with solutions (where we're at). In my reading
> all these articles by jg have this cognitive dissonance.
> To improve: stop talking about history, and talk about how to solve this.
And that's what I've been mostly doing recently.
Remember, the enemy of the good is the perfect: many mitigations can help greatly without requiring us to solve the whole problem.
Things like the byte queue stuff and the DOCSIS change are helpful and steps along the way. And lots of random bugs in ECN have been found and fixed.
- Jim
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 0:18 UTC (Tue) by Lennie (subscriber, #49641)
I have a feeling more widespread use of 10 Gbps hardware is gonna make things worse. Are there any indications of that?
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 1:42 UTC (Tue) by dlang (✭ supporter ✭, #313)
any time you have chokepoints where the bandwidth changes drastically, you are going to have problems
right now, you can have 1Gb wired network going to a <1Mb DSL upload rate. The machine connected to the 1G wired network has no way of knowing that the system that it's talking to is on the other side of such a slow connection, and so it needs to have large enough buffers to talk to another system connected to a 1Gb network. As the data trickles out over the 1Mb network you have horrible performance.
1Gb is only now taking over in home networks from 100Mb, I don't expect to see 10Gb on very slow networks like this for quite a while yet.
10Gb in the datacenter is a good thing for communications within the datacenter. having 1Gb of connectivity from the datacenter to the Internet is far from being unheard of, and 100Mb is touching the range of the high-end home user, even the 100Mb connection is less of a speed ratio than the home user 1Gb to 1Mb transition.
but a fixed transition like this is not _that_ hard for the routers to deal with. what absolutely kills things is where a wireless network may be anywhere from 300Mb/sec to under 1Mb/sec, and may vary within this range within the length of a single session; adapting to that is extremely hard.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 15:17 UTC (Tue) by marcH (subscriber, #57642)
> right now, you can have 1Gb wired network going to a <1Mb DSL upload rate. The machine connected to the 1G wired network has no way of knowing that the system that it's talking to is on the other side of such a slow connection, and so it needs to have large enough buffers to talk to another system connected to a 1Gb network. As the data trickles out over the 1Mb network you have horrible performance.
Not a problem as long as the 1Mb link does not let the queue build up and does drop packets.
Throughput is not the problem, latency is.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 12, 2011 13:55 UTC (Mon) by ekj (guest, #1524)
Yes, but the existence of *wildly* varying bandwidths forces queue management.
A 10Gbit capable device, can have 10MB worth of buffer, and still transmit the entire buffer in 10ms, which is low enough that it could probably behave well with no queue-management.
If such a 10MB buffer ends up holding data that trickles out over a 1mbps link though, it'd take a *minute* for the buffer to empty, i.e. completely unusable.
At the same time, the 5kB buffer that may be reasonable for a 1Mbps link, is clearly much too small for a 10Gbit link.
In short, when link speed varies a *lot* there is no correct buffer size; instead you MUST actively manage your buffers, using some sort of AQM, which today most routers and devices do not, in fact, do.
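The rough figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 MB = 1,000,000 bytes, 1 Mbit/s = 1,000,000 bits/s) and ignores framing overhead, and the function name is made up for the example.

```python
def drain_time_seconds(buffer_bytes, link_bits_per_second):
    """Time needed to transmit a completely full buffer at the given link rate."""
    return buffer_bytes * 8 / link_bits_per_second


KB = 1_000             # decimal units, an assumption
MB = 1_000_000
MBIT = 1_000_000
GBIT = 1_000_000_000

cases = [
    ("10MB buffer on a 10Gbit link", 10 * MB, 10 * GBIT),
    ("10MB buffer on a 1Mbit link", 10 * MB, 1 * MBIT),
    ("5kB buffer on a 1Mbit link", 5 * KB, 1 * MBIT),
    ("5kB buffer on a 10Gbit link", 5 * KB, 10 * GBIT),
]

for label, size, rate in cases:
    print(f"{label}: {drain_time_seconds(size, rate):.6f} s")
```

With these assumptions the 10MB buffer drains in about 8 ms at 10 Gbit/s but needs about 80 seconds at 1 Mbit/s, while the 5kB buffer holds about 40 ms of data at 1 Mbit/s and only about 4 microseconds at 10 Gbit/s.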
"does not let the queue build up" is the key phrase here. More specifically, does not let the queue for any one outgoing link grow beyond what can (probably) be transmitted over the next few milliseconds on that specific link, without dropping a packet or two as a signal that congestion is occurring.
wireless is a special challenge: just because that link is 5mbps this moment, doesn't mean it won't be a lot less 20ms from now, and if a packet is lost, you don't know if it was congestion or noise - and the appropriate response to high-noise-low-traffic is exactly the opposite response to low-noise-high-congestion.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 12, 2011 15:34 UTC (Mon) by marcH (subscriber, #57642)
> If such a 10MB buffer ends up holding data that trickles out over a 1mbps link though, it'd take a *minute* for the buffer to empty, i.e. completely unusable.
This keeps coming up... why would the 1Gb/s link hold the data on behalf of the 1Mb/s link?
Plug the modem - get the problems...
Posted Dec 12, 2011 15:46 UTC (Mon) by khim (subscriber, #9252)
> This keeps coming... why would the 1Gb/s link hold the data on behalf of the 1Mb/s link?
Buy a modem, attach it to the computer, get the problem. The computer only knows about the 1Gbit link to the modem and so allocates a 1MB buffer; the router only gets 1Mbit because your line is not ideal... instant 1000x impedance mismatch.
Plug the modem - get the problems...
Posted Dec 12, 2011 15:55 UTC (Mon) by marcH (subscriber, #57642)
> instant 1000x impedance mismatch.
Not the problem.
What's happening here is bufferbloat inside *the modem*; NOT inside the computer. Make the modem adjust its queue size depending on the speed of each outgoing link and your problem is solved.
The problem is the modem having the same buffer size on every link (the buffer is probably even shared across the links). Simple laziness from the designers.
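As a rough illustration of sizing the queue per outgoing link, the sketch below computes a byte limit from the link rate and a target delay. The 5 ms target, the 1500-byte floor, and the function name are all assumptions made for the example; this is not what any particular modem does.

```python
def queue_limit_bytes(link_bits_per_second, target_delay_ms=5, mtu_bytes=1500):
    """Bytes the link can send within the target delay, never less than one packet.

    Integer arithmetic only: bits/s * ms / 1000 gives bits, / 8 gives bytes.
    """
    limit = link_bits_per_second * target_delay_ms // 8000
    # Always leave room for at least one full-size packet.
    return max(limit, mtu_bytes)


# Hypothetical outgoing links on the same device.
links = {
    "dsl-uplink": 1_000_000,        # 1 Mbit/s
    "fast-ethernet": 100_000_000,   # 100 Mbit/s
    "gigabit-lan": 1_000_000_000,   # 1 Gbit/s
}

for name, rate in links.items():
    print(name, queue_limit_bytes(rate), "bytes")
```

With a 5 ms target the 1 Mbit/s uplink is clamped to the single-packet floor of 1500 bytes, while the gigabit port gets 625,000 bytes. A single shared buffer size cannot be right for both.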
Plug the modem - get the problems...
Posted Dec 12, 2011 15:57 UTC (Mon) by marcH (subscriber, #57642)
> What's happening here is bufferbloat inside *the modem*; NOT inside the computer.
... unless you have Ethernet flow control enabled, in which case you might have bufferbloat in BOTH places because of backpressure! Disable flow control right now since it's not compatible with Van Jacobson congestion control.
Plug the modem - get the problems...
Posted Dec 16, 2011 2:47 UTC (Fri) by quanstro (guest, #77996)
ethernet flow control, at least through switches, makes
tcp-style flow control work better! at least that's been
my experience.
ethernet flow control
Posted Dec 16, 2011 16:57 UTC (Fri) by marcH (subscriber, #57642)
Your mileage may vary. The effect of Ethernet flow control depends on a wide range of parameters.
Ethernet flow control is effectively chaining queues across devices. Since the aggregated queue is bigger I can see how it *may in some cases* enhance TCP throughput. But it will obviously make any existing bufferbloat even worse.
Most importantly, Ethernet flow control will create head-of-line (HOL) blocking.
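A toy model of the head-of-line effect, under heavy assumptions: time advances in ticks, the "fast" port accepts one frame per tick, and the "slow" port accepts a frame only every `slow_period` ticks. When all frames sit in one shared FIFO, as they effectively do when backpressure stalls the upstream link, a frame for the slow port at the head holds up frames for the fast port behind it. This is not a model of the actual 802.3x PAUSE mechanism, only of the queueing consequence.

```python
from collections import deque


def shared_fifo_delivery_ticks(frames, slow_period):
    """Tick at which each frame leaves a single shared FIFO.

    frames: destinations ("slow" or "fast") in arrival order.
    """
    queue = deque(enumerate(frames))
    delivered = {}
    tick = 0
    while queue:
        tick += 1
        index, dest = queue[0]
        ready = dest == "fast" or tick % slow_period == 0
        if ready:
            queue.popleft()
            delivered[index] = tick
        # Otherwise the head frame waits, and so does everything behind it.
    return [delivered[i] for i in range(len(frames))]


def per_port_delivery_ticks(frames, slow_period):
    """Same traffic with one queue per destination, so ports do not block each other."""
    queues = {"slow": deque(), "fast": deque()}
    for index, dest in enumerate(frames):
        queues[dest].append(index)
    delivered = {}
    tick = 0
    while queues["slow"] or queues["fast"]:
        tick += 1
        if queues["fast"]:
            delivered[queues["fast"].popleft()] = tick
        if queues["slow"] and tick % slow_period == 0:
            delivered[queues["slow"].popleft()] = tick
    return [delivered[i] for i in range(len(frames))]


traffic = ["slow", "fast", "slow", "fast"]
print("shared FIFO:", shared_fifo_delivery_ticks(traffic, slow_period=10))
print("per-port   :", per_port_delivery_ticks(traffic, slow_period=10))
```

In the shared FIFO the two "fast" frames leave at ticks 11 and 21; with per-port queues they leave at ticks 1 and 2, while the "slow" frames leave at ticks 10 and 20 in both cases.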
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 12, 2011 15:47 UTC (Mon) by marcH (subscriber, #57642)
> wireless is a special challenge: just because that link is 5mbps this moment, doesn't mean it won't be a lot less 20ms from now,
Is wireless link speed varying so fast that you cannot adjust your queue size accordingly with (again) results not good enough for scientists but decent enough for engineers and end users? This is a genuine question.
Surely when sitting at your desk your link does not keep jumping from 100Mb/s to just 1Mb/s several times per second, does it?
> and if a packet is lost, you don't know if it was congestion or noise - and the appropriate response to high-noise-low-traffic is exactly the opposite response to low-noise-high-congestion.
Yes, dropping packets is a very poor congestion signal. A LOT has been said about this already. Is it really related to bufferbloat? I do not think so. It was a concern a long time before anyone noticed bufferbloat, and for sure it will still be a concern a long time after bufferbloat is fixed (if ever...). I can imagine that the two can interact badly with each other; however, this does not prevent working on and fixing the two problems independently of each other.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 1:52 UTC (Tue) by dlang (✭ supporter ✭, #313)
>> You'd think if
>> it helped their customers they'd put engineering resources on it.
>> I've heard that it's partially a prisoner's dilemma - if you turn
>> AQM/RED off and others on the same pipe use it, you get better
>> performance. And packet drops can't be manipulated like this,
>> so TCP uses the measure that's harder to fake. True?
> No, I don't believe so.
I don't think that was the reason for TCP using packet dropping as the measure, remember that TCP congestion fallback predates RED and most other AQM proposals.
but that being said, I have heard that some of the AQM proposals do end up being disadvantaged if used over a congested link with traffic that isn't well behaved by that algorithm's definition.
the issue that the AQM protocols require tuning and attention to get the best performance, and incorrect tuning can cripple you is far more of an issue. If 'out of the box' with no AQM is 'good enough' anyone lacking manpower will be reluctant to add AQM that requires manpower to get right.
the issue is now we are getting to the point where no AQM is no longer being considered 'good enough' for many areas of the Internet, but the existing AQM protocols still have the same problems they always have had, so there is still research to try and find better protocols.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 19:32 UTC (Tue) by Unladen (guest, #72953)
>the issue that the AQM protocols require tuning and attention to get the
>best performance, and incorrect tuning can cripple you is far more of an
>issue. If 'out of the box' with no AQM is 'good enough' anyone lacking
>manpower will be reluctant to add AQM that requires manpower to get right.
Thanks, that's useful - that worst-case AQM is much worse than no AQM at all, while best-case is better, does explain a lot.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 17:19 UTC (Tue) by nye (guest, #51576)
> Having said that, I am working on a demo (and some data) I'll put up sometime soon in video form; we can use bandwidth shaping to get the buffers out of the broadband connection (sacrificing bandwidth and powerboost), and the difference is actually quite dramatic.
Is this materially different in some way to things like wondershaper and its ilk that have been around for years?
The whole bufferbloat issue confuses me since AFAIU it looks like a restatement of what every P2P user has known for a decade or more, and that's really not news. Maybe I'm missing something.
(And there's a corollary to that, which is that carriers really have no incentive to fix the problem since it mostly affects the users that they hate.)
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 19:28 UTC (Tue) by Unladen (guest, #72953)
Fair enough. Sounds like you're saying that automatic buffer management algorithms don't work well because modern links often have variable bandwidth. Thus better algorithms need to be devised.
It is striking that many end users are told what bandwidth they achieve (numbers in browser download, file copy windows), but not link latency. Perhaps that has given ISPs an incentive to optimize for bandwidth (i.e. FastStart, or jack up buffer sizes) and not care about latency because it's hard to measure.
And if AQM/ECN requires all users of a link to turn it on, then adoption will be difficult. Tasks that want maximal bandwidth and don't care about latency (like browser downloads) have no incentive to use it.
Someone needs to write a mobile app that forces packet drops after it detects congestion and speeds up the browser experience. You do that and users will be clamoring for AQM.
Bufferbloat: Dark Buffers in the Internet (ACM Queue)
Posted Dec 6, 2011 19:51 UTC (Tue) by dlang (✭ supporter ✭, #313)
AQM isn't an application thing, it's a system network stack thing.
in the past, entities have tested their systems for latency (usually with tiny packets using a small amount of bandwidth) and separately for bandwidth (usually with huge packets)
if your bandwidth isn't saturated, the small packets never queue up and so the device shows good latency on the latency test. buffer size doesn't matter.
in the bandwidth test, larger buffers prevent packets from being dropped (and therefore avoid retransmit delays), so larger buffers help bandwidth measurements (even if only marginally).
so this has led to thinking that larger buffers are always better. Add in the fact that memory is getting cheaper (OLPCv1.5 went from 256M of ram to 1G of ram because it was cheaper to buy the 1G memory modules). If you are building a router and have more memory, what are you going to use it for besides larger buffers?
the bufferbloat problem is that when the bandwidth is saturated, latency becomes horrible. This is not a combination that vendors have been testing.
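A very simple fluid model shows why the two separate tests miss the problem. It assumes a tail-drop FIFO that stays empty whenever the offered load is at or below the link rate and stays full whenever the load exceeds it; real TCP behaviour is more complicated, and the buffer size and rates below are hypothetical.

```python
def queueing_delay_seconds(offered_bps, link_bps, buffer_bytes):
    """Steady-state queueing delay of a tail-drop FIFO in a crude fluid model.

    At or below the link rate the queue stays empty; above it the buffer
    fills completely and every packet waits behind a full buffer.
    """
    if offered_bps <= link_bps:
        return 0.0
    return buffer_bytes * 8 / link_bps


LINK_BPS = 1_000_000       # hypothetical 1 Mbit/s uplink
BUFFER_BYTES = 256_000     # hypothetical 256 kB device buffer

# "Latency test": tiny packets, link far from saturated.
print("idle link:     ", queueing_delay_seconds(64_000, LINK_BPS, BUFFER_BYTES), "s")

# "Real use": a bulk transfer saturates the link while small packets share the queue.
print("saturated link:", queueing_delay_seconds(2_000_000, LINK_BPS, BUFFER_BYTES), "s")
```

In this model the idle-link test reports no queueing delay whatever the buffer size, while the saturated case adds about 2 seconds of delay for the 256 kB buffer. That combined case is the measurement that was not being made.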