Gettys: Bufferbloat in 802.11 and 3G Networks
A simple, concrete, best-case example of such a busy network might be 25 802.11 nodes, each with a single packet buffer (no transmit ring, no device hardware buffer), trying to transmit to an access point. Some nodes are far away, and the AP adapts down to, say, 2 Mbps; this is common. You therefore have 25 * 1500 bytes of buffering, which is more than 0.15 seconds even if everything goes well, excluding any overhead; the buffers on the different machines have "aggregated" behavior. And this is the optimal case for such a busy network: even an 802.11g network with everyone running at full speed will only be about ten times better than this.
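A quick sketch of the arithmetic above (the ~20 Mbit/s 802.11g goodput figure used for the comparison is an assumption for illustration):

```python
# Back-of-the-envelope check: 25 nodes, one 1500-byte packet
# buffered at each, draining at the 2 Mbit/s rate the AP has
# adapted down to.
nodes = 25
packet_bytes = 1500
link_bps = 2_000_000            # 2 Mbit/s

queued_bits = nodes * packet_bytes * 8
delay_s = queued_bits / link_bps
print(delay_s)                  # 0.15 seconds, excluding all overhead

# An 802.11g network at an assumed ~20 Mbit/s of goodput drains
# the same backlog about ten times faster.
print(queued_bits / 20_000_000)
```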
Posted Jan 4, 2011 5:11 UTC (Tue)
by smoogen (subscriber, #97)
[Link]
Posted Jan 4, 2011 9:38 UTC (Tue)
by job (guest, #670)
[Link] (13 responses)
Posted Jan 4, 2011 10:35 UTC (Tue)
by paulj (subscriber, #341)
[Link] (12 responses)
Posted Jan 4, 2011 13:00 UTC (Tue)
by erwbgy (subscriber, #4104)
[Link] (11 responses)
Agreed. I certainly wasn't aware of the problem until I started reading these blog entries.
Posted Jan 4, 2011 15:51 UTC (Tue)
by jg (guest, #17537)
[Link] (10 responses)
I don't doubt there are better texts/discussions: but they generally lack grounding to the reality of the problems we all experience right now.
Pointers to such texts and papers would be very welcome: this isn't really my area; rather, it is an area I blundered into by accident and by necessity of my job. I have to write this up properly into something more coherent over the next few months.
Posted Jan 5, 2011 0:09 UTC (Wed)
by calhariz (guest, #5003)
[Link] (9 responses)
I think this is your best text. It's clear, well written, and easy to understand by people who don't have a background in networks.
Posted Jan 5, 2011 19:55 UTC (Wed)
by jg (guest, #17537)
[Link] (8 responses)
I see bufferbloat as an education problem: we (the entire industry) have all been making the same mistake of excessive/static buffering for over a decade, and the consequences are not obvious.
And better references would be gratefully received: for the formal publication, I really do not want to do (nor will I have space for) the kind of exposition I've been trying to do in the blog.
Posted Jan 5, 2011 22:31 UTC (Wed)
by kleptog (subscriber, #1183)
[Link] (7 responses)
This also reminds me of playing with wireless more than ten years ago, when the cards were full-length ISA cards. We had serious problems with packet loss, and with the fact that TCP can't distinguish packet loss due to data corruption from packet loss due to congestion. TCP would keep backing off, when sending more would have been more beneficial; kind of the opposite of the problem here. I was actually thinking of building a retransmission protocol under TCP to compensate for it! Fortunately, the technology improved before I needed to do that.
What we need is a way of hiding the unreliability of the network from TCP (so it doesn't back off) while at the same time keeping latency to a minimum. Forward error correction should do this, and recent protocols have stacks of it, but it's obviously not working. Or perhaps an Explicit Crappy Network Notification bit, to tell TCP that the packet got lost but it *wasn't* congestion.
Posted Jan 8, 2011 21:17 UTC (Sat)
by kleptog (subscriber, #1183)
[Link] (6 responses)
When I was playing with that lossy wifi, 1% packet loss was sort of OK, but at 5% it became totally unusable. You can experiment easily; there's an iptables module to simulate packet loss.
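One way to see why 5% loss is so much worse than 1% is the steady-state TCP throughput approximation of Mathis et al., rate ≈ C·MSS/(RTT·√p). A rough sketch; the MSS, RTT, and constant here are chosen for illustration, and the formula only models congestion-avoidance behavior, not the retransmission-timeout stalls that make a 5% link feel truly unusable:

```python
from math import sqrt

def mathis_throughput(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Approximate steady-state TCP throughput in bits/s under a
    given random loss rate (Mathis et al., 1997)."""
    return c * mss_bytes * 8 / (rtt_s * sqrt(loss_rate))

rtt = 0.1   # an assumed 100 ms round trip, for illustration
for p in (0.01, 0.05):
    mbps = mathis_throughput(1460, rtt, p) / 1e6
    print(f"{p:.0%} loss: {mbps:.2f} Mbit/s")
```

Throughput falls only as the square root of the loss rate in this model; the cliff the comment describes comes from timeouts and back-off stacking on top of that.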
Posted Jan 8, 2011 22:49 UTC (Sat)
by dlang (guest, #313)
[Link] (5 responses)
if you think that 1 second latency is bad, remember that dropping a packet introduces a 30 second delay until that packet is retransmitted.
for a data transfer that will take several minutes, this is not a big deal, especially if it makes the back-off work properly. but if this is an interactive session, dropping a packet can be disastrous.
the 'right' answer for this is to use QoS and priorities to slow down and drop packets on the long-term data connections while keeping the interactive connections as close to loss-free as possible.
It may be that the right answer for this is to reduce the retransmit time for dropped packets.
Posted Jan 8, 2011 23:24 UTC (Sat)
by foom (subscriber, #14868)
[Link] (4 responses)
Posted Jan 13, 2011 20:01 UTC (Thu)
by dlang (guest, #313)
[Link] (3 responses)
the time to retransmit a dropped packet needs to be longer than the round trip time or you will be sending out replacements for packets that are still on their way to the destination.
with satellite single-hop ping times in the 1000ms range, and dialup lines in the 250-300ms range I don't see how it can possibly be less than a few seconds.
if it was 200ms, then a connection with a total ping time of 1000ms would send 6+ copies of every packet.
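The arithmetic in that last point, sketched out (assuming, as the comment does, a fixed retransmit timer with no exponential backoff):

```python
# A packet sent at t=0 is acknowledged at t=RTT.  With a fixed
# retransmit timer firing every RTO and no backoff, every expiry
# before the ACK arrives sends another copy onto the wire.
rtt_ms = 1000   # satellite-ish round trip from the comment
rto_ms = 200    # hypothetical fixed retransmit timer
copies = 1 + rtt_ms // rto_ms   # the original, plus one per expiry
print(copies)   # 6
```

Real TCP avoids this by deriving the RTO from the measured round-trip time and backing off exponentially on each expiry, which is what the following comments get into.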
Posted Jan 13, 2011 22:26 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (2 responses)
Posted Jan 13, 2011 23:39 UTC (Thu)
by dlang (guest, #313)
[Link] (1 responses)
Posted Jan 14, 2011 3:44 UTC (Fri)
by foom (subscriber, #14868)
[Link]
Look at tcp_rtt_estimator in net/ipv4/tcp_input.c
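tcp_rtt_estimator implements the classic Jacobson/Karels smoothing (standardized in RFC 6298): the RTO is derived from a smoothed RTT plus four times its mean deviation, so it tracks the path rather than being a fixed number of seconds. A simplified sketch; the sample sequences are made up for illustration, and the kernel's fixed-point code has more subtleties than this:

```python
def rto_from_samples(samples, alpha=1/8, beta=1/4, min_rto=0.2):
    """RFC 6298-style RTO: smoothed RTT plus 4x its mean deviation.
    The 200 ms floor mirrors Linux's TCP_RTO_MIN; RFC 6298 itself
    suggests a 1 second minimum."""
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2          # first measurement
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    return max(min_rto, srtt + 4 * rttvar)

# A steady 300 ms dialup-like path converges to an RTO well under
# "a few seconds"; a jittery ~1 s satellite path keeps the RTO
# comfortably above its own RTT, avoiding spurious retransmits.
print(rto_from_samples([0.3] * 20))
print(rto_from_samples([1.0, 1.3, 0.9, 1.2] * 5))
```

So the timer adapts per-connection: it only approaches tens of seconds after repeated expiries double it via exponential backoff.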
Posted Jan 5, 2011 22:49 UTC (Wed)
by jg (guest, #17537)
[Link]
Note that in reality, you can never fully distinguish congestion loss from random error, and you can easily get into other pain by trying to hide everything from TCP, including its ability to use SACK and fast retransmit.
So trying to paper over problems can easily be self-defeating (as the 802.11 and 3G people have succeeded in doing). I have a bit more sympathy for the 3G folks, who were essentially doing a retrofit over existing technology, than for the 802.11 people, who just should have done more study of how packet networks really work...
But getting this all cleaned up (as best we can) is an almost Sisyphean task. Help gratefully accepted...