Bufferbloat: the summary
Posted Feb 26, 2011 0:30 UTC (Sat) by zlynx (guest, #2285)
In reply to: Bufferbloat: the summary by jg
Parent article: The debloat-testing kernel tree
Posted Feb 26, 2011 2:05 UTC (Sat)
by jg (guest, #17537)
[Link]
Memory has gotten so big and so cheap that people often use buffer sizes much, much larger than makes sense under any circumstances.
For example, I've heard of DSL hardware with > 6 seconds of buffering.
Take a look at the netalyzr plots in: http://gettys.wordpress.com/2010/12/06/whose-house-is-of-...
The diagonal lines are latency in *seconds*.
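For a sense of scale, here is a back-of-envelope sketch in Python (the 1 Mbit/s uplink is an assumed, illustrative rate, not a number taken from the Netalyzr data):

```python
# All numbers here are illustrative assumptions, not measurements.
uplink_bps = 1_000_000            # assume a 1 Mbit/s DSL uplink
buffered_seconds = 6.0            # the figure quoted above

buffer_bytes = uplink_bps / 8 * buffered_seconds
print(f"{buffer_bytes / 1024:.0f} KiB of buffer == {buffered_seconds:.0f} s of queue delay")
# ~732 KiB: a trivial amount of RAM, but a catastrophic amount of latency.
```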
Posted Feb 26, 2011 2:15 UTC (Sat)
by jg (guest, #17537)
[Link] (5 responses)
Example 1: gigabit Ethernet.
Say you have supposedly sized your buffers "correctly", presuming a global-length path, at your maximum speed, presuming some number of flows, by the usual rule of thumb: bandwidth x delay / sqrt(#flows).
Now you plug this gigabit NIC into your 100Mbps switch. Right off the bat, your system should be using 1/10th the number of buffers.
And you don't know how many flows.
So even if you did it "right", you have the *wrong answer*, and do so most of the time.
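A minimal sketch of that mismatch, with an assumed RTT and flow count (both are illustrative; in practice the flow count is exactly what you don't know):

```python
import math

def rule_of_thumb_bytes(bandwidth_bps, rtt_s, flows):
    # bandwidth x delay / sqrt(#flows), per the usual buffer-sizing result
    return bandwidth_bps / 8 * rtt_s / math.sqrt(flows)

rtt = 0.1      # assume ~100 ms for a continental path
flows = 100    # assumed flow count

sized_for = rule_of_thumb_bytes(1e9, rtt, flows)      # NIC sized for gigabit
actual_need = rule_of_thumb_bytes(100e6, rtt, flows)  # link negotiated at 100 Mbit/s
print(f"sized for {sized_for / 1e6:.2f} MB; the link needs {actual_need / 1e6:.3f} MB")
# The "correct" static answer is 10x too big the moment the link runs slower.
```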
Example 2: 802.11n
Size your buffers for, say, 100Mbps over continental US delays, presuming some number of flows.
Now, go to a conference with 802.11g, and sit in a quiet corner. Your wireless might be running at a few megabits/second; but you are sharing the channel with 50 other people.
Your "right answer" for buffering can easily be off by 2-3 *orders magnitude*. At that low speed, it can take a *very long time* for your packets to finally get transmitted.
***There is no single right answer for the amount of buffering in most network environments.***
Right now, our systems' buffers are typically sized for the maximum amount of buffering they might ever need, even though we seldom operate them in that regime (that is, if the engineers involved thought about the buffer sizes at all).
So the buffers aren't just oversized, they are downright bloated.
Posted Feb 26, 2011 12:20 UTC (Sat)
by hmh (subscriber, #3838)
[Link] (2 responses)
The queue should be able to grow large, but only for flows where the bandwidth-delay product requires it. And it should early-drop.
And the driver DMA ring-buffer size really should be considered part of the queue for any calculations, although you probably have to consider that part of the queue a "done deal" and not drop/reorder anything there. Otherwise, you can get even fast-ethernet to feel like a very badly behaved LFN (long fat network). However, reducing DMA ring-buffer size can have several drawbacks on high-throughput hosts.
Using latency-aware, priority-aware AQM (even if it is not flow-aware) should fix the worst issues, without downgrading throughput on bursty links or long fat networks. Teaching it about the hardware buffers would let it autotune better.
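To see why the hardware buffers matter to the calculation, a rough sketch with an assumed (but plausible-looking) ring size:

```python
ring_descriptors = 256    # assumed TX ring size; actual defaults vary by driver
frame_bytes = 1500        # MTU-sized frames
link_bps = 100e6          # fast ethernet

ring_delay_ms = ring_descriptors * frame_bytes * 8 / link_bps * 1000
print(f"{ring_delay_ms:.0f} ms of queue below the qdisc layer")
# ~31 ms that any AQM sitting above the driver never sees or manages.
```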
Posted Feb 26, 2011 13:37 UTC (Sat)
by jg (guest, #17537)
[Link] (1 responses)
Which AQM algorithm to use is a different question.
Van Jacobson says that RED is fundamentally broken, and has no hope of working in the environments we have to operate in. And Van was one of the inventors of RED...
SFB may or may not hack it. Van has an algorithm he is finishing up the write-up of that he thinks may work; hopefully it will be available soon. We have a fundamentally interesting problem here. And testing this is going to be much more work than implementing it, by orders of magnitude.
It isn't clear the AQM needs to be priority aware; wherever the queues are building, you are more likely to choose a packet to drop (literally drop, or ECN mark) from them just by running an algorithm across all the queues. I haven't seen arguments that make me believe the AQM must be per queue (that doesn't mean there aren't any! just that I haven't seen them).
And there are good reasons why the choice of packet to drop should have randomness in it; time-based congestion can occur if you don't. Different packet types also have different leverage (ACKs vs. data vs. SYNs, etc.).
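As a sketch of that idea only (this is not Van's unpublished algorithm, just an illustration of randomized, delay-driven dropping run across all queues at once; the target delay is an assumed parameter):

```python
import random
from collections import deque

TARGET_DELAY_S = 0.005   # assumed target standing-queue delay: 5 ms

def maybe_drop(queues, link_Bps):
    """queues: list of deques of packet sizes in bytes."""
    backlog = sum(sum(q) for q in queues)
    delay = backlog / link_Bps            # time needed to drain everything queued
    if delay <= TARGET_DELAY_S:
        return None
    # Drop probability grows with excess delay; the randomness keeps flows
    # from synchronizing (the time-based congestion mentioned above).
    if random.random() < (delay - TARGET_DELAY_S) / delay:
        victim = random.choice([q for q in queues if q])
        return victim.popleft()           # drop -- or ECN-mark -- this packet
    return None
```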
Posted Feb 26, 2011 16:14 UTC (Sat)
by hmh (subscriber, #3838)
[Link]
The Diffserv model got it right, in the sense that even on a simple host there are flows for which you do NOT want to drop packets (DNS, NTP) if you can help it, and there is a natural hierarchy of priorities: some services should suffer more packet drops than others during congestion.
I've also found that "socializing" the available bandwidth among flows of the same class is a damn convenient thing (SFQ). SFB does this well, AFAIK.
So, I'd say that what we should aim for on hosts is an auto-tuned, flow-aware AQM that at least pays attention to the bare minimum of priority ordering (802.1p/DSCP class selectors) and does a good job of keeping latency under control without killing throughput on high bandwidth-delay-product flows. Such a beast could be enabled by default on a distro [for desktops] with little fear.
This doesn't mean you need multiple queues. However, you will want multiple queues in many cases because that's how hardware-assisted QoS works, such as what you find on any 802.11n device or non-el-cheap-o gigabit ethernet NIC.
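A toy sketch of that combination, strict-priority bands over SFQ-style flow hashing. The band and flow_id parameters stand in for the DSCP/802.1p bits and the flow 5-tuple; real implementations live in qdiscs and hardware, not application code:

```python
from collections import deque

class PrioSfq:
    """Toy model: strict-priority bands with SFQ-style fairness inside each."""

    def __init__(self, n_bands=3):
        self.bands = [{} for _ in range(n_bands)]   # flow bucket -> deque

    def enqueue(self, band, flow_id, pkt):
        bucket = hash(flow_id) % 1024               # SFQ hashes the flow tuple
        self.bands[band].setdefault(bucket, deque()).append(pkt)

    def dequeue(self):
        for band in self.bands:                     # band 0 (highest) first
            for bucket in list(band):
                pkt = band[bucket].popleft()
                if band[bucket]:
                    band[bucket] = band.pop(bucket) # rotate flow to the back
                else:
                    del band[bucket]
                return pkt
        return None
```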
Routers are a different deal altogether.
Posted Feb 26, 2011 17:58 UTC (Sat)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Any delay in the routers adds to the overall delay and thus adds to the bandwidth-delay product. In essence your endpoint's memory usage is the sum of the memory used by all the routers in between. Packets spend more time in memory than they do in-flight.
There was a paper somewhere about buffers and streams: the more streams you have, the fewer buffers you need. So your endpoints need big buffers, your modem smaller buffers, and internet routers practically none.
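The result being alluded to is the sqrt(n) buffer-sizing rule; rough numbers, with an assumed link speed and RTT:

```python
import math

bdp_bytes = 10e9 / 8 * 0.25   # assume a 10 Gbit/s link and 250 ms worst-case RTT
for flows in (1, 100, 10_000):
    need = bdp_bytes / math.sqrt(flows)
    print(f"{flows:>6} flows: {need / 1e6:8.2f} MB of buffer")
# One flow needs the full ~312 MB BDP; ten thousand flows need ~3 MB --
# which is why heavily multiplexed core routers can run with tiny buffers.
```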
Posted Feb 27, 2011 16:33 UTC (Sun)
by mtaht (subscriber, #11087)
[Link]
http://www.cs.clemson.edu/~jmarty/papers/PID1154937.pdf
Wireless is not part of this study and has special problems (retries)...