At the very least Gettys got a wicked T-shirt.

Posted Sep 13, 2011 21:24 UTC (Tue) by dlang (guest, #313)
In reply to: At the very least Gettys got a wicked T-shirt. by man_ls
Parent article: LPC: An update on bufferbloat

the problem is that there is no one right answer to what the settings should be, and the right settings depend on the sizes of pipes (and useage generated by other people) in between you and your destination.

right now there is no queuing or QoS setting or algorithm that is right for all situations

At the very least Gettys got a wicked T-shirt.

Posted Sep 13, 2011 21:43 UTC (Tue) by man_ls (guest, #15091) [Link] (8 responses)

There is no algorithm for the general situation, but there is a very good approximation for the endpoints, right? Prioritize your outgoing packets, limit your download rate to a little less than the available bandwidth for your incoming packets. That is what Gettys calls "dropping packets", which should be a good thing and not too hard to do in the OS.

At the very least Gettys got a wicked T-shirt.

Posted Sep 13, 2011 21:58 UTC (Tue) by dlang (guest, #313) [Link] (7 responses)

you can't limit your download rate if you don't know the size of the pipes.

if you have a machine plugged in to a gig-E network in your house, that is then connected to the Internet via a 1.5 down/768 up DSL line, your machine has no way of knowing what the bandwidth that it should optimize for is.

the prioritization and queuing fixes need to be done on your router to your ISP, and on the ISPs router to you.

you can't fix the problem on your end because by the time you see the downloaded packets they have already gotten past the chokepoint where they needed to be prioritized.

At the very least Gettys got a wicked T-shirt.

Posted Sep 13, 2011 22:08 UTC (Tue) by man_ls (guest, #15091) [Link] (6 responses)

Hmmm, true. What my router could do is drop packets for any traffic beyond 90% of its capacity; my computer in turn could drop packets for anything beyond 90% of its gig-E nominal capacity. Anything beyond my control, I cannot do anything but order a bufferbloat T-shirt.

At the very least Gettys got a wicked T-shirt.

Posted Sep 13, 2011 22:19 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

what the bufferbloat folks are trying to do is to find an QoS algorithm that works and if nothing else, get it so that the ISPs (and their upstream hardware suppliers) configure their routers with defaults that do not create as big a problem.

if you think about this, the ISP router will have a very high speed connection to the rest of the ISP, and then a lot of slow speed connections to individual houses.

having a large buffer is appropriate for the high speed pipe, and this will work very well if the traffic is evenly spread across all the different houses.

but if one house generates a huge amount of traffic (they download a large file from a very fast server), the buffer can be filled up by the traffic to this one house. that will cause all traffic to the other houses to be delayed (or dropped if the buffer is actually full), and having all of this traffic queued at the ISPs local router does nobody very much good.

TCP is designed such that in this situation, the ISPs router is supposed to drop packets for this one house early on and the connection will never ramp up to have so much data in flight.

but by having large buffers, the packets are delayed a significant amount, but not dropped, so the sender keeps ramping up to higher speeds.

the fact the vendors were not testing latency and bandwith at the same time hid this fact. the devices would do very well in latency tests that never filled the buffers, and they would do very well in throughput tests that used large parts of the buffers. without QoS providing some form of prioritization, or dropping packets, the combination of the two types of traffic is horrible.

At the very least Gettys got a wicked T-shirt.

Posted Sep 14, 2011 21:09 UTC (Wed) by jg (guest, #17537) [Link]

Actually, we need a AQM algorithm more than QOS; all QOS can do is change who suffers when, not prevent the buffers in some queue or another from filling. To do that, you need to signal congestion in a timely way via packet loss or ECN.

At the very least Gettys got a wicked T-shirt.

Posted Sep 14, 2011 2:35 UTC (Wed) by jg (guest, #17537) [Link] (3 responses)

There is a lot you can do with your router and knowledge about bufferbloat.

I'm going to try to write a blog posting soon about what is probably the best strategy for most to do.

And the tee shirt design is in Cafe Press, though I do want to add a footnote to the "fair queuing" line, that runs something like "for some meaning of fair", in that I don't really mean the traditional TCP fair queuing necessarily (whether that is fair or not is in the eye of the beholder).

At the very least Gettys got a wicked T-shirt.

Posted Sep 14, 2011 7:52 UTC (Wed) by mlankhorst (subscriber, #52260) [Link]

Yeah bufferbloat is a pain. I patched openwrt to include SFB (stochastic fair blue), then created a simple traffic shaping script based on the approximate upload limits of my dsl2+ router. I no longer get multisecond delays when the network is actually used in the way it's supposed to. :)

Enabled ECN where possible too, which helps with the traffic shaping method I was using. Windows XP doesn't support it, and later versions disable it by default.

Time, not bandwidth/delay, is the key

Posted Sep 14, 2011 14:53 UTC (Wed) by davecb (subscriber, #1574) [Link] (1 responses)

Some practical suggestions will be much appreciated.

Changing the subject slightly, there's a subtle, underlying problem in that we tend to work with what's easy, not what's important.

We work with the bandwidth/delay product because it's what we needed in the short run, and we probably couldn't predict we'd need something more ta the time. We work with buffer sizes because that's dead easy.

What we need is the delay, latency and/or service time of the various components. It's easy to deal with performance problems that are stated in time units and are fixed by varying the times things take. It's insanely hard to deal with performance problems when all we know is a volume in bytes. It's a bit like measuring the performance of large versus small cargo containers when you don't know if they're on a truck, a train or a ship!

If you expose any time-based metrics or tuneables in your investigation, please highlight them. Anything that looks like delay or latency would be seriously cool.

One needs very little to analyze this class of problems. Knowing the service time of a packet, the number of packets, and the time between packets is sufficient to build a tiny little mathematical model of the thing you measured. From the model you can then predict what happens when you improve or disimprove the system. More information allows for more predictive models, of course, and eventually to my mathie friends becoming completely unintelligable (;-))

--dave (davecb@spamcop.net) c-b

Time, not bandwidth/delay, is the key

Posted Sep 14, 2011 21:01 UTC (Wed) by jg (guest, #17537) [Link]

Most of the practical suggestions are in my blog already; I will try to pull something a bit more howtoish together.

You are exactly correct that any real solution for AQM must be time based; the rate of draining a buffer and the rate of growth of a buffer are related to the incoming and outing data per unit time.

As you note, not all bytes are created equal; the best example is in 802.11 where a byte in a multicast/broadcast packet can be 100 times more expensive than a unicast payload.

Thankfully, in miac80211 is a package called Minstrel, which is, on an on-going dynamic basis, keeping track of the costs of each packet (802.11n aggregation in particular makes this "interesting".

So the next step is to hook up appropriate AQM algorithms to it such as eBDP or the "RED Light" algorithm that Kathie Nichols and Van Jacobson are again trying to make work. John Linvile's quick reimplementation of eBDP (the current patch is in the debloat-testing tree) does not do this as yet and can't go upstream in it's current form for this and other reasons. eBDP seems to help as Van predicted it should (he pointed me at it in January), but we've not tested it much as
yet.

The challenge after that is going to be to get that all working while dealing with all the buffering issues along the way in the face of aggregation and QOS classification. There are some fun challenges for those who want to make this all work well; it's at least a three dimensional problem, so there will be no easy trivial solution ultimately, and will be a challenge. It's way beyond my understanding of Linux internals.

Please come help!