
LPC: An update on bufferbloat

Posted Sep 13, 2011 20:11 UTC (Tue) by ncm (guest, #165)
Parent article: LPC: An update on bufferbloat

I wonder if chasing bufferbloat is barking up the wrong tree. If TCP doesn't tolerate generous buffers, surely that means there's something wrong with TCP, and we need to fix that.

One approach that has been wildly successful, commercially, is to ignore packet loss as an indicator of congestion, and instead measure changes in transit time. As queues get longer, packets spend more time in them, and arrive later, so changes in packet delay provide a direct measure of queue length, and therefore of congestion. Given reasonable buffering and effective filtering, this backs off well before forcing packet drops. Since it doesn't require router or kernel coöperation, it can be (and, indeed, has been) implemented entirely in user space, but a kernel implementation could provide less-noisy timing measurements.
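A rough sketch of that delay-based idea, assuming a hypothetical sender that gets one RTT sample per ACK (the class, names, and constants here are purely illustrative, not any shipping implementation):

    # Hypothetical delay-based backoff: treat growth of the smoothed RTT over
    # the lowest RTT ever observed as a sign that queues are filling, and
    # shrink the sending window before any packet is actually dropped.
    class DelayBasedSender:
        def __init__(self, target_queue_delay=0.025):   # tolerate ~25 ms of queueing
            self.target_queue_delay = target_queue_delay
            self.base_rtt = float('inf')   # lowest RTT seen ~ propagation delay
            self.srtt = None               # smoothed RTT (simple EWMA filter)
            self.cwnd = 10.0               # congestion window, in packets

        def on_ack(self, rtt_sample):
            self.base_rtt = min(self.base_rtt, rtt_sample)
            self.srtt = rtt_sample if self.srtt is None else 0.9 * self.srtt + 0.1 * rtt_sample
            queue_delay = self.srtt - self.base_rtt    # estimated time spent sitting in queues
            if queue_delay > self.target_queue_delay:
                self.cwnd = max(2.0, self.cwnd * 0.9)  # back off before drops happen
            else:
                self.cwnd += 1.0 / self.cwnd           # probe gently while queues stay short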

Maximum performance depends on bringing processing in the nominal endpoints (e.g. disk channel delays) into the delay totals, but below a Gbps that can often be neglected.



LPC: An update on bufferbloat

Posted Sep 13, 2011 21:21 UTC (Tue) by dlang (guest, #313) [Link]

The problem isn't when there is just one application communicating; the problem is when there are multiple applications going through the same device.

In that case, some applications won't care at all about latency because they are doing a large file transfer, so as long as the delays are not long enough to cause timeouts (30 seconds+), they don't care. These applications want to dump as much data into the pipe as possible so that the total throughput is as high as possible.

The problem comes when you have another application that does care about latency, or only has a small amount of data to transmit. This application's packets go into the queue behind the packets of the application that doesn't care about latency, and since the queue is FIFO, the latency-sensitive application can time out (with single-digit-second timeouts) before its packets get sent.

Since different applications care about different things, this is never going to be fixed in the applications.

LPC: An update on bufferbloat

Posted Sep 14, 2011 5:59 UTC (Wed) by cmccabe (guest, #60281) [Link]

Here's a concrete example. If your internet is provided through a cable company, there is a box somewhere owned by that company. The box connects you and a bunch of your neighbors to some bigger uplink.

There's a buffer there that all of your packets are going to have to wait in before getting serviced. It doesn't matter how carefully you measure changes in transit time. If your neighbors are downloading big files, they are probably going to fill that buffer to the brim and you're going to have to wait a length of time proportional to total buffer size. Your latency will be bad.

LPC: An update on bufferbloat

Posted Sep 14, 2011 10:03 UTC (Wed) by epa (subscriber, #39769) [Link] (15 responses)

To the layman it does seem odd that 'packet loss is essential' as a way to signal congestion. Maybe each packet should have a bit which says 'I would have dropped this to tell you to back off a bit, but I delivered it anyway because I'm such a nice guy'. Then the endpoints could take notice of this in the same way as they normally notice dropped packets, but without the need to retransmit the lost data.

Essential, but not the first line

Posted Sep 14, 2011 11:36 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (14 responses)

Sure, that's roughly ECN as I understand it.

But if you just have a "congested bit" what happens is that you can "optimise" by half-implementing it, you just send as much as you can, and all your data gets "politely" delivered with an ignorable flag, while any poor schmuck sharing the link who listens to the flag throttles back more and more trying to "share" with a bully who wants everything for themselves.

This is a race to the bottom, so any congestion handling needs to be willing to get heavy handed and just drop packets on the floor so that such bullies get lousy performance and give up. "Essential" means it's a necessary component, not the first line of defence.

QoS for unfriendly networks has to work the same way. If you just have a bit which says "I'm important" all the bullies will set it, so you need QoS which offers real tradeoffs like "I'd rather be dropped than arrive late" or "I am willing to suffer high latency if I can get more throughput".

Essential, but not the first line

Posted Sep 14, 2011 17:58 UTC (Wed) by cmccabe (guest, #60281) [Link] (12 responses)

I feel like having two QoS types, bulk and interactive, would solve 99% of the problems real applications have.

There are a lot of applications where you really just don't care about latency at all, like downloading a software update or retrieving a large file. And then there are applications like instant messenger, Skype, and web browsing, where latency is very important.

If bulk really achieved higher throughput, and interactive really got reasonable latency, I think applications would fall in line pretty quickly and nobody would "optimize" by setting the wrong class.

The problem is that there's very little real competition in the broadband market, at least in the US. The telcos tend to regard any new feature as just another cost for them. Even figuring out "how much bandwidth will I get?" or "how many gigs can I download per month?" is often difficult. So I don't expect to see real end-to-end QoS any time soon.

Essential, but not the first line

Posted Sep 15, 2011 13:18 UTC (Thu) by joern (guest, #22392) [Link] (10 responses)

Now it would be nice if there were a good heuristic to determine whether a packet is bulk or interactive. SSH might seem interactive, but my multi-gigabyte scp copies sure aren't. HTTP will often be interactive, unless initiated by wget. Or unless the "web page" is a kernel image or some similar large structure.

TCP actually has a good heuristic. If a packet gets lost, the connection is sending faster than the available bandwidth allows and has to back off a bit. If no packets get lost, it will use a bit more bandwidth. With this simple mechanism, it can adjust to any network speed, fairly rapidly adapt to changing network speeds, etc.
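A toy sketch of that loss-driven behaviour, i.e. additive increase, multiplicative decrease (illustrative names only, not taken from any real stack):

    # Toy AIMD loop: grow the window a little for every round trip that
    # completes without loss, halve it whenever a loss is detected.
    class AimdWindow:
        def __init__(self):
            self.cwnd = 1.0              # congestion window, in packets

        def on_round_trip_ok(self):
            self.cwnd += 1.0             # additive increase: probe for more bandwidth

        def on_packet_loss(self):
            self.cwnd = max(1.0, self.cwnd / 2.0)   # multiplicative decrease: back off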

Until you can come up with a similarly elegant heuristic that doesn't involve decisions like "ssh, but not scp, unless scp is really small", consider me unconvinced. :)

Essential, but not the first line

Posted Sep 16, 2011 16:14 UTC (Fri) by sethml (guest, #8471) [Link] (9 responses)

I think simply tracking how many bytes have been transferred over a given TCP connection in the past, say, 250 ms would be a decent measure of the importance of low latency. A TCP connection which has only transferred a small amount of data recently is likely either an http connection I just initiated or an interactive ssh connection, and either way, low latency is probably desirable. A TCP connection with a large amount of recent data is probably a larger download, and cares more about bandwidth than latency. I think this sort of adaptive approach, which requires no application-level changes, is the only sort of approach likely to work in the real world.

Unfortunately my scheme requires routers to track TCP connection state, which might be prohibitively expensive in practice on core routers.
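A sketch of that rule, assuming a device that is willing to keep per-flow state (the window and threshold values are made up for illustration):

    import time
    from collections import defaultdict, deque

    RECENT_WINDOW = 0.25          # look at the last 250 ms of traffic per flow
    BULK_THRESHOLD = 64 * 1024    # flows above ~64 KB in that window count as bulk

    class FlowClassifier:
        def __init__(self):
            self.history = defaultdict(deque)   # flow id -> (timestamp, byte count) samples

        def record(self, flow_id, nbytes, now=None):
            now = time.monotonic() if now is None else now
            hist = self.history[flow_id]
            hist.append((now, nbytes))
            while hist and hist[0][0] < now - RECENT_WINDOW:
                hist.popleft()                  # forget samples older than the window

        def is_interactive(self, flow_id, now=None):
            now = time.monotonic() if now is None else now
            recent = sum(n for t, n in self.history[flow_id] if t >= now - RECENT_WINDOW)
            return recent < BULK_THRESHOLD      # quiet flows get latency priority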

Essential, but not the first line

Posted Sep 16, 2011 21:03 UTC (Fri) by piggy (guest, #18693) [Link]

Quite a few years ago some friends and I set up a test network based on interlocking rings of modems. One direction around the ring was "high throughput" and the other was "low latency". We sent full-MTU packets in the high-throughput direction and runts in the other. For our toy loads it seemed to be a pretty good heuristic.

Essential, but not the first line

Posted Sep 22, 2011 5:28 UTC (Thu) by cmccabe (guest, #60281) [Link] (7 responses)

I don't think short connection = low latency, long connection = high throughput is a good idea.

Due to the 3-way handshake, TCP connections which only transfer a small amount of data have to pay a heavy latency penalty before sending any data at all. It seems pretty silly to ask applications that want low latency to spawn a blizzard of tiny TCP connections, all of which will have to do the 3-way handshake before sending even a single byte. Also, spawning a blizzard of connections tends to short-circuit even the limited amount of fairness that you currently get from TCP.

This problem is one of the reasons why Google designed SPDY. The SPDY web page explains that it was designed "to minimize latency" by "allow[ing] many concurrent HTTP requests to run across a single TCP session."
(See http://www.chromium.org/spdy/spdy-whitepaper)

Routers could do deep packet inspection and try to put packets into a class of service that way. This is a dirty hack, on par with flash drives scanning the disk for the FAT header. Still, we've been stuck with even dirtier hacks in the past, so who knows.

I still feel like the right solution is to have the application set a flag in the header somewhere. The application is the one that knows. Just to take your example, ssh does know whether the input it's getting is coming from a tty (interactive) or a file that's been catted to it (non-interactive). And scp should probably always be non-interactive. You can't deduce this kind of information at a lower layer, because only the application knows.
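For what it's worth, an application can already express this hint on Linux by setting the TOS/DSCP byte on its socket. A minimal sketch (the two constants are the classic RFC 1349 "low delay" and "throughput" TOS bits, roughly what ssh's IPQoS knob controls; the function name is made up):

    import socket

    IPTOS_LOWDELAY   = 0x10   # classic TOS "minimize delay" bit
    IPTOS_THROUGHPUT = 0x08   # classic TOS "maximize throughput" bit

    def make_marked_socket(interactive):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        tos = IPTOS_LOWDELAY if interactive else IPTOS_THROUGHPUT
        # Every packet sent on this socket carries the chosen mark; routers
        # that honour TOS/DSCP can then queue interactive and bulk traffic
        # differently.  Routers that ignore it see no difference at all.
        s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        return s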

I guess there is this thing in TCP called "urgent data" (aka OOB data), but it seems to be kind of a vermiform appendix of the TCP standard. Nobody has ever been able to explain to me just what an application might want to do with it that is useful...

Essential, but not the first line

Posted Sep 22, 2011 8:23 UTC (Thu) by kevinm (guest, #69913) [Link]

I have heard of exactly one use for TCP URG data: a terminal emulator sending a BREAK to a remote system immediately after the user types it, allowing it to jump ahead of other keystrokes that may still be on their way.

Essential, but not the first line

Posted Sep 22, 2011 17:20 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

> I still feel like the right solution is to have the application set a flag in the header somewhere. The application is the one that knows. Just to take your example, ssh does know whether the input it's getting is coming from a tty (interactive) or a file that's been catted to it (non-interactive). And scp should probably always be non-interactive. You can't deduce this kind of information at a lower layer, because only the application knows.

And SSH can do just this: if DISPLAY is unset and SSH is running without a terminal, it sets the QoS bits for a bulk transfer; otherwise, it sets them for an interactive transfer. Unfortunately scp doesn't unset DISPLAY, so if you run scp from inside an X session I suspect it always gets incorrectly marked as interactive... but that's a small thing.

Essential, but not the first line

Posted Sep 23, 2011 6:44 UTC (Fri) by salimma (subscriber, #34460) [Link] (1 responses)

Can't you do "DISPLAY= scp ..." ?

Essential, but not the first line

Posted Sep 23, 2011 10:57 UTC (Fri) by nix (subscriber, #2304) [Link]

Yes, but my point is that you shouldn't have to. scp should pass in "-o IPQoS throughput" by default. (Speaking as someone who, er, hasn't written the patch to make it do so.)

Essential, but not the first line

Posted Sep 23, 2011 0:52 UTC (Fri) by njs (subscriber, #40338) [Link] (2 responses)

> I don't think short connection = low latency, long connection = high throughput is a good idea.

I don't think anyone does? (I've had plenty of multi-day ssh connections; they were very low throughput...)

I think the idea is that if some connection is using *less* than its fair share of the available bandwidth, then it's reasonable to give it priority latency-wise. If it could have sent a packet 100 ms ago without being throttled, but chose not to, then it's pretty reasonable to let the packet it sends now jump ahead of all the other packets that have arrived in the last 100 ms; it'll end up at the same place as it would have if the flow were more aggressive. So it should work okay, and naturally gives latency priority to at least some of the connections that need it more.
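A rough sketch of that "quiet flows may jump the queue" rule, assuming a per-flow scheduler with two tiers (purely illustrative; real fair-queueing schedulers are considerably more careful about fairness and state):

    from collections import deque

    class SparseFlowScheduler:
        """Packets from flows with no standing backlog go into a priority tier."""

        def __init__(self):
            self.priority = deque()   # packets from flows that were quiet when they arrived
            self.bulk = deque()       # packets from flows that already had packets queued
            self.backlogged = set()   # flow ids with at least one packet still queued

        def enqueue(self, flow_id, packet):
            if flow_id in self.backlogged:
                self.bulk.append((flow_id, packet))       # aggressive flow: wait in line
            else:
                self.priority.append((flow_id, packet))   # quiet flow: let it go first
                self.backlogged.add(flow_id)

        def dequeue(self):
            queue = self.priority if self.priority else self.bulk
            if not queue:
                return None
            flow_id, packet = queue.popleft()
            still_queued = any(f == flow_id for f, _ in self.priority) or \
                           any(f == flow_id for f, _ in self.bulk)
            if not still_queued:
                self.backlogged.discard(flow_id)          # flow drained; it is quiet again
            return packet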

Essential, but not the first line

Posted Sep 23, 2011 9:39 UTC (Fri) by cmccabe (guest, #60281) [Link] (1 responses)

Penalizing high bandwidth users is kind of an interesting heuristic. It's definitely better than penalizing long connections, at least!

However, I think you're assuming that all the clients are the same. This is definitely not the case in real life. Also, not all applications that need low latency are low bandwidth. For example, video chat can suck up quite a bit of bandwidth.

Just to take one example: if I'm the cable company, I might have some customers with a 1.5 Mbit/s download and others with 6.0 Mbit/s. Assuming they all go into one big router at some point, the 6.0 Mbit/s guys will obviously be using more than their "fair share" of the uplink from this box. Maybe I can be super clever and account for this, but what about the next router in the chain? It may not even be owned by my cable company, so it's not going to know the exact reason why some connections are using more bandwidth than others.

Maybe there's something I'm not seeing, but this still seems problematic...

Essential, but not the first line

Posted Sep 24, 2011 1:30 UTC (Sat) by njs (subscriber, #40338) [Link]

Well, that's why we call it a heuristic :-) It can be helpful even if it's not perfect. A really tough case is flows that can scale their bandwidth requirements but value latency over throughput -- for something like VNC or live video you'd really like to use all the bandwidth you can get, but latency is more important. (I maintain a program like this, bufferbloat kicks its butt :-(.) These should just go ahead and set explicit QoS bits.

Obviously the first goal should be to minimize latency in general, though.

The QoS lost cause

Posted Sep 29, 2011 21:40 UTC (Thu) by marcH (subscriber, #57642) [Link]

> I feel like having two QoS types, bulk and interactive, would solve 99% of the problems real applications have.

Interesting, but never going to happen. The main reason TCP/IP is successful is that QoS is optional in theory and non-existent in practice.

The end-to-end principle states that the network should be as dumb as possible. This is at the core of the design of TCP/IP. It notably allows interconnecting any network technologies, including the least demanding ones. The problem with this approach is: as soon as you have the cheapest and dumbest technology somewhere in your path (think: basic Ethernet), there is a HUGE incentive to align your other network section(s) on this lowest common denominator (think... Ethernet), because the advanced features and efforts you paid for in the more expensive sections are wasted.

Suppose you have the perfect QoS settings implemented in only a few sections of your network path (as many posts in this thread do). As soon as the traffic changes and causes your current bottleneck (= non-empty queue) to move to another, QoS-ignorant section, all your QoS dollars and configuration efforts become instantly wasted. Policing empty queues has no effect.

An even more spectacular way to waste time and money with QoS in TCP/IP is to have different network sections implementing QoS in ways not really compatible with each other.

The only cases where TCP/IP QoS can be made to work are when a *single* entity has tight control over the entire network; think for instance VoIP at the corporate or ISP level. And even there I suspect it does not come cheap. In other cases, bye bye QoS.

Essential, but not the first line

Posted Sep 16, 2011 9:22 UTC (Fri) by ededu (guest, #64107) [Link]

What about the following simple solution? The buffer is divided in two, a 5% part and a 95% part; the first allows low latency (since it is small), the latter achieves high throughput (since it is big). The sender sets a bit in each packet to choose into which buffer part the packet will be put. The router serves them in round-robin fashion (one packet from the 1st part, one packet from the 2nd, one from the 1st, one from the 2nd, etc.).

(As an optimisation, the 5% part can be used for overflow when it still has free space and the 95% part is full.)
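A toy sketch of that split buffer (sizes and names are made up; the drop policy and the overflow rule follow the description above):

    from collections import deque

    TOTAL_SLOTS = 1000
    SMALL_SLOTS = TOTAL_SLOTS * 5 // 100      # the 5% low-latency part
    LARGE_SLOTS = TOTAL_SLOTS - SMALL_SLOTS   # the 95% high-throughput part

    class SplitBuffer:
        def __init__(self):
            self.small = deque()   # packets whose sender set the low-latency bit
            self.large = deque()   # everything else
            self.turn = 0          # round-robin pointer: 0 = small, 1 = large

        def enqueue(self, packet, low_latency_bit):
            if low_latency_bit and len(self.small) < SMALL_SLOTS:
                self.small.append(packet)
            elif len(self.large) < LARGE_SLOTS:
                self.large.append(packet)
            elif len(self.small) < SMALL_SLOTS:
                self.small.append(packet)      # the overflow optimisation mentioned above
            # else: both parts are full, so the packet is dropped

        def dequeue(self):
            # Serve the two parts alternately: one packet from each in turn,
            # falling back to the other part when one is empty.
            for _ in range(2):
                queue = self.small if self.turn == 0 else self.large
                self.turn ^= 1
                if queue:
                    return queue.popleft()
            return None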

