Gettys: Traditional AQM is not enough
Many have understood bufferbloat to be a problem that primarily occurs when a saturating 'elephant flow' is present on a link; it is easiest to test for bufferbloat this way, but this is not the only problem we face. The dominant application, the World Wide Web, is anti-social to any other application on the Internet, and its collateral damage is severe. Solving the latency problem, therefore, requires a two-pronged attack.
Posted Jul 11, 2013 16:34 UTC (Thu)
by mtaht (subscriber, #11087)
[Link] (22 responses)
I would like to know what barriers remain, at this point, to getting this teeny patch, turning on fq_codel by default, into the Linux mainline.
http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0003...
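For anyone who wants to try the effect of that patch without rebuilding a kernel, here is a minimal sketch (assuming a Linux box with iproute2, root privileges, and a kernel built with CONFIG_NET_SCH_FQ_CODEL; the interface name is a placeholder) that swaps the root qdisc at runtime instead of changing the compiled-in default:

```python
import subprocess

def set_fq_codel(dev):
    """Replace the root qdisc on `dev` with fq_codel.

    `tc qdisc replace` works whether or not a qdisc is already
    installed, so this is safe to run repeatedly. Requires root.
    """
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", dev, "root", "fq_codel"],
        check=True,
    )

if __name__ == "__main__":
    set_fq_codel("eth0")  # placeholder interface name
```

The patch linked above goes further: it makes fq_codel the built-in default, so every interface comes up that way with no userspace configuration at all.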
Posted Jul 11, 2013 18:03 UTC (Thu)
by darthscsi (guest, #8111)
[Link] (16 responses)
Posted Jul 11, 2013 18:25 UTC (Thu)
by roskegg (subscriber, #105)
[Link] (13 responses)
Posted Jul 11, 2013 19:08 UTC (Thu)
by jg (guest, #17537)
[Link] (9 responses)
Posted Jul 11, 2013 19:14 UTC (Thu)
by roskegg (subscriber, #105)
[Link] (8 responses)
But even sustained disk I/O of any kind makes the system... jittery. The nightly locatedb run, for instance.
It is the same problem: bulk transfer versus a small throughput hit for small latency. As a desktop user I want no latency for keyboard and mouse, and the throughput is "good enough".
Could there be a sysctl to enable and disable this type of queueing for everything -- page swapping, disk I/O, etc. -- to make it smoother?
Even something as simple as bittorrent finishing a download and "verifying" the gigs of downloaded material makes everything shudder for a few seconds.
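There is no single sysctl that covers all of this today, but the disk side of the complaint can at least be softened per process: Linux I/O schedulers such as CFQ honor per-process I/O priorities, and util-linux ships ionice to set them. A hedged sketch (the commands are real; running updatedb this way is just an example):

```python
import subprocess

# Run the bulk I/O job (updatedb, as in the locatedb example above) in
# the "idle" I/O scheduling class (-c 3): it gets disk time only when
# nothing else wants it, which keeps interactive use smooth.
subprocess.run(["ionice", "-c", "3", "updatedb"], check=True)
```

The same trick applies to a bittorrent client's verify pass, if you can arrange for the client to be started under ionice.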
Posted Jul 11, 2013 19:38 UTC (Thu)
by jg (guest, #17537)
[Link] (5 responses)
1) There are HOL (Head of Line) blocking latency problems in display systems too, and some heritage left over from XFree86 brain damage that has not, to my knowledge, been completely stamped out in X.org and that causes bad CPU scheduling behavior.
In short, operating systems like to give the CPU back to processes that are not "greedy": ones that use small amounts of compute and go back to sleep (the OS can easily identify this behavior as "interactive").
So if you aren't very careful about busy-waiting in user space on graphics device registers, and about yielding to the OS in a timely fashion, display servers (whether X.org, Wayland or Mir) get identified as pigs, don't get preferentially scheduled, and therefore tend to suffer at the hands of other processes.
The original X implementations (before XFree86) usually got this right. X.org, the last I looked, not so much....
2) It turns out that there is another similarity to network devices (particularly those with highly variable throughput, such as wireless): the command ring buffers display hardware uses are very easy to load up with a long string of commands so that the hardware can operate autonomously. It's really tempting, for benchmarking's sake, to keep the number of commands outstanding to the display high. But that can kill latency and cause lots of HOL waiting.
In networking, at least if you know the current throughput (which Minstrel can give us for 802.11 someday) you can begin to choose how much buffering is appropriate to get the most bang for the buck without destroying latency.
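The arithmetic behind "how much buffering is appropriate" is just a rate-delay product: buffer bytes ≈ throughput × the queuing latency you are willing to tolerate. A toy illustration (the 5 ms target is an arbitrary assumption, not something Minstrel reports):

```python
def buffer_bytes(throughput_bps, target_latency_s=0.005):
    """Bytes of backlog that drain within the latency target."""
    return int(throughput_bps / 8 * target_latency_s)

# 802.11 throughput swings over two orders of magnitude,
# so the right amount of buffering swings with it.
for mbps in (1, 10, 100):
    print(f"{mbps:3} Mbit/s -> {buffer_bytes(mbps * 1_000_000):6} bytes")
# 1 Mbit/s   ->   625 bytes: less than one full-size packet
# 100 Mbit/s -> 62500 bytes: roughly forty 1500-byte packets
```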
But graphics has a yet harder problem: how long any given command will take is simply not knowable. Even simple commands may take radically different amounts of execution time, and with today's 3D engines there are Turing-complete processors out there that may never even terminate. So from that perspective, controlling display latency is harder than the problems we have in networking, at least until GPUs grow into full processors running multiple threads themselves (which seems to be the long-term trend). Again, it's really easy to get this wrong if you only do throughput benchmarking; you gotta measure latency.
I think careful attention to 1) above, and care in the number of commands allowed to be outstanding in 2) above, could go a long way to making the Linux display environment (whether X11 or otherwise) work much better.
But first, measure; ***then*** go fix....
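As a concrete (and entirely hypothetical: none of these names correspond to any real driver) sketch of the fix for 2), here is the pattern of capping the number of commands in flight with a semaphore while measuring the queuing delay each command actually saw:

```python
import queue
import random
import threading
import time

MAX_OUTSTANDING = 4            # illustrative cap on in-flight commands
inflight = threading.Semaphore(MAX_OUTSTANDING)
ring = queue.Queue()           # stands in for the hardware command ring

def gpu(stop):
    """Pretend hardware: drains the ring one command at a time."""
    while not stop.is_set() or not ring.empty():
        try:
            submitted, cost = ring.get(timeout=0.01)
        except queue.Empty:
            continue
        # First, measure: how long did this command sit in the queue?
        print(f"queued for {time.monotonic() - submitted:.4f}s")
        time.sleep(cost)       # execution time is "not knowable"
        inflight.release()

def submit(cost):
    inflight.acquire()         # blocks once MAX_OUTSTANDING are queued,
    ring.put((time.monotonic(), cost))  # bounding the HOL wait

stop = threading.Event()
hw = threading.Thread(target=gpu, args=(stop,))
hw.start()
for _ in range(20):
    submit(random.uniform(0.001, 0.02))
stop.set()
hw.join()
```

With the cap at 4, the worst-case queuing delay is bounded by roughly four command times; raise MAX_OUTSTANDING to 100 and total throughput barely changes while the measured queuing delay explodes, which is exactly the benchmarking trap described above.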
Posted Jul 12, 2013 13:10 UTC (Fri)
by jg (guest, #17537)
[Link] (4 responses)
The HOL blocking problems are, as I discussed, non-trivial. People interested in tackling those problems, to the extent they can be attacked, would I'm sure be greatly appreciated. Given the dynamic range of graphics hardware performance, it's an "interesting" optimization problem.
Developers of all sorts need to understand that latency is at least as important as throughput, and should be routinely testing for it. I greatly applaud all the work so many have done toward getting Linux and the desktop to boot quickly; over the last 5 years, boot times seem to have dropped from on the order of a minute to well under 10 seconds, only some of which is due to SSDs.
Posted Jul 12, 2013 16:39 UTC (Fri)
by roskegg (subscriber, #105)
[Link] (3 responses)
Your article reminds me of a classic: William Beaty's writing on traffic waves. He found that he could clear up traffic jams fairly easily. What I took away from his research is that if you back off from shoving yourself forward as much as possible (bulk transfer) and let others pop in front of you, most of the time they are transients who just wanted to change lanes to reach their exit, or who needed to get in from the on-ramp and transfer to another lane.
So, just like traffic in real life, I think that if you look at the overall picture, overall throughput will be optimal if individual processes give up the "greedy" approach and let the transients pop in, do their thing, and vanish.
When there are no transients, the big apps can still do mega block transfers and the kernel can still aggregate them.
I hope that by combining the insights of Bill Beaty and the CoDel method, disk access can be fixed so that bittorrent and updatedb no longer destroy latency and freeze my desktop.
Posted Jul 13, 2013 10:47 UTC (Sat)
by kleptog (subscriber, #1183)
[Link] (1 responses)
I figure there's some monitoring system in the back nipping all these traffic waves in the bud. I suspected it was something like that, but now I understand why it works.
Posted Jul 20, 2013 5:04 UTC (Sat)
by giraffedata (guest, #1954)
[Link]
I hadn't heard of variable speed limit signs, but I can see that that would work. The best device I've seen for eliminating stop-and-go traffic jams is the "Don't change lanes" sign that lights up when there is congestion.
I don't think any of this automobile traffic theory helps us at all with network traffic, though, because the automobile traffic unpleasantness is caused by two things that have no analog on data networks: 1) multiple lanes that cars can switch between continuously; and 2) following distance psychology. If all freeways had solid lines between the lanes except for occasional openings and cars had computers to ensure constant distance from the car ahead, there wouldn't be any stop-and-go traffic. You especially wouldn't see a car come to a stop because another one two miles down the road slowed down 3 mph to look at an accident.
Posted Jul 23, 2013 0:31 UTC (Tue)
by Baylink (guest, #755)
[Link]
I do it whenever possible, and I've found that, on urban interstates, I'm not the only one. Didn't realize it had a website, though.
Posted Jul 11, 2013 21:40 UTC (Thu)
by cesarb (subscriber, #6266)
[Link] (1 responses)
Posted Jul 12, 2013 4:42 UTC (Fri)
by roskegg (subscriber, #105)
[Link]
Posted Jul 11, 2013 19:26 UTC (Thu)
by darthscsi (guest, #8111)
[Link] (2 responses)
Posted Jul 11, 2013 19:50 UTC (Thu)
by dlang (guest, #313)
[Link] (1 responses)
Posted Jul 11, 2013 20:31 UTC (Thu)
by darthscsi (guest, #8111)
[Link]
Posted Jul 11, 2013 19:02 UTC (Thu)
by jg (guest, #17537)
[Link] (1 responses)
Posted Jul 11, 2013 20:42 UTC (Thu)
by mtaht (subscriber, #11087)
[Link]
Otherwise it is very hard to configure tc correctly for a wide range of devices, notably multiqueued ones, and in combination with htb. This has been the primary barrier to adoption by bleeding-edge distros. (OpenWrt went the kernel-patch route, which appears to be working out well.)
I will propose a patch in a week or so that does that, rather than just arbitrarily switching the default.
Posted Jul 11, 2013 19:00 UTC (Thu)
by jg (guest, #17537)
[Link] (2 responses)
Note that the fq_codel qdisc is not a panacea: don't believe that all your bufferbloat problems will go away just because it becomes the default.
There is lots of driver work left to do, particularly for 802.11, DSL, and cellular. Some of these drivers are closed source, and getting fixes deployed everywhere is going to take a long time even after the drivers have been fixed.
Posted Jul 11, 2013 19:52 UTC (Thu)
by dlang (guest, #313)
[Link] (1 responses)
Exactly. Changing the default is a big step; fq_codel hasn't even been out long enough to make it into any of the enterprise distros as an option, let alone having any of them turn it on by default.
As such, it's probably premature for the upstream kernel to turn it on by default.
Posted Jul 11, 2013 20:02 UTC (Thu)
by jg (guest, #17537)
[Link]
The big point in favor of changing the default in kernel.org (after more than a year upstream, I'll note) is just how terrible PFIFO_FAST turns out to be, as shown by Toke's CDF plot in that blog entry and all the other measurements we've made. We're orders of magnitude from where we know we can be. The improvement is *not* subtle.
But changing a default *is* a serious step, and until we get everyone in a room to talk it through, expressing all the pros and cons and concerns, it's not a decision to be taken lightly.
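In the spirit of "first, measure": a minimal sketch of the kind of data behind such a CDF plot. It collects ping RTTs while you separately load the link (say, with a large upload), then prints percentiles. The ping flags are standard iputils; the target host is a placeholder:

```python
import re
import subprocess

HOST = "example.net"  # placeholder: ping something past your bottleneck
out = subprocess.run(
    ["ping", "-c", "100", "-i", "0.2", HOST],
    capture_output=True, text=True, check=True,
).stdout

# iputils prints one "time=12.3 ms" per reply; collect and sort them.
rtts = sorted(float(m) for m in re.findall(r"time=([\d.]+) ms", out))
for pct in (50, 90, 99):
    print(f"p{pct}: {rtts[len(rtts) * pct // 100 - 1]:.1f} ms")
```

Run it once on an idle link and once while saturating it; the gap between the two p99 values is your bufferbloat.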
Posted Jul 11, 2013 22:22 UTC (Thu)
by gdt (subscriber, #6284)
[Link] (1 responses)
My criticism of the proposed default is the lack of explicit provision for control-plane traffic. The very HTTP behaviour fq_codel is designed to punish (ten-odd packets back-to-back) is exactly the pattern of most network control planes. If you starve the control plane then you blackhole: you lose connectivity despite the link being available. Note that the largest fq_codel deployment had to add a queue for control-plane traffic, so that deployment can't be used as an argument for unmodified fq_codel as the default.
Posted Jul 11, 2013 22:38 UTC (Thu)
by mtaht (subscriber, #11087)
[Link]
https://lists.bufferbloat.net/pipermail/codel/2013-July/0...
I'm trying to make it possible to have "a default other than pfifo_fast" at the moment.
In most of the rest of the universe (and I'm talking android, desktops, home routers, most servers), the "control plane" consists of one or two packets at a time: basic routing/ARP/ND sorts of packets, which work just fine (and in the general case, better than with pfifo_fast).
I do agree that retaining a three-tier model has some uses, but very few; most of my concern is in coming up with a structure for implementing it efficiently, generally, and deployably. I don't quite buy the control plane argument if it is only 10 packets...
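For what it's worth, a three-tier structure along those lines can already be expressed with stock iproute2. Here is a hedged sketch (the interface name and the classifier are illustrative) that puts an fq_codel instance on each band of a prio qdisc and steers IP-precedence-6 packets, the marking routing daemons typically use, into the top band:

```python
import subprocess

DEV = "eth0"  # placeholder interface

def tc(*args):
    subprocess.run(["tc", *args], check=True)

# A three-band prio root, with its own fq_codel on every band.
tc("qdisc", "replace", "dev", DEV, "root", "handle", "1:", "prio")
for band in (1, 2, 3):
    tc("qdisc", "add", "dev", DEV, "parent", f"1:{band}", "fq_codel")

# Steer control-plane traffic (IP precedence 6, e.g. routing
# protocols) into band 1 so it can never be starved by bulk flows.
tc("filter", "add", "dev", DEV, "parent", "1:", "protocol", "ip",
   "prio", "1", "u32", "match", "ip", "tos", "0xc0", "0xe0",
   "flowid", "1:1")
```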
Posted Jul 11, 2013 19:28 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Posted Jul 11, 2013 20:30 UTC (Thu)
by jg (guest, #17537)
[Link]
Fq_codel is not the end, nor all of the solution; it is a (very good) step in dealing with bufferbloat.
But tons of work remains in drivers and elsewhere.... The perfect is the enemy of the good; draining a swamp this size will take quite a few years yet.
Posted Jul 12, 2013 8:16 UTC (Fri)
by mchazaux (guest, #64024)
[Link] (1 responses)
Posted Jul 12, 2013 13:14 UTC (Fri)
by jg (guest, #17537)
[Link]
And UDP apps can fill buffers just as easily as TCP can (often faster).
It turns out that isolating flows lets the mark/drop policy of CoDel function much better in the face of uncontrolled UDP applications that are unresponsive to signalling (by mark or drop), so the two sides of fq_codel are synergistic.
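A toy simulation of that synergy (pure illustration; per-flow drop-tail stands in for CoDel's per-flow marking, and none of this is the kernel's actual algorithm): one unresponsive flow blasts ten packets per tick, one light flow sends one, and the link drains five. A shared FIFO buries the light flow; round-robin between per-flow queues keeps its delay at zero:

```python
from collections import deque

TICKS, DRAIN, CAP = 100, 5, 30

def arrivals(t):
    # "udp" is unresponsive (10 pkts/tick); "light" sends 1 pkt/tick.
    return [("udp", t)] * 10 + [("light", t)]

def fifo():
    q, delays = deque(), []
    for t in range(TICKS):
        for pkt in arrivals(t):
            if len(q) < CAP:                 # shared drop-tail buffer
                q.append(pkt)
        # Once full, the blaster's packets claim every free slot,
        # so the light flow is both delayed and starved by drops.
        for _ in range(min(DRAIN, len(q))):
            flow, born = q.popleft()
            if flow == "light":
                delays.append(t - born)
    return delays

def fq():
    qs = {"udp": deque(), "light": deque()}
    delays = []
    for t in range(TICKS):
        for flow, born in arrivals(t):
            if len(qs[flow]) < CAP // 2:     # per-flow backlog limit
                qs[flow].append(born)
        sent = 0
        while sent < DRAIN and any(qs.values()):
            for flow in ("light", "udp"):    # round-robin the flows
                if qs[flow] and sent < DRAIN:
                    born = qs[flow].popleft()
                    if flow == "light":
                        delays.append(t - born)
                    sent += 1
    return delays

for name, d in (("FIFO", fifo()), ("FQ", fq())):
    print(f"{name}: light flow got {len(d)} of {TICKS} packets through, "
          f"mean delay {sum(d) / len(d):.1f} ticks")
```

The unresponsive flow still overflows its own queue under FQ, but now it is the only casualty of its own behavior; that is the isolation half, and it is what gives CoDel's drops a chance to act on the flows that actually respond to them.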
Posted Jul 23, 2013 0:22 UTC (Tue)
by Baylink (guest, #755)
[Link]
(Is that what it was even called? Back in '96 or so I was Chief at a teeny ISP -- we fit 60 dialups into a 256kb/s frame relay link to Texas, which then uplinked through a T-1.
We were actually pretty decent, at that point in the net's evolution... except when netPhone came out. It was UDP, you see, and disrespected TCP flow control, and 4 people melted my entire link.
I didn't like to do it, but I had to ban it.)
solving the second problem in the general case is just a PITA.
- Jim