A damp discussion of network queuing
When Jim Gettys discovered and named bufferbloat some years ago, he had stumbled onto a set of problems that networking developers had been aware of for years. But nobody quite understood the extent of the problem or how it could affect everyday networking. In short, bufferbloat comes about when one or more players in the networking pipeline buffer far more data than they should. The user-visible results can include degraded download speeds, uploading not working at all, and high latencies — all with no packet loss.
The latency issue is problematic at a number of levels. If you are trying
to provide remote display service, 15ms latencies will ruin the usability of the
system. At 100ms delay, voice over IP protocols cease to work well. Users
will generally hit the "reload" button (or give up) on a web page load
after about one
second. Bufferbloat can create latencies far longer than that. But it's
the lack of packet loss that is the real problem; without dropped packets,
the TCP congestion control algorithms cannot do their job. So the
networking stack keeps trying to send more data when the proper response is
to slow down and let the queues drain.
So who is to blame for the bufferbloat problem? One possible response, Stephen said, was to blame Linux. After all, Windows XP limited TCP connections to 64KB of outstanding data; there is not much buffering happening there. Windows 7, instead, added a rate limiter that would throttle all connections. Android has done something similar, adding a limit on the size of the receive window for any connection. Linux developers tend not to be enamored with artificial limits, so Linux users get to experience the full pain of the bufferbloat problem.
An alternative, he said, is to blame the customer. That is what Internet service providers like Comcast did when they went through a period of blaming their biggest customers for their networking problems. Comcast went as far as capping bandwidth use and charging extra in some cases. But the real problem was not those customers; it was bufferbloat in the provider's internal network.
Getting wet
At this point the aquatic games began. Stephen put together a set of
demonstrations where a network queue was represented by an inverted plastic
bottle. The bottle could hold a fair amount of water (packets), but there
are limits on how quickly the water can drain out. So if water arrives
more quickly than the bottle can drain, the bottle begins to fill. If the
bottle is quite full, a drop of water added at the top will take a long
time to reach the opening and exit the bottle — especially if the bottle is
large. Bufferbloat, thus, was represented as bottlebloat.
In the real world, network queuing systems are more complicated than a single bottle, though. The default queuing discipline in Linux employs three parallel bottles of varying sizes; one for bulk traffic, one for high-priority traffic, and one for everything else ("normal" traffic). Almost all traffic goes through the normal bottle; SSH can use the high-priority queue, while Dropbox and BitTorrent use the bulk queue. It was an OK idea for its time, Stephen said, but it does not work on today's net. Those three bottles do nothing to prevent excess buffering.
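For those curious which discipline their own machine is using, the tc utility will report it; a quick check, not part of Stephen's talk and with "eth0" standing in for whatever interface you have, might look like:

# Show the queuing discipline attached to an interface; on an unmodified
# system this will normally report pfifo_fast and its three priority bands.
tc qdisc show dev eth0
# Add -s to include statistics such as packets sent, drops, and overlimits.
tc -s qdisc show dev eth0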
The first attempt to come up with a smarter solution was the RED queue management algorithm. RED was represented by poking a bunch of holes into a (red, naturally) bottle. Once the water level in the bottle goes above the holes, water escapes through those holes and is lost; that corresponds to dropping packets in the real world. Rather than dropping packets, RED can set the explicit congestion notification (ECN) bit on packets passing through, notifying the sender (by way of the receiver's acknowledgments) that it needs to slow down the connection. It's a nice idea, but the net broke it. Routers will drop packets with ECN set, or, worse, simply reset the bit. As a result, Linux "will play the game," Stephen said, but only if the other side initiates it. The networking developers just do not want to deal with complaints about dropped connections.
A different approach is called "hierarchical token bucket"; it looks like a bunch of small bottles all connected in parallel. Each type of traffic gets its own bottle (queue), and packets are dispatched equally from all queues. The problem with this mechanism is that it requires a great deal of configuration to work well. That might be manageable on a server with a static workload, but it is not useful on desktop systems.
An alternative is stochastic fair queuing (SFQ). The same set of small bottles is used, but each network connection is assigned to its own bottle by way of a hash function. No configuration is required. SFQ can make things work better, but it is not a full solution to the bufferbloat problem; it was the state of the art in the Linux kernel about five years ago.
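For reference, classic SFQ can still be attached by hand with tc; this is just an illustrative sketch (again using "eth0" as a placeholder), not something from the talk:

# Hash each flow into its own queue; "perturb 10" re-seeds the hash every
# ten seconds so that unlucky hash collisions do not persist.
tc qdisc replace dev eth0 root sfq perturb 10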
In an attempt to come up with a smarter solution, Kathie Nichols and Van
Jacobson created the "Controlled Delay" or CoDel algorithm. CoDel looks
somewhat like RED, in that it starts to drop packets when buffers get too
full. But CoDel works by looking at the packets at the head of the queue —
those which have been through the entire queue and are about to be
transmitted. If they have been in the queue for too long, they are simply
dropped. The general idea is to try to maintain a maximum delay in the
queue of 5ms (by default). CoDel has a number of good properties, Stephen
said; it drops packets quickly (allowing TCP congestion control algorithms
to do their thing) and maintains reasonable latencies. The "fq_codel"
algorithm adds an SFQ dispatching mechanism in front of the CoDel queue,
maintaining fairness between network connections.
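To make those defaults concrete, here is roughly how fq_codel can be attached by hand with its target delay spelled out; a sketch only, with "eth0" as a placeholder and the values shown being the usual defaults:

# Attach fq_codel as the root qdisc. "target" is the maximum acceptable
# queue delay CoDel aims for, "interval" is the window over which it is
# measured, and "ecn" prefers marking over dropping where possible.
tc qdisc replace dev eth0 root fq_codel target 5ms interval 100ms ecn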
Stephen noted that replacing a queue with something like fq_codel is a good thing, but one should remember that there are a lot of queues in a typical system. Only the one with the smallest hole (the slowest outgoing link) matters in the end, since that's where the packets will accumulate.
After a discussion of how most network benchmarking utilities look at the wrong thing (one should examine upload speed, download speed, and latency simultaneously), he put up a set of plots showing how the network responds to load with the various queuing mechanisms. The results clearly showed that CoDel solves the bufferbloat problem well, and that fq_codel does even better.
So, are we there yet? As noted, there are a lot of queues in a typical network path, and not all of them have been addressed. Different techniques are needed at different levels. For excessive buffering at the socket layer, for example, TCP small queues can be used. A bigger problem is Internet service providers, which tend to have large amounts of legacy equipment in their racks. There is not a lot the networking developers can do about that. Still, it helps to be running the best software locally. So Stephen encouraged everybody to run a command like:
sysctl -w net.core.default_qdisc=fq_codel
That will cause fq_codel to be used for all interfaces brought up thereafter (until the next reboot). Unfortunately, the kernel's default queuing discipline cannot simply be changed to fq_codel, since doing so would certainly disturb some user's workload somewhere.
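As commenters note below, the sysctl can be made persistent with a file under /etc/sysctl.d, and an interface that is already up can be switched directly with tc; the file and interface names here are only examples:

# Persist the new default across reboots.
echo 'net.core.default_qdisc=fq_codel' > /etc/sysctl.d/bufferbloat.conf
# The default only takes effect when an interface is brought up, so switch
# an already-running interface explicitly:
tc qdisc replace dev eth0 root fq_codel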
The good, the bad, and the ugly
Stephen concluded by saying that there are good, bad, and ugly parts to bufferbloat and the efforts to solve it. On the good side, the industry is aware of the problem. Bufferbloat is routinely talked about at IETF meetings, and researchers are working on it. Perhaps best of all, the solutions are all open source. In some cases (CoDel for example), open-source publication was deliberately chosen to forestall the adoption of patent-encumbered techniques.
On the bad side, there is a lot of legacy equipment and software out there. Original equipment manufacturers, Stephen said, are focused on cost, not on queue management details. So a lot of equipment out there — especially consumer-level equipment — is bad and will stay that way for some years yet. There are also issues with backbone congestion, but they tend to be more political than technical in nature.
The ugly part is wireless networking, which has a bunch of unique buffering problems of its own. Packet aggregation, for example, can help with bandwidth, but creates latency problems. Wireless systems are mostly using proprietary software and are never updated. Standards bodies are starting to pay attention, Stephen said, but a solution in this area is distant.
Even with the bad and ugly parts, though, the message was mostly positive,
if a bit damp. Quite a bit of good work has been done to address the
bufferbloat problem, and that work has shown up
first as free software. Bufferbloat will be with us for a while yet, but
solutions are far closer than they were a few years ago.
Index entries for this article
Kernel: Networking/Bufferbloat
Conference: Linux Plumbers Conference/2014
Posted Oct 15, 2014 21:55 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Oct 15, 2014 22:22 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (3 responses)
Like I already wrote a million times, a single thing would be enough for things to change and improve much quicker: if http://www.speedtest.net/ (or similar) were to finally (re-)implement MTR. Simply because zillions of speedtest screenshots are posted and made available daily. While cost is one focus, losing customers because of a bad reputation is another very important one.
netalyzr comes the closest but still way too technical for the masses.
However I'm afraid speedtest.net on the one hand and bufferbloat researchers and coders on the other hand live in two different parallel universes.
Posted Oct 16, 2014 9:23 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link]
They do already have IPv6 speeds for example (even though 99% of the UK still doesn't have IPv6 in 2014), and they show both a single link download rate and a multi-HTTP download which can indicate if there is something throttling specific TCP connections rather than your bandwidth as a whole.
Posted Oct 17, 2014 3:44 UTC (Fri)
by mtaht (subscriber, #11087)
[Link] (1 responses)
That change to these tests alone would send a message to millions of people that there is a tradeoff between bandwidth and latency that currently is biased far, far, far too much towards the bandwidth side of the equation.
Relevant thread is here, letter in progress, multiple folk have agreed to sign. (sorry for the busted cert)
https://lists.bufferbloat.net/pipermail/bloat/2014-Septem...
Posted Oct 26, 2014 16:02 UTC (Sun)
by marcH (subscriber, #57642)
[Link]
As people already answered there, to make it work you would also need a quick and dirty "netalizer lite" site that does only one thing and explains it and does it well: measuring latency while downloading and uploading. Only after such a site is implemented and published under the name http://www.speedtest-is-telling-lies.net would the open letter (published on the same site) stand a chance to make any difference.
Posted Oct 16, 2014 0:29 UTC (Thu)
by rfunk (subscriber, #4054)
[Link] (8 responses)
Posted Oct 16, 2014 5:11 UTC (Thu)
by cglass (guest, #52152)
[Link] (7 responses)
Posted Oct 16, 2014 7:07 UTC (Thu)
by mtaht (subscriber, #11087)
[Link] (3 responses)
1) fq_codel looks more like "DRR++" + codel. A SFQ + codel implementation exists in ns2 and ns3, but not linux as yet.
2) It is now on by default in openwrt Barrier Breaker, and part of CeroWrt's SQM system, openwrt's qos-scripts, and many other third party router firmwares. It's also part of qualcomm's streamboost and netgear's dynamic QoS in their new X4 product.
The effects of applying a qos script enabled with fq_codel on a router against nearly every ISP technology are outstanding. Cable result: http://snapon.lab.bufferbloat.net/~cero2/jimreisert/resul...
3) It "does no harm" on servers and clients, and can often do good. I'd like it very much if a desktop oriented distro tried a switch by default. There are some exceptions, notably a good hi precision clock source is needed.
4) While the effects on wireless are not as good as we'd like, it does take the edge off the worst of the problems there.
Try it! On any modern linux you can toss that sysctl into an /etc/sysctl.d/bufferbloat.conf file....
Posted Oct 17, 2014 17:17 UTC (Fri)
by nix (subscriber, #2304)
[Link] (2 responses)
Hm actually I think you mentioned this a few months ago on the cerowrt list. Great minds think alike etc etc.
Posted Oct 17, 2014 19:09 UTC (Fri)
by mtaht (subscriber, #11087)
[Link] (1 responses)
http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes...
I do hope that the more companies realize that BQL support is essential to high performance (I'm looking at *you*, Arm, Cisco, AMD, and Xilinx and a dozen others), the more BQL drivers (with xmit_more support) will land on everything. Certainly nearly all the 10GigE makers "get it", but that knowledge has not fully propagated down into the older and slower devices...
http://www.bufferbloat.net/projects/bloat/wiki/BQL_enable...
There is a paper in progress on how much BQL helps - answer, quite a lot - while we (in the bufferbloat world) know this, that sort of stuff needs to land on CTO and academic and driver developer desks.
I wrote up some of the issues in adding BQL to a device driver here; I had planned to write a tutorial but haven't got around to it.
https://lists.bufferbloat.net/pipermail/bloat/2014-June/0...
So far as I recall the via rhine was updated to BQL recently, but will check.
Posted Oct 21, 2014 16:25 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Oct 19, 2014 14:33 UTC (Sun)
by BenHutchings (subscriber, #37955)
[Link] (2 responses)
Posted Oct 21, 2014 18:12 UTC (Tue)
by jheiss (subscriber, #62556)
[Link] (1 responses)
> sysctl net.core.default_qdisc
sysctl: cannot stat /proc/sys/net/core/default_qdisc: No such file or directory

On one with 3.14 from backports:

> sysctl net.core.default_qdisc
net.core.default_qdisc = pfifo_fast
Posted Oct 21, 2014 19:29 UTC (Tue)
by BenHutchings (subscriber, #37955)
[Link]
tc qdisc replace dev eth0 root fq_codel
Posted Oct 16, 2014 5:46 UTC (Thu)
by krivenok (guest, #42766)
[Link] (3 responses)
Posted Oct 19, 2014 6:47 UTC (Sun)
by sitaram (guest, #5959)
[Link] (2 responses)
Posted Oct 19, 2014 13:32 UTC (Sun)
by corbet (editor, #1)
[Link] (1 responses)
Posted Oct 19, 2014 23:28 UTC (Sun)
by sitaram (guest, #5959)
[Link]
Posted Oct 16, 2014 8:03 UTC (Thu)
by iq-0 (subscriber, #36655)
[Link] (14 responses)
But TCP still reacts badly to some hop along the way performing bad buffering, which can be seen as significantly increased latency. Wouldn't it be just as helpful if (for TCP) you'd focus more on changes in latency in addition to watching for dropped packets?
This would not be so different from how certain bittorrent clients automatically limit their upload/download speeds to prevent either one from clogging up the other (and even other network traffic).
The "optimum" latency is connection specific (or at least specific between endpoints) and could change over time, but the change in latency is probably more relevant than the actual latency itself for such an algorithm to work.
Posted Oct 16, 2014 8:42 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
Posted Oct 16, 2014 9:38 UTC (Thu)
by iq-0 (subscriber, #36655)
[Link] (2 responses)
You need to do both of course (since you don't necessarily have bufferbloat problems). A significant increase could be considered equal to a dropped packet, thus as an indication of congestion (only not involving a retransmit, obviously).
The big problem would be to identify when the latency has increased *not* due to bufferbloat along the path.
This is of course not perfect, but by dynamically reacting to increased latency along the path (as a possible indication of buffer bloat) that you probably can't fix, you can prevent yourself from contributing to the problem.
And while you will probably lose to people not playing nicely, your network responsiveness will probably increase, which is often just as relevant for the perceived network quality.
Posted Oct 16, 2014 9:47 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
But even if only latency-based actors are active on the network then there are still pathological behaviors when one actor can accidentally hog others' bandwidth. Or wild oscillations of the bandwidth that require explicit dampening that defeats the purpose of latency-based control.
Quite a few algorithms have been tried since the introduction of the TCP timestamp option, but as far as I know none of them helped much. I think people even tried to use a neural network predictor for the window size (it helped but required too much computational capacity) - http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber... .
Posted Oct 17, 2014 4:35 UTC (Fri)
by mtaht (subscriber, #11087)
[Link]
AQM (codel in this case) regulates TCP and encapsulated traffic better. Nearly nobody advocates codel by itself. (Many DO advocate AQM by itself.)
FQ solves multiple flows competing against each other. It's way more effective than AQM in most circumstances. (There are a lot of FQ advocates that think FQ solves everything - but I'm not one of them. No matter how many buckets you have for flows, it still pays to keep queue lengths as short as possible.)
(For an eloquent defense of the properties of FQ, see: https://www.lincs.fr/events/fair-flows-fair-network-attra...)
ECN solves packet loss being the sole indicator of congestion problem. ECN can now be negotiated with 60% or more of the alexa top 1m.
A delay based TCP would work fairly well if deployed universally (but it is impossible to have a flag day!), however the bottleneck router has a more intimate view of current congestion and can do smarter things than a TCP. Most delay based TCPs (and uTP) start backing off at 100ms of induced delay which is rather late. There is some very interesting new work out there on bufferbloat aware TCP's, notably "fq + pacing".
Furthermore, FQ+AQM technology can be (and is being) deployed incrementally.
With a FQ'd network, delay based TCPs *could* begin to back off long before 100ms of buffering is reached.
A delay based AQM works on all traffic and turns delay based TCPs back into packet loss or ECN marked congestion control.
http://perso.telecom-paristech.fr/~drossi/paper/rossi13tm...
I'm hoping that clears matters up a bit.
Posted Oct 17, 2014 3:34 UTC (Fri)
by mtaht (subscriber, #11087)
[Link] (9 responses)
A lot of people are wedded to the idea that pfifo_fast's prioritization features accomplish useful stuff; that is rare in today's environment, where multiple, unclassified TCP flows dominate. Thus, I encourage people to turn on fq_codel by default everywhere and run a few tests... I certainly have been running fq_codel everywhere for 2+ years now, and my network is measurably faster, smoother and less annoying under load.
Things like usb networking and wifi improve quite a bit. Ethernet gets better (way better if you have BQL in your drivers).
Here's a 100Mbit result on ethernet at 100Mbit, baseline pfifo, fq_codel/fq and fq_codel with w/wo BQL:
http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes...
GigE with pfifo vs fq_codel with and without offloads.
http://snapon.lab.bufferbloat.net/~cero2/nuc-to-puck/resu...
Still, once you have all the above working, prioritization (notably deprioritization) can help a bit more, and there is work on a new qdisc (tentatively called cake) that adds a few tiers of prioritization on top of DRR + fq_codel as well as an optional rate limiter.
There is also a specific-to-servers sch_fq in mainline linux now.
None of the above means that a given distro should continue to wait to switch away from pfifo_fast to fq_codel as the default. There are nearly no circumstances where pfifo_fast has better network behavior than fq_codel.
https://kau.toke.dk/modern-aqms/
It was certainly my, and stephen hemminger's, and much of the bufferbloat community's hope that distros would start to make the switch once the sysctl landed. We certainly continue to evolve things - sch_fq is now a very good choice for a bare metal web server (but not a vm), for example.
But I hope we've now made a set of compelling arguments that pfifo_fast must die!
Posted Oct 18, 2014 13:09 UTC (Sat)
by nix (subscriber, #2304)
[Link] (8 responses)
So... any scripting? (Not that I can really use it yet, since as mentioned previously my firewall's NIC doesn't have BQL support yet -- but in future, it would be nice if I could arrange for my networking gear to not be bufferbloated to death. I think this means fixing my firewall and ditching the ADSL routers it's connected to and replacing them with cerowrt-capable routers or something like that -- which would be beneficial anyway, since I'd be able to have them reliably communicate the state of the ADSL link to the firewall, which could pull up/down the components of the multipath route appropriately. Right now I'm relying on horrible hacks like looking at passing inbound packets and *hoping* they come from the Internet rather than, say, the ADSL router's administrative interface, and falling back on periodic pings if none are seen... all quite horrible.)
Posted Oct 21, 2014 4:57 UTC (Tue)
by dlang (guest, #313)
[Link] (7 responses)
What are you trying to set up? fq_codel and BQL can be enabled at compile time and need no scripts to function.
Cerowrt has additional configuration scripts for other benefits (artificially limiting outbound traffic to make this box the bottleneck rather than allowing an upstream router to be the bottleneck, and working to limit inbound bandwidth usage); these are the things that are hard to set up.
But just enabling fq_codel helps, as does BQL, and they work well when combined.
Posted Oct 21, 2014 16:27 UTC (Tue)
by nix (subscriber, #2304)
[Link] (6 responses)
I used to use wondershaper for this but it's completely bitrotted and doesn't work at all any more.
Posted Oct 22, 2014 4:16 UTC (Wed)
by dlang (guest, #313)
[Link] (5 responses)
The difficulty for outbound traffic is in automating the discovery of what your available bandwidth is.
Posted May 15, 2015 0:23 UTC (Fri)
by nix (subscriber, #2304)
[Link] (4 responses)
commit 92bf200881d978bc3c6a290991ae1f9ddc7b5411
Author: Tino Reichardt <milky-kernel@mcmilk.de>
Date: Tue Feb 24 10:28:01 2015 -0800

    net: via-rhine: add BQL support

    Add Byte Queue Limits (BQL) support to via-rhine driver.

    [edumazet] tweaked patch and changed TX_RING_SIZE from 16 to 64

    Signed-off-by: Tino Reichardt <milky-kernel@mcmilk.de>
    Tested-by: Jamie Gloudon <jamie.gloudon@gmail.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
So those of us with Rhine-based NICs finally have access to the codel and fq_codel goodness. And what goodness! It's as zero-configuration as advertised: with no configuration at all, even without any outbound traffic shaping and with unaltered, probably bloated-as-hell queues inside the ADSL modems, all my bufferbloat symptoms have silently vanished and my line is smooth and usable under load, with ping times on a saturated line only a few ms higher than on an idle one. Now that's a *nice* qdisc!
mtaht et al have done a really really *really* good job here. I can see why every distro is jumping on this as fast as they can.
Posted May 16, 2015 17:55 UTC (Sat)
by mtaht (subscriber, #11087)
[Link] (3 responses)
However I must note that your excellent result was probably due to fq_codel taking advantage of hardware flow control exerted by the DSL modem, which then is seen by fq_codel as delay and managed appropriately, where pfifo_fast would just keep buffering until it hits the packet limit.
Most edge devices today do not exert hardware flow control. Certainly I feel that {dsl,cable}modems should use hardware flow control! It is a sane signal that can also be aware of congestion on media. But nearly everybody put switches, rather than ethernet devices in the path here, over the past 5 years, and lost that capability. So we have generally had to use software rate limiting (sqm-scripts) to succeed here... or to push for more hardware flow control (or smarter modems)
Posted May 19, 2015 19:23 UTC (Tue)
by nix (subscriber, #2304)
[Link] (2 responses)
Thank you for a most excellent qdisc, anyway!
Posted May 19, 2015 19:40 UTC (Tue)
by dlang (guest, #313)
[Link] (1 responses)
they are commonly a system running linux with an ADSL modem and an ethernet connected to a 4-port switch.
Unfortunately they usually are using a binary driver for the DSL side, so getting them supported by OpenWRT is hard :-(
If they were based on the current OpenWRT instead of a several-year-old one, they would be using fq_codel by default.
Posted May 20, 2015 10:15 UTC (Wed)
by paulj (subscriber, #341)
[Link]
Posted Oct 16, 2014 10:02 UTC (Thu)
by rvolgers (guest, #63218)
[Link] (3 responses)
This is why we can't have nice things, Linux. Sheesh. Guess it's up to distros to change the default.
Posted Oct 18, 2014 13:09 UTC (Sat)
by thestinger (guest, #91827)
[Link] (2 responses)
http://lists.freedesktop.org/archives/systemd-devel/2014-...
Posted Oct 21, 2014 21:21 UTC (Tue)
by bronson (subscriber, #4806)
[Link] (1 responses)
Posted Oct 21, 2014 21:26 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Oct 16, 2014 11:32 UTC (Thu)
by Richard_J_Neill (subscriber, #23093)
[Link] (2 responses)
1. Wifi which is limited by interference, rather than by congestion, i.e. where a significant fraction of packets will get dropped, not because of other traffic but because of interference/noise or distance from the AP. TCP causes the client to back off, when in fact, I think it should be more aggressive, and perform forward-error-correction: i.e. the client should assume that many packets will not make it, and it should re-transmit everything twice or more within 100ms. (This is especially true for UDP, e.g. for DHCP IP allocation over a network with 50% packet loss, it's nearly impossible to get the link established.)
2. Ajax stuff over https. The problem here is that each Ajax connection (after Apache has finished the keepalive) requires a complete cycle of re-establishing all the encryption layers, even if the actual data is only tiny. It would be useful to have some way to keep an https session alive for many minutes.
(The combination of Ajax, https, and slightly non-ideal wifi results in a horrible experience!)
Posted Oct 16, 2014 11:58 UTC (Thu)
by JGR (subscriber, #93631)
[Link]
Much of the interference/noise is traffic for other APs or traffic for other 2.4GHz protocols. Sending even more data makes the noise problem worse.
If you're getting 50% packet loss, then you'd be better off fixing that rather than trying to work round with client fudges (i.e. change radio channel, move/add APs, change/move antennae, etc.).
As for 2, as I understand it HTTP 2 solves this.
Posted Oct 17, 2014 15:13 UTC (Fri)
by grs (guest, #99211)
[Link]
Posted Oct 16, 2014 12:25 UTC (Thu)
by michich (guest, #17902)
[Link] (5 responses)
Posted Oct 16, 2014 12:35 UTC (Thu)
by TomH (subscriber, #56149)
[Link] (3 responses)
Only I tried more or less exactly that last night on a Fedora 20 machine and have so far failed to get it to work.
First off the "all connections" thing, which I know came from the original article, makes no sense, as that setting applies to network interfaces, not to individual connections.
So in order to take effect it needs to be set at the point when an interface is created - well actually I think from my current testing that it needs to be set when the interface is first brought up.
What I found when I set that in a dropin in /etc/sysctl.d and rebooted was that not only did it not get applied to my interfaces, but it didn't even manage to actually change the sysctl value! Oddly restarting systemd-sysctl.service did at least cause the sysctl value to be changed, but of course by then it is too late to affect the interfaces which are already up.
My current hypothesis (based on complete guesswork - need to look at the kernel source next) is that systemd-sysctl.service is setting it too early when it runs during the boot and something is resetting it afterwards.
Posted Oct 16, 2014 13:38 UTC (Thu)
by michich (guest, #17902)
[Link] (2 responses)
Posted Oct 16, 2014 20:30 UTC (Thu)
by TomH (subscriber, #56149)
[Link] (1 responses)
That is, as best I can see, exactly what is happening to me. If I add a file in /etc/module-load.d to make sure sch_fq_codel is preloaded then everything works.
Why the module is failing to load when triggered by the kernel as a result of the sysctl write is not clear, however.
Posted Oct 16, 2014 20:41 UTC (Thu)
by TomH (subscriber, #56149)
[Link]
So your proposed change to systemd is fine, it will just need a corresponding change in the Fedora selinux policy.
Posted Oct 17, 2014 3:37 UTC (Fri)
by mtaht (subscriber, #11087)
[Link]
There are (a very few) caveats to switching away from pfifo_fast. Please feel free to contact us over at cerowrt-devel or the codel list to discuss.
More importantly, run your own benchmarks against the results (measuring latency simultaneously with load), or try ours (netperf-wrapper's rrul tests in particular).
Posted Oct 17, 2014 9:53 UTC (Fri)
by paulj (subscriber, #341)
[Link] (5 responses)
I believe jg has explained that on LWN before, e.g. see comments in http://lwn.net/Articles/418918/ .
Posted Oct 20, 2014 7:55 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (4 responses)
http://thread.gmane.org/gmane.linux.network/6366/focus=11785
However I think this got eventually lost in the "agitation of life"; unlike jg I had neither fame nor a very catchy name for it :-)
---
(Good) words are incredibly important, it's funny how so many engineers don't realize they make all the difference.
http://martinfowler.com/bliki/TwoHardThings.html
Posted Oct 20, 2014 8:18 UTC (Mon)
by paulj (subscriber, #341)
[Link] (3 responses)
It's amazing that the default txqueuelen got bumped up to 1000 for all interfaces. It's even more amazing that this is *still* the default, even on wifi devices 10 years later. :(
1000 packet queues are just insane, even on high-speed links.
I've had "for H in <list of devices> ; do ip link set dev $H qlen 5; done" in my rc.local for quite a while. Unfortunately though it doesn't apply to devices brought up post-boot by, e.g., NetworkManager. I havn't yet looked into how to make NM set the qlen.
Posted Oct 23, 2014 2:54 UTC (Thu)
by dcbw (guest, #50562)
[Link] (2 responses)
http://cgit.freedesktop.org/NetworkManager/NetworkManager...
more information in 'man NetworkManager'.
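One way to get the same effect for devices that NetworkManager brings up later is a dispatcher script; this is only a sketch of that approach (the file name is arbitrary) and may not be what the link above describes:

#!/bin/sh
# /etc/NetworkManager/dispatcher.d/90-qlen (example name)
# NetworkManager runs dispatcher scripts with the interface name as $1 and
# the event as $2; shrink the transmit queue whenever a device comes up.
if [ "$2" = "up" ]; then
    ip link set dev "$1" qlen 5
fi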
Posted Dec 1, 2014 20:57 UTC (Mon)
by paulj (subscriber, #341)
[Link]
Posted Dec 1, 2014 22:51 UTC (Mon)
by paulj (subscriber, #341)
[Link]
Posted Oct 20, 2014 4:28 UTC (Mon)
by fmarier (subscriber, #19894)
[Link] (1 responses)
> So Stephen encouraged everybody to run a command like: sysctl -w net.core.default_qdisc=fq_codel

Interestingly enough, the bufferbloat.net wiki recommends fq instead for everything except routers: "For host (rather than router) based queue management, we recomend sch_fq instead of fq_codel as of linux 3.12, for tcp-heavy workloads."
Posted Oct 20, 2014 15:41 UTC (Mon)
by mtaht (subscriber, #11087)
[Link]
fq_codel is a good general purpose default, no matter the workload.
sch_fq is better on servers "for tcp heavy workloads". It has been tuned for >10GigE, in particular, and does some really nice stuff with pacing.
On hosts, at lower speeds, on reverse traffic, it's not clearcut, and it seems to be a lose on wifi presently (but wifi has many other problems), and it's the wrong thing on routers entirely.
Please note I'd be just as happy if either one became the linux default and pfifo_fast went the way of the dodo.
I'd be happiest if the "right" qdisc were always chosen, and more work went into choosing sane defaults for things like tcp small queues, txqueuelen (if you must stick with pfifo_fast), TSO/GSO sizes, etc. for when you are running at rates below 10GigE.
Posted Nov 13, 2014 13:02 UTC (Thu)
by TimSmall (guest, #96681)
[Link]
This made me wonder if the default Linux setting of "Enable ECN when requested by incoming connections but do not request ECN on outgoing connections." should be changed?
It will be interesting to see if MS stick with this on-by-default behaviour in the next release of Windows Server and/or push it into their desktop releases - a quick web search shows the new default in Windows Server 2012 has caused problems for at least one user:
http://hardforum.com/showthread.php?t=1805750
My assumption is that most things which currently break with ECN are NAT and firewall boxes, which leads me to wonder whether Linux should request ECN by default for outgoing IPv6 connections, since I hope fewer IPv6 connections play badly with ECN.
At the moment, the ECN behaviour of Linux IPv6 TCP is controlled by the value of the sysctl variable net.ipv4.tcp_ecn (which in itself is a bit surprising) - and there's no way to control IPv6 behaviour independently of IPv4 behaviour.
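For reference, that default corresponds to a tcp_ecn value of 2; checking and changing it looks like this (the value meanings follow the kernel documentation):

# 2 (the default): accept ECN when the peer requests it, but do not request
# it on outgoing connections. 1: request ECN on outgoing connections as
# well. 0: disable ECN entirely. As noted above, this knob covers IPv6 too.
sysctl net.ipv4.tcp_ecn
sysctl -w net.ipv4.tcp_ecn=1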
> There are some exceptions, notably a good hi precision clock source is needed.
IIRC, fq_codel also needs BQL support in the NIC driver. Some embedded firewall boxes (in my case, the Soekris net5501) have NICs such as the VIA Rhine for which BQL is not implemented yet. (There are old patches for the Rhine, but nothing for recent kernels that I know of.)
It was one of the most exciting sessions at the conference. A great example of how to give a great talk. Thanks again Stephen!
As far as I can tell, only the plenary sessions in the big room were videotaped. The Plumbers folks had wanted to do video for the LPC sessions, but the cost was prohibitive.
While current round-trip times are higher than the reference round-trip time, you slowly increase the base round-trip time and consider the connection congested.
But when the current round-trip time is lower than the reference round-trip time, you reset the base round-trip time and consider the link no longer congested.
The problem is that if there's at least _one_ other packet-drop-based actor on the network then they would use up all the bandwidth.
http://snapon.lab.bufferbloat.net/~d/nuc-client/results.html
(It is my hope the new xmit_more bulking patches lessen the need for TSO/GSO/GRO offloads on gigE devices)
I am glad to see the rhine patches finally landed. There were a few popular devices used as firewalls that used that chipset.
Indeed there were (though I don't know if the Soekris net5501 I use could ever be defined as 'popular' except among the geekiest crowd and those who want to network up oil rigs.)
However I must note that your excellent result was probably due to fq_codel taking advantage of hardware flow control exerted by the DSL modem
That's what I presumed. Nothing else could explain how it managed to figure out my ADSL bandwidth given the total absence of any other way to detect it.
But nearly everybody put switches, rather than ethernet devices in the path here, over the past 5 years, and lost that capability.
Yeah. I guess if your modem has only one port, and you don't have a dedicated multi-port firewall box, that's all you can really do... I wonder: are the very common 'four-port ADSL modems' actually an ADSL modem and a switch in the same box? If so, I guess they're eschewing flow control too, right? :(
What about forward-error-correction?
When a packet is lost at the IP layer, one or more of its fragments have already failed to be received after a number of retransmissions.
http://lists.freedesktop.org/archives/systemd-devel/2014-...
http://marc.info/?l=linux-netdev&m=108462579501312
http://oss.sgi.com/archives/netdev/2004-06/msg00917.html