The CoDel queue management algorithm
Your editor had a classic bufferbloat experience at a conference hotel last year. An attempt to copy a photograph to the LWN server (using scp) would consistently fail with a "response timeout" error. There was so much buffering in the path that scp was able to "send" the entire image before any of it had been received at the other end. The scp utility would then wait for a response from the remote end; that response would never come in time because most of the image had not, contrary to what scp thought, actually been transmitted. The solution was to use the -l option to slow down transmission to a rate closer to what the link could actually manage. With scp transmitting slower, it was able to come up with a more reasonable idea for when the data should be received by the remote end.
And that, of course, is the key to avoiding bufferbloat issues in general. A system transmitting packets onto the net should not be sending them more quickly than the slowest link on the path to the destination can handle them. TCP implementations are actually designed to figure out what the transmission rate should be and stick to it, but massive buffering defeats the algorithms used to determine that rate. One way around this problem is to force users to come up with a suitable rate manually, but that is not the sort of network experience most users want to have. It would be far better to find a solution that Just Works.
Part of that solution, according to Nichols and Jacobson, is a new algorithm called CoDel (for "controlled delay"). Before describing that algorithm, though, they make it clear that just making buffers smaller is not a real solution to the problem. Network buffers serve an important function: they absorb traffic spikes and equalize packet rates into and out of a system. A long packet queue is not necessarily a problem, especially during the startup phase of a network connection, but long queues as a steady state just add delays without improving throughput at all. The point of CoDel is to allow queues to grow when needed, but to try to keep the steady state at a reasonable level.
Various automated queue management algorithms have been tried over the years; they have tended to suffer from complexity and a need for manual configuration. Having to tweak parameters by hand was never a great solution even in ideal situations, but it fails completely in situations where the network load or link delay time can vary widely over time. Such situations are the norm on the contemporary Internet; as a result, there has been little use of automated queue management even in the face of obvious problems.
One of the key insights in the design of CoDel is that there is only one parameter that really matters: how long it takes a packet to make its way through the queue and be sent on toward its destination. And, in particular, CoDel is interested in the minimum delay time over a time interval of interest. If that minimum is too high, it indicates a standing backlog of packets in the queue that is never being cleared, and that, in turn, indicates that too much buffering is going on. So CoDel works by adding a timestamp to each packet as it is received and queued. When the packet reaches the head of the queue, the time spent in the queue is calculated; it is a simple calculation of a single value, with no locking required, so it will be fast.
Less time spent in queues is always better, but that time cannot always be zero. Built into CoDel is a maximum acceptable queue time, called target; if a packet's time in the queue exceeds this value, then the queue is deemed to be too long. But an overly-long queue is not, in itself, a problem, as long as the queue empties out again. CoDel defines a period (called interval) during which the time spent by packets in the queue should fall below target at least once; if that does not happen, CoDel will start dropping packets. Dropped packets are, of course, a signal to the sender that it needs to slow down, so, by dropping them, CoDel should cause a reduction in the rate of incoming packets, allowing the queue to drain. If the queue time remains above target, CoDel will drop progressively more packets. And that should be all it takes to keep queue lengths at reasonable values on a CoDel-managed node.
The target and interval parameters may seem out of place in an algorithm that is advertised as having no knobs in need of tweaking. What the authors have found, though, is that a target of 5ms and an interval of 100ms work well in just about any setting. The use of time values (rather than packet or byte counts) makes the algorithm function independently of the speed of the links it is managing, so there is no real need to adjust them. Of course, as they note, these are early results based mostly on simulations; what is needed now is experience using a functioning implementation on the real Internet.
That experience may not be long in coming, at least for some kinds of links; there is now a CoDel patch for Linux available thanks to Dave Täht and Eric Dumazet. This code is likely to find its way into the mainline fairly quickly; it will also be available in the CeroWrt router distribution. As the early CoDel implementation starts to see some real use, some shortcomings will doubtless be encountered and it may well lose some of its current simplicity. But it has every appearance of being an important component in the solution to the bufferbloat problem.
Of course, it's not the only component; the problem is more complex than
that. There is still a need to look at buffer sizes throughout the stack;
in many places, there is simply too much buffering in places where it can
do no good. Wireless networking adds some interesting challenges of its
own, with its quickly varying link speeds and complexities added by packet
aggregation. There is also the little problem of getting updated software
distributed across the net. So a full solution is still somewhat distant,
but the understanding of the problem is clearly growing and some
interesting approaches are beginning to appear.
| Index entries for this article | |
|---|---|
| Kernel | Networking/Bufferbloat |
