By Jonathan Corbet
May 9, 2012
"Bufferbloat" can be thought of as the buffering of too many packets in
flight between two network end points, resulting in excessive delays and
confusion of TCP's congestion-control algorithms. It may seem like a simple
problem, but the simple solution—make buffers smaller—turns out not to
work. A true solution to bufferbloat requires a deeper understanding of
what is going on, combined with improved software across the net.
A new paper from
Kathleen Nichols and Van Jacobson provides some of that understanding and
an algorithm for making things better—an algorithm that has been
implemented first in Linux.
Your editor had a classic bufferbloat experience at a conference hotel last
year. An attempt to copy a photograph to the LWN server (using
scp) would consistently fail with a "response timeout" error.
There was so much buffering in the path that scp was able to
"send" the entire image before any of it had been received at the other
end. The scp utility would then wait for a response from the
remote end; that response would never come in time because most of the
image had not, contrary to what scp thought, actually been
transmitted. The solution was to use the -l option to slow down
transmission to a rate closer to what the link could actually manage. With
scp transmitting more slowly, its estimate of when the data should
arrive at the remote end became far more reasonable.
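(The -l option takes its limit in Kbit/s; something along the lines of
"scp -l 256 photo.jpg host:" would hold the transfer to roughly 256Kbit/s.
The right number, of course, depends on the link, which is exactly the
problem.)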
And that, of course, is the key to avoiding bufferbloat issues in general.
A system transmitting packets onto the net should not be sending them more
quickly than the slowest link on the path to the destination can handle
them. TCP implementations are actually designed to figure out what the
transmission rate should be and stick to it, but massive buffering defeats
the algorithms used to determine that rate. One way around this problem is
to force users to come up with a suitable rate manually, but that is not
the sort of network experience most users want to have. It would be far
better to find a solution that Just Works.
Part of that solution, according to Nichols and Jacobson, is a new
algorithm called
CoDel (for "controlled delay"). Before describing that algorithm, though,
they make it clear that just making buffers smaller is not a real solution
to the problem. Network buffers serve an important function: they absorb
traffic spikes and equalize packet rates into and out of a system. A long
packet queue is not necessarily a problem, especially during the startup
phase of a network connection, but long queues as a steady state just add
delays without improving throughput at all. The point of CoDel is to allow
queues to grow when needed, but to try to keep the steady state at a
reasonable level.
Various active queue management (AQM) algorithms have been tried over the
years; they have tended to suffer from complexity and a need for manual
configuration.
Having to tweak parameters by hand was never a great solution even in ideal
situations, but it
fails completely in situations where the network load or link delay time
can vary widely over time. Such situations are the norm on the
contemporary Internet; as a result, there has been little use of AQM even
in the face of obvious problems.
One of the key insights in the design of CoDel is that there is only one
parameter that really matters: how long it takes a packet to make its way
through the queue and be sent on toward its destination. And, in
particular, CoDel is interested in the minimum delay time over a
time interval of interest. If that minimum is too high, it indicates a
standing backlog of packets in the queue that is never being cleared, and
that, in turn, indicates that too much buffering is going on.
So CoDel works by adding a timestamp to each packet as it is received and
queued. When the packet reaches the head of the queue, the time spent in
the queue is calculated; it is a simple calculation of a single value, with
no locking required, so it will be fast.
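In rough C form, the mechanism is simple. This sketch uses invented names
(struct pkt, now_ns(), and so on); the real patch works with the kernel's
sk_buff structures and its internal clock, but the idea is the same:

    #include <stdint.h>

    struct pkt {
        uint64_t enqueue_time;      /* timestamp taken on arrival */
        /* ... packet data ... */
    };

    extern uint64_t now_ns(void);   /* assumed monotonic clock source */

    static void q_enqueue(struct pkt *p)
    {
        p->enqueue_time = now_ns(); /* stamp the packet as it is queued */
        /* ... append the packet to the tail of the queue ... */
    }

    /* At the head of the queue, the sojourn time is one subtraction. */
    static uint64_t sojourn_time(const struct pkt *p)
    {
        return now_ns() - p->enqueue_time;
    }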
Less time spent in queues is always better, but that time cannot always be
zero. Built into CoDel is a maximum acceptable queue time, called
target; if a packet's time in the queue exceeds this value, then the
queue is deemed to be too long. But an overly long queue is not, in
itself, a problem, as long as the queue empties out again. CoDel defines a
period (called interval) during which the time spent by packets in
the queue should fall below target at least once; if that does not
happen, CoDel
will start dropping packets. Dropped packets are, of course, a signal to
the sender that it needs to slow down, so, by dropping them, CoDel should
cause a reduction in the rate of incoming packets, allowing the queue to
drain. If the queue time remains above target, CoDel will drop
progressively more packets. And that should be all it takes to keep queue
lengths at reasonable values on a CoDel-managed node.
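In sketch form (again with invented names, and simplified relative to both
the paper and the actual patch), the dequeue-time decision looks something
like the following. Note that the algorithm need not store a window of
delay samples to find the minimum; it is enough to remember when the delay
first rose above target and whether a full interval has passed since.
The spacing of successive drops by the interval divided by the square root
of the drop count is the control law given in the paper.

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TARGET_NS   (5 * 1000000ULL)    /* 5ms acceptable queue delay */
    #define INTERVAL_NS (100 * 1000000ULL)  /* 100ms observation window */

    static uint64_t first_above;  /* deadline for delay to fall below target */
    static uint64_t drop_next;    /* time of the next scheduled drop */
    static unsigned drop_count;   /* drops so far in this dropping episode */
    static bool     dropping;

    /* Called for each packet at dequeue time; returns true if the packet
     * should be dropped to tell the sender to slow down. */
    static bool codel_should_drop(uint64_t sojourn, uint64_t now)
    {
        if (sojourn < TARGET_NS) {
            /* The delay dipped below target: no standing queue; reset. */
            first_above = 0;
            dropping = false;
            return false;
        }
        if (first_above == 0) {
            /* The delay just exceeded target; give the queue one full
             * interval to drain before doing anything drastic. */
            first_above = now + INTERVAL_NS;
            return false;
        }
        if (!dropping) {
            if (now < first_above)
                return false;           /* still in the grace interval */
            dropping = true;            /* a standing queue: start dropping */
            drop_count = 1;
            drop_next = now + INTERVAL_NS;
            return true;
        }
        if (now >= drop_next) {
            /* Still above target: drop again, and schedule the next drop
             * sooner, per the interval/sqrt(count) control law. */
            drop_count++;
            drop_next = now + (uint64_t)(INTERVAL_NS / sqrt((double)drop_count));
            return true;
        }
        return false;
    }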
The target and interval parameters may seem out of place in
an algorithm that is advertised as having no knobs in need of tweaking.
What the authors have found, though, is that a target of 5ms and an
interval of 100ms work well in just about any setting. The use of
time values (rather than packet or byte counts) makes the algorithm
function independently of the speed of the links it is managing, so there
is no real need to adjust them. Of course, as they note, these are early
results based mostly on simulations; what is needed now is experience
using a functioning implementation on the real Internet.
That experience may not be long in coming, at least for some kinds of
links; there is now a CoDel patch for Linux
available thanks to Dave Täht and Eric Dumazet. This code is likely to
find its way into the mainline fairly quickly; it will also be available in
the CeroWrt
router distribution. As the early CoDel implementation starts to see some
real use, some shortcomings will doubtless be encountered and it may well
lose some of its current simplicity. But it has every appearance of being
an important component in the solution to the bufferbloat problem.
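For those wanting to try it: with a patched kernel and a suitably updated
iproute2, attaching the new queue discipline to an interface should be a
matter of a command along the lines of "tc qdisc add dev eth0 root codel",
with the target and interval defaulting to the values described above.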
Of course, it's not the only component; the problem is more complex than
that. There is still a need to look at buffer sizes throughout the stack;
in many places there is simply more buffering than can do any good.
Wireless networking adds some interesting challenges of its own, with its
quickly varying link speeds and the complexities added by packet
aggregation. There is also the little problem of getting updated software
distributed across the net. So a full solution is still somewhat distant,
but the understanding of the problem is clearly growing and some
interesting approaches are beginning to appear.