The CHOKe packet scheduler

Posted Jan 13, 2011 10:26 UTC (Thu) by marcH (subscriber, #57642)
Parent article: The CHOKe packet scheduler

Yet another brilliant summary, thanks. Two comments.

The only congestion signal available to TCP is packet loss for a practical reason: *inter*-networking. Among other design goals, the *Inter*net was meant to connect different types of networks together. Now guess which congestion signal "technology" is supported across every single type of network, even the most primitive ones?
Dropping packets is also consistent with the End To End principle, which states that the network should be as dumb and stateless as possible for a number of reasons, not least scalability.

I do not think CHOKe is related to bufferbloat. Bufferbloat is quite obviously about unreasonable queue sizes (more than 10 milliseconds), while CHOKe or RED should also apply to manage queues of reasonable size. In other words, bufferbloat is a plain bug while queue management is an optimization. Queues of unreasonable sizes should not just be "managed", they should first of all be made smaller.

I suspect that Jim Gettys' opinion might differ on this latter point. Unfortunately his summary writing skills do not seem as good as LWN's and my life is too short.
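
To put the 10 millisecond figure above in concrete terms: the worst-case delay added by a FIFO queue is its size divided by the link rate. Here is a minimal sketch of that arithmetic. The function names and the example buffer and link figures are illustrative assumptions, not measurements from any real device.

```python
def queue_delay_ms(buffer_bytes: float, link_bits_per_s: float) -> float:
    """Time needed to drain a completely full FIFO buffer, in milliseconds."""
    return buffer_bytes * 8 / link_bits_per_s * 1000.0


def buffer_for_delay_bytes(link_bits_per_s: float, delay_ms: float) -> float:
    """Largest buffer (in bytes) that adds at most delay_ms of queueing delay."""
    return link_bits_per_s * delay_ms / 1000.0 / 8


if __name__ == "__main__":
    # Hypothetical case: a 256 KiB buffer in front of a 1 Mbit/s uplink.
    print(queue_delay_ms(256 * 1024, 1e6), "ms")
    # Hypothetical case: a 10 ms budget on a 1 Gbit/s link.
    print(buffer_for_delay_bytes(1e9, 10), "bytes")
```

Under these assumptions the slow uplink can hold roughly two seconds of traffic, while a 10 ms budget on the fast link still allows more than a megabyte of buffer. The same byte count is either harmless or disastrous depending on the link rate, which is why the thread talks about queue sizes in milliseconds.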



The CHOKe packet scheduler

Posted Jan 13, 2011 19:53 UTC (Thu) by njs (guest, #40338)

> Bufferbloat is quite obviously about unreasonable queue sizes (more than 10 milliseconds), while CHOKe or RED should also apply to manage queues of reasonable size

Bufferbloat is about unreasonable *average* queue sizes; if you receive 100 ms worth of traffic in a lump every 100 ms, then you aren't oversubscribed at all, and the right thing to do is to buffer all of it. (In general, the proper queue size grows with the RTT of the flows, which may be *way* above 10 ms.) So it makes sense to have a reasonably large queue; the point of AQM algorithms is to make sure that this queue is only used to smooth out burstiness, instead of becoming a simple delay line.

IIUC.

The CHOKe packet scheduler

Posted Jan 13, 2011 22:07 UTC (Thu) by marcH (subscriber, #57642)

If you receive 100 ms worth of traffic in a lump and buffer them all then the last packets to go out will suffer a 100 ms extra delay (just on this link!). Unless you do not care a bit about latency, you do not want that. What you want is to drop a large number of these packets so a similar burst does not happen again.

Note: since TCP is ACK-clocked, it is not bursty at all.

> In general, the proper queue size grows with the RTT of the flows,

I do not see why. The RTT matters only for end to end buffers.

The CHOKe packet scheduler

Posted Jan 13, 2011 22:55 UTC (Thu) by Shewmaker (subscriber, #1126)

> Note: since TCP is ACK-clocked, it is not bursty at all.

How do you define a burst? If it is more than one packet
sent without waiting between them, then isn't the window
size of a TCP connection its burst size?

Of course, a qdisc or something at a lower level may
break up a window's worth of packets. Still, saying
TCP is not bursty at all doesn't seem accurate.

The CHOKe packet scheduler

Posted Jan 14, 2011 6:48 UTC (Fri) by marcH (subscriber, #57642)

> How do you define a burst? If it is more than one packet sent without waiting between them, then isn't the window size of a TCP connection its burst size?

TCP does not send a full receive window at a time because it is also constantly limited by the congestion window. Sending is then regulated by the reception of ACKs (one for every two packets).

So a (single!) TCP connection is practically never bursty in normal conditions.

The CHOKe packet scheduler

Posted Jan 14, 2011 2:20 UTC (Fri) by njs (guest, #40338)

> If you receive 100 ms worth of traffic in a lump and buffer them all then the last packets to go out will suffer a 100 ms extra delay (just on this link!). Unless you do not care a bit about latency, you do not want that. What you want is to drop a large number of these packets so a similar burst does not happen again.

Dropping packets tells the sender to slow down. But in this case, the sender is already sending at the proper speed! You don't want them to reduce throughput, you just want them to smooth out their sending. But dropping packets doesn't tell them to do that, it just tells them to slow down.

Note that if they smoothed out their sends, so you got 1 ms of traffic every 1 ms, then that last packet would just get sent 100 ms later. There's no unnecessary latency being added here.
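
The two viewpoints in this exchange can be compared with a toy FIFO model. This is only a sketch under simplifying assumptions: a single link that takes exactly 1 ms to transmit each packet, and 100 packets arriving either all at once or one per millisecond. The names `fifo_departures` and `max_queue_wait` are made up for illustration.

```python
def fifo_departures(arrivals_ms, service_ms=1.0):
    """Return the departure time of each packet from a single FIFO link."""
    link_free_at = 0.0
    departures = []
    for arrival in sorted(arrivals_ms):
        start = max(arrival, link_free_at)  # wait while the link is busy
        link_free_at = start + service_ms
        departures.append(link_free_at)
    return departures


def max_queue_wait(arrivals_ms, service_ms=1.0):
    """Longest time any packet sits in the queue before transmission starts."""
    arrivals = sorted(arrivals_ms)
    departures = fifo_departures(arrivals, service_ms)
    return max(d - service_ms - a for a, d in zip(arrivals, departures))


lump = [0.0] * 100                       # 100 ms worth of traffic in one lump
smooth = [float(i) for i in range(100)]  # the same traffic, one packet per ms

if __name__ == "__main__":
    print("last departure:", fifo_departures(lump)[-1], fifo_departures(smooth)[-1])
    print("max queue wait:", max_queue_wait(lump), max_queue_wait(smooth))
```

In this model the last packet leaves the link at the same instant in both cases, which is the point being made here. Measured from arrival at the queue, though, the lump's last packet waits 99 ms while the smooth sender's packets never wait, which is the latency the parent comment objects to.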

> I do not see why.

To tell the truth, I'm not sure either, but I'm quoting Van Jacobson et al (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2...) so I believe it. Perhaps someone will come along shortly to explain better.

The CHOKe packet scheduler

Posted Jan 14, 2011 6:53 UTC (Fri) by marcH (subscriber, #57642)

> Dropping packets tells the sender to slow down. But in this case, the sender is already sending at the proper speed! You don't want them to reduce throughput, you just want them to smooth out their sending. But dropping packets doesn't tell them to do that, it just tells them to slow down.

TCP sending is smooth by design, check the literature.

If it is not TCP, yes you are right this might be too drastic and harm throughput. But it's only because your application does not behave. And it will preserve latency. And the bandwidth lost by killing the burst might be reused by better behaved applications.

The CHOKe packet scheduler

Posted Jan 14, 2011 9:03 UTC (Fri) by marcH (subscriber, #57642)

> If it is not TCP, yes you are right this might be too drastic and harm throughput. But it's only because your application does not behave.

By the way, if you need to smooth UDP-like traffic then DCCP has been designed expressly for you (Datagram Congestion Control Protocol).

The CHOKe packet scheduler

Posted Jan 13, 2011 22:50 UTC (Thu) by paulj (subscriber, #341)

Yes, buffer-bloat is just a plain bug. RED and CHOKe are means to tickle the senders' TCP congestion control into activating in a more progressive fashion, which is more network friendly and less likely to cause those TCPs to synchronise (i.e. all back off at the same time, and all ramp up again at the same time) than if congestion is applied uniformly to most flows. In RED, as the queue size increases above the min-threshold, the probability of dropping a newly arrived packet increases linearly, until it reaches 1 at the max-threshold.
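
The linear ramp described above can be sketched in a few lines. This is an illustration, not the kernel's implementation: the function name and parameters are made up, and `max_p` is an added assumption. Classic RED ramps up to a configurable maximum probability and only then jumps to 1, and it works on a moving average of the queue length. With `max_p=1.0` the function matches the description in this comment.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p=1.0):
    """Drop probability for a newly arrived packet, RED-style.

    avg_queue: (averaged) queue length, same unit as the thresholds
    min_th:    below this, never drop
    max_th:    at or above this, always drop
    max_p:     probability reached just below max_th (assumption, see lead-in)
    """
    if max_th <= min_th:
        raise ValueError("max_th must be greater than min_th")
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


if __name__ == "__main__":
    for q in (5, 10, 15, 20, 25, 30):
        print(q, red_drop_probability(q, min_th=10, max_th=30))
```

Because the drops are probabilistic and start early, different flows tend to see their first loss at different times, which is what avoids the synchronised back-off mentioned above.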

The problem with fixing buffer-bloat is finding an economic justification for reducing buffers. Other than in quite high-rate routers, memory for buffering is generally cheap and there's little economic incentive to not over-spec buffers. The crux of the problem is that it is not entirely clear what the correct smallest size is. Indeed that optimal size may vary for different deployments. If you make the buffers too small, your router will under-perform - especially in benchmarks in high-bandwidth settings. Making them too large OTOH is unlikely to cost you sales: few people benchmark performance in real-world scenarios, with congestion - except network congestion researchers.

The CHOKe packet scheduler

Posted Jan 14, 2011 6:59 UTC (Fri) by marcH (subscriber, #57642)

> The crux of the problem is that it is not entirely clear what the correct smallest size is. Indeed that optimal size may vary for different deployments. If you make the buffers too small, your router will under-perform - especially in benchmarks in high-bandwidth settings. Making them too large OTOH is unlikely to cost you sales: few people benchmark performance in real-world scenarios, with congestion - except network congestion researchers.

Agreed that the exact *optimal* size is not clear. However this is not an excuse for unreasonable sizes that harm latency with NO throughput benefit.

This research topic is *not* new! This 2004 paper demonstrates that just 1ms (!) is enough: http://portal.acm.org/citation.cfm?id=1015499
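
For a sense of where a figure like 1 ms can come from, a well-known rule from the router buffer-sizing literature scales the classic "one round-trip time of buffering" by 1/sqrt(n) for n long-lived, desynchronised flows. The sketch below assumes that rule; it is not quoted from the linked paper, and the function name and example numbers are hypothetical.

```python
import math


def sqrt_rule_buffer_ms(rtt_ms: float, n_flows: int) -> float:
    """Buffer depth, expressed as drain time in ms, under the RTT/sqrt(n) rule.

    Assumption: n_flows long-lived TCP flows that are not synchronised.
    With n_flows == 1 this falls back to a full round-trip time of buffering.
    """
    if n_flows < 1:
        raise ValueError("need at least one flow")
    return rtt_ms / math.sqrt(n_flows)


if __name__ == "__main__":
    # Hypothetical backbone link: 250 ms RTT shared by many flows.
    for n in (1, 100, 10_000, 62_500):
        print(n, "flows ->", sqrt_rule_buffer_ms(250.0, n), "ms of buffer")
```

Under this rule a millisecond-scale buffer is only adequate on links carrying tens of thousands of flows. With a handful of flows the required buffer is much closer to a full round-trip time.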

The CHOKe packet scheduler

Posted Feb 27, 2011 6:10 UTC (Sun) by gmaxwell (subscriber, #30048)

It's absolutely _trivial_ to demonstrate that 1ms is not unconditionally enough.

Take a long pipe with several ms of delay. Run a single TCP flow across it. Observe that your flow gets nowhere near line rate, but instead it sawtooths against line rate and leaves the link idle for a significant amount of time.
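
The sawtooth can be reproduced with a very rough round-by-round model. This is a sketch under stated assumptions, not a faithful TCP simulation: one Reno-style flow that adds one packet to its window per round trip, halves it when the window exceeds the pipe plus the buffer, and never times out. The function name and the packet counts are hypothetical.

```python
def aimd_utilization(bdp_packets: int, buffer_packets: int, rounds: int = 10_000) -> float:
    """Fraction of link capacity used by one AIMD flow in steady state.

    bdp_packets:    packets the pipe itself holds (link rate * base RTT)
    buffer_packets: packets the bottleneck queue can hold
    """
    window = max(1, (bdp_packets + buffer_packets) // 2)
    sent = 0.0
    capacity = 0.0
    for _ in range(rounds):
        sent += window
        # A round lasts one base RTT while the pipe is not full; once the
        # queue builds, the round stretches and the link stays busy throughout.
        capacity += max(window, bdp_packets)
        window += 1  # additive increase
        if window > bdp_packets + buffer_packets:
            window = max(1, window // 2)  # queue overflow: multiplicative decrease
    return sent / capacity


if __name__ == "__main__":
    bdp = 100  # hypothetical pipe of 100 packets
    for buf in (0, 2, 25, 50, 100):
        print(f"buffer={buf:3d} packets -> utilization {aimd_utilization(bdp, buf):.2f}")
```

In this model a link with almost no buffer settles at about three quarters of line rate, and the link only stays full once the buffer holds about one bandwidth-delay product.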

Yes, a single flow is a corner case, but not an outrageous one. The behavior also holds true for a small number of flows, especially if they experience identical end to end delays.

So there is the excuse you were missing.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds