Sounds complicated. Perhaps an easier way to deal with it is to treat buffers downstream of the qdisc as "external" to the system and just drop packets from those buffers if they sit there too long, just the same as if they were transmitted and someone else dropped them.
This would place an absolute upper bound on latency between the app and the wire.
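A minimal sketch of that idea in Python, assuming a hypothetical `DeadlineQueue` that stamps each packet on entry and discards it at dequeue time if it has waited longer than `max_delay`. The class name, the injectable clock, and the 10 ms figure are illustrative only, not any kernel or driver API:

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class StampedPacket:
    payload: bytes
    enqueued_at: float  # seconds, read from the queue's clock at enqueue time


class DeadlineQueue:
    """Hypothetical FIFO modelling a downstream buffer treated as 'external'.

    A packet that has waited longer than max_delay is discarded at dequeue
    time, as if it had been transmitted and then dropped somewhere else.
    """

    def __init__(self, max_delay: float, clock: Callable[[], float] = time.monotonic):
        self.max_delay = max_delay
        self.clock = clock
        self._q: Deque[StampedPacket] = deque()
        self.dropped = 0

    def enqueue(self, payload: bytes) -> None:
        self._q.append(StampedPacket(payload, self.clock()))

    def dequeue(self) -> Optional[bytes]:
        now = self.clock()
        while self._q:
            pkt = self._q.popleft()
            if now - pkt.enqueued_at <= self.max_delay:
                return pkt.payload  # fresh enough: hand it to the wire
            self.dropped += 1       # stale: discard instead of transmitting
        return None                 # nothing fresh left in the buffer


# Usage with a fake clock so the timing is deterministic.
t = [0.0]
q = DeadlineQueue(max_delay=0.010, clock=lambda: t[0])
q.enqueue(b"a")      # stamped at t = 0 ms
t[0] = 0.005
q.enqueue(b"b")      # stamped at t = 5 ms
t[0] = 0.012
first = q.dequeue()  # "a" waited 12 ms and is dropped; "b" waited 7 ms
```

Because stale packets are discarded rather than sent, anything that does get dequeued has waited at most `max_delay`, which is the upper bound described above.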
Posted Sep 14, 2011 16:55 UTC (Wed) by njs (guest, #40338)
That (plus intelligently setting up the buffers so that packets *don't* sit there too long) would be an improvement. Is it good enough? I dunno. Say we set a latency target of 10 ms. That means that sendfile()'s going to incur 100 wakeups/second, which is probably more than we'd like, but maybe acceptable (and maybe we'd need to wake up that often to deal with ACKs anyway). It's also not clear that that's an aggressive enough latency target. For a web server, that's already 10% of Amazon's "100 ms latency = 1% lost sales" guideline. For servers chatting with each other inside a datacenter, I just measured 1/4 of a ms as the average ping between two machines in our cluster, so call it an 80x increase in one-way latency. That seems like a lot, maybe?
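The arithmetic in that paragraph, spelled out as a small Python calculation. It assumes one wakeup per latency-target interval (a buffer holding at most 10 ms of data has to be refilled every 10 ms) and takes the one-way delay as half of the measured 0.25 ms ping:

```python
# Back-of-the-envelope figures from the comment above.
latency_target_s = 0.010                 # 10 ms latency target
wakeups_per_s = 1 / latency_target_s     # assumed: one refill per target interval

web_budget_s = 0.100                     # "100 ms latency = 1% lost sales"
share_of_budget = latency_target_s / web_budget_s

rtt_s = 0.00025                          # measured average ping: 1/4 ms round trip
one_way_s = rtt_s / 2                    # assume a symmetric path
latency_increase = latency_target_s / one_way_s

print(f"{wakeups_per_s:.0f} wakeups/s, "
      f"{share_of_budget:.0%} of the web budget, "
      f"{latency_increase:.0f}x the one-way delay")
```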
And there are a lot of advantages to picking the *right* packets to drop -- if the packet you drop happens to be DNS, or interactive SSH, or part of a small web page, then you'll cause an immediate user-visible hiccup, and won't get any benefits in terms of reduced contention (like you would if you had dropped a packet from a long-running bulk TCP flow that then backs off). Then again, maybe that's okay, and re-ordering packets that have already been handed off to the driver does sound pretty tricky! (And might require hardware support.)
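One common way to pick the "right" victim is to drop from whichever flow has the most data queued, which tends to hit a long-running bulk TCP transfer rather than DNS or interactive traffic; Linux's fq_codel does something similar in spirit when its queue overflows. The class below is a hypothetical Python sketch of that policy, not the kernel's code, and the byte limit is an arbitrary example:

```python
from collections import defaultdict, deque
from collections.abc import Hashable
from typing import Deque, Dict, Tuple


class PerFlowQueues:
    """Hypothetical per-flow buffer: when over its byte limit, it drops from
    the flow with the largest backlog instead of dropping the newest arrival."""

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.flows: Dict[Hashable, Deque[bytes]] = defaultdict(deque)
        self.backlog: Dict[Hashable, int] = defaultdict(int)
        self.total = 0

    def enqueue(self, flow_id: Hashable, pkt: bytes) -> None:
        self.flows[flow_id].append(pkt)
        self.backlog[flow_id] += len(pkt)
        self.total += len(pkt)
        while self.total > self.limit_bytes:
            self._drop_from_fattest()

    def _drop_from_fattest(self) -> Tuple[Hashable, bytes]:
        # The flow holding the most bytes is most likely a bulk transfer
        # that will back off when it notices the loss.
        victim = max(self.backlog, key=self.backlog.__getitem__)
        pkt = self.flows[victim].popleft()  # drop that flow's oldest packet
        self.backlog[victim] -= len(pkt)
        self.total -= len(pkt)
        return victim, pkt


buf = PerFlowQueues(limit_bytes=4500)
for _ in range(3):
    buf.enqueue("bulk-tcp", b"x" * 1500)  # fills the buffer exactly
buf.enqueue("dns", b"d" * 100)            # over the limit: the bulk flow pays
```

Here the 100-byte DNS packet survives and the bulk flow loses its oldest 1500-byte packet, which is the trade-off argued for above.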
But it's useful to try and find the "right" solution first, because that way even if you give up on achieving it, at least in plan B you know what you're trying to approximate.
Hardware support
Posted Sep 14, 2011 17:19 UTC (Wed) by dmarti (subscriber, #11625)
Yes, it would be a win to have hardware that can timestamp packets going into its buffers and drop "stale" ones on the way out instead of transmitting them. (relevant thread on timestamps from the bufferbloat list). Right now, hardware assumes that late is better than never, and TCP would prefer never over too late.
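How a device would actually check such timestamps is not described here. As a sketch only, assume the hardware stamps each packet with a free-running 32-bit tick counter that wraps around; the staleness test then needs modular subtraction so that a wrap between enqueue and dequeue does not make an old packet look new (the Linux kernel's `time_after()` macros deal with jiffies wraparound using the same kind of modular trick). `ticks_elapsed` and `is_stale` are hypothetical names, and the check assumes no packet waits a full counter period:

```python
TICK_BITS = 32                   # assumed width of the device's tick counter
MASK = (1 << TICK_BITS) - 1


def ticks_elapsed(now: int, stamp: int) -> int:
    """Ticks from stamp to now on a counter that wraps at 2**TICK_BITS.

    Modular subtraction gives the right answer across a single wrap."""
    return (now - stamp) & MASK


def is_stale(now: int, stamp: int, max_age_ticks: int) -> bool:
    """True if the packet should be dropped on the way out instead of sent."""
    return ticks_elapsed(now, stamp) > max_age_ticks


# Stamped just before the counter wrapped, checked just after: 32 ticks old.
age = ticks_elapsed(0x10, 0xFFFF_FFF0)
```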
Hardware support
Posted Sep 15, 2011 7:58 UTC (Thu) by johill (subscriber, #25196)
I'm pretty sure that's possible with a bunch of wireless devices, but I don't know offhand how the timestamps are checked.
Hardware support
Posted Sep 15, 2011 16:06 UTC (Thu) by dmarti (subscriber, #11625)
That would be useful to see. It looks like the problem of bufferbloat is that packets stay in the buffer until they get stale, so checking staleness directly, ideally without having to involve the CPU, could be a way to avoid having to tune the buffer size.
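A quick calculation shows why a buffer sized in packets is hard to tune: the worst-case wait behind a full FIFO scales inversely with link speed. The 256-packet ring and the two link rates below are hypothetical figures chosen for illustration:

```python
def drain_time_s(buffer_packets: int, packet_bytes: int, link_bps: float) -> float:
    """Worst-case time a packet waits behind a full FIFO of the given size."""
    return buffer_packets * packet_bytes * 8 / link_bps


# Hypothetical 256-packet ring of full-size (1500-byte) Ethernet frames.
slow = drain_time_s(256, 1500, 1e6)  # 1 Mbit/s uplink
fast = drain_time_s(256, 1500, 1e9)  # 1 Gbit/s LAN
print(f"{slow:.3f} s at 1 Mbit/s, {fast * 1000:.3f} ms at 1 Gbit/s")
```

The same ring holds about 3 seconds of data on the slow link but only about 3 ms on the fast one, whereas a staleness limit stated in time (say 10 ms) means the same thing on both.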