Posted Jan 25, 2011 19:49 UTC (Tue) by jthill
In reply to: LCA: Vint Cerf on re-engineering the Internet
Parent article: LCA: Vint Cerf on re-engineering the Internet
I think I've got what might be a helpful contribution here. I spent many years, long ago, fixing and building high-performance networking code in address spaces handling thousands of connections, so I hope it turns out to be worth attention. It may also be something everybody already knows about, but from the discussion I'm seeing I suspect it isn't a known-and-discarded idea; the reasoning that discarded it wouldn't have been so easy to miss if it were.
So let me start with a slightly artificial example to get all the elements in play. On a router with high-volume TCP endpoints of its own, the TCP buffers need to be kept separate from the routed-packet buffers: the TCP buffers exist only to support retransmit, and they shouldn't be allowed to clog the queue for telnet or whatnot.
No need to burden QoS with this: separate and shrink the routed-packet pool, and arrange for the routed pool to ask the TCP pool for another packet buffer shortly before it needs one. That will do the trick automatically, if I have it right. It also occurs to me that, since an endpoint's TCP has much more information available than any router does, it should be able to do that much better a job of prioritizing anyway.
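To make the handoff concrete, here's a toy sketch of the two-pool arrangement: the routed pool tops itself up from the TCP pool shortly before it runs dry, so retransmit buffers never crowd out transit traffic. All the names and the low-water threshold are invented for illustration; nothing here comes from a real stack.

```python
class BufferPool:
    """A pool tracked only by its count of free buffers."""
    def __init__(self, name, size):
        self.name = name
        self.free = size

    def take(self):
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def give(self):
        self.free += 1

LOW_WATER = 2  # refill threshold: "shortly before it needs it"

def alloc_routed_buffer(routed, tcp):
    """Allocate from the routed pool, asking the TCP pool to donate
    a buffer once the routed pool drops to the low-water mark."""
    if routed.free <= LOW_WATER and tcp.take():
        routed.give()  # TCP pool donates one buffer to the routed pool
    return routed.take()
```

With a routed pool of three buffers and a low-water mark of two, the routed pool never dips below its threshold as long as the TCP pool has buffers to give, which is the "automatic" behavior claimed above.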
To keep things fair with non-local sources, local TCP gets some simple proportion of the packets in the pool relative to the packets from other sources. Packets destined for local TCP never enter the routed pool at all: delivery is just a matter of swapping a full routed-packet buffer for an empty receive-window buffer, and TCP can offer ACK packets in return right there.
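A minimal sketch of that admission decision, assuming the "simple proportion" is just a fixed share of the pool (the share, the function, and its parameters are all hypothetical):

```python
LOCAL_SHARE = 0.25  # local TCP's assumed proportion of pool buffers

def deliver_or_forward(pkt_is_local, tcp_empty_buffers, local_held, pool_size):
    """Decide what happens to an arriving packet.

    'forward': transit packet, goes through the routed pool as usual.
    'deliver': local TCP swaps an empty receive-window buffer for the
               full one, so each pool's buffer count is unchanged.
    'drop':    local TCP has no empty buffer to trade, or is over its
               fair share of the pool.
    """
    if not pkt_is_local:
        return "forward"
    if tcp_empty_buffers > 0 and local_held < LOCAL_SHARE * pool_size:
        return "deliver"
    return "drop"
```

The point of the swap is that neither pool's count ever changes on local delivery, so local TCP traffic can't starve the routed pool no matter how busy the endpoints get.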
So, to the payload: even though doing this for local TCP achieves the purpose in that scenario, I don't see any intrinsic reason to do this only for TCP, or only locally.
When the opportunity and need coincide, why not do this kind of coordinated buffer management across links?
This isn't source-quench. The basic idea needs extension to handle more general cases, but start small.
Pick a leaf router, where one link reaches the vast majority of the net and virtually everything reaching it is going to use that link. Use the idle local bandwidth to make the backpressure explicit.
To put it in a way that might horrify some, why not have a congested leaf router convert its downstream links to half-duplex for the duration of the congestion? It's easy: "Ok. Go 'way now, I'm busy". "Gimme a packet". "Ok, send what you like".
Those messages need acks to avoid throttling a link in error, but again, they're sent on links that should be idle anyway. This is one-hop link management.
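The three-message dialogue above can be modeled as a tiny state machine on the downstream side; the message names are made up, and real control traffic would of course need timeouts and retransmission on top of the acks.

```python
class DownstreamLink:
    """Toy model of the choke/poll exchange, seen from downstream."""
    def __init__(self):
        self.choked = False

    def on_msg(self, msg):
        if msg == "CHOKE":      # "Ok. Go 'way now, I'm busy"
            self.choked = True
            return "CHOKE-ACK"
        if msg == "POLL":       # "Gimme a packet"
            return "DATA"       # send exactly one packet on request
        if msg == "UNCHOKE":    # "Ok, send what you like"
            self.choked = False
            return "UNCHOKE-ACK"
        return None

    def may_send(self):
        """Full-duplex sending is allowed only while unchoked."""
        return not self.choked
```

While choked, the downstream node transmits only when polled, which is exactly the half-duplex behavior described: the congested router pulls packets one at a time instead of having them pushed at it.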
Plainly, when life starts to get interesting (i.e. when more than one of the router's links is likely to get congested), the poll should explicitly list congested routing entries. An overbroad (or ignored) choke list would slow some things down unnecessarily, but if the choke is honored at all (and the router sanely reserves one or two packets for each link no matter what) the congestion gets pushed directly to its source.
When you get to nodes where combinations of inbound links are saturating combinations of outbound links I think this starts running out of steam, but as I understand it those aren't the nodes where we're seeing this problem in the first place.
So, thanks for reading, if it's a good idea I don't feel all possessive about it, and either way I'd appreciate feedback.