RPS and RFS

Posted Aug 2, 2010 8:40 UTC (Mon) by farnz (subscriber, #17727)
In reply to: RPS and RFS by jd
Parent article: The 2.6.35 kernel is out

The core of the problem is that, for a NIC design to avoid becoming the bottleneck as CPUs get faster, it needs either to be stateless or to be able to hold more state than any future CPU and motherboard combination can handle. The second option is clearly unrealistic: a NIC that could hold that much state would be too pricey for the market (it would need things like multi-megabyte buffers for the receive and send windows).
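For a rough sense of why those buffers run to megabytes: a window has to cover the bandwidth-delay product of the path. Below is a minimal sketch in Python. The 10 Gbit/s link and 10 ms round-trip time are illustrative assumptions, not figures from the article, and window_bytes() is a hypothetical helper.

```python
def window_bytes(link_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return link_bits_per_s * rtt_ms // (8 * 1000)


# Illustrative assumptions: a 10 Gbit/s link and a 10 ms round-trip time.
per_window = window_bytes(10_000_000_000, 10)
print(f"{per_window / 1e6:.1f} MB per window")  # prints "12.5 MB per window"
print(f"{2 * per_window / 1e6:.1f} MB per flow for send plus receive")
```

That is per connection. Multiply it by every concurrent flow and the memory a fully stateful NIC would need quickly outgrows what is economical to put on the card.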

Thus, NIC designers take the first route. Offloads such as GSO (and its subsets UFO and TSO) on the transmit side and GRO (a generalisation of LRO) on the receive side directly help you scale up on one CPU. Then you add multiqueue transmit, so that multiple CPUs can send packets via the same Ethernet card without interacting with each other, and RPS/RFS, so that multiple CPUs can handle packet reception without interacting with each other, and you get something that scales well with the speed of CPUs.
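The idea behind GRO, coalescing in-order segments of the same flow so that the stack processes one large packet instead of many small ones, can be shown with a toy model. This is a hedged Python sketch, not the kernel's implementation: the Segment type, the coalesce() helper, the flow-key layout and the 64 KiB cap are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port).
Flow = Tuple[str, int, str, int]

MAX_MERGED = 65536  # assumed cap on one coalesced packet, in payload bytes


@dataclass
class Segment:
    flow: Flow
    seq: int     # byte offset of the first payload byte in the stream
    length: int  # payload bytes carried


def coalesce(segments: List[Segment]) -> List[Segment]:
    """GRO-style toy: append each in-order segment to the packet held for its flow."""
    out: List[Segment] = []
    held: Dict[Flow, int] = {}  # flow -> index in `out` of the packet still growing
    for seg in segments:
        idx = held.get(seg.flow)
        if idx is not None:
            pkt = out[idx]
            contiguous = pkt.seq + pkt.length == seg.seq
            if contiguous and pkt.length + seg.length <= MAX_MERGED:
                pkt.length += seg.length  # merge: one bigger packet, one fewer to process
                continue
        # New flow, a gap in sequence numbers, or the cap was hit: start a new packet.
        out.append(Segment(seg.flow, seg.seq, seg.length))
        held[seg.flow] = len(out) - 1
    return out


a = ("10.0.0.1", 40000, "10.0.0.2", 80)
b = ("10.0.0.3", 40001, "10.0.0.2", 80)
wire = [Segment(a, 0, 1448), Segment(b, 0, 1448), Segment(a, 1448, 1448),
        Segment(b, 1448, 1448), Segment(a, 2896, 1448)]
stack_sees = coalesce(wire)
print(len(wire), "segments on the wire ->", len(stack_sees), "packets up the stack")
```

Interleaved flows still merge because the toy keeps one growing packet per flow. The per-packet work the upper layers do shrinks by roughly the merge factor, which is where the single-CPU win comes from.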

Note that RPS/RFS is a smart technique for speeding up kernel processing: by spreading packets across CPUs so that each CPU does not interact (through cache effects etc.) with the other CPUs that are processing network packets, I get a near-linear speedup in packet reception as the number of CPUs increases. Without it, cacheline bouncing gets painful, as every CPU inspects packets to see whether they are of interest to itself or to another CPU.
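The steering decision itself is simple. RPS hashes each packet's flow and uses the hash to pick a CPU from an allowed set (in Linux that set is configured per receive queue through /sys/class/net/<dev>/queues/rx-<n>/rps_cpus). RFS goes further and prefers the CPU where the application reading that flow last ran. Below is a minimal Python sketch under stated assumptions: CRC32 stands in for the real receive hash, the modulo mapping is a simplification, and flow_hash(), rps_cpu(), rfs_cpu() and steer() are hypothetical helpers, not kernel functions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[str, int, str, int]  # hypothetical (src_ip, src_port, dst_ip, dst_port)


def flow_hash(flow: Flow) -> int:
    # Stand-in for the receive hash; CRC32 is an assumption, not what NICs or the kernel use.
    return zlib.crc32("|".join(map(str, flow)).encode())


def rps_cpu(flow: Flow, cpus: List[int]) -> int:
    """RPS-style: every packet of a flow hashes to the same CPU."""
    return cpus[flow_hash(flow) % len(cpus)]


def rfs_cpu(flow: Flow, cpus: List[int], consumer: Dict[Flow, int]) -> int:
    """RFS-style: prefer the CPU where the consuming application last ran."""
    return consumer.get(flow, rps_cpu(flow, cpus))


def steer(packets: List[Flow], cpus: List[int],
          consumer: Dict[Flow, int]) -> Dict[int, List[Flow]]:
    """Group packets into per-CPU backlogs so each flow is processed on one CPU."""
    backlog: Dict[int, List[Flow]] = defaultdict(list)
    for flow in packets:
        backlog[rfs_cpu(flow, cpus, consumer)].append(flow)
    return backlog


cpus = [0, 1, 2, 3]
flows = [("10.0.0.1", 40000 + i, "10.0.0.2", 80) for i in range(8)]
consumer = {flows[0]: 3}  # pretend the reader of flows[0] is running on CPU 3
queues = steer(flows * 3, cpus, consumer)  # three packets per flow
for cpu in cpus:
    print(cpu, len(queues[cpu]))
```

Because the choice depends only on the flow (or on where its reader runs), each CPU's backlog holds whole flows, so a connection's state stays warm in one CPU's cache instead of bouncing between them.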