
RPS and RFS

Posted Aug 2, 2010 4:28 UTC (Mon) by jd (guest, #26381)
Parent article: The 2.6.35 kernel is out

Receive packet steering and receive flow steering would seem to be necessary mostly because NICs don't (as a rule) offer any smarts for kernel bypass. The chief problem seems to be that commodity hardware hasn't kept up with CPUs, and the OS is having to work around those limitations as best it can.

That's not to say that they're not cool patches - they are wonderful patches. Rather, it is to say that the kernel cannot have infinite amounts of neatness. Sooner or later, there's going to have to be a dramatic shift in the way the work is done. Linux seems pretty ready for the most likely scenarios, but it bothers me some that it has been mainly reactive rather than proactive in that regard. I'm not certain what the options are on the proactive side, but Linux would seem to be a great tool for manufacturers to experiment with ideas on FPGAs - there's no shortage of people willing to be early adopters of experimental ideas, so no shortage of cheap market feedback before committing to anything.



RPS and RFS

Posted Aug 2, 2010 5:13 UTC (Mon) by daney (guest, #24551) [Link] (11 responses)

There are NICs that implement an entire IP stack on the NIC, offloading almost all processing from the kernel. But (as a rule) getting drivers for these merged can be problematic.

RPS and RFS

Posted Aug 2, 2010 5:40 UTC (Mon) by jwb (guest, #15467) [Link] (10 responses)

As a rule, these are also tragically bad NICs which can't handle huge numbers of connections (as seen by load balancers, to name just one case) or corner cases in the IP protocol, and which generally have some extremely expensive means of falling back to software.

RPS and RFS

Posted Aug 2, 2010 11:37 UTC (Mon) by nix (subscriber, #2304) [Link] (9 responses)

And, of course, it's in hardware and proprietary so you can't fix it. A slight speed boost in exchange for piles of extra bugginess and a closed-source black box implementing one of the most critical parts of the networking layer? Just Say No.

RPS and RFS

Posted Aug 2, 2010 13:21 UTC (Mon) by pabs (subscriber, #43278) [Link] (8 responses)

If OpenMoko folks can replace the proprietary GSM firmware on the FreeRunner, surely we can replace proprietary NIC firmware and get the best of both worlds?

RPS and RFS

Posted Aug 2, 2010 13:56 UTC (Mon) by cesarb (subscriber, #6266) [Link] (1 responses)

You are confusing Openmoko with OsmocomBB.

RPS and RFS

Posted Aug 22, 2010 9:25 UTC (Sun) by pabs (subscriber, #43278) [Link]

Well, OsmocomBB was started by folks who worked for Openmoko so...

RPS and RFS

Posted Aug 2, 2010 15:31 UTC (Mon) by nix (subscriber, #2304) [Link]

Not if anyone else does it the same way e1000e does (every system has different EEPROM contents, and blowing it away makes the card useless). (Not that e1000e's firmware needs replacing; it's just an example of the sort of thing that could happen.)

RPS and RFS

Posted Aug 3, 2010 7:29 UTC (Tue) by jd (guest, #26381) [Link] (1 responses)

This would be an interesting problem. In theory, you could clean-room re-implement the existing firmware as stage 1, then replace the buggy IP stack you've now implemented with a good one. Provided the hardware was up to it, you'd be fine.

This brings up the issue of state raised by another poster: the amount of state you'd need to support must exceed that of the CPU and motherboard. Can you do this?

Yes, on two conditions: the card MUST have its own PCI Express controller, AND it must be capable of DMA operations. If it can access main memory in exactly the same way as the CPU can, *plus its own*, then, provided the regular kernel's VMM could take care of the ethernet adapter's memory needs as well as the OS's, the ethernet card could handle just as much state information as the OS could, at exactly the same speed.

Rather than trying to reverse-engineer a card, I would initially suggest a proof of concept: build an ethernet card with either the Linux or NetBSD TCP/IP stack AND full DMA access to memory.

(Really, the full DMA access has to be there, or you'd never be able to pass packets from the card to the client software's buffers through kernel bypass. However, that's an aside.)

If such a card were developed, even as a crude garage prototype, you could learn a lot about what such devices actually need in order to work well, and thereby anticipate what the kernel will need at some future point when hardware does shift in nature.

RPS and RFS

Posted Aug 4, 2010 13:32 UTC (Wed) by marcH (subscriber, #57642) [Link]

This seems to make some of your dreams come true:

http://www.myri.com/Myri-10G/10gbe_solutions.html

Caveat: it is not cheap.

RPS and RFS

Posted Nov 24, 2010 5:13 UTC (Wed) by pabs (subscriber, #43278) [Link] (2 responses)

Another reason to replace proprietary NIC firmware with open source versions:

http://esec-lab.sogeti.com/dotclear/index.php?post/2010/1...

RPS and RFS

Posted Nov 24, 2010 5:28 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

or, to play devil's advocate, another reason to not allow the OS to have access to the firmware and make it completely closed.

RPS and RFS

Posted Nov 24, 2010 6:32 UTC (Wed) by foom (subscriber, #14868) [Link]

or, to play angel's advocate: another reason devices should have their firmware uploaded into volatile storage by the kernel during boot, rather than storing it in EEPROM/flash.
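
(The kernel already has infrastructure for exactly this: a driver can request an image from userspace - typically /lib/firmware - at probe time and push it into the device's RAM. A minimal sketch of the request_firmware() pattern; the "mynic" names, the firmware file name, and the device-side download step are hypothetical placeholders:)

    /* Sketch of the request_firmware() pattern: fetch a firmware
     * image from userspace (typically /lib/firmware) and download
     * it into the device's volatile RAM at probe time. */
    #include <linux/firmware.h>
    #include <linux/device.h>

    static int mynic_load_firmware(struct device *dev)
    {
            const struct firmware *fw;
            int err;

            err = request_firmware(&fw, "mynic/ip-engine.bin", dev);
            if (err)
                    return err;

            /* ...copy fw->data (fw->size bytes) into device RAM over
             * the bus here, then release the device's CPU from reset... */

            release_firmware(fw);
            return 0;
    }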

RPS and RFS

Posted Aug 2, 2010 8:40 UTC (Mon) by farnz (subscriber, #17727) [Link]

The core of the problem is that, for a NIC design to avoid being the bottleneck as CPUs get faster, it needs to be stateless, or to be able to hold more state than any future CPU and motherboard combination can handle. The second of these is clearly unrealistic; if you can hold that much state, you're too pricey for the market (you need things like multi-megabyte buffers for receive and send windows).

Thus, NIC designers go down the first route; things like GSO (and its subsets UFO and TSO) on the transmit side, and GRO (a generalisation of LRO) on the receive side directly help you scale up on one CPU. Then you add multiqueue transmit (so that multiple CPUs can send packets via the same ethernet card without interacting with each other) and RPS/RFS so that multiple CPUs can be used for packet reception without interacting with each other, and you get something which scales well with the speed of CPUs.
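
(For the curious: as of 2.6.35, RPS is configured per receive queue through sysfs; rps_cpus takes a hex bitmap of the CPUs allowed to process that queue's packets. A tiny sketch, assuming an interface named eth0 and a four-CPU box; it has to run as root:)

    /* Sketch: let CPUs 0-3 process packets from eth0's first receive
     * queue.  "eth0" and the mask 0xf are examples, not advice. */
    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/class/net/eth0/queues/rx-0/rps_cpus", "w");

            if (!f) {
                    perror("rps_cpus");
                    return 1;
            }
            fprintf(f, "f\n");      /* hex bitmap: CPUs 0, 1, 2, 3 */
            fclose(f);
            return 0;
    }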

Note that RPS/RFS is a smart way of speeding up kernel processing; by spreading packets across CPUs such that each CPU doesn't interact (cache effects etc.) with other CPUs that are processing network packets, I get a near-linear speedup in packet reception with increasing CPU numbers. Without it, cacheline bouncing, as CPUs inspect packets to see whether they're of interest to this CPU or another, gets painful.
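
(The essence of the trick is consistent flow hashing: the CPU is picked from a hash of the flow's addresses and ports, so every packet of a given flow lands on the same CPU and that flow's state never bounces between caches. A toy illustration - the hash below is an arbitrary stand-in for the kernel's real one:)

    /* Toy model of RPS-style steering: hash the 4-tuple, pick a CPU.
     * Same flow -> same hash -> same CPU, so per-flow state stays in
     * one CPU's cache. */
    #include <stdint.h>
    #include <stdio.h>

    static unsigned int pick_cpu(uint32_t saddr, uint32_t daddr,
                                 uint16_t sport, uint16_t dport,
                                 unsigned int ncpus)
    {
            uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16) ^ dport;

            h *= 0x9e3779b1;        /* scramble the bits a little */
            h ^= h >> 16;
            return h % ncpus;
    }

    int main(void)
    {
            /* Two packets of the same flow steer to the same CPU... */
            printf("flow A -> cpu %u\n",
                   pick_cpu(0x0a000001, 0x0a000002, 12345, 80, 4));
            printf("flow A -> cpu %u\n",
                   pick_cpu(0x0a000001, 0x0a000002, 12345, 80, 4));
            /* ...while another flow may land elsewhere. */
            printf("flow B -> cpu %u\n",
                   pick_cpu(0x0a000001, 0x0a000003, 23456, 80, 4));
            return 0;
    }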

