Linux and TCP offload engines
Linux has never supported the TCP offload engine (TOE) features of any network card. For some time, there had not even been much discussion of TOE support. The topic has returned, however, with a patch adding TOE support, posted by Scott Bardone of Chelsio Communications. This TOE patch is clearly intended to support Chelsio's line of network adapters, but it has been coded as a more generic "open TOE" framework. The Chelsio folks would very much like to see this patch merged for the 2.6.14 kernel release.
Those who are curious about the TOE patch can go in and look at the code; it is relatively straightforward. At its core, it creates a new type of extended network device (struct toedev) with an additional set of methods:
    int (*open)(struct toedev *dev);
    int (*close)(struct toedev *dev);
    int (*can_offload)(struct toedev *dev, struct sock *sk);
    int (*connect)(struct toedev *dev, struct sock *sk);
    int (*send)(struct toedev *dev, struct sk_buff *skb);
    int (*recv)(struct toedev *dev, struct sk_buff **skb, int n);
    int (*ctl)(struct toedev *dev, unsigned int req, void *data);
    void (*neigh_update)(struct net_device *lldev, struct toedev *dev,
                         struct neighbour *neigh, int fl);
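As a rough illustration of how a method table of this kind can steer a connection to the adapter or to the software stack, here is a minimal user-space sketch. It is not code from the patch: `mock_toedev`, `mock_sock`, `mock_toe_ops`, `stack_send()`, and the `needs_netfilter` flag are all hypothetical stand-ins, and only two of the methods are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real patch works on struct sock and sk_buff. */
struct mock_sock { bool needs_netfilter; };
struct mock_toedev;

struct mock_toe_ops {
    int (*can_offload)(struct mock_toedev *dev, struct mock_sock *sk);
    int (*send)(struct mock_toedev *dev, const void *buf, size_t len);
};

struct mock_toedev {
    const struct mock_toe_ops *ops;
    size_t offloaded_bytes;     /* bytes handed to the (pretend) adapter */
};

static size_t software_bytes;   /* bytes handled by the (pretend) kernel stack */

/* Stand-in for the ordinary in-kernel TCP transmit path. */
static int software_send(const void *buf, size_t len)
{
    (void)buf;
    software_bytes += len;
    return 0;
}

/* Example driver policy (an assumption): refuse sockets that need filtering. */
static int demo_can_offload(struct mock_toedev *dev, struct mock_sock *sk)
{
    (void)dev;
    return !sk->needs_netfilter;
}

static int demo_send(struct mock_toedev *dev, const void *buf, size_t len)
{
    (void)buf;
    dev->offloaded_bytes += len;
    return 0;
}

static const struct mock_toe_ops demo_ops = {
    .can_offload = demo_can_offload,
    .send        = demo_send,
};

/* The "hook": use the device's method if it accepts the socket, else fall back. */
static int stack_send(struct mock_toedev *dev, struct mock_sock *sk,
                      const void *buf, size_t len)
{
    assert(sk != NULL);
    if (dev && dev->ops && dev->ops->can_offload && dev->ops->send &&
        dev->ops->can_offload(dev, sk))
        return dev->ops->send(dev, buf, len);
    return software_send(buf, len);
}
```

Calling `stack_send()` with a socket whose `needs_netfilter` flag is clear adds to `offloaded_bytes`; setting the flag, or passing a NULL device, sends the data through `software_send()` instead.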
There are various hooks sprinkled through the TCP code to detect when a TOE-capable device is being used and call the appropriate method rather than performing the TCP processing in the kernel. One assumes that the patch works as advertised, but its chances of getting into the kernel appear to be relatively small. There is a very long list of objections which have been raised, including:
- The TOE code must, by necessity, hook deeply into the Linux TCP
implementation. These hooks will make it harder to make high-level
TCP changes in the future. The TOE patch thus represents a long-term
maintenance burden.
- TOE shorts out much of the Linux networking code. In the process, it
cuts out little features like netfilter, traffic control, and more.
So a Linux system using TOE will lack many of the capabilities which
characterize the Linux networking stack. The networking hackers can
already foresee the interminable series of "why doesn't my TOE adapter
support netfilter?" questions which will go their way.
- The Linux networking stack is easy to fix when a bug or security issue
comes up. If a security problem turns up in a TOE adapter, by contrast,
there is very little that can be done to fix it.
- The performance benefits from TOE are minimal at best. Even if a TOE adapter and its software stack currently outperform "dumber" adapters at very high networking speeds (10G currently, say), that advantage tends to disappear by the time those speeds are in common use. Jeff Garzik claims that 100Mb/s TOE adapters (once the bleeding edge of high speed) are now slower than the Linux networking stack. So any performance advantage from TOE is temporary but, once the code is merged, it must be supported forever.
There is also the inconvenient little detail that a company called Alacritech owns several patents relating to TOE. It recently used those patents to extract money from Microsoft, which is including TOE support in its upcoming Windows release. This alone would almost certainly cause distributors to disable TOE support, even if it were to find its way into the kernel. (For the record, Chelsio claims to have done its legal homework, but not everybody finds that claim convincing.)
Will it find its way in? Not if David Miller has anything to say on the matter:
There is essentially zero chance of a networking patch being merged over David's objections, so the TOE developers have an uphill road ahead of them.
One might well ask: if TOE cannot be merged, how will Linux maintain competitive speeds as networks get faster? A big area of interest, currently, is offloading parts of the protocol which do not require great intelligence or state in the card. The kernel already supports TCP segmentation offloading (TSO), where an adapter can create TCP packets out of a large array of data. TSO reduces the necessary CPU power, bus overhead, and cache impact to send a series of packets, but it still does not require that the adapter actually know anything about specific TCP connections. There is talk of using a similar technique for incoming packets: an adapter could merge a configurable set of incoming packets into a single array, thus reducing the demands on the rest of the system. One way or another, the networking stack is likely to keep up with the demands of current hardware.
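The segmentation step that TSO hands to the adapter can be sketched in a few lines. This is a simplified illustration, not driver or kernel code: `struct segment` and `tso_segment()` are hypothetical names, and a real adapter also replicates and fixes up the IP and TCP headers and checksums for each packet, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one packet cut from a large send buffer. */
struct segment {
    uint32_t seq;     /* TCP sequence number of the first payload byte */
    size_t   offset;  /* where the payload starts in the large buffer */
    size_t   len;     /* payload length, at most mss */
};

/*
 * Cut `total` bytes, starting at sequence number `start_seq`, into pieces of
 * at most `mss` bytes. Fills in up to `max` descriptors and returns the
 * number of segments the whole buffer needs.
 */
static size_t tso_segment(uint32_t start_seq, size_t total, size_t mss,
                          struct segment *out, size_t max)
{
    size_t count = 0;

    assert(mss > 0);
    for (size_t off = 0; off < total; off += mss, count++) {
        if (count < max) {
            /* Sequence numbers wrap modulo 2^32, as in TCP. */
            out[count].seq    = start_seq + (uint32_t)off;
            out[count].offset = off;
            out[count].len    = (total - off < mss) ? total - off : mss;
        }
    }
    return count;
}
```

With a 64KB buffer and an assumed MSS of 1448 bytes, the host hands over one large array and the loop above yields 46 packets; without TSO, the stack would build and push each of those through the driver separately.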
It has often been said that a maintainer's real job is to say "no" to
patches. Not all features are worth their (very real) cost, and merging
some patches can be detrimental to the kernel in the long run. For years,
the networking maintainers have felt that TOE support is the kind of patch
which should not be accepted, and the current implementation appears not to
have changed their minds. TOE appears to be one of those ideas which never
really goes away, however, so chances are good that we will see this debate
again in the future.
Index entries for this article | |
---|---|
Kernel | Networking |
Kernel | TCP |
Posted Aug 25, 2005 6:32 UTC (Thu)
by bronson (subscriber, #4806)
[Link] (2 responses)
I hope the TOE guys will plug a few gigE network cards into a dual Opteron setup and run a network benchmark (SpecWEB? ttcp?). Then turn on TOE and check the speedup. If it's better than 30% then I would support integrating TOE into the kernel. Short of that, I would just wait the six months until computing hardware closes the gap, or add another machine. I'm skeptical that TOE will even get 30%.
So... Have any reliable numbers been produced yet?
Posted Aug 25, 2005 8:00 UTC (Thu)
by gdt (subscriber, #6284)
[Link] (1 responses)
At 10Gbps the issue is not so much the speed as the networking stack using so much of the CPU that the computer has too little user-space CPU time left to do anything useful. That's what TOEs address.
Also note that the choice isn't about whether to offload, but about the amount of state the offload needs and provides. For example, an offload which allowed the important TCP control decisions to be made by the CPU would allow most of the advances in the Linux kernel whilst not increasing CPU load overly (since connections rarely alter rate or state). A TSO which played out packets at a specified rate would be extremely useful.
You are right, the stack can always revert to using the CPU when a feature which requires it is configured. But any network engineer who has used a router which radically drops its throughput when you innocently alter the configuration can tell you how frustrating this design choice can be.
There must be a way of manually disabling the TOE, just as other offloads can be manually disabled now. That isn't just useful for security, but also for fault finding, resilience and running with known bugs (e.g., the TSO feature was not compliant with congestion control needs in some kernel versions).
What concerns me more about Linux networking software is that the developers are getting fine results using ttcp and iperf, but users who want to do large file transfers (i.e., something useful as well as shunting packets about) are typically getting around 300Mbps. The kernel gives users too few tools for tracking down the source of their poor performance. It's a major exercise in patch application to get simple data like the amount of CPU and I/O used by kernel subsystems, or to get TCP's view of the performance of the network and remote host.
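To put rough numbers on the CPU-budget point made at the start of this comment, the arithmetic can be sketched as follows. The helper names are hypothetical, the frame size assumes standard 1500-byte Ethernet payloads, and the 2.4GHz clock used in the note below is an assumption chosen only for illustration.

```c
#include <assert.h>

/*
 * Bytes one full-sized Ethernet frame occupies on the wire:
 * 1500 payload + 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
 */
#define WIRE_BYTES_PER_FRAME 1538.0

/* Full-sized frames per second that a link of the given bit rate can carry. */
static double frames_per_second(double link_bits_per_second)
{
    assert(link_bits_per_second > 0.0);
    return link_bits_per_second / (WIRE_BYTES_PER_FRAME * 8.0);
}

/* CPU cycles available per frame if one core does nothing but networking. */
static double cycles_per_frame(double cpu_hz, double link_bits_per_second)
{
    return cpu_hz / frames_per_second(link_bits_per_second);
}
```

At 10Gbps this comes to roughly 813,000 full-sized frames per second in each direction, so a single 2.4GHz core would have a little under 3,000 cycles per frame for the driver, the TCP/IP stack, and the application combined.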
Posted Sep 2, 2005 6:37 UTC (Fri)
by mingo (guest, #31122)
[Link]
so the fundamental question is the basic question that the network maintainers have always stressed, and which this open letter does not address at all: how does TOE compare to TSO? I.e. the issue is indeed what you stated too: not whether to offload, but how much state to offload.
so any attempt to try to mischaracterise this whole issue as some "unwillingness of the Linux networking maintainers to integrate TOE" is misguided at best.
Posted Aug 25, 2005 7:37 UTC (Thu)
by hingo (guest, #14792)
[Link]
Posted Aug 25, 2005 10:08 UTC (Thu)
by pvaneynd (subscriber, #898)
[Link]
Posted Aug 26, 2005 0:44 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (1 responses)
Something I've always wondered about TCP offload and every other kind of offload: Is it better to add CPU power to a network adapter than to add it to the motherboard? Adding it to the motherboard (faster CPU chip, more SMP CPUs) is certainly simpler, which ought to mean cheaper. Are there technological limits that make it impossible for a single SMP complex to handle all the 10 Gbps packets and all the higher level stuff too?
Sometimes, people add intelligence to the periphery of a system in order to make things more simple by allowing the central processor to remain blissfully ignorant of what's going on out there. But since TCP offload actually requires the central processor to cooperate, this doesn't fit that pattern.
Posted Aug 26, 2005 7:03 UTC (Fri)
by njhurst (guest, #6022)
[Link]
In the case of TOE, I imagine that there are checksums and so on that could be performed whilst DMAing the data through. If this were done by the main CPU, it might be that the data arrives much slower than the CPU could handle it, but faster than is suitable for interrupts. So the driver has to do the checksum on a block of data after it is in main memory, effectively doubling the time spent on each packet. The card could just compute the checksum as it does the DMA transfer, and leave that somewhere convenient.
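The idea of folding the checksum into the transfer can be sketched in software: the routine below copies a buffer and accumulates the 16-bit one's-complement Internet checksum (as described in RFC 1071) in the same pass, so the data is touched once rather than twice. `copy_and_checksum()` is a hypothetical name for illustration; an adapter would do the equivalent in hardware during DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `len` bytes from `src` to `dst` and return the Internet checksum of
 * the data, computed during the copy instead of in a second pass.
 */
static uint16_t copy_and_checksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];   /* big-endian 16-bit word */
        sum = (sum & 0xFFFF) + (sum >> 16);            /* fold the carry back in */
    }
    if (i < len) {                                     /* odd byte, zero-padded */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the eight example bytes used in RFC 1071 (00 01 f2 03 f4 f5 f6 f7), the one's-complement sum is 0xddf2 and the function returns its complement, 0x220d.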
I remember that back in the days of the 68k Macs they used a separate 6801 or similar to handle the I/O, as that reduced the time spent in interrupts drastically, without requiring a much more powerful CPU.
Posted Sep 2, 2005 2:12 UTC (Fri)
by krishna (guest, #24080)
[Link]
Posted Sep 6, 2005 16:53 UTC (Tue)
by abhishek (guest, #10716)
[Link] (1 responses)
Posted Dec 8, 2015 20:16 UTC (Tue)
by SEJeff (guest, #51588)
[Link]
As always, the patch submitter must prove that the patch is necessary before it can be accepted. I've seen a lot of TOE code flying around but so far no good performance numbers.
note that TSO (TCP Segmentation Offload) has extensive support in the 2.6 Linux kernel, and has been supported for a long time. All the network hardware that is capable of doing TSO has native Linux driver support for it: tg3, e1000, ixgb, s2io, bnx2, qeth, 8139cp - you name it.
Objections nr 2 and 3 are not very good objections at all. IANAKH (I am not a kernel hacker) but: A more intelligent patch would just fall back on the Linux stack, if it recognizes that netfilter or something else is being used. Similarly, if a TOE card is found to be vulnerable, a security update would just remove that card from the list of TOE cards.
Objection nr 4 is the important one. Adding more code to the kernel without any performance benefit would obviously be silly. If the authors of the patch have done their homework, they will have benchmarks to start the discussion with.
Another counter-argument is that TOE hides problems: several times already a 'slow network' turned out to be a semi-broken one. The fact that 'netstat -ni' showed no errors at all until TOE was turned off made finding the problem rather difficult; not even tcpdump could see the retransmissions, only a packet leaving and arriving a _long_ time later.
Standard procedure now is to turn off TOE until a proven need is determined.
I know very little about this, but I think the two things you gain are the ability to make hardware specifically to handle the kinds of calculations (think GPU, FPU, DSP etc) and the ability to concentrate on the data without having to deal with things like interrupts and memory management (polling).
I have to wonder if Chelsio has had this patch out long enough for them to
understand what FAQs pop up as a result (e.g., netfilter not working,
performance) as well as identifying how much work it takes to maintain it
over time, and whether it's reasonable to expect the kernel core
developers to just pick it up.
Albeit with zero facts, I suspect that the patch may have been submitted
without having been tested in their customers' environments over a few
Linux kernel releases. From this, Chelsio would get a first feel for
FAQs, continuous maintenance, etc., and even viability in their own
customers' environments (e.g., "no netfilter? Screw it, we'll turn TOE off
in that case"). Also, it would be interesting to see how the BSD folks'
responses would compare if this patch was submitted to them.
Finally, it seems that some kind of committee for discussing kernel<->TOE
integration and use issues would make sense prior to tossing a patch
implementing an 'Open TOE' interface out there.
How does Linux perform with four 10Gbps NICs? ...Anyone?