LWN.net

No fear of the fire engine

The Linux developers have long, and with reason, been pleased with the performance of the kernel's networking subsystem. For various reasons, there is also a longstanding rivalry between the Linux networking hackers and their Sun counterparts. So when The Register posted an article about the "Fire Engine" networking stack which will be part of future Solaris releases, it drew some attention. This quote from John Fowler, Sun's software CTO:

Also we focused on CPU utilization. One of the little secrets of networking is high speed interfaces can in fact pump lots of bits, but they chew up lots of CPU, which means you aren't doing other things. We worked hard on efficiency, and we now measure, at a given network workload on identical x86 hardware, we use 30 percent less CPU than Linux.

also didn't help.

The dissection of Sun's claim was quick to begin. It was pointed out that we don't know which version of Linux is being referred to in the quote. There are a lot of differences between the 2.4 and 2.6 kernels, and it would not be quite sporting for Sun to be comparing its upcoming, unreleased technology with an old version of Linux.

Sun's performance improvements appear to be based on the use of "TCP Offload Engine" (TOE) technology. The idea of a network adaptor which can take on the network protocol overhead is not particularly new; such hardware has been available for many years. The Linux networking hackers have always had a low opinion of the TOE approach, however. TOE hardware may offload a bit of work from the processor, but it suffers from a number of disadvantages:

  • When you use TOE hardware, you have just moved your networking stack into a firmware-based, closed-source module. This code cannot be inspected, fixed, or improved.

  • TOE-based networking suffers from latency problems. The setup and teardown of network connections still require the processor's intervention, and that means several round trips over the bus for each connection.

  • As Larry McVoy heard from "Sun employee #1," processors are getting faster much more quickly than TOE hardware is. Even if a TOE adaptor performs reasonably when it is released, it will be quickly outstripped by processor-based TCP implementations.

The 2.6 networking stack is happy to offload some functions to smart interfaces; examples include packet checksumming and TCP segmentation. But the full TCP offload approach is likely to remain unpopular into the future.
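Packet checksumming is one of the simpler jobs the 2.6 stack can hand to a smart interface. As a rough illustration of the per-byte work that checksum offload takes off the processor, here is a minimal C sketch of the Internet checksum described in RFC 1071 (the ones'-complement sum used by IP, TCP, and UDP). The function name inet_checksum() is hypothetical, and this is not the kernel's own code, which uses heavily optimized, architecture-specific routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum (RFC 1071), the ones'-complement
 * sum of the data taken as big-endian 16-bit words. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t sum = 0;   /* 64 bits so large buffers cannot overflow */

    /* Add up the data as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint64_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low-order byte. */
    if (len == 1)
        sum += (uint64_t)p[0] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 this returns 0x220d. Running the same function over a buffer that already includes its correct checksum yields zero, which is how a receiver verifies it. Every received byte passes through a loop like this unless the interface does the work.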

In general, the networking hackers do not feel threatened by "Fire Engine." That didn't stop them from having a discussion of how Linux networking could be made faster, however. The conversation was based around a shopping list of possible improvements posted by Andi Kleen. This list includes a number of good ideas, but the bulk of the debate concerned a relatively obscure topic: timestamp generation.

Certain applications want to get each packet packaged with a timestamp saying exactly when that packet was received. Tools like tcpdump, for example, make use of this capability. The socket interfaces were designed in such a way that the networking subsystem cannot know if any particular packet needs to be timestamped or not; as a result, it generates timestamps for all incoming packets, even though they are rarely used.

The problem is that this timestamp generation gets to be expensive when you have thousands of packets flowing through the system every second. Depending on the architecture Linux is running on, generating the timestamp can involve talking to a (slow) off-CPU timer or moving cache lines frequently between processors. Improving the timestamp generation might be the most straightforward way of speeding up Linux networking, at least at the high end.

That fix is not entirely easy, however. Networking maintainer David Miller is unwilling to make any changes that would reduce the accuracy of the timestamps returned to user space. Any such changes would be seen as an API change; somebody, somewhere, would be badly affected by it. The proper solution, as proposed by David, is the creation of a new fast_timestamp_t type which is quicker to generate, but which can be converted to a real time when the need arises. The optimal implementation of this type would be highly dependent on the underlying architecture; on many systems the CPU cycle timer could be used, but that approach would not work universally. A default, architecture-independent "fast timestamp" implementation is easy to add, however. Creating that sort of structure for the architecture maintainers to play with may be one of the first things to happen when the 2.7 series opens up.
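The fast_timestamp_t idea (capture something cheap for every packet, and pay for conversion to wall-clock time only when someone actually asks) can be sketched in user-space C. This is a hypothetical illustration, not David's proposed interface. The helper names fast_timestamp_init(), fast_timestamp_now(), and fast_timestamp_to_timeval() are invented here. CLOCK_MONOTONIC stands in for whatever cheap, architecture-specific source (such as a CPU cycle counter) a real implementation would use. The sketch also ignores changes to the wall clock made after the offset is sampled, which a kernel version would have to handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical fast timestamp: a raw nanosecond count from a cheap
 * monotonic source, with no wall-clock arithmetic at capture time. */
typedef uint64_t fast_timestamp_t;

/* Wall-clock time (ns) corresponding to monotonic zero, sampled once. */
static int64_t realtime_offset_ns;

static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Record the offset between wall-clock and monotonic time (call once). */
void fast_timestamp_init(void)
{
    struct timespec mono, real;
    clock_gettime(CLOCK_MONOTONIC, &mono);
    clock_gettime(CLOCK_REALTIME, &real);
    realtime_offset_ns = timespec_to_ns(&real) - timespec_to_ns(&mono);
}

/* Cheap path: what would run for every incoming packet. */
fast_timestamp_t fast_timestamp_now(void)
{
    struct timespec mono;
    clock_gettime(CLOCK_MONOTONIC, &mono);
    return (fast_timestamp_t)timespec_to_ns(&mono);
}

/* Slow path: convert to a struct timeval only when user space asks. */
struct timeval fast_timestamp_to_timeval(fast_timestamp_t ts)
{
    int64_t ns = (int64_t)ts + realtime_offset_ns;
    struct timeval tv;
    tv.tv_sec = (time_t)(ns / 1000000000LL);
    tv.tv_usec = (ns % 1000000000LL) / 1000;
    return tv;
}
```

The design choice is to put all of the expensive work (the offset arithmetic and the split into seconds and microseconds) in the conversion function. That function runs only for the rare packets whose timestamps are actually read, while the per-packet path just stores one 64-bit value.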



No fear of the fire engine

Posted Dec 4, 2003 6:36 UTC (Thu) by yohahn (subscriber, #4107)

It's nice that CPUs get faster, but what of the embedded projects?

When you have a fixed amount of processor power for a price, it's hard to not wish that networking took less cpu time.

Is there a place for some off-CPU, TOE usage in the embedded world?

No fear of the fire engine

Posted Dec 4, 2003 8:38 UTC (Thu) by Nick (subscriber, #15060)

The TOE doesn't come for free though. It would probably be cheaper and simpler to use a more powerful general purpose CPU rather than a TOE solution, wouldn't it?

Once more, dear friends, around the wheel of karma

Posted Dec 4, 2003 14:07 UTC (Thu) by davecb (subscriber, #1574)

The cycle from cpu-based graphics to add-on-card-based graphics is so common that it has a nickname: "the wheel of karma". Expect the same to apply to add-on-card ethernet. I also recollect discussing this same subject with a router/firewall designer in the 386 era (;-))

Once more, dear friends, around the wheel of karma

Posted Dec 4, 2003 19:20 UTC (Thu) by acristianb (guest, #1702)

I agree that as networks become faster and the CPU cannot keep up with them, the need for offloading arises. But I remember vaguely that there was a NIC, or maybe something more than that, that was based on an embedded Linux chip. I seem to remember seeing it a year ago or so. Does anybody have more details? Could this solve the openness problem (i.e. you have an embedded Linux running on the xGig card and upgrade it when necessary)?

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds