From: "Asgeir Eiriksson" <asgeir-AT-chelsio.com>
Subject: Response to "TOE performance" letter in Sept 8 edition
Date: Wed, 14 Sep 2005 10:55:34 -0700
In his "TOE performance" letter in the September 8th issue, Dave S.
Miller asked for some further TOE performance information, and I'd like
to provide the following response.
Chelsio Communications Inc.
We welcome the chance to respond to the concerns that David Miller has
about TOE cards.
First, it seems to me that he has been badly burned in the past by
over-hyped TOE cards. We at Chelsio have learned from those earlier
mistakes by other people, and I maintain that TOE cards deserve a fresh
look at 10GE speeds. I mention some of the reasons below:
> > You might want to ask the Chelsio guys to provide some performance
> > metric other than their "land speed record" that, as Linux
> > networking stack maintainer, I'm frankly sick of hearing about over
> > and over again.
A considerable number of HPC folk pay close attention to the LSR, and it
also demonstrates the resiliency of the TOE implementation to different
topologies (not all TOEs were created equal) and applications for remote
back-up (e.g. for single connection data transfers such as FTP).
Admittedly, the LSR has gotten too much press.
> > What's more interesting to me is an area I know TOE is poor in, and
> > that is TCP connection rates. It's all too easy to make one sole
> > connection pump a lot of data, but it's hard to make a web or
> > database server serve hundreds of thousands of connections per
> > second. TOE cards generally cannot do that because each connection
> > setup/teardown requires setting up and tearing down state on the
> > network card, which subsequently kills TCP connection rates.
I agree with your list of important performance corners, and the Chelsio
TOE was designed from the ground up with these in mind.
I believe your observations on the connection setup process might be
valid for the way Microsoft Chimney currently sets up connections (time
will tell), but this is not how Chelsio is proposing to do TOE on Linux.
In the proposed Linux patch the connection setup and teardown are
offloaded to the NIC, and a SYN that hits an offloaded listening server
triggers a request-to-host/response-from-host exchange to "ASK" the host
whether the connect request should be accepted (this allows full
integration with Linux access controls, etc.). The response-from-host
triggers the sending of the SYN+ACK, so I would maintain that this
flavor of connection setup integrates well with the Linux access
controls, and you will see some benefit in setup/teardown performance.
The hardware is capable of processing SYN packets at line rate (some
millions/sec), but the setup loop includes socket creation/destruction
and kernel checks and controls on acceptance, making the overall rate
lower. In the proposed patch the active open is also offloaded, and this
will lead to a significant benefit in performance. Finally, the FIN
processing is offloaded for both types of close.
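
The passive-open flow described above can be pictured with a small toy
model. This is only a sketch under stated assumptions, not the Chelsio
driver or the proposed patch: the names ToyOffloadNic, Syn, and
host_policy are hypothetical, and what happens to a SYN for a port that
is not offloaded, or to a rejected SYN, is assumed here because the
letter does not specify it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Syn:
    """A hypothetical, minimal view of an incoming SYN segment."""
    src_ip: str
    src_port: int
    dst_port: int


@dataclass
class ToyOffloadNic:
    """Toy stand-in for an offloading NIC (illustrative names only)."""
    host_accepts: Callable[[Syn], bool]   # models the host's "ASK" response
    listeners: Set[int] = field(default_factory=set)
    sent: List[Tuple[str, Syn]] = field(default_factory=list)

    def offload_listen(self, port: int) -> None:
        # The host hands a listening port to the NIC.
        self.listeners.add(port)

    def on_syn(self, syn: Syn) -> str:
        if syn.dst_port not in self.listeners:
            # Assumption: non-offloaded ports go to the regular host stack.
            return "PASS_TO_HOST_STACK"
        # Request-to-host / response-from-host: the host applies its own
        # access controls before anything is sent on the wire.
        if self.host_accepts(syn):
            self.sent.append(("SYN+ACK", syn))
            return "SYN+ACK"
        # Assumption: the letter does not say what a rejection sends back.
        return "REJECTED"


def host_policy(syn: Syn) -> bool:
    """Hypothetical access control: refuse one address range."""
    return not syn.src_ip.startswith("10.0.0.")


nic = ToyOffloadNic(host_accepts=host_policy)
nic.offload_listen(80)
print(nic.on_syn(Syn("192.0.2.7", 40000, 80)))
print(nic.on_syn(Syn("10.0.0.5", 40001, 80)))
```

The point of the sketch is the ordering: the SYN+ACK is only emitted
after the host has answered, which is where the host's access controls
get their say.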
The Chelsio TOE does not have any on-chip caches and therefore has a
flat performance profile as the number of connections increases. We
have measured the performance up to 14000 connections with Linux 2.6.*
(about 6Gbps aggregate BW on an Opteron), and our profiling of the code
indicated that we were running into Linux bottlenecks (select()
de-multiplexing, etc.) at that point and not hitting TOE issues (yet).
Figures 3 and 4 of the VeriTest report at
http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf show the BW
from 1 up to 1000 connections, but the 14000 connection measurement is
unpublished internal data.
We also included low end-to-end latency in the list of design
objectives, along with traffic shaping and pacing capabilities. The
first requirement is to enable going toe-to-toe with the IB and FC
crowd. The second is useful at one extreme for media pumping
applications serving tens of thousands of audio streams or thousands of
MPEG streams, and at the other end of the scale for throttling and
giving priority to connections that are going to, e.g., storage.
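
To give a feel for what pacing means for a media stream, here is a
back-of-the-envelope sketch. The numbers are assumptions chosen for
illustration, not Chelsio figures: a 4 Mbps stream rate and a 1316-byte
payload (seven 188-byte MPEG transport stream packets per datagram).
The function names are hypothetical.

```python
def pacing_interval_s(packet_bytes: int, stream_bps: float) -> float:
    """Gap between packet starts that holds a stream at stream_bps."""
    if stream_bps <= 0:
        raise ValueError("stream_bps must be positive")
    return packet_bytes * 8 / stream_bps


def aggregate_bps(n_streams: int, stream_bps: float) -> float:
    """Total payload rate if every stream is paced at the same rate."""
    return n_streams * stream_bps


# Assumed example values (not taken from the letter).
PAYLOAD_BYTES = 7 * 188      # 1316 bytes per datagram
STREAM_BPS = 4_000_000       # 4 Mbps per stream

gap = pacing_interval_s(PAYLOAD_BYTES, STREAM_BPS)
print(f"inter-packet gap: {gap * 1e3:.3f} ms")
print(f"2000 streams: {aggregate_bps(2000, STREAM_BPS) / 1e9:.1f} Gbps")
```

Under these assumptions each stream needs one packet roughly every
2.6 ms, so thousands of streams means thousands of independent timers,
which is the kind of work a pacing-capable adapter can take off the
host.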
These issues are just a sampling of the issues that we've encountered in
our TOE integration work, and I have no doubt that the linux community
at large is capable of improving the integration, utility, and
performance of the TOE even further, and this frankly is one of the
motivations behind open sourcing our TOE software and submitting the
> > So if you're a scientist trying to break the land speed record
> > between Stanford University in California and some place in the
> > middle of Europe on the other side of the planet, yeah TOE is
> > probably a great toy to play with.
> > TOE users are niche, always have been, and always will be. It is no
> > mistake that the Chelsio guys do not delve into this aspect of their
> > technology.
> > And the study they mentioned in their mail to you of course will be
> > full of accolades for their approach. If you read only the documents
> > posted on their web site, you might think that TOE is the best thing
> > since sliced bread.
The publications on the Chelsio website fall into three broad
categories: a) PR by marketing people, b) white-papers by our engineers,
and c) published papers by some of the top names in the HPC field, and
you're no doubt referring to a) and maybe b) above in your remarks. The
papers in category c) are by independent researchers in the HPC field,
and in their papers they've chosen the applications to benchmark, and
they've chosen what to measure and how to measure it.
The following is the list of such publications at this writing:
[1] "Head to TOE Evaluation of High-Performance Sockets over Protocol
Offload Engines", by Dr. Wu Feng of Los Alamos National Labs, Dr. DK
Panda of Ohio State University, et al., that will appear at Cluster
2005, Boston. Available at
[2] "Performance Evaluation of a 10-Gigabit Ethernet TOE", by Dr. Wu
Feng of Los Alamos National Labs, et al. that appeared last month at Hot
Interconnects 2005. Available at
[3] "Infiniband and 10-Gigabit Ethernet for I/O in Cluster Computing",
by Helen Chen of Sandia National Labs, et al. that appeared in July at
the Cluster Symposium 2005. Available at
When I look through [1], [2], and [3] I observe the following:
- the performance results for the applications chosen (presumably the
applications that the researchers care about) show TOE outperforming NIC
- the end-to-end latency for socket API and TOE is less than for SDP
- there are various traffic profiles in the benchmarks and TOE does well
on every single one.
There's of course a lot more there in the papers, and I invite people to
look through the results for themselves and reach their own conclusions.
As an aside: it is also interesting to note that TCP+Ethernet flow
control does great against the supposedly superior IB flow control in
all the above experiments.
> > The TOE folks are frankly between a rock and a hard place. They need
> > some support in upstream Linux for their solution to really be far
> > reaching and viable, yet the negative aspects of their technology
> > are such that this is likely not going to happen.
> > They also refuse to actively consider stateless offloads, which are
> > much better for long term maintainability and do not bypass the
> > Linux TCP networking stack we've been tuning for 10+ years. Doing so
> > would at least make these guys appear less anti-social and I would
> > certainly pay more attention to their concerns if they at least made
> > some efforts in this area. But they'll never do something so open
> > minded because their whole business model surrounds TOE.
> > With that in mind I applaud folks like Lenoid Grossman who are
> > working on stateless TCP receive offloads for highspeed networks on
> > the products they work on.
> > Take care.
Our NIC, in addition to having TOE support, also offloads iSCSI and
iWARP (RDMA), has support for stateless offload technology such as
TSO and checksum generation/checking, and supports MSS of 1500B up to
jumbo frame size for each of these traffic types.
So, we are obviously for customer choice, and if I were to extract one
NIC vs. TOE observation from all our performance comparison work to
date, it would be the following:
- NIC with jumbo frames can fill a 10GE wire in the Tx or Rx directions,
but the NIC gets into trouble as the average packet size goes down
(packet frequency goes up) or the connection count goes up.
- TOE with a traffic mix anywhere from 500B average frame size to jumbo
frame size will fill a 10GE wire, i.e. no performance corners at any
packet size or connection count.
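
The link between packet size and packet frequency in the comparison
above is simple arithmetic, sketched below. The assumptions are mine,
not the letter's: frame sizes are taken to include the Ethernet header
and FCS, each frame costs a further 20 bytes on the wire (7-byte
preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap),
and the 1518-byte and 9018-byte sizes are just common examples of
standard and jumbo frames.

```python
LINK_BPS = 10_000_000_000    # 10GE line rate
WIRE_OVERHEAD_BYTES = 20     # preamble (7) + SFD (1) + inter-frame gap (12)


def frames_per_second(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Frames per second needed to keep the link full at a given frame size."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


# 500B is the average frame size mentioned above; the other two sizes
# are assumed examples of a standard and a jumbo frame.
for size in (500, 1518, 9018):
    print(f"{size:>5} B frames: {frames_per_second(size):>12,.0f} frames/s")
```

Going from jumbo frames down to a 500B average raises the required
frame rate from roughly 0.14 million to roughly 2.4 million per second
in each direction, which is why per-packet host cost matters more than
raw bandwidth at small sizes.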
Finally, not all applications are data mover applications that can use
jumbo frames, and there are applications with smaller packet sizes that
clearly benefit from TOE at 10GE speeds, so high performance TOE
integration into Linux clearly deserves to be considered without any
preconceived notions, similar to all other new technologies that Linux
considers for inclusion.
Chelsio Communications Inc.
Copyright © 2005, Eklektix, Inc.