Checksum offloads and protocol ossification

Posted Dec 8, 2015 19:12 UTC (Tue) by josh (subscriber, #17465)
Parent article: Checksum offloads and protocol ossification

> convincing the networking maintainer that the hardware designers have heard his complaint may take a little longer.

That's going to cause problems if the only acceptable indication of "heard his complaint" is "decided he's right and done what he's demanding". If the answer turns out to be "no", I expect an ongoing demonstration of selective hearing difficulties towards any answer that doesn't sound like "yes". And even if the answer is "yes", hardware development cycles are long; hopefully all work on hardware offload won't stall until a new generation of hardware exists with these features.

To quote another mail from the thread (http://thread.gmane.org/gmane.linux.network/388085/focus=...):

"So we (as a kernel community) have users *NOW* who want this
feature, and hardware that is available *now* that has this feature.
Do you think we should wait for a unicorn to arrive that has a fully
programmable de-ossified checksum engine? How long?

[...]

I think that trying to force an agenda with no fore-warning and also
punishing the users in order to get hardware vendors to change is the
wrong way to go about this. All you end up with is people just asking
you why their hardware doesn't work in the kernel.

You have a proposal, let's codify it and enable it for the future, and
especially be *really* clear what you want hardware vendors to
implement so that they get it right."

The statement in the article that the networking developers "are, instead, developing a simpler, protocol-independent mechanism by which the hardware can support any protocol with checksum offloading." does not give any indication of the degree of overlap or discussion between the developers of that mechanism and the set of people who design networking hardware. Developing a mechanism for offloading functionality to networking hardware without working with hardware developers is like developing a specification for a new syscall without talking to kernel developers.

One question that Linux networking needs to be dealing with is "why are an increasing number of users bypassing the Linux networking stack entirely, such as to get more performance or smaller size?". DPDK and its performance, and lwIP/uIP and their size, are demonstrations that the Linux networking stack fails to meet the requirements of many potential users. In an ideal world, either those shouldn't exist at all because Linux already meets their requirements, or the Linux network stack should be designed to better integrate frameworks like those and bring them into the fold.

I don't see an *obvious* reason why Linux's networking stack needs to be significantly larger than lwIP, or significantly slower than DPDK. Today it is, but that doesn't seem like an innate property.

Checksum offloads and protocol ossification

Posted Dec 8, 2015 19:28 UTC (Tue) by SEJeff (guest, #51588)

And DPDK isn't even the only one. Mellanox's VMA (from before it bought Voltaire) and Solarflare's OpenOnload have both been around much longer than DPDK. There are entire industries (finance) which rely on things like this for extremely low latency.

Checksum offloads and protocol ossification

Posted Dec 9, 2015 15:01 UTC (Wed) by nysan (guest, #81015)

Don't forget 6WIND.
And there is now an open source project at www.openfastpath.org

Checksum offloads and protocol ossification

Posted Dec 8, 2015 19:34 UTC (Tue) by pizza (subscriber, #46)

> I don't see an *obvious* reason why Linux's networking stack needs to be significantly larger than lwIP, or significantly slower than DPDK. Today it is, but that doesn't seem like an innate property.

For a good discussion of why, look up what happened with Van Jacobson channels.

In short, Linux's stack is bigger and/or slower than some alternatives because it does [much] more than those alternatives, and by the time you add $FeatureX to the alternatives it's no longer as small or fast as it used to be.

Checksum offloads and protocol ossification

Posted Dec 8, 2015 21:35 UTC (Tue) by josh (subscriber, #17465)

> For a good discussion of why, look up what happened with Van Jacobson channels.

I found the LWN article presenting those, but the only reference I have on their disposition suggests that the code never got published and remained slideware.

> In short, Linux's stack is bigger and/or slower than some alternatives because it does [much] more than those alternatives, and by the time you add $FeatureX to the alternatives it's no longer as small or fast as it used to be.

That's not an argument that the Linux stack *can't* match the size or performance of those alternatives. Given that people successfully use those alternatives, clearly $FeatureX is not essential for them.

For example, matching the size of lwIP would clearly require compiling out large parts of the stack. And matching the performance of DPDK would require large parts of the kernel to stop touching packets, and the moment you touch a packet in a way that requires additional software processing, performance properties would nosedive.

Checksum offloads and protocol ossification

Posted Dec 9, 2015 12:38 UTC (Wed) by ballombe (subscriber, #9523)

If you want low latency, bypassing the kernel entirely will always save you some milliseconds, so there is incentive to do it.

Checksum offloads and protocol ossification

Posted Dec 11, 2015 0:03 UTC (Fri) by BenHutchings (subscriber, #37955)

The latency differences are in the microseconds. But aside from latency, it is also possible to achieve much higher packet rates with a more restricted user-space network stack.