LWN: Comments on "Checksum offloads and protocol ossification" https://lwn.net/Articles/667059/ This is a special feed containing comments posted to the individual LWN article titled "Checksum offloads and protocol ossification". en-us Mon, 03 Nov 2025 14:22:37 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/671815/ https://lwn.net/Articles/671815/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; If you want to 'really scale' then the best thing you can do is ignore the kernel and perform as much networking as possible in your application. Other people have mentioned some of the userspace network drivers that by-pass the kernel implementation in the comments in this article already. </font><br> <p> Another (closed source) example: <a href="http://ats.aeroflex.com/virtualized-ip-test-solutions/product-overview/tvm-standard-hardware">http://ats.aeroflex.com/virtualized-ip-test-solutions/pro...</a><br> </div> Thu, 14 Jan 2016 06:53:27 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/668690/ https://lwn.net/Articles/668690/ Lennie <div class="FormattedComment"> Dan Kaminsky did a talk once about using HTTP to transport packets:<br> <a href="https://www.youtube.com/watch?v=YwbpnZe74ds">https://www.youtube.com/watch?v=YwbpnZe74ds</a><br> <p> ;-)<br> </div> Mon, 21 Dec 2015 08:53:35 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/668482/ https://lwn.net/Articles/668482/ moltonel <div class="FormattedComment"> To me that looks like the right thing to do. Open up the network hardware like has been done for graphics cards. CPU, GPU... NPU? It might be a big initial R&amp;D investment for network hardware manufacturers, but the software development savings should make up for it quickly enough. Plus, whoever's first to market gets to design the generic API that just happens to fit their hardware best :p<br> </div> Fri, 18 Dec 2015 13:34:23 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667648/ https://lwn.net/Articles/667648/ drag <div class="FormattedComment"> This was posted a while ago in the LWN comments. Sorry, I forget who posted it, but it's a good one:<br> <p> <a href="http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html">http://highscalability.com/blog/2013/5/13/the-secret-to-1...</a><br> <p> <p> Basically: <br> <p> If you want to 'really scale' then the best thing you can do is ignore the kernel and perform as much networking as possible in your application. Other people have mentioned some of the userspace network drivers that by-pass the kernel implementation in the comments in this article already. <br> <p> Seems to me that if people really want the 'bestest fastest lowest latencinest performance' from their TCP stack for specialized applications, then a hardware-based offload of TCP/IP is the wrong approach. The right approach is to use an application-level network driver and let the application do the calculations. If you want to throw hardware acceleration at the problem, then have it be something that applications can use to help accelerate the calculations they need to do, rather than something that hides in a NIC. 
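A minimal sketch of the "let the application drive the acceleration hardware" idea above, assuming an x86 CPU with SSE4.2: CRC32C computed entirely in userspace through compiler intrinsics. The buffer and build line are invented for illustration; a real application would feed its own packet data through the same loop.
<pre>
/* Build (assumed): gcc -O2 -msse4.2 crc32c_user.c */
#include <nmmintrin.h>   /* SSE4.2 CRC32 intrinsics */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

/* CRC32C of a buffer, computed with the CPU's CRC32 instruction,
 * entirely in the application -- no NIC offload, no kernel help. */
static uint32_t crc32c(const unsigned char *buf, size_t len)
{
    uint64_t crc = 0xffffffff;          /* standard CRC32C initial value */

    while (len >= 8) {
        uint64_t chunk;
        memcpy(&chunk, buf, 8);
        crc = _mm_crc32_u64(crc, chunk);
        buf += 8;
        len -= 8;
    }
    while (len--)
        crc = _mm_crc32_u8((uint32_t)crc, *buf++);

    return (uint32_t)crc ^ 0xffffffff;  /* final inversion */
}

int main(void)
{
    const unsigned char payload[] = "example payload";   /* made-up data */
    printf("crc32c = 0x%08x\n", crc32c(payload, sizeof(payload) - 1));
    return 0;
}
</pre>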
<br> <p> If the kernel is involved at all, then all it should do is provide a reasonable method for those applications to access the 'acceleration hardware' via some sort of mechanism like DRM drivers do.<br> <p> As far as tunneling goes... as much as I love things like vxlan, they really seem to be mostly used to work around IPv4 addressing limits. A much better approach seems to be to let your virtual machines/containers/etc. get their own IPv6 address automatically and then rely on layer 3 routing to deliver packets to everything. Any tunneling going on should just be IPv6-over-IPv4 UDP as a stopgap solution to deal with shitty 'cloud' networks. Otherwise you can just end up with tunnels in tunnels in tunnels, and nobody wants that. Any 'container/virt' infrastructure that doesn't integrate service discovery (and/or 'VIPS' or whatever) to help services and clients find things automatically is just a half-assed solution anyways, which makes the difficulty of dealing with 'static' IPv6 addresses and DHCP moot. If done correctly there is no reason at all that end users should be aware that they are using IPv6 or IPv4. <br> <br> Oh well. Never had good luck with 'offload engines' anyways.<br> </div> Sun, 13 Dec 2015 07:02:33 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667616/ https://lwn.net/Articles/667616/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; so that it can be routed efficiently and not be blocked by middleboxes that don't understand it [... ] We have slowly optimized ourselves into a situation where the development and deployment of new protocols (or even significant enhancements to existing protocols) is increasingly difficult; even well defined protocols like SCTP and DCCP are hard to deploy in real-world settings.</font><br> <p> I think one of the main reasons, and maybe even the main one, is the complete "black box" aspect of IP networking. It is more opaque than the most closed source software.<br> <p> The end-to-end principle was great and all, but it did not anticipate that the network would fight back and grow a lot of smarts (firewalls et al.) anyway, even when it was not supposed to. Since they never were and still are not supposed to exist, these smarts are not required to provide any feedback, so when they fail they just fail silently/stealthily and can be neither identified nor pinpointed. This tends to please network administrators in their basements, who are more than happy to dodge support calls[*] since they can't even be located.<br> <p> When even the most opaque application fails, one can typically still dig out some error message somewhere that can be Googled. Worst case, the behaviour can be described. With networking it's a dead-end road every way. Hidden so well, IP networking never changes, never gets fixed,... 
ossifies.<br> <p> [*] <a href="https://www.youtube.com/watch?v=rksCTVFtjM4">https://www.youtube.com/watch?v=rksCTVFtjM4</a><br> <p> <font class="QuotedText">&gt; An attempt to push hardware designers in a different direction may seem a bit like throwing Linux's weight around</font><br> <p> Turning things around just once.<br> <p> </div> Sat, 12 Dec 2015 05:26:11 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667545/ https://lwn.net/Articles/667545/ jezuch <div class="FormattedComment"> <font class="QuotedText">&gt; Basically, you have people using websockets over HTTPS to open tunnels between services.</font><br> <p> Though my professor at the university was not really amused when I started insisting that the Internet's protocol stack has 8 layers (instead of the "traditional" 7), with HTTP(S) as the top-most layer :) And that was a nontrivial number of years ago already.<br> </div> Fri, 11 Dec 2015 11:19:52 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667539/ https://lwn.net/Articles/667539/ kleptog <div class="FormattedComment"> The RFC is supposed to be a joke, but it's surprisingly close to the truth. Basically, you have people using websockets over HTTPS to open tunnels between services. Bypasses firewalls, proxies, load balancers, everything. Evolution in action: by punishing anything that looks out of the ordinary, all network traffic evolves to become indistinguishable from everything else.<br> </div> Fri, 11 Dec 2015 07:34:33 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667523/ https://lwn.net/Articles/667523/ BenHutchings <div class="FormattedComment"> The latency differences are in the microseconds. But aside from latency, it is also possible to achieve much higher packet rates with a more restricted user-space network stack.<br> </div> Fri, 11 Dec 2015 00:03:45 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667399/ https://lwn.net/Articles/667399/ alexl <div class="FormattedComment"> The cost is not that huge for a shared-memory architecture like the Intel GPUs. And if you're doing crc32, for instance, you could run parallel CRCs on different substrings and then combine them on the CPU (like crc32_combine from zlib).<br> <p> Still, I dunno if it is faster; it may be memory-bandwidth bound.<br> </div> Thu, 10 Dec 2015 09:21:59 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667396/ https://lwn.net/Articles/667396/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; We increasingly find ourselves on an Internet that can only manage TCP and UDP, and relatively unchanging versions of TCP and UDP at that.</font><br> <p> Of course you meant HTTP.<br> <p> <a href="https://tools.ietf.org/html/rfc3093">https://tools.ietf.org/html/rfc3093</a> Firewall Enhancement Protocol (FEP)<br> <p> </div> Thu, 10 Dec 2015 08:38:02 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667389/ https://lwn.net/Articles/667389/ eternaleye That's pretty much exactly what the <a href="http://www.barrelfish.org/ma-antoinek-dragonet.pdf">Dragonet networking architecture</a> is about; it's a fascinating design, at least in part because, from the perspective of userspace, the kernel can then be viewed as just such a NIC. 
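As a concrete illustration of the split-and-recombine CRC trick alexl mentions a couple of comments up, here is a minimal sketch using zlib's crc32() and crc32_combine(); the input buffer is invented for the example, and the two partial CRCs stand in for work that could be done in parallel or on an accelerator.
<pre>
/* Build (assumed): gcc -O2 crc_combine.c -lz */
#include <zlib.h>
#include <stdio.h>

int main(void)
{
    const unsigned char data[] = "a packet split into two pieces";
    size_t half = (sizeof(data) - 1) / 2;
    size_t rest = (sizeof(data) - 1) - half;

    /* CRC of each half -- these two calls are independent, so they
     * could run on different cores or on an accelerator. */
    uLong c1 = crc32(crc32(0L, Z_NULL, 0), data, half);
    uLong c2 = crc32(crc32(0L, Z_NULL, 0), data + half, rest);

    /* Stitch the partial CRCs together; only the length of the second
     * piece is needed, not its contents. */
    uLong combined = crc32_combine(c1, c2, rest);
    uLong whole    = crc32(crc32(0L, Z_NULL, 0), data, sizeof(data) - 1);

    printf("combined=%08lx whole=%08lx %s\n", combined, whole,
           combined == whole ? "(match)" : "(mismatch)");
    return 0;
}
</pre>
The combine step needs only the partial CRCs and the length of the second piece, which is what makes checksumming pieces independently and stitching the results together afterwards practical.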
Thu, 10 Dec 2015 04:33:02 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667276/ https://lwn.net/Articles/667276/ nysan <div class="FormattedComment"> Don't forget 6WIND.<br> And there is now an open source project at www.openfastpath.org<br> </div> Wed, 09 Dec 2015 15:01:06 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667221/ https://lwn.net/Articles/667221/ ballombe <div class="FormattedComment"> If you want low latency, bypassing the kernel entirely will always save you some milliseconds, so there is an incentive to do it.<br> </div> Wed, 09 Dec 2015 12:38:35 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667262/ https://lwn.net/Articles/667262/ mjthayer <div class="FormattedComment"> If the kernel just provided a full generic software stack which let drivers override selected parts in hardware at minimal complexity cost to the kernel stack, is there anecdotal evidence that people are likely to limit their use of protocols to ones which are accelerated in hardware? Especially if finding out which those are for any particular network set-up requires additional effort on their part, and things just work (slightly more slowly) for their preferred choice? Of course, if pieces of hardware along the network actively prevented the use of protocols, that would be a different matter, but I don't see how disallowing selected hardware acceleration would prevent that.<br> </div> Wed, 09 Dec 2015 11:00:50 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667257/ https://lwn.net/Articles/667257/ xav <div class="FormattedComment"> Nope. Passing data to/from the GPU has an enormous fixed cost, and GPUs are slow for non-parallel computations, so this is the exact case of what NOT to offload to a GPU.<br> </div> Wed, 09 Dec 2015 09:50:36 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667260/ https://lwn.net/Articles/667260/ paulj <div class="FormattedComment"> Agreed on the TCP checksum. E.g., it doesn't detect re-ordering, which has bitten me in the past with dodgy hardware.<br> </div> Wed, 09 Dec 2015 09:49:46 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667252/ https://lwn.net/Articles/667252/ paulj <div class="FormattedComment"> They can protect against bugs between the checksum being calculated and the packet passing through the L2 CRC engine.<br> <p> I've seen weird driver bugs where chunks of packets were being dropped after being sent by userspace. The L2 CRC was fine, but the kernel-applied header checksum was wrong. It turned out to be a subtle bug in the proprietary forwarding hardware driver, on raw sockets, IIRC.<br> </div> Wed, 09 Dec 2015 09:13:51 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667248/ https://lwn.net/Articles/667248/ alexl <div class="FormattedComment"> Would it not be possible to use the GPU to offload some of these calculations?<br> </div> Wed, 09 Dec 2015 07:46:34 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667242/ https://lwn.net/Articles/667242/ luto <div class="FormattedComment"> I'd love to see some focus shift from extremely weak checksums like UDP's to stronger ones like CRC. 
CRC has all the magic properties needed: it's linear, so you can subtract parts off, and it can be shifted, so you can take a CRC of some suffix or middle chunk of a packet and extend it to the CRC of the whole thing.<br> <p> And yes, I have seen bad packets over TCP that survive the checksum check. It's just too weak.<br> </div> Wed, 09 Dec 2015 03:49:00 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667206/ https://lwn.net/Articles/667206/ josh <div class="FormattedComment"> Bad checksums do happen; among other things, I've seen case studies of them happening due to memory errors. Also see <a href="http://dinaburg.org/bitsquatting.html">http://dinaburg.org/bitsquatting.html</a> , and notice the mentions that bit errors at some phases of the process will get rejected due to checksums.<br> <p> Some protocol in the stack needs to have *cryptographic* integrity; for instance, TLS provides cryptographic integrity guarantees. However, at the lower levels, a quick checksum to confirm valid packet delivery allows the network stack to say "didn't get that, send it again", transparently to the application, as part of the normal ACK/NAK process.<br> <p> Also see <a href="https://en.wikipedia.org/wiki/End-to-end_principle">https://en.wikipedia.org/wiki/End-to-end_principle</a> .<br> </div> Tue, 08 Dec 2015 21:45:36 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667204/ https://lwn.net/Articles/667204/ josh <div class="FormattedComment"> <font class="QuotedText">&gt; For a good discussion of why, look up what happened with Van Jacobson channels.</font><br> <p> I found the LWN article presenting those, but the only reference I have on their disposition suggests that the code never got published and remained slideware.<br> <p> <font class="QuotedText">&gt; In short, Linux's stack is bigger and/or slower than some alternatives because it does [much] more than those alternatives, and by the time you add $FeatureX to the alternatives it's no longer as small or fast as it used to be.</font><br> <p> That's not an argument that the Linux stack *can't* match the size or performance of those alternatives. Given that people successfully use those alternatives, clearly $FeatureX is not essential for them.<br> <p> For example, matching the size of lwIP would clearly require compiling out large parts of the stack. And matching the performance of DPDK would require large parts of the kernel to stop touching packets, and the moment you touch a packet in a way that requires additional software processing, performance properties would nosedive.<br> </div> Tue, 08 Dec 2015 21:35:45 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667199/ https://lwn.net/Articles/667199/ yootis <div class="FormattedComment"> <p> Is there even value in checksums in headers anymore? All of the transport mechanisms like ethernet already have much more powerful CRCs. I've never heard of packets getting delivered with bad checksums, so why are they even used? 
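For readers wondering what that header checksum actually is: TCP and UDP use the 16-bit ones' complement sum of RFC 1071. A minimal userspace sketch (byte order simplified, data invented) shows both how cheap and how weak it is; because the addition commutes, swapping two aligned 16-bit words leaves the result unchanged, which is the re-ordering blindness mentioned a few comments up.
<pre>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* RFC 1071 Internet checksum: 16-bit ones' complement sum of the data.
 * Addition is commutative, so exchanging two aligned 16-bit words in the
 * buffer yields the same checksum -- such re-orderings go undetected. */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len >= 2) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                   /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)          /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

int main(void)
{
    /* Made-up payloads; b is a with its first two 16-bit words swapped,
     * yet both produce the same checksum. */
    uint8_t a[] = { 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc };
    uint8_t b[] = { 0x56, 0x78, 0x12, 0x34, 0x9a, 0xbc };

    printf("a: 0x%04x  b: 0x%04x\n",
           inet_checksum(a, sizeof(a)), inet_checksum(b, sizeof(b)));
    return 0;
}
</pre>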
<br> <p> <p> </div> Tue, 08 Dec 2015 21:19:58 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667188/ https://lwn.net/Articles/667188/ flussence <div class="FormattedComment"> Keeping everything but generic number-crunching in the kernel is probably a very good idea, for much the same reasons as RAID.<br> <p> I was experimenting with `ethtool -k` settings on my LAN the other day; the hardware doesn't have much of a feature set and it's all off by default, but on one end (an RTL8168e) enabling any of the interesting offloading features it claims to support... breaks everything. Nothing more fun than silent failures caused by buggy hardware!<br> <p> Admittedly that experience is based on $0.10 desktop Realtek chips, but at the same time, paying 10-1000× more for any type of hardware doesn't correlate linearly with quality.<br> </div> Tue, 08 Dec 2015 20:12:28 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667187/ https://lwn.net/Articles/667187/ pizza <div class="FormattedComment"> <font class="QuotedText">&gt; I don't see an *obvious* reason why Linux's networking stack needs to be significantly larger than lwIP, or significantly slower than DPDK. Today it is, but that doesn't seem like an innate property.</font><br> <p> For a good discussion of why, look up what happened with Van Jacobson channels.<br> <p> In short, Linux's stack is bigger and/or slower than some alternatives because it does [much] more than those alternatives, and by the time you add $FeatureX to the alternatives it's no longer as small or fast as it used to be.<br> </div> Tue, 08 Dec 2015 19:34:27 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667186/ https://lwn.net/Articles/667186/ SEJeff <div class="FormattedComment"> And DPDK isn't even the only one. Mellanox's VMA (from before it bought Voltaire) and Solarflare's OpenOnload have both been around much longer than DPDK. There are entire industries (finance) which rely on things like this for extremely low latency.<br> </div> Tue, 08 Dec 2015 19:28:47 +0000 Checksum offloads and protocol ossification https://lwn.net/Articles/667166/ https://lwn.net/Articles/667166/ josh <div class="FormattedComment"> <font class="QuotedText">&gt; convincing the networking maintainer that the hardware designers have heard his complaint may take a little longer. </font><br> <p> That's going to cause problems if the only acceptable indication of "heard his complaint" is "decided he's right and done what he's demanding". If the answer turns out to be "no", I expect an ongoing demonstration of selective hearing difficulties towards any answer that doesn't sound like "yes". And even if the answer is "yes", hardware development cycles are long; hopefully all work on hardware offload won't stall until a new generation of hardware exists with these features.<br> <p> To quote another mail from the thread (<a href="http://thread.gmane.org/gmane.linux.network/388085/focus=388135">http://thread.gmane.org/gmane.linux.network/388085/focus=...</a>):<br> <p> "So we (as a kernel community) have users *NOW* who want this<br> feature, and hardware that is available *now* that has this feature.<br> Do you think we should wait for a unicorn to arrive that has a fully<br> programmable de-ossified checksum engine? How long?<br> <p> [...]<br> <p> I think that trying to force an agenda with no fore-warning and also<br> punishing the users in order to get hardware vendors to change is the<br> wrong way to go about this. 
All you end up with is people just asking<br> you why their hardware doesn't work in the kernel.<br> <p> You have a proposal, let's codify it and enable it for the future, and<br> especially be *really* clear what you want hardware vendors to<br> implement so that they get it right."<br> <p> The statement in the article that the networking developers "are, instead, developing a simpler, protocol-independent mechanism by which the hardware can support any protocol with checksum offloading." does not give any indication of the degree of overlap or discussion between the developers of that mechanism and the set of people who design networking hardware. Developing a mechanism for offloading functionality to networking hardware without working with hardware developers is like developing a specification for a new syscall without talking to kernel developers.<br> <p> One question that Linux networking needs to be dealing with is "why are an increasing number of users bypassing the Linux networking stack entirely, such as to get more performance or smaller size?". DPDK and its performance, and lwIP/uIP and their size, are demonstrations that the Linux networking stack fails to meet the requirements of many potential users. In an ideal world, either those shouldn't exist at all because Linux already meets their requirements, or the Linux network stack should be designed to better integrate frameworks like those and bring them into the fold.<br> <p> I don't see an *obvious* reason why Linux's networking stack needs to be significantly larger than lwIP, or significantly slower than DPDK. Today it is, but that doesn't seem like an innate property.<br> </div> Tue, 08 Dec 2015 19:12:37 +0000
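To make "codify it and be really clear what you want hardware vendors to implement" concrete: the protocol-independent scheme discussed in the article amounts to handing the device two numbers per packet, where to start summing and where to store the result. The rough userspace sketch below mirrors the kernel's csum_start/csum_offset convention but is illustrative only; it assumes the stack has already seeded the checksum field with the pseudo-header sum, and the packet layout in main() is made up.
<pre>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Ones' complement sum over a byte range (big-endian 16-bit words). */
static uint32_t ocsum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len >= 2) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* What a protocol-independent checksum offload is asked to do: sum from
 * csum_start to the end of the packet, fold, complement, and write the
 * 16-bit result at csum_start + csum_offset.  The device never needs to
 * know which protocol (UDP, TCP, something tunnelled) it is handling. */
static void offload_checksum(uint8_t *pkt, size_t len,
                             size_t csum_start, size_t csum_offset)
{
    uint32_t sum = ocsum(pkt + csum_start, len - csum_start);

    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    sum = ~sum & 0xffff;

    pkt[csum_start + csum_offset]     = (uint8_t)(sum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(sum & 0xff);
}

int main(void)
{
    /* Dummy 16-byte "packet": an invented 8-byte header followed by an
     * 8-byte transport segment whose checksum field sits at offset 6. */
    uint8_t pkt[16] = { 0 };

    offload_checksum(pkt, sizeof(pkt), 8, 6);
    printf("stored checksum: 0x%02x%02x\n", pkt[14], pkt[15]);
    return 0;
}
</pre>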