LWN: Comments on "Accelerating netfilter with hardware offload, part 1"
https://lwn.net/Articles/809333/
This is a special feed containing comments posted to the individual LWN article titled "Accelerating netfilter with hardware offload, part 1".
Sat, 08 Nov 2025 03:11:32 +0000

Ternary Computing
https://lwn.net/Articles/811228/
brouhaha
<blockquote> At least Wikipedia (the world's One True Single Source Of Truth, obviously) says it's typically implemented with a second bit rather than relatively exotic multi-level logic. </blockquote>
The TCAM used in network switches, routers etc. definitely works that way, storing the ternary values as two bits each. It is ternary in the same sense that BCD is decimal; both are encoded using only binary digits. A TCAM cell is effectively much more than twice the size of a normal SRAM cell because it also contains the comparator logic. This is one reason why TCAM chips are orders of magnitude more expensive than an equivalent amount of SRAM.
<p> It would be possible to build SRAM using multilevel cells, but most likely that would result in larger and slower memory than using binary.
<p> On the other hand, two-bit-per-cell masked ROM technology exists. Each cell has transistors chosen from four transistor sizes resulting in four possible on-state resistances. Reading from it works the same way as MLC flash; the sense amplifier feeds analog comparators to distinguish the levels. The microcode of the original Intel 8087 numeric coprocessor was stored in <a href="http://www.righto.com/2018/09/two-bits-per-transistor-high-density.html">two-bit-per-cell masked ROM</a>.
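As an illustration of the "two bits per ternary value" encoding described above, here is a minimal software sketch. It assumes each ternary digit is stored as one value bit plus one mask bit (mask 0 meaning "don't care"), and that the lowest-numbered matching entry wins. The function names `encode`, `matches`, and `lookup` are hypothetical, and real TCAM hardware compares all entries in parallel rather than looping.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value bits, mask bits); mask bit 1 = "must match"


def encode(pattern: str) -> Entry:
    """Encode a ternary pattern such as '10X1' as a (value, mask) pair.

    Each ternary digit costs two binary bits: one in value, one in mask.
    """
    value = mask = 0
    for ch in pattern:
        value <<= 1
        mask <<= 1
        if ch in "01":
            value |= int(ch)
            mask |= 1          # this position is compared
        elif ch not in "xX":
            raise ValueError(f"invalid ternary digit: {ch!r}")
        # 'X' leaves both bits 0: the position is a wildcard
    return value, mask


def matches(key: int, entry: Entry) -> bool:
    """True if key agrees with the entry on every non-wildcard bit."""
    value, mask = entry
    return (key ^ value) & mask == 0


def lookup(table: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first matching entry, or None.

    Assumption: entries are ordered by priority. Hardware evaluates
    every row at once; this loop only models the result.
    """
    for index, entry in enumerate(table):
        if matches(key, entry):
            return index
    return None
```

Ordering the table from most specific to least specific pattern gives longest-prefix-style behaviour in this model.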
Fri, 31 Jan 2020 21:07:18 +0000

200 cycles or less
https://lwn.net/Articles/810643/
Cyberax
<div class="FormattedComment"> You can easily get more than 1 million packets per second on general-purpose CPUs without any offloading; this works out to more than 10 Gbps easily.<br> </div>
Mon, 27 Jan 2020 17:47:28 +0000

200 cycles or less
https://lwn.net/Articles/810577/
robbe
<div class="FormattedComment"> The 200 cycles number is really a ballpark figure and should not be taken for granite. A simple optimisation is to use mainly jumbo frames, which raises your per-packet budget to more than 1000 cycles.<br> <p> I also think that no OS achieves CPU-involved forwarding speeds of even 10 Gbps without a lot of NIC offloading (coalescing, TSO, etc.).<br> </div>
Mon, 27 Jan 2020 14:23:26 +0000

200 cycles or less
https://lwn.net/Articles/809862/
ghane
<div class="FormattedComment"> Thanks, I will apply for my licence today :-)<br> <p> </div>
Fri, 17 Jan 2020 06:55:31 +0000

Accelerating netfilter with hardware offload, part 1
https://lwn.net/Articles/809823/
BenHutchings
<div class="FormattedComment"> TCAM is necessary for the most general filter matching, but somewhat more restricted packet filtering can be done using a hash table with open addressing.<br> </div>
Thu, 16 Jan 2020 18:50:14 +0000

Ternary Computing
https://lwn.net/Articles/809703/
leromarinvit
<div class="FormattedComment"> On second thought, maybe if you can implement TCAM with DRAM, you could get the X state by charging the capacitor a little less (shorter / via a higher resistance path). Then design the comparator such that it accepts both 0 and 1 if the other input is in this "middle band".
If the refresh cycle is fast enough that a 1 won't decay into an X (or an X into a 0), then maybe this could work.<br> <p> But I'm sure people much smarter than me have tried to optimize TCAM for many years, and are already using ideas much better than I can think of, so I'll stop now.<br> </div>
Thu, 16 Jan 2020 08:30:52 +0000

Ternary Computing
https://lwn.net/Articles/809686/
leromarinvit
<div class="FormattedComment"> Interesting perspective, thanks! That is indeed a lot of SRAM.<br> </div>
Thu, 16 Jan 2020 00:02:43 +0000

Ternary Computing
https://lwn.net/Articles/809685/
Sesse
<div class="FormattedComment"> TCAM may be peanuts in a NIC with 256 entries, but not in a large switch/router. It's how switches manage to do route lookups at wire speed; you have one entry (IIRC, typically 192 TCAM bits for matching, well, various stuff) per route, and then something like 512k routes. More in modern devices, now that the IPv4 routing table is larger than that… so think 1M routes, 192 bits for each, so now you have 192M SRAM cells and comparators to run in parallel! And each line card has the same amount! So if you could somehow design those with exotic logic instead of two bits, it would be a win.<br> <p> Someone once described TCAM to me as “the stuff you upgrade in your router, and then the power bill goes up”.<br> </div>
Wed, 15 Jan 2020 23:51:02 +0000

200 cycles or less
https://lwn.net/Articles/809678/
leromarinvit
<div class="FormattedComment"> I think the 200 cycles number is just meant as a reminder that it's "not much" time per packet. The linked article seems to be talking about a single 3 GHz CPU. Obviously the available cycles vary with average packet length and CPU clock, and processing can be split over multiple cores.
That is, of course, no reason not to try making the best use of the available cycles, since latency will suffer if you just rely on parallelism to stem the load.<br> </div>
Wed, 15 Jan 2020 22:45:25 +0000

Ternary Computing
https://lwn.net/Articles/809679/
Cyberax
<div class="FormattedComment"> For MLCs it's implemented as a true multi-level device. It basically uses different charge levels to encode different bit combinations.<br> </div>
Wed, 15 Jan 2020 22:36:49 +0000

Ternary Computing
https://lwn.net/Articles/809673/
leromarinvit
<div class="FormattedComment"> At least Wikipedia (the world's One True Single Source Of Truth, obviously) says it's typically implemented with a second bit rather than relatively exotic multi-level logic. That would have been my gut feeling as well, that designers would rather avoid complicating their design and process for saving what's essentially peanuts in transistor count.<br> <p> Also, this is SRAM. MLC flash works by storing different charge levels in the cell. The closest equivalent I can think of for SRAM would be different voltages - more or less impossible to achieve using a single supply, without first generating a second voltage from that.
Which wastes chip area and power for no real gain, making the two-bit solution look even better in comparison.<br> </div>
Wed, 15 Jan 2020 22:27:25 +0000

200 cycles or less
https://lwn.net/Articles/809650/
hkario
<div class="FormattedComment"> remember that those 200 cycles are for processing the header only; the length of the payload doesn't matter (here it's averaged over typical frame sizes)<br> <p> it's just like navigation: handling a 20t truck is in principle no different from handling a 3.5t truck<br> </div>
Wed, 15 Jan 2020 19:00:55 +0000

200 cycles or less
https://lwn.net/Articles/809596/
ale2018
<div class="FormattedComment"> Those 200 cycles seem to be a very short timeframe to do something useful. Can one implement a firewall, querying a database on some packets? Using FPGAs??<br> <p> Except for routers, to be able to communicate faster than one can think sounds nonsensical. Something like arriving before leaving...?<br> <p> </div>
Wed, 15 Jan 2020 16:39:22 +0000

Ternary Computing
https://lwn.net/Articles/809559/
Cyberax
<div class="FormattedComment"> In TCAM the third level is more like a wildcard for addressing.<br> <p> But there are many other silicon devices that use multiple levels, like MLC flash cells. After all, the world is analog.<br> </div>
Tue, 14 Jan 2020 21:48:02 +0000

Ternary Computing
https://lwn.net/Articles/809555/
jccleaver
<div class="FormattedComment"> I had no idea ternary memory and logic were being used at that low level.
Brings to mind the aborted attempts to actually run full ternary computers (<a href="https://en.wikipedia.org/wiki/Ternary_computer">https://en.wikipedia.org/wiki/Ternary_computer</a>) up into the 1970s.<br> <p> Surprised to see that logic used there (although the 1/0/NULL of SQL is another example of modern usage) -- I wonder if ternary silicon is an area of research for this hardware.<br> </div>
Tue, 14 Jan 2020 21:26:23 +0000
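
The "200 cycles" budget debated in this thread depends on clock speed, link rate, frame size, and core count, as several commenters point out. The sketch below shows the arithmetic under stated assumptions: standard Ethernet framing with 20 bytes of per-frame wire overhead (preamble, start delimiter, inter-frame gap), and work divided evenly across cores. The function names are hypothetical, and the inputs used are examples, not figures taken from the article. One combination that lands near 200 is a 3 GHz core facing minimum-size 64-byte frames at 10 Gbps, which gives 201.6 cycles per packet.

```python
# Bytes occupied on the wire per frame beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frame rate of a link.

    frame_bytes is the Ethernet frame size including header and FCS
    (64 for a minimum-size frame, 1518 for a 1500-byte MTU).
    """
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cycles_per_packet(cpu_hz: float, link_bps: float,
                      frame_bytes: int, cores: int = 1) -> float:
    """CPU cycles available per packet at line rate.

    Assumption: packets are spread evenly over `cores` identical cores.
    """
    return cores * cpu_hz / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    # Example inputs only: a 3 GHz core, 10 Gbps link, three frame sizes.
    for frame in (64, 1518, 9018):
        budget = cycles_per_packet(3e9, 10e9, frame)
        print(f"{frame:5d}-byte frames: {budget:10.1f} cycles/packet")
```

Larger frames and additional cores raise the budget linearly in this model; as noted in the thread, spreading packets over cores adds throughput but does not shorten the time any single packet spends being processed.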