LWN: Comments on "TCP small queues and WiFi aggregation — a war story" https://lwn.net/Articles/757643/ This is a special feed containing comments posted to the individual LWN article titled "TCP small queues and WiFi aggregation — a war story". Thu, 16 Oct 2025 11:47:57 +0000 MTU https://lwn.net/Articles/758233/ https://lwn.net/Articles/758233/ farnz <p>You do, but they're not using IEEE standard Ethernet (jumbo frames implies not IEEE standard) - and WiFi standards (including frame aggregation) are written to get high performance when using IEEE standard Ethernet. <p>Hence frame aggregation rather than high MTUs - a high MTU for performance means being outside the IEEE standard, while a 1500 MTU allows you to be inside the standard. Mon, 25 Jun 2018 17:40:52 +0000 MTU https://lwn.net/Articles/758231/ https://lwn.net/Articles/758231/ raven667 <div class="FormattedComment"> That seems bogus, I'm pretty sure I have many pieces of equipment using jumbo frames, whatever IEEE has in their written specs or whatever they "like".<br> </div> Mon, 25 Jun 2018 17:28:18 +0000 MTU https://lwn.net/Articles/758221/ https://lwn.net/Articles/758221/ farnz <p>Nope. The issue is that you cannot have an MTU above 1500 on Ethernet without breaking the IEEE specs for Ethernet and for WiFi. You are simply not allowed a jumbo MTU on the Layer 2 link, and the IEEE won't accept changes to 802 series standards that increase the user MTU beyond 1500. <p>IPv6 is not relevant here - it's an IEEE decision because even in IPv4, with router fragmentation allowed, the IEEE doesn't like it.
Mon, 25 Jun 2018 16:20:40 +0000 MTU https://lwn.net/Articles/758219/ https://lwn.net/Articles/758219/ raven667 <div class="FormattedComment"> That's what I first thought about this, but the issue is not that you can't have jumbo 9000 byte MTU on the local layer2 link, either Ethernet or WiFi (although I think the WiFi standard only supports 1500), which would be negotiated between endpoints as part of the TCP MSS, it's that intervening links over the Internet at some point are likely to only permit 1500 byte frames, so somewhere along the line you would need to fragment the layer3 packets, which is not allowed for IPv6 which relies on PMTU discovery to find the correct MSS/MTU that can cleanly make it through the whole path, leading to needing aggregation on the first WiFi hop for efficiency, to make up for not requiring upstream routers to fragment/reassemble. You could probably sidestep this if you had a jumbo-clean path, and you weren't concerned about stations monopolizing airtime with large frames, but that's unlikely unless you control the whole infrastructure between endpoints.<br> </div> Mon, 25 Jun 2018 16:01:06 +0000 MTU https://lwn.net/Articles/758140/ https://lwn.net/Articles/758140/ farnz <p>Nope; the answer is IEEE 802.3 Ethernet. WiFi (IEEE 802.11) is designed to transparently interoperate with 802.3 Ethernets. The IEEE has declared that the Ethernet MTU is fixed at 1500 bytes[1]; this implies that WiFi per-frame MTUs are also fixed at 1500 bytes. Given that it is a hard requirement for WiFi that the frame MTU is no more than 1500 bytes, you need things like aggregation to get a decent speed. <p>If larger frames were permitted on 802.11, then you would not be able to bridge 802.11 with IEEE standard 802.3; while it's common to support jumbo frames on Ethernet, this is technically a non-standard extension, and IEEE standard 802.11 can't assume that any Ethernet it is connected to will permit jumbo frames.
<p>[1] While the IEEE 802.3 MTU is 1500 bytes, they also now require all equipment to handle frames of up to 2000 bytes in total size, to allow for headers, checksums, VLAN tags etc. WiFi is similar - 2304 byte maximum MSDU frame, of which 1500 bytes maximum is user MTU, and the other 804 bytes are reserved for VLAN tags etc. Mon, 25 Jun 2018 09:16:15 +0000 MTU https://lwn.net/Articles/758055/ https://lwn.net/Articles/758055/ meuh <div class="FormattedComment"> I asked myself why not increase the IP MTU (Maximum Transmission Unit) instead of relying on WiFi Frame Aggregation behavior? I guess the answer is IPv6: as IPv6 doesn't expect routers to fragment packets, a typical TCP connection over the Internet involves packets no larger than 1280 bytes. So the WiFi hardware has to handle such small packets, hence the Frame Aggregation feature.<br> <p> </div> Sat, 23 Jun 2018 18:16:23 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/758041/ https://lwn.net/Articles/758041/ mtaht <div class="FormattedComment"> the qualcomm work was first documented by a good ccc talk, the video for which I cannot find right now.<br> <p> But: <a href="https://osmocom.org/projects/quectel-modems/wiki">https://osmocom.org/projects/quectel-modems/wiki</a><br> <p> and the slides from that talk: <a href="https://fahrplan.events.ccc.de/congress/2016/Fahrplan/system/event_attachments/attachments/000/003/151/original/Dissecting_modern_%283G_4G%29_cellular_modems.pdf">https://fahrplan.events.ccc.de/congress/2016/Fahrplan/sys...</a><br> <p> </div> Fri, 22 Jun 2018 23:40:58 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/758011/ https://lwn.net/Articles/758011/ kronat <div class="FormattedComment"> <font class="QuotedText">&gt; Some chipsets (like quantenna's) actually wedge an entire linux stack into their chip.</font><br> <font class="QuotedText">&gt; qualcomm's LTE modems do also.</font><br> <p> Do you have a reference for 
these statements? I am investigating a similar problem in 3GPP networks. Thanks!<br> </div> Fri, 22 Jun 2018 09:52:49 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/758010/ https://lwn.net/Articles/758010/ cagrazia <div class="FormattedComment"> <font class="QuotedText">&gt; I'd be curious how fast UDP on the other operating system went, to know if it topped out at the same 100Mb/s.</font><br> <p> We did a very quick TCP and UDP test on a Windows 8 machine, with a kind of iperf tool (honestly, it was tricky and messy to configure): throughput results were oscillating between 90 and 95 Mbps, while the TCP RTT was very unstable as well as the UDP jitter.<br> </div> Fri, 22 Jun 2018 08:27:03 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/758007/ https://lwn.net/Articles/758007/ cagrazia <div class="FormattedComment"> <font class="QuotedText">&gt; Which USB/WiFi 802.11ab/g/n dongles were used?</font><br> <p> The exact chipsets we used in our tests are the Atheros AR9271 (ath9k_htc driver), the Atheros AR9580 (ath9k), and Atheros QCA9880v2 (ath10k). For space constraints, we presented here only the results coming from the ath9k_htc device, but we had a similar positive outcome with all the three chipsets.<br> </div> Fri, 22 Jun 2018 08:12:01 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757988/ https://lwn.net/Articles/757988/ mtaht <div class="FormattedComment"> I am pleased to say that hooks into qualcomm's 802.11ac firmware have appeared sufficient to mostly mitigate the bufferbloat problem they had there with the code at the mac80211 layer. 
Certainly I have hope their internal firmware will improve.<br> <p> If I had any one wish for "smart firmware", it would be only that there was enough smarts in the wifi and lte hardware/firmware to handle no more than 4ms worth of queuing and real time processing, and let the kernel handle the rest within its constraints for interrupt latency.<br> <p> BQL accomplished this for ethernet (sub-1ms there, actually)<br> <p> fq_codel for wifi ( <a href="https://www.usenix.org/system/files/conference/atc17/atc17-hoiland-jorgensen.pdf">https://www.usenix.org/system/files/conference/atc17/atc1...</a> ) gets it down to two aggregates, which can take up to ~5ms each, at any achieved "line" rate.<br> <p> We can do better than this with better control of txops on the AP, and certainly the algorithms above can exist, offloaded, in smarter hardware, which is happening on several chipsets I'm aware of.<br> <p> PS The new sch_cake actually can run ethernet, shaped to 1gbit, at lower latencies than anything that uses BQL - at a large cost in cpu overhead, but not as much as you might think - it works at that rate on a quad core atom, for example.<br> <p> <a href="http://www.taht.net/~d/cake/rrul_be_-_cake-shaped-gbit-quad-long-smoother.png">http://www.taht.net/~d/cake/rrul_be_-_cake-shaped-gbit-qu...</a><br> <p> vs sch_fq:<br> <p> <a href="http://www.taht.net/~d/cake/rrul_be_-_fq-quad-long-smoother-1.png">http://www.taht.net/~d/cake/rrul_be_-_fq-quad-long-smooth...</a><br> <p> The principal advantage of cake, here, even though now capable of running at speeds and latencies like this, is to defeat other black box token bucket shapers on a link, however, at much lower rates, with a corresponding reduction in cpu cost to (at sub-100mbit) the level of "noise".<br> </div> Thu, 21 Jun 2018 19:05:48 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757979/ https://lwn.net/Articles/757979/ excors <div class="FormattedComment"> Power efficiency seems important 
too - you don't want to wake up Linux on the big high-performance main CPU just so it can choose to ignore a packet, when you could run that logic in the firmware on the already-awake and just-fast-enough CPU in the networking chip instead.<br> <p> And portability - you don't want to maintain multiple separate copies of your million lines of driver code (plus regression tests etc) for Linux, Windows, Apple, Fuchsia, the several obsolete Linux versions your customers still use, etc, when it's much easier to put almost all the code in firmware so the OS-specific driver is just a thin wrapper. Plus the Linux community will likely be much happier if you upstream that wrapper and leave the firmware opaque, than if you attempt to upstream your million lines of cross-platform-ish code that doesn't follow the Linux coding style and has an ugly abstraction layer for OS-specific bits.<br> <p> None of that stops you making the thick firmware open source, though.<br> </div> Thu, 21 Jun 2018 16:43:16 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757976/ https://lwn.net/Articles/757976/ mtaht <div class="FormattedComment"> Thinner firmware for 802.11ac would be nice, particularly in an age when the main<br> cpu chip is multicore and much faster than anything built into the wifi chip.<br> <p> but wifi has some really hard real-time constraints that require dedicated cpus onboard, and once you start doing that the temptation to wedge all your functionality there has thus far been overwhelming for vendors.<br> <p> So far as I know qualcomm's 802.11ac devices use a proprietary R/T OS inside.<br> <p> Some chipsets (like quantenna's) actually wedge an entire linux stack into their chip.<br> <p> qualcomm's LTE modems do also.<br> </div> Thu, 21 Jun 2018 15:59:25 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757975/ https://lwn.net/Articles/757975/ mtaht <div class="FormattedComment"> From eric dumazet and toke on 
the bloat mailing list:<br> <p> From eric dumazet<br> On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote:<br> <font class="QuotedText">&gt; Dave Taht &lt;dave.taht@gmail.com&gt; writes:</font><br> <font class="QuotedText">&gt; </font><br> <font class="QuotedText">&gt;&gt; Nice war story. I'm glad this last problem with the fq_codel wifi code</font><br> <font class="QuotedText">&gt;&gt; is solved</font><br> <font class="QuotedText">&gt; </font><br> <font class="QuotedText">&gt; This wasn't specific to the fq_codel wifi code, but hit all WiFi devices</font><br> <font class="QuotedText">&gt; that were running TCP on the local stack. Which would be mostly laptops,</font><br> <font class="QuotedText">&gt; I guess...</font><br> <p> Yes.<br> <p> Also switching TCP stack to always GSO has been a major gain for wifi in my tests.<br> <p> (TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and not having<br> GSO is considerably inflating the truesize/payload ratio)<br> <p> <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...</a><br> tcp: switch to GSO being always on<br> <p> I expect SACK compression to also give a nice boost to wifi.<br> <p> <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...</a><br> tcp: add SACK compression<br> <p> Lastly I am working on adding ACK compression in TCP stack itself.<br> </div> Thu, 21 Jun 2018 14:47:46 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757815/ https://lwn.net/Articles/757815/ josh <div class="FormattedComment"> <font class="QuotedText">&gt; The nominal transfer rate of the dongles is 150Mb/s, but what we saw on the screen was disappointing: an upload iperf connection, no matter which options were used, was 
able to reach only 40Mb/s. Using another operating system as a client, we were able to achieve 90Mb/s, leaving out a problem with the server. [...] To stress-test the equipment, we started a UDP transmission at a ludicrous speed. Not so surprisingly, we arrived almost at 100Mb/s.</font><br> <p> I'd be curious how fast UDP on the other operating system went, to know if it topped out at the same 100Mb/s.<br> </div> Tue, 19 Jun 2018 17:50:30 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757790/ https://lwn.net/Articles/757790/ johan <div class="FormattedComment"> I wish the kernel devs had better testing (in this case regression testing).<br> If they had they would likely have found issues like these a bit earlier.<br> Obviously it's a very hard task though considering how many devices there are to test.<br> </div> Tue, 19 Jun 2018 13:29:52 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757789/ https://lwn.net/Articles/757789/ shiftee <div class="FormattedComment"> Excellent article with a nice positive outcome.<br> <p> Dongles based on this chipset are available<br> <a href="https://www.thinkpenguin.com/gnu-linux/penguin-wireless-n-usb-adapter-gnu-linux-tpe-n150usb">https://www.thinkpenguin.com/gnu-linux/penguin-wireless-n...</a><br> and<br> <a href="https://www.olimex.com/Products/USB-Modules/MOD-WIFI-AR9271-ANT/">https://www.olimex.com/Products/USB-Modules/MOD-WIFI-AR92...</a><br> <p> If I remember correctly there was a FOSS enthusiast working for Atheros who convinced them to release the firmware code but he has since left the company<br> </div> Tue, 19 Jun 2018 08:04:01 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757786/ https://lwn.net/Articles/757786/ Beolach <div class="FormattedComment"> I don't know what specific dongles they used, but the main thing to look for is the chipset used, which as mentioned in the article were Atheros chipsets supported 
by the ath9k / ath9k_htc drivers. Even if you get a different brand of dongle, if it has an Atheros chipset supported by these drivers you should be able to get similar results. As the article mentioned, these ath9k chipsets are fully open-source, both w/ the driver software &amp; the firmware, which allows much easier &amp; faster improvements like this. Dave Taht also used ath9k chipsets for his CeroWRT Bufferbloat / Make-WiFi-Fast projects, for this same reason.<br> <p> Sadly, to the best of my knowledge there are no 802.11ac chipsets w/ both open-source drivers &amp; firmware - I believe even the ath10k 802.11ac chipsets have closed firmware blobs. :-( I would love to hear about any I've missed.<br> </div> Tue, 19 Jun 2018 06:56:19 +0000 TCP small queues and WiFi aggregation — a war story https://lwn.net/Articles/757783/ https://lwn.net/Articles/757783/ pabs <div class="FormattedComment"> Which USB/WiFi 802.11ab/g/n dongles were used?<br> </div> Tue, 19 Jun 2018 03:35:20 +0000