bufferbloat in the fediverse
Posted Nov 30, 2022 12:06 UTC (Wed) by crschmidt (guest, #162445)
In reply to: bufferbloat in the fediverse by mtaht
Parent article: Microblogging with ActivityPub
Most of Mastodon's performance problems for medium-size instances over the past month stemmed from the relatively naive configuration of its backend worker queues (where, by default, all tasks are running in the same queue, with no prioritization of handling inbound / outbound posts over lower priority background tasks).
There are exceptions to this: Large instances like Mastodon.social (881k users) and even small-ish but highly active servers like Hachyderm.io (30k users) have sufficient utilization to have real performance constraints ... but most of that isn't in serving the _users_ (which is relatively low cost), but rather in processing the overall flow of information from the broader Fediverse (especially things like caching media) into the server, as far as I have seen.
Put another way: network jitter really doesn't matter when the backend API that I'm fetching from is going to take 10-15 seconds to return 60KB of data; that amount of content could be delivered over a 56k modem in that time window.
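As a concrete illustration of the worker-queue point above (a sketch only: the queue names are the ones listed in Mastodon's scaling documentation, while the process split and concurrency numbers here are made up), Sidekiq can be told to serve the inbound and delivery queues from dedicated processes, so that background jobs cannot starve them:

$ bundle exec sidekiq -c 25 -q ingress
$ bundle exec sidekiq -c 25 -q push -q pull
$ bundle exec sidekiq -c 10 -q default -q mailers -q scheduler

Running one process per priority class is the simplest form of the prioritization that the default everything-in-one-queue setup lacks.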
Posted Dec 1, 2022 3:15 UTC (Thu)
by mtaht (subscriber, #11087)
[Link] (10 responses)
propagating a file into /etc/sysctl.d/10-lowerbloat.conf

net.core.default_qdisc=fq_codel
net.ipv4.tcp_congestion_control=bbr
net.ipv4.tcp_ecn=0

why is it so hard to get folk to do that much? Sure, measuring what happens via tcpdump, etc. takes a bit of time, but nothing compared to how immediately I would hope these lines would help the fediverse.
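(On a systemd-based distribution such a file takes effect at boot; to apply and verify it immediately:)

$ sudo sysctl --system
$ sysctl net.core.default_qdisc net.ipv4.tcp_congestion_control net.ipv4.tcp_ecn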
Posted Dec 1, 2022 9:30 UTC (Thu)
by mbunkus (subscriber, #87248)
[Link] (4 responses)
Defaults matter so much.
For example, I've never heard of the recommendation to disable ECN in favor of enabling BBR. I tried a quick Google search, and there wasn't really a lot of good information out there that proposed exactly this combination together with an easy-to-digest rationale. Sure, there are somewhat scientific explanations of how BBR works, going into detail about packet loss & such.
Nor are there articles discussing in which type of situation these settings matter (only on machines connected to the internet? all machines? why?).
I'd love to read more about this. Do you have any pointers?
Bufferbloat is a rather complex topic. I'm not surprised that most sysadmins don't know much, if anything, about it, let alone how to fix it properly.
In short: it really, really isn't obvious that this is something that Jane Sysadmin should do.
Posted Dec 1, 2022 14:57 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
Posted Dec 1, 2022 16:19 UTC (Thu)
by mtaht (subscriber, #11087)
[Link] (2 responses)
The BBR folk at Google use ECN differently than RFC3168 and disable negotiation of it entirely... while Apple clients tend to request it, and it's enabled by default across the rest of the internet. There was a patch rejected long ago that did the right thing here, and other patches to just make BBR obey RFC3168, also rejected... I am seeing a lot of BBR uptake (say, 11% of websites) without ECN negotiation also being disabled, which means that fq_codel, cake, etc. go around marking packets madly instead of dropping them, to no observable effect when BBR is in play.
Mess. My rightest answer would be to make BBRv1 do RFC3168-style ECN, perhaps not dropping the rate by half as per that spec, but by initiating a probe phase. Others differ.
As for recommendations as to good defaults, a lot of the non-controversial fixes have landed in the kernel and become defaults: TSQ, BQL, pacing, fq_codel, etc.
And BBR has been shown to be a good step forward for many applications; I do think it's probably a better transport than CUBIC for the kinds of long-running, periodically bursty, autonomous applications like those in the fediverse, and especially over LTE. It's also been shown to be problematic in multiple respects, but BBRv2 is hung up in testing, partially hinged on the non-backward-compatible treatment of ECN in it.
It would be nice if more people were aware of these issues, taking packet captures, and worrying about the impacts on the network. I'm sorry that the amount of "publicity" and discussion these problems have got is not easily visible on today's internet. Perhaps talking about it here will help, and a few more beleaguered sysadmins and users will lean in. Google does not index mailing lists like the IETF's tsvwg or bufferbloat.net's bloat list all that well anymore, with over a decade's worth of discussion on each.
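(One hedged way to watch for that marking behavior on a link you control, with the interface name purely illustrative: fq_codel reports separate drop and ecn_mark counters, so a mark-blind BBRv1 sender shows up as an ever-growing mark count with no matching slowdown:)

$ tc -s qdisc show dev eth0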
Posted Dec 1, 2022 18:09 UTC (Thu)
by mtaht (subscriber, #11087)
[Link]
The BBR mailing list. I am not sure if this BBRv2 status is up to date.
Posted Dec 2, 2022 18:15 UTC (Fri)
by intelfx (subscriber, #130118)
[Link]
With all the immense respect owed to the great work you're doing to tackle the bufferbloat problem: it's a really, really, really obscure problem, and the solutions are not really discoverable unless you make it your explicit objective.
Posted Dec 1, 2022 10:04 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (4 responses)
It's hard because it's a change from defaults, and changes are always scary. If those changes are always a win, then they need to be the defaults, so that people use them automatically - I note that with defaults on the Fedora Linux KDE spin, I get one of your three preferred changes out of the box:
$ sysctl net.core.default_qdisc net.ipv4.tcp_congestion_control net.ipv4.tcp_ecn
net.core.default_qdisc = fq_codel
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_ecn = 2
The trouble with asking people to make changes from the defaults is that it's never quite clear why something doesn't work as expected, unless you fully understand the changes. So why make a change when not changing works just fine from your perspective?
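(A gloss on those values, hedged to the upstream kernel defaults: tcp_ecn = 2 means ECN is accepted when a peer requests it, but never requested on outgoing connections. The other two of the suggested settings can be tried non-persistently at runtime, assuming a kernel where BBR is built as the usual tcp_bbr module:)

$ sudo modprobe tcp_bbr
$ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
$ sudo sysctl -w net.ipv4.tcp_ecn=0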
Posted Dec 1, 2022 16:51 UTC (Thu)
by mtaht (subscriber, #11087)
[Link] (3 responses)
Well, there was an effort to enable cubic + ECN (as opposed to BBR with ECN off) also.
More recently I'm happy to have heard that "backpressure" for microservices, within a machine, has arrived in the kernel as of 6.0, which renders part of the issues debated above moot (Cilium is bragging about this). On the other hand, all the developers of microservices I've met so far seem to think backpressure exists for other things outside the box, or for protocols other than TCP, and it *doesn't*.
I've tried to engage with the Cilium folk a couple times now.
Posted Dec 2, 2022 14:16 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (2 responses)
I think, personally, that a core part of the issue is that most of the fixes for bufferbloat involve making metrics worse for a few edge cases (packet loss counters when at saturation, for example), in return for a big improvement that's not hugely visible in metrics for the vast majority of traffic.
And so, where you're asking people to change from the defaults, you're asking them to regress one or two metrics they've "always" paid attention to, without improving other metrics they pay attention to. This is obviously a bad thing to do - why make the numbers worse? - and thus you struggle unless your changes are the defaults (at which point, when they look into their tweaks, they find the tweaks make the metric better at the expense of something they actually care about).
Basically the traditional confusion between a metric and an outcome :-(
Posted Dec 2, 2022 14:53 UTC (Fri)
by gioele (subscriber, #61675)
[Link] (1 response)
Isn't that addressed by publicizing another contrasting metric?
"Yes packet loss counters went 2.7% up, but 95%-ile latency is down by 84.3%!"
Posted Dec 2, 2022 15:37 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
My experience of debloating a network that's not a bottleneck is that that's not actually what you get: packet loss goes up 4%, but 99th-percentile throughput and latency are unchanged. And it's hard to explain that this is actually a win - that what I've actually done is get you to a point where, instead of needing expensive network upgrades when traffic doubles, you can hold off until traffic triples - because that's something in the far future.
Fortunately, the network I debloated was my home network, so while I can see that no metrics have improved significantly, I'm happy that I'll be able to run on the existing hardware even as demand increases.
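(For the curious, the change involved is small; a sketch for a Linux router, with the interface name and the uplink rate purely illustrative: shaping slightly below the link speed with cake moves the queue to where the qdisc can manage it:)

$ sudo tc qdisc replace dev eth0 root cake bandwidth 95mbit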