LWN: Comments on "Avoiding unintended connection failures with SO_REUSEPORT" https://lwn.net/Articles/853637/ This is a special feed containing comments posted to the individual LWN article titled "Avoiding unintended connection failures with SO_REUSEPORT". en-us Sat, 01 Nov 2025 09:30:49 +0000 Sat, 01 Nov 2025 09:30:49 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/855858/ https://lwn.net/Articles/855858/ bernat <div class="FormattedComment"> The idea of listen(0) was to then allow you to drain the remaining connections. I remember you proposing this simple solution, but your patch was rejected because &quot;this should be done with BPF.&quot;<br> </div> Sun, 09 May 2021 06:49:47 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854968/ https://lwn.net/Articles/854968/ smurf <div class="FormattedComment"> That depends on whether you need kernel support for it. I could get by with an IPC socket to some other server to which I can send the file descriptors returned from accept()ing these connections.<br> <p> Or maybe we want a &quot;sock_inject(listener,conn)&quot; syscall that adds an open socket into a listener&#x27;s queue. To do this to another process, just open /proc/‹serverpid›/fd/‹bound_socket›. In fact something like this is also required for migrating a server to a different host, so it wouldn&#x27;t be a single-use syscall.<br> </div> Fri, 30 Apr 2021 05:53:05 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854922/ https://lwn.net/Articles/854922/ wtarreau <div class="FormattedComment"> Ah OK, but the internal problem remains the same: the difficulty of rehashing the queues without losing entries. listen(0), setsockopt(), shutdown(SHUT_RD) etc. would all have been valid candidates for me, if only I&#x27;d had a reliable way to move these queues around :-/<br> </div> Thu, 29 Apr 2021 17:48:01 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854694/ https://lwn.net/Articles/854694/ smurf <div class="FormattedComment"> You misunderstand (or I miswrote): my intent was to fix the kernel so that &quot;listen(fd,0)&quot; simply closes the queue for new arrivals. Then the process would accept() the remaining open connections (and somehow deal with them), and shut down when that blocks. No new syscall, new common queueing mechanism, or other intrusive shenanigans required.<br> </div> Wed, 28 Apr 2021 05:32:28 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854552/ https://lwn.net/Articles/854552/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; LWN&#x27;s server, for example, sweats hard when keeping up with the comment stream that accompanies any article mentioning the Rust programming language. 
But some organizations run truly busy servers and have to take some extraordinary measures to keep up with levels of traffic that even language advocates cannot create.</font><br> <p> It looks like our editor&#x27;s great mood made him temporarily stray from LWN&#x27;s legendary rigor and thoroughness and drop a piece of critical information highly relevant to this article: in such a difficult server situation, are the Rust, C or C++ advocates causing the most traffic?<br> <p> </div> Mon, 26 Apr 2021 22:39:40 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854551/ https://lwn.net/Articles/854551/ NYKevin <div class="FormattedComment"> That is fairly likely; I&#x27;m basing my assumptions on the &quot;lame duck state&quot; documented here: <a href="https://sre.google/sre-book/load-balancing-datacenter/">https://sre.google/sre-book/load-balancing-datacenter/</a><br> <p> But that probably wouldn&#x27;t work very well for frontends.<br> </div> Mon, 26 Apr 2021 22:04:00 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854540/ https://lwn.net/Articles/854540/ sodabrew <div class="FormattedComment"> You may be mistaken on Google&#x27;s lack of use for SO_REUSEPORT, given that SO_REUSEPORT was developed by Tom Herbert at Google and merged for the 3.9 kernel release: <a href="https://lwn.net/Articles/542629/">https://lwn.net/Articles/542629/</a> <br> <p> It&#x27;s certainly possible that the original needs have changed and been solved in other ways in the eight years that have passed, or that you&#x27;re working on a different product that doesn&#x27;t have the same requirements as the one for which this feature was developed, but either way, your statement that (paraphrased) &quot;Google doesn&#x27;t use SO_REUSEPORT&quot; ought to have a modifier of either &quot;...anymore because...&quot; or &quot;...in my group that&#x27;s doing something different.&quot;<br> </div> Mon, 26 Apr 2021 18:37:34 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854532/ https://lwn.net/Articles/854532/ wtarreau <div class="FormattedComment"> It&#x27;s essential to deal with the high SYN rates that happen during SYN floods (i.e. all the time on high traffic sites). You want that part to be ultra-scalable. The difference can be 1 vs 10 Mpps.<br> </div> Mon, 26 Apr 2021 16:13:46 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854531/ https://lwn.net/Articles/854531/ wtarreau <div class="FormattedComment"> It doesn&#x27;t work well; see my response above.<br> </div> Mon, 26 Apr 2021 16:11:46 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854530/ https://lwn.net/Articles/854530/ wtarreau <div class="FormattedComment"> No, that was among my first attempts. It simply results in the listen queue being zero for that socket and connection requests being dropped. Still, I kept that as a workaround for the RSTs for a few days because it managed to cause fewer RSTs by rejecting SYNs earlier when detecting that the queue was full. But that was not possible after the lockless SYN patches anyway, so there was no hope in this direction. 
Plus, this resulted in huge CPU usage for the user application, which had to call accept() in loops and was not able to group the accepts any more.<br> </div> Mon, 26 Apr 2021 16:10:20 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854427/ https://lwn.net/Articles/854427/ mm7323 <div class="FormattedComment"> Why not just let a process do something like call listen(fd, 0) to indicate that it doesn&#x27;t want any more connections queued to the socket? Then it can wait a few seconds for any handshaking to complete and use non-blocking accept() to empty its queue before exiting or reloading config gracefully.<br> <p> This doesn&#x27;t handle the crashing process scenario, but that&#x27;s a lost cause anyway - a crash after accept() was called could still result in a dropped connection or an invalid or partial response and aggravated users.<br> <p> This suggestion is pretty much the same as using a BPF program to steer connections as suggested in the article. It just provides a simpler API to achieve the same thing.<br> </div> Mon, 26 Apr 2021 04:45:08 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854417/ https://lwn.net/Articles/854417/ Sesse <div class="FormattedComment"> I&#x27;m a bit confused. What good does it do that SYN processing is lockless, if you&#x27;re still going to serialize it into a single listener?<br> </div> Sun, 25 Apr 2021 21:56:52 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854388/ https://lwn.net/Articles/854388/ smurf <div class="FormattedComment"> Couldn&#x27;t the unhashing be accomplished (with some minimal kernel support of course) by calling listen(fd,0)? Then either process the remaining connections or hand them off to another process. <br> </div> Sun, 25 Apr 2021 08:09:08 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854372/ https://lwn.net/Articles/854372/ NYKevin <div class="FormattedComment"> I should probably also emphasize that I deal with things at a much higher level than &quot;the specific flags we pass to individual syscalls when setting up sockets,&quot; so while I believe what I have written is generally correct, my understanding might be incomplete or incorrect with respect to SO_REUSEPORT in particular.<br> <p> Nevertheless, there are definitely lots of people who want to push their software frequently, and making that easier in one fashion or another can only be a good thing.<br> </div> Sat, 24 Apr 2021 23:28:53 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854369/ https://lwn.net/Articles/854369/ flussence <div class="FormattedComment"> Why not? Linux already has a bunch of similar complexity to prevent timestamp counters wrapping after hundreds of days of uptime, and we all know that would&#x27;ve only happened to evil lazy sysadmins that don&#x27;t apply security updates and totally deserve it. 
/s<br> <p> If this were hardware, the maker would&#x27;ve put out a product recall for such a high failure rate.<br> </div> Sat, 24 Apr 2021 23:08:00 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854360/ https://lwn.net/Articles/854360/ ms-tg <div class="FormattedComment"> Thanks for posting this; I find a lot of value in the “this is what I’m seeing in my industry” posts that LWN attracts related to specific kernel intricacies.<br> </div> Sat, 24 Apr 2021 19:07:03 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854349/ https://lwn.net/Articles/854349/ wtarreau <div class="FormattedComment"> Yes, that could be an option; I considered it but lost my way back then. The accept queue code became tricky with the reintroduction of SO_REUSEPORT, and I seem to remember that one of the difficulties was to pick pending connections, and another one was to unhash some of the queues while they were in use without losing what was in them. For whatever reason I remember not figuring out how to allow a program to still pick what was left in a queue with that queue not being visible to the rx path that distributes incoming requests. But these are old memories, and I remember that Eric was quite concerned about my fiddling there because he was about to finish killing the SYN queue lock.<br> <p> Also it&#x27;s important to keep in mind that we cannot afford to lose even a tenth of a percent of performance there, because such tricks would only be used during process reloads, and the code path they&#x27;re affecting is the one most stressed during DDoSes.<br> <p> Ideally we should just remove a queue in two steps: first it would simply be unhashed, and second it would be closed. From what I remember, pending entries were killed inside the unhashing code, but I could be saying crap.<br> </div> Sat, 24 Apr 2021 12:12:10 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854347/ https://lwn.net/Articles/854347/ gracinet <div class="FormattedComment"> This is interesting to me because SO_REUSEPORT is, as far as I know, the only way to do prefork multiprocessing in gRPC servers, which is something that&#x27;s typically wanted if implemented in Python, because of the global interpreter lock (GIL).<br> <p> In some cases the performance impact of the GIL can be overstated; it really depends on the workload. But then, many Python applications aren&#x27;t designed to be thread-safe anyway because it&#x27;s generally believed that the GIL would make the effort useless.<br> <p> A common reason for restarting worker processes would be reaching some limit, such as memory footprint. Lots of applications in the wild have at least minor memory leaks.<br> <p> </div> Sat, 24 Apr 2021 11:05:21 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854345/ https://lwn.net/Articles/854345/ Cyberax <div class="FormattedComment"> How about this method?<br> <p> Have two queues for each listening socket, one normal and one for &quot;extraordinary&quot; requests. In case of a process death during the closure process, take the queued connections and redistribute them across the extraordinary queues.<br> <p> Since these queues are special and are used infrequently, you can use simple locking-based algorithms there.<br> <p> Ideally this can be done transparently in the kernel, but it can also be done with some userspace assistance. 
Processes willing to &quot;mop up&quot; connections can open a new listening socket and communicate (via setsockopt/ioctl) that it should be used for connection migration.<br> </div> Sat, 24 Apr 2021 10:29:26 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854341/ https://lwn.net/Articles/854341/ wtarreau <div class="FormattedComment"> I tried to do exactly this several times in the past but failed. We had the same problem with haproxy, where high-traffic users were constantly seeing a few resets being emitted when one queue was unbound. I tried to figure out how to detach pending connections from a queue and reinject them into other queues, but never managed to; that area was too complex.<br> <p> I understand Eric&#x27;s concerns (and he already expressed them to me back then). It is possible that it&#x27;s not the best solution, but it addresses a real issue in the field that needs to be addressed.<br> <p> We worked around it by passing listening file descriptors between the old and the new process during reloads. All this just to avoid a bind+unbind cycle! It comes with its own set of limitations, of course.<br> <p> Also, SO_REUSEPORT is not just used for this; the initial purpose was to allow multiple processes to bind to the same port and avoid blackout periods. It used to work fine in 2.2 and was removed in 2.4. I had to maintain the patch to reintroduce it until someone else proposed a variant in 3.9 which also implemented the multiqueue balancing. But this is an essential feature in highly available environments.<br> <p> </div> Sat, 24 Apr 2021 08:46:11 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854335/ https://lwn.net/Articles/854335/ NYKevin <div class="FormattedComment"> Speaking as a Google SRE, we restart servers &quot;to effect a configuration change&quot; all the damn time. We push new code into production literally every single week[1] unless there&#x27;s a holiday or our error budget is depleted. Every push involves (slowly, carefully[2]) restarting all running instances of the server. Now, in practice, SO_REUSEPORT is probably not the most relevant flag in the world for us, but that&#x27;s mostly just because we&#x27;ve already solved this problem (i.e. &quot;don&#x27;t drop in-flight requests&quot;) at other levels of abstraction, and so asking the kernel for help is less useful. But any shop that&#x27;s less aggressively containerized than us[3] would probably find this sort of thing Nice To Have, if they want to do frequent releases.<br> <p> [1]: This is my experience on one team managing a small number of services. It is not necessarily representative; I know for a fact that other teams often have wildly different release cadences.<br> [2]: <a href="https://sre.google/workbook/canarying-releases/">https://sre.google/workbook/canarying-releases/</a><br> [3]: <a href="https://sre.google/sre-book/production-environment/">https://sre.google/sre-book/production-environment/</a><br> </div> Sat, 24 Apr 2021 04:45:21 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854332/ https://lwn.net/Articles/854332/ Cyberax <div class="FormattedComment"> That&#x27;s not necessarily true. For example, we were using plenty of Let&#x27;s Encrypt certs on internal NAS-like servers. 
We could have used self-signed certs, but maintaining our own CA and installing it on all computers was a bigger hassle.<br> </div> Sat, 24 Apr 2021 02:43:56 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854331/ https://lwn.net/Articles/854331/ clugstj <div class="FormattedComment"> If you are using Let&#x27;s Encrypt, your server isn&#x27;t that busy.<br> </div> Sat, 24 Apr 2021 02:39:58 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854330/ https://lwn.net/Articles/854330/ Cyberax <div class="FormattedComment"> Every couple of weeks with Let&#x27;s Encrypt.<br> </div> Sat, 24 Apr 2021 02:38:10 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854329/ https://lwn.net/Articles/854329/ clugstj <div class="FormattedComment"> The article says &quot;the server is being restarted to effect a configuration change or to switch to a new certificate&quot;. How often does this happen?<br> </div> Sat, 24 Apr 2021 02:37:22 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854322/ https://lwn.net/Articles/854322/ HenrikH <div class="FormattedComment"> The sockets/connections that the patch set is about have not reached userspace yet when the process is restarted, so there is nothing an application can do here. It&#x27;s about connections that have been assigned to a specific process by the kernel but where said process has not yet called accept() to get them when that process was restarted/stopped/crashed.<br> </div> Sat, 24 Apr 2021 01:07:46 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854315/ https://lwn.net/Articles/854315/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; IIRC, the purpose of SO_REUSEPORT wasn&#x27;t to permit multiple threads to dequeue connections; they could already do that, and if there was lock contention this was just a QoI issue. </font><br> SO_REUSEPORT is needed to allow multiple _processes_ to dequeue connections.<br> <p> The problem with accepting connections from multiple threads is that they are still effectively serialized, because all file operations take an implicit lock to allocate a file descriptor.<br> <p> </div> Fri, 23 Apr 2021 23:39:30 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854310/ https://lwn.net/Articles/854310/ wahern <div class="FormattedComment"> The article mentions that &quot;the TCP accept code has been reworked to run locklessly&quot;. IIRC, the purpose of SO_REUSEPORT wasn&#x27;t to permit multiple threads to dequeue connections; they could already do that, and if there was lock contention this was just a QoI issue. Rather, I think the primary issue was efficient polling and resolving the thundering herd problem. The reason an incoming connection is immediately assigned to a specific queue is so that only that descriptor (or descriptors if dup&#x27;d) will signal readiness while still avoiding introducing stalling and fair dispatch dilemmas, especially in the context of polling as opposed to threads actually waiting inside accept. The classic way to implement a multi-threaded accept in userspace while avoiding thundering herds was to ensure only a single thread was waiting in accept or polling on the accept descriptor at any one time. 
SO_REUSEPORT simply moved assignment earlier in the pipeline while largely preserving semantics.<br> <p> BSD supported SO_REUSEPORT long before Linux did, although support for TCP seems to be undocumented. Rather than round-robin, though, only the most recent binding is assigned connections. When that goes away the previous one starts to see connections again. However, at least on macOS queued connections are still lost on close. I see FreeBSD added SO_REUSEPORT_LB which does round-robin; not sure if it suffers from the lost connection problem.<br> <p> </div> Fri, 23 Apr 2021 23:29:09 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854299/ https://lwn.net/Articles/854299/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; I wonder if the right thing might be to have SO_REUSEPORT switch to actually dup()ing the single socket to the caller&#x27;s fd. Is there per-socket information that is allowed to be different among the sockets listening on the same port that would cause behavior changes in that situation?</font><br> The problem is that you&#x27;ll be funneling all the connections through effectively one thread, because all the file operations take a process-wide lock. This is what SO_REUSEPORT was designed to avoid.<br> </div> Fri, 23 Apr 2021 20:58:39 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854286/ https://lwn.net/Articles/854286/ ibukanov <div class="FormattedComment"> I do not see how this is useful in practice. nginx has supported live updates without dropping any connection for ages, using a careful protocol to transfer the file descriptors to another process. systemd made that very straightforward to implement, as the process can store descriptors before the restart in the pool provided by systemd. Those busy sites surely can implement something like that, allowing them to preserve not only listening sockets, with or without SO_REUSEPORT, but also accepted ones. <br> <p> As for crashing servers, losing the incoming queue is the least of the worries, as the crash is a clear sign that the system misbehaves. And for absolute robustness one can transfer important sockets to another process in the crash handler.<br> </div> Fri, 23 Apr 2021 19:15:38 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854280/ https://lwn.net/Articles/854280/ mss <div class="FormattedComment"> <font class="QuotedText">&gt; So, we are worrying about 1 in a billion connections to a busy web server failing?</font><br> Where did you get this number from? Is it stated somewhere in the patch series?<br> <p> A TCP connection could fail for many other reasons, too.<br> We definitely don&#x27;t want to add additional ones, as randomly failing server connections give a poor user experience.<br> <p> </div> Fri, 23 Apr 2021 19:10:14 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854282/ https://lwn.net/Articles/854282/ smurf <div class="FormattedComment"> It&#x27;s not just web sockets this is useful for.<br> <p> The connection in question has already been accepted and there&#x27;s a server willing to work with it. Dropping it is just plain rude, esp. as the client has no way to decide whether the server crashed or not. This is bad. You need retry code in the client (which otherwise could just blindly assume that the server crashed). 
The retry introduces latency you might want to avoid.<br> <p> Also, the client might be a load balancer which now thinks that your server just crashed. This is a bad idea, esp. if you use a sharded data set, because the requests now go to &quot;cold&quot; machines.<br> </div> Fri, 23 Apr 2021 19:07:47 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854270/ https://lwn.net/Articles/854270/ xi0n <div class="FormattedComment"> AFAIK it’s associated with a particular socket, but either way it does look like a legacy assumption. SO_REUSEPORT would likely be simpler in implementation (incl. possibly making the patch set discussed here unnecessary) if those queues were maintained on a per-bindpoint basis instead.<br> <p> (This would raise the question of what to do with the second argument to listen(). Since the number given there is defined more like a hint than an actual limit, it&#x27;s probably a minor issue, though.)<br> </div> Fri, 23 Apr 2021 18:29:56 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854241/ https://lwn.net/Articles/854241/ iabervon <div class="FormattedComment"> I wonder if the right thing might be to have SO_REUSEPORT switch to actually dup()ing the single socket to the caller&#x27;s fd. Is there per-socket information that is allowed to be different among the sockets listening on the same port that would cause behavior changes in that situation?<br> <p> I assume the now-recommended userspace code would bind the address in a single process and pass the bound socket over a unix domain socket to other processes (or fork after binding it), and it seems like this situation wouldn&#x27;t be too hard to replicate even when userspace didn&#x27;t ask for it like that, assuming that there aren&#x27;t any visible differences.<br> <p> I guess the max queue length may need to be adjusted in order to enqueue the same number of incoming connections total, and that might be noticed by the other local processes?<br> </div> Fri, 23 Apr 2021 16:16:35 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854237/ https://lwn.net/Articles/854237/ Tomasu <div class="FormattedComment"> I wonder why the queue has to be associated with a process at all till it&#x27;s been &quot;accepted&quot;. Probably just legacy assumptions.<br> </div> Fri, 23 Apr 2021 15:42:30 +0000 Avoiding unintended connection failures with SO_REUSEPORT https://lwn.net/Articles/854236/ https://lwn.net/Articles/854236/ clugstj <div class="FormattedComment"> So, we are worrying about 1 in a billion connections to a busy web server failing? I can&#x27;t see this being worth the added complexity.<br> </div> Fri, 23 Apr 2021 15:35:13 +0000
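Several of the comments above assume familiarity with how a SO_REUSEPORT listener group is set up and with the BPF steering that bernat's and mm7323's comments refer to. The following is a minimal sketch, not taken from haproxy, nginx, or the patch set under discussion: each worker process creates its own listening socket on the same port, and one of them attaches a classic-BPF program (SO_ATTACH_REUSEPORT_CBPF, available since Linux 4.5) that selects the target socket by the CPU that handled the incoming SYN. The function names are illustrative and error handling is omitted.
<pre>
#include <linux/filter.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* One such socket is created by every worker process. */
static int reuseport_listener(uint16_t port)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Must be set before bind() so the kernel groups the sockets. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 128);
    return fd;
}

/* Attached to one socket of the group; the value returned by the filter is
 * used as an index into the group, so this assumes one worker pinned per CPU. */
static void attach_cpu_steering(int fd)
{
    struct sock_filter code[] = {
        /* A = CPU that is processing the packet */
        { BPF_LD  | BPF_W | BPF_ABS, 0, 0, SKF_AD_OFF + SKF_AD_CPU },
        /* return A as the socket index */
        { BPF_RET | BPF_A,           0, 0, 0 },
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF, &prog, sizeof(prog));
}
</pre>
Note that such steering only affects where new SYNs land; as the article and the comments point out, it does nothing for connections already sitting in the queue of a socket that is being closed.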
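wtarreau, ibukanov, and iabervon all mention passing an already-bound listening socket to another process so that the port never goes through an unbind/rebind cycle. Below is a rough sketch of the sending side of such a handoff over a UNIX-domain socket; the function name and the one-byte payload are just illustration, and the receiving process would use recvmsg() with a matching control buffer to pick the descriptor up.
<pre>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send listen_fd to the peer connected on unix_fd (e.g. a freshly started
 * replacement process).  The kernel installs a duplicate of the descriptor
 * in the receiver, so the listening socket is never torn down. */
static ssize_t send_listener(int unix_fd, int listen_fd)
{
    char byte = 'L';                      /* SCM_RIGHTS needs some payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    union {                               /* properly aligned cmsg buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof(u));

    struct msghdr msg = {
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = u.buf,
        .msg_controllen = sizeof(u.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &listen_fd, sizeof(int));

    return sendmsg(unix_fd, &msg, 0);
}
</pre>
This sidesteps the bind+unbind cycle for planned reloads, which is the workaround wtarreau describes for haproxy; it does not help when a process dies and its socket is closed with connections still queued, which is the case the patch set targets.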
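mm7323 and smurf suggest having a departing worker stop queueing new connections and then drain whatever is already queued; wtarreau's replies dispute whether listen(fd, 0) actually achieves the first half. Purely to illustrate the second half, here is a sketch of a non-blocking drain loop; handle_connection() is a placeholder for the server's own request handling, not a real API.
<pre>
#define _GNU_SOURCE               /* for accept4() */
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

extern void handle_connection(int fd);   /* placeholder */

static void drain_accept_queue(int listen_fd)
{
    /* Make the listening socket non-blocking so accept4() reports EAGAIN
     * once the queue is empty instead of waiting for new connections. */
    fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL, 0) | O_NONBLOCK);

    for (;;) {
        int conn = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
        if (conn < 0) {
            if (errno == EINTR)
                continue;         /* interrupted, try again */
            break;                /* EAGAIN: queue empty; bail on other errors too */
        }
        handle_connection(conn);
    }
}
</pre>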