LWN: Comments on "io_uring, SCM_RIGHTS, and reference-count cycles" https://lwn.net/Articles/779472/ This is a special feed containing comments posted to the individual LWN article titled "io_uring, SCM_RIGHTS, and reference-count cycles". en-us Thu, 04 Sep 2025 04:51:53 +0000 Thu, 04 Sep 2025 04:51:53 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/780626/ https://lwn.net/Articles/780626/ Alex.C <div class="FormattedComment"> For those interested in sample code, here is a short example: <a rel="nofollow" href="https://github.com/acassen/socket-takeover">https://github.com/acassen/socket-takeover</a><br> <p> The use case here was to provide a seamless takeover from one process to another for critical software upgrades (used for components in a mobile core network).<br> </div> Sun, 24 Feb 2019 14:52:51 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/780510/ https://lwn.net/Articles/780510/ scientes <div class="FormattedComment"> There is some pretty atrocious code in systemd-journald (that I wrote) that uses /proc to get the capabilities of the logging process for every logged message. My concern that this was too slow for the hot path was ignored, and it was merged. It would be nice if SCM_RIGHTS or something similar could allow removing this horrible code.<br> </div> Fri, 22 Feb 2019 04:29:26 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/780010/ https://lwn.net/Articles/780010/ rweikusat2 <div class="FormattedComment"> Some minor additions: A process doesn't need to "connect to itself" to create an SCM_RIGHTS loop. Using two unconnected AF_UNIX sockets should work, too. 
There's also a socketpair system call which creates a pair of connected AF_UNIX sockets.<br> </div> Sun, 17 Feb 2019 21:10:09 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/780007/ https://lwn.net/Articles/780007/ andyc <div class="FormattedComment"> <a href="https://lwn.net/ml/linux-fsdevel/20190211085533.35190404@lwn.net/">https://lwn.net/ml/linux-fsdevel/20190211085533.35190404@...</a><br> </div> Sun, 17 Feb 2019 15:37:16 +0000 epoll https://lwn.net/Articles/779911/ https://lwn.net/Articles/779911/ dw <div class="FormattedComment"> But epoll's semantics are different; it's basically a "weak reference". Closing the FD causes the poller to unregister it, which makes sense, as the only alternative is epoll_wait() yielding events on a file for which no FD exists.<br> </div> Fri, 15 Feb 2019 15:12:14 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779889/ https://lwn.net/Articles/779889/ ermo <div class="FormattedComment"> Yeah, maybe the Linux Foundation would be willing to sponsor some work by you and corbet, which could also become a series of articles here on LWN?<br> <p> Everyone wins?<br> </div> Fri, 15 Feb 2019 13:40:41 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779853/ https://lwn.net/Articles/779853/ rgmoore <blockquote>I don't know how to get from braindumps like that one to the set of coherent docs.</blockquote> <p>I don't know for sure either, but I would bet a good starting place would be somebody (like the Linux Foundation) hiring a technical writer to do most of the work for you. Documentation will continue to lag behind code until somebody is willing to pay real money to get it done. Thu, 14 Feb 2019 18:31:56 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779777/ https://lwn.net/Articles/779777/ kay <div class="FormattedComment"> corbet? 
;)<br> </div> Thu, 14 Feb 2019 10:32:13 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779773/ https://lwn.net/Articles/779773/ Freeaqingme <div class="FormattedComment"> Have you considered simply putting all those notes online with a huge disclaimer that they may very well be outdated at the moment of publishing? At least some parts would probably still be relevant, and it may help people see and understand why certain strategies were used or changed over time.<br> <p> As a bonus, someone may pick up on those notes and use them as a starting point to convert into perhaps more coherent, up-to-date documentation.<br> </div> Thu, 14 Feb 2019 08:43:37 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779767/ https://lwn.net/Articles/779767/ unixbhaskar <div class="FormattedComment"> Thanks a bunch, Al! Sometimes people need this kind of explanation to get a kick in the butt to do well. People (obviously including me!) don't know so many things, and your commentary did a lot to help the wider audience understand the intricacies.<br> </div> Thu, 14 Feb 2019 04:11:04 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779747/ https://lwn.net/Articles/779747/ viro <div class="FormattedComment"> FWIW, gnarly locking issues are mostly in socket-related stuff. I'd been reading through net/unix/*.c for the last week or so and it looks like the code didn't get a serious review (which, alas, pretty much has to involve people who are *not* intimately familiar with it - amazing how much crap gets caught by asking yourself "why is this done (at all|that way)?" and trying to figure it out) for quite a while ;-/<br> <p> As for the documentation... TBH, I've lost count of how many times I'd sat down to put it together; the usual result is a series of tree-wide searches to verify the rules being described, followed by getting sidetracked to fix some bogosity caught by those. 
Sometimes in VFS proper, sometimes in filesystems, sometimes it's drivers or networking or ipc or kvm or... getting creative. With any luck the results do make the kernel better, but by the time the dust settles the original analysis needs to be re-verified (call graph changes, locking conditions at relevant call sites, etc.). Lather, rinse, repeat - usually it's 2-4 cycles a year ;-/<br> <p> The result is an impressive pile of notes (coherent pieces of text interspersed with edited and annotated git grep output, call graphs, need-to-fix-that-bogosity-someday notes, CoC-violating rants, etc.)<br> <p> The thing is, it's not just VFS - _some_ stuff got encapsulated sanely, but quite a few data structures are played with from very odd places in the kernel in very odd ways. For example, I hadn't been able to find anyone who would admit understanding arch/ia64/kernel/perfmon.c, and that thing used to play with struct file life cycle in extremely irregular ways - had been quite a thorn for more than a decade until it got disabled in Kconfig (and seeing that nobody has complained since then, it'll hopefully go away, and good riddance).<br> <p> I don't know how to get from braindumps like that one to the set of coherent docs. 
Note that this one does not go into<br> * any kind of details on modifying descriptor tables and primitives for work with descriptors (iterating, etc.); relatively irrelevant for this thread, definitely needed in any documentation of descriptor tables.<br> * -&gt;flush() method and notifying file of getting disconnected from descriptors (the only relevance to that thread would be "no, it's not usable for anything in this case - you'll keep getting false positives from hell every time something calls system(3)"; for any documentation of struct file life cycle it would obviously need to be included)<br> * struct file lifecycle (all that is covered is basically from successful open to final fput(); alloc_file_...() and friends are not covered at all and neither are the things _after_ the final fput())<br> * use of struct files * as opaque ID for POSIX locks/leases/etc. and related merry horrors in network/cluster filesystems (belongs in discussion of struct files lifetime and places where it can and cannot be poked in)<br> * RCU-related issues (fortunately, fairly self-contained area)<br> * lifecycle for unix_sock and related locking (I'd been nowhere near up-to-date on that; digging through this code proves to be... fruitful, as in "interesting bugs keep turning up", some in places like aushit). Again, it's a separate topic, but it *is* getting involved here, especially now that Jens is copying gobs of that stuff into his code; we'll need to turn that into a small set of well-defined primitives, or that will be a source of massive PITA for years to come.<br> * higher-level discussion of the nature of objects involved (descriptors vs. opened files vs. 
files being accessed) - that one I probably can fish out of the pile, remove the unprintable parts and turn into a coherent text, but that material is a lot better covered by various textbooks, so I decided to skip it.<br> <p> So it was a mashup of at least three different pieces, with different levels of detail and rather uneven style; it's still useful as concentrated background information relevant to the problem at hand, but turning that into sane documentation is not an easy task ;-/ Taken together and turned into readable text it would grow into a counterpart of a couple of chapters in The Daemon Book. And that's a fairly small part of the interfaces - sure, it's the first one you get through on a lot of syscalls, but...<br> <p> I'll be glad to assist with getting such docs done (supplying missing pieces, answering questions regarding the relationship between the topics involved, etc.), but I'm afraid that I'm not up to doing it all on my own. Another thing to keep in mind: quite a few things can change, quite possibly as the direct result of trying to document the situation. Freezing the kernel interfaces while the description gets written is not going to happen - not for something with that wide a surface. Especially since all that stuff is reachable by any sufficiently enterprising driver willing to poke its tender bits into the machinery (and recreate the Modern Times scene with the trip through the gears, often enough).<br> </div> Wed, 13 Feb 2019 23:46:57 +0000 io_uring, SCM_RIGHTS, and reference-count cycles https://lwn.net/Articles/779743/ https://lwn.net/Articles/779743/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; It then counts how many references to each of those sockets come from SCM_RIGHTS datagrams attached to sockets in this set. 
Any socket that has references coming from outside the set is reachable and can be removed from the set.</font><br> A tracing garbage collector, in other words.<br> </div> Wed, 13 Feb 2019 20:35:23 +0000 epoll https://lwn.net/Articles/779740/ https://lwn.net/Articles/779740/ axboe <div class="FormattedComment"> Which is a lot less elegant imho, and introduces extra conditionals in the code.<br> </div> Wed, 13 Feb 2019 19:30:27 +0000 epoll https://lwn.net/Articles/779733/ https://lwn.net/Articles/779733/ abatters <div class="FormattedComment"> The act of registering one file with another file reminded me of epoll. Out of curiosity, I went to look at how epoll handles this problem. The answer is that epoll doesn't increment the reference count when a file is added to an epoll set. Instead, epoll hooks into the file cleanup path to automatically remove a file from all epoll sets when its reference count drops to 0. See:<br> <p> linux/fs/eventpoll.c::eventpoll_release_file()<br> </div> Wed, 13 Feb 2019 17:14:55 +0000
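For anyone who wants to see the epoll behaviour described in the comment above from user space, here is a minimal sketch. It is not taken from the thread, and it assumes Linux and Python's standard `select.epoll` wrapper; the function name `epoll_registration_lifetime` is made up for illustration. It registers the read end of a pipe, then checks what the epoll set reports after the only descriptor is closed, and after one of two descriptors sharing the same open file is closed.

```python
import os
import select


def epoll_registration_lifetime():
    """Hypothetical helper: returns the events epoll reports
    (before close, after closing the only fd, after closing one of two fds)."""
    ep = select.epoll()
    try:
        # Case 1: a single descriptor refers to the open file.
        r, w = os.pipe()
        os.write(w, b"x")                 # make the read end readable
        ep.register(r, select.EPOLLIN)
        before = ep.poll(0)               # the registration is live
        os.close(r)                       # last reference: the file is released
        after_last_close = ep.poll(0)     # the kernel dropped the registration
        os.close(w)

        # Case 2: a duplicate descriptor keeps the open file alive.
        r, w = os.pipe()
        dup = os.dup(r)
        os.write(w, b"x")
        ep.register(r, select.EPOLLIN)
        os.close(r)                       # the file is still referenced via dup
        after_partial_close = ep.poll(0)  # the registration survives
        os.close(dup)
        os.close(w)
    finally:
        ep.close()
    return before, after_last_close, after_partial_close
```

On a Linux system the second result should come back empty, while the third still holds one event: the registration follows the open file rather than the descriptor number, which matches the "removed when the reference count drops to 0" description and is the same caveat that the epoll(7) manual page gives for duplicated descriptors.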