LWN: Comments on "How implementation details become ABI: a case study" https://lwn.net/Articles/614057/ Fri, 03 Oct 2025 10:00:29 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/615925/ https://lwn.net/Articles/615925/ jzbiciak <div class="FormattedComment"> Ok, I have nothing witty or insightful to add. But I have to say that's the most awesome executable name I've seen in a long time. Bravo!<br> </div> Sun, 12 Oct 2014 08:11:54 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/615707/ https://lwn.net/Articles/615707/ viro <div class="FormattedComment"> Whoever it had been, the whole point is that this ship has long sailed. Perhaps the main lesson is that single-use APIs tend to suck. "It's just for fuser and lsof" should've been a major red flag. For _those_ this (deleted) thing was probably more convenient - no need to do anything with the string on the userland side. And it wouldn't be a problem, except that there's no miracles and it *had* gained other users. 
Worse, now we can't change it without breaking existing userland, as this story has demonstrated.<br> </div> Thu, 09 Oct 2014 22:53:34 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/615702/ https://lwn.net/Articles/615702/ weue <div class="FormattedComment"> Among others, "//deleted/PATH" (note the double slash) or "deleted/PATH" (relative path) would have been unambiguous.<br> <p> Whoever accepted "X (deleted)" is an idiot.<br> <p> </div> Thu, 09 Oct 2014 22:39:26 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/615531/ https://lwn.net/Articles/615531/ kevinm <div class="FormattedComment"> It wouldn't matter if the original interface had been specified ("The value returned by readlink(2) is for presentational purposes only and should not be parsed.") or not: people would still have ignored it and written scripts depending on the implementation details, and those scripts would still have to be supported.<br> <p> The Linux culture is not to say "that userspace depended on something we didn't intend to be ABI, so it's OK to break it". ABI is de facto, not de jure.<br> </div> Thu, 09 Oct 2014 12:53:28 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/615509/ https://lwn.net/Articles/615509/ PaXTeam <div class="FormattedComment"> <font class="QuotedText">&gt; Was it reported upstream</font><br> <p> did you read the forum post?<br> <p> <font class="QuotedText">&gt; and just as important, was the patch submitted to LKML?</font><br> <p> no because it was a quick hack, not something upstream (or even we) would want in the long run. 
in fact they've fixed it differently in the end (not their hack but Al's refcount approach) as you can read it in this very article.<br> </div> Thu, 09 Oct 2014 10:32:58 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/615474/ https://lwn.net/Articles/615474/ andza <div class="FormattedComment"> Was it reported upstream and just as important, was the patch submitted to LKML? Otherwise I can only agree with the parent that it's mostly irrelevant. <br> </div> Thu, 09 Oct 2014 05:32:22 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614985/ https://lwn.net/Articles/614985/ viro <div class="FormattedComment"> As I said, something more fancy shouldn't be using start-stop-daemon in the first place. I really wonder how many common daemons take care about the races around the upgrade; stop/move new one in place/start is more robust...<br> It's not just config; anything from helper binaries to permissions on directories, etc. can become a surprise for old daemon binary.<br> </div> Sun, 05 Oct 2014 16:08:13 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614979/ https://lwn.net/Articles/614979/ JGR <div class="FormattedComment"> Presumably a daemon designed to be upgraded whilst running in this way would load configuration and other files at startup or on HUP specifically to avoid these issues.<br> <p> Stopping the daemon, upgrading, then restarting it introduces a small period of downtime, which can be undesirable.<br> </div> Sun, 05 Oct 2014 11:10:19 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614973/ https://lwn.net/Articles/614973/ viro <div class="FormattedComment"> Nope - in that case there are 3 or 4 dentries involved (depending on whether /tmp is a mountpoint). There's root directory of root filesystem. Name's empty, parent - itself. There's /tmp on root filesystem. Name being "tmp", parent - root of root filesystem. 
There's (possibly) root directory of whatever is mounted on /tmp. Again, name is empty, parent - itself. And there's /tmp/foo - name is "foo", parent - /tmp on root fs or root of filesystem mounted on /tmp. And it's unhashed since that rename.<br> <p> We start with that dentry and vfsmount of either root fs or that mounted on /tmp and walk towards root. When we reach the root of current vfsmount we just proceed to dentry and vfsmount of mountpoint and continue from there. When we reach the process' root (sensu chroot(2)) or global root (vfsmount that isn't mounted on anything) we stop.<br> <p> In this case the names we see along the way are "foo" and "tmp", and our dentry is unhashed, so d_path() produces "/tmp/foo (deleted)". And that's what readlink() on these guys returns. See fs/dcache.c:d_path() and the stuff next to it.<br> <p> The name of dentry is just a single pathname component. That "/tmp/foo (deleted)" isn't stored anywhere - it's calculated on demand.<br> </div> Sun, 05 Oct 2014 05:05:19 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614971/ https://lwn.net/Articles/614971/ viro <div class="FormattedComment"> *snort*<br> <p> OK, I stand corrected - their script definitely breaks on that one... for 64bit host. Anyway, I think it's bloody obvious that what they are doing is an awful kludge.<br> <p> Note that keeping the daemon running through the update is a very dubious approach; sure, the binary itself will stick around until we exit, but the rest of files in the package will get replaced, so if that daemon ever rereads them (e.g. in response to incoming request of some sort, or being asked to stop), the old binary will find new config/data/helper binaries/whatnot, with potentially spectacular results. Much safer to stop the sucker before replacing any files and restart it afterwards. And anything that does something more fancy (e.g. 
re-exec itself and transfer the internal state across that in one way or another) has no business using start-stop-daemon anyway.<br> <p> IOW, the entire "replace files, then stop the old processes" is a bad idea. And prior to replacements there's no need whatsoever for that kind of kludge. Mind you, I'd rather do stat() on the binary we are after, then look for /proc/*/exe with stat() giving the matching st_dev/st_ino. Without bothering with readlink() on those guys. Has an extra benefit of doing the right thing when you have multiple links to the same binary, with different processes using different names...<br> <p> What they were doing is awful for a lot of reasons; sadly, that's not the criterion used in such situations ;-/ It worked for a long time, there's real-world userland code relying on it, so we get to keep it working. It's not quite the same as bug-for-bug compatibility - if nothing breaks when we fix inconsistent behaviour, we can go for it even if it was possible to write something that would break. Ditto if the code being broken is a rootkit or rootkit equivalent (i.e. relies on exploiting a security hole, by accident or not). But "the code we broke would've broken in a lot of other cases anyway" isn't an acceptable excuse. <br> </div> Sun, 05 Oct 2014 04:47:15 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614970/ https://lwn.net/Articles/614970/ magcius <div class="FormattedComment"> Ah, from the "/tmp/foo (deleted)" string, I thought it was the full file path stored.<br> </div> Sun, 05 Oct 2014 03:46:10 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614969/ https://lwn.net/Articles/614969/ Cyberax <div class="FormattedComment"> One of our clients used to have a "paracetomoxyfrysebendroneomycind" daemon. 
Doing some drug discovery calculations, appropriately.<br> </div> Sun, 05 Oct 2014 03:26:25 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614967/ https://lwn.net/Articles/614967/ viro <div class="FormattedComment"> Yes, it would (not that they were in any real danger of stepping into that one - seriously, 32-character filename for a daemon binary? And that would not include the pathname - just the last component that long).<br> <p> But yes, that's one of the reasons why their use of that trick had been bogus.<br> </div> Sun, 05 Oct 2014 03:14:47 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614965/ https://lwn.net/Articles/614965/ magcius <div class="FormattedComment"> Wait, if the behavior in old kernels changed based on how long the filename was, doesn't that mean ALT Linux would have just broken if I called my daemon something else or put it somewhere in /opt/com.foo.MyVendor/v6/ instead?<br> <p> This interface even changes depending on the architecture and kernel build options.<br> </div> Sun, 05 Oct 2014 02:20:31 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614914/ https://lwn.net/Articles/614914/ alonz I wonder&#160;&ndash;&#160;is the extra effort in Al's new approach (= reference-counting the names) justified for any reason except solving this corner-case? (It does look like a pretty big hammer, which will add more complexity&hellip;) Sat, 04 Oct 2014 01:16:32 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614766/ https://lwn.net/Articles/614766/ viro <div class="FormattedComment"> Because that was years before /proc/*/fdinfo/*. Again, those symlinks had never been followed by traversing *any* string as a pathname. 
And they still do not.<br> <p> A bit more history: the original inspiration for that thing had been, as far as I can tell, either late Research Unix or Plan 9 and there it had been something different. For one thing, no symlinks in there - as in "no such thing as a symlink". For Plan 9 counterpart see<br> <a href="http://fxr.watson.org/fxr/source/port/devdup.c?v=PLAN9">http://fxr.watson.org/fxr/source/port/devdup.c?v=PLAN9</a><br> It's a single directory with a pair of entries for each descriptor in accessing process' descriptor table - as in 0, 0ctl, 1, 1ctl, etc.<br> Opening 42 in there is equivalent to dup(42); opening 42ctl (can be done only for read) gives you a bunch of stats on file - basically, something similar to combination of our self/fdinfo/42 and readlink of self/fd/42.<br> <p> Note that it only covers what would, for us, be /dev/fd. I.e. only your own descriptor table. Moreover, unlike us they do, indeed, give the *same* opened struct file (Chan in their terms). As dup(2) would. We do not - we get a new open of the fs object behind the old open file. Among other things, it means different behaviour of lseek() - with dup() the current position is shared between old and new descriptors, with new open it's independent. They go for dup-like, we - for open-like.<br> <p> In part it's a historical accident, in part - the result of different API for -&gt;open(), in part - decision to make them look like symlinks and use readlink() to indicate which file it is.<br> <p> Another thing to keep in mind is that in Plan 9 it was *NOT* a part of procfs. Their procfs has &lt;pid&gt;/fd, but unlike ours it's a regular file, not a directory. With line per descriptor. So the needs of lsof-like stuff are covered by that. In our variant it was first a per-process directory, full of magic symlinks and then (many years later) - two directories; fd/* and fdinfo/*. 
The latter - with regular files in it, more or less similar to contents of dupfs &lt;n&gt;ctl or lines in procfs &lt;pid&gt;/fd.<br> <p> Note that back when it started we *couldn't* report the path. It simply hadn't been available. It's older than dcache by several years (0.99 vs. 2.1.40ish, resp.). So we had pseudo-symlinks resolving to (back then) inodes behind the open descriptors, with readlink(2) providing information for lsof and friends. Basically, st_dev + st_ino of the same inodes.<br> Following those "symlinks" had never depended on the resulting string, or, indeed, the numbers themselves.<br> <p> In 2.1.43 or so we got dentries (and after a while they even started to work), and with that we got a way to calculate those pathnames. Cheaply. As the result, those pseudo-symlinks (back then used only for lsof, fuser and their ilk) suddenly acquired a much prettier readlink(2) output. Which required a few utilities updated between 2.0 and 2.2. And it was a very specialized API, with very few users. Moreover, said users were already full of ifdefs - e.g. on *BSD those *had* to be suid-root; nosing in the kernel data structures requires that. And yes, it was that scary.<br> <p> That had all the makings of bad API - few users and lack of anything in that area that would be even remotely portable on other Unices. Indeed, comparison with peeking into kernel data structures from userland made for a _very_ low plank.<br> <p> Shortly after that somebody noticed that there had been no way to tell an unlinked file from something created in its place later. Result: appending " (deleted)" in the end of readlink(2) output. Kludgy, of course, but with so few (and specialized) users... 
Again, the string reported by readlink(2) has nothing whatsoever to do with resolving the sucker.<br> Moreover, any user *must* look at the string it gets; think what should be returned for pipes and sockets.<br> <p> Note, BTW, that back then *all* dentry names had been external ones and in all cases names were swapped on d_move(). In 2.1.116 it had been changed - short names (majority of them, obviously) got embedded into dentry itself, giving better locality and lower cache footprint in dcache lookups. At that point short names started to be copied.<br> <p> Nobody really considered "which unlinked file had this been" as even remotely sane question; indeed, how the hell can one tell how many files with that name had been created, opened and unlinked? OK, you know that it used to be /tmp/foo and got unlinked since then; which one of them?<br> <p> Alas, there's no API that won't be abused. Never mind that it wasn't consistent, never mind that the results weren't actually useful for detecting anything reliably, it worked well enough for hell knows how many scripts.<br> Until they broke...<br> <p> In theory, we can't even make it consistent, lest some whiny wonder comes with a script that relied on "swap if either name is long" semantics - the filenames that got renamed being known to be long enough. And relying on mv /tmp/&lt;name1&gt; /tmp/&lt;name2&gt; leaving you with /tmp/&lt;name1&gt; (deleted) coming from readlink(2). I'll believe it when I see it; IMO it's extremely unlikely, though...<br> </div> Fri, 03 Oct 2014 07:03:15 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614743/ https://lwn.net/Articles/614743/ viro <div class="FormattedComment"> Following those suckers gives you vfsmount/dentry of deleted object. They are *not* followed by interpreting the result of -&gt;readlink() (that's why -&gt;follow_link() is an independent method). 
What you get opening those is *not* dup()-style extra reference to opened file; it's independently opened file over the same filesystem object (IOW, lseek() on one of those doesn't affect another). Filesystem might refuse opening it, of course; local filesystems usually do not, with network ones it's up to server. For sockets it explicitly fails; for pipes and FIFOs you get the same semantics you'd get from extra opener of named pipe.<br> <p> Note that it's more than just /proc/*/fd/* - e.g. /proc/*/exe and /proc/*/cwd are also like that.<br> <p> " (deleted)" is a kludge, and a damn unpleasant one. But as this story shows, it's not something we can kill ;-/ It's before my involvement, but I suspect that original motivation was lsof/fuser. And behaviour for rename(2) victim was an accident - check the history circa 2.1.40s; back when this " (deleted)" thing had been introduced *all* names had been external and d_move() swapped them in all cases. It was more about the unlinked ones... memcpy()-instead-of-swap had been introduced in 2.1.116 and it was more of "we need to put something there; just memcpy() it over".<br> <p> No, it really had been an accident. BTW, prior to 2.1.43 those magic symlinks had the same semantics on follow_link, but gave "[%04x]:%u"<br> with st_dev and st_ino on readlink(). Next came an attempt to put something that usually gave pathname corresponding to the fs object in question, shortly followed by " (deleted)" tacked on the end of unlinked ones.<br> </div> Fri, 03 Oct 2014 02:19:06 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614738/ https://lwn.net/Articles/614738/ giraffedata <p> How would it have made a difference if the interface were written down? Surely they would have written down that this unusual symbolic link value should give a name the file had when it had one, so this change would still be a bug and have to be fixed. 
<p> I guess I'm missing the article's thesis, that this is a case of "unintended behaviors can become part of the kernel's ABI over time." It looks to me like the designers deliberately put the former name of the file in the symbolic link value; they didn't mean to put arbitrary garbage in there, that just happened by accident of implementation to be the former name of the file. Fri, 03 Oct 2014 01:28:37 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614613/ https://lwn.net/Articles/614613/ tomgj <div class="FormattedComment"> Another sad case of "hack something together, and whatever ends up being exposed gets called the 'interface'". This as opposed to the "actually write down what the interface is" school.<br> <p> Tellingly, the article does not refer to any interface specifications for the interface elements that are relevant in this "user-visible behaviour". This is not a criticism of the article, but rather of the development process it describes: one where the philosophy of specified interfaces is not understood, and where no proper distinction is made between an actual interface, and a piece of implementation that happens to be visible.<br> <p> The test should be whether the interface consumer was depending on incidental behaviour, rather than specified behaviour, of the interface. However, since there is not in Linux a culture of adequately specifying interfaces in the first place, we are not in a position where this test could be applied.<br> <p> It is interesting how bad Linux is at this. The old, proprietary Unixes came with manuals properly describing the interface specifications of each system, even if they were not always compatible with one another. Windows does better than Linux on this front, as do the BSDs.<br> <p> It's a good job the original API design approach of Linux was to implement a more or less coherent API that had already been specified (POSIX). 
It's unfortunate this hasn't carried through into a culture of properly writing down specifications for new interface elements as (or rather, before) they're introduced.<br> </div> Thu, 02 Oct 2014 15:47:03 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614545/ https://lwn.net/Articles/614545/ quotemstr <div class="FormattedComment"> I had the same thought. The addition of "deleted" to the filename is very ugly, as in-band signaling tends to be. Why not note the deleted status in fdinfo instead?<br> </div> Thu, 02 Oct 2014 07:41:54 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614521/ https://lwn.net/Articles/614521/ flewellyn <div class="FormattedComment"> I agree. To me, it makes sense that the /proc/self/fd/* symlinks, as well as /proc/$NUM/fd/*, would keep symlinks to the old name of a deleted file, if the descriptor is still open. Standard Unix semantics are that an unlinked file is not actually gone until the last FD pointing to it is closed, so it makes sense that any processes which still have it open should still have an idea of what it was called before deletion.<br> <p> </div> Thu, 02 Oct 2014 03:24:47 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614471/ https://lwn.net/Articles/614471/ zblaxell <div class="FormattedComment"> Does this mean there are now (or always were) subtle security bugs on ALT Linux that can be triggered by a few rename calls? What happens on those systems if I run a binary named "sshd (deleted)"? If you open /proc/$foo/fd/$bar, does it give you a copy of the existing open file descriptor, or is there a race condition attack because it's literally following the symlink?<br> <p> All of this is head-shakingly ugly to me. 
If anything, this incident looks like a golden opportunity to flush out people building on top of accidental ABI and get them talking to kernel devs about what interfaces they really need to do what they're trying to do (judging from a quick read of the bug reports, cgroups would make more sense for these use cases).<br> </div> Wed, 01 Oct 2014 21:34:45 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614402/ https://lwn.net/Articles/614402/ doogie <div class="FormattedComment"> Did you read the messages on that forum? It's the same bug.<br> </div> Wed, 01 Oct 2014 17:02:30 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614393/ https://lwn.net/Articles/614393/ BenHutchings <div class="FormattedComment"> If it was accidental that the old name of a deleted inode was readable, then why did procfs specifically support readlink() on them? That case was very clearly intended to work at one point, even if it ended up broken for longer names.<br> <p> </div> Wed, 01 Oct 2014 16:53:06 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614385/ https://lwn.net/Articles/614385/ lkundrak <div class="FormattedComment"> How is that relevant?<br> </div> Wed, 01 Oct 2014 16:17:22 +0000 How implementation details become ABI: a case study https://lwn.net/Articles/614352/ https://lwn.net/Articles/614352/ PaXTeam <div class="FormattedComment"> Piotr originally reported the problem on the grsec forums:<br> <a href="https://forums.grsecurity.net/viewtopic.php?f=3&amp;t=4031">https://forums.grsecurity.net/viewtopic.php?f=3&amp;t=4031</a> and once identified, spender fixed it 3 weeks ago already.<br> </div> Wed, 01 Oct 2014 15:32:57 +0000
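Editorial sketch, not part of any comment above: viro's suggestion earlier in the thread (stat() the binary, then look for /proc/*/exe entries whose stat() gives the same st_dev/st_ino, without calling readlink()) could look roughly like the following in Python. The function name `pids_running` and the `proc_root` parameter are hypothetical names introduced here for illustration; the approach assumes a Linux-style procfs and only works if the binary is examined before it is replaced on disk, as the comment itself notes.

```python
import os


def pids_running(binary_path, proc_root="/proc"):
    """Return the PIDs whose exe link refers to the same file as binary_path.

    Matching is done on (st_dev, st_ino), so the " (deleted)" suffix that
    readlink() may report is never parsed, and processes started through
    another hard link to the same binary are found as well.
    """
    target = os.stat(binary_path)
    key = (target.st_dev, target.st_ino)

    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            # Skip non-process entries such as "self" or "meminfo".
            continue
        try:
            # os.stat() follows the exe link to the underlying file.
            st = os.stat(os.path.join(proc_root, entry, "exe"))
        except OSError:
            # The process exited, has no exe link, or we lack permission.
            continue
        if (st.st_dev, st.st_ino) == key:
            matches.append(int(entry))
    return sorted(matches)
```

A caller would typically invoke `pids_running("/usr/sbin/somedaemon")` (a placeholder path) before installing the new version of the file. Reading other users' exe links generally needs elevated privileges; entries that cannot be examined are skipped silently in this sketch.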