
How implementation details become ABI: a case study


Posted Oct 2, 2014 7:41 UTC (Thu) by quotemstr (subscriber, #45331)
In reply to: How implementation details become ABI: a case study by zblaxell
Parent article: How implementation details become ABI: a case study

I had the same thought. The addition of "deleted" to the filename is very ugly, as in-band signaling tends to be. Why not note the deleted status in fdinfo instead?



How implementation details become ABI: a case study

Posted Oct 3, 2014 7:03 UTC (Fri) by viro (subscriber, #7872) [Link] (2 responses)

Because that was years before /proc/*/fdinfo/*. Again, those symlinks have never been followed by traversing *any* string as a pathname. And they still are not.

A bit more history: the original inspiration for that thing had been, as far as I can tell, either late Research Unix or Plan 9, and there it had been something different. For one thing, no symlinks in there - as in "no such thing as a symlink". For the Plan 9 counterpart see
http://fxr.watson.org/fxr/source/port/devdup.c?v=PLAN9
It's a single directory with a pair of entries for each descriptor in the accessing process' descriptor table - as in 0, 0ctl, 1, 1ctl, etc.
Opening 42 in there is equivalent to dup(42); opening 42ctl (can be done only for read) gives you a bunch of stats on the file - basically, something similar to a combination of our self/fdinfo/42 and readlink of self/fd/42.

Note that it only covers what would, for us, be /dev/fd. I.e. only your own descriptor table. Moreover, unlike us they do, indeed, give the *same* opened struct file (Chan in their terms). As dup(2) would. We do not - we get a new open of the fs object behind the old open file. Among other things, it means different behaviour of lseek() - with dup() the current position is shared between the old and new descriptors, with a new open it's independent. They go for dup-like, we - for open-like.
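
A minimal sketch of the dup-like versus open-like difference, assuming Linux with /proc mounted. The function name `offsets_after_seek` is made up for illustration; it moves the position on the original descriptor and then asks where a dup() of it and a reopen through /proc/self/fd ended up.

```python
import os
import tempfile


def offsets_after_seek():
    """Seek the original descriptor; report the positions of a dup() and a /proc reopen."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        dup_fd = os.dup(fd)  # same open file: shares the file position
        # Assumption: Linux procfs. This is a new open of the same fs object.
        reopened_fd = os.open(f"/proc/self/fd/{fd}", os.O_RDONLY)
        try:
            os.lseek(fd, 5, os.SEEK_SET)
            dup_pos = os.lseek(dup_fd, 0, os.SEEK_CUR)
            reopened_pos = os.lseek(reopened_fd, 0, os.SEEK_CUR)
            return dup_pos, reopened_pos
        finally:
            for d in (fd, dup_fd, reopened_fd):
                os.close(d)


if __name__ == "__main__":
    print(offsets_after_seek())
```

On Linux this should report (5, 0): the dup() follows the seek, the reopened descriptor keeps its own position.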

In part it's a historical accident, in part - the result of a different API for ->open(), in part - the decision to make them look like symlinks and use readlink() to indicate which file it is.

Another thing to keep in mind is that in Plan 9 it was *NOT* a part of procfs. Their procfs has <pid>/fd, but unlike ours it's a regular file, not a directory, with a line per descriptor. So the needs of lsof-like stuff are covered by that. In our variant it was first a per-process directory, full of magic symlinks, and then (many years later) - two directories: fd/* and fdinfo/*. The latter - with regular files in it, more or less similar to the contents of dupfs <n>ctl or the lines in procfs <pid>/fd.
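
For illustration, a small sketch of reading one of those fdinfo files, assuming Linux with /proc mounted. The helper names `read_fdinfo` and `fdinfo_after_seek` are hypothetical; the file is plain "key: value" text, and the exact set of fields depends on the kernel version and the descriptor type, so the code only relies on `pos`.

```python
import os
import tempfile


def read_fdinfo(fd):
    """Parse /proc/self/fdinfo/<fd> into a dict of field name -> string value."""
    fields = {}
    with open(f"/proc/self/fdinfo/{fd}") as info:
        for line in info:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


def fdinfo_after_seek(offset=5):
    """Seek a scratch file to `offset` and return its parsed fdinfo."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        os.lseek(tmp.fileno(), offset, os.SEEK_SET)
        return read_fdinfo(tmp.fileno())


if __name__ == "__main__":
    for name, value in fdinfo_after_seek().items():
        print(name, value)
```

The `pos` field reflects the current file position of the open file; `flags` is reported in octal.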

Note that back when it started we *couldn't* report the path. It simply hadn't been available. The whole thing is older than the dcache by several years (0.99 vs. 2.1.40ish, resp.). So we had pseudo-symlinks resolving to (back then) inodes behind the open descriptors, with readlink(2) providing information for lsof and friends. Basically, st_dev + st_ino of the same inodes.
Following those "symlinks" had never depended on the resulting string, or, indeed, on the numbers themselves.
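
A short sketch of that point on a current kernel, assuming Linux with /proc mounted (the function name `reopen_unlinked` is invented for the example). The file is unlinked while still open; the string from readlink() no longer names anything, yet opening the link still reaches the file.

```python
import os
import tempfile


def reopen_unlinked():
    """Open a file, unlink it, then reopen it through /proc/self/fd/<fd>."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)
        link = f"/proc/self/fd/{fd}"
        reported = os.readlink(link)                 # what lsof-like tools display
        string_resolves = os.path.exists(reported)   # treat the string as a pathname
        with open(link, "rb") as again:              # follow the link itself
            data = again.read()
        return reported, string_resolves, data
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(reopen_unlinked())
```

The reported string does not resolve as a pathname, while the open through the link succeeds and returns the data written before the unlink.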

In 2.1.43 or so we got dentries (and after a while they even started to work), and with that we got a way to calculate those pathnames. Cheaply. As a result, those pseudo-symlinks (back then used only for lsof, fuser and their ilk) suddenly acquired a much prettier readlink(2) output. Which required a few utilities to be updated between 2.0 and 2.2. And it was a very specialized API, with very few users. Moreover, said users were already full of ifdefs - e.g. on *BSD those *had* to be suid-root; nosing in the kernel data structures requires that. And yes, it was that scary.

That had all the makings of a bad API - few users and a lack of anything in that area that would be even remotely portable to other Unices. Indeed, comparison with peeking into kernel data structures from userland made for a _very_ low bar.

Shortly after that somebody noticed that there had been no way to tell an unlinked file from something created in its place later. Result: appending " (deleted)" at the end of readlink(2) output. Kludgy, of course, but with so few (and specialized) users... Again, the string reported by readlink(2) has nothing whatsoever to do with resolving the sucker.
Moreover, any user *must* look at the string it gets; think what should be returned for pipes and sockets.
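
To make that concrete, a hedged sketch of what a consumer of readlink(2) output has to do, assuming Linux with /proc mounted. The names `describe_fd` and `demo` and the three labels are illustrative only. It also shows why the suffix is ambiguous: an unlinked "foo" and a live file literally named "foo (deleted)" produce the same string.

```python
import os
import tempfile

DELETED_SUFFIX = " (deleted)"


def describe_fd(fd):
    """Return (kind, text) for a descriptor, based on readlink of /proc/self/fd/<fd>."""
    target = os.readlink(f"/proc/self/fd/{fd}")
    if not target.startswith("/"):
        # Pipes, sockets and the like report e.g. "pipe:[12345]", not a path.
        return "non-path", target
    if target.endswith(DELETED_SUFFIX):
        # Ambiguous: an unlinked file, or a live file whose name ends this way.
        return "maybe-deleted", target
    return "path", target


def demo():
    """Describe a pipe, an unlinked file, and a live file named 'foo (deleted)'."""
    results = {}
    r, w = os.pipe()
    with tempfile.TemporaryDirectory() as d:
        gone = os.path.join(d, "foo")
        odd = os.path.join(d, "foo (deleted)")
        fd_gone = os.open(gone, os.O_CREAT | os.O_RDWR, 0o600)
        os.unlink(gone)
        fd_odd = os.open(odd, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            results["pipe"] = describe_fd(r)
            results["unlinked"] = describe_fd(fd_gone)
            results["odd-name"] = describe_fd(fd_odd)
        finally:
            for fd in (r, w, fd_gone, fd_odd):
                os.close(fd)
    return results


if __name__ == "__main__":
    for name, result in demo().items():
        print(name, result)
```

The string alone cannot separate the last two cases; a caller that needs certainty has to consult something other than the name.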

Note, BTW, that back then *all* dentry names had been external ones and in all cases names were swapped on d_move(). In 2.1.116 it had been changed - short names (majority of them, obviously) got embedded into dentry itself, giving better locality and lower cache footprint in dcache lookups. At that point short names started to be copied.

Nobody really considered "which unlinked file had this been" as even remotely sane question; indeed, how the hell can one tell how many files with that name had been created, opened and unlinked? OK, you know that it used to be /tmp/foo and got unlinked since then; which one of them?

Alas, there's no API that won't be abused. Never mind that it wasn't consistent, never mind that the results weren't actually useful for detecting anything reliably, it worked well enough for hell knows how many scripts.
Until they broke...

In theory, we can't even make it consistent, lest some whiny wonder comes with a script that relied on "swap if either name is long" semantics - the filenames that got renamed being known to be long enough. And relying on mv /tmp/<name1> /tmp/<name2> leaving you with /tmp/<name1> (deleted) coming from readlink(2). I'll believe it when I see it; IMO it's extremely unlikely, though...

How implementation details become ABI: a case study

Posted Oct 9, 2014 22:39 UTC (Thu) by weue (guest, #96562) [Link] (1 responses)

Among others, "//deleted/PATH" (note the double slash) or "deleted/PATH" (relative path) would have been unambiguous.

Whoever accepted "X (deleted)" is an idiot.

How implementation details become ABI: a case study

Posted Oct 9, 2014 22:53 UTC (Thu) by viro (subscriber, #7872) [Link]

Whoever it had been, the whole point is that this ship has long since sailed. Perhaps the main lesson is that single-use APIs tend to suck. "It's just for fuser and lsof" should've been a major red flag. For _those_ this (deleted) thing was probably more convenient - no need to do anything with the string on the userland side. And it wouldn't be a problem, except that there are no miracles and it *had* gained other users. Worse, now we can't change it without breaking existing userland, as this story has demonstrated.

