
A brief history of union mounts


Posted Jul 22, 2010 12:03 UTC (Thu) by obi (guest, #5784)
Parent article: A brief history of union mounts

One of the great tragedies of the UNIX file system interface is the enshrinement of readdir(), telldir(), seekdir(), etc. family in the POSIX standard...

I keep hearing how POSIX is broken in myriad ways; see for instance the recent discussion about locking on Lennart Poettering's blog.

Maybe we should start keeping a record of all these "Great Tragedies of POSIX". So maybe one day we can do something about it. Or are we really stuck with POSIX for all eternity?


A brief history of union mounts

Posted Jul 22, 2010 13:43 UTC (Thu) by neilbrown (subscriber, #359)

Yes, we need "Why POSIX made the Editor Grumpy". It could be a nice long series - with lots of guest editors probably.

But telldir/seekdir isn't something that POSIX got wrong. Maybe the cookie size (32-bit) is a bit small, but that is as easy to overcome as the Y2038 bug.

Either you require readdir to return all the entries in a directory in one hit, or you need a stable pointer into the directory so you can ask for the 'next' chunk.
The stable pointer doesn't *have* to be exposed to user-space, but if you are going to have any hope of supporting a network filesystem like NFS, then it has to be exposed to the network protocol, so it has to exist.
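From user space, that stable pointer shows up as the opaque cookie returned by telldir() and consumed by seekdir(). The sketch below assumes a POSIX system with the XSI telldir()/seekdir() calls: it reads one entry, saves the cookie, finishes the scan, then seeks back and checks that the resumed scan sees the same names. resume_matches() and collect_rest() are hypothetical helpers for illustration, not part of any library.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <string.h>

/* Append every remaining entry name to buf, each followed by '/',
 * a character that cannot appear inside a file name. */
static void collect_rest(DIR *d, char *buf, size_t len)
{
    struct dirent *e;
    buf[0] = '\0';
    while ((e = readdir(d)) != NULL) {
        strncat(buf, e->d_name, len - strlen(buf) - 1);
        strncat(buf, "/", len - strlen(buf) - 1);
    }
}

/* Read one entry of 'path', save a telldir() cookie, finish the scan,
 * then seekdir() back to the cookie and rescan. Returns 1 if both
 * scans saw the same names, 0 if not, -1 on error. */
int resume_matches(const char *path)
{
    static char first[4096], second[4096];
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    if (readdir(d) == NULL) {      /* every directory has at least "." */
        closedir(d);
        return -1;
    }
    long cookie = telldir(d);      /* opaque position after one entry */
    collect_rest(d, first, sizeof first);
    seekdir(d, cookie);            /* jump back to the saved position */
    collect_rest(d, second, sizeof second);
    closedir(d);
    return strcmp(first, second) == 0;
}
```

A union filesystem has to be able to hand out such a cookie for the merged directory, which is exactly the hard part when the cookies come from two unrelated lower directories.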

It isn't hard to design a directory layout which allows stable indexes - it just requires a bit of fore-thought.

It *is* hard to synthesize such pointers from a union of two directories, as you cannot predict or control the pointers you get from each. However, it is possible to create a stable solution.

Given that the current union-mount proposal requires "white-out" objects to be created in the on-top filesystem to make objects from the below filesystem disappear, it would not be unreasonable to instead require 'white-in' objects which make objects from the below filesystem appear.

This would require a 'copy-up' of the directory when it is read (though more typically, when the directory is changed), which is a bit harsher than the 'copy-up' that is required of files on e.g. a chmod. But it would give reliable semantics and in many real cases would not be a real burden.

To be a little more explicit: the common case with union mount is (I expect) that you union-mount an empty filesystem on top of a read-only filesystem, and then make changes. Each time you make changes in a new directory you would need to copy up that directory and all parents that have not yet been copied up. The copy-up involves creating a white-in object in the top directory for each object in the bottom. (It is a little more complicated than white-out, as you want to store the 'DT_*' type of the underlying object.) Then further changes simply happen to the top-level directory.
A readdir simply uses the top-level directory.
Any lookup which hits a white-in object (or name) continues the lookup in the underlying filesystem.
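The copy-up / white-in scheme described above can be sketched with a toy in-memory model. This is not kernel code and not the fallthru implementation from the union-mount patches; struct dir, copy_up_dir(), union_lookup(), union_readdir() and union_unlink() are hypothetical names chosen for illustration. The sketch assumes the directory has already been copied up, so readdir only walks the top layer and a lookup only descends when it hits a white-in entry.

```c
#include <stdio.h>
#include <string.h>

#define MAXENT 64

/* Toy model: in the upper, writable layer an entry is either a real
 * object or a white-in ("fallthru") meaning "this name lives below". */
enum kind { REAL, FALLTHRU };

struct entry {
    char name[32];
    enum kind kind;
    unsigned char d_type;   /* cached DT_*-style type of the object */
};

struct dir {
    struct entry e[MAXENT];
    int n;
};

/* Index of 'name' in d, or -1 if absent. */
static int find(const struct dir *d, const char *name)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->e[i].name, name) == 0)
            return i;
    return -1;
}

/* Append an entry; returns -1 if the toy table is full. */
int dir_add(struct dir *d, const char *name, enum kind k, unsigned char t)
{
    if (d->n >= MAXENT)
        return -1;
    struct entry *e = &d->e[d->n++];
    snprintf(e->name, sizeof e->name, "%s", name);
    e->kind = k;
    e->d_type = t;
    return 0;
}

/* Copy-up: every lower entry not already named in the upper directory
 * gets a white-in that remembers the lower object's type. */
void copy_up_dir(struct dir *upper, const struct dir *lower)
{
    for (int i = 0; i < lower->n; i++)
        if (find(upper, lower->e[i].name) < 0)
            dir_add(upper, lower->e[i].name, FALLTHRU, lower->e[i].d_type);
}

/* readdir after copy-up only walks the upper directory. */
int union_readdir(const struct dir *upper, const char *names[], int max)
{
    int n = 0;
    for (int i = 0; i < upper->n && n < max; i++)
        names[n++] = upper->e[i].name;
    return n;
}

/* Lookup: hitting a white-in continues in the lower layer. */
const struct entry *union_lookup(const struct dir *upper,
                                 const struct dir *lower, const char *name)
{
    int i = find(upper, name);
    if (i < 0)
        return NULL;            /* not in the copied-up directory */
    if (upper->e[i].kind == FALLTHRU) {
        int j = find(lower, name);
        return j < 0 ? NULL : &lower->e[j];
    }
    return &upper->e[i];
}

/* Deleting a name just drops its upper entry; in this toy no
 * white-out is needed because readdir never consults the lower layer. */
int union_unlink(struct dir *upper, const char *name)
{
    int i = find(upper, name);
    if (i < 0)
        return -1;
    upper->e[i] = upper->e[--upper->n];
    return 0;
}
```

Because the merged directory is a single real directory after copy-up, its readdir cookies are just the upper filesystem's cookies, which is the point of the proposal.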

(unfortunately the margin is too small to contain my elegant implementation....)

A brief history of union mounts

Posted Jul 22, 2010 21:20 UTC (Thu) by nix (subscriber, #2304)

It could be a nice long series - with lots of guest editors probably.
Hell yes. And the suck extends to fairly simple areas. Just saying 'EINTR' and 'short reads' is enough to make anyone who's ever written even a trivial C program on a Unix platform wince. (What do you mean I need a horrible-looking for loop to read a file reliably?!)
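For reference, the "horrible-looking loop" usually amounts to something like this minimal sketch. read_full() is a hypothetical name, not a libc function; it retries when read() fails with EINTR and keeps going after short reads until the buffer is full or end of file is reached.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to 'len' bytes from fd, retrying on EINTR and looping over
 * short reads. Returns the byte count (less than len only at end of
 * file), or -1 with errno set on any other error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t r = read(fd, p + done, len - done);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (r == 0)
            break;              /* end of file */
        done += (size_t)r;      /* short read: go round again */
    }
    return (ssize_t)done;
}
```

This sketch discards the partial count if a later read() fails with a real error; callers that care about that would need to return `done` instead.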

A brief history of union mounts

Posted Jul 23, 2010 2:51 UTC (Fri) by neilbrown (subscriber, #359)

?? You don't get EINTR on read from a regular file - only pipes, sockets, char devices and similar things.

But in general I agree - signals make it very hard to write correct programs.

A brief history of union mounts

Posted Jul 24, 2010 18:16 UTC (Sat) by nix (subscriber, #2304)

You do get EINTR on read from a regular file if you're unlucky enough to have that file on a network filesystem (e.g. NFS with intr turned on). And before you say 'don't do that then': until very recently we had a choice between turning intr on or losing the whole mount point, and very shortly afterwards often the whole machine, if the network went down. (And, yes, I have encountered both short reads and EINTR in NFS-based regular file reads on both Linux and Solaris. So it does happen.)

(Also, POSIX doesn't ban getting EINTR on reads from a regular file, so prudence dictates expecting it.)

A brief history of union mounts

Posted Jul 25, 2010 4:20 UTC (Sun) by neilbrown (subscriber, #359)

Fair point, though that is really an NFS issue rather than a general Posix issue. And NFS has a lot more than just that to answer for.

Posix has a concept of 'slow' and 'not slow' reads, where 'slow' reads can result in a short read or EINTR, and disk IO is explicitly not a slow read. So if your file is on disk you cannot get EINTR.
I guess being on disk on another machine doesn't count. :-(

A brief history of union mounts

Posted Jul 31, 2010 20:27 UTC (Sat) by nix (subscriber, #2304)

I've heard this over and over again, but I've looked through the POSIX specs and I can't find it. No mention of slow reads, no mention that some devices are guaranteed not to get EINTR, no mention in the rationale either.

Now perhaps this is a de facto universal implementation detail, but as far as I can see it isn't in POSIX itself. (Maybe I just haven't looked in the right place?)

A brief history of union mounts

Posted Aug 1, 2010 10:01 UTC (Sun) by neilbrown (subscriber, #359)

It seems you are right.

The POSIX specification of read() appears to allow any read to be interrupted, and says in the "informative" rationale section: "The issue of which files or file types are interruptible is considered an implementation design issue. This is often affected primarily by hardware and reliability issues." That is singularly unhelpful.

I was basing my statements on "man 7 signal" which does talk about "slow" devices. Clearly this isn't normative....

As you say, POSIX by itself is enough to make one wince...

A brief history of union mounts

Posted Aug 4, 2010 22:45 UTC (Wed) by nix (subscriber, #2304)

Quite so :/

Even 'man 7 signal' says clearly that 'The details vary across Unix systems; below, the details for Linux', and that's not terribly useful really for the vast majority of software. (I suppose you can rely on it in mdadm ;} )

A brief history of union mounts

Posted Jul 23, 2010 20:01 UTC (Fri) by vaurora (guest, #38407)

Excellent idea - you've just described fallthru dentries. :)

The implementation of fallthrus is pretty small, around a hundred lines in main VFS and then you reuse the whiteout infrastructure in the client file systems.

A brief history of union mounts

Posted Jul 24, 2010 2:10 UTC (Sat) by neilbrown (subscriber, #359)

> Excellent idea - you've just described fallthru dentries. :)

Yes.... after writing that I went back through the original article, noticed 'fallthru' this time, and felt a bit sheepish.

I don't quite see, though, either how you would implement fallthru using whiteout, or why you would still want whiteout if you were using copy-up + fallthru.

It is a pity that a block-based COW solution is so inefficient; it is such a simple solution that would address many of the use-cases (not the NFS-as-underlying-filesystem case, of course).

A brief history of union mounts

Posted Jul 25, 2010 4:18 UTC (Sun) by quotemstr (subscriber, #45331)

Sure, POSIX has flaws. But at least it's not Win32. When you're frustrated by POSIX, take a deep breath and say that three times.

In the free world, there's no arbitrary 64-handle wait limit. You can rename and delete filenames while the files they point to are still open. Shared libraries are versioned. You have a variety of fork and spawn options instead of the horror that is CreateProcess. You have a real pty interface --- it's impossible to emulate a win32 console, or even make isatty do the right thing.
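The rename-and-delete-while-open point is easy to demonstrate. In this minimal sketch (survives_rename_and_unlink() is a hypothetical helper), a temporary file is renamed and then unlinked while a descriptor is still open; the name disappears, but the open file remains readable until the last descriptor is closed.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a temp file, rename it and then unlink it while fd is open,
 * and read the data back through fd. Returns 1 on success, 0 if any
 * step fails or the data does not match. */
int survives_rename_and_unlink(void)
{
    char path[] = "/tmp/openXXXXXX";
    char moved[64];
    const char msg[] = "still here";
    char buf[sizeof msg];
    int ok = 0;

    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    snprintf(moved, sizeof moved, "%s.moved", path);

    if (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        rename(path, moved) == 0 &&         /* rename while open */
        unlink(moved) == 0 &&               /* drop the last name */
        access(moved, F_OK) != 0 &&         /* the name really is gone */
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, sizeof buf) == (ssize_t)sizeof buf)
        ok = memcmp(buf, msg, sizeof msg) == 0;  /* data still readable */

    close(fd);
    unlink(path);    /* cleanup if an early step failed; harmless otherwise */
    unlink(moved);
    return ok;
}
```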

You don't have eldritch horrors like WSAAcceptEx. The kernel doesn't steal your threads to do its own book-keeping. For the love of god, you don't have CreateRemoteThread. IO redirection actually works --- programs seldom open /dev/tty directly, unlike the Windows world, where the canonical way to get colored output is to write it directly to the console buffer.

In POSIXland, you don't have drive letters. You don't need to be root to create a symlink. And you don't use backslashes for directory separators. (At least most Windows software accepts forward slashes too.) /dev/null exists only in /dev; you can create a file called "null" anywhere else. Try creating a file called "prn" anywhere on a Windows system.

Best of all, in POSIXland, programs accept a true vector of command-line arguments. In Windows, each program receives one string that's actually the whole command line and uses its own quoting and wildcard rules to interpret it. Naturally, this approach yields wildly unpredictable results.
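To make the argument-vector point concrete, here is a minimal sketch using posix_spawnp(): an argument containing spaces reaches the child as a single element, with no shell and no per-program quoting rules in between. run_argv() is a hypothetical helper, and the usage below assumes a `test` utility is available on PATH.

```c
#define _XOPEN_SOURCE 700
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn argv[0] (searched on PATH) with exactly the given argument
 * vector and wait for it. Returns the child's exit status, or -1 if
 * it could not be spawned or did not exit normally. */
int run_argv(char *const argv[])
{
    pid_t pid;
    int status;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, `char *args[] = { "test", "a b c", "=", "a b c", NULL };` makes `test` compare two identical three-word strings and exit with status 0, because each string arrives intact as one argv element.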

So, of course POSIX can be improved. But it's hardly "broken in myriad ways"; you can only say that from a position of merciful privilege.

A brief history of union mounts

Posted Jul 25, 2010 7:11 UTC (Sun) by bronson (subscriber, #4806)

Great post! A little perspective can be nice at times.

A brief history of union mounts

Posted Jul 31, 2010 20:41 UTC (Sat) by nix (subscriber, #2304)

It's notable how many of those faults are *completely unchanged* from the days I was doing DOS, back in the DOS 3.3 days.

MS really has strangled itself in the name of backward compatibility with a broken original system. Sure, we try to be compatible with older Unixes, but at least Unix was a sane base to build on.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds