Shared inodes

Posted Jun 16, 2011 19:44 UTC (Thu) by martinfick (subscriber, #4455)
Parent article: Debating overlayfs

What we really need from a union-type filesystem is for identical inodes in the bottom layer to somehow show up as the same inode with respect to the memory management subsystem, no matter where they appear in upper layers. To be truly beneficial, this needs to happen even if they show up in different top-layer mount points. This would be a huge boon for sharing memory among processes in separate containers that run the same underlying executable. The containers could share a read-only bottom layer and yet have individual writable top layers in their individual namespaces, preventing them from clobbering each other's files while still sharing memory efficiently on common executables. Of course, I am not sure how that could actually be done... :(

Shared inodes

Posted Jun 16, 2011 21:46 UTC (Thu) by ndye (guest, #9947) [Link]

> Of course, I am not sure how that could actually be done... :(

Neither am I, and you paint the benefits well . . .

. . . but now your headache has gone viral.
;-)

Shared inodes

Posted Jun 17, 2011 6:38 UTC (Fri) by neilbrown (subscriber, #359) [Link] (3 responses)

You are in luck - overlayfs provides exactly what you want. Assuming I am understanding you correctly.

When you access (e.g. open) a file (not a directory) read-only, and that file doesn't exist in the upper layer, you get exactly the file from the lower layer. If you fstat the file descriptor it will look exactly like the lower-layer file - st_dev, st_ino and all. It really is the lower-layer file.
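
One way to check this from user space is to compare the (st_dev, st_ino) pair that fstat reports for two open file descriptors. This is only a minimal sketch in Python. The helper name same_inode and the paths in the note below are hypothetical, and whether a given overlay mount really reports the lower file's numbers depends on the overlayfs version in use.

```python
import os


def same_inode(path_a, path_b):
    """Return True if both paths open (read-only) to the same underlying inode."""
    fd_a = os.open(path_a, os.O_RDONLY)
    try:
        fd_b = os.open(path_b, os.O_RDONLY)
        try:
            # fstat works on the open descriptors, not on the path names
            st_a = os.fstat(fd_a)
            st_b = os.fstat(fd_b)
        finally:
            os.close(fd_b)
    finally:
        os.close(fd_a)
    # An inode is identified by its device number plus its inode number
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```

Under the behaviour described above, a call such as same_inode("/lower/bin/foo", "/merged/bin/foo") (both paths made up) would be expected to return True for as long as the file has not been copied up.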

So much so that if someone else opens the file for 'write', it will get copied into the upper layer and they will get a handle on the file in the upper layer, which they can then change, but you will still have a handle on the lower-layer file which, of course, will not see those changes.
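
The copy-on-write-open behaviour can be mimicked with a toy user-space model. This is only a sketch to illustrate the semantics, not how the kernel implements it. The function open_overlay and the two-directory layout are assumptions made up for the example, and it only handles plain file names directly inside each layer.

```python
import os
import shutil


def open_overlay(lower_dir, upper_dir, name, mode="r"):
    """Toy model: read from the lower layer, copy up on first open for write."""
    lower_path = os.path.join(lower_dir, name)
    upper_path = os.path.join(upper_dir, name)
    writing = any(flag in mode for flag in "wa+")

    if os.path.exists(upper_path):
        # The upper layer always wins once it has its own copy
        return open(upper_path, mode)
    if writing:
        if os.path.exists(lower_path):
            # First open for write: copy the lower file into the upper layer
            shutil.copy2(lower_path, upper_path)
        return open(upper_path, mode)
    # Read-only access to a file with no upper copy: hand back the lower file itself
    return open(lower_path, mode)
```

In this model a reader that opened the file before the copy-up keeps its handle on the lower file and never sees what the writer changes in the upper copy, which is the behaviour described above.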

Shared inodes

Posted Jun 17, 2011 16:14 UTC (Fri) by martinfick (subscriber, #4455) [Link] (2 responses)

> Assuming I am understanding you correctly.

So with overlayfs, if I have 1000 containers, each with its own upper layer mounted separately on top of the same lower layer, and each one of them runs the same copy of Apache, will the Linux MM system share most of the memory for those Apache executables, as much as if they all ran off the same file in the lower layer directly?

If so, this will be a major boon for "virtualisation" on Linux: extremely memory-efficient and lightweight containers. It would let Linux containers in the mainline adopt some of the ideas of the Linux-VServer project's "unification" and gain similar benefits.

Shared inodes

Posted Jun 19, 2011 22:58 UTC (Sun) by Sho (subscriber, #8956) [Link] (1 responses)

Don't shared subtrees get you a good part of the way, too?

Shared inodes

Posted Jun 19, 2011 23:43 UTC (Sun) by neilbrown (subscriber, #359) [Link]

Shared subtrees are certainly part of the solution - and an important part.

If the Linux/Unix file hierarchy had been designed with sufficient foresight (which would have been totally impractical in reality) then you probably could do it all with shared subtrees. Those files that might need to be configured per-machine or per-instance would be in one subtree (a bit like /var maybe) and all the other files would be elsewhere. The one subtree would be copied for each instance, the rest would be shared.

But we don't have such a forward-looking design... and it is entirely possible that needs differ so much that such a design would be impossible. So configuration files are often mixed in with non-configuration files. A solution is needed which makes copies of the first type, but shares the second type.

One could imagine a forest-of-symlinks which could map all 'configuration' files into one subtree, but symlinks don't always (ever?) provide perfect semantics. If you update a config file by writing a new copy then renaming it, you break the symlink.
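
The rename problem is easy to reproduce. Here is a minimal sketch in Python, assuming a Linux-like system where rename replaces the directory entry rather than following a symlink. The file names and the helper update_by_rename are made up for the example.

```python
import os
import tempfile


def update_by_rename(path, new_text):
    """Update a file the usual 'safe' way: write a new copy, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(new_text)
    # The rename replaces whatever directory entry is at `path`;
    # if that entry is a symlink, the link itself is replaced.
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as root:
    private = os.path.join(root, "private.conf")  # the per-instance copy
    link = os.path.join(root, "app.conf")         # the name the application uses
    with open(private, "w") as f:
        f.write("old\n")
    os.symlink(private, link)

    update_by_rename(link, "new\n")

    link_survived = os.path.islink(link)
    with open(private) as f:
        private_text = f.read()
```

After the update, link_survived is False and private_text is still "old\n": the new contents now live in a regular file where the symlink used to be, and the file in the 'configuration' subtree was never touched.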

You could do the symlinks in the other direction, with symlinks for all the files that you want to share, but that would have its own problems I suspect.

So overlayfs complements shared subtrees and allows you to selectively have some files shared and some files private within the same directory. And it achieves this almost transparently.

Shared inodes

Posted Feb 25, 2012 3:18 UTC (Sat) by scientes (guest, #83068) [Link]

What about vhashify http://linux-vserver.org/util-vserver:Vhashify ?
IOW hard-links on steroids.
Now, making this work in full-virtualization environments is not exactly the same problem... and certainly can't be as elegant.
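
To make the "hard-links on steroids" idea concrete, here is a rough sketch of content-hash based unification in Python. It is not vhashify's actual implementation. The function name unify is made up, it assumes all paths are on one filesystem, and it ignores ownership, permissions and the question of how to stop a write in one container from changing the shared file.

```python
import hashlib
import os


def unify(paths):
    """Hard-link files with identical contents to a single inode; return how many were relinked."""
    first_seen = {}  # content digest -> first path seen with that content
    relinked = 0
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        keeper = first_seen.setdefault(digest, path)
        if keeper == path or os.path.samefile(keeper, path):
            continue  # first occurrence, or already sharing an inode
        # Create the new link under a temporary name, then rename it into place
        tmp_path = path + ".unify-tmp"
        os.link(keeper, tmp_path)
        os.replace(tmp_path, path)
        relinked += 1
    return relinked
```

Once two paths share an inode this way, the page cache for that file is shared as well, which is the memory benefit discussed at the top of the thread.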