
More notes on reiser4

The article on reiser4 which appeared here last week drew a number of comments. One comment from Hans Reiser took LWN to task for not having started with a kernel tarball which was created from a reiser4 filesystem to begin with. It seems that reiser4 is highly sensitive to the order in which files are created, and using the wrong order does not show the filesystem in its best light.

Here is last week's table, with a new line for tests done starting with a reiser4-built tarball:

Filesystem      Untar     Build      Grep     find (name)   find (stat)
ext3            55/24     1400/217   62/8     10.4/1.1      12.1/2.5
reiser4         67/41     1583/386   78/12    12.5/1.3      15.2/4.0
reiser4 (new)   57/35     1445/393   58/9.9    8.4/1.3      11.1/4.0

The results do show a significant difference in performance when the files are created in the right order - and the differences carry through all of the operations performed on the filesystem, not just the untar. In other words, the performance benefits of reiser4 are only fully available to those who manage to create their files in the right order. Future plans call for a "repacker" process to clean up after obnoxious users who insist on creating files in something other than the optimal order, but that tool is not yet available. (For what it's worth, restoring from the reiser4 tarball did not noticeably change the ext3 results).

Last week, the discussion about reiser4 got off to a rather rough start. Even so, it evolved into a lengthy but reasonably constructive technical conversation touching on many of the issues raised by reiser4.

At the top of the list is the general question of the expanded capabilities offered by this filesystem; these include transactions, the combined file/directory objects (and the general representation of metadata in the filesystem namespace), and more. The kernel developers are nervous about changes to filesystem semantics, and they are seriously nervous about creating these new semantics at the filesystem level. The general feeling is that any worthwhile enhancements offered by reiser4 should, instead, be implemented at the virtual filesystem (VFS) level, so that more filesystems could offer them. Some developers want things done that way from the start. If there is a consensus, however, it would be along the lines laid out by Andrew Morton: accept the new features in reiser4 for now (once the other problems are addressed) with the plan of shifting the worthwhile ones into the VFS layer. The reiser4 implementation would thus be seen as a sort of prototype which could be evolved into the true Linux version.

Hans Reiser doesn't like this idea:

Look guys, in 1993 I anticipated the battle would be here, and I build the foundation for a defensive tower right at the spot MS and Apple are now maneuvering towards. Help me get the next level on the tower before they get here. It is one hell of a foundation, they won't be able to shake it, their trees are not as powerful. Don't move reiser4 into vfs, use reiser4 as the vfs. Don't write filesystems, write file plugins and disk format plugins and all the other kinds of plugins, and you won't be missing any expressive power that you really want....

Somehow, over the years, Hans has neglected to tell the developers that he was, in fact, planning to replace the entire VFS. That plan looks like a difficult sell, but reiser4 could become the platform that is used to shift the VFS in the directions he sees.

Meanwhile, the reiser4 approach to metadata has attracted a fair amount of attention. Imagine you have a reiser4 partition holding a kernel tree; at the top of that tree is a file called CREDITS. It's an ordinary file, but it can be made to behave in extraordinary ways:

$ tree CREDITS/metas
CREDITS/metas
|-- bmap
|-- gid
|-- items
|-- key
|-- locality
|-- new
|-- nlink
|-- oid
|-- plugin
|   |-- compression
|   |-- crypto
|   |-- digest
|   |-- dir
|   |-- dir_item
|   |-- fibration
|   |-- file
|   |-- formatting
|   |-- hash
|   |-- perm
|   `-- sd
|-- pseudo
|-- readdir
|-- rwx
|-- size
`-- uid
 
1 directory, 24 files

You can also type "cd CREDITS; cat ." to view the file. (One must set execute permission on the file before any of this works).

What appears to be a plain file also looks like a directory containing a number of other files. Most of these files contain information normally obtained with the stat() system call: uid is the owner, size is the length in bytes, rwx is the permissions mask, etc. Some of the others (bmap, items, oid) provide a window into how the file is represented inside the filesystem. This is all part of Hans Reiser's vision of moving everything into the namespace; rather than using a separate system call to learn about a file's metadata, just access the right pseudo file.
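On a conventional filesystem the same information comes from the stat() call; the contrast can be sketched from the shell (GNU stat assumed; the metas/ paths appear only in comments because they require a reiser4 mount):

```shell
# Create a stand-in for the kernel tree's CREDITS file.
d=$(mktemp -d)
echo "hello" > "$d/CREDITS"

# The traditional route: one system call, decoded by stat(1).
stat -c 'uid=%u size=%s mode=%a' "$d/CREDITS"

# The reiser4 route would be plain reads in the namespace instead
# (hypothetical; needs a reiser4 filesystem):
#   cat CREDITS/metas/uid CREDITS/metas/size CREDITS/metas/rwx

rm -r "$d"
```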

One branch of the discussion took issue with the "metas" name. Using reiser4 means that you cannot have any file named metas anywhere within the filesystem. Some people would like to change the name; ideas like ..metas, ..., and @ have been tossed around, but Hans seems uninclined to change things.

Another branch, led by Al Viro, worries about the locking considerations of this whole scheme. Linux, like most Unix systems, has never allowed hard links to directories for a number of reasons; one of those is locking. Those interested in the details can see this rather dense explanation from Al, or a translation by Linus to something resembling technical English. Linus's example is essentially this: imagine you have a directory "a" containing two subdirectories dir1 and dir2. You also have "b", which is simply a link to a. Imagine that two processes simultaneously attempt these commands:

Process 1                    Process 2
mv a/dir1 a/dir2/newdir      mv b/dir2 b/dir1/newdir

Both commands cannot succeed, or you will have just tied your filesystem into a knot. So some sort of locking is required to serialize the above actions. Doing that kind of locking is very hard when there are multiple paths into the same directory; it is an invitation to deadlocks. The problem could be fixed by putting a monster lock around the entire filesystem, but the performance cost would be prohibitive. The usual approach has been to simply disallow this form of aliasing on directory names, and thus avoid the problem altogether.
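That disallowing is easy to see from the shell: link(2) on a directory is refused outright on Linux, so the race above cannot even be set up (a minimal sketch, assuming standard coreutils):

```shell
d=$(mktemp -d)
mkdir "$d/a"

# link(2) on a directory fails with EPERM on Linux, so ln reports an error.
if ln "$d/a" "$d/b" 2>/dev/null; then
    echo "directory hard link allowed"
else
    echo "directory hard link refused"
fi

rm -r "$d"
```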

In the reiser4 world, all files are also directories. So hard links to files become hard links to directories, and all of these deadlock issues come to the foreground. The concern expressed by the kernel developers - which appears to be legitimate - is that the reiser4 team has not thought about these issues, and there is no plan to solve the problem. Wiring the right sort of mutual exclusion deeply into a filesystem is a hard thing to do as an afterthought. But something will have to be done; Al Viro has made it clear that he will oppose merging reiser4 until the issue has been addressed, and it is highly unlikely that it would go in over his objections. (Linus: "This means that if Al Viro asks about locking and aliasing issues, you don't ignore it, you ask 'how high?'")

One way of dealing with the locking issues (and various other bits of confusion) would be to drop the "files as directories" idea and create a namespace boundary there. Files could still have attributes, but an application which wished to access them would use a separate system call to do so. The openat() interface, which is how Solaris solves the problem, seems like the favored approach. Pushing attributes into their own namespace breaks the "everything in one namespace" idea which is so fundamental to reiser4, but it would offer compatibility with Solaris and make many of the implementation issues easier to deal with. On the other hand, applications would have to be fixed to use openat() (or be run with runat).

Another contingent sees the reiser4 files-as-directories scheme as the way to implement multi-stream files. Linux is one of the few modern operating systems without this concept. The Samba developers, in particular, would love to see a multi-stream implementation, since they have to export a multi-stream interface to the rest of the world. There are obvious simple applications of multi-stream files, such as attaching icons to things. Some people are ready to use the reiser4 plugin mechanism and go nuts, however; they would like to add streams which present compressed views of files, automatically produce and unpack archive files, etc. Linus draws the line at that sort of stuff, though:

Which means that normally we really don't _want_ named streams. In 99% of all cases we can use equally good - and _much_ simpler - tool-based solutions.

Which means that the only _real_ technical issue for supporting named streams really ends up being things like samba, which want named streams just because the work they do fundamentally is about them, for externally dictated reasons. Doing named streams for any other reason is likely just being stupid.

Once you do decide that you have to do named streams, you might then decide to use them for convenient things like icons. But it should very much be a secondary issue at that point.

Yet another concern has to do with how user space will work with this representation of file metadata. Backup programs have no idea of how to save the metadata; cp will not copy it, etc. Fixing user space is certainly an issue. The fact is, however, that, if reiser4 or the VFS of the future changes our idea of how a file behaves, the applications will be modified to deal with the new way of doing things. Meanwhile, it has been pointed out that reiser4-style metadata is probably easier for applications to work with than the current extended attribute interface, which is also not understood by most applications.

The discussion looks likely to continue for some time. Regardless of the outcome, Hans Reiser will certainly have accomplished one of his goals: he has gotten the wider community to start to really think about our filesystems and how they affect our systems and how we use them.

Index entries for this article
Kernel: Filesystems/Reiser4
Kernel: Named streams



mount -t reiser4 -o posix

Posted Sep 2, 2004 6:44 UTC (Thu) by larryr (guest, #4030) [Link]

I think it would be ok to have a mount flag which says I want POSIXy semantics when using POSIXy system calls like open/close/rename/link/symlink, and I am willing to lose the ability to access the added intuitive, but nevertheless non-POSIX, behavior through those system calls.

From what I have seen of the VFS layer, it looked pretty tightly coupled to POSIXy semantics, and not easy to shunt past to let the filesystem decide the semantics for itself, which sounds to me like a nice alternative to changing or replacing the VFS layer.

Larry

Why not just not support hard links?

Posted Sep 2, 2004 7:08 UTC (Thu) by walles (guest, #954) [Link] (23 responses)

Since hard links seem so problematic within ReiserFS, why not just remove support for them in ReiserFS entirely?

Also, as I have never used a hard link in my life, can anybody tell me about any real-life situation where hard links matter?

Why not just not support hard links?

Posted Sep 2, 2004 8:52 UTC (Thu) by Klavs (guest, #10563) [Link] (5 responses)

Well, vserver uses them to save space. If you have 10 vservers (virtual servers) on the same filesystem, mirrored from the same original, then instead of having e.g. 10*3GB of files you would only have 3GB of files. And since vserver1 can't know of vserver2 or anything outside of the vserver, soft links would be bad.
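A minimal sketch of that space saving, using GNU cp's -al (archive plus hard-link) as a stand-in for vserver's own tooling; every file in the second tree shares its inode, and therefore its disk blocks, with the original:

```shell
d=$(mktemp -d)
mkdir "$d/vserver1"
echo "shared library data" > "$d/vserver1/lib.so"

# Hard-link the whole tree instead of copying it (GNU cp extension).
cp -al "$d/vserver1" "$d/vserver2"

# Same inode number from both names: the data exists once on disk.
stat -c %i "$d/vserver1/lib.so"
stat -c %i "$d/vserver2/lib.so"

rm -r "$d"
```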

The other nice thing - which I'm not sure symlinks could implement (seeing past the issue that symlinks pointing outside a vserver would be a bad idea :) - is the new vserver file attribute "Immutable-unlink". It means you can make all files immutable+unlink in all vservers (two flags, as far as I remember: an Immutable flag and an Immutable-unlink flag) - and if one vserver tries to change a file marked Immutable-unlink, it simply gets a new file and the old hard link is removed, meaning the other 9 vservers still share the same file.

To me this is clever use of hardlinks - but I'm no filesystem guru :)

Space savings should use COW

Posted Sep 2, 2004 9:48 UTC (Thu) by walles (guest, #954) [Link] (4 responses)

I think space savings would be better done by supporting copy-on-write semantics within the file system, which is just what you describe as Immutable-unlink. I imagine COW should be less error-prone than hard links, but just like you I'm no fs guru (or I wouldn't be asking these questions :-).

Something along these lines has been implemented at http://www.ext3cow.com/ (which is a random project I found on Google; I don't know anything about it except what the web page says).

So do you (or anybody else for that part) know of any other uses for hard symlinks that don't have anything to do with space savings?

My use of hard links

Posted Sep 2, 2004 13:23 UTC (Thu) by utoddl (guest, #1232) [Link] (1 responses)

I have a script that makes good use of hard links -- not for space saving so much as time saving, but it saves a lot of space as well. I keep a copy of my RedHat/Fedora/whatever ISO images, and occasionally use wget to grab all the updates into another directory. These updates contain all 19 gazillion versions of the updated packages -- way more than will fit on a CD -- when what I really want of course is the latest version of each.

So I use cpio to make a hard link duplicate tree of all those updates (i.e. real directories, hard linked files). That's pretty quick, 'cause it's not moving any data, just creating dir entries. Then my script throws out everything I don't want from that tree -- all the older versions of a given rpm -- and I'm left with a small enough set of rpms to fit onto a CD. I add my own favorite goodies that aren't on the distro (config files, utilities, etc.), and burn an ISO from that.

That gives me the original distro CDs plus an extra CD with all current updates and my favorites, all on CDs I can carry around so I can install on and update the various boxes I play with at home, work, friends' and family's houses, wherever. (Heck, I'll stick a copy of the scripts here if anybody wants to play with 'em.)

Using hardlinks for this was a natural. Having said that, I recall only using hardlinks once before, a long time ago, and that was specifically for space savings.

My use of hard links

Posted Sep 2, 2004 15:50 UTC (Thu) by fergal (guest, #602) [Link]

I think you could have used symlinks for this too, with mkisofs -f, which tells it to follow symlinks (it might break something that really was meant to be a symlink, though).

Space savings should use COW

Posted Sep 10, 2004 0:46 UTC (Fri) by roelofs (guest, #2599) [Link] (1 responses)

So do you (or anybody else for that part) know of any other uses for hard symlinks that don't have anything to do with space savings?

Assuming you really meant "hard links," I use them to avoid accidentally deleting local mail files. Since such files get updated every day, usually several times a day, even daily incremental backups aren't sufficient to recover from an accidental deletion. But with hard-linked copies in a separate directory (e.g., ../.backups/foo.backup, etc.), you're safe. (And when you really do want to nuke the file, just truncate it to zero bytes--I wrote a trivial "trunc" utility that simply uses truncate() or ftruncate() for this purpose.) Of course, I suppose I could simply keep the "real" copy in the same hidden directory and use local symlinks to append to it...but with hard links you save one letter in every ls(1) command (i.e., -L). :-)
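The backup scheme above, sketched with made-up file names: the "accidental" deletion removes one name, but the inode, and the data, survive until the last link goes.

```shell
d=$(mktemp -d)
mkdir "$d/.backups"
echo "important mail" > "$d/foo"

# Hard-linked backup: same inode, second name.
ln "$d/foo" "$d/.backups/foo.backup"

rm "$d/foo"                   # the "accidental" deletion
cat "$d/.backups/foo.backup"  # the data is still there

rm -r "$d"
```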

I've also occasionally used hard links to save a temporary MPEG or PDF file downloaded by Netscape; under some conditions, the temporary file will disappear as soon as the download is complete, but if you make a hard link at any point prior to that, the link will remain. For multi-megabyte downloads, that can be convenient, albeit not absolutely critical...

Greg

Space savings should use COW

Posted Sep 11, 2004 15:44 UTC (Sat) by khim (subscriber, #9252) [Link]

And this too can be covered by COW files. The fact is, in almost all cases where I think I need a hard link, I find that what I really need is a COW file, and the hard link is just a poor substitute.

Why not just not support hard links?

Posted Sep 2, 2004 9:40 UTC (Thu) by rjw (guest, #10415) [Link] (6 responses)

Mainly useful for things like chrooting nowadays AFAIK.

Why not just not support hard links?

Posted Sep 2, 2004 10:03 UTC (Thu) by walles (guest, #954) [Link] (5 responses)

Please forgive my ignorance, but what do hard links have to do with chrooting? How exactly are hard links used together with chroot jails?

Why not just not support hard links?

Posted Sep 2, 2004 11:26 UTC (Thu) by hensema (guest, #980) [Link] (4 responses)

You cannot symlink to files outside a chroot. So, if you want to create a chroot jail without hard links, then you'd have to copy all files you need inside the chroot, effectively duplicating those files.

However, with hard links, you only need one instance of a file on disk, which saves space.

Note that hard linking from inside a chroot to main system files (such as /bin/bash) is not a very smart thing to do, as chrooted users can then modify exactly the files you wanted to prevent them from modifying. So you always need two copies of a file.

Why not just not support hard links?

Posted Sep 2, 2004 12:04 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

So, in other words, security concerns make the space-saving of hardlinks in a chroot environment useless, since duplication is necessary anyway.

Why not just not support hard links?

Posted Sep 2, 2004 12:52 UTC (Thu) by maniax (subscriber, #4509) [Link] (1 responses)

Using hardlinks for chroot jails is a bad idea. First, you don't have a good way to protect the file, and if you make changes to the chroot environment's structure, you'll have to update all its users (or, if the tool that updates software in it uses unlink() and then open(), you'll have to update the environment's users on every update).
Just use bind mounts, which save more space and make it possible to have the environment mounted somewhere read-write for updates, and somewhere read-only for use.

Why not just not support hard links?

Posted Sep 2, 2004 16:35 UTC (Thu) by Ross (guest, #4065) [Link]

If the file is not writeable, there is no problem. If the process running
in the jail is under uid 0, then you aren't gaining anything by the jail
anyway.

Why not just not support hard links?

Posted Sep 2, 2004 19:52 UTC (Thu) by oak (guest, #2786) [Link]

A good point would be that it's easier to get security-updated versions of
the libraries etc. inside the chroots. The same can of course be achieved
more easily with mount --bind from a chroot "template" directory, but with
hardlinks you can pick and choose what you put into the chroots from the
template directory structure.

Slightly different suggestion

Posted Sep 2, 2004 16:39 UTC (Thu) by Ross (guest, #4065) [Link] (1 responses)

How about this?

rule 1: you can't create hard links to a file with streams
rule 2: you can't create streams in a file with more than one directory entry (link)

Problem solved. You get both features on a per-file basis. You just can't
use them both at the same time.

Slightly different suggestion

Posted Sep 2, 2004 18:34 UTC (Thu) by bronson (subscriber, #4806) [Link]

As streams become more and more popular, your proposed solution becomes more and more problematic. Besides, I think that in reiser4 all files have streams.

Why not just not support hard links?

Posted Sep 2, 2004 16:42 UTC (Thu) by piman (guest, #8957) [Link]

Every regular file is a hard link to itself. . and .. are hard links to the directories you get when you cd to them.

Why not just not support hard links?

Posted Sep 2, 2004 19:24 UTC (Thu) by xtifr (guest, #143) [Link] (4 responses)

A hard link is basically just something (usually a name) that points directly to an inode, rather than linking indirectly to another name. So pretty much every file on your system (except for the symlinks) is a hard link. (Actually, even symlinks are usually hard links to inodes containing the symbolic reference, so every symlink is a hardlink - but I believe there are filesystems where symbolic references can be stored in the directory structure, so this is not a hard-and-fast rule. But it is a common one.)

If you mean that you never use more than one hard link per inode (not counting the automatic "." and ".." hardlinks that all directories have), well, even that's pretty tricky - when a process opens a file, it actually creates a new hard link, internal to the process (not associated with any name on the filesystem). So, if you forbid multiple hard links, you lose the ability to open files (unless you delete them as you open them), which would make the files a bit useless. :)
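The effect being described can be demonstrated from the shell: an open file descriptor keeps the inode alive after its last name is unlinked.

```shell
d=$(mktemp -d)
echo "still here" > "$d/file"

exec 3< "$d/file"   # open a descriptor on the file
rm "$d/file"        # remove its only name
cat <&3             # the data remains readable through the descriptor
exec 3<&-

rm -r "$d"
```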

Also, as others have mentioned, hard links are slightly smaller and faster than symlinks. This may not matter to you but it does matter to some people, especially people working with small embedded systems, and the insane performance fanatics who take a wasted CPU cycle as a personal affront (I'll try not to mention any Gentoo fans by name here.:)

Using symlinks also requires you to have a primary, privileged name (the main hard link). Sometimes this isn't convenient. For example, I'm not entirely sure how I want to organize my music: by artist or by genre. Currently I have two directory trees populated with hard links to the same music files. If I used symlinks, one of those trees would have to be privileged, and would be very hard to get rid of if I decided I didn't need it any more.

Why not just not support hard links?

Posted Sep 2, 2004 19:44 UTC (Thu) by walles (guest, #954) [Link] (3 responses)

> when a process opens a file, it actually creates a new hard link, internal
> to the process (not associated with any name on the filesystem)

I'm not sure I'm following this. The article text says that hard links to directories are forbidden, but libc still has an opendir() call. If what you say above is correct, how come opendir() doesn't have to delete directories upon opening them?

> Using symlinks also requires you to have a primary, privileged name

What do you mean with "privileged" name? And why would such files be hard to get rid of?

Why not just not support hard links?

Posted Sep 3, 2004 16:41 UTC (Fri) by giraffedata (guest, #1954) [Link]

He's using a rather expansive definition of "hard link." Usually, "hard link" refers only to a reference to a file from a directory entry. The kind of reference you get when you open a file isn't called a hard link. (It's just called a reference).

Incidentally, most of this thread is using "hard link" in a too restrictive way. When you create a file 'foo', you create one hard link to the file (from the directory entry with name 'foo'). When you ln foo bar, you create a second hard link to the file. Note that Unix files themselves do not have names -- not text ones anyway; they are traditionally named by inode number.

Directories have lots of hard links. There's the one from the parent directory, the one from '.', and all the ones from the subdirectories' '..' entries. In some modern models, '.' and '..' aren't actually considered directory entries, but you'll still see them -- for historical purposes -- in the directory's link count (e.g. from ls -l).

But you can't make an arbitrary hard link to a directory. Only the specific ones described above are allowed to exist.
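The arithmetic above can be checked with GNU stat (with the caveat that a few filesystems, btrfs among them, report a link count of 1 for every directory regardless):

```shell
d=$(mktemp -d)
mkdir "$d/top" "$d/top/sub1" "$d/top/sub2"

# Link count: parent entry + '.' + one '..' per subdirectory,
# i.e. 2 + 2 here on traditional filesystems (btrfs would report 1).
stat -c %h "$d/top"

rm -r "$d"
```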

Why not just not support hard links?

Posted Sep 3, 2004 16:46 UTC (Fri) by hppnq (guest, #14462) [Link]

What do you mean with "privileged" name? And why would such files be hard to get rid of?

I suppose what is meant is that with symbolic links there is a distinction between the actual file and the link: removing the symbolic link leaves the file intact, while removing the file leaves you with a link pointing nowhere. This, of course, is because there are two separate inodes (the entities that keep the metadata): a symbolic link has its own inode. Removing a hard link, by contrast, merely removes one of the references to the file (the inode keeps a count that is incremented whenever a hard link is created), leaving the file intact until the last link is removed. In this respect, all link names are "equal".
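A quick shell illustration of that asymmetry, with invented file names:

```shell
d=$(mktemp -d)
echo "data" > "$d/file"
ln -s "$d/file" "$d/soft"   # separate inode holding a path
ln "$d/file" "$d/hard"      # second reference to the same inode

rm "$d/file"
cat "$d/soft" 2>/dev/null || echo "symlink now dangles"
cat "$d/hard"               # the hard link still reaches the data

rm -r "$d"
```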

Getting back to your original question: this is also one of the reasons why you would want to use hard links. (In practice, symbolic links are almost always preferable. You should really know what you're doing when using hard links.)

Why not just not support hard links?

Posted Sep 4, 2004 21:38 UTC (Sat) by Ross (guest, #4065) [Link]

Because another process can't use that opened directory to traverse the
filesystem (even /proc/self/cwd is just a symlink). The same thing with
open files. They prevent the item from being removed from the disk, but
they don't mess with the namespace.

Why not just not support hard links?

Posted Sep 4, 2004 22:39 UTC (Sat) by jmshh (guest, #8257) [Link]

Here is another scenario, where hard links are useful:

I had to make changes to a config file for a program. There was no source available, the program crashed on reading my version, and the crash handler removed any temporary stuff.

So I started the program, made a hard link to the temporary output and waited for the crash. Now I could see how far the program got and immediately spotted the error.

Disclaimer: Free software makes this unnecessary, good programs provide at least a debug option that makes temporary stuff survive, and even better programs give useful error messages. But one can't always choose the environment.

hard links are a necessity

Posted Sep 11, 2004 13:35 UTC (Sat) by job (guest, #670) [Link]

Hard links are basically the same thing as file names, so what you're
saying is to limit files to having only one name. That would put an
artificial restriction in the file system that people are not used to
(except for the win32 crowd; their system is crippled to start with).

As an example I use hard links for my mp3 folder. Song.mp3 is placed both
in mp3/genres/Pop and mp3/artists/Artist, that way I can browse my
collection according to both artist and genre. But the ability to have
several names (and paths) to one file is useful in lots of other places.

No, this has to be solved in a better way.

Using // to access file attributes

Posted Sep 2, 2004 12:50 UTC (Thu) by exco (guest, #4344) [Link] (4 responses)

Whatever is chosen to access file attributes, it will break some applications. So why not use // ?

CREDITS//gid

It surely will break badly written applications, but it does not remove usable filename possibilities.

Using // to access file attributes

Posted Sep 2, 2004 13:55 UTC (Thu) by elanthis (guest, #6227) [Link] (1 responses)

Or just use a different system call, which makes sense, since we *already* have system calls for file-system attributes...

Using // to access file attributes

Posted Sep 2, 2004 19:01 UTC (Thu) by pflugstad (subscriber, #224) [Link]

Because if you can just use a special syntax to access the extended attributes, that makes them immediately accessible to the shell, to scripts of various types, and to programs that don't know how to access the attributes natively.

For example, clearcase is a (commercial) version control system. It implements something like this by allowing you to append @@ to any file within it and from there see all the versions and branches that are available. This is very powerful, as you can do things like using tab completion in BASH, direct scripting, etc. It's very nice.

Using // to access file attributes

Posted Sep 9, 2004 18:31 UTC (Thu) by nobrowser (guest, #21196) [Link] (1 responses)

Apologize to RMS for implying Emacs is "badly written" :-)

Using // to access file attributes

Posted Sep 11, 2004 15:55 UTC (Sat) by khim (subscriber, #9252) [Link]

There is no need: RMS freely admits there is a lot of cruft in Emacs, and there have been plans to reimplement everything in more sane ways for a few years now.

More notes on reiser4

Posted Sep 2, 2004 13:58 UTC (Thu) by ballombe (subscriber, #9523) [Link] (2 responses)

Aren't '.' and '..' the canonical examples of hard links to directories?
How does the kernel handle locking for those two cases?

More notes on reiser4

Posted Sep 2, 2004 15:54 UTC (Thu) by RobSeace (subscriber, #4435) [Link]

Yeah, but those two are easily recognized, and can be special-cased, if
necessary... Basically, the only issue with those is first canonicalizing
your pathname (a la realpath()), and then names will be the same, no matter
what combinations of "." and ".." you use to reach the real destination...
However, with arbitrarily named hard-links, you don't have the ability to
recognize them... Or, rather, I should say you don't have the ability to
canonicalize them into any single specific format, since there is no "one
true name" for them... Ie: if you have "X" and "Y" as hard-links to the
same exact file, which one is the canonical name?? There's no way to decide
that... But, since everyone knows about the special cases of "." and "..",
those can be handled specially, with little trouble...
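That canonicalization is exactly what realpath performs; a small sketch (GNU realpath assumed):

```shell
d=$(mktemp -d)
mkdir -p "$d/a/dir1"

# Any mixture of '.' and '..' collapses to one canonical name...
realpath "$d/a/./dir1/../dir1"

# ...but two arbitrary hard links to the same directory would have
# no such single canonical form, which is the point made above.
rm -r "$d"
```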

More notes on reiser4

Posted Sep 11, 2004 15:57 UTC (Sat) by khim (subscriber, #9252) [Link]

Simple. It mostly ignores them when they are present on disk (some filesystems do not even have "." and ".." on disk!) and handles them specially when in memory.

puzzled

Posted Sep 2, 2004 15:52 UTC (Thu) by fergal (guest, #602) [Link] (4 responses)

Why is it that the first example here is a problem but the other two aren't?
  1. mv a/dir1 a/dir2/newdir        mv b/dir2 b/dir1/newdir

  2. mv a/dir1 a/dir2/newdir        mv a/dir2 a/dir1/newdir

  3. cd a                           cd b
     mv dir1 dir2/newdir            mv dir2 dir1/newdir

puzzled

Posted Sep 2, 2004 16:27 UTC (Thu) by RobSeace (subscriber, #4435) [Link]

Well, I don't pretend to know much about kernel internals, but I would assume
the reason your case #2 isn't a problem is because the files canonicalize
to the same exact pathnames in both processes, and presumably there must
be some kernel-level synchronization of such things as renaming, based on
the pathnames... So, basically, either one or the other of those processes
will succeed, and the second will fail, because the target directory no longer
exists at that point, because the first process beat it to the punch... But,
as I understand it, the issue with the arbitrarily-named hard-links is that
there's no way to recognize that "a/dir1" is the exact same thing as "b/dir1",
and hence no way to possibly synchronize these things, and prevent them from
clashing with each other... And, as such, I would think your example #3 is
also a problem...

Re: puzzled

Posted Sep 2, 2004 17:23 UTC (Thu) by larryr (guest, #4030) [Link] (2 responses)

I think maybe the problem is that unix style filesystem semantics assume a tree structure, meaning one parent edge/entry for each vertex/node, but having a hard link to a directory violates that assumption. I think if it was considered typical for a directory to have multiple parent pointers, and there were consistent conventions for performing atomic locking on all the parents of a directory at once, there might be no problem. But if "the parent" of a node is assumed by the implementation to be "the node corresponding to the path component to the left of the path component referencing this node", locking "the parent" of "/x/a/dir1" could be different from locking "the parent" of "/x/b/dir1".

Larry

Re: puzzled

Posted Sep 3, 2004 1:57 UTC (Fri) by vonbrand (subscriber, #4458) [Link] (1 responses)

No, the problem is precisely that if a directory has several parents, you need to make sure nobody messes around with one of the paths while you are screwing up another one. For that you'd have to be able to find all the parents and lock them before doing anything... and that will have a heavy cost if done naïvely. Al (and Linus) is asking how they solve these problems. If the ReiserFS guys have good answers, symlinks (as second-class citizens) could be on their way out. I for one doubt they have the answers (or are able to come up with them). Time will tell.

Re: puzzled

Posted Sep 3, 2004 15:36 UTC (Fri) by larryr (guest, #4030) [Link]

No, the problem is precisely that if a directory has several parents[...] you'd have to be able to find all parents and lock them before doing anything... and that will have a heavy cost if done naïvely.

I wrote:

if it was considered typical for a directory to have multiple parent pointers, and there were consistent conventions for performing atomic locking on all the parents of a directory at once, there might be no problem.

Larry

More notes on reiser4

Posted Sep 2, 2004 21:51 UTC (Thu) by iabervon (subscriber, #722) [Link] (1 response)

There are a number of different things going on. The first is that the things Reiser4 puts in the attribute space of a file seem to me to be virtual files, in the sense that they replicate existing metadata in the filesystem in a different namespace. Obviously, "echo test > CREDITS/metas/rwx" shouldn't act as it would if rwx were an ordinary file, and "mv CREDITS/metas/uid CREDITS/metas/gid" doesn't make any sense. Likewise, "ln CREDITS/metas/rwx MAINTAINERS/metas/rwx" seems like it shouldn't be expected to succeed. There's just more expressive power available in the filesystem interface than can be logically supported. The operations which are, from the point of view of the VFS, problematic seem to me to be exclusively ones which Reiser4's use of an extended namespace over files should prohibit anyway.

It seems to me like prohibiting all the tricky operations in the attribute space would be fine. It's not like Reiser4 gets rid of directories and makes everything equivalent to attributes of the filesystem root.

It should be fine to also prohibit file/.. (which would cause problems with hard-linked files; the same file can be in multiple directories).

I think the more serious problem is how to distinguish the attributes of a directory (which is information about the directory) from the contents of the directory (which is really information about the contents). Personally, I think dir/.../attribute is best, since in all of the cases in which I've heard of "..." being used, it hasn't been for files in directories.
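The dir/.../attribute proposal above can be sketched as a toy path resolver. This is a hypothetical in-memory model of the suggestion, not reiser4 code; `Node`, `children`, and `metas` are invented names. The point is that a distinguished "..." component switches lookup from a directory's contents to its attributes, keeping the two namespaces from colliding:

```python
class Node:
    """Toy filesystem object with separate content and attribute namespaces."""
    def __init__(self, children=None, metas=None):
        self.children = children or {}  # ordinary directory entries
        self.metas = metas or {}        # attributes ("metas") of this node

def resolve(root, path):
    # Walk the path one component at a time; after a "..." component,
    # all further lookups happen in the attribute namespace instead of
    # the directory-contents namespace.
    node = root
    in_meta = False
    for comp in path.strip("/").split("/"):
        if comp == "...":
            in_meta = True
            continue
        node = node.metas[comp] if in_meta else node.children[comp]
    return node
```

With this split, "dir/rwx" can only ever name a file called rwx inside dir, while "dir/.../rwx" can only ever name dir's own permission attribute, which is exactly the ambiguity the comment is trying to eliminate.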

More notes on reiser4

Posted Sep 3, 2004 18:05 UTC (Fri) by hppnq (guest, #14462) [Link]

Personally, I think dir/.../attribute is best, since in all of the cases in which I've heard of "..." being used, it hasn't been for files in directories.

For what it's worth, I think Tivoli Storage Manager uses ... as a wildcard in filenames.

Trying to fit an hierarchical database inside the kernel?

Posted Sep 9, 2004 14:14 UTC (Thu) by leandro (guest, #1460) [Link]

Looks like ReiserFS is getting too complicated and running into issues trying to extend POSIX functionality without breaking either POSIX compatibility or kernel internals.

I propose all this is due to it actually amounting to trying to fit a hierarchical database management system inside the kernel.

Now, the reason hierarchical databases lost to SQL is that SQL, even if not really relational, was much simpler by virtue of implementing some of the relational ideas.

So the US$1M question is: at this point, wouldn't designing all the data structures around the relational model make more sense? Or at least all the data storage.

Obviously this doesn’t amount to ‘Oracle in the kernel’, because SQL is much more complex than a relational implementation would be.

More notes on reiser4

Posted Sep 10, 2004 11:12 UTC (Fri) by dash2 (guest, #11869) [Link]

Comment from a non-technical user (non-technical at this level anyway).

There is a well-known software development management meme out there called "beware of the guy in a room". (I think someone at Microsoft blogged about this recently.) That is, beware of the "genius" who sits on his own coding without communicating with the rest of the team: his ideas may be brilliant, but he may be more interested in implementing his ideas than in the project's needs.

I think maybe Hans Reiser is like a very high level "guy in a room". He is clearly very smart, his ideas are deep - I love reading the namesys website even when I don't get it - but he's very much in love with those ideas, rather than being a pragmatist who just wants to make things work. Which is cool, because open source needs such guys, but... don't let them in the driving seat!

Just my 2c, no disrespect meant to anyone.

Log structured filesystems had similar benchmarking issues....

Posted Sep 10, 2004 22:18 UTC (Fri) by tytso (subscriber, #9993) [Link]

Some folks at Berkeley made an abortive attempt at a related idea, called log-structured filesystems, and it similarly required the filesystem to be periodically groomed using a "log cleaner" in order to repack and reoptimize it. For small benchmarks where the log cleaner doesn't need to run during the test, the results can look much better than under real-world use, where the cost of the log cleaner has to be included in the overhead of the filesystem.
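The repacking a log cleaner does can be illustrated with a toy append-only store; this is a deliberately simplified sketch of the general technique, not Berkeley LFS (or reiser4's planned repacker), and `LogFS` and its methods are invented names. Writes always append, so overwritten versions accumulate as garbage until the cleaner copies only the live versions into a fresh log:

```python
class LogFS:
    """Toy log-structured store: all writes append; a cleaner repacks."""
    def __init__(self):
        self.log = []    # flat append-only log of (key, value) records
        self.index = {}  # key -> log position of the latest version

    def write(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        return self.log[self.index[key]][1]

    def clean(self):
        # The "log cleaner": copy only the live (latest) version of
        # each key into a new log, dropping superseded records. This
        # is the work a small benchmark can dodge entirely but a real
        # workload eventually pays for.
        live = sorted(self.index.items(), key=lambda kv: kv[1])
        new_log, new_index = [], {}
        for key, pos in live:
            new_index[key] = len(new_log)
            new_log.append(self.log[pos])
        self.log, self.index = new_log, new_index
```

The benchmarking trap tytso describes is visible here: before `clean()` runs, every `write` is a cheap append; the copying cost only shows up once the cleaner has to run, which a short test can avoid triggering.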

The benchmark I would suggest be tried against reiser4 is compiling a kernel tree from scratch. If you have to tar and untar the kernel under reiser4 first, that's fine. But then unmount and remount the filesystem (so none of the source files are in the page cache), and then try a kernel compile. My guess is that the results would be extremely enlightening --- and this would certainly be a fair and representative use scenario with which every kernel developer would be familiar, and indeed use every day.
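That procedure is easy to script. A rough sketch in Python, assuming root privileges and an /etc/fstab entry for the mount point so a bare `mount` works; `cold_cache_build` and both paths are hypothetical, and `timed` is just a stand-in for time(1):

```python
import subprocess
import time

def timed(cmd):
    # Run a command and report wall-clock seconds, like time(1) would.
    t0 = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - t0

def cold_cache_build(mountpoint, tree, jobs=4):
    # Unmount and remount so none of the source files are still in
    # the page cache, then time a from-scratch build. Requires root
    # and an fstab entry for mountpoint (hypothetical setup).
    subprocess.run(["umount", mountpoint], check=True)
    subprocess.run(["mount", mountpoint], check=True)
    return timed(["make", "-C", tree, f"-j{jobs}"])
```

The unmount/remount step is what makes the comparison fair: without it, the second filesystem tested gets its compile serviced largely from the page cache rather than from disk.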


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds