Filesystems aged like sharp cheese

Posted Jul 7, 2006 15:27 UTC (Fri) by Max.Hyre (subscriber, #1054)
In reply to: Obtaining real-world aged file systems by wilck
Parent article: The 2006 Linux Filesystems Workshop (Part III)

[Darn it, someone else comes up with the same idea while I'm writing my comment. I'll post this anyway, in hopes it's got some useful additions.]

"Another issue is the fact that file system performance is usually only tested on fairly young, unfragmented file systems. The file systems development community should work with researchers on better ways of creating aged file systems quickly."

Seems to me the best way to create aged filesystems quickly is to have them already prepared, waiting to be used. Ask people to send in backups of live old systems, and build a library of them: when you need a few, go pick them up from the filesystem store.

Obviously this has two drawbacks: it's a bit much to have a decent number of examples when each one is multiple terabytes, and people would probably object to passing their data out for world-wide use.

We could address both by having the donors use a program which saves only the metadata, including where each piece of it sits on disk. Remove all the file contents, rename the files to something original like `1', `2', ..., and the result should be both a good bit smaller and free of private information. The restore program (not a standard restore, which arranges things neatly as it goes) would put the metadata in the same location on the new disk, thus giving the same fragmentation, ordering of filenames in directories, &c.
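A rough sketch of the donor-side "dehydrate" step, in Python, to make the idea concrete. Everything here is an assumption for illustration: the function name `dehydrate`, the record layout, and the per-directory numbering are made up. It also does not capture physical block positions, which is the part that really matters for fragmentation; that would need something filesystem-specific (on Linux, perhaps the FIBMAP or FIEMAP ioctls).

```python
import os

def dehydrate(root):
    """Return metadata-only records for everything under root.

    File contents are never read, and real names are replaced by
    per-directory counters ("1", "2", ...). Hypothetical record format.
    """
    records = []
    dir_ids = {os.path.abspath(root): 0}  # real directory path -> anonymous id
    next_dir_id = 1
    for dirpath, dirnames, filenames in os.walk(root):
        parent = dir_ids[os.path.abspath(dirpath)]
        counter = 0
        for name in dirnames:
            counter += 1
            full = os.path.abspath(os.path.join(dirpath, name))
            dir_ids[full] = next_dir_id
            records.append({"parent": parent, "name": str(counter),
                            "type": "dir", "id": next_dir_id})
            next_dir_id += 1
        for name in filenames:
            counter += 1
            st = os.lstat(os.path.join(dirpath, name))
            records.append({"parent": parent, "name": str(counter),
                            "type": "file", "size": st.st_size,
                            # allocated 512-byte blocks, where the platform reports them
                            "blocks": getattr(st, "st_blocks", None)})
    return records
```

Note that sizes and tree shape still leak through, which is exactly the traffic-analysis worry.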

Of course some organizations [say, the NSA] might consider even file sizes and directory structures, to say nothing of timestamps, too sensitive to release. Think traffic analysis. Actually, timestamps could probably be omitted, too.

In fact, the dehydrated FS need contain only the metadata required for the testing; everything else can be omitted and faked at reconstitution time.

Need a file system? Pull one off the shelf, reconstitute it by doing a special restore which writes all data as nulls (or doesn't even write it, and just takes whatever happens to be lying around in the data blocks), generate new values for incidental metadata such as timestamps, and voilà! You're ready to go.
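For illustration, a minimal sketch of the "don't even write the data" half of reconstitution. The record format (dicts with `parent`, `name`, `type`, `id`, `size`) and the name `reconstitute` are assumptions, not an existing tool. It works through the ordinary file API, so the new filesystem's allocator decides placement; a real version would have to work below that level to reproduce the donor's layout.

```python
import os

def reconstitute(records, target):
    """Recreate a tree from metadata-only records without writing file data.

    Assumes records list each directory before its children and that the
    root directory has id 0 (a hypothetical format).
    """
    os.makedirs(target, exist_ok=True)
    dir_paths = {0: target}  # anonymous directory id -> path on the new disk
    for rec in records:
        path = os.path.join(dir_paths[rec["parent"]], rec["name"])
        if rec["type"] == "dir":
            os.mkdir(path)
            dir_paths[rec["id"]] = path
        else:
            with open(path, "wb") as f:
                # Set the length only; no data is written, and the file reads
                # back as nulls. Timestamps are simply whatever "now" is.
                f.truncate(rec["size"])
    return dir_paths
```

On filesystems with sparse-file support this will probably allocate no data blocks at all, so it gives the right sizes and directory shape but not the fragmentation.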

The skeletal systems could even be classified by characteristics: need to test against a ton of small files? Use one of these. Want ugly fragmentation? Use one of those.

Of course, there would need to be a large set to draw from, to avoid optimizing FSes for a fixed, though realistic, set of instances. New old ones would be continually solicited to avoid ossification. On the other hand, they could be used like seeded PRNGs: when you want to test different implementations against exactly the same data, you've got it.

Or has this already been put into place, and I just haven't noticed?


/And/ another thing...

Posted Jul 7, 2006 15:40 UTC (Fri) by Max.Hyre (subscriber, #1054)

It just occurred to me that renaming files to numbers could mess things up if the directory structure depends on the filename lengths. So the rename would have to be something the same length as the original (leading zeroes?). Which means trouble when the original name is shorter than the new name....
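One possible way around the "shorter than the new name" problem, sketched in Python: keep a separate counter for each name length and encode it in a bigger alphabet than decimal digits. The helper names and the base-36 alphabet are assumptions, and length is counted in characters here, which only matches on-disk byte length for ASCII names.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols, an arbitrary choice

def same_length_name(index, length):
    """Encode index as a zero-padded base-36 string of exactly `length` characters."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if index >= len(ALPHABET) ** length:
        raise OverflowError("not enough distinct names of this length")
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

def anonymize_directory(names):
    """Map each name in one directory to a unique replacement of the same length."""
    counters = {}  # name length -> next unused index for that length
    mapping = {}
    for name in names:
        n = len(name)
        mapping[name] = same_length_name(counters.get(n, 0), n)
        counters[n] = counters.get(n, 0) + 1
    return mapping
```

It still fails (with `OverflowError`) for a directory holding more than 36 one-character names, so a larger alphabet would be needed for the pathological cases.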

The fix is left as an exercise for the reader. :-)

/And/ another thing...

Posted Jul 15, 2006 11:41 UTC (Sat) by nix (subscriber, #2304)

The rename might well need to come up with something which *hashes* to the same value as the original. Good luck making *that* work for more than one file a year. :)
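To get a feel for the cost, here is a brute-force sketch that looks for a same-length replacement landing in the same hash bucket. CRC-32 truncated to a few bits is only a stand-in chosen for convenience; it is not the hash any particular filesystem uses, and the function names are made up.

```python
import itertools
import string
import zlib

ALPHABET = string.ascii_lowercase + string.digits

def bucket(name, bits):
    """Low `bits` bits of CRC-32, standing in for a real directory hash."""
    return zlib.crc32(name.encode()) & ((1 << bits) - 1)

def matching_name(original, bits, used=()):
    """Search for a same-length name in the same bucket as `original`.

    Returns (name, attempts), or (None, attempts) if the search space runs out.
    """
    target = bucket(original, bits)
    attempts = 0
    for chars in itertools.product(ALPHABET, repeat=len(original)):
        candidate = "".join(chars)
        attempts += 1
        if (candidate != original and candidate not in used
                and bucket(candidate, bits) == target):
            return candidate, attempts
    return None, attempts
```

With a well-mixed hash the expected number of attempts roughly doubles with every extra bit that has to match, so matching a handful of bucket bits is cheap while matching a full-width hash for every file in a big tree is another story.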

Filesystems aged like sharp cheese

Posted Jul 20, 2006 23:49 UTC (Thu) by efexis (guest, #26355)

The problem is when you want to test a new feature, such as a new algorithm for deciding where to place new blocks on the disk... the filesystem has to be created using this code. Grabbing an old filesystem that wasn't created using this code is completely useless.

You need to replay all the actions (all the file creates, writes, moves, deletes, etc.) in the order they would actually happen, to see the result.
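A minimal sketch of what such a replay could look like, assuming a trace recorded as a list of tuples. The trace format and the `replay` function are hypothetical; the point is only that the filesystem under test performs every allocation itself, in the original order.

```python
import os

def replay(trace, root):
    """Apply (operation, args...) tuples to the directory `root`, in order."""
    os.makedirs(root, exist_ok=True)
    for op, *args in trace:
        if op == "create":
            (name,) = args
            # "x" mode fails if the file already exists, like a fresh create
            open(os.path.join(root, name), "xb").close()
        elif op == "write":
            name, offset, length = args
            with open(os.path.join(root, name), "r+b") as f:
                f.seek(offset)
                f.write(b"\0" * length)  # contents don't matter, only the allocation
        elif op == "move":
            src, dst = args
            os.rename(os.path.join(root, src), os.path.join(root, dst))
        elif op == "delete":
            (name,) = args
            os.remove(os.path.join(root, name))
        else:
            raise ValueError(f"unknown operation: {op!r}")
```

For example, `replay([("create", "a"), ("write", "a", 0, 4096), ("delete", "a")], "/mnt/test")` would exercise one create/write/delete cycle on whatever filesystem is mounted at the (made-up) path `/mnt/test`.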

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds