Filesystems aged like sharp cheese
Posted Jul 7, 2006 15:27 UTC (Fri) by Max.Hyre
In reply to: Obtaining real-world aged file systems
Parent article: The 2006 Linux Filesystems Workshop (Part III)
[Darn it, someone else comes up with the same idea while I'm
writing my comment. I'll post this anyway, in hopes it's
got some useful additions.]
Another issue is the fact that file system performance
is usually only tested on fairly young, unfragmented file
systems. The file systems development community should
work with researchers on better ways of creating aged file
systems quickly.
Seems to me the best way to create aged filesystems quickly
is to have them already prepared, waiting to be used. Ask
people to send in backups of live old systems, and create a
library of them---you need a few, go pick them up from the
shelf.
Obviously this has two drawbacks: it's a bit much to
have a decent number of examples when each one is multiple
terabytes, and people would probably object to passing their
data out for world-wide use.
We could address both drawbacks by having the donors use a
program which saves only the metadata, including its position
on disk.
Remove all the file contents, rename the files to
something original like `1', `2', ..., and the result
should be both a good bit smaller and free of private
information. The restore program (not a standard
restore, which arranges things neatly as it goes) would
put the metadata in the same location on the new disk,
thus giving the same fragmentation, ordering of filenames
in directories, &c.
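To make this concrete, here's a minimal sketch of such a
dehydrator, assuming Python; the script, its JSON-lines
skeleton format, and every name in it are my own invention,
and recording actual on-disk placement would additionally need
something like Linux's FIEMAP ioctl, which I've left out:

    #!/usr/bin/env python3
    # dehydrate.py -- hypothetical sketch: walk a tree and emit
    # anonymized metadata (directory shape, sizes, modes, mtimes)
    # as JSON lines, one record per directory or file.
    import json
    import os
    import sys

    def dehydrate(root):
        dir_ids = {root: "0"}          # real path -> anonymized id
        next_id = 1
        for dirpath, dirnames, filenames in os.walk(root):
            parent = dir_ids[dirpath]
            for d in dirnames:         # directories get numbers, too
                dir_ids[os.path.join(dirpath, d)] = str(next_id)
                print(json.dumps({"type": "dir", "id": str(next_id),
                                  "parent": parent}))
                next_id += 1
            for name in filenames:     # files become `1', `2', ...
                st = os.lstat(os.path.join(dirpath, name))
                print(json.dumps({"type": "file", "name": str(next_id),
                                  "parent": parent, "size": st.st_size,
                                  "mode": st.st_mode,
                                  "mtime": st.st_mtime}))
                next_id += 1

    if __name__ == "__main__":
        dehydrate(sys.argv[1])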
Of course, some organizations [say, the NSA] might
consider even file sizes and directory structures, to say
nothing of timestamps, too sensitive to release. Actually,
timestamps could probably be omitted entirely.
In fact, the dehydrated FS need only contain the metadata
needed for the testing; everything else can be omitted and
faked at reconstitution time.
Need a file system? Pull one off the shelf and reconstitute
it by doing a special restore which writes all data as nulls
(or doesn't even write it---just take whatever happens to
be lying around in the data blocks), generate new values
for incidental metadata, such as timestamps, and voila!
You're ready to go.
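Again purely as a sketch, matched to the made-up skeleton
format above: posix_fallocate() allocates blocks without
writing any data (roughly the "don't even write it" option),
fresh timestamps stand in for the incidental metadata, and
putting everything back at its original disk positions is real
work I'm waving away here.

    #!/usr/bin/env python3
    # rehydrate.py -- hypothetical sketch: rebuild a tree from the
    # skeleton emitted by dehydrate.py.  No file data is written.
    import json
    import os
    import sys
    import time

    def rehydrate(skeleton, target):
        dirs = {"0": target}           # anonymized id -> real path
        now = time.time()              # freshly faked timestamps
        with open(skeleton) as f:
            for line in f:
                rec = json.loads(line)
                parent = dirs[rec["parent"]]
                if rec["type"] == "dir":
                    path = os.path.join(parent, rec["id"])
                    os.mkdir(path)
                    dirs[rec["id"]] = path
                else:
                    path = os.path.join(parent, rec["name"])
                    with open(path, "wb") as out:
                        if rec["size"]:
                            # allocate without writing; out.truncate()
                            # would give a sparse file instead
                            os.posix_fallocate(out.fileno(), 0,
                                               rec["size"])
                    os.chmod(path, rec["mode"] & 0o7777)
                    os.utime(path, (now, now))

    if __name__ == "__main__":
        rehydrate(sys.argv[1], sys.argv[2])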
The skeletal systems could even be classified by
characteristics: need to test against a ton of small files?
Use one of these. Want ugly fragmentation? Use one of those.
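For instance, a crude tagging pass over a skeleton (thresholds
and labels invented on the spot, just to show the shape of it;
spotting ugly fragmentation would need the extent data I
omitted above):

    #!/usr/bin/env python3
    # tag_skeleton.py -- hypothetical sketch: suggest shelf labels
    # for a skeleton from crude file-size statistics.
    import json
    import sys

    def tag(skeleton):
        sizes = []
        with open(skeleton) as f:
            for line in f:
                rec = json.loads(line)
                if rec["type"] == "file":
                    sizes.append(rec["size"])
        tags = []
        if len(sizes) > 1000000:
            tags.append("ton-of-files")
        if sizes and sum(1 for s in sizes if s < 4096) > 0.9 * len(sizes):
            tags.append("mostly-small-files")
        return tags

    if __name__ == "__main__":
        print(" ".join(tag(sys.argv[1])) or "unremarkable")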
Of course, there would need to be a large set to
draw from, to avoid optimizing FSes for a fixed, though
realistic, set of instances. New old ones would be
continually solicited to avoid ossification. On the other
hand, they could be used
like PRNGs with fixed seeds: when you want to test different
implementations against the same data, you've got it.
Or has this already been put into place, and I just
haven't heard of it?