Obtaining real-world aged file systems

Posted Jul 7, 2006 2:46 UTC (Fri) by dlang (guest, #313)
In reply to: Obtaining real-world aged file systems by wilck
Parent article: The 2006 Linux Filesystems Workshop (Part III)

the problem with aged file systems is that you may have a file system (say ext2) that has been in use for years, with lots of stuff added at different times.

now you have a new filesystem (newfs) that you want to test.

if you just copy everything from the old filesystem to the new one you end up with an optimally laid out newfs filesystem, since all the files were created at one time, usually with each file being created in one operation (no fragmentation). The resulting performance is vastly different than if the same contents had been put there the same way they were put on the original ext2fs filesystem.

so the next step is to not record the filesystem, but record the operations on the filesystem (each write, delete, create, etc) and replay those against the new filesystems.
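A minimal sketch of the record-and-replay idea, in Python. The trace format (timestamp, operation, path, payload) and the names TRACE, replay and demo are assumptions made up for illustration; a real recorder would also need offsets, renames, truncates and so on.

```python
import os
import tempfile

# Hypothetical trace: (timestamp_seconds, operation, relative_path, payload)
TRACE = [
    (0.0, "create", "a.txt", b""),
    (1.5, "write", "a.txt", b"hello"),
    (2.0, "create", "b.txt", b""),
    (9.0, "write", "a.txt", b" world"),
    (12.0, "delete", "b.txt", b""),
]


def replay(trace, root):
    """Apply each recorded operation, in timestamp order, under root."""
    for _ts, op, rel, payload in sorted(trace, key=lambda rec: rec[0]):
        path = os.path.join(root, rel)
        if op == "create":
            open(path, "xb").close()      # fail if the file already exists
        elif op == "write":
            with open(path, "ab") as f:   # append stands in for offset writes
                f.write(payload)
        elif op == "delete":
            os.remove(path)
        else:
            raise ValueError(f"unknown operation: {op}")


def demo():
    """Replay TRACE into a scratch directory and report what is left."""
    with tempfile.TemporaryDirectory() as root:
        replay(TRACE, root)
        names = sorted(os.listdir(root))
        with open(os.path.join(root, "a.txt"), "rb") as f:
            return names, f.read()
```

Note that this sketch ignores the timestamps apart from ordering, so the whole trace is applied as fast as the machine allows.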

but since modern filesystems delay allocations for several seconds after something is done to them, the replays end up with many of the operations canceling out in memory (never hitting disk), and so the result still doesn't match.

the work-around for this is to do lots of syncs to force the filesystem to actually perform the writes to disk instead of short-circuiting them.
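A rough sketch of that work-around, assuming a simplified (operation, name, payload) record format that is invented here. os.fsync() and os.sync() are real Python calls (os.sync() is Unix-only, hence the check); the helper names are hypothetical.

```python
import os
import tempfile


def sync_everything():
    """Ask the kernel to flush all dirty data, where os.sync() is available."""
    if hasattr(os, "sync"):
        os.sync()


def write_durably(path, payload):
    """Append payload and force this file's data to disk before returning."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()              # drain Python's userspace buffer first
        os.fsync(f.fileno())   # then ask the kernel to write the file out


def replay_with_syncs(ops, root):
    """Replay (operation, name, payload) records, syncing after each one."""
    syncs = 0
    for op, name, payload in ops:
        path = os.path.join(root, name)
        if op == "write":
            write_durably(path, payload)
        elif op == "delete":
            os.remove(path)
        else:
            raise ValueError(f"unknown operation: {op}")
        sync_everything()      # flush before a later operation can cancel this one
        syncs += 1
    return syncs


def demo():
    ops = [
        ("write", "tmp.dat", b"x" * 4096),
        ("delete", "tmp.dat", b""),   # without the sync, tmp.dat might never hit disk
        ("write", "keep.dat", b"y"),
    ]
    with tempfile.TemporaryDirectory() as root:
        return replay_with_syncs(ops, root), sorted(os.listdir(root))
```

Syncing after every single operation is the bluntest possible choice: it forces even writes that the original system would legitimately have merged or dropped.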

Obtaining real-world aged file systems

Posted Jul 7, 2006 2:53 UTC (Fri) by dlang (guest, #313)

One thought that hit me just after posting the last comment,

now that there is the new timer base in the kernel, and we are nearing usability with the 'tickless' patches, how about setting up a special kernel with a custom timesource that's driven by the process writing the aged filesystem?

it knows the timestamps for all the original filesystem actions, and it can find out all outstanding timers. after each filesystem action, have it set a timer for the time of the next filesystem action and then use the tickless capabilities to advance time up to that point (stepping through each of the other timers on the way). this way the modified kernel and its filesystem code would think that the replay took place in the same real time as the original actions being replayed, and any delayed actions will take place appropriately (benefiting things if the delay would have helped in the original, but happening between actions if it wouldn't have).
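To make the idea concrete, here is a userspace toy model in Python. It is not kernel code and uses no kernel interface: VirtualClock, ToyFilesystem, FLUSH_DELAY and the trace format are all invented for illustration, and the 5 second flush window is an arbitrary assumption.

```python
import heapq

FLUSH_DELAY = 5.0  # hypothetical delayed-write window, in virtual seconds


class VirtualClock:
    """A clock that only moves when the replayer tells it to."""

    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (deadline, sequence, callback)
        self._seq = 0

    def set_timer(self, deadline, callback):
        heapq.heappush(self._timers, (deadline, self._seq, callback))
        self._seq += 1

    def advance_to(self, target):
        """Jump to target, firing each timer that expires on the way, in order."""
        while self._timers and self._timers[0][0] <= target:
            deadline, _seq, callback = heapq.heappop(self._timers)
            self.now = deadline
            callback()
        self.now = target


class ToyFilesystem:
    """Buffers writes and flushes a file FLUSH_DELAY after it first gets dirty."""

    def __init__(self, clock):
        self.clock = clock
        self.dirty = {}     # name -> [flush deadline, buffered byte count]
        self.disk_log = []  # (virtual time, name, bytes) for each flush

    def write(self, name, nbytes):
        if name not in self.dirty:
            deadline = self.clock.now + FLUSH_DELAY
            self.dirty[name] = [deadline, 0]
            self.clock.set_timer(deadline, lambda: self._flush(name, deadline))
        self.dirty[name][1] += nbytes

    def delete(self, name):
        self.dirty.pop(name, None)  # buffered data is cancelled in memory

    def _flush(self, name, deadline):
        entry = self.dirty.get(name)
        if entry is not None and entry[0] == deadline:
            self.disk_log.append((self.clock.now, name, entry[1]))
            del self.dirty[name]


def replay(trace):
    """Replay (timestamp, operation, name, nbytes) records in virtual time."""
    clock = VirtualClock()
    fs = ToyFilesystem(clock)
    for ts, op, name, nbytes in trace:
        clock.advance_to(ts)  # timers due before this action fire first
        if op == "write":
            fs.write(name, nbytes)
        elif op == "delete":
            fs.delete(name)
    if trace:
        clock.advance_to(trace[-1][0] + FLUSH_DELAY)  # let the last timers run
    return fs.disk_log


TRACE = [
    (0.0, "write", "a", 100),
    (1.0, "write", "tmp", 50),
    (2.0, "delete", "tmp", 0),  # deleted inside the window: never flushed
    (3.0, "write", "a", 20),    # merged with the first write to "a"
    (20.0, "write", "b", 10),
    (30.0, "delete", "b", 0),   # deleted after the window: "b" was flushed
]
```

With this trace, replay(TRACE) returns [(5.0, "a", 120), (25.0, "b", 10)]: the short-lived "tmp" cancels out just as it would have originally, while "b" reaches disk because its delete came well after the flush window, and no wall-clock time is spent waiting.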

as long as the kernel isn't busy doing other stuff at the same time this replay should be very fast, and this avoids the haphazard benefits of trying to insert lots of sync calls.

David Lang