Posted Nov 26, 2009 9:06 UTC (Thu) by joern (subscriber, #22392)
Going back to _any_ point in time only works with near-infinite storage. If you ever write more data than fits on your medium, something previously deleted will be overwritten.
The garbage collector makes things worse. As you might know, it has to copy around existing data in order to free larger segments. A corollary is that with a medium that is N% full, the worst case is that each segment is also N% full and the garbage collector will waste N% of the bandwidth, leaving 100-N% for the user to do real work. If your filesystem is 90% full, performance may drop to 10%.
So if the filesystem were to give any sort of guarantee that (some) old data could be recovered, the amount of available free space would drop, hurting performance. Definitely not a free feature.
Maybe I should add that performance does not necessarily drop as dramatically. If the filesystem did a good job when writing the data, it will have separated long-living data from short-living data. To do so perfectly requires a crystal ball, but even by pure accident, data that is written at the same time is often also deleted at the same time. Therefore the worst case of every segment being exactly N% full is exceedingly rare and real performance will be better than the calculated worst case above.
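The worst-case arithmetic above can be checked with a few lines of Python. This is just a toy model of the scenario described (every segment exactly N% full), with a made-up segment size; it is not how any real cleaner accounts for its work:

```python
def user_share(fill, blocks_per_segment=64):
    """Worst case: every segment is `fill` full. Cleaning one segment
    copies fill*B live blocks elsewhere and nets (1-fill)*B blocks of
    fresh free space, so the user's share of write bandwidth works out
    to exactly 1 - fill. Toy model, segment size is arbitrary."""
    copied = fill * blocks_per_segment        # blocks the cleaner writes
    freed = (1 - fill) * blocks_per_segment   # space left for user writes
    return freed / (freed + copied)

for pct in (50, 80, 90):
    print(f"{pct}% full -> {user_share(pct / 100):.0%} of bandwidth for the user")
```

With a 90%-full disk this prints 10%, matching the figure in the comment, and the segment size cancels out entirely.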
Posted Nov 26, 2009 9:46 UTC (Thu) by johnflux (guest, #58833)
Posted Nov 26, 2009 11:28 UTC (Thu) by joern (subscriber, #22392)
But if you can guarantee that the total amount of data ever written does not exceed the size of your medium, historical data would indeed stay around. You cannot access deleted data through the filesystem code itself, but you could write a dump tool that goes back in time.
How that dump tool would behave once the garbage collector has to reuse old segments is another question. ;)
Posted Dec 3, 2009 13:40 UTC (Thu) by forthy (guest, #1525)
AFAIK, in NILFS, which is another log-structured file system, this comes
for free. The garbage collector will reclaim those snapshots over time,
but there's an easy way to access them as long as they haven't been
reclaimed yet.
For "rotating rust", I'm far from sure that current file systems are
optimized for disks at all. First of all, the "best" block size for a
random access medium is one where transfer time equals access time. In
the time a contemporary hard disk takes to seek to another location, it
can read about 1MB, so block size should be 1MB (increasing to 2MB next
year or so ;-). Such a block should certainly contain a whole bunch of
small files when used for small files, but still, access granularity
below 1MB doesn't make sense.
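The "transfer time = access time" rule is easy to put into numbers. The figures below (8 ms average access, 125 MB/s sequential transfer, roughly 2009-era disks) are assumptions for illustration, not measurements:

```python
SEEK_TIME_S = 0.008      # average seek + rotational latency (assumed)
TRANSFER_MB_PER_S = 125  # sustained sequential rate (assumed)

def balanced_block_size_mb(seek_s=SEEK_TIME_S, rate_mb_s=TRANSFER_MB_PER_S):
    """Block size at which transferring the block takes as long as
    seeking to it -- the point past which seeks stop dominating."""
    return seek_s * rate_mb_s

print(balanced_block_size_mb())  # 1.0 (MB) with the numbers above
```

Double either the seek time or the transfer rate and the balanced block size doubles with it, which is where the "2MB next year" quip comes from.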
Second, metadata: for all "normal" use, main memory is big enough to
hold all of the metadata. Keep a log on the disk where you write your
metadata transactions sequentially, and on unmount (and whenever the log
exceeds a certain size), dump the whole metadata to disk. That would
make life a lot easier for the spinning rust.
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds