> causes files to disappear at random points in time.
Having files disappear spontaneously makes sense to me. A 'file' is a natural unit of caching. There is a clear distinction between the 'file' and the 'name', so that you can unlink a cache file even while it is in use, and the process using it will not lose out.
Having arbitrary blocks in the middle of a file disappear spontaneously is not something that I am so comfortable with. There is no 'natural unit' (so John had to invent 'ranges' and worry about semantics for merging etc.) and there is no 'object/name' distinction, so you have to think carefully about races between access and discard.
I would really like it if the whole 'volatile data' thing could be done with files. Files get marked as 'volatile' and the filesystem can unlink them as desired. One problem is that open/mmap/close is a whole lot slower than any single system call, and definitely slower than a simple memory access that might (but usually doesn't) cause SIGBUS.
Maybe an madvise style interface that works for ranges in anonymous memory, and some sort of per-file interface for filesystems when a shared cache is required.
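To make the anonymous-memory half of that concrete, here is a minimal sketch using the existing madvise(MADV_DONTNEED) call on Linux. It is only an approximation of the idea: MADV_DONTNEED throws the pages away immediately rather than letting the kernel reclaim them lazily under memory pressure, and a later access sees zero-filled pages instead of raising SIGBUS. The CACHE_MAGIC marker and the cache_* helper names are hypothetical, invented for this illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define CACHE_MAGIC 0x43414348u /* hypothetical "contents are valid" marker */

/* Allocate a page-aligned, anonymous, private region to hold cached data. */
static void *cache_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Store a payload after the marker word, then write the marker last. */
static void cache_fill(void *buf, const char *payload)
{
    unsigned magic = CACHE_MAGIC;

    assert(buf != NULL);
    strcpy((char *)buf + sizeof(magic), payload);
    memcpy(buf, &magic, sizeof(magic));
}

/* The cache is usable only while the marker word is intact. */
static int cache_valid(const void *buf)
{
    unsigned magic;

    memcpy(&magic, buf, sizeof(magic));
    return magic == CACHE_MAGIC;
}

/*
 * Tell the kernel the range is no longer needed. On Linux, private
 * anonymous pages are dropped and read back as zeroes afterwards,
 * which clears the marker word as a side effect.
 */
static int cache_discard(void *buf, size_t len)
{
    return madvise(buf, len, MADV_DONTNEED);
}
```

A caller would check cache_valid() before each use and regenerate the data when it returns 0. That check is a single memory access, which is the cost profile being asked for, with no open/mmap/close on the fast path.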
Posted Nov 7, 2012 8:04 UTC (Wed) by dgc (subscriber, #6611)
> Having arbitrary blocks in the middle of a file disappear
> spontaneously is not something that I am so comfortable with.
Fundamentally, HSMs make blocks disappear from files spontaneously. And those blocks come back when you try to read them. IOWs, the filesystem is basically a namespace with a great big data cache in front of some kind of slower storage.
Volatile ranges turn HSM space management on its head - instead of moving data to tape when you run out of space, we can do it pre-emptively and mark the duplicated data ranges as volatile. When the filesystem runs out of space, it can just punch out the volatile ranges and everything continues quickly rather than blocking waiting for the HSM to move data out to tape.
Then when you add range-based hot data tracking as the method of selecting what parts of the files are copied to tape and marked volatile, you've got quite a neat way of automatically managing the filesystem space that doesn't impact performance when space runs low or the HSM moves frequently accessed data to tape mistakenly...
Big picture - we've got lots of infrastructure on the way for doing interesting things with our storage stack - the only thing missing is the application that ties them all together....
> There is no 'natural unit' (so John had to invent 'ranges' and worry
> about semantics for merging etc)
There doesn't need to be a natural unit. In reality, it is a filesystem block, but having a tracking structure is necessary regardless of unit. Using the mapping tree proved impractical for various reasons, and the simplest solution was to use its own tree. Volatile ranges on files are not bad just because we have no generic range tree library in the kernel that could be used for tracking them....
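For a sense of what "tracking structure" and "semantics for merging" mean in practice, here is a small user-space sketch. It is not the code from John's patches: it uses a sorted fixed-size array rather than a tree, and struct vrange, struct vrange_set, vrange_mark() and vrange_contains() are names made up for this example. Ranges are half-open byte ranges, and marking a range volatile coalesces it with anything it overlaps or touches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RANGES 64 /* arbitrary capacity for the sketch */

struct vrange {
    size_t start, end; /* half-open: [start, end) */
};

struct vrange_set {
    struct vrange r[MAX_RANGES]; /* sorted by start, non-overlapping */
    size_t n;
};

/*
 * Mark [start, end) volatile, merging with any existing range that
 * overlaps or is directly adjacent. Returns 0 on success, -1 if full.
 */
static int vrange_mark(struct vrange_set *s, size_t start, size_t end)
{
    size_t i = 0, j;

    assert(start < end);

    /* Skip ranges that end strictly before the new one begins. */
    while (i < s->n && s->r[i].end < start)
        i++;

    /* Absorb every range that overlaps or touches [start, end). */
    for (j = i; j < s->n && s->r[j].start <= end; j++) {
        if (s->r[j].start < start)
            start = s->r[j].start;
        if (s->r[j].end > end)
            end = s->r[j].end;
    }

    /* A pure insertion needs one free slot. */
    if (j == i && s->n == MAX_RANGES)
        return -1;

    /* Replace entries [i, j) with the single merged range. */
    memmove(&s->r[i + 1], &s->r[j], (s->n - j) * sizeof(s->r[0]));
    s->r[i].start = start;
    s->r[i].end = end;
    s->n = s->n - (j - i) + 1;
    return 0;
}

/* Return 1 if byte offset off lies inside a volatile range. */
static int vrange_contains(const struct vrange_set *s, size_t off)
{
    size_t i;

    for (i = 0; i < s->n; i++)
        if (off >= s->r[i].start && off < s->r[i].end)
            return 1;
    return 0;
}
```

Marking [0, 4096) and [8192, 12288) leaves two entries; marking [4096, 8192) afterwards collapses all three into a single [0, 12288) entry. A real implementation would also need to split ranges when part of one is marked non-volatile again, and would need locking against concurrent access and discard.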
> and there is no 'object/name'
> distinction so you have to think carefully about races between access
> and discard.
Same for any method of tracking volatile ranges... :)
> I'm not sure that one size can fit all.
Probably not - the anonymous memory usage is recent, but I think it's separate from file-backed volatile regions, which is what John's original proposal was for. Lumping them together as equivalent functionality is not really correct....