The return of SEEK_HOLE
Posted May 4, 2011 19:01 UTC (Wed) by dlang (guest, #313) In reply to: The return of SEEK_HOLE by chad.netzer
Parent article: The return of SEEK_HOLE
step 1: use SEEK_HOLE to find the holes the filesystem knows about
step 2: read the remainder of the file through userspace to look for additional holes (or holes that SEEK_HOLE didn't report).
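As a rough illustration of step 1, here is a minimal sketch in Python, assuming a platform and filesystem where os.SEEK_DATA and os.SEEK_HOLE are available. The function name data_extents is made up for this example. It walks the file with one SEEK_DATA call and one SEEK_HOLE call per data region; everything between the reported regions is a hole the filesystem knows about. On a filesystem without real hole support the whole file simply comes back as one data region.

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) byte ranges the filesystem reports as data.

    Hypothetical helper for illustration; relies on os.SEEK_DATA and
    os.SEEK_HOLE, which only exist on platforms that support them.
    """
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Next offset >= `offset` that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # nothing but a hole until end of file
            raise
        # Next hole after that data; end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:
            break  # defensive: the file changed underneath us
        yield start, end
        offset = end
```

The regions returned here are what step 2 would then have to read through in userspace.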
examining a range of memory to find out whether it's exclusively zero seems like the type of thing that is amenable to optimisation based on the particular CPU in use. Since the kernel is already optimised this way, it would seem better to leverage that rather than require multiple userspace tools to each implement the checking (with the optimisations)
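For reference, the userspace check being discussed looks roughly like the sketch below. This is only an illustration with made-up names (zero_runs, block_size); it leans on bytes.count(), which runs in C, rather than on any CPU-specific tuning, and the 4096-byte block size is an assumption, not something a filesystem guarantees.

```python
def zero_runs(data, block_size=4096):
    """Return (start, end) offsets of block-aligned runs that are all zero.

    Hypothetical helper for illustration. A block counts as zero only if
    every byte in it is 0; adjacent zero blocks are merged into one run.
    """
    runs = []
    run_start = None
    for off in range(0, len(data), block_size):
        chunk = data[off:off + block_size]
        if chunk.count(0) == len(chunk):
            if run_start is None:
                run_start = off  # a new zero run begins here
        elif run_start is not None:
            runs.append((run_start, off))  # run ended at this block
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(data)))  # run extends to the end
    return runs
```

Every tool that wants to find unreported holes needs some version of this loop, which is the duplication the comment is worried about.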
exposing the full details of which extents a file uses doesn't seem like the right answer, both because it's complex and because it presents a lot of information that isn't useful (i.e. you don't care whether a block of real data is in one block or fragmented into lots of blocks). At the same time, it seems a bit wasteful to find the holes by doing a separate system call for each hole boundary.
Posted May 4, 2011 19:54 UTC (Wed) by chad.netzer (subscriber, #4257)
Perhaps, but the zero check is almost certainly I/O bound, not CPU bound.
If you *really* want to aggressively replace long runs of zeros with holes in existing files (i.e. make them sparser), a background userspace scrubber could be employed, although doing it in place without forcing a copy (new inode) is tricky. At least some Linux filesystems have, or will have, the ability to "punch holes":
http://permalink.gmane.org/gmane.comp.file-systems.xfs.ge...
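To make the trade-off concrete, here is a minimal sketch of the easy variant, the one that does force a copy (and so a new inode): write a new file and seek over all-zero blocks instead of writing them. It does not punch holes in place. The name sparse_copy and the 4096-byte block size are assumptions for this example, and whether the skipped ranges actually become holes depends on the filesystem.

```python
def sparse_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst, leaving all-zero blocks unwritten.

    Hypothetical helper for illustration. Unwritten ranges read back as
    zeros, so the copy has the same contents and length as the source.
    """
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk.count(0) != len(chunk):
                # Only blocks holding real data are written out.
                dst.seek(offset)
                dst.write(chunk)
            offset += len(chunk)
        # Set the final length explicitly in case the file ends in zeros.
        dst.truncate(offset)
```

A scrubber built this way would then rename the copy over the original, which is exactly the new-inode side effect that in-place hole punching avoids.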