
The return of SEEK_HOLE


Posted May 4, 2011 18:27 UTC (Wed) by chad.netzer (subscriber, #4257)
In reply to: The return of SEEK_HOLE by dlang
Parent article: The return of SEEK_HOLE

> if de-duplication logic forces holes to be replaced with a block of 0's (even a shared one), the authors of that code should be fired

It was a pure hypothetical, but for example some systems can convert an online volume to de-duped mode and back, all while serving files from it. I could see (in such cases of intermediate online filesystem conversions, or other hypothetical situations) that a filesystem could choose not to honor SEEK_HOLE, or could report its values incorrectly. In such cases, the API would allow backups to still work, just less efficiently. So, my point is that the SEEK_HOLE API is not bound by any particular filesystem constraint.

> if the purpose of this is to allow backups and copies to deal with holes efficiently, it seems like it would be good to be able to tune how aggressively to look for holes

You don't want the filesystem to "look" for holes; it just knows them outright, if it supports them, based on what data blocks are actually stored. The "looking" for all potential holes can already be (and is) done in userspace for any filesystem, at the cost of examining a lot of zeros. Anyway, that's my view.
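To make the userspace "looking" concrete, here is a minimal sketch of a zero-run scanner. The function name `find_zero_runs` and the 4096-byte block size are assumptions for illustration; real tools pick their own granularity. It only reports runs made of whole all-zero blocks, and it has to read every byte to do so, which is the cost mentioned above.

```python
def find_zero_runs(stream, block_size=4096):
    """Yield (offset, length) for each maximal run of all-zero blocks.

    `stream` is any binary file object opened for reading. Only whole
    blocks (plus a possibly short final block) are tested, so zero runs
    shorter than a block that straddle non-zero data are not reported.
    """
    zero_block = bytes(block_size)
    offset = 0
    run_start = None  # start offset of the zero run in progress, if any
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        if chunk == zero_block[:len(chunk)]:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            # A non-zero block ends the current run.
            yield (run_start, offset - run_start)
            run_start = None
        offset += len(chunk)
    if run_start is not None:
        # The file ended while still inside a zero run.
        yield (run_start, offset - run_start)
```

Every block is read and compared regardless of whether the filesystem already knows the range is unallocated, which is exactly the work SEEK_HOLE lets a tool skip.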


The return of SEEK_HOLE

Posted May 4, 2011 19:01 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

it just seems conceptually wrong to me that finding holes (or potential holes) should be a two step process.

step 1 use SEEK_HOLE to find holes the filesystem knows about

step 2 read the remainder of the file through userspace to look for additional holes (or holes that SEEK_HOLE didn't report).
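For reference, step 1 might look like the sketch below, which walks a file with `os.lseek` using `os.SEEK_DATA` and `os.SEEK_HOLE` (available in Python 3.3+ on platforms that define them). The helper name `data_segments` is hypothetical. Note that it issues one system call per boundary, and that a filesystem without hole support may simply report the whole file as a single data segment.

```python
import errno
import os

def data_segments(fd):
    """Yield (start, end) byte ranges that the filesystem reports as data.

    Everything between two segments, and anything after the last one,
    is reported by the filesystem as a hole.
    """
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # no data at or after pos: the rest is a hole
            raise
        # There is always an implicit hole at end of file, so this
        # returns at most the file size.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield (start, end)
        pos = end
```

Step 2 would then read only the reported data ranges and check them for zeros, rather than reading the whole file.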

examining a range of memory to find if it's exclusively zero seems like the type of thing that is amenable to optimisation based on the particular CPU in use. Since the kernel is already optimised this way it would seem to be better to leverage this rather than require multiple userspace tools to all implement the checking (with the optimisations).

the full details of what extents are used for a file seems like it isn't the right answer, both because it's complex, but also because it's presenting a lot of information that isn't useful (i.e. you don't care if a block of real data is in one block, or fragmented into lots of blocks), but at the same time it seems a bit wasteful to find the holes by doing a separate system call for each hole boundary.

The return of SEEK_HOLE

Posted May 4, 2011 19:54 UTC (Wed) by chad.netzer (subscriber, #4257) [Link]

> examining a range of memory to find if it's exclusively zero seems like the type of thing that is amenable to optimisation based on the particular CPU in use.

Perhaps, but it's almost certainly I/O bound, not CPU.

If you *really* want to aggressively replace long runs of zeros with holes in existing files (i.e., make them sparser), a background userspace scrubber could be employed, although doing it in place without forcing a copy (new inode) is tricky. At least some Linux filesystems have, or will have, the ability to "punch holes":

http://permalink.gmane.org/gmane.comp.file-systems.xfs.ge...
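For contrast with in-place hole punching, here is a rough sketch of the simpler copy-based approach, the one that does force a new inode. The name `sparse_copy` and the 4096-byte block size are assumptions. It seeks over all-zero blocks instead of writing them; whether the skipped ranges actually end up as holes depends on the destination filesystem.

```python
import os

def sparse_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst, seeking past all-zero blocks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Skip ahead without writing; the gap reads back as zeros.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
        # Set the final size explicitly in case the file ends in a skipped run.
        dst.truncate()
```

The copy reads back byte-for-byte identical to the source; only the on-disk allocation may differ.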

