Inserting a hole into a file
Last March, we looked at a proposal for a new fallocate() option to collapse a range of blocks within a file. The FALLOC_FL_COLLAPSE_RANGE flag was added to the 3.15 kernel; its counterpart, FALLOC_FL_INSERT_RANGE, has been proposed by the same developer: Namjae Jeon. It would provide a way to open up a range of blocks within a file, without requiring an expensive data copy.
The example use case that Jeon has used for both new flags is the removal (using FALLOC_FL_COLLAPSE_RANGE) or insertion (using FALLOC_FL_INSERT_RANGE) of advertisements into large video files. While that particular example may not resonate with everyone, there are other uses for quickly removing and inserting chunks of data in the middle of large files. For example, doing non-linear editing on various types of media (video, in particular) may benefit from reducing the amount of data copying needed. The requirement that the ranges be block-aligned, though, could limit the overall usefulness of both flags.
The fallocate() system call provides a means for programmers to alter the allocation of blocks for a file—essentially to give the filesystem more information about the programmer's plans for the file so that better allocation decisions can be made. Over time, additional features have been added to fallocate(), including the ability to punch holes in or to zero-out ranges of a file.
There are quite a few similarities between FALLOC_FL_INSERT_RANGE and FALLOC_FL_COLLAPSE_RANGE. Each must be the only flag passed to fallocate() (other operations allow multiple flags to be ORed together), both require that the specified offset and length be multiples of the filesystem's logical block size, and both are implemented only for XFS and extent-based ext4 filesystems. They are also restricted to working within the existing file, so the range covered by offset + length must not stretch beyond the current end of file (EOF).
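A minimal C sketch of calling the new mode under the constraints just listed. The helpers insert_range_args_ok() and insert_hole() are hypothetical, and the fallback value 0x20 for FALLOC_FL_INSERT_RANGE is the one linux/falloc.h uses in kernels that include the flag. The user-space pre-check only rejects unaligned values and an offset at or past EOF; the stricter offset + length rule described above is left for the kernel to enforce, and the caller is assumed to supply the filesystem block size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for fallocate() and the FALLOC_FL_* flags */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

#ifndef FALLOC_FL_INSERT_RANGE
#define FALLOC_FL_INSERT_RANGE 0x20  /* value used by linux/falloc.h */
#endif

/* Hypothetical user-space pre-check. blksize must be the filesystem's
 * block size; how to obtain it is left to the caller. */
static bool insert_range_args_ok(off_t offset, off_t len,
                                 off_t blksize, off_t file_size)
{
    assert(blksize > 0);
    if (offset < 0 || len <= 0)
        return false;
    if (offset % blksize != 0 || len % blksize != 0)
        return false;            /* both must be block-aligned */
    if (offset >= file_size)
        return false;            /* insertion point must lie inside the file */
    return true;
}

/* Hypothetical wrapper: FALLOC_FL_INSERT_RANGE is passed alone, since it
 * cannot be ORed with other flags. Returns 0, or -1 with errno set. */
static int insert_hole(int fd, off_t offset, off_t len,
                       off_t blksize, off_t file_size)
{
    if (!insert_range_args_ok(offset, len, blksize, file_size)) {
        errno = EINVAL;
        return -1;
    }
    return fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len);
}
```

On a filesystem without support for the mode, fallocate() should fail with EOPNOTSUPP, so a caller could fall back to copying the data itself.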
For inserting a range, the basic algorithm is the same for both XFS and ext4. Once the offset and length parameters are validated (i.e., block-aligned and not past EOF), the file size is increased by the length. The extent containing the logical block number for offset is then examined to see whether that block number is the first in the extent. If not, the extent is split so that a new extent starts with the block number corresponding to offset. Then, starting with that extent, all extents from there to EOF are shifted toward the end of the file (i.e., to the right) by the length, which leaves behind a hole of the specified length at offset.
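The extent manipulation described above can be modeled in a few lines of C. This is a toy sketch, not XFS or ext4 code: the struct extent type and insert_range() function are hypothetical, extents live in a flat sorted array rather than a tree, offsets and lengths are counted in blocks, and parameter validation and the file-size update are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy extent: 'len' logical blocks starting at 'lblk' map to physical
 * blocks starting at 'pblk'. Not a real on-disk format. */
struct extent {
    uint64_t lblk;
    uint64_t pblk;
    uint64_t len;
};

/* Split the extent containing block 'off' (if 'off' is not its first
 * block), then shift that extent and every later one right by 'len'
 * blocks. 'ext' is sorted by lblk and has room for 'cap' entries.
 * Returns the new extent count. */
static size_t insert_range(struct extent *ext, size_t n, size_t cap,
                           uint64_t off, uint64_t len)
{
    size_t i = 0;
    /* Skip extents that end at or before 'off'. */
    while (i < n && ext[i].lblk + ext[i].len <= off)
        i++;

    if (i < n && ext[i].lblk < off) {
        /* 'off' falls inside ext[i]: split it in two at 'off'. */
        assert(n < cap);
        for (size_t j = n; j > i + 1; j--)
            ext[j] = ext[j - 1];
        uint64_t head = off - ext[i].lblk;
        ext[i + 1].lblk = off;
        ext[i + 1].pblk = ext[i].pblk + head;
        ext[i + 1].len = ext[i].len - head;
        ext[i].len = head;
        n++;
        i++;
    }

    /* Shift every extent from 'off' to EOF; physical blocks stay put,
     * so no file data is copied. */
    for (size_t j = i; j < n; j++)
        ext[j].lblk += len;
    return n;
}
```

For a file mapped by extents covering logical blocks 0-7 and 8-11, inserting two blocks at block 4 splits the first extent into 0-3 and 6-9 and moves the second to 10-13, with physical block numbers unchanged; blocks 4 and 5 are left unmapped, and that gap is the hole.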
Once that is done, callers can fill that hole by writing whatever data they want into it—hopefully not just ads. Reading from that region before writing to it will return zeroes, as with other holes punched in files.
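From a reader's point of view, the result is as if the tail of the file had been moved later and the gap zero-filled. The hypothetical model_insert_range() below reproduces only that visible effect on an in-memory buffer; it deliberately uses memmove(), which is exactly the data copy the real operation avoids by remapping extents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-level model of what a reader sees after an insert range: bytes
 * from 'off' onward move 'len' bytes later and the gap reads as zeroes.
 * 'buf' must have room for size + len bytes. Returns the new size. */
static size_t model_insert_range(unsigned char *buf, size_t size,
                                 size_t off, size_t len)
{
    assert(off <= size);
    memmove(buf + off + len, buf + off, size - off);  /* shift the tail */
    memset(buf + off, 0, len);                         /* the new hole */
    return size + len;
}
```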
Beyond the changes to the kernel filesystem layer (which are minimal) and to XFS and ext4 (which are more extensive), Jeon has also added a number of test cases to xfstests. There are simple tests of the insert range feature, as well as more complicated tests that do multiple inserts, or inserts coupled with collapse operations, to try to stress both of these features. In addition, he has added support for a "finsert" command to the xfs_io program from xfsprogs.
Jeon's patch set is up to version 8 at this point; there have been lots of suggestions for changes along the way, but little in the way of fundamental opposition. Given that the collapse range capability was added, it would seem likely that insert range will follow along before too long.
Index entries for this article | |
---|---|
Kernel | fallocate() |
Posted Jan 22, 2015 9:32 UTC (Thu)
by bokr (guest, #58369)
Posted Jan 22, 2015 12:36 UTC (Thu)
by mtanski (guest, #56423)
Posted Jan 22, 2015 15:53 UTC (Thu)
by jtaylor (subscriber, #91739)
What I'd still love now is a way to dump one file into a newly created hole in another file without needing any copies. Though I assume this is probably tricky, as it would depend on the capabilities of the filesystem, e.g. whether it can do copy-on-write or not.
Posted Jan 23, 2015 8:28 UTC (Fri)
by jezuch (subscriber, #52988)
rsync! Unfortunately the requirement for block alignment is severely limiting that particular use case.
Posted Feb 4, 2015 13:14 UTC (Wed)
by nye (subscriber, #51576)
I feel like this implementation is unlikely to gain much traction in any but the most specialised of environments, because the abstraction leaks too badly.
The requirement for an application to understand things like the block size of the underlying filesystem seems like a textbook example of unwarranted chumminess with the implementation, and I'm honestly surprised the feature hasn't seen more pushback, especially given the experience with things like fanotify, where a feature was added that appeared to be conceptually general-purpose but actually had specific shortcomings that limit it to a handful of use cases.
Posted Feb 5, 2015 18:31 UTC (Thu)
by flussence (guest, #85566)
Posted Feb 9, 2015 11:09 UTC (Mon)
by nye (subscriber, #51576)
But FLAC can't know at creation time that the metadata will only ever be edited on a filesystem/OS combo that supports this feature, so it's unlikely to make the assumption that the pre-allocation is not needed.
This is a part of what it means to be too specialised: entirely non-portable, relying on assumptions that can be broken by moving a file even within one system. That doesn't necessarily have to be a bad thing, and it's not that I think this is a *bad* feature, just that in practice it will have to be used transparently and opportunistically as an internal operation, and probably by applications that either don't produce an output file, or already have a final 'rendering' pass or similar, where they can take the opportunity to defragment the result while they're at it.
If it didn't rely on the ability to safely leave small holes in the final files, it would be a lot easier for programs to use the API opportunistically where supported, but since it's not transparent I believe those opportunities will be few and far between.
Posted Jan 22, 2015 17:54 UTC (Thu)
by nybble41 (subscriber, #55106)
What is the logic behind this restriction when it comes to *inserting* new blocks into a file? As long as the offset is valid, it seems like one should be able to insert any number of blocks.
Interaction with mmap?
very useful
Video editing is an obvious example, but there are also uses for plenty of other applications. E.g. in scientific environments, the need to put something into the middle of large files comes up often.
Many commonly used data formats allow for padding between logical sections so that block alignment requirement is not a big problem.
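As a rough sketch of the padding approach, assuming the format can absorb filler bytes and that the insertion offset is already block-aligned, the amount of padding needed is simple arithmetic. The helper names are hypothetical, and the caller is assumed to know the filesystem block size (4096 bytes below is only an example).

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of the block size. */
static uint64_t round_up_to_block(uint64_t n, uint64_t blksize)
{
    assert(blksize > 0);
    return (n + blksize - 1) / blksize * blksize;
}

/* Hypothetical helper: filler bytes needed after 'payload' bytes of new
 * data so that the inserted range is a whole number of blocks. */
static uint64_t padding_for_insert(uint64_t payload, uint64_t blksize)
{
    return round_up_to_block(payload, blksize) - payload;
}
```

For example, inserting 5000 bytes of new data on a filesystem with 4096-byte blocks would mean inserting an 8192-byte range and padding the remaining 3192 bytes.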