Making FIEMAP and delayed allocation play well together
The cp application, it seems, has recently been taught to use FIEMAP to find holes in files. The idea is to optimize the copying of such files by not even reading the holes; that way, the need to zero-fill pages (in the kernel) and compare them against pages full of zeros (in user space) can be eliminated. It seems like a better way of doing things.
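To make the optimization concrete, here is a minimal Python sketch of an extent-driven copy. It is not the cp implementation: the helper names `holes_from_extents` and `sparse_copy` are hypothetical, and the extent list is assumed to have come from an extent-mapping call such as FIEMAP, expressed as (offset, length) pairs.

```python
import os


def holes_from_extents(extents, file_size):
    """Return (offset, length) gaps in [0, file_size) not covered by any extent."""
    holes = []
    pos = 0
    for logical, length in sorted(extents):
        if logical > pos:
            holes.append((pos, logical - pos))
        pos = max(pos, logical + length)
    if pos < file_size:
        holes.append((pos, file_size - pos))
    return holes


def sparse_copy(src_path, dst_path, extents, chunk=1 << 20):
    """Copy only the byte ranges named in extents; everything else stays a hole.

    Assumption: extents is a list of (offset, length) pairs that the caller
    obtained from the filesystem. Ranges not listed are never read.
    """
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for logical, length in sorted(extents):
            end = min(logical + length, size)
            pos = logical
            src.seek(pos)
            dst.seek(pos)  # seeking past the end of dst leaves a hole behind
            while pos < end:
                buf = src.read(min(chunk, end - pos))
                if not buf:
                    break
                dst.write(buf)
                pos += len(buf)
        # Extend the destination to full length in case the file ends in a hole.
        dst.truncate(size)
```

A copy of this kind trusts the extent list completely: any written range missing from the list comes out as zeros in the destination.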
Somewhere along the way, Chris Mason got word that cp was corrupting files on btrfs filesystems. The problem, naturally enough, was that FIEMAP was reporting holes where none should exist. The root cause was that FIEMAP was not prepared to deal with regions of a file which have been written to, but which do not actually have blocks assigned yet. The delayed allocation mechanism used by most contemporary filesystems will create exactly that kind of situation, so this is not a theoretical concern.
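As an illustration of what a careful FIEMAP caller can look for, the sketch below issues the ioctl from Python and decodes the result. The constants and structure layouts are taken from the kernel's `linux/fs.h` and `linux/fiemap.h`; the function names (`parse_fiemap`, `fiemap`, `data_ranges`) are hypothetical. It requests `FIEMAP_FLAG_SYNC`, which asks the kernel to sync the file before mapping, and it treats any extent flagged `FIEMAP_EXTENT_DELALLOC` as data rather than as a hole. This is one defensive approach under those assumptions, not a description of Chris's fix or of what cp does.

```python
import fcntl
import struct

# Values from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1            # sync the file before mapping
FIEMAP_EXTENT_LAST = 0x1          # last extent in the file
FIEMAP_EXTENT_UNKNOWN = 0x2       # physical location not known
FIEMAP_EXTENT_DELALLOC = 0x4      # data exists, blocks not yet allocated

# struct fiemap header: start, length, flags, mapped_extents, extent_count, reserved
HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: logical, physical, length, reserved64[2], flags, reserved[3]
EXTENT = struct.Struct("=QQQQQIIII")


def parse_fiemap(buf):
    """Decode a filled-in fiemap buffer into (logical, physical, length, flags) tuples."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        f = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents


def fiemap(fd, max_extents=128, sync=True):
    """Map the whole file with a single ioctl call.

    A real tool would loop, advancing the start offset, until it sees an
    extent carrying FIEMAP_EXTENT_LAST; this sketch makes one call only.
    """
    flags = FIEMAP_FLAG_SYNC if sync else 0
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, flags, 0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # raises OSError if FIEMAP is unsupported
    return parse_fiemap(buf)


def data_ranges(extents):
    """Return (offset, length) for every extent, delayed-allocation ones included.

    A delalloc extent has no physical blocks yet, but it still holds file data.
    """
    return [(logical, length) for logical, _phys, length, _flags in extents]
```

The result of `data_ranges(fiemap(fd))` is only as good as the extents the filesystem reports; if a filesystem omits delayed-allocation regions entirely, as in the bug described above, no amount of flag checking in user space will recover them.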
Chris fixed the problem for btrfs, then decided to see how other filesystems handled the same situation. From his report, xfs handled things well, but ext4 had similar bugs in situations where delayed allocation and real holes came together in the same file. Certain types of bugs, it seems, are likely to turn up in more than one context.
Chris's fix should get into 2.6.38 before the final release; chances are good that an ext4 fix will be fast-tracked as well. Expect stable kernel backports too. In the meantime, be careful when copying recently-written files with new versions of cp on those filesystems.
Index entries for this article | |
---|---|
Kernel | FIEMAP ioctl() |
Posted Feb 24, 2011 15:26 UTC (Thu) by dberkholz (guest, #23346)
Posted Feb 24, 2011 21:20 UTC (Thu) by dougg (guest, #1894)
Posted Feb 24, 2011 21:26 UTC (Thu) by corbet (editor, #1)
Posted Feb 24, 2011 21:57 UTC (Thu) by razb (guest, #43424)
Posted Feb 25, 2011 16:32 UTC (Fri) by nix (subscriber, #2304)
Posted Feb 24, 2011 22:19 UTC (Thu) by dougg (guest, #1894)
Anyway, I have a different angle. Will FIEMAP work when a file is opened with O_DIRECT? What about when the file is a partition or a disk (with or without O_DIRECT)? When a SCSI disk is opened with O_DIRECT, the FIEMAP ioctl could map through to the SCSI GET LBA STATUS command. Most likely I'm just dreaming.
An extent is a group of blocks in a file laid out contiguously on disk by the filesystem. It's a filesystem concept, which is what is needed to answer your questions. O_DIRECT shouldn't change anything. If your file descriptor is for a partition or a block device, there's no filesystem, so FIEMAP will make no sense. And FIEMAP cannot possibly map to a low-level SCSI operation, since there is no filesystem knowledge at that level.