Copy on write has huge benefits in space savings, not just in disk space, but more importantly in memory, particularly for virtualized systems. For example, the vserver project already implements a solution to this which allows many virtual servers to share the same files securely. This means that if you have 1000 servers running the same copy of apache, not only can you have only one copy on disk, but the kernel will also only keep one copy in memory (of the shared stuff like program text, of course). While you could achieve a similar sharing with hard links, this would be less secure since a breach in one system would allow the file to be modified in all the other systems. With COW, this is avoided.
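To make the hard link point concrete, here is a minimal Python sketch (not taken from vserver; the file names and the `compare_sharing` helper are made up for illustration). It contrasts a hard link, where an in-place write by one "guest" shows up in the host's file, with an independent copy, which is what a COW link is meant to behave like from the writer's point of view. It does not demonstrate the space sharing itself.

```python
import os
import shutil
import tempfile


def compare_sharing(workdir):
    """Show how a hard link and a plain copy react to a write by one 'guest'.

    All file names here are hypothetical; they only stand in for a shared binary.
    """
    host = os.path.join(workdir, "host_binary")
    with open(host, "w") as f:
        f.write("original")

    # Guest A gets a hard link: same inode, same data blocks.
    linked = os.path.join(workdir, "guest_a_binary")
    os.link(host, linked)

    # Guest B gets an independent copy: a separate inode.
    copied = os.path.join(workdir, "guest_b_binary")
    shutil.copyfile(host, copied)

    # Guest B rewrites its copy; the host's file is untouched.
    with open(copied, "w") as f:
        f.write("upgraded")
    with open(host) as f:
        host_after_copy_write = f.read()

    # Guest A rewrites its hard link in place; the host's file changes too.
    with open(linked, "w") as f:
        f.write("tampered")
    with open(host) as f:
        host_after_link_write = f.read()

    return {
        "host_after_copy_write": host_after_copy_write,
        "host_after_link_write": host_after_link_write,
        "link_shares_inode": os.path.samefile(host, linked),
        "copy_shares_inode": os.path.samefile(host, copied),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in compare_sharing(d).items():
            print(key, "=", value)
```

A COW link aims for the isolation of the copy case while still sharing the data blocks until one side writes.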
Posted May 5, 2009 21:29 UTC (Tue) by flewellyn (subscriber, #5047)
I see. That IS beneficial. Thanks very much.
The two sides of reflink()
Posted May 6, 2009 14:25 UTC (Wed) by wilreichert (subscriber, #17680)
How is this different from deduplication at the filesystem level?
The two sides of reflink()
Posted May 6, 2009 15:09 UTC (Wed) by dlang (✭ supporter ✭, #313)
It sounds like it's one mechanism to use for deduplication.
The two sides of reflink()
Posted May 6, 2009 17:03 UTC (Wed) by elanthis (guest, #6227)
To the filesystem, a cp isn't a copy -- it's one process reading from one file and writing to another. Figuring out that that is supposed to be a copy is very non-trivial and expensive, especially when taking into account metadata operations which aren't part of the regular file stream. I'm not sure it's even plausible to do without a second pass, e.g. a "combine files" daemon, which would still just be extra overhead.
If on the other hand cp says "this is a copy" to the kernel then the filesystem can just do the right thing. Of course, other applications will need to be modified to take advantage of the new feature, but such is the truth of most progress.
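As a rough illustration of a copy tool saying "this is a copy" to the kernel, here is a Python sketch. It does not use the reflink() call discussed in this thread; as an assumption, it uses the Linux `FICLONE` ioctl as a stand-in for the same idea, and falls back to an ordinary read/write copy when the filesystem refuses. The `clone_or_copy` helper is hypothetical, and the hard-coded request number is the common Linux encoding, which may differ on some architectures.

```python
import fcntl
import shutil

# Assumption: FICLONE as encoded on common Linux architectures,
# _IOW(0x94, 9, int). Some architectures encode ioctl numbers differently.
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Ask the filesystem to share src's blocks with dst; else copy the bytes.

    Returns "cloned" if the ioctl succeeded, "copied" if we fell back.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Tell the kernel that dst should be a copy-on-write clone of src.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "cloned"
        except OSError:
            # Filesystem without clone support, cross-device copy, etc.
            pass
    # Fallback: the traditional copy, one file read and another written.
    shutil.copyfile(src, dst)
    return "copied"
```

Either way the caller ends up with a `dst` that can be modified without affecting `src`; only the cloned case avoids duplicating the data up front.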
The two sides of reflink()
Posted May 7, 2009 21:03 UTC (Thu) by anton (guest, #25547)
For shared stuff like program text, all servers could use the same
binaries (through mount, mount -t bind, or hard links), so that's not
a good justification for reflinks, either (and if you don't trust the
other servers not to write to the file, why would you trust them with
access to the device at all?). Writable files that would mostly or
completely be the same on both VMs would be a better example, but no
concrete example comes to my mind.
The two sides of reflink()
Posted May 7, 2009 21:13 UTC (Thu) by martinfick (subscriber, #4455)
"why would you trust them with access to the device at all?"
You don't; usually the host system mounts a portion of the filesystem into a separate chroot for each guest server. The guests then typically have a limited root capability that does not include making device nodes, so they really do not have access to the device, only the filesystem.
The two sides of reflink()
Posted May 10, 2009 18:42 UTC (Sun) by anton (guest, #25547)
"The guests typically then have a limited root capability
that does not include making device nodes so they really do not have
access to the device, only the filesystem."
With the limits on the root capabilities, the binaries can surely be
made read-only even for the guest roots, so no reflinks are needed for
the binaries.
The two sides of reflink()
Posted May 10, 2009 19:09 UTC (Sun) by martinfick (subscriber, #4455)
"The guests typically then have a limited root capability that does not
include making device nodes so they really do not have access to the
device, only the filesystem."
"With the limits on the root capabilities, the binaries can surely be made
read-only even for the guest roots, so no reflinks are needed for the
binaries."
Sure, but if you make the binaries read-only you no longer have
independent guest systems that can be administered without knowledge of
the host or other guests. In other words, if I now want to upgrade the
apache server in one guest, I can't, since the binary is read-only to my
guest root user. With COW, no problem: as a guest admin I do not even
know that my apache binary is shared with others. It is only relevant to
the host (the host unifies the various guest binaries, not the guest).