
The two sides of reflink()

Posted May 6, 2009 8:15 UTC (Wed) by amw (subscriber, #29081)
Parent article: The two sides of reflink()

Why should my reflinked file be counted against my quota at all? I'm not using any more storage (at least not initially).


The two sides of reflink()

Posted May 6, 2009 8:52 UTC (Wed) by lmb (subscriber, #39048)

The answer to the quota question points out that this call is only the first step.

There is, of course, also a need for reflinkstat() or whatever it is going to be called - one needs to find out how many blocks are part of a specific COW, and also reflinkdiff(), which enumerates the (meta-)data blocks which differ from the link target (needed for efficient rsync/backup).

Then, it also becomes possible to use quota to account just the difference, i.e. the actual space used by the reflink.

A further complication arises when we look at using reflink() on directories, which would of course also be quite desirable (for snapshots, multi-use chroots etc). That will be an interesting direction to explore ;-)

The two sides of reflink()

Posted May 9, 2009 23:52 UTC (Sat) by butlerm (guest, #13312)

ZFS (for example) uses a COW design where it isn't remotely practical to
figure out which files / snapshots share which blocks in a reasonable amount
of time. I imagine BTRFS is similar.

However both aforementioned filesystems store block checksums, so perhaps a
more practical means would be to add an interface that returns the block
checksums of a file range (if not the block offsets) to user space, and
generate a candidate duplication list from that.
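A rough user-space sketch of the "candidate duplication list" idea follows. It assumes the checksums have to be computed by hand with SHA-256 over fixed 4096-byte blocks; the interface suggested above would instead hand back checksums the filesystem already stores. The names `block_checksums` and `candidate_duplicates` are made up for illustration.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_checksums(data, block_size=BLOCK_SIZE):
    """Yield (block_index, checksum) for each block of a bytes object."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield offset // block_size, hashlib.sha256(block).hexdigest()


def candidate_duplicates(files):
    """Return {checksum: [(name, block_index), ...]} for checksums seen more than once."""
    seen = defaultdict(list)
    for name, data in files.items():
        for index, digest in block_checksums(data):
            seen[digest].append((name, index))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}


# Two in-memory "files" that share their first block only.
files = {
    "a": b"A" * 4096 + b"B" * 4096,
    "b": b"A" * 4096 + b"C" * 4096,
}
dups = candidate_duplicates(files)
for digest, locations in dups.items():
    print(digest[:12], locations)
```

Matching checksums only make blocks candidates; whether they are really shared on disk (or merely identical) is a separate question that this sketch does not answer.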

The two sides of reflink()

Posted May 6, 2009 9:02 UTC (Wed) by epa (subscriber, #39769)

Because if you make a reflink to a file, and your quota is unaffected, you might later find that when simply changing the contents of the reflink (without making it any larger) your quota is exhausted. Userspace really doesn't expect that a seemingly harmless operation like changing one byte in the middle of a file could suddenly exhaust quota or free disk space.

That said, it makes no sense to account disk quota conservatively while lying about the amount of free space really available. The two should be treated the same, so if reflinking a large file has no effect on the reported free space, it shouldn't cost quota either.
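To make the "changing one byte" case concrete, here is a toy copy-on-write model. It is not how any real filesystem is implemented; `CowStore`, the 4-byte block size, and the reference-count layout are all assumptions chosen to keep the example small.

```python
BLOCK = 4  # tiny block size so the example stays readable


class CowStore:
    """Toy block store with per-block reference counts."""

    def __init__(self):
        self.data = {}   # block id -> bytes
        self.refs = {}   # block id -> number of files referencing it
        self._next = 0

    def _alloc(self, payload):
        bid = self._next
        self._next += 1
        self.data[bid] = payload
        self.refs[bid] = 1
        return bid

    def create(self, content):
        """Store content as a list of freshly allocated blocks."""
        return [self._alloc(content[i:i + BLOCK])
                for i in range(0, len(content), BLOCK)]

    def reflink(self, blocks):
        """Share every block with a new file; nothing is allocated."""
        for bid in blocks:
            self.refs[bid] += 1
        return list(blocks)

    def write_byte(self, blocks, offset, value):
        """Overwrite one byte, copying the block first if it is shared."""
        index, within = divmod(offset, BLOCK)
        bid = blocks[index]
        payload = bytearray(self.data[bid])
        payload[within] = value
        if self.refs[bid] > 1:
            self.refs[bid] -= 1
            blocks[index] = self._alloc(bytes(payload))  # new space is needed here
        else:
            self.data[bid] = bytes(payload)

    def read(self, blocks):
        return b"".join(self.data[b] for b in blocks)

    def allocated(self):
        return len(self.data)


store = CowStore()
original = store.create(b"abcdefghijkl")   # 3 blocks
clone = store.reflink(original)            # still 3 blocks allocated
before = store.allocated()
store.write_byte(clone, 5, ord("X"))       # file length is unchanged
after = store.allocated()
print(before, after, store.read(original), store.read(clone))
```

The write does not grow the file, yet the store needs one more block afterwards, which is exactly the moment a quota or free-space check could fail.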

The two sides of reflink()

Posted May 6, 2009 9:22 UTC (Wed) by nix (subscriber, #2304)

Any userspace that does not expect write() in the middle of a file to
potentially fail with -ENOSPC is broken. Such write()s can fail even now
thanks to sparse files. It is true that currently userspace can rely on
the second write() in a ftell()/write()/fseek()/write() sequence not
failing, but this seems a rather thin thing to rely on, to me.

The two sides of reflink()

Posted May 6, 2009 10:42 UTC (Wed) by epa (subscriber, #39769)

Thanks for pointing this out. So if currently, creating a 10 gigabyte sparse file does not subtract 10 gigs from your quota nor from the free space report, giving the possibility that writing to the middle of an existing file can run out of space, then making a reflink to a 10 gig file should be treated the same way. There is precedent.
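The sparse-file precedent is easy to observe from user space. The sketch below assumes a Unix-like system where `st_blocks` is reported in 512-byte units and a filesystem that supports holes; the helper name `sparse_file_sizes` is made up, and 10 MiB stands in for the 10 gigabytes above.

```python
import os
import tempfile


def sparse_file_sizes(apparent_size):
    """Create a temporary sparse file; return (apparent bytes, allocated bytes)."""
    with tempfile.TemporaryFile() as f:
        f.truncate(apparent_size)      # extend the file without writing any data
        st = os.fstat(f.fileno())
        # Assumption: st_blocks counts 512-byte units, as on Linux.
        return st.st_size, st.st_blocks * 512


apparent, allocated = sparse_file_sizes(10 * 1024 * 1024)
print("apparent size:", apparent)
print("allocated    :", allocated)
```

On a filesystem with hole support the allocated figure should be far below the apparent size, so a later write into the middle of the file has to allocate blocks and can run out of space.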

The two sides of reflink()

Posted May 6, 2009 12:33 UTC (Wed) by vonbrand (guest, #4458)

Even worse, if I reflink() a file of yours, and you then change it (or delete it, whatever), suddenly my quota usage goes up without any action on my part.

quota behaviour of reflink()

Posted May 6, 2009 11:21 UTC (Wed) by pjm (subscriber, #2080)

One issue is what happens when one of the copies is removed, especially if there's more than one owner involved. Presumably the result must be that removing a file can increase someone else's quota usage. One wonders then what should happen if that other user has already exhausted their quota, and what bugs may be triggered by not expecting whichever policy is chosen.

The reasons for different quota behaviour would be if it results in different (and more desirable) user behaviour, or if it helps system administrators choose better quota limits, i.e. if it results in less frequent filesystem-full situations for a given amount of user productivity.

How much are quotas used these days, and for what purposes? Can people comment on the usefulness of different quota policies in the context of specific use cases?

As to whether different quota behaviour would result in different user behaviour (e.g. encouraging taking steps for files to be reflink'ed rather than copied), I wonder how many quota'd users would have the necessary knowledge for it to change their behaviour.

quota behaviour of reflink()

Posted May 6, 2009 12:47 UTC (Wed) by utoddl (subscriber, #1232)

Interesting points, but remember: quota can be impacted by unrelated actions by other processes owned by the same or other users at any time, so user space needs to respond to space-related issues regardless of what quotas existed "moments ago". In fact, quota can be affected when the user takes no actions at all; the admin can change quotas, file systems can be resized, etc., and a user who was not over quota may suddenly be so even with no changes to his files.

Space-reporting tools can report variations of (1) actual space used by extant allocated blocks (modulo sparse files), (2) space free, (3) space that would be used in the case where a naive copy were made to another file system -- all of which are valid and different numbers. The "simple" question of how much storage is used is in fact complicated, our desire for simple answers notwithstanding.
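As a small illustration of the three numbers, the sketch below pulls them from `os.stat()` and `os.statvfs()` on a Unix-like system. Treating `st_size` as the cost of a naive copy is an approximation, `st_blocks` is assumed to be in 512-byte units, and `space_report` is a made-up helper name.

```python
import os
import tempfile


def space_report(path):
    """Return three different answers to 'how much space?' for one file."""
    st = os.stat(path)
    vfs = os.statvfs(path)  # Unix-only
    return {
        # (1) space actually held by allocated blocks (assumes 512-byte units)
        "allocated_bytes": st.st_blocks * 512,
        # (2) space still free to unprivileged users on this filesystem
        "free_bytes": vfs.f_bavail * vfs.f_frsize,
        # (3) roughly what a naive copy to another filesystem would need
        "apparent_bytes": st.st_size,
    }


with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 1000)
    f.flush()
    report = space_report(f.name)

for key, value in report.items():
    print(key, value)
```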

The two sides of reflink()

Posted May 16, 2009 0:37 UTC (Sat) by efexis (guest, #26355)

There's lots of discussion of a similar issue known as "over committing", and also in the memory limiting cgroup code, where I think a lot can be learnt on different solutions to the problem.

One example - if you share somebody's file and it doesn't count as your own space, then the original owner deletes their copy, the space is now solely allocated to you, and so should count against your quota? This act could push you way over your quota; what if the file alone is bigger than your complete quota?

What if two people are sharing a file that you delete? Should what's on your quota be divided equally amongst the two of them? If you want to be fair, surely the thing to do when you share a file from someone else is add half of its size to your own quota, and remove half of it from theirs. If you share the file, you should share the cost?

If it goes straight onto your quota when you share it, you simply don't have to worry about any of these - but you also do at the same time lose certain benefits of sharing the data. With VMs, over-committing can often be specified as a %, maybe a similar option for sharing files... sharing a file could save you a certain % of its size from your quota, which then means if you suddenly become the sole owner, you only have added the remaining % rather than the full 100%. The % chosen would be linked to the average reference count of all blocks that you own, as this shows the likelihood of any block being or becoming solely yours.
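The three charging ideas above (full charge, equal split, percentage discount) can be compared with a little arithmetic. This is only a sketch of the accounting, with made-up function names and a single whole file standing in for per-block accounting; the 25% discount is an arbitrary example value.

```python
def charge(size, owners, policy, discount_percent=0):
    """Return the bytes charged to each owner of one shared file."""
    n = len(owners)
    if policy == "full":
        per_owner = size                      # everyone pays the whole size
    elif policy == "split":
        per_owner = size // n                 # cost shared equally
    elif policy == "discount":
        # Sharers save discount_percent; a sole owner pays in full.
        per_owner = size if n == 1 else size * (100 - discount_percent) // 100
    else:
        raise ValueError("unknown policy: " + policy)
    return {owner: per_owner for owner in owners}


def jump_on_sole_ownership(size, owners, survivor, policy, discount_percent=0):
    """How much the survivor's usage grows when every other owner deletes."""
    before = charge(size, owners, policy, discount_percent)[survivor]
    after = charge(size, [survivor], policy, discount_percent)[survivor]
    return after - before


size = 1000
owners = ["alice", "bob"]
for policy, pct in (("full", 0), ("split", 0), ("discount", 25)):
    shared = charge(size, owners, policy, pct)["alice"]
    jump = jump_on_sole_ownership(size, owners, "alice", policy, pct)
    print(policy, "charged while shared:", shared, "jump when bob deletes:", jump)
```

With these numbers the full charge never surprises the survivor, the equal split produces a jump of 500, and the 25% discount limits the jump to 250, at the cost of charging 750 while the file is shared.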

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds