
Avoiding disk-full problems

Posted Jun 1, 2012 11:53 UTC (Fri) by pjm (subscriber, #2080)
Parent article: Atime and btrfs: a bad combination?

Methinks it's not the grep that uses the 2.2GB, but the snapshot operation. Perhaps doing a snapshot should require enough space for rewriting the inodes, and should fail with ENOSPC if it isn't available. Surely that's better than grep or ls failing with ENOSPC.


Avoiding disk-full problems

Posted Jun 1, 2012 12:37 UTC (Fri) by ablock (guest, #84933)

That would destroy the idea behind snapshots. A snapshot should always be extremely cheap if you don't change anything. If we reserve the space at creation time, the whole idea of snapshots gets lost.

Avoiding disk-full problems

Posted Jun 1, 2012 15:07 UTC (Fri) by drag (subscriber, #31333)

Yes. I want to be able to over-allocate disk space.

Avoiding disk-full problems

Posted Jun 1, 2012 17:21 UTC (Fri) by faramir (subscriber, #2327)

If I understand how this works, it is only the most recent snapshot for which this "massive reads generate massive writes" event can happen. So any preallocation only has to exist for the most recent snapshot.

As to snapshots being "cheap", that can refer to two different things: storage requirements OR execution time. Preallocation might not be cheap in terms of storage, but it should be fairly cheap in execution time.

If I'm right about only needing preallocation for the most recent snapshot, you might be able to just hand off the preallocated space from snapshot to snapshot, reducing the execution cost. As for reserving space for this, it would be kind of like the X% reserved for "root" which many filesystems have (or had; that was often actually done to reduce fragmentation). The preallocation here would be to preserve functionality (i.e. working atimes) even when the filesystem was "full".

Now as to why BTRFS needed such a large amount of space relative to the size of the filesystem in the example: that would seem to be an issue with the design of BTRFS and how it interacts with traditional Unix filesystem functionality. As others have suggested, storing atimes separately (perhaps just for the "trunk") might work. Perhaps better would be to give the trunk a "current" atime allocation in addition to the standard one: updates would go both to the current data structure (after copying) and to the atime-only structure, up until the disk was full. OTOH, this is getting complicated. Not something to figure out in a couple of minutes.

Avoiding disk-full problems

Posted Jun 2, 2012 18:52 UTC (Sat) by drag (subscriber, #31333)

> As to snapshots being "cheap", that can refer to two different things: storage requirements OR execution time.

I think it refers to _both_.

Plus it's tough to pre-allocate when you have no idea how much space you are actually going to end up using.

Avoiding disk-full problems

Posted Jun 1, 2012 18:04 UTC (Fri) by cwillu (guest, #67268)

It's the changed metadata from the atime update, just like the article says. The snapshot requires negligible space to complete.

Avoiding disk-full problems

Posted Jun 2, 2012 0:48 UTC (Sat) by droundy (subscriber, #4559)

I'd prefer to see snapshotting able to generate read-only noatime filesystems (or segments of a filesystem). The primary use cases of snapshots don't require the ability to make modifications, and keeping the old atime could actually be more useful than updating it if the snapshot is being kept for archival purposes (assuming there is some useful information in the atime to start with).

Avoiding disk-full problems

Posted Jun 7, 2012 6:10 UTC (Thu) by butlerm (guest, #13312)

It is not the snapshot that is creating the problem. As a rule, the atime values for the snapshot are frozen when the snapshot is taken. The problem is that the snapshot and the trunk inodes initially share the same storage on the disk, so when the atime for all the trunk inodes is updated, a completely new copy of each inode must be created, and the old versions are not freed because they are part of the snapshot.
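The sharing-then-copying behaviour described above can be sketched with a toy model (this is an illustration of the copy-on-write idea, not btrfs internals; all names are hypothetical):

```python
# Toy model: snapshot and trunk initially share inode blocks; the first
# atime update to a shared inode forces a copy-on-write, and the old
# block stays allocated because the snapshot still references it.

class ToyFs:
    def __init__(self, n_inodes):
        # Each inode lives in its own block to keep the arithmetic simple.
        self.trunk = {i: "blk%d" % i for i in range(n_inodes)}
        self.snapshot = dict(self.trunk)   # cheap: shares the same blocks
        self.allocated = n_inodes          # blocks currently in use

    def read(self, inode):
        # Reading updates atime, which modifies the inode; if the
        # snapshot still points at the same block, copy it first.
        if self.trunk[inode] == self.snapshot.get(inode):
            self.trunk[inode] = "blk%d-copy" % inode
            self.allocated += 1            # old block pinned by the snapshot

fs = ToyFs(1000)
for i in range(1000):        # something like "grep -r" touching every file
    fs.read(i)
print(fs.allocated)          # 2000: every shared inode block was duplicated
```

A second pass over the same files allocates nothing further, which matches the observation that only the first read after a snapshot is expensive.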

Avoiding disk-full problems

Posted Jun 7, 2012 8:33 UTC (Thu) by dgm (subscriber, #49227)

That's something one should consider when making snapshots because, eventually, each snapshot _can_ come to duplicate the whole of the data.

People tend to forget that and think that COW is cheaper than a full copy in terms of space. It is only cheaper initially, and only as long as the data remains unchanged.
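The worst case is easy to put in numbers (a back-of-the-envelope sketch with made-up block counts, not btrfs accounting):

```python
# A snapshot costs nothing at creation, but every block the live
# filesystem later rewrites leaves its old version pinned by the
# snapshot, up to a full duplicate of the data.

def snapshot_overhead(total_blocks, rewritten_blocks):
    # Blocks held solely on behalf of the snapshot.
    return min(rewritten_blocks, total_blocks)

print(snapshot_overhead(1000, 0))      # 0: "free" at creation time
print(snapshot_overhead(1000, 250))    # 250 blocks kept only for the snapshot
print(snapshot_overhead(1000, 1000))   # 1000: a complete second copy
```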

Avoiding disk-full problems

Posted Jun 8, 2012 3:09 UTC (Fri) by pjm (subscriber, #2080)

We all agree on physically what's happening, and I'm sure we agree that in truth it's not just reading or snapshotting by itself that uses extra space, it's the combination of a read and a preceding snapshot.

The only question is what to do about the possibility of there not being enough space to rewrite the inode. Some possibilities include:

  • Return ENOSPC on read. (The undesirable prospect alluded to in the article.)
  • Let the read go ahead but don't update the atime (even the in-memory atime?) if there's no space left. (I gather that this is the current solution.)
  • Let the read go ahead but scribble over the snapshot's atime.
  • Exclude atimes from snapshots. (What does that mean? I.e. what atime do people see when doing ls -ltu in the snapshot?)
  • Laptop mode (lossy atimes): Never initiate a write just for the sake of updating an on-disk atime, but still copy the in-memory atime to disk if we're writing the inode for some other reason.
  • Never store atime on disk in the first place, but still have accesses update the in-memory atime, like in romfs, cramfs etc. (What value would the in-memory atime get initialized to when reading the inode from disk? 1970, or some function of ctime and mtime?)
  • Mandatory noatime: the atime that stat(2) sees (and hence find, ls, mutt etc.) is just the creation time.
  • Reserve enough space for atime to be reliable. E.g. have the superblock record the number of inodes that we are "in debt": initially 0 at filesystem creation; a snapshot sets it to the (then-current) number of inodes, and a copy-on-write of an inode decreases it by one. This debt is tied to the amount of free space left, influencing whether an allocation or snapshot operation returns ENOSPC. Snapshotting is still a cheap operation, both in time (no immediate write necessary, and only one or two integers in the superblock to update in write-behind) and in disk space: a million snapshots a year still only require as much disk space as the writes that occur between snapshots, with the difference that we also reserve space for inode writes to occur in the future. This is a one-off reservation: there's no additional cost between one snapshot a year and one million.
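The accounting in that last option could look something like this (my sketch of the scheme as described above; the class and field names are invented, and inode copies are one block each for simplicity):

```python
# Sketch of the "inode debt" reservation: the superblock tracks how
# many inodes are still shared with the latest snapshot and so may
# need a future copy-on-write; free-space checks subtract that debt.

INODE_SIZE = 1  # blocks per inode copy, for simplicity

class ToySuperblock:
    def __init__(self, total_blocks, n_inodes):
        self.free = total_blocks
        self.n_inodes = n_inodes
        self.debt = 0   # inode copies we may still owe

    def effective_free(self):
        # Space available to ordinary allocations after the reservation.
        return self.free - self.debt * INODE_SIZE

    def snapshot(self):
        # Cheap in time: just reset a counter -- but refuse if we can't
        # guarantee space for rewriting every shared inode later.
        if self.free < self.n_inodes * INODE_SIZE:
            raise OSError("ENOSPC")
        self.debt = self.n_inodes

    def cow_inode(self):
        # First write to a shared inode: consume a block, retire one
        # unit of debt, so effective_free() is unchanged.
        self.free -= INODE_SIZE
        if self.debt:
            self.debt -= 1

sb = ToySuperblock(total_blocks=100, n_inodes=60)
sb.snapshot()
print(sb.effective_free())   # 40: still usable by ordinary writes
sb.cow_inode()
print(sb.effective_free())   # still 40: the write was pre-paid by the debt
```

Note that taking another snapshot just resets the debt to the inode count rather than adding to it, which is why the reservation is one-off regardless of snapshot frequency.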

I don't want to advocate one solution over another, and I'm pretty happy with what I'm told is the current approach; I'm just listing some of the options.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds