Atime and btrfs: a bad combination?

By Jonathan Corbet
May 31, 2012

Unix and Unix-like systems have traditionally recorded the time of last access for each file in the system. This practice has fallen partially out of favor over the last decade for a simple reason: writing the last-accessed time ("atime") takes up a lot of I/O bandwidth when lots of files are being read; see this article from 2007, for example. The worst of the atime-related problems have long since been mitigated by moving to the "relatime" mount option by default; relatime only updates atime a maximum of once per day for unchanging files. But now it seems that atime recording can be especially problematic with the btrfs filesystem, and relatime may not help much.

One of the core design features of btrfs is its copy-on-write nature. Blocks on disk are never modified in place; instead, when it becomes necessary to commit a change, the affected block is copied and rewritten into a new location. Copy-on-write applies to metadata as well as data; if a file's metadata (such as its last-accessed time) is changed, the block containing that metadata will be copied to a new spot. So, on btrfs, an operation that reads a lot of files (creating a tar archive, say, or a recursive grep) can, through atime updates, cause the copying and rewriting of a lot of metadata blocks.

Needless to say, performance is not improved this way, but that is not where the big problem comes in. As Alexander Block pointed out, the real problem has to do with the interaction between atime, copy-on-write, and snapshots.

Btrfs provides a fast snapshotting feature that can create a copy of the state of the filesystem at a specific time. When a snapshot is created, it shares all data and metadata with the "trunk" filesystem. Should a file be changed, the resulting copy-on-write operation separates the trunk from the snapshot, keeping both versions of the data available. So snapshots can be thought of as being nearly free as long as the filesystem remains relatively quiet. Snapshots will share data and metadata, so they do not require a lot of additional space.

Atime updates change the situation, though. If somebody takes a snapshot of a filesystem, then performs a recursive grep on that filesystem, the last-access time of every file touched may be updated. That, in turn, can cause copy-on-write operations on each file's inode structure, with the result that many or all of the inodes in the filesystem may be duplicated. That can increase the space consumption of the filesystem considerably; Alexander posted an example where a recursive grep caused 2.2GB of free space to disappear. That is a surprising result for what is meant to be a read-only operation.
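The interaction can be illustrated with a toy model. The following is only a sketch, not btrfs code: it assumes one metadata block per inode, and the class and method names (ToyCowFs, read_file and so on) are invented for the illustration. It shows why a recursive read after a snapshot doubles the number of inode blocks, while repeated reads without a new snapshot cost nothing more.

```python
class ToyCowFs:
    """Toy copy-on-write store: one metadata block per inode (an assumption)."""

    def __init__(self, n_inodes):
        self.blocks = {i: 0 for i in range(n_inodes)}   # block id -> stored atime
        self.trunk = {i: i for i in range(n_inodes)}    # inode -> block id
        self.snapshots = []                             # frozen inode -> block id maps
        self._next_block = n_inodes

    def snapshot(self):
        # A snapshot only copies references; no blocks are allocated.
        self.snapshots.append(dict(self.trunk))

    def read_file(self, ino, now):
        # Reading updates atime, and copy-on-write means a new block is written.
        old = self.trunk[ino]
        new = self._next_block
        self._next_block += 1
        self.blocks[new] = now
        self.trunk[ino] = new
        # The old block can be freed only if no snapshot still points at it.
        if not any(snap.get(ino) == old for snap in self.snapshots):
            del self.blocks[old]

    def recursive_grep(self, now):
        for ino in list(self.trunk):
            self.read_file(ino, now)

    def blocks_in_use(self):
        return len(self.blocks)


fs = ToyCowFs(1000)
fs.recursive_grep(now=1)
before = fs.blocks_in_use()        # no snapshot: old blocks are freed
fs.snapshot()
fs.recursive_grep(now=2)
after_first = fs.blocks_in_use()   # every inode block is now duplicated
fs.recursive_grep(now=3)
after_second = fs.blocks_in_use()  # no new snapshot, so no further growth
```

In this model the cost is paid once per snapshot rather than once per read, since blocks written after the snapshot are not shared with it and can be freed when replaced.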

Once upon a time, when disk capacities were measured in megabytes, it was said that the only standard feature of Unix systems was the message of the day telling users to clean up their files. Atime was often used by harried system administrators trying to recover some disk space; they would scan for infrequently-accessed files and, depending on how desperate the situation was and how powerful their users were, either send lists of unused files to users or simply back those files up to tape and delete them. It is somewhat ironic that a feature meant (among other things) to help keep disk space free has now, on next-generation filesystems, become part of the problem.

It's worth noting that the relatime option (which only updates atime once per day unless the file has been modified since the last atime update) is of little help here. It only takes one atime update to force an inode to be rewritten and unshared with any snapshots. So the fact that such updates are limited to one per day offers little in the way of consolation.
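The relatime rule described above can be written as a small predicate. This is a sketch in Python rather than the kernel's own code; the function name and the use of plain numeric timestamps are assumptions made for the example, and the ctime check is included on the assumption that a status change counts as a modification.

```python
DAY = 24 * 60 * 60  # seconds


def relatime_needs_update(atime, mtime, ctime, now):
    """Return True if a relatime-style policy would rewrite atime on this access."""
    # The file changed since it was last marked as accessed.
    if mtime >= atime or ctime >= atime:
        return True
    # Otherwise refresh atime at most once per day.
    return now - atime >= DAY


# Unchanged file, read again an hour later: no update.
recent = relatime_needs_update(atime=1000, mtime=500, ctime=500, now=1000 + 3600)
# The first read on the following day still forces one update, which is all
# it takes to unshare the inode from a snapshot.
next_day = relatime_needs_update(atime=1000, mtime=500, ctime=500, now=1000 + DAY)
```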

Users are also unlikely to be consoled by one other aspect of the problem pointed out by Alexander: since reading data can consume space in the filesystem, read operations might fail with "no space available" errors on an overflowing filesystem. That may make it difficult or impossible to fix the problem by copying data out of a full filesystem. By the time that happens, a typical user could be forgiven for thinking that, perhaps, they don't need last-accessed time tracking at all.

Along those lines, Alexander suggested that it might be a good idea to default to "noatime" (which turns off atime recording entirely) for btrfs mounts, even if that means that btrfs would then behave differently than other filesystems. That idea was quickly shot down for a simple reason: there are still applications that actually need the atime information to function correctly. The classic example is the mutt email client which, in the absence of atime, cannot tell whether a mailbox contains unread mail. Programs that clean up temporary directories (tmpreaper or tmpwatch, for example) will fail without atime. There are also hierarchical storage systems that, like the Unix system administrator of old, use atime to determine when to move files to slower storage. So atime needs to stick around, lest users run into a different kind of unpleasant surprise.

For now, the only recourse for users who run into (or are worried about) this problem is to explicitly mount their filesystems with the "noatime" option. Further ahead, it might be possible to make some tweaks to btrfs to mitigate the problem; Boaz Harrosh suggested disabling atime updates when the free space falls below a certain threshold or moving the atime data into a separate data structure. But nobody appears to be working on such solutions now. So it may be that, as usage of btrfs grows, users will occasionally be surprised that reading a file can consume space in their filesystems.
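Administrators who want to see which btrfs mounts are still recording access times could parse the kernel's mount table. The sketch below works on text in the /proc/mounts format; the function names are made up for this example, and it assumes that a mount showing neither "noatime" nor "relatime" among its options is using strict atime updates.

```python
def atime_mode(mount_options):
    """Classify a comma-separated mount option string."""
    opts = set(mount_options.split(","))
    if "noatime" in opts:
        return "noatime"
    if "relatime" in opts:
        return "relatime"
    # Assumption: no atime-related option listed means strict atime.
    return "strictatime"


def btrfs_atime_modes(mounts_text):
    """Map each btrfs mount point to its atime mode."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 4 and fields[2] == "btrfs":
            modes[fields[1]] = atime_mode(fields[3])
    return modes
```

On a Linux system, passing the contents of /proc/mounts to btrfs_atime_modes() would list the mounts that might be worth remounting with "noatime".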


Atime and btrfs: a bad combination?

Posted Jun 1, 2012 4:30 UTC (Fri) by jzbiciak (guest, #5246) [Link] (17 responses)

I seem to recall posting a crazy idea about moving atime out to its own separately managed data structure and keeping it out of the inode entirely. (I can't find the comment now, of course.)

But still... atime is broken. It turns reads into writes and is generally just nasty. Furthermore, most things just don't need it.

Here's a totally radical, unsellable idea:

Disable all atime updates by default everywhere.
Add an extended attribute saying "do atime updates on this file only" (or relatime, if you so choose).
For systems that truly need accurate atime everywhere: Create a new mechanism that all filesystems can use just for storing atime. Create a backing store that is highly optimized for atime updates and nothing else. Provide an option to roll atime updates into the filesystem if necessary, but in most cases, allow this parallel structure to manage atime. Everyone else that doesn't need atime updates can ignore that kludgy thing.

Even in the absence of that crazy idea, it still sounds like having atime in the inode allows for trivial bandwidth and storage-size amplification attacks. If you could factor out atime updates to some dedicated on-disk structure that relied more on versioning semantics than COW, you could at least fix btrfs' immediate problem without totally ditching atime. With 8-byte atime and 8-byte inode numbers (let's say), the 2.2GB quoted in the article is enough space to store 128M atime updates if they were stored like a journal.
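The back-of-the-envelope figure can be checked with a few lines of Python. The 16-byte record (8-byte inode number plus 8-byte atime) is the assumption stated in the comment; the helper name is made up here. Depending on how "2.2GB" is rounded, the result comes out somewhere between 128M and about 147 million records.

```python
RECORD_BYTES = 8 + 8  # assumed: 8-byte inode number + 8-byte atime
GIB = 2**30


def records_that_fit(space_bytes):
    """Number of journal-style atime records that fit in the given space."""
    return space_bytes // RECORD_BYTES


rounded_down = records_that_fit(2 * GIB)         # 134217728, i.e. 128 * 2**20
full_figure = records_that_fit(int(2.2 * GIB))   # about 147 million
```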

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 4:51 UTC (Fri) by neilbrown (subscriber, #359) [Link] (3 responses)

No, please don't disable 'atime'.

I don't use it a lot, but I certainly do use it from time to time to see what files are being accessed. Not a killer feature, but a valuable one.

I'm a big fan of keeping atime in a separate data structure. The liveness properties, stability requirements, and necessary precision are very different from other values in the inode and keeping it together with them is a simplification, not a requirement.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 5:06 UTC (Fri) by jzbiciak (guest, #5246) [Link]

My real point (which your comment echoes) was that atime is so different from just about anything else that, if you want to keep it, it really deserves to be treated rather differently than everything else. And, as the example in the article shows, atime can have real negative consequences even if you largely ignore it most of the time.

It seems to me the other option, if you don't fix atime, is to mitigate it with hacks (relatime -- which doesn't work well for the attack against btrfs shown in the article) or outright disable it everywhere or almost everywhere.

My comment above was perhaps slightly over the top. Sorry for any confusion.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 14:10 UTC (Fri) by jezuch (subscriber, #52988) [Link] (1 responses)

> I don't use it a lot, but I certainly do use it from time to time to see what files are being accessed. Not a killer feature, but a valuable one.

Hah. I disable atime on all my filesystems and the only use I have for it is a side effect: it functions as creation time, which is much more valuable for me than access time :)

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 15:13 UTC (Fri) by jamesh (guest, #1159) [Link]

Provided no one goes and updates the atime via utime() or touch.
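For illustration, here is how easily the stored atime can be overwritten from userspace. This is a small sketch using Python's os.utime(), which wraps the same family of system calls; the helper names and the chosen timestamp are arbitrary.

```python
import os
import tempfile


def set_atime(path, atime):
    """Overwrite a file's atime while leaving its mtime alone."""
    mtime = os.stat(path).st_mtime
    os.utime(path, (atime, mtime))


def demo():
    # Create a scratch file, forge its atime, and read the value back.
    with tempfile.NamedTemporaryFile() as f:
        set_atime(f.name, 1_000_000_000)
        return os.stat(f.name).st_atime
```

After such a call the atime no longer says anything about when the file was created.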

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 7:53 UTC (Fri) by MrWim (subscriber, #47432) [Link] (4 responses)

Interesting. It seems to me that the infrastructure for this already exists, so for any particular use case this could be implemented in userspace using inotify and the flag IN_ACCESS. Presumably tmpwatch and tmpreaper could use a mechanism like this, listening for files in tmp (tmpwatchd?), or perhaps a daemon could be written which you could use to request that atime-like information be collected for a particular directory hierarchy. There would be the potential that mutt could use the same mechanism.

One nice property about this solution is that reads being writes are now explicit and if disk runs out read isn't going to fail but the failure mode can be implemented in the watching daemon.

Potentially you could take the dconf like design where a convenient atime API is provided such that atime can be read synchronously by mapping the atime "database" into the process that is reading it read-only whereas the atimed process would be the only process with write access to this file.
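A rough sketch of what the core of such a daemon might look like follows. It is an illustration only: it calls the Linux inotify interface through ctypes, watches a single directory non-recursively, and keeps its "atime database" in an ordinary dictionary. The names parse_events and watch_accesses are invented for the example.

```python
import ctypes
import os
import struct
import time

IN_ACCESS = 0x00000001                  # value from <sys/inotify.h>
_EVENT_HEADER = struct.Struct("iIII")   # wd, mask, cookie, len


def parse_events(buf):
    """Decode a buffer of struct inotify_event records into (wd, mask, name)."""
    events = []
    offset = 0
    while offset + _EVENT_HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        # The name field is NUL-padded to name_len bytes.
        raw_name = buf[offset:offset + name_len].split(b"\0", 1)[0]
        offset += name_len
        events.append((wd, mask, os.fsdecode(raw_name)))
    return events


def watch_accesses(directory, last_access):
    """Record access times for files in one directory; runs until interrupted."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_ACCESS)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        while True:
            for _wd, mask, name in parse_events(os.read(fd, 4096)):
                if mask & IN_ACCESS and name:
                    last_access[name] = time.time()
    finally:
        os.close(fd)
```

A real daemon would also need to persist the table, watch subdirectories, and cope with event-queue overflow, none of which is attempted here.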

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 12:45 UTC (Fri) by ablock (guest, #84933) [Link]

I really like that idea. This would allow linux to completely get rid of atime in the filesystem code, or at least to default mount with noatime.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 13:13 UTC (Fri) by jzbiciak (guest, #5246) [Link] (2 responses)

Of course, here's where it breaks: How do you export that over NFS? I guess nfsd would also need to talk to that infrastructure.

My suggestion about doing it at the kernel level is that you retain the userspace ABI, and there's never an application that breaks because you've radically changed where atime gets monitored and recorded.

I guess you could provide a FUSE-like mechanism to hook userspace back up to 'stat'.

Atime and btrfs: a bad combination?

Posted Jun 3, 2012 22:39 UTC (Sun) by MrWim (subscriber, #47432) [Link] (1 responses)

To export it over NFS you would just export the atime database over NFS. The daemon would only run on the server and clients would (read-only) access the database file directly. This would work just as well as the local case (i.e. well if your applications are atimed aware and not at all for non-atimed aware apps which care about atime).

As you say a FUSE filesystem could be offered to preserve the stat() interface but it would probably be less work to get the existing apps which use atime to use some new library. The only examples I ever hear of are tmpwatch and mutt.

Atime and btrfs: a bad combination?

Posted Jun 7, 2012 12:29 UTC (Thu) by MrWim (subscriber, #47432) [Link]

On second thought this is probably way more complicated than it needs to be. In particular this generic atimed approach introduces a whole bunch of synchronization complexity. It would probably be easier to patch mutt to update atime explicitly and introduce a bespoke tmpwatchd explicitly for the /tmp case.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 9:41 UTC (Fri) by bergwolf (guest, #55931) [Link] (1 responses)

Enabling atime on per file/directory basis sounds good. mutt is already broken anyway in case of relatime mount. Adding such an interface would allow applications like mutt to work again without having to tweak whole file system performance.

Atime and btrfs: a bad combination?

Posted Jun 5, 2012 10:25 UTC (Tue) by mgedmin (subscriber, #34497) [Link]

How is Mutt broken by relatime? I'm using it this way every day, with mbox files on ext4 mounted with relatime. New mail notification works reliably.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 18:48 UTC (Fri) by Yorick (guest, #19241) [Link] (4 responses)

Your suggestions sound eminently sensible to me, although I would settle for the first item and call it a day. More fundamentally, the article author seems to think that it is a problem that atime is slow on btrfs. Quite the contrary: it is excellent news, especially since it seems to be caused by the basic btrfs design principles (so it is hard to "fix").

In fact, once most people agree that there is no reason whatsoever not to mount everything with noatime, we can drop it altogether and start reaping the benefits. All operations are faster, the code becomes simpler, and we can put the now free space in inodes (both on disk and in memory) to more productive use. It is difficult to see any costs here—what would break? finger?

Then, once that has been taken care of, we can go on dealing with some other part of the baroque Unix legacy. Remove 99 % of the TTY options, perhaps? We can start slowly, by taking away the one that converts lower to upper case, and see if anyone notices.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 21:40 UTC (Fri) by jzbiciak (guest, #5246) [Link]

<stty olcuc>

WE CAN START SLOWLY, BY TAKING AWAY THE ONE THAT CONVERTS LOWER TO UPPER CASE, AND SEE IF ANYONE NOTICES.

I'M SURE ANYONE WHO MIGHT COMPLAIN WILL DO SO VERY LOUDLY.

<STTY -OLCUC>

Atime and btrfs: a bad combination?

Posted Jun 4, 2012 12:59 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

And with every change you lose a bit of your userbase. Before you know it you end up with not very much userbase left at all.

Atime and btrfs: a bad combination?

Posted Jun 4, 2012 14:50 UTC (Mon) by hummassa (subscriber, #307) [Link]

You made me smile.

One of our problems as developers is exactly that: one does not simply take features away. Lots of systems made me bury them exactly by trying to take "my" features (the ones I used and cared for and needed) away.

Atime and btrfs: a bad combination?

Posted Jun 5, 2012 12:19 UTC (Tue) by Yorick (guest, #19241) [Link]

I'm going to assume you mean atime specifically, and not olcuc or anything else (you would have a hard time arguing for that one).

To remove old cruft, a good start is quarantine. Simply don't implement atime in new file systems (btrfs); people who need it for their business-critical fingerd can run UFS or ext2 or something else. The important part is that we don't let use of a bad feature spread, since that is only going to make it harder to get rid of.

Instead of making code worse for everyone for the (dubious) benefit of a vocal minority of cavemen, deal with the problem head-on. Give them a chance to adapt - help them all you can - but set a firm date for when the coddling stops.

Atime and btrfs: a bad combination?

Posted Jun 5, 2012 11:42 UTC (Tue) by roblucid (guest, #48964) [Link]

No atime is NOT broken!!
The problem is Filesystem Designers not implementing it well.

Rather than whinge about it on LKML (Kernel developers & casual enthusiasts aren't the ppl who find atimes useful), implementing atimes better ought to be the focus of discussion. Ppl think about advanced features like snap-shotting and ignore the basic POSIX requirements.

atime doesn't need synchronous update guarantees, in real use the fuzzy relatime (better 23hr min update than 24, to be predictable on daily jobs) would likely be adequate. If your FS can't stand some inode info updates, during reading, then it is what is broken, not the spec.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 6:05 UTC (Fri) by nestal (subscriber, #66970) [Link] (4 responses)

Can the distros just disable atime by default, and enable it once the user installs apps like mutt? The distros should know whether a user wants atime.

If you "really" want atime, 2.2GB of space is like a small price to pay :P

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 7:20 UTC (Fri) by ptman (subscriber, #57271) [Link] (2 responses)

That was just for one invocation of grep. Think about losing that much each time you execute grep (or per day that you do that, if using relatime).

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 9:12 UTC (Fri) by Otus (subscriber, #67685) [Link]

> That was just for one invocation of grep. Think about losing that much
> each time you execute grep (or per day that you do that, if using
> relatime).

That could only happen if you did daily snapshots as well, right?

Otherwise the COW source data should be freed as unreferenced.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 10:30 UTC (Fri) by cwillu (guest, #67268) [Link]

It's once per snapshot, not once per grep.

Atime and btrfs: a bad combination?

Posted Jun 7, 2012 14:16 UTC (Thu) by slashdot (guest, #22014) [Link]

You can also just fix Mutt to use an extended attribute.

Using atime is broken anyway since the fact that a file was accessed doesn't mean that the user read the mail (e.g. it could be the user grepping the mailbox).

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 7:41 UTC (Fri) by bakterie (guest, #37541) [Link] (4 responses)

Does ZFS suffer from the same problem, and if not, how did they solve it?

Atime and btrfs: a bad combination?

Posted Jun 2, 2012 13:03 UTC (Sat) by Ringding (guest, #34316) [Link]

Yes, it does. It won't allow you to "use" more than 63/64th of the space available in a pool though, and I believe this is one of the reasons for that decision. Surprisingly though, the additional space for the atime updates is only accounted for when I unmount the filesystem.

Atime and btrfs: a bad combination?

Posted Jun 7, 2012 16:18 UTC (Thu) by quanstro (guest, #77996) [Link] (2 responses)

not in the same way. zfs does de-dup, so the
10 snapshots in the example would take up just
one extra copy of the metadata, not 10 after
changing the live data.

it would seem to me that this could be fixed
without de-dup by simply making the copy into
the live file system rather than all the snapshots.
that would be O(1) for a block not O(snapshots).

(it must be doing that right, otherwise how would
we generate 10x the original metadata)

one would think that this is a general problem with
btrfs snapshots, and not specific to atime.

Atime and btrfs: a bad combination?

Posted Jun 7, 2012 23:23 UTC (Thu) by HenrikH (subscriber, #31152) [Link] (1 responses)

But wouldn't each copy of the 10 snapshots have different atime (since it was grepped at different times) which would prohibit dedup?

Atime and btrfs: a bad combination?

Posted Jun 10, 2012 16:52 UTC (Sun) by quanstro (guest, #77996) [Link]

i don't think so. if the atime differed in all the snapshots
to begin with, then the recursive grep would only
trigger a copy into the last snapshot.

it still makes more sense to me to copy into the
working tree rather than the snap. that way all
snaps can continue to share blocks as best they can.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 8:39 UTC (Fri) by geertj (guest, #4116) [Link] (1 responses)

Maybe atime could be updated in-place, if atime is the only metadata update to an inode? The implication would be that atime in snapshots would change. But maybe that's something that could be accepted as a compromise?

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 9:43 UTC (Fri) by bergwolf (guest, #55931) [Link]

Updating atime in a snapshot breaks its design and is dangerous if the snapshot is taken for backup purposes.

Avoiding disk-full problems

Posted Jun 1, 2012 11:53 UTC (Fri) by pjm (guest, #2080) [Link] (9 responses)

Methinks it's not the grep that uses the 2.2GB, but the snapshot operation. Perhaps doing a snapshot should require enough space for rewriting the inodes, and should fail with ENOSPC if it isn't available. Surely that's better than grep or ls failing with ENOSPC.

Avoiding disk-full problems

Posted Jun 1, 2012 12:37 UTC (Fri) by ablock (guest, #84933) [Link] (3 responses)

That would destroy the idea behind snapshots. A snapshot should always be extremely cheap if you don't change anything. If we reserve the space at creation time, the whole idea of snapshots gets lost.

Avoiding disk-full problems

Posted Jun 1, 2012 15:07 UTC (Fri) by drag (guest, #31333) [Link]

Yes. I want to be able to over-allocate disk space.

Avoiding disk-full problems

Posted Jun 1, 2012 17:21 UTC (Fri) by faramir (subscriber, #2327) [Link] (1 responses)

If I understand how this works, it is only the most recent snapshot for which this "massive reads generate massive writes" event can happen. So any preallocation only has to exist for the most recent snapshot.

As to snapshots being "cheap", that can refer to two different things: storage requirements OR execution time. Preallocation might not be cheap in terms of storage, but it should be fairly cheap in execution time.
If I'm right about only needing preallocation for the most recent snapshot, you might be able to just handoff the preallocated space from snapshot to snapshot reducing the execution cost. As for reserving space for this, it would be kind of like the X% reserved for "root" which many filesystems have/had (which was often actually done to reduce fragmentation). The preallocation here would be to preserve functionality (i.e. working atimes) even when the filesystem was "full".

Now as to why BTRFS needed such a large amount of space relative to the size of the filesystem, in the example, that would seem to be an issue with the design of BTRFS and how it interacts with traditional Unix filesystem functionality. As others have suggested, storing atimes separately (perhaps just for the "trunk") might work. Perhaps better would be to give the trunk a "current" atime allocation in addition to the standard one. Updates would go to both the current data structure (after copying) as well as the atime only structure up until the disk was full. OTOH, this is getting complicated. Not something to figure out in a couple of minutes.

Avoiding disk-full problems

Posted Jun 2, 2012 18:52 UTC (Sat) by drag (guest, #31333) [Link]

> As to snapshots being "cheap", that can refer to two different things: storage requirements OR execution time.

I think it refers to _both_.

Plus it's tough to pre-allocate when you have no idea how much space you are actually going to have to end up using.

Avoiding disk-full problems

Posted Jun 1, 2012 18:04 UTC (Fri) by cwillu (guest, #67268) [Link]

It's the changed metadata from the atime update, just like the article says. The snapshot requires negligible space to complete.

Avoiding disk-full problems

Posted Jun 2, 2012 0:48 UTC (Sat) by droundy (subscriber, #4559) [Link] (3 responses)

I'd prefer to see snapshotting able to generate read-only noatime file systems (or segments of a filesystem). The primary use cases of snapshots don't require that you are able to make modifications, and keeping the old atime could actually be more useful than updating the atime, if the snapshot is being kept for archival purposes, assuming there is some useful information in the atime to start out with.

Avoiding disk-full problems

Posted Jun 7, 2012 6:10 UTC (Thu) by butlerm (subscriber, #13312) [Link] (2 responses)

It is not the snapshot that is creating the problem. As a rule, the atime values for the snapshot are frozen when the snapshot is taken. The problem is that the snapshot and the trunk inodes initially share the same storage on the disk, so when the atime for all the trunk inodes is updated, a completely new copy of each inode must be created, and the old versions are not freed because they are part of the snapshot.

Avoiding disk-full problems

Posted Jun 7, 2012 8:33 UTC (Thu) by dgm (subscriber, #49227) [Link]

That's something one should consider when making snapshots because, eventually, each snapshot _can_ come to duplicate the whole data.

People tend to forget that and think that COW is cheaper than a full copy in terms of space. It's only initially, and maybe as long as the data remains unchanged.

Avoiding disk-full problems

Posted Jun 8, 2012 3:09 UTC (Fri) by pjm (guest, #2080) [Link]

We all agree on physically what's happening, and I'm sure we agree that in truth it's not just reading or snapshotting by itself that uses extra space, it's the combination of a read and a preceding snapshot.

The only question is what to do about the possibility of there not being enough space to rewrite the inode. Some possibilities include:

Return ENOSPC on read. (The undesirable prospect alluded to in the article.)
Let the read go ahead but don't update the atime (even the in-memory atime?) if there's no space left. (I gather that this is the current solution.)
Let the read go ahead but scribble over the snapshot's atime.
Exclude atimes from snapshots. (What does that mean? I.e. what atime do people see when doing ls -ltu in the snapshot?)
Laptop mode (lossy atimes): Never initiate a write just for the sake of updating an on-disk atime, but still copy the in-memory atime to disk if we're writing the inode for some other reason.
Never store atime on disk in the first place, but still have accesses update the in-memory atime, like in romfs, cramfs etc. (What value would the in-memory atime get initialized to when reading the inode from disk? 1970, or some function of ctime and mtime?)
Mandatory noatime: the atime that stat(2) sees (and hence find, ls, mutt etc.) is just the creation time.
Reserve enough space for atime to be reliable. E.g. have the superblock record the number of inodes that we are "in debt": initially 0 at filesystem creation, and snapshot sets it to the (then-current) number of inodes, and a copy-on-write of an inode decreases it by one. This debt is tied to the amount of free space left, influencing whether an allocation or snapshot operation returns ENOSPC. Snapshotting is still a cheap operation both in time (no immediate write necessary, and one or two integers in the superblock to update in write-behind) and disk space: a million snapshots a year still only requires as much disk space as the writes that occur between snapshots, except with the difference that we also reserve space for inode writes to occur in the future. This is a once-off reservation, there's no additional cost between one snapshot a year or one million.
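The last option in the list, the inode "debt" counter, can be sketched as a toy model. This is not btrfs code: it assumes one block per inode, tracks which inodes are still shared with the most recent snapshot instead of keeping a bare counter, and uses invented names throughout.

```python
import errno


class ReservingFs:
    """Toy filesystem that reserves space for post-snapshot inode rewrites."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks
        self.inodes = set()
        self.shared = set()   # inodes still shared with the latest snapshot

    @property
    def debt(self):
        # Blocks that must stay free so every shared inode can be rewritten.
        return len(self.shared)

    def create(self, ino):
        # Ordinary allocations may not eat into the reservation.
        if self.free_blocks - self.debt < 1:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free_blocks -= 1
        self.inodes.add(ino)

    def snapshot(self):
        # The snapshot, not a later read, is what fails when space is short.
        if self.free_blocks < len(self.inodes):
            raise OSError(errno.ENOSPC, "No space left on device")
        self.shared = set(self.inodes)

    def update_atime(self, ino):
        if ino in self.shared:
            # First rewrite since the snapshot: pay out of the reservation.
            self.shared.remove(ino)
            self.free_blocks -= 1
        # Otherwise the old block is freed as the new one is written: net zero.
```

In this model an atime update can never run out of space; the cost shows up earlier, as ENOSPC from an allocation or from the snapshot itself.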

I don't want to advocate one solution over another, and I'm pretty happy with what I'm told is the current approach, I'm just listing some of the options.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 15:03 UTC (Fri) by josefwhiter (guest, #39238) [Link]

So we already do what Boaz suggested. When I did the ->update_time() stuff I specifically ignore errors to atime, so the user should never see ENOSPC when doing reads. Even prior to the ->update_time() work btrfs would just emit a warning to dmesg and nothing else would happen.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 18:34 UTC (Fri) by jmorris42 (guest, #2203) [Link]

Sounds like the ability to create a snapshot with noatime enabled would solve most of the problems. Is that feasible? Or heck, if you are wanting a point in time snapshot for a backup just make it completely read only and forget about it.

Atime and btrfs: a bad combination?

Posted Jun 1, 2012 19:20 UTC (Fri) by ballombe (subscriber, #9523) [Link] (6 responses)

Atime is used by popularity-contest (popcon.debian.org) to find packages that were recently used. It seems the least intrusive option. It is sad to see it being deprecated without better alternative being provided.

Atime and btrfs: a bad combination?

Posted Jun 2, 2012 7:49 UTC (Sat) by liljencrantz (guest, #28458) [Link] (5 responses)

Quite the contrary, atime is horribly bad idea that should never have been allowed to survive the seventies. Converting most reads into a read and a write is such an obviously horrible idea that one should not even have to argue about it - it just needs to go away.

As such, I think it's extremely sad to see that relatively modern software, like popcon, is *still* being written that makes the misguided mistake of using this feature.

Atime and btrfs: a bad combination?

Posted Jun 3, 2012 10:19 UTC (Sun) by ballombe (subscriber, #9523) [Link] (3 responses)

I note that nowhere in your post are you suggesting an alternative, so basically you are arguing that popcon should not have been written at all.

Atime and btrfs: a bad combination?

Posted Jun 3, 2012 11:00 UTC (Sun) by liljencrantz (guest, #28458) [Link] (1 responses)

I'm suggesting that the usefulness of popcon is smaller than the damage of atime, so even if there is no alternative implementation strategy for popcon, it would still be better to drop atime.

Atime and btrfs: a bad combination?

Posted Jun 3, 2012 14:57 UTC (Sun) by Yorick (guest, #19241) [Link]

Quite right—for every odd feature there will always be someone having found a use for it and object to it being taken away. But the cost of its existence is often carried by everyone: in performance, code complexity, security, ease of use, conceptual simplicity, and so on. For atime, it should stand clear that the occasional benefits stand in no proportion to those costs.

We see this every time a sensible proposal comes forth to dump an old misfeature that causes way more grief than enjoyment. Control characters in file names, for example...

Atime and btrfs: a bad combination?

Posted Jun 3, 2012 19:48 UTC (Sun) by nybble41 (subscriber, #55106) [Link]

Wouldn't the audit interface offer one alternative? It's already used for tracking file accesses for readahead optimization.

The downside, of course, would be that some daemon would have to run in the background to collect the audit data. However, that could still involve less overhead than updating atimes on every filesystem read.

Atime and btrfs: a bad combination?

Posted Jun 5, 2012 9:31 UTC (Tue) by zack (subscriber, #7062) [Link]

> Converting most reads into a read and a write is such an obviously horrible idea that one should not even have to argue about it

The "converting reads into write" argument is very nice (and lyric), but it's not particularly compelling. atime is not changing the intrinsic nature of reads. atime is a logging/accounting mechanism like many others that an OS kernel implements. It is in the nature of accounting to consume space and that happens (inevitably) also when you log actions that, per se, wouldn't have consumed any space. I don't see any inherent flaw in that, it is "just" a matter of difficult trade-offs about where to store the information and how to minimize their size when space is tight.

A compromise?

Posted Jun 7, 2012 9:35 UTC (Thu) by Mity (guest, #85011) [Link]

Assuming keeping reliable atime in snapshots is less important, a compromise solution could be implemented this way:
(1) Specify that atimes of files in a snapshot are undefined.
(2) Hack btrfs so that a sole atime change does not trigger copy-on-write.

I.e. the unchanged files in the snapshot would silently share the atime with the live file as long as the live file is not really explicitly written.