Atime and btrfs: a bad combination?
Posted Jun 1, 2012 4:30 UTC (Fri) by jzbiciak (guest, #5246)
Parent article: Atime and btrfs: a bad combination?
I seem to recall posting a crazy idea about moving atime out to its own separately managed data structure and keeping it out of the inode entirely. (I can't find the comment now, of course.)
But still... atime is broken. It turns reads into writes and is generally just nasty. Furthermore, most things just don't need it.
Here's a totally radical, unsellable idea:
- Disable all atime updates by default everywhere.
- Add an extended attribute saying "do atime updates on this file only" (or relatime, if you so choose).
- For systems that truly need accurate atime everywhere: Create a new mechanism that all filesystems can use just for storing atime. Create a backing store that is highly optimized for atime updates and nothing else. Provide an option to roll atime updates into the filesystem if necessary, but in most cases, allow this parallel structure to manage atime. Everyone who doesn't need atime updates can ignore that kludgy thing.
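The opt-in policy in the list above could look roughly like this. The xattr name `user.atime_policy`, its values, and the `FileMeta` type are hypothetical; the relatime branch follows the rule Linux uses for `relatime` (refresh atime when it is not newer than mtime or ctime, or when it is at least 24 hours old).

```python
from dataclasses import dataclass, field

# Hypothetical xattr name and values; no real filesystem defines these.
ATIME_XATTR = "user.atime_policy"
DAY_SECONDS = 24 * 60 * 60


@dataclass
class FileMeta:
    """Minimal stand-in for the timestamps and xattrs of one inode."""
    atime: float
    mtime: float
    ctime: float
    xattrs: dict = field(default_factory=dict)


def relatime_due(meta: FileMeta, now: float) -> bool:
    """Linux-style relatime test: refresh atime if it is not newer than
    mtime or ctime, or if it is at least a day old."""
    return (
        meta.atime <= meta.mtime
        or meta.atime <= meta.ctime
        or now - meta.atime >= DAY_SECONDS
    )


def should_update_atime(meta: FileMeta, now: float) -> bool:
    """Default is no atime updates at all; a file opts in via the xattr."""
    policy = meta.xattrs.get(ATIME_XATTR)
    if policy == b"strict":
        return True
    if policy == b"relatime":
        return relatime_due(meta, now)
    return False
```

Files without the attribute never turn a read into a write; only files that explicitly ask for it pay the cost.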
Even in the absence of that crazy idea, it still sounds like having atime in the inode allows for trivial bandwidth and storage-size amplification attacks. If you could factor out atime updates to some dedicated on-disk structure that relied more on versioning semantics than on COW, you could at least fix btrfs' immediate problem without totally ditching atime. With 8-byte atimes and 8-byte inode numbers (let's say), the 2.2GB quoted in the article is enough space to store roughly 128M atime updates if they were stored like a journal.
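The journal arithmetic above can be checked with a short sketch: each update is an append-only 16-byte record of (inode number, atime), a later record for the same inode supersedes earlier ones (versioning rather than COW-rewriting the inode), and compaction drops superseded records. The record layout and the `AtimeJournal`/`capacity` names are assumptions for illustration, not any filesystem's real format.

```python
import struct

# Hypothetical record layout: u64 inode number + s64 atime in nanoseconds.
RECORD = struct.Struct("<Qq")  # 16 bytes, no padding with "<"


class AtimeJournal:
    """Append-only atime log; the newest record for an inode wins."""

    def __init__(self):
        self._log = bytearray()

    def append(self, ino: int, atime_ns: int) -> None:
        # An atime update is a small append, never a rewrite of the inode.
        self._log += RECORD.pack(ino, atime_ns)

    def latest(self) -> dict:
        current = {}
        for ino, atime_ns in RECORD.iter_unpack(self._log):
            current[ino] = atime_ns  # later records supersede earlier ones
        return current

    def compact(self) -> None:
        # Keep only the newest record per inode.
        newest = self.latest()
        self._log = bytearray()
        for ino, atime_ns in newest.items():
            self.append(ino, atime_ns)


def capacity(budget_bytes: int) -> int:
    """How many atime records fit in a given number of bytes."""
    return budget_bytes // RECORD.size
```

Taking 2.2GB as 2.2×10^9 bytes, `capacity(2_200_000_000)` gives 137,500,000 records, comfortably above 128M (2^27 = 134,217,728).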
Posted Jun 1, 2012 4:51 UTC (Fri)
by neilbrown (subscriber, #359)
I don't use it a lot, but I certainly do use it from time to time to see what files are being accessed. Not a killer feature, but a valuable one.
I'm a big fan of keeping atime in a separate data structure. The liveness properties, stability requirements, and necessary precision are very different from other values in the inode and keeping it together with them is a simplification, not a requirement.
Posted Jun 1, 2012 5:06 UTC (Fri)
by jzbiciak (guest, #5246)
It seems to me the other option, if you don't fix atime, is to mitigate it with hacks (relatime -- which doesn't work well for the attack against btrfs shown in the article) or outright disable it everywhere or almost everywhere.
My comment above was perhaps slightly over the top. Sorry for any confusion.
Posted Jun 1, 2012 14:10 UTC (Fri)
by jezuch (subscriber, #52988)
Hah. I disable atime on all my filesystems, and the only use I have for it is a side effect: it functions as creation time, which is much more valuable to me than access time :)
Posted Jun 1, 2012 15:13 UTC (Fri)
by jamesh (guest, #1159)
Posted Jun 1, 2012 7:53 UTC (Fri)
by MrWim (subscriber, #47432)
One nice property of this solution is that reads turning into writes become explicit: if the disk fills up, a read isn't going to fail, and the failure handling can instead be implemented in the watching daemon.
Potentially you could take a dconf-like design, where a convenient atime API lets atime be read synchronously by mapping the atime "database" read-only into the reading process, while the atimed process is the only one with write access to that file.
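A minimal sketch of that dconf-like split, assuming Python's standard `mmap` module, a hypothetical fixed-size table indexed directly by inode number (so inode numbers are assumed small and dense), and hypothetical class names. The writer (the "atimed" role) maps the file read-write; readers map the same file read-only and look atimes up synchronously, with no IPC.

```python
import mmap
import struct

SLOT = struct.Struct("<q")  # one s64 atime (ns) per inode; 0 means "never recorded"
MAX_INODES = 1024           # hypothetical bound; assumes small, dense inode numbers


def create_db(path: str) -> None:
    # Pre-size the shared table; truncate() extends the file with zeros.
    with open(path, "wb") as f:
        f.truncate(SLOT.size * MAX_INODES)


def _offset(ino: int) -> int:
    if not 0 <= ino < MAX_INODES:
        raise ValueError(f"inode {ino} outside table")
    return ino * SLOT.size


class AtimeWriter:
    """The only process meant to modify the table (the 'atimed' role)."""

    def __init__(self, path: str):
        self._f = open(path, "r+b")
        self._map = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_WRITE)

    def touch(self, ino: int, atime_ns: int) -> None:
        SLOT.pack_into(self._map, _offset(ino), atime_ns)

    def close(self) -> None:
        self._map.flush()
        self._map.close()
        self._f.close()


class AtimeReader:
    """Clients map the same file read-only; lookups are plain memory reads."""

    def __init__(self, path: str):
        self._f = open(path, "rb")
        self._map = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)

    def atime(self, ino: int):
        (value,) = SLOT.unpack_from(self._map, _offset(ino))
        return value or None

    def close(self) -> None:
        self._map.close()
        self._f.close()
```

Concurrency and crash consistency are ignored here; a real daemon would also need a sparse index (hash table or B-tree) rather than direct indexing.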
Posted Jun 1, 2012 12:45 UTC (Fri)
by ablock (guest, #84933)
Posted Jun 1, 2012 13:13 UTC (Fri)
by jzbiciak (guest, #5246)
Of course, here's where it breaks: how do you export that over NFS? I guess nfsd would also need to talk to that infrastructure.
The point of my suggestion to do it at the kernel level is that you retain the userspace ABI, so no application ever breaks because you've radically changed where atime gets monitored and recorded.
I guess you could provide a FUSE-like mechanism to hook userspace back up to 'stat'.
Posted Jun 3, 2012 22:39 UTC (Sun)
by MrWim (subscriber, #47432)
As you say, a FUSE filesystem could be offered to preserve the stat() interface, but it would probably be less work to get the existing apps that use atime to use some new library. The only examples I ever hear of are tmpwatch and mutt.
Posted Jun 7, 2012 12:29 UTC (Thu)
by MrWim (subscriber, #47432)
Posted Jun 1, 2012 9:41 UTC (Fri)
by bergwolf (guest, #55931)
Posted Jun 5, 2012 10:25 UTC (Tue)
by mgedmin (subscriber, #34497)
Posted Jun 1, 2012 18:48 UTC (Fri)
by Yorick (guest, #19241)
In fact, once most people agree that there is no reason whatsoever not to mount everything with noatime, we can drop it altogether and start reaping the benefits. All operations are faster, the code becomes simpler, and we can put the now free space in inodes (both on disk and in memory) to more productive use. It is difficult to see any costs here—what would break? finger?
Then, once that has been taken care of, we can go on to deal with some other part of the baroque Unix legacy. Remove 99% of the TTY options, perhaps? We can start slowly, by taking away the one that converts lower to upper case, and see if anyone notices.
Posted Jun 1, 2012 21:40 UTC (Fri)
by jzbiciak (guest, #5246)
<stty olcuc>
I'M SURE ANYONE WHO MIGHT COMPLAIN WILL DO SO VERY LOUDLY.
<STTY -OLCUC>
Posted Jun 4, 2012 12:59 UTC (Mon)
by nix (subscriber, #2304)
Posted Jun 4, 2012 14:50 UTC (Mon)
by hummassa (subscriber, #307)
One of our problems as developers is exactly that: one does not simply take features away. I have buried plenty of systems precisely because they tried to take away "my" features (the ones I used, cared about, and needed).
Posted Jun 5, 2012 12:19 UTC (Tue)
by Yorick (guest, #19241)
To remove old cruft, a good start is quarantine. Simply don't implement atime in new file systems (btrfs); people who need it for their business-critical fingerd can run UFS or ext2 or something else. The important part is that we don't let a bad feature spread, since that will only make it harder to get rid of.
Instead of making code worse for everyone for the (dubious) benefit of a vocal minority of cavemen, deal with the problem head-on. Give them a chance to adapt - help them all you can - but set a firm date for when the coddling stops.
Posted Jun 5, 2012 11:42 UTC (Tue)
by roblucid (guest, #48964)
Rather than whinge about it on LKML (kernel developers and casual enthusiasts aren't the people who find atimes useful), the discussion ought to focus on implementing atime better. People think about advanced features like snapshotting and ignore the basic POSIX requirements.
atime doesn't need synchronous update guarantees; in real use, the fuzzy relatime behaviour would likely be adequate (though a 23-hour minimum update interval would be better than 24 hours, so that daily jobs see predictable results). If your FS can't cope with some inode updates during reads, then it is the FS that is broken, not the spec.
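A small simulation can illustrate the 23-hour versus 24-hour point. It assumes a hypothetical daily job whose start time jitters by a couple of minutes and reads a file that is never modified, so only the age part of the relatime test matters (the mtime/ctime checks are omitted). The read times and the `count_updates` helper are made up for illustration.

```python
HOUR = 3600
DAY = 24 * HOUR


def count_updates(read_times, threshold):
    """Count how many reads refresh atime under an age-only relatime rule."""
    atime = None
    updates = 0
    for t in read_times:
        if atime is None or t - atime >= threshold:
            atime = t
            updates += 1
    return updates


# A daily job that sometimes starts a minute or two early or late.
reads = [0, DAY - 120, 2 * DAY + 60, 3 * DAY - 30]

print(count_updates(reads, DAY))        # 24h threshold: early runs are skipped
print(count_updates(reads, 23 * HOUR))  # 23h threshold: every run updates
```

With a 24-hour threshold, any run that starts slightly early is under a day since the last update, so only 2 of the 4 reads refresh atime and a daily cleanup job may see an atime up to two days old. A 23-hour threshold absorbs the jitter and all 4 reads update it.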
WE CAN START SLOWLY, BY TAKING AWAY THE ONE THAT CONVERTS LOWER TO UPPER CASE, AND SEE IF ANYONE NOTICES.
The problem is Filesystem Designers not implementing it well.