
btrfs fscked up, too?


Posted Mar 16, 2009 13:54 UTC (Mon) by forthy (guest, #1525)
In reply to: btrfs fscked up, too? by masoncl
Parent article: Garrett: ext4, application expectations and power management

I still think Ted misreads the standard. fsync is about durability; rename is about atomicity. Those are two different things: fsync is not necessary to make rename atomic, because POSIX file system metadata operations are already atomic. Atomic metadata operations are a poor man's transaction, but reordering them relative to data operations breaks that promise, even if only during a crash (which is outside the scope of POSIX).

Note that collecting lots of atomic operations and performing them all in one go is not necessarily breaking the order of all these updates. A true log-structured file system collects all operations in order, and writes them in one go - atomic and delayed. btrfs should share most of these properties, even though the internal design is quite different. As it shows, implementing it "right" is not costly. Thanks, Chris, for being responsive.

What we might need further are real transactions. Now, real transactions are harder; with a filesystem like btrfs, which has a snapshot facility, we get a step closer (but only one step). It's not that easy: in a real transaction monitor, you create a private "snapshot" at the start of a transaction, perform the transaction, and then commit this snapshot. If the commit finds a conflict (e.g. a file changed during the transaction has been changed by somebody else in the meantime), the transaction will be aborted. Also, if another transaction has already been merged and changed a file that is read during the transaction, this transaction will be aborted, too.
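To make the snapshot/commit/abort rule above concrete, here is a minimal in-memory sketch in Python. It is not how btrfs or any real transaction monitor works; `Store`, `Transaction` and `Conflict` are hypothetical names invented for this illustration, and "files" are just dictionary entries carrying a version counter.

```python
class Conflict(Exception):
    """Raised when a transaction cannot be committed (hypothetical name)."""


class Store:
    """Toy 'file system': maps a name to a (version, data) tuple."""

    def __init__(self):
        self.files = {}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.files)  # private snapshot taken at start
        self.reads = set()                 # names read during the transaction
        self.writes = {}                   # name -> new data, not yet visible

    def read(self, name):
        self.reads.add(name)
        if name in self.writes:
            return self.writes[name]
        return self.snapshot[name][1]

    def write(self, name, data):
        self.writes[name] = data

    def commit(self):
        # Abort if anything we read or wrote was changed by somebody else
        # since our snapshot was taken.
        for name in self.reads | set(self.writes):
            if self.store.files.get(name) != self.snapshot.get(name):
                raise Conflict(name)
        # No conflict: publish all writes, bumping each file's version.
        for name, data in self.writes.items():
            version = self.store.files.get(name, (0, None))[0]
            self.store.files[name] = (version + 1, data)
```

A transaction that only read a file is still aborted if another transaction committed a change to that file first, which is the second case described above. A real implementation would also have to make the commit step itself atomic and durable, which this sketch ignores.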



btrfs fscked up, too?

Posted Mar 16, 2009 18:48 UTC (Mon) by tytso (✭ supporter ✭, #9993) [Link]

Your mistake is assuming that the atomicity of rename() is about anything other than the directory pathnames. If you read the rename specification, you will see that it talks explicitly about directory entries, and says nothing at all about the contents of the inodes involved. For example, here's just a tiny sample from the rename(2) specification:
If the old argument points to the pathname of a file that is not a directory, the new argument shall not point to the pathname of a directory. If the link named by the new argument exists, it shall be removed and old renamed to new. In this case, a link named new shall remain visible to other processes throughout the renaming operation and refer either to the file referred to by new or old before the operation began. Write access permission is required for both the directory containing old and the directory containing new.

To understand the history of the comment in the Rationale section of POSIX's rename() specification regarding atomicity, it's helpful to understand how rename functionality had been implemented in V7 Unix, namely via a combination of the link() and unlink() system calls. Back in the bad old days, it was possible while renaming a directory to end up with two links to that directory, if the system crashed after link()'ing the new name of the directory and before the old name of the directory was unlink()'ed.
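The two-step rename described above can be sketched as follows. This is only an illustration in Python: `rename_v7_style` and its `crash_between` flag are hypothetical, and it operates on a regular file because modern kernels generally refuse to hard-link directories.

```python
import os


def rename_v7_style(old, new, crash_between=False):
    """Emulate rename as link() followed by unlink().

    crash_between is a made-up flag that simulates a crash after the
    first step, leaving both names pointing at the same inode.
    """
    os.link(old, new)      # step 1: the new name now exists as well
    if crash_between:
        return             # simulated crash: the old name is never removed
    os.unlink(old)         # step 2: remove the old name
```

With `crash_between=True`, both `old` and `new` remain visible and refer to the same file. A single atomic rename() call exists to rule out exactly this intermediate state.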

But to say that this atomicity requirement, which was only about the functionality of the rename(2) system call being atomic, would somehow extend to an open-write-close-rename sequence, is a gross misreading of the POSIX specification. And given that I implemented POSIX TTY Job Control from the specification back in the 0.12 days of Linux in fall of 1991, I rather suspect that I have a bit more experience reading the POSIX specification than you do...
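For reference, the open-write-close-rename sequence under discussion, with the optional fsync() calls that the durability argument is about, might look roughly like this in Python. It is a sketch for a POSIX system, not code from any application named in the thread; `replace_file` and its `durable` flag are hypothetical names. Whether the fsync() should be needed for the old or new contents to survive a crash is precisely what is being disputed here.

```python
import os
import tempfile


def replace_file(path, data, durable=True):
    """Replace path with data (bytes) via a temporary file and rename()."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the target,
    # so create it in the same directory.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            if durable:
                os.fsync(f.fileno())   # force the file data to stable storage
        os.rename(tmp, path)           # atomically switch the directory entry
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if durable:
        # Also flush the directory so the rename itself is on disk.
        dirfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)
```

With `durable=False` the function is the bare open-write-close-rename pattern; a concurrent reader still sees either the old or the new file while the system is running, but nothing is forced to disk before the rename.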

btrfs fscked up, too?

Posted Mar 17, 2009 1:43 UTC (Tue) by bojan (subscriber, #14302) [Link]

When I tried to suggest exactly this in another thread, bullshit was called: http://lwn.net/Articles/323430/. So, thank you very much for posting this explanation here.

btrfs fscked up, too?

Posted Mar 17, 2009 10:22 UTC (Tue) by forthy (guest, #1525) [Link]

Sorry, you still refer to the fact that POSIX "allows" replacing all files with pumpkins in case of a crash (especially for squashfs, this is the "obviously right" action ;-). That's not the issue, and that's not what we are discussing here: we are talking about file systems doing something actually reasonable in case of a crash, something similar to the well-specified behavior under normal operation. If you rename a file during operation, and at the same time open that file in another process and read it, you read either the old data or the new data, but no empty files, no garbage files, no pumpkins (unless, of course, the file deliberately contains a pumpkin image). It is obvious that file metadata is closely tied to actual file data.

BTW, on reading standards: I implemented two Forth compilers from a standard back in the early 90s, and I've been doing my best to implement reasonable behavior in those corner cases where the standard says "an ambiguous condition exists...". The Forth community was quite picky back then about all those ambiguous conditions in the standard, because many people were used to well-defined behavior from the particular systems they used; however, this well-defined behavior might not have been portable. The result of the discussion was that I first started to make one of these two Forth compilers a "model implementation", which had well-defined behavior in those parts where the standard was just sloppy without proper reason. This continued over time, and now the community is revising the standard, and we are trying to be more precise and less ambiguous (the draft standard document now even includes a test suite). So now, I'm not just reading standard documents, I'm writing them.

What resulted from this activity on my side is a different view of standard documents, and of how to read them. Standards encode common practice. People have not always been careful when implementing things. A standard document is a compromise between different systems. If you implement your system, it's not your job to find excuses for unreasonable behavior, it's your job to find reasonable ways to deal with ambiguous conditions. And if you are really good at it, it's your job to implement these things in a way that can serve as a model for others (it's always the duty of those who are good to serve as an example). Take the compiler example again: If a symbol encountered by the compiler is neither a number nor a pre- or user-defined function or variable, this is an ambiguous condition. The compiler is "allowed" by the standard to transform the user into a pumpkin (by magic, of course), teaching him a final lesson about proper programming. The reasonable action on a syntax error, however, is to print a message which states the file, source line and position within that line, plus a meaningful description of the problem. No language standard will define this action. Yet most compilers in the world (regardless of the language) stick to that behavior, and even use a similar output format to make IDEs happy.

I hope you now understand why I say you didn't read POSIX, but you duck behind it. By "reading" I mean: Try to find out what best practice would be in a case where POSIX indeed does not really define how it should be. And "best practice" is both what your users will be happy with and what serves as a good example for other file system writers (pumpkins are no option). If you raise the bar of expectation, do it.


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds