
Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 4, 2011 4:49 UTC (Sun) by mjg59 (subscriber, #23239)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by dlang
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

Doesn't that assume that you can perform a series of atomic operations that will result in a consistent filesystem? If that's not true then you still need to be able to indicate the beginning of a transaction, the contents of that transaction and the end of it. If all of that hits the journal first then you can play the entire transaction, but if you were doing it directly to the filesystem then a poorly timed crash might hit an inconsistent point in the middle.
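A minimal sketch of the begin/contents/end idea described above, assuming a toy model in which the "disk" is a dict of named blocks and the journal is an append-only list. The record layout and the function names (`journal_transaction`, `replay`) are made up for illustration and do not reflect how ext4's journal is actually laid out.

```python
# Toy model: "disk" is a dict of block name -> contents, and the journal
# is an append-only list of records. All names here are illustrative.

def journal_transaction(journal, txid, updates):
    """Append begin, write, and commit records for one transaction."""
    journal.append(("begin", txid))
    for block, data in updates.items():
        journal.append(("write", txid, block, data))
    journal.append(("commit", txid))


def replay(journal, disk):
    """Apply only transactions whose commit record reached the journal."""
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data
    return disk


journal = []
journal_transaction(
    journal, 1, {"bitmap": "block 7 in use", "inode": "maps to block 7"}
)

# Simulate a crash partway through journaling transaction 2: the begin
# record and one write made it out, but the commit record did not.
journal.append(("begin", 2))
journal.append(("write", 2, "bitmap", "block 8 in use"))

disk = replay(journal, {"bitmap": "empty", "inode": "empty"})
```

After replay, transaction 1 is applied in full and the half-written transaction 2 is ignored, so the bitmap and the inode still agree with each other.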



Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 4, 2011 5:05 UTC (Sun) by dlang (subscriber, #313) [Link]

that's true, but the trade-off is that you avoid writing the data to the journal, and then writing to the journal again to indicate that the transaction is finished.

if what you are writing is metadata, it seems like it shouldn't be that hard, since there isn't that much metadata to be written.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 4, 2011 11:32 UTC (Sun) by tytso (subscriber, #9993) [Link]

The problem is that many file system operations require you to update more than one metadata block. For example, when you move a file from one directory to another, you need to add a directory entry into one directory, and remove a directory entry from another.

Or when you allocate a disk block, you need to modify the block allocation bitmap (or whatever data structure you use to indicate that the block is in use) and then update the data structures which map a particular inode's logical to physical block map.

Without a journal, you can't do this atomically, which means the state of the file system is undefined after an unclean/unexpected shutdown of the OS.
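The rename case can be sketched as follows, assuming a toy model where each directory is a set of names and each step is a separate in-place write. The step order (add first, then remove), the `Crash` exception, and the helper names are assumptions for illustration only; with the opposite order a crash would instead leave the file with no directory entry at all.

```python
class Crash(Exception):
    """Stands in for an unexpected power loss."""


def rename_in_place(dirs, name, src, dst, crash_after_step=None):
    """Move `name` from dirs[src] to dirs[dst] using two separate writes."""
    dirs[dst].add(name)        # step 1: add the entry to the destination
    if crash_after_step == 1:
        raise Crash
    dirs[src].remove(name)     # step 2: remove the entry from the source


def entry_count(dirs, name):
    """Number of directories that currently hold an entry for `name`."""
    return sum(name in entries for entries in dirs.values())


dirs = {"a": {"file"}, "b": set()}
try:
    rename_in_place(dirs, "file", "a", "b", crash_after_step=1)
except Crash:
    pass

entries_after_crash = entry_count(dirs, "file")
```

Here the crash lands between the two writes, so the file ends up listed in both directories, a state that neither "before the rename" nor "after the rename" describes.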

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 4, 2011 17:02 UTC (Sun) by kleptog (subscriber, #1183) [Link]

Indeed. If there were an efficient way to guarantee consistency without a journal there'd be a significant market for it, namely in databases. Journals are a well understood and effective way of managing integrity of complicated disk structures. There are other ways, but journaling beats the others on a number of fronts.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 6, 2011 0:40 UTC (Tue) by cmccabe (guest, #60281) [Link]

There is an efficient way to guarantee consistency without a journal. Soft updates. See http://en.wikipedia.org/wiki/Soft_updates. The main disadvantage of soft updates is that the code seems to be more complex.

Soft updates would not work for databases, because database operations often need to be logged "logically" rather than "physically." For example, when you encounter an update statement that modifies every row of the table, you just want to add the update statement itself to the journal, not the contents of every row.
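A rough sketch of the logical versus physical distinction, assuming a toy table held as a dict of rows. The record formats and names (`physical_log`, `logical_log`, `set_all_prices`) are invented for this example and are not taken from any real database.

```python
def make_table(n):
    """Build a toy table of n rows, each with the same starting price."""
    return {i: {"id": i, "price": 100} for i in range(n)}


def physical_log(table, new_price):
    """One record per modified row, holding the full new row contents."""
    return [("row", i, dict(row, price=new_price)) for i, row in table.items()]


def logical_log(new_price):
    """One record describing the operation itself."""
    return [("set_all_prices", new_price)]


def replay(table, log):
    """Apply either kind of log record to a table."""
    for record in log:
        if record[0] == "row":
            _, key, contents = record
            table[key] = contents
        elif record[0] == "set_all_prices":
            for row in table.values():
                row["price"] = record[1]
    return table


physical = physical_log(make_table(1000), 120)
logical = logical_log(120)
same_result = replay(make_table(1000), physical) == replay(make_table(1000), logical)
```

Both logs bring the table to the same state on replay, but the physical log needs one record per row while the logical log needs a single record.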

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 6, 2011 1:24 UTC (Tue) by tytso (subscriber, #9993) [Link]

The problems with Soft Updates are quite adequately summed up here, by Val Aurora (formerly Henson): http://lwn.net/Articles/339337/

My favorite line from that article is "...and then I turn to page 8 and my head explodes."

The *BSDs didn't get advanced features such as Extended Attributes until some 2 or 3 years after Linux. My theory as to why is that it required someone as smart as Kirk McKusick to be able to modify UFS with Soft Updates to add support for Extended Attributes and ACLs.

Also, note that because of how Soft Update works, it requires forcing metadata blocks out to disk more frequently than without Soft Updates; it is not free. What's worse, it depends on the disk not reordering write requests, which modern disks do to avoid seeks (in some cases, absent a Cache Flush request, a write may not make it onto the platter for 5-10 seconds or more). If you disable the HDD's write caching, you lose a lot of performance on HDDs; if you leave it enabled (which is the default) your data is not safe.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 11, 2011 10:18 UTC (Sun) by vsrinivas (subscriber, #56913) [Link]

FFS w/ soft updates assumes that drives honor write requests in the order they were dispatched. This is not necessarily the case, weakening the guarantees it means to provide. Also, FFS doesn't ever issue what Linux calls 'barriers' (on BSD known as device cache flushes or BUF_CMD_FLUSH).
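The ordering concern can be illustrated with a toy drive that has a volatile write cache. This is only a sketch under stated assumptions: the `Disk` class, its methods, and the block names are hypothetical, and it models neither FFS nor any real drive firmware.

```python
class Disk:
    """Toy drive: a persistent platter plus a volatile write cache."""

    def __init__(self):
        self.platter = {}   # survives a crash
        self.cache = []     # pending writes, lost on a crash

    def write(self, block, data):
        """Queue a write in the cache; it is not yet persistent."""
        self.cache.append((block, data))

    def flush(self):
        """Cache flush: every queued write reaches the platter."""
        for block, data in self.cache:
            self.platter[block] = data
        self.cache.clear()

    def crash(self, persisted):
        """Power loss: only the cached writes at these indices got out."""
        for i in persisted:
            block, data = self.cache[i]
            self.platter[block] = data
        self.cache.clear()
        return self.platter


# Dependency: the inode must be persistent before the entry that names it.
no_flush = Disk()
no_flush.write("inode 12", "initialized")
no_flush.write("dir entry", "points to inode 12")
# The drive happened to write the second request first, then lost power.
after_reorder = no_flush.crash(persisted=[1])

with_flush = Disk()
with_flush.write("inode 12", "initialized")
with_flush.flush()
with_flush.write("dir entry", "points to inode 12")
after_flush = with_flush.crash(persisted=[])
```

Without the flush, the platter can end up holding a directory entry that points at an inode which was never written. With a flush between the two writes, the worst case after a crash is an initialized inode that nothing refers to yet.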

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 21, 2011 23:09 UTC (Wed) by GalacticDomin8r (guest, #81935) [Link]

> Also, note that because of how Soft Update works, it requires forcing metadata blocks out to disk more frequently than without Soft Updates

Duh. Can you name a file system with integrity features that doesn't introduce a performance penalty? I thought not. The point is that the Soft Updates method is (far) less overhead than most.

> What's worse, it depends on the disk not reordering write requests

Bald-faced lie. The only requirement of SUs is that writes reported as done by the disk driver have indeed safely landed in nonvolatile storage.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 22, 2011 11:32 UTC (Thu) by nix (subscriber, #2304) [Link]

A little civility would be appreciated. Unless you're a minor filesystem deity in pseudonymous disguise, it is reasonable to assume that Ted knows a hell of a lot more about filesystems than you (because he knows a hell of a lot more about filesystems than almost anyone). It's also extremely impolite to accuse someone of lying unless you have proof that what they are saying is not only wrong but maliciously meant. That is very unlikely here.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 4, 2011 17:13 UTC (Sun) by mjg59 (subscriber, #23239) [Link]

The trade-off is that you go from a situation where you can guarantee metadata consistency to one where you can't. SSDs may make the window of inconsistency smaller, but it's still there.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds