LWN.net Logo

Improving ext4: bigalloc, inline data, and metadata checksums

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 12, 2011 12:01 UTC (Mon) by jlokier (guest, #52227)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by nix
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

I believe dlang is right. You need to enable barriers even with battery-backed disk write cache. If the storage device has a good implementation, the cache flush requests (used to implement barriers) will be low overhead.

Some battery-backed disk write caches can commit the RAM to flash storage or something else, on battery power, in the event that the power supply is removed for a long time. These systems don't need a large battery and provide stronger long-term guarantees.

Even ignoring ext3's no barrier default, and LVM missing them for ages, there is the kernel I/O queue (elevator) which can reorder requests. If the filesystem issues barrier requests, the elevator will send writes to the storage device in the correct order. If you turn off barriers in the filesystem when mounting, the kernel elevator is free to send writes out of order; then after a system crash, the system recovery will find inconsistent data from the storage unit. This can happen even after a normal crash such as a kernel panic or hard-reboot, no power loss required.

Whether that can happen when you tell the filesystem not to bother with barriers depends on the filesystem's implementation. To be honest, I don't know how ext3/4, xfs, btrfs etc. behave in that case. I always use barriers :-)


(Log in to post comments)

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 12, 2011 15:40 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

I think these days any sensible fs actually waits for the writes to reach storage independent of barrier usage. The only different with barriers on/off is whether a FUA/barrier/whatever is sent to the device to force the device to write out the data.
I am rather sure at least ext4 and xfs do it that way.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 12, 2011 18:14 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

no, jlokier is right, barriers are still needed to enforce ordering

there is no modern filesystem that waits for the data to be written before proceeding. Every single filesystem out there will allow it's writes to be cached and actually written out later (in some cases, this can be _much_ later)

when the OS finally gets around to writing the data out, it has no idea what the application (or filesystem) cares about, unless there are barriers issued to tell the OS that 'these writes must happen before these other writes'

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 12, 2011 18:15 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

The do wait for journaled data uppon journal commit. Which is the place where barriers are issued anyway.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 12, 2011 18:39 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

issueing barriers is _how_ the filesystem 'waits'

it actually doesn't stop processing requests and wait for the confirmation from the disk, it issues a barrier to tell the rest of the storage stack not to reorder around that point and goes on to process the next requrest and get it in flight.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 12, 2011 18:53 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

Err. Read the code. xfs uses io completion callbacks and only relies on the contents of the journal after the completion returned. (xlog_sync()->xlog_bdstrat()->xfs_buf_iorequest()->_xfs_buf_ioend()).
jbd does something similar but I don't want to look it up unless youre really interested.

It worked a littlebit more like you describe before 2.6.37 but back then it waited if barriers were disabled.

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 13, 2011 13:35 UTC (Tue) by nix (subscriber, #2304) [Link]

Well, this is clear as mud :) guess I'd better do some code reading and figure out wtf the properties of the system actually are...

Improving ext4: bigalloc, inline data, and metadata checksums

Posted Dec 13, 2011 13:38 UTC (Tue) by andresfreund (subscriber, #69562) [Link]

If you want I can give you the approx calltrace for jbd2 as well, I know it took me some time when I looked it up...

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds