SUSE reaffirms support for Btrfs
Posted Aug 26, 2017 16:35 UTC (Sat) by Wol (subscriber, #4433)
In reply to: SUSE reaffirms support for Btrfs by zlynx
Parent article: SUSE reaffirms support for Btrfs
There's a recent thread on linux-raid where data loss seems to have been tracked down to the fact that the drive said "yes, I've got the data", then lost it somewhere between the cache and the rotating rust ...
(Oh, and fixing that - disabling the write cache - really f***s up performance.)
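[To illustrate what applications are left doing about this - a minimal sketch, with `durable_write` and the file name invented for the example. fsync() asks the kernel to flush its page cache and issue a cache-flush to the device; a drive that acknowledges the flush and then loses the data defeats even this.]

```python
import os
import tempfile

def durable_write(path, data):
    """Write data and return only once the kernel has asked the
    device to persist it. Whether the drive honours that request
    is exactly the problem described above."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # flush page cache, then ask the device to flush its cache
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "journal.bin")
durable_write(path, b"commit record")
with open(path, "rb") as f:
    print(f.read())  # b'commit record'
```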
Cheers,
Wol
Posted Aug 28, 2017 10:28 UTC (Mon)
by anton (subscriber, #25547)
[Link] (1 responses)
It seems to me that Linux support for such features is lackluster. Block device layers like LVM went for a long time without supporting them (do they now?), and a Linux file system developer has apologized for not losing more data. My general impression is that most people concerned with file systems and surrounding areas in Linux put performance first and treat data consistency on crash recovery as an unloved stepchild (e.g., the ext3 data=journal corruption bug existed for several years, and last I looked the only Linux file system providing a strong consistency guarantee was nilfs2).
In this context an advantage of BTRFS is that it does not need layers like md or LVM, and can therefore use tagged commands or write barriers to provide consistency. Does it do that? I have no idea. Given that the BTRFS developers are in the filter bubble of the performance-first Linux file system people, I am pessimistic.
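[The ordering that barriers or tagged commands give a file system can be sketched at the application level with the classic write-then-rename pattern - a hedged illustration, with `atomic_replace` and the paths invented here, not anything Btrfs itself does:]

```python
import os
import tempfile

def atomic_replace(path, data):
    """Persist the new contents *before* the rename makes them
    visible, then persist the rename itself. This hand-rolled
    ordering is what a file system wants from write barriers."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)               # step 1: new data is durable
    finally:
        os.close(fd)
    os.replace(tmp, path)          # step 2: atomically switch names
    dfd = os.open(os.path.dirname(path), os.O_RDONLY)
    try:
        os.fsync(dfd)              # step 3: persist the rename itself
    finally:
        os.close(dfd)

target = os.path.join(tempfile.mkdtemp(), "state")
atomic_replace(target, b"v1")
atomic_replace(target, b"v2")
with open(target, "rb") as f:
    print(f.read())  # b'v2'
```

[After a crash at any point, a reader sees either the old contents or the new, never a mix.]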
Posted Aug 28, 2017 19:49 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
That said, there does seem to be a bit of antipathy in certain quarters to "doing the job right". You know the dialog boxes that ask "confirm yes/no, do you want to remember this answer?"? WHY IS IT that they always seem to fill in only *three* cells of the two-by-two grid, and never the fourth? They'll remember that you said "yes", but forget that you said "no".
I'm one of those people who find it frustrating that such "stupidities" exist, yet there are many people who don't even seem capable of seeing them, let alone considering them worth fixing - they'll try to obstruct any effort to clean up the logic.
Cheers,
Wol
Posted Aug 30, 2017 12:43 UTC (Wed)
by nix (subscriber, #2304)
[Link]
Perhaps any timeout at all should force all layers to assume that everything since the last *acknowledged* FUA is potentially lost? Regardless, turning off write caching isn't going to help there at all: any rotating-rust drive needs *somewhere* to store data between its being sent to the drive and its being committed to the platter. Whether you call it a cache or not, its contents can be lost if power goes out at the wrong instant (indeed, if a write to the disk is underway you can get a torn sector and an ECC error on the spinning rust too). The real lesson here is that operating reliably atop hardware with faulty power rails is about as easy as operating atop any other faulty hardware...
In the good old times server drives came with write caches disabled, and file systems used tagged command queueing (essentially an asynchronous interface) to provide performance. PATA and SATA got several generations of tagged commands (from what I read, the first ones were pretty unusable; I don't know about the later ones), as well as write barriers (another way to support consistent file systems without killing performance).
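[The torn-sector hazard mentioned above is one reason checksumming file systems keep a checksum per block, so a half-completed write is detected on read rather than silently trusted. A rough sketch of the idea - `seal`/`check` and the 512-byte sector size are invented for the example, not any file system's actual on-disk format:]

```python
import struct
import zlib

SECTOR = 512

def seal(payload):
    """Prefix a logical sector with a CRC32 of its body, so a torn
    or corrupted write fails verification when read back."""
    body = payload[:SECTOR - 4].ljust(SECTOR - 4, b"\0")
    return struct.pack("<I", zlib.crc32(body)) + body

def check(sector):
    stored, = struct.unpack("<I", sector[:4])
    return stored == zlib.crc32(sector[4:])

good = seal(b"ABCDEFGH" * 64)                # a full, intact sector
print(check(good))                           # True
torn = good[:200] + b"\0" * (SECTOR - 200)   # simulate a torn write
print(check(torn))                           # False
```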
