SUSE reaffirms support for Btrfs

Posted Aug 24, 2017 23:25 UTC (Thu) by jhoblitt (subscriber, #77733)
In reply to: SUSE reaffirms support for Btrfs by drag
Parent article: SUSE reaffirms support for Btrfs

Filesystems tend to need a long slow roast to shake out the OMG level bugs. XFS took about a decade (on Linux) to mature to the point that it didn't have memory leaks and do "very bad things" when the kernel ran out of memory. It took that long even though XFS was popular for large volumes with heavy workloads, because the performance and scaling were quite good. I doubt btrfs has a large installed base for I/O intensive use, as the performance can be very poor relative to XFS or even ext4. I expect that will slow down the maturation rate.

I'm honestly not sure why desktop users seem to be so enthusiastic about btrfs and/or zfs. Block level checksumming is a "nice to have" but a data integrity scheme, even for a laptop, it does not make. Software raid, other than raid1, is unimpressive and/or dangerous for many workloads without a battery backed write cache. Many desktop users, myself included, have been using lvm+dmraid since the early 2000s.

At the large end of the storage spectrum, parallel filesystems (GPFS, ceph, etc.) are widely used and there is a strong move towards object storage with very un-posix like semantics.

SUSE reaffirms support for Btrfs

Posted Aug 25, 2017 1:28 UTC (Fri) by drag (guest, #31333) [Link] (14 responses)

> Filesystems tend to need a long slow roast to shake out the OMG level bugs. XFS took about a decade (on Linux) to mature to the point

Btrfs is now over 8 years old and Oracle was declaring it 'production ready' about 5 years ago.

It's had a long time and lots of testers.

I have used btrfs on many systems. I have tried to use it in conjunction with ceph, which was a disaster. I have tried to use it to provide backing storage for virtual machines, which was really painful. I have used its online compression to conserve disk space on small SSDs, which for the most part was a very good decision. I have used it on my desktop and ran into issues with it running out of disk space, but I never lost my data. I have also used it on my own file servers and backup servers and took full advantage of the ability to use checksums to scrub and correct data, especially when I was losing drives due to a flaky SATA controller, and it was completely awesome in that regard.

I have had mixed, but largely positive results in my own usage with it.

> I'm honestly not sure why desktop users seem to be so enthusiastic about btrfs and/or zfs.

Most of these desktop users have file servers they like to store lots of their personal data in. They use cheap consumer-grade hardware that sucks badly.

Having a file system that is robust and safe and able to add significant value to cheap hardware is something that everybody is looking for... All the way from Enterprise and public cloud providers to home gamers. It's a massive win for pretty much everybody.

> Software raid, other than raid1, is unimpressive and/or dangerous for many workloads without a battery backed write cache.

Software raid doesn't have a cache to need a battery back up, generally speaking. RAID 5 is obsolete, but RAID 1, RAID 10, and RAID 6 are very robust and useful software raid styles.

'RAID10' mode for BTRFS has always been more than enough for me and I never understood why RAID5 was so valuable to so many people. For the cost of an extra drive RAID10 is more than worth it.

I would have been quite happy if the btrfs developers had simply told the world that if they want to have replicated data they need to do it in RAID1 or RAID10. But they didn't do that and promised people their RAID5-like features, and so far it has caused quite a bit of pain for btrfs adopters. Hopefully they have this fixed.

SUSE reaffirms support for Btrfs

Posted Aug 25, 2017 2:14 UTC (Fri) by ttelford (guest, #44176) [Link] (13 responses)

> I never understood why RAID5 was so valuable to so many people

If your array has 3 drives, you have a point. There's only one more drive needed to go RAID 10.

However, RAID10 gets prohibitively expensive as the number of disks grows.

And of course, there's the issue of being able to find an individual user's sweet spot between storage space, reliability, and the physical space to mount disks in a chassis.

SUSE reaffirms support for Btrfs

Posted Aug 25, 2017 3:24 UTC (Fri) by drag (guest, #31333) [Link] (12 responses)

RAID 5 recovery time is what kills it. Lack of performance doesn't help things. It's obsolete because of this. 'Individual sweet spot' is fairly irrelevant when the 'individual sweet spot' in question is just setting yourself up for misery. It's not just an issue of the chances of a second drive failing before the recovery is complete, but also of having a fully usable and available system ASAP.

With 4TB drives being less than 100 dollars now you get 8TB for less than $400 or 16TB for less than $800. The per-drive expense of storage like that is far less of a concern than it used to be and other factors tend to matter more. If you are trying to get work done on a system rebuilding an array then the time to recover can easily stretch out to multiple days.

If somebody has their heart set on RAID5 then that's fine, but I still consider it ill-advised.
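The capacity side of this trade-off can be sketched in a few lines. This is only an illustration: the $100-per-4TB figure is the upper bound quoted above, the formulas are the usual ones for arrays of equal-sized drives with an even drive count, and the function names are made up for the example.

```python
# Illustrative only: price is the "less than $100 per 4TB drive" bound quoted above.
DRIVE_TB = 4
DRIVE_COST = 100  # dollars per drive (assumed upper bound)


def usable_tb(level: str, drives: int) -> int:
    """Usable capacity in TB for an array of equal-sized drives."""
    if level == "raid10":
        return drives // 2 * DRIVE_TB   # half the drives hold mirror copies
    if level == "raid5":
        return (drives - 1) * DRIVE_TB  # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * DRIVE_TB  # two drives' worth of parity
    raise ValueError(f"unknown level: {level}")


def cost_per_usable_tb(level: str, drives: int) -> float:
    """Dollars spent per usable TB at the assumed drive price."""
    return drives * DRIVE_COST / usable_tb(level, drives)


if __name__ == "__main__":
    for drives in (4, 8):
        for level in ("raid10", "raid5", "raid6"):
            print(drives, level, usable_tb(level, drives),
                  round(cost_per_usable_tb(level, drives), 2))
```

With four drives RAID10 and RAID6 give the same usable space; the gap only opens up as the drive count grows, which is where the cost argument against RAID10 comes from.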

SUSE reaffirms support for Btrfs

Posted Aug 25, 2017 19:17 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (2 responses)

While I agree with you regarding RAID5, with a 4-disk RAID6 parity scheme the loss of _any_ two drives is recoverable. With RAID10 on the same array you have a 1/3 chance of losing access to half your data when the second drive fails (i.e. when it happens to be the mirror of the first drive that failed). On the other hand, while RAID6 can perform two reads in parallel from any part of the array, RAID10 has the advantage in speed since it can perform at least two reads (one per mirror) and possibly up to four (one per disk) at one time, depending on which part of the array is being read. Similarly, a RAID6 write touches all four drives whereas RAID10 only needs to update two, potentially allowing up to double the throughput if writes are distributed across the mirrors.
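The 1/3 figure can be checked by enumerating every two-drive failure on a four-disk array. A minimal sketch, assuming a nested RAID10 layout where disks 0+1 form one mirror and disks 2+3 the other (the disk numbering is arbitrary):

```python
from itertools import combinations

DISKS = range(4)
# Assumed nested RAID10 layout: two mirrored pairs, striped together.
MIRRORS = [{0, 1}, {2, 3}]


def raid6_survives(failed: set) -> bool:
    # Dual parity tolerates any two failed drives.
    return len(failed) <= 2


def raid10_survives(failed: set) -> bool:
    # Data is lost once both members of any one mirror have failed.
    return not any(mirror <= failed for mirror in MIRRORS)


pairs = [set(p) for p in combinations(DISKS, 2)]
raid6_losses = sum(not raid6_survives(p) for p in pairs)
raid10_losses = sum(not raid10_survives(p) for p in pairs)

if __name__ == "__main__":
    print(f"RAID6:  {raid6_losses} of {len(pairs)} two-drive failures lose data")
    print(f"RAID10: {raid10_losses} of {len(pairs)} two-drive failures lose data")
```

Of the six possible pairs, only the two that take out a whole mirror are fatal for RAID10, which is the 1/3 chance; RAID6 survives all six.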

SUSE reaffirms support for Btrfs

Posted Aug 28, 2017 15:07 UTC (Mon) by Wol (subscriber, #4433) [Link] (1 responses)

Just watch out though - as far as linux is concerned, raid-10 and raid-1+0 are two different beasts. For example, you only need three drives for raid-10, but four for raid-1+0.

Cheers,
Wol

SUSE reaffirms support for Btrfs

Posted Aug 28, 2017 17:26 UTC (Mon) by nybble41 (subscriber, #55106) [Link]

> you only need three drives for raid-10

Right, I was writing from the perspective of 'nested' RAID10, not the 'complex' version. With 'complex' RAID10 you would have guaranteed partial data loss on failure of any two drives, since the data stripes are distributed across the drives and—for an array of four uniformly-sized disks, as we've been discussing, which doesn't really need to be 'complex'—the two failed drives would be the sole mirrors for approximately one-twelfth of the stripes.

Whether a guaranteed 8% data loss (given a second drive failure) is better or worse than a 1/3 chance of losing 50% of the data will, I suppose, depend on your use case. However, do consider that if the 8% includes critical metadata (and it probably will) then the remainder, while still technically "present", may still be unrecoverable in practice.

SUSE reaffirms support for Btrfs

Posted Aug 26, 2017 3:14 UTC (Sat) by ttelford (guest, #44176) [Link]

At the end of the day, at any given moment, there's a "largest size available". Simply buying a bigger disk isn't possible, and RAID-10 uses space very inefficiently.

Not all applications are terribly sensitive to performance. A lot of the classical file servers have files that are effectively static - with occasional updates.

A home media server is another example: you can record multiple channels of 1080 video (using ATSC / MPEG-2), and play back multiple recordings simultaneously, without skipping. Why target performance (which you don't need), and sacrifice storage space (which you do need)?

There is simply no one-size-fits-all solution; it's all a compromise. I personally use RAID6 when performance isn't a problem.

SUSE reaffirms support for Btrfs

Posted Aug 26, 2017 8:20 UTC (Sat) by Otus (subscriber, #67685) [Link] (7 responses)

> RAID 5 recovery time is what kills it.
> [...]
> If you are trying to get work done on a system rebuilding a array then the time to recover can easily stretch out to multiple days.

Really? Recovery is something that needs to be done rarely. As long as it can be done in the background without interruption I cannot see the problem, even if it takes days.

I also don't see why RAID 1 recovery should be significantly faster. Whenever you lose one disk you need to rewrite one disk, in any RAID mode. With RAID 5 you need to read double the data, but does that really make that much of a difference in practice?

(Finally, with 4 disks instead of 3 in an array, you will get a drive failure about a third more often.)
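Both numbers are easy to put into a small sketch. It assumes a simplified model: every drive fails independently at the same annual rate (the 3% used here is a made-up placeholder), and a rebuild reads each source disk once in full.

```python
def rebuild_read_disks(level: str, drives: int) -> int:
    """Number of surviving disks read in full to rebuild one replaced disk."""
    if level == "raid1":
        return 1           # copy from one surviving mirror
    if level == "raid5":
        return drives - 1  # every surviving disk is needed to recompute the lost one
    raise ValueError(f"unknown level: {level}")


def expected_failures_per_year(drives: int, annual_failure_rate: float) -> float:
    # Assumes independent failures at the same rate for every drive.
    return drives * annual_failure_rate


AFR = 0.03  # hypothetical annual failure rate per drive
ratio = expected_failures_per_year(4, AFR) / expected_failures_per_year(3, AFR)

if __name__ == "__main__":
    print("RAID1 rebuild reads", rebuild_read_disks("raid1", 2), "disk")
    print("3-disk RAID5 rebuild reads", rebuild_read_disks("raid5", 3), "disks")
    print("4 vs 3 drives, relative failure rate:", round(ratio, 2))
```

In this model the ratio is 4/3 regardless of the failure rate chosen, and the RAID5 reads happen in parallel, so the amount written to the new disk is the same in both cases.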

SUSE reaffirms support for Btrfs

Posted Aug 26, 2017 12:43 UTC (Sat) by SampsonF (guest, #118216) [Link] (6 responses)

In a multi-drive volume (RAID), when one drive fails due to "age", the other drives are very likely to start failing as well.

When one drive fails and the array starts to rebuild, the rebuild creates a high volume of disk read/write activity for a long time. This surge of disk activity in turn increases the likelihood of failure in the remaining disks.

That is why mirroring (RAID1) is sometimes the more feasible choice for some use cases.

SUSE reaffirms support for Btrfs

Posted Aug 26, 2017 16:30 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

RAIDs usually suffer multiple drive failures because they haven't been looked after. A sysadmin will do scrubs, will monitor SMART etc, and won't be surprised by a disk failure...

So why do we regularly get stories about arrays falling over? Maybe because home users (and bean counters) don't think checking the health of the system is important?

Cheers,
Wol

SUSE reaffirms support for Btrfs

Posted Aug 30, 2017 12:28 UTC (Wed) by nix (subscriber, #2304) [Link]

> A sysadmin will do scrubs, will monitor SMART etc, and won't be surprised by a disk failure...

Your assumption that scrubbing and SMART will reliably detect disk failures is not very accurate. There are sudden failures, for starters, where the drive works until it suddenly doesn't: but also SMART is not terribly reliable at the best of times, and scrubbing is more to guard against slow magnetic degradation than to discern whether the drive is on its way out.

SUSE reaffirms support for Btrfs

Posted Aug 30, 2017 22:47 UTC (Wed) by jwarnica (subscriber, #27492) [Link] (1 responses)

Mortals, which is to say people with meetings and more than one system to nurse, have other demands on their time.

"Enterprise" hardware and software should not catastrophically fail without warning that comes out of the box, and for sure not fail totally before shipping kicks in.

SUSE reaffirms support for Btrfs

Posted Aug 31, 2017 13:33 UTC (Thu) by Wol (subscriber, #4433) [Link]

No names no pack drill ...

But a company I know of bought a commercial raid-6 array. Cue a massive panic a couple of years down the line when an operator suddenly noticed two red lights indicating a double drive failure. Nobody'd been checking the raid.

So they had people who were supposed to be looking after it. So it did try to tell them something was wrong. And still they just dodged a catastrophic failure by pure luck rather than judgement.

Cheers,
Wol

SUSE reaffirms support for Btrfs

Posted Aug 26, 2017 21:14 UTC (Sat) by zlynx (guest, #2285) [Link] (1 responses)

A proper weekly scrub will solve this. It does a full read and verify and is just as much of a heavy load as a full rebuild. So if a drive would have failed during a rebuild it would have failed during a scrub.

SUSE reaffirms support for Btrfs

Posted Aug 28, 2017 15:10 UTC (Mon) by Wol (subscriber, #4433) [Link]

Also, a lot of raid failures are down to "soft" problems. You need to read (and rewrite) your drive regularly. Just as dynamic ram needs to be refreshed every few tens of milliseconds, so does your drive need to be refreshed regularly, although on a MUCH longer timescale. Again, a scrub will pick up problems before they get serious.

Cheers,
Wol

SUSE reaffirms support for Btrfs

Posted Aug 25, 2017 1:39 UTC (Fri) by ttelford (guest, #44176) [Link]

> I'm honestly not sure why desktop users seem to be so enthusiastic about btrfs and/or zfs.

Snapshots are highly useful for everyone. Most "new" users start with a Linux desktop, and snapshots are useful for those little "oops" moments that come with learning.

SUSE reaffirms support for Btrfs

Posted Aug 25, 2017 4:21 UTC (Fri) by raegis (guest, #19594) [Link]

> I'm honestly not sure why desktop users seem to be so enthusiastic about btrfs and/or zfs.

* snapshots for the usual stuff
* snapshots for lxc: I create base containers with btrfs storage and clone to make several more--saves a tremendous amount of disk space
* btrfs send/receive for cheap incremental backups (the KILLER feature, IMO)
* Upgrade from Debian Jessie to Debian Stretch: debootstrap into a new subvolume and set it as the default for the root filesystem (btrfs subvol set-default ...) I think this is cool.
* Encrypted device-backed loop: I've experienced no issues with power loss using btrfs. In my experience, ext4 was not as robust.
* and more!

SUSE reaffirms support for Btrfs

Posted Aug 28, 2017 11:55 UTC (Mon) by Hanno (guest, #41730) [Link] (1 responses)

> I'm honestly not sure why desktop users seem to be so enthusiastic about btrfs and/or zfs.

I really want to see a really really simple and painless way to have multiple generations of snapshot backups on my desktop. I haven't used btrfs yet, but was under the impression that this is its promise to desktop users.

SUSE reaffirms support for Btrfs

Posted Sep 1, 2017 7:35 UTC (Fri) by niner (subscriber, #26151) [Link]

That's _the_ reason for me to use btrfs on my desktop. Snapshots are really pain-free and don't slow down my system. And I use a trivial 18-line shell script (including sanity checks) that creates the snapshot and sends the differences between the previous snapshot and the new one to my backup drive and, just to be sure, also to my webserver as offsite backup. The local backup takes mere minutes to complete. The offsite backup at least fully uses my upload pipe, as it doesn't have to ask the server for what it got. It already knows.
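For readers who haven't seen the workflow, here is a rough sketch of what one such backup run involves. It is not the script described above, just an illustration that builds the command lines without executing anything. The function name and all paths are hypothetical; the commands themselves (`btrfs subvolume snapshot -r`, `btrfs send -p`, `btrfs receive`) are the standard btrfs-progs ones.

```python
import shlex
from typing import List, Optional


def backup_commands(subvol: str, snap_dir: str, new_name: str,
                    prev_name: Optional[str], dest: str) -> List[str]:
    """Return the shell commands for one backup run (nothing is executed)."""
    new_snap = f"{snap_dir}/{new_name}"
    # Snapshots must be read-only (-r) before they can be sent.
    snapshot = (f"btrfs subvolume snapshot -r "
                f"{shlex.quote(subvol)} {shlex.quote(new_snap)}")
    if prev_name is None:
        # First run: no parent snapshot yet, so send the full stream.
        send = f"btrfs send {shlex.quote(new_snap)}"
    else:
        # Later runs: send only the differences against the parent snapshot.
        parent = f"{snap_dir}/{prev_name}"
        send = f"btrfs send -p {shlex.quote(parent)} {shlex.quote(new_snap)}"
    return [snapshot, f"{send} | btrfs receive {shlex.quote(dest)}"]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in backup_commands("/home", "/home/.snapshots",
                               "2017-09-01", "2017-08-31", "/mnt/backup"):
        print(cmd)
```

The incremental form only works if the parent snapshot already exists on the receiving side, which is why the sender never has to ask the destination what it has.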