
Bringing bcachefs to the mainline

Posted May 23, 2022 22:09 UTC (Mon) by Sesse (subscriber, #53779)
In reply to: Bringing bcachefs to the mainline by bartoc
Parent article: Bringing bcachefs to the mainline

If you have RAID-6 and a spurious bit flip (which generally needs to happen before the data is written to disk, as the drive's own ECC protects you well afterwards), the two parity blocks together let you tell which disk is bad; with RAID-5's single parity you can only tell that something is wrong, not where.
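
As a rough illustration (a toy Python sketch, not anything resembling the actual btrfs or md code), RAID-6 keeps two syndromes over each stripe: P, a plain XOR of the data blocks, and Q, a weighted sum over GF(2^8). If a single data block is corrupted, comparing both stored syndromes against recomputed ones pins down which disk it was, while a P mismatch alone only says "something is off":

# Toy sketch of RAID-6 syndrome math over GF(2^8) (polynomial 0x11d,
# generator g = 2, the construction commonly used for RAID-6).
# One byte per "disk" here; real stripes just do this per byte position.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    # a^254 == a^-1 in GF(2^8), since a^255 == 1 for any nonzero a
    return gf_pow(a, 254)

def syndromes(data):
    """P = XOR of all data blocks, Q = sum of g^i * D_i over GF(2^8)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

# Original stripe of four data bytes, plus the stored P and Q parity.
data = [0x12, 0x34, 0x56, 0x78]
p_stored, q_stored = syndromes(data)

# A spurious bit flip corrupts data disk 2 before the parity is consulted.
data[2] ^= 0x40

p_now, q_now = syndromes(data)
dp = p_now ^ p_stored   # = error value E; RAID-5 stops here ("mismatch, somewhere")
dq = q_now ^ q_stored   # = g^index * E
if dp and dq:
    ratio = gf_mul(dq, gf_inv(dp))   # recovers g^index
    index = next(i for i in range(len(data)) if gf_pow(2, i) == ratio)
    print(f"error value {dp:#x} on data disk {index}")   # -> error value 0x40 on data disk 2
# (If only one of dp/dq is nonzero, the damaged block is P or Q itself.)
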

Also, btrfs' RAID-[56] has spent 10+ years getting to production quality, and still is at “should not be used in production, only for evaluation or testing” (https://btrfs.readthedocs.io/en/latest/btrfs-man5.html#ra..., linked from the btrfs wiki at kernel.org), so if nothing else, it's amazingly hard to get right.



Bringing bcachefs to the mainline

Posted May 24, 2022 7:46 UTC (Tue) by atnot (subscriber, #124910) [Link] (3 responses)

> Also, btrfs' RAID-[56] has spent 10+ years getting to production quality, and still is at “should not be used in production, only for evaluation or testing”

I don't think this is accurate. My perception is that the RAID56 implementation has been more or less abandoned in its current unfinished state. This is not that surprising to me because, in general, OS-level parity RAID is kind of dead, at least amongst the people who could afford to put significant money behind developing it.

In a modern datacenter you're basically just going to have three types of storage: Local scratchpad SSDs, network block devices and blob storage services. The first is usually RAID10 for performance, the second solves redundancy at a lower level and the third solves redundancy at a higher level. This puts RAID56 in an awkward spot where it's useful for many home users, still decently well supported, but nobody else is really there to care about it anymore.

Bringing bcachefs to the mainline

Posted May 24, 2022 18:53 UTC (Tue) by raven667 (subscriber, #5198) [Link] (2 responses)

Although at some point the network storage devices, whether they are sharing out a block or a blob service, need to run on something and manage the storage, and who is writing that code? Even with a hardware RAID controller, is the actual RAID card itself just an embedded Linux system? It's turtles all the way down: do all the vendors of this kind of hardware write their own proprietary in-house RAID and filesystems, or do some use the built-in Linux support and innovate in the higher-layer management, by actually using those building blocks to their fullest potential?

Bringing bcachefs to the mainline

Posted May 24, 2022 19:05 UTC (Tue) by xanni (subscriber, #361) [Link]

Many years ago I worked for an ISP that had a hardware RAID controller fail with a firmware bug that caused it to write bad data to all copies on all redundant storage devices... in both data centres in Adelaide and Sydney. We had an engineer from the vendor in the US on a flight to Australia the same day, and had to spend several days restoring all our customers' data from tapes.

Bringing bcachefs to the mainline

Posted May 24, 2022 20:05 UTC (Tue) by atnot (subscriber, #124910) [Link]

> Although at some point, the network storage devices, whether they are sharing out a block or blob service, need to run on something and manage the storage, and who is writing that code?

Afaict, there are two reasons storage folks generally skip the kernel. The first is that the UNIX filesystem API semantics are a poor fit for what they are doing; the second is that the code isn't capable of running in a distributed manner.

So for blob storage it's generally going to be almost entirely in user space, with no disk-level redundancy at all. See e.g. Ceph, Minio, Backblaze.

EMC/NetApp/vSAN all have, to my knowledge, their own proprietary disk layouts. VMware has its own kernel; not sure about the others. The block devices they present are also redundant across multiple machines, so dm-raid alone wouldn't quite cut it there. You can use Ceph for block storage too, but that also skips the kernel.

So in general, this is why I say I find it hard to see a place for filesystem-level parity RAID in the near future. It basically amounts to a layering violation in today's virtualized infrastructure. But who knows, things might change again.

