
Btrfs: Working with multiple devices

Posted Jan 9, 2014 22:34 UTC (Thu) by kreijack (guest, #43513)
In reply to: Btrfs: Working with multiple devices by mrjoel
Parent article: Btrfs: Working with multiple devices

> To quote myself on the btrfs mailing list:

> "I was surprised to find that I'm not allowed to remove one of the two
> drives in a RAID1. Kernel message is 'btrfs: unable to go below two
> devices on raid1' Not allowing it by default makes some sense, however a
> --force flag or something would be beneficial."

I agree with you that allowing a disk removal to be forced would be useful; if a disk reports problems, it can slow the system down due to the read-error-reset-retry cycle, so in that case it would be useful to stop using the disk.

> typically a system owner will want to use all drives that are available, and
> not reserve an extra SATA/SAS channel for the times when a drive fails.
On that I cannot agree; think about the spare disk... However, this doesn't change the point:
btrfs supports an "add-remove" replace, but doesn't support a "remove-add" replace.
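
To make the two orderings concrete, here is a minimal sketch using the existing btrfs device commands (the device names /dev/sdb and /dev/sdc and the mount point /mnt are only examples):

# "add-remove" replace, which works today if a spare slot is free:
btrfs device add /dev/sdc /mnt      # attach the new disk first
btrfs device delete /dev/sdb /mnt   # then migrate data off and drop the old one

# "remove-add" replace, which is what is missing:
btrfs device delete /dev/sdb /mnt   # refused on a two-device RAID1:
                                    #   "unable to go below two devices on raid1"
btrfs device add /dev/sdc /mnt      # never reached without some kind of --force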

> That seems to be a problem, to have to unmount the volume in order to
> remove the degraded flag, which is needed to begin the rebalance. And
> what if btrfs is the root file system? It needs to be rebooted to
> clear the degraded option."

Are you sure about that? I tried to rebalance a degraded filesystem (after adding a new disk), and it seems to work:

ghigo@venice:/tmp$ sudo /sbin/mkfs.btrfs -K -f -d raid1 -m raid1 /dev/loop[01]
WARNING! - Btrfs v3.12 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
adding device /dev/loop1 id 2
fs created label (null) on /dev/loop0
nodesize 16384 leafsize 16384 sectorsize 4096 size 19.53GiB
Btrfs v3.12
ghigo@venice:/tmp$ sudo mount /dev/loop0 t/
ghigo@venice:/tmp$ sudo umount t/
ghigo@venice:/tmp$ sudo losetup -d /dev/loop0
ghigo@venice:/tmp$ sudo mount /dev/loop1 t
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

ghigo@venice:/tmp$ sudo mount -o degraded /dev/loop1 t
ghigo@venice:/tmp$ sudo btrfs fi balance t/
ERROR: error during balancing 't/' - No space left on device
There may be more info in syslog - try dmesg | tail
ghigo@venice:/tmp$ sudo btrfs dev add -f /dev/loop2 t
Performing full device TRIM (9.77GiB) ...
ghigo@venice:/tmp$ sudo btrfs fi balance t/
Done, had to relocate 2 out of 2 chunks
ghigo@venice:/tmp$ cat /proc/self/mountinfo | grep loop1
79 20 0:39 / /tmp/t rw,relatime shared:63 - btrfs /dev/loop1 rw,degraded,space_cache



Btrfs: Working with multiple devices

Posted Jan 9, 2014 22:39 UTC (Thu) by dlang (subscriber, #313) [Link]

>> typically a system owner will want to use all drives that are available, and not reserve an extra SATA/SAS channel for the times when a drive fails.
> On that I cannot agree; think about the spare disk... However, this doesn't change the point:
> btrfs supports an "add-remove" replace, but doesn't support a "remove-add" replace.

A lot of people don't run with a spare disk. They run with all the disks active and when one fails they replace it.

Yes, this extends the time they spend running in degraded mode, but since a replacement disk will probably be significantly cheaper later, and a hot spare is never really tested for problems (and remember, it is spinning all the time), it's not a completely unreasonable thing to do.

People routinely run their home systems in much riskier modes.

Btrfs: Working with multiple devices

Posted Mar 19, 2014 22:33 UTC (Wed) by dany (subscriber, #18902) [Link]

Here is a real-world example from Solaris 10/ZFS that I was recently dealing with.

The company has servers with many disks, which use raidz2 (roughly a RAID 6 analogue) redundancy in their zpools/zvols. There is no hot spare and there is no extra slot for a new disk. One disk fails (in this example, the pool name is "pool" and the failed disk is c7t6d0). The online disk replacement procedure with a live ZFS pool is:

1. zpool detach pool c7t6d0 #mark failed disk as detached for zfs
2. zpool offline pool c7t6d0 #take disk out of zfs control
3. cfgadm -l | grep c7t6d0 #figure out controller and target of disk
sata5/6::dsk/c7t6d0 disk connected configured ok
4. cfgadm -c unconfigure sata5/6 #power off disk
5. physically replace disk
6. cfgadm -c configure sata5/6 #power on disk
7. fdisk/format c7t6d0 #create partition table on disk
8. zpool online pool c7t6d0 #bring disk under zfs control
9. zpool replace pool c7t6d0 #resilver disk

A similar capability was available with the older Solaris volume manager (SDS). For Linux and Solaris there is the proprietary VxVM, which can also do this.

So yes, there is a need for exactly this use case on Linux as well (no reboot, no remount, no extra slot for a new disk). I would definitely expect this functionality from a next-generation filesystem; for some companies it could actually be a risk to run a filesystem without this capability. Yes, hot spares are great, but in the real world they are not always available.
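
For reference, a rough sketch of how an in-place replacement looks with the btrfs replace command available in recent kernels and btrfs-progs (device names, devid and mount point are only examples); note that it still assumes the new disk can be attached while the filesystem is mounted, or that the filesystem is mounted degraded with the old disk already gone:

btrfs replace start /dev/sdb /dev/sdc /mnt   # copy the old device onto the new one online
btrfs replace status /mnt                    # watch rebuild progress
# if the failed disk has already been pulled and the fs is mounted -o degraded,
# the missing device can be named by its devid instead:
btrfs replace start 2 /dev/sdc /mnt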

Btrfs: Working with multiple devices

Posted Mar 19, 2014 22:53 UTC (Wed) by anselm (subscriber, #2796) [Link]

I can't say anything about Btrfs, but the scenario you've outlined should be quite tractable on Linux with LVM and, e.g., Ext4.
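
A minimal sketch of one such LVM-based replacement, assuming a volume group vg0, a mirrored/RAID1 logical volume vg0/data, and a replacement disk that appears as /dev/sdb1 after the failed disk is physically swapped out (all names are illustrative):

pvcreate /dev/sdb1              # label the replacement disk as a physical volume
vgextend vg0 /dev/sdb1          # add it to the volume group
lvconvert --repair vg0/data     # rebuild the degraded mirror/RAID1 LV onto the new PV
vgreduce --removemissing vg0    # finally drop the old, missing PV from the VG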

Btrfs: Working with multiple devices

Posted Jan 9, 2014 23:22 UTC (Thu) by mrjoel (subscriber, #60922) [Link]

> I agree with you that allowing a disk removal to be forced would be
> useful; if a disk reports problems, it can slow the system down due
> to the read-error-reset-retry cycle, so in that case it would be
> useful to stop using the disk.

>> typically a system owner will want to use all drives that are
>> available, and not reserve an extra SATA/SAS channel for
>> the times when a drive fails.
> On that I cannot agree; think about the spare disk... However,
> this doesn't change the point:
> btrfs supports an "add-remove" replace, but doesn't support
> a "remove-add" replace.

I see these as two use cases of the same fundamental capability - removing a device when that action inherently results in running in a degraded state. That is indeed what I couldn't reconcile myself with when I last evaluated btrfs, and it ended up being our showstopper.

On the spare device issue, sure - having at least one hot spare is very common. However, I expect that one typically has the hot spare configured in the hardware RAID controller, in which case btrfs just sees a single device and the multi-device support doesn't come into play. That loses btrfs's ability to do direct device integrity checking, but it seems to be the only option, since the btrfs wiki lists "Hot spare support" as not claimed and nothing done [1]. Setting aside the cases where hardware RAID is used, if an additional drive is spinning in a chassis, then the ideal is to have all drives added as RAID-6 and/or a reserved hot spare (in fact, a hot spare in btrfs may end up being just a bump of the N-level redundancy by one, which would offer additional failure tolerance as well as an additional active spindle for I/O). However, since neither hot spares nor RAID-6 are implemented in btrfs, I assume most would be inclined to add the extra drive to the mix, especially since btrfs can use an odd number of drives in RAID1, as mirroring is done per block rather than per device.
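
To illustrate that last point, a small sketch of a three-device btrfs RAID1 (the loop devices and mount point are just placeholders); each block group is mirrored across two of the three devices, so an odd device count works:

mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2
mount /dev/loop0 /mnt
btrfs fi df /mnt     # data and metadata both report the RAID1 profile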

On the degraded flag changes, it looks like this may have been updated since I was trying it in August; I see some mailing-list patches to allow changing feature bits while online, so that sounds like good news. However, even in your example, the mount still shows the degraded flag, which is misleading, although it's understandable why it is required. At the time I was doing my testing, 'btrfs show' didn't reflect the actual runtime status of the filesystem from the kernel, and a quick perusal of the btrfs-progs git tree doesn't show anything updated related to that. So, at least in my mind, the question remains: if a degraded mount option shows up in the mount arguments, how can one determine whether the mounted filesystem is still degraded, or has been restored to its nominal redundancy state?
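
For what it's worth, a sketch of the checks available at the time (the mount point is illustrative); they report missing devices and per-device error counters rather than a single degraded/restored status, which is exactly the gap described above:

btrfs fi show                # a filesystem with a lost disk is flagged with missing devices
btrfs device stats /mnt      # per-device read/write/corruption error counters
dmesg | tail                 # kernel messages about devices disappearing or rejoining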

[1] https://btrfs.wiki.kernel.org/index.php/Project_ideas#Hot...

