Going big with TCP packets

Posted Feb 15, 2022 14:44 UTC (Tue) by HenrikH (subscriber, #31152)
In reply to: Going big with TCP packets by mageta
Parent article: Going big with TCP packets

Well, so are 100Gb/s NICs and switches, so neither one is driven by the consumer market at the moment.



Going big with TCP packets

Posted Feb 15, 2022 14:59 UTC (Tue) by Wol (subscriber, #4433) [Link] (4 responses)

Well, admittedly my home system isn't configured like a lot of them ...

But I reorganised (as in archived last year's) loads of mailing lists, and it was noticeable that even after Thunderbird came back and said "finished", the disk subsystem - a raid-5 array - was desperately struggling to catch up. With 32GB of RAM, provided it all fits in cache I'm not worried, but it's still a little scary knowing there's loads of stuff in the disk queue flushing as fast as possible ...
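If you want to watch that backlog drain rather than just imagine it, the kernel reports the amount of dirty and in-flight writeback data in /proc/meminfo. A minimal sketch (Linux-only; the two-second poll interval is an arbitrary choice):

    # Watch how much page-cache data is still waiting to reach the disks.
    # Reads the Dirty: and Writeback: fields from /proc/meminfo (values in kB).
    import time

    def pending_writeback_kb():
        fields = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                fields[key] = int(rest.split()[0])  # first token is the value in kB
        return fields["Dirty"], fields["Writeback"]

    while True:
        dirty, writeback = pending_writeback_kb()
        print(f"Dirty: {dirty / 1024:.1f} MiB   Writeback: {writeback / 1024:.1f} MiB")
        time.sleep(2)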

Cheers,
Wol

Going big with TCP packets

Posted Aug 1, 2022 23:28 UTC (Mon) by WolfWings (subscriber, #56790) [Link] (3 responses)

That's a large reason I built my home NAS around a lot of smaller spindles, currently 2.5" 2TB HDDs. Yeah, single 3.5" drives that approach the capacity of the whole array will be along in the next year or two, but throughput craters in that case, especially for random I/O, and if I lose a 2TB drive a rebuild is a bit under 4 hours, not days.

Since that's limited entirely by the write speed of the replacement 2TB drive, I've been thinking about adding a single NVMe exclusively as a hot spare, just to reduce that time to about 30 minutes TBH.
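The arithmetic works out if the rebuild really is write-limited (the sustained write speeds below are illustrative assumptions, not measurements of any particular drive):

    # Rough rebuild-time estimate: capacity to rewrite divided by sustained write speed.
    # The speeds are illustrative assumptions, not measurements.
    def rebuild_time_s(capacity_tb, write_mb_per_s):
        return capacity_tb * 1e12 / (write_mb_per_s * 1e6)

    print(f'2 TB at 150 MB/s (2.5" HDD):    {rebuild_time_s(2, 150) / 3600:.1f} hours')   # ~3.7 h
    print(f'2 TB at 1000 MB/s (NVMe spare): {rebuild_time_s(2, 1000) / 60:.0f} minutes')  # ~33 min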

Going big with TCP packets

Posted Aug 1, 2022 23:39 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

What raid level?

A four- or five-drive raid-6 reduces the danger from a disk failure (it will survive losing two). An NVMe cache will speed up writes. And the more spindles you have, the faster your reads, regardless of array size.

Cheers,
Wol

Going big with TCP packets

Posted Aug 2, 2022 21:13 UTC (Tue) by bartoc (guest, #124262) [Link] (1 responses)

Once the drives get big enough it makes sense to just use something like btrfs raid10 instead of raid6: rebuilds still take a long time, but they no longer have to read all of every drive. There are also fewer balancing issues if you add more drives. Actually, even with raid6 you should probably use something like btrfs or zfs (zfs can have some creeping performance problems and is harder to expand/contract, but it is better tested). Btrfs raid6 is said to be "unsafe", but in reality it is probably safer than mdraid raid6 or a RAID controller's raid6.
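Roughly, because mdraid reconstructs the failed device's full capacity while btrfs only rebuilds the data that was actually allocated on it (the utilisation figure below is an assumption for illustration):

    # Whole-device reconstruction vs. rebuilding only allocated data.
    # The 40% utilisation figure is an assumption for illustration.
    drive_tb = 18
    used_fraction = 0.4

    print(f"mdraid rebuilds the whole device: {drive_tb:.1f} TB")
    print(f"btrfs rebuilds allocated data:    {drive_tb * used_fraction:.1f} TB")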

Not to mention the additional cost of the bigger drives (per TB) is offset by needing less "other crap" to drive them. You need smaller RAID enclosures, fewer controllers/expanders/etc, less space, and so on.

A four-drive raid6 is pretty pointless: you get the write hole and the write amplification for a total of ... zero additional space efficiency over raid10. Just use a checksumming raid10-type filesystem. IMHO 8-12 disks is the sweet spot for raid6.
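The capacity arithmetic behind that (equal-sized drives, ignoring metadata overhead):

    # Usable fraction of raw capacity for equal-sized drives, ignoring metadata overhead.
    def raid6_usable(n):
        return (n - 2) / n      # two drives' worth of parity

    def raid10_usable(n):
        return 0.5              # everything is stored twice

    for n in (4, 8, 12):
        print(f"{n:2d} drives: raid6 {raid6_usable(n):.0%} usable, raid10 {raid10_usable(n):.0%} usable")

At four drives both layouts give you 50% of the raw capacity; by 8-12 drives raid6 is up to 75-83%, which is where it starts to pay for its complexity.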

fun quote from the parity declustering paper, published 1992:
> Since the time necessary to reconstruct the contents of a failed disk is certainly minutes and possibly hours, we focus this paper on the performance of a continuous-operation storage subsystem during on-line failure recovery.

My last raid rebuild was I think 5 full days long, using a small array of 18 TB disks.

Going big with TCP packets

Posted Aug 2, 2022 23:01 UTC (Tue) by Wol (subscriber, #4433) [Link]

> A four-drive raid6 is pretty pointless: you get the write hole and the write amplification for a total of ... zero additional space efficiency over raid10. Just use a checksumming raid10-type filesystem. IMHO 8-12 disks is the sweet spot for raid6.

But a four-drive raid-10 is actually far more dangerous to your data ... A raid-6 will survive any double disk failure. A double failure on a raid-10 has - if I've got my maths right - about a one-in-three chance of trashing your array, because one of the three surviving drives is the failed drive's mirror partner.
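That one-in-three figure can be checked by just enumerating the cases (a quick sketch, assuming two mirror pairs and equally likely, independent failures):

    # Enumerate all double-failure combinations for a 4-drive raid-10
    # laid out as two mirror pairs: (0,1) and (2,3). A raid-6 survives all of them.
    from itertools import combinations

    mirror_pairs = [{0, 1}, {2, 3}]
    pairs = list(combinations(range(4), 2))
    fatal = sum(1 for failed in pairs if set(failed) in mirror_pairs)

    print(f"{fatal} of {len(pairs)} double failures lose data ({fatal / len(pairs):.0%})")  # 2 of 6, ~33%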

md-raid-6 doesn't have a write hole any more, if it's configured with a write journal.

> Btrfs raid6 is said to be "unsafe" but in reality, it is probably safer than mdraid raid6 or a raid controller's raid6.

I hate to say it, but I happen to KNOW that btrfs raid6 *IS* unsafe. A lot LESS safe than md-raid-6. It'll find problems better, but it's far more likely that those problems will have trashed your data. Trust me on that ...

At the end of the day, different raids have different properties. And most of them have their flaws. Unfortunately, at the moment btrfs parity raid is more flaw and promise than reality.

Oh - and my setup - which I admit chews up disk - is 3-disk raid-5 with a spare, over dm-integrity and under lvm. Okay, it'll only survive one disk failure, but it will also survive corruption, just like btrfs. But each level of protection has its own dedicated layer - the Unix "do one thing and do it well" approach. Btrfs tries to do everything - which does have many advantages - but it ends up a "jack of all trades, crap at some of them", one of which unfortunately is parity raid ...

And while I don't know whether my layers support trim, if they do, btrfs has no advantage over me on the time taken to rebuild/recover an array. Btrfs knows what part of the disk is in use, but so does any logical/physical device that supports trim ...

Cheers,
Wol

