
The 6.2 kernel has been released

Linus has released the 6.2 kernel as expected.

Please do give 6.2 a testing. Maybe it's not a sexy LTS release like 6.1 ended up being, but all those regular pedestrian kernels want some test love too.

Headline features in this release include the ability to manage linked lists and other data structures in BPF programs, more additions to the kernel's Rust infrastructure, improvements in Btrfs RAID5/6 reliability, IPv6 protective load balancing, faster "Retbleed" mitigation with return stack buffer stuffing, control-flow integrity improvements with FineIBT, oops limits, and more.

See the LWN merge-window summaries (part 1, part 2) and the KernelNewbies 6.2 page for more information.



The 6.2 kernel has been released

Posted Feb 20, 2023 10:25 UTC (Mon) by littlesandra88 (guest, #64017) [Link] (30 responses)

Could this be the year where RAID 5/6 becomes stable in btrfs?

I don't understand why it hasn't been a priority. Is it because people are happy with ZFS?

The 6.2 kernel has been released

Posted Feb 20, 2023 10:50 UTC (Mon) by atnot (guest, #124910) [Link] (1 responses)

I'll just copy a comment I made before as to why I think it is.

> RAID 5/6 is broken because nobody with money or sufficient time cares enough about it.
> The only place hard drives really live these days is in network storage devices, which do their own redundancy locally, often as some sort of cluster. If you're using local disks they're going to be high performance SSDs, in RAID 10 because the overhead of calculating parity at those speeds would be too high anyway.
> So at this point the only one you've got left is enthusiasts building a DIY NAS at home, who have enough time on their hands to just deal with the inconveniences of ZFS or mdraid anyway.

The 6.2 kernel has been released

Posted Feb 20, 2023 12:02 UTC (Mon) by JMB (guest, #74439) [Link]

The points in your comment are valid - but as I am focusing on reliability I never used ZFS even on a former Sun site
due to announcements that a filesystem checker is not needed and will not come ... even after it emerged it was
burnt in the eyes of many administrators.
Same for BTRFS: if one cannot support a feature like RAID5/6 in a reliable way, it should not be implemented
at all - or even advertised. And what should "improved reliability" mean? It will eat your baby in one year?
So all features have to be bulletproof if you really care about reliability - and for me a file system must be
reliable before thinking about using it - even if the SSDs or HDDs used are not!
For me this is ext4 on the desktop ... and I am astonished that it can even compete with F2FS on flash storage.
I like file systems being boring ... ;)
The shiny things are in support of latest HW - CPUs, GPUs - and there one is not in danger of losing data ...
if one does one's homework right - and in this respect 6.2 may not be enough - but 6.3 may be better
for fresh AMD systems (Zen 4 & RDNA3) ... LTS kernels are not suitable for the desktop.

The 6.2 kernel has been released

Posted Feb 20, 2023 17:07 UTC (Mon) by MarcB (subscriber, #101804) [Link] (19 responses)

AFAICT, RAID5/6 have really become very niche. I guess there is not that much interest any more.

Personally, I still use it at home, but at work, there is only one system category (out of ~ 30) left that still uses it. And we are just about to switch this to a distributed object storage as well.

RAID 5/6 used to be a viable compromise between performance, cost and reliability, but nowadays it just falls short on all aspects.
With fast storage, the performance impact is too high (it always was unsuitable for many workloads) and distributed storage solutions offer much higher fault-tolerance where performance requirements are not that high (basically RAID over servers or even data centers).

Essentially, it was killed by SSDs on one end and much higher/cheaper network bandwidth on the other.

The 6.2 kernel has been released

Posted Feb 20, 2023 21:51 UTC (Mon) by littlesandra88 (guest, #64017) [Link] (8 responses)

Where I work we have many servers running ZFS. Each server has HP D6000 enclosures connected as JBOD. The newest have 1.26 PB on each. Each server is ZFS replicated to a slave for backup. Here we use RAID 6 because reliability is the most important, followed by performance and then cost.

Can we really be the only ones in the world who have single-digit petabyte needs where reliability is the most important?

The 6.2 kernel has been released

Posted Feb 21, 2023 1:43 UTC (Tue) by atnot (guest, #124910) [Link] (3 responses)

Well, in the common case, high reliability deployments will want to protect against failure of servers just as much as failure of hard drives. This is going to require some sort of redundancy not just across drives, but across storage servers as well. And you're going to need some sort of system to handle that distribution across servers, ideally do something smarter than 3+ identical copies (e.g. Reed-Solomon), and then have some way to group those redundant files into "volumes". No matter which of the solutions that fit this shape you choose, you probably won't be getting much benefit from ZFS anymore.
So it does sound like your infrastructure might be a bit unique if it has not needed to go down this path at all.

The 6.2 kernel has been released

Posted Feb 21, 2023 20:30 UTC (Tue) by littlesandra88 (guest, #64017) [Link] (2 responses)

It is an interesting point about distributed servers and hard disks...

Yes, we probably are a bit unique in that regard. Our view is that downtime is acceptable, as long as there is no data loss. So by having a super simple setup the uptime is actually very high, as there is no complicated software or infrastructure that we rely on.

And the simplicity is important in case of recovery. Had we gone with GlusterFS or Ceph we would need to be storage experts.

Perhaps our view is very unique/rare...

The 6.2 kernel has been released

Posted Feb 21, 2023 23:53 UTC (Tue) by sjj (guest, #2020) [Link] (1 responses)

I’m curious, why do you think using Ceph would require you to become storage experts, but not ZFS?

The 6.2 kernel has been released

Posted Feb 22, 2023 0:54 UTC (Wed) by littlesandra88 (guest, #64017) [Link]

I have never used it, but Ceph just sounds intimidating when talking about storage nodes, where everything is on one node with ZFS.

So Ceph and ZFS are comparable in complexity?

The 6.2 kernel has been released

Posted Feb 21, 2023 15:34 UTC (Tue) by MarcB (subscriber, #101804) [Link] (3 responses)

> Can we really be the only ones in the world who have single-digit petabyte needs where reliability is the most important?

A distributed setup can easily achieve - or even surpass - the reliability/fault tolerance of RAID-6. So unless bandwidth cost is an issue, or the unavoidable increase in latency is unacceptable for your workloads, it should work equally well and cover more failure scenarios.

The 6.2 kernel has been released

Posted Feb 21, 2023 20:34 UTC (Tue) by littlesandra88 (guest, #64017) [Link] (2 responses)

The distributed setup you speak of: is there a "turn key" solution or supported distributed setup? We wouldn't mind paying Red Hat or other OSS companies for this.

Closed source solutions are out of the question as we don't want to be held hostage with our 10+ petabytes (times 2) of data.

The 6.2 kernel has been released

Posted Feb 21, 2023 23:47 UTC (Tue) by sjj (guest, #2020) [Link] (1 responses)

The 6.2 kernel has been released

Posted Feb 22, 2023 22:09 UTC (Wed) by riking (subscriber, #95706) [Link]

Unfortunately, the really good error correction codes like GF(2)-based ones have patent encumbrance, so Ceph includes this extremely awkward plugin system instead of shipping the good algorithms by default.

The 6.2 kernel has been released

Posted Feb 21, 2023 8:58 UTC (Tue) by Wol (subscriber, #4433) [Link] (4 responses)

> Personally, I still use it at home, but at work, there is only one system category (out of ~ 30) left that still uses it. And we are just about to switch this to a distributed object storage as well.

What's "distributed object storage"? Is that some fancy name for "in the cloud"? And what physical device is it actually stored on? What filesystem does that use?

I really don't know, but sometimes I get the feeling people are so distracted by all the layers of abstraction they forget there has to be some sort of physical reality underneath.

Cheers,
Wol

The 6.2 kernel has been released

Posted Feb 21, 2023 10:13 UTC (Tue) by farnz (subscriber, #17727) [Link]

Amazon S3 is an example of a cloud "distributed object store"; Ceph is an example of an open source distributed object store.

The difference to a traditional setup is that a distributed object store assumes that entire servers can fail, not just single disks. It thus distributes multiple copies of data (and optionally erasure coding) over multiple hosts, not just over multiple disks, so that you can lose entire servers and still retain your data.

As a consequence, the "ideal" physical storage server for a distributed object store is a JBOD or RAID-0 setup - a large amount of fast storage, and rely on the higher levels distributing the data across multiple hosts (or even, in big setups, multiple racks and/or multiple continents). The Ceph team have written a paper about how they distribute chunks across servers such that you can lose multiple servers without losing data - noting that while the default is replication (RAID-1 style if on a single host), you can configure erasure coding instead to get RAID-5/6 style behaviour (although possibly set up to survive a much larger failure - you can set the erasure coding up so that you survive 10 hosts failing, with the data and erasure code chunks distributed across 100 servers, if you so desire).
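To make the erasure-coding idea above concrete, here is a minimal sketch in Python using single XOR parity, the simplest RAID-5 style code. It is not how Ceph implements erasure coding; the function names (`encode`, `recover`, `storage_overhead`) are made up for illustration, and the 90 data + 10 coding chunk split is only an assumed way of reading the "10 hosts out of 100" example.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: List[bytes]) -> bytes:
    """Compute one parity chunk over equal-length data chunks."""
    return reduce(xor_bytes, chunks)


def recover(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data chunk from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


def storage_overhead(data_chunks: int, coding_chunks: int) -> float:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    return (data_chunks + coding_chunks) / data_chunks


# Each chunk would live on a different host; here host 1 is lost.
data = [b"host", b"fail", b"safe"]
parity = encode(data)
rebuilt = recover([data[0], data[2]], parity)
assert rebuilt == data[1]

# Assumed split: 90 data chunks + 10 coding chunks versus 3 full copies.
print(round(storage_overhead(90, 10), 2), "x raw storage vs", 3.0, "x for 3 replicas")
```

Single parity only survives one lost chunk; surviving several simultaneous failures needs a stronger code such as Reed-Solomon, which is where the configurable erasure coding mentioned above comes in.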

The 6.2 kernel has been released

Posted Feb 21, 2023 10:25 UTC (Tue) by cesarb (subscriber, #6266) [Link]

> What's "distributed object storage"? Is that some fancy name for "in the cloud"?

It probably means something like Ceph.

> And what physical device is it actually stored on? What filesystem does that use?

In the case of Ceph, the physical device is normal disks in dedicated object storage servers. For modern installations (BlueStore backend), it's stored directly on a raw LVM LV or a raw partition; for older installations, it's stored on top of a traditional filesystem, normally XFS.

The 6.2 kernel has been released

Posted Feb 21, 2023 15:27 UTC (Tue) by MarcB (subscriber, #101804) [Link]

In our case, these are typically Ceph and MinIO. There also are some NetApp systems that might use RAID internally.

Ceph runs on bare LVM, MinIO on XFS.

The thing is, unless you have some extreme latency requirements, you can easily get more than enough bandwidth to replicate data at least regionally. In that case, it simply makes little sense to limit your redundancy to a single server.

The 6.2 kernel has been released

Posted Feb 21, 2023 18:04 UTC (Tue) by atnot (guest, #124910) [Link]

The other answers are good, but the major thing that distinguishes object stores from traditional filesystems is less the technical details and more the provided API that makes these things possible in the first place. The UNIX filesystem API has all sorts of issues when faced with concurrent usage, as anyone who has tried to run NFS at scale will be able to attest.

Object stores are based around basic atomic operations like e.g.: PUT key [hash] blob, which atomically stores an uploaded file under the given key. There are no concepts like open file handles, directories, writing into the middle of files, symlinks, etc. All operations are asynchronous, expected to fail, and the state after failure is always well defined. They are very easy to use in a way that is both concurrency safe and scales well.
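As a rough illustration of that API shape, here is a toy in-memory store in Python. It is a sketch under stated assumptions, not the interface of S3, Ceph or MinIO: the class `ToyObjectStore` and its `put`/`get` methods are hypothetical, and the optional hash argument stands in for the `[hash]` in the PUT operation above.

```python
import hashlib
import threading
from typing import Dict, Optional


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._lock = threading.Lock()

    def put(self, key: str, blob: bytes, expected_sha256: Optional[str] = None) -> str:
        """Store the whole blob under key, or fail and change nothing."""
        digest = hashlib.sha256(blob).hexdigest()
        if expected_sha256 is not None and digest != expected_sha256:
            # Well-defined state after failure: the old object is untouched.
            raise ValueError("hash mismatch; object not stored")
        with self._lock:
            # A single whole-object replacement; no partial writes, no handles.
            self._objects[key] = bytes(blob)
        return digest

    def get(self, key: str) -> bytes:
        """Return the complete object stored under key."""
        with self._lock:
            return self._objects[key]


store = ToyObjectStore()
store.put("reports/2023-02", b"first version")
try:
    store.put("reports/2023-02", b"second version", expected_sha256="0" * 64)
except ValueError:
    pass  # the failed PUT left the first version in place
assert store.get("reports/2023-02") == b"first version"
```

Note that the key looks like a path but is just an opaque string; there is no directory to create, no file handle to leak, and no way to write into the middle of an object.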

Applications that use object stores will generally still have and use local disks as scratch space, so they do not usually directly replace traditional filesystems.

The 6.2 kernel has been released

Posted Feb 21, 2023 14:59 UTC (Tue) by kilobyte (subscriber, #108024) [Link] (4 responses)

RAID5 over 5 disks gives you 80% of capacity, RAID1(btrfs) only 50%, RAID1(md) 20%. That's a serious saving.
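The arithmetic behind those three figures can be sketched as follows, assuming equal-sized disks. The layout names and the `usable_fraction` helper are made up for this example; they are not mdadm or btrfs options.

```python
from fractions import Fraction


def usable_fraction(layout: str, disks: int) -> Fraction:
    """Fraction of raw capacity left for data, assuming equal-sized disks."""
    if layout == "raid5":
        # One disk's worth of parity per stripe.
        return Fraction(disks - 1, disks)
    if layout == "raid6":
        # Two disks' worth of parity per stripe.
        return Fraction(disks - 2, disks)
    if layout == "btrfs-raid1":
        # Two copies of every block, however many disks there are.
        return Fraction(1, 2)
    if layout == "md-raid1":
        # Every member disk holds a full copy (n-way mirror).
        return Fraction(1, disks)
    raise ValueError(f"unknown layout: {layout}")


for layout in ("raid5", "btrfs-raid1", "md-raid1"):
    print(f"{layout}: {float(usable_fraction(layout, 5)):.0%}")
# raid5: 80%
# btrfs-raid1: 50%
# md-raid1: 20%
```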

On btrfs you also can (and really should) mix RAID levels: data on RAID5/6, metadata on RAID1/1c3/10. Parity RAID really hates random access but works fine for linear reads/writes; metadata is the former.

The 6.2 kernel has been released

Posted Feb 21, 2023 16:06 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

What do you mean, raid-1 (md) only gives you 20%? Any mirror (including md) gives you 50% as standard. You can alter it ...

And apparently btrfs does - I heard it only mirrors metadata by default so it might give you 80% by default - and then you discover you've lost your data when there's a problem. I hope I'm wrong, but given some of the casual disregard I've seen from developers towards user data ...

Cheers,
Wol

The 6.2 kernel has been released

Posted Feb 21, 2023 17:58 UTC (Tue) by sjj (guest, #2020) [Link] (2 responses)

Mirrors are not necessarily pairs, you can have a five disk mirror.

Are you saying btrfs developers specifically have casual disregard towards user data? Pretty wild to throw such claims about.

The 6.2 kernel has been released

Posted Feb 21, 2023 19:05 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

> Mirrors are not necessarily pairs, you can have a five disk mirror.

Are you saying that - BY DEFAULT - a mirror requires five disks?

> Are you saying btrfs developers specifically have casual disregard towards user data? Pretty wild to throw such claims about.

Well, I certainly get the impression some do ...

Seriously. If I select raid-1, I expect it to protect my data. Not just the filesystem meta-data. Did you read what I wrote?

Cheers,
Wol

The 6.2 kernel has been released

Posted Feb 21, 2023 21:42 UTC (Tue) by sjj (guest, #2020) [Link]

If you give mdadm five devices and ask for a mirror, it will create a 5-way mirror (mdadm --create /dev/md0 -l raid1 -n 5 /dev/loop?). By default. You are wrong about "50% as standard".

If you select raid1 for data on Btrfs, it will give you raid1 for data. There's nothing nefarious, it's all documented on the man pages etc. But you probably know this, you just wanted to cast baseless aspersions on some open source developers today, which I find distasteful (I have no connection w/btrfs, except as a Fedora user).

The 6.2 kernel has been released

Posted Feb 20, 2023 18:39 UTC (Mon) by dcg (subscriber, #9198) [Link] (4 responses)

At least one of the reasons is that ZFS is designed to not have a write hole, but Btrfs is not. In ZFS parity information is updated in a COW, atomic manner like the rest of the filesystem, Btrfs follows a traditional RAID5/6 design and the parity information is stored at fixed places.
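For readers unfamiliar with the write hole, here is a toy simulation in Python of the failure mode with parity at a fixed location. It models a three-disk stripe with plain XOR parity and is in no way Btrfs or ZFS code; the helper names are made up for illustration.

```python
def xor(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild_after_crash(parity_updated: bool) -> bytes:
    """Update d0 in place, optionally 'crash' before the parity write,
    then lose the disk holding d1 and rebuild it from what is left."""
    d0, d1 = b"AAAA", b"BBBB"
    parity = xor(d0, d1)          # parity lives at a fixed place on a third disk

    d0 = b"CCCC"                  # in-place overwrite of one data block
    if parity_updated:
        parity = xor(d0, d1)      # the second write of the read-modify-write
    # else: power loss between the two writes, so the parity is stale

    # Later, the disk holding d1 dies; reconstruct it from d0 and the parity.
    return xor(d0, parity)


print(rebuild_after_crash(parity_updated=True) == b"BBBB")   # True
print(rebuild_after_crash(parity_updated=False) == b"BBBB")  # False
```

The block that ends up corrupted (d1) was never written to; it is damaged only because it shares a stripe with the interrupted update. A copy-on-write design avoids this by never overwriting a live stripe in place.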

In theory, it is not impossible to make Btrfs add some disk format changes and do what ZFS does, but from what I have read it would require rewriting large parts of the existing code. Depending on your POV, you might consider this a huge design mistake or not. In any case, the Btrfs devs have plans to implement new features including a new RAID5/6 implementation.

The 6.2 kernel has been released

Posted Feb 20, 2023 20:19 UTC (Mon) by pkese (guest, #38717) [Link] (1 responses)

A solution to RAID 5/6 issue has been in the works for a while - it's now at revision 5 of the patch-set and probably getting close to inclusion.
https://www.spinics.net/lists/linux-btrfs/msg132522.html

It's not a big change to the filesystem by any means - about 1000 lines of code. Far less than 1% of btrfs-specific kernel code gets touched by this patch set.
It solves not only RAID 5/6 but also a related RAID 1 issue, which was exclusive to running on SMR disks.

The 6.2 kernel has been released

Posted Feb 20, 2023 20:26 UTC (Mon) by pkese (guest, #38717) [Link]

Typo. Not SMR but ZNS.

The 6.2 kernel has been released

Posted Feb 20, 2023 21:39 UTC (Mon) by joib (subscriber, #8541) [Link] (1 responses)

Aren't the 6.2 changes at least partially fixing the write hole by doing a full-stripe data checksum validation as part of a sub-stripe RMW?

https://kernelnewbies.org/Linux_6.2#Btrfs_improvements

Explanation in this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

The 6.2 kernel has been released

Posted Feb 22, 2023 19:26 UTC (Wed) by kreijack (guest, #43513) [Link]

> Aren't the 6.2 changes at least partially fixing the write hole by doing a full-stripe data checksum validation as part of a sub-stripe RMW?

True, but unfortunately it doesn't address all the issues. IIRC this reduces the likelihood of the corruption spreading.

To solve the write hole problem a journal or the RST (raid stripe tree) is needed.

My understanding is that the raid5/6 btrfs implementation doesn't get enough interest because of its bugs. And being a low-interest thing, nobody is pushed to solve these issues. It is like the chicken-and-egg problem:
- if there are bugs there is no interest
- if there is no interest there aren't the users
- if there aren't users there is no bugfix
- if there is no bugfix, there are bugs....

The 6.2 kernel has been released

Posted Feb 20, 2023 23:18 UTC (Mon) by andyc (subscriber, #1130) [Link] (2 responses)

> I don't understand why it hasn't been a priority. Is it because people are happy with ZFS?

XFS + MD here.

The 6.2 kernel has been released

Posted Feb 21, 2023 6:30 UTC (Tue) by donald.buczek (subscriber, #112892) [Link] (1 responses)

> XFS + MD here.

Here, too. We don't use ZFS because we don't use anything which is out-of-tree.

The 6.2 kernel has been released

Posted Feb 21, 2023 7:09 UTC (Tue) by cyperpunks (subscriber, #39406) [Link]

RHEL seems to want us to use XFS, and it works just fine for us. File systems are not the place to be innovative and fancy.


Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds