The first part of the 6.15 merge window

By Jonathan Corbet
March 28, 2025
As of this writing, 6,653 non-merge changesets have been pulled into the mainline kernel repository for the 6.15 release. This merge window is thus well underway. A number of significant changes have been merged so far; read on for our summary of the first half of the 6.15 merge window.

The most interesting changes merged to date include:

Architecture-specific

  • Support for larger 32-bit x86 systems (those with more than eight CPUs or more than 4GB of RAM) has been removed. Those hardware configurations have been unavailable for a long time, and any workloads needing such resources should have long since moved to 64-bit systems.
  • The way in which the POSIX timer subsystem assigns timer IDs has been enhanced to allow the Checkpoint/Restore in Userspace (CRIU) subsystem to reliably and quickly restore timer IDs.

Core kernel

  • The exit status of a process represented by a pidfd can be fetched even after the process has been reaped; see this commit for more information.
  • The special value PIDFD_SELF can be used by a process to refer to itself in the pidfd-taking system calls.
  • The way the kernel handles pidfd notifications in a multi-threaded process that either executes a new program or exits has changed; see this commit for details.
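
For context on what PIDFD_SELF saves, here is a minimal sketch of the pattern a process has needed until now: open a pidfd for itself, use it, then close it. The helper name probe_self_via_pidfd() is hypothetical, and the sketch deliberately does not use PIDFD_SELF itself, since its numeric value is not given here and Python may not expose it. It assumes Python 3.9 or later on a kernel with pidfd support and reports failure rather than raising when either is missing.

```python
import os
import signal


def probe_self_via_pidfd() -> bool:
    """Open a pidfd for the calling process and send it the null signal.

    Returns True when the round trip works, False when pidfds are not
    available on this platform or the kernel rejects the call.
    """
    if not hasattr(os, "pidfd_open") or not hasattr(signal, "pidfd_send_signal"):
        return False  # pidfd helpers are missing (older Python or non-Linux)
    try:
        # Step 1: obtain a file descriptor that refers to ourselves.
        pidfd = os.pidfd_open(os.getpid())
    except OSError:
        return False
    try:
        # Step 2: use it. Signal 0 delivers nothing; it only runs the
        # kernel's existence and permission checks.
        signal.pidfd_send_signal(pidfd, 0)
        return True
    except OSError:
        return False
    finally:
        # Step 3: clean up the descriptor we had to create.
        os.close(pidfd)


if __name__ == "__main__":
    print("pidfd round trip worked:", probe_self_via_pidfd())
```

With PIDFD_SELF, steps 1 and 3 should become unnecessary for calls like this one, since the special value stands in for the descriptor.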

Filesystems and block I/O

  • There is a new override_creds mount option for overlayfs filesystems that changes the credentials that are used to access the lower layers; see this documentation commit for some more information.
  • All of the kernel's pseudo filesystems have now been converted to the new mount API. Amusingly, the System V filesystem has also been converted, even though it was removed entirely later in this merge window. If that removal has to be reverted for any reason, at least the filesystem will have been updated to match current practice.
  • There is a new API to receive information about filesystem mount and unmount events. This meticulously undocumented API is based on the fanotify mechanism; there are a few notes on its use in this commit.
  • The statmount() system call can now receive information about the ID mappings applied to a filesystem mount. This commit has some information on how the API works.
  • It is now possible to create an ID-mapped mount of another mount that is already ID-mapped, thus changing the mappings. This commit describes the motivation and implementation of this feature at length.
  • There have been a number of mount-API changes to make it easier to assemble complex filesystem hierarchies without exposing partial results or parts of any filesystem that are meant to remain hidden. This merge message contains a lot of details on what is now possible.
  • The block layer has gained support for hardware-wrapped encryption keys. This is a mechanism that allows the kernel to program encryption keys into a block device without actually keeping the key in memory, where it might be disclosed to an attacker. This commit contains a documentation file describing the feature, while this commit includes documentation for the related ioctl() operations.
  • The XFS filesystem now supports zoned storage devices.
  • The EROFS filesystem now supports 48-bit block addressing to enable it to handle massive files.
  • Bcachefs has gained "scrub" functionality that attempts to read all data and metadata within a filesystem and then repairs, if possible, any errors found. Bcachefs is also now able to handle filesystems with a block size larger than the system page size.
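
To give a sense of scale for the 48-bit block addressing mentioned for EROFS, the following sketch works out how much space a given address width can cover. Two things here are assumptions rather than facts from the summary above: the 4KiB block size, and the use of a 32-bit width as the point of comparison. The function names are illustrative only.

```python
def max_addressable_bytes(address_bits: int, block_size: int = 4096) -> int:
    """Bytes reachable with `address_bits`-wide block numbers.

    block_size defaults to 4KiB, which is an assumption for this example.
    """
    return (1 << address_bits) * block_size


def human(n: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while n >= 1024 and index < len(units) - 1:
        n //= 1024
        index += 1
    return f"{n} {units[index]}"


if __name__ == "__main__":
    for bits in (32, 48):
        print(f"{bits}-bit block addresses:", human(max_addressable_bytes(bits)))
```

Under those assumptions, 32-bit addresses cover 16 TiB and 48-bit addresses cover 1 EiB; a different block size scales both figures proportionally.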

Hardware support

  • Hardware monitoring: Congatec board sensors, Texas Instruments INA233 power monitors, and Measurement Specialties HTU31 humidity and temperature sensors.
  • Input: Apple touch bars.
  • Media: Qualcomm iris V4L2 decoders, Synopsys DesignWare HDMI receivers, and Lontium LT6911UXE decoders.
  • Miscellaneous: Sophgo SG2042 MSI interrupt controllers, Sophgo SG2042 pulse-width modulators, NXP PF9453 regulators, T-HEAD TH1520 power domains, Samsung Galaxy Book platform devices, Huawei Matebook E Go embedded controllers, Rockchip UFS host controllers, Renesas RZ/G3E system controllers, and NXP i.MX8Q reset controllers.
  • Networking: MCTP-over-USB interfaces, Airoha network processor units, and Realtek 8814AE and 8814AU network adapters.
  • Sound: Awinic aw88166 amplifiers.
  • SPI: QPIC SNAND controllers, STMicroelectronics STM32 OCTO SPI controllers, and Sophgo SG2044 SPI NOR controllers.

Networking

  • Work continues toward breaking up the RTNL lock (the "big networking lock"), which is a contention bottleneck in much of the networking subsystem.
  • Initial support for zero-copy data reception via io_uring has been added.
  • There is a new TCP socket option, TCP_RTO_MAX_MS, that can be used to set the maximum time between retransmit attempts on an IPv4 connection. There is also a new sysctl knob to set this limit system-wide.
  • There is a new set of BPF callbacks to obtain timestamps from various places in the networking stack; this feature is intended to help in the debugging of latency problems. This feature is severely undocumented, but this commit includes some self tests that show how it works.
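
As a rough illustration of the TCP_RTO_MAX_MS option described above, this sketch tries to cap the retransmit timeout on a socket. The option number 44 is an assumption taken from memory of the kernel's linux/tcp.h and should be checked against the installed headers; Python's socket module may not define the constant, hence the fallback. The helper name cap_retransmit_timeout() is hypothetical. On kernels without the option, the call is expected to fail with an OSError, which the helper reports as False.

```python
import socket

# Assumption: 44 is the option number in the kernel's uapi linux/tcp.h.
# Prefer the constant from the socket module if a future Python exposes it.
TCP_RTO_MAX_MS = getattr(socket, "TCP_RTO_MAX_MS", 44)


def cap_retransmit_timeout(sock: socket.socket, max_ms: int) -> bool:
    """Ask the kernel to limit the retransmit timeout to max_ms milliseconds.

    Returns True if the kernel accepted the option, False otherwise
    (for example, on a kernel that predates the option).
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_RTO_MAX_MS, max_ms)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        accepted = cap_retransmit_timeout(s, 30_000)
        print("kernel accepted TCP_RTO_MAX_MS:", accepted)
```

Treating a rejected option as a soft failure lets the same code run on older kernels, where the default retransmit behavior simply stays in effect.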

Security-related

  • There is a new security hook for the io_uring subsystem, allowing security modules more control over what is allowed; the SELinux security module has gained support for this hook.
  • The SELinux security module can now apply policy controls to many types of data read by the kernel, including firmware images, security policies, certificates, and more. This change drew some criticism from Linus Torvalds, who did not see why it was necessary. It is not clear that the subsequent conversation convinced him of its value, but the feature was merged anyway.

Internal kernel changes

  • The minimum version of Python needed for code shipped with the kernel (including the documentation build system) has been raised to 3.9.
  • The minimum GCC version (for x86 builds) is now 8.1, while the minimum Clang version is 15.0.0.
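
The new minimums can be expressed as a simple version comparison. The sketch below is not the kernel's own checking logic; it is a standalone illustration, with hypothetical helper names, of how a build script might compare a reported version string against the floors listed above. It assumes purely numeric, dot-separated version strings.

```python
import sys

# Minimum versions as stated in the list above.
MIN_PYTHON = (3, 9)
MIN_GCC_X86 = (8, 1)
MIN_CLANG = (15, 0, 0)


def parse_version(text: str) -> tuple:
    """Turn a string such as '8.1.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found: str, minimum: tuple) -> bool:
    """True if the version string `found` is at least `minimum`.

    Shorter versions are padded with zeros so '8.1' and '8.1.0' compare equal.
    """
    found_parts = parse_version(found)
    width = max(len(found_parts), len(minimum))
    padded_found = found_parts + (0,) * (width - len(found_parts))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


if __name__ == "__main__":
    running = ".".join(str(n) for n in sys.version_info[:3])
    print("this Python meets the minimum:", meets_minimum(running, MIN_PYTHON))
    print("GCC 7.5.0 meets the minimum:", meets_minimum("7.5.0", MIN_GCC_X86))
```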

The 6.15 merge window can be expected to remain open through April 6, after which it will be time to stabilize all of that new work. There are still nearly 6,000 commits sitting in linux-next, so the list of features for the next kernel release is far from complete. As usual, we will post another summary once the merge window closes describing those remaining changes.

Index entries for this article
Kernel: Releases/6.15



Meticulously undocumented

Posted Mar 28, 2025 22:13 UTC (Fri) by nijhof (subscriber, #4034) [Link]

Is meticulously undocumented more or less undocumented than rigorously undocumented?

the self healing work continues in bcachefs

Posted Mar 29, 2025 5:09 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link] (8 responses)

the goal is a filesystem that can recover from /anything/

the self healing work continues in bcachefs

Posted Mar 30, 2025 22:59 UTC (Sun) by motk (subscriber, #51120) [Link] (5 responses)

True vacuum universe ending events? A bold claim.

the self healing work continues in bcachefs

Posted Mar 31, 2025 3:09 UTC (Mon) by koverstreet (✭ supporter ✭, #4296) [Link] (4 responses)

Not quite :) but anything within the realm of possible, yes. If there are still extents and dirents leaf nodes, we should get you a working filesystem with everything possible intact.

We regularly recover from extreme disaster scenarios today - I've been looking at a metadata dump where it looked like a head just skated across the platter, which created some very... particular alloc info inconsistencies, but that's been the only failure to repair in ~6 months, and I've seen logs of some good ones. So that's largely done.

Once the mount API extension happens, plus better communication between the mount helper and systemd/plymouth (because of course communicating things to the user has been getting more complicated), we'll even be able to tell the user "hey, your SSD crapped itself (X IO errors, toast btree nodes), please wait while we reconstruct btree roots/alloc/what have you, here's a progress bar"

And this stuff is pretty fast, too - post 6.14 that dealt with backpointers check/repair. Even btree node scan is fast thanks to a small bitmap in the superblock, if we lose btree roots.

Further off, post experimental, will be finishing off online fsck - and then we'll be able to recover from slightly absurd levels of damage in the background while your filesystem is RW. (People with huge arrays really want this).

the self healing work continues in bcachefs

Posted Mar 31, 2025 4:23 UTC (Mon) by jmalcolm (subscriber, #8876) [Link] (1 responses)

Is systemd going to be a requirement for bcachefs?

the self healing work continues in bcachefs

Posted Mar 31, 2025 4:57 UTC (Mon) by koverstreet (✭ supporter ✭, #4296) [Link]

I'm not anti systemd, but no

the self healing work continues in bcachefs

Posted Apr 3, 2025 8:06 UTC (Thu) by DemiMarie (subscriber, #164188) [Link] (1 responses)

Will bcachefs be able to completely recover (no data loss) if all data is present on a quorum of replicas?

the self healing work continues in bcachefs

Posted Apr 3, 2025 13:52 UTC (Thu) by koverstreet (✭ supporter ✭, #4296) [Link]

What's missing in this case?

Is self-healing always good?

Posted Apr 3, 2025 8:03 UTC (Thu) by DemiMarie (subscriber, #164188) [Link] (1 responses)

Is self-healing always wanted? My concerns are:
  1. It could risk trashing good-but-unreachable data, preventing subsequent data recovery operations.
  2. It could hide errors from userspace, such as by reporting “file definitely does not exist” instead of “I/O error occurred and we don’t know if the file exists”.
  3. It could recover data that was never actually present, such as freed disk blocks, creating a security concern.
If the filesystem can’t tell if file X should be there or not, or is uncertain as to what its contents should be, I would prefer that all attempts to access X fail with something other than -ENOENT until and unless the administrator tells the filesystem to use its best guess of what the pre-corruption situation was, or X is overwritten by an operation that makes that state irrelevant. Silently returning wrong data is the worst possible outcome.

Is self-healing always good?

Posted Apr 3, 2025 13:59 UTC (Thu) by koverstreet (✭ supporter ✭, #4296) [Link]

Well, this isn't btrfs - we don't do that.

There are cases where fsck will delete things, but for the most part that's only if we have another piece of metadata that says "this shouldn't exist".

e.g., extent past the end of an inode - something went wrong with truncate.

If a reflink pointer points to a missing indirect extent, we just mark it as poisoned, so on future attempts to read from it we don't have to print out the same error, and we can un-poison it if the indirect extent comes back; this guards against a temporary lookup error in the reflink btree.

For the snapshots btree, a key for a snapshot node that doesn't exist generally indicates a problem with snapshot deletion, and the key will be deleted. But we also track when a btree has lost data (topology error, IO error), and if the snapshots btree has lost data we'll instead try to reconstruct snapshot tree nodes (and also subvolume keys, etc.).

We can reconstruct inodes if the inodes btree has lost data (permissions, ownership, timestamps etc. will all be wrong, and i_size will be a bit off but you'll still have the correct file contents).

This topic is an area of future research, but for all practical purposes we're in good shape.

bcachefs news

Posted Mar 29, 2025 5:19 UTC (Sat) by alison (subscriber, #63752) [Link] (2 responses)

I see that Kent Overstreet is back, assuming that he was ever actually away.

bcachefs news

Posted Mar 29, 2025 5:41 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link] (1 responses)

never stop coding :)

bcachefs news

Posted Mar 29, 2025 17:11 UTC (Sat) by alison (subscriber, #63752) [Link]

> never stop coding :)

Words to live by!

-- Alison


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds