The first part of the 6.15 merge window
The most interesting changes merged to date include:
Architecture-specific
- Support for larger 32-bit x86 systems (those with more than eight CPUs or more than 4GB of RAM) has been removed. Those hardware configurations have been unavailable for a long time, and any workloads needing such resources should have long since moved to 64-bit systems.
- The way in which the POSIX timer subsystem assigns timer IDs has been enhanced to allow the Checkpoint/Restore in Userspace (CRIU) subsystem to reliably and quickly restore timer IDs.
Core kernel
- The exit status of a process represented by a pidfd can be fetched even after the process has been reaped; see this commit for more information.
- The special value PIDFD_SELF can be used by a process to refer to itself in the pidfd-taking system calls.
- The way the kernel handles pidfd notifications in a multi-threaded process that either executes a new program or exits has changed; see this commit for details.
Filesystems and block I/O
- There is a new override_creds mount option for overlayfs filesystems that changes the credentials that are used to access the lower layers; see this documentation commit for some more information.
- All of the kernel's pseudo filesystems have now been converted to the new mount API. Amusingly, the System V filesystem has also been converted, even though it was removed entirely later in this merge window. If that removal has to be reverted for any reason, at least the filesystem will have been updated to match current practice.
- There is a new API to receive information about filesystem mount and unmount events. This meticulously undocumented API is based on the fanotify mechanism; there are a few notes on its use in this commit.
- The statmount() system call can now receive information about the ID mappings applied to a filesystem mount. This commit has some information on how the API works.
- It is now possible to create an ID-mapped mount of another mount that is already ID-mapped, thus changing the mappings. This commit describes the motivation and implementation of this feature at length.
- There have been a number of mount-API changes to make it easier to assemble complex filesystem hierarchies without exposing partial results or parts of any filesystem that are meant to remain hidden. This merge message contains a lot of details on what is now possible.
- The block layer has gained support for hardware-wrapped encryption keys. This is a mechanism that allows the kernel to program encryption keys into a block device without actually keeping the key in memory, where it might be disclosed to an attacker. This commit contains a documentation file describing the feature, while this commit includes documentation for the related ioctl() operations.
- The XFS filesystem now supports zoned storage devices.
- The EROFS filesystem now supports 48-bit block addressing to enable it to handle massive files.
- Bcachefs has gained "scrub" functionality that will attempt to read all data and metadata within a filesystem, then repair, if possible, any errors found. Bcachefs is also now able to handle filesystems with a block size larger than the system page size.
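To give a sense of scale for the EROFS change above, the arithmetic can be sketched as follows. The 4KB block size is an assumption (the article does not state one), and the 32-bit figure is included only for comparison; the function name is made up for illustration.

```python
# Rough capacity arithmetic for different block-address widths.
# ASSUMPTION: 4 KiB blocks; the article does not give a block size.
BLOCK_SIZE = 4096  # bytes, assumed

TIB = 1 << 40
EIB = 1 << 60


def max_addressable_bytes(address_bits, block_size=BLOCK_SIZE):
    """Bytes reachable when block numbers are address_bits wide."""
    return (1 << address_bits) * block_size


if __name__ == "__main__":
    # 32-bit width shown purely for comparison.
    print(max_addressable_bytes(32) // TIB, "TiB with 32-bit block addresses")
    print(max_addressable_bytes(48) // EIB, "EiB with 48-bit block addresses")
```

Under that assumption, 48 bits of block address cover 2^48 × 2^12 = 2^60 bytes; a different block size scales the result proportionally.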
Hardware support
- Hardware monitoring: Congatec board sensors, Texas Instruments INA233 power monitors, and Measurement Specialties HTU31 humidity and temperature sensors.
- Input: Apple touch bars.
- Media: Qualcomm iris V4L2 decoders, Synopsys DesignWare HDMI receivers, and Lontium LT6911UXE decoders.
- Miscellaneous: Sophgo SG2042 MSI interrupt controllers, Sophgo SG2042 pulse-width modulators, NXP PF9453 regulators, T-HEAD TH1520 power domains, Samsung Galaxy Book platform devices, Huawei Matebook E Go embedded controllers, Rockchip UFS host controllers, Renesas RZ/G3E system controllers, and NXP i.MX8Q reset controllers.
- Networking: MCTP-over-USB interfaces, Airoha network processor units, and Realtek 8814AE and 8814AU network adapters.
- Sound: Awinic aw88166 amplifiers.
- SPI: QPIC SNAND controllers, STMicroelectronics STM32 OCTO SPI controllers, and Sophgo SG2044 SPI NOR controllers.
Networking
- Work continues toward the breaking up of the RTNL lock (the "big networking lock"), which is a contention bottleneck in much of the networking subsystem.
- Initial support for zero-copy data reception via io_uring has been added.
- There is a new TCP socket option, TCP_RTO_MAX_MS, that can be used to set the maximum time between retransmit attempts on an IPv4 connection. There is also a new sysctl knob to set this limit system-wide.
- There is a new set of BPF callbacks to obtain timestamps from various places in the networking stack; this feature is intended to help in the debugging of latency problems. It is severely undocumented, but this commit includes some self-tests that show how it works.
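For the TCP_RTO_MAX_MS option mentioned above, a minimal user-space sketch might look like the following. It is Linux-only and rests on two assumptions that should be checked against the 6.15 headers: that the option number is 44, and that the kernel accepts values between one and 120 seconds. The helper name set_rto_max() is hypothetical.

```python
import socket

# ASSUMPTION: option number as we understand it from the 6.15 uapi header
# <linux/tcp.h>; Python's socket module may not export the name yet.
TCP_RTO_MAX_MS = getattr(socket, "TCP_RTO_MAX_MS", 44)

# ASSUMPTION: accepted range of 1 to 120 seconds.
RTO_MAX_MIN_MS = 1_000
RTO_MAX_MAX_MS = 120_000


def set_rto_max(sock, ms):
    """Try to cap the retransmission timeout on one TCP socket.

    Returns True on success, False if the running kernel rejects the option.
    """
    if not RTO_MAX_MIN_MS <= ms <= RTO_MAX_MAX_MS:
        raise ValueError(f"ms must be within {RTO_MAX_MIN_MS}..{RTO_MAX_MAX_MS}")
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_RTO_MAX_MS, ms)
    except OSError:
        return False  # for example, a kernel older than 6.15
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("TCP_RTO_MAX_MS applied:", set_rto_max(s, 5_000))
```

The range check is done before touching the socket so that an obviously bad value fails the same way on every kernel. The system-wide knob is, as far as we can tell, net.ipv4.tcp_rto_max_ms.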
Security-related
- There is a new security hook for the io_uring subsystem, allowing security modules more control over what is allowed; the SELinux security module has gained support for this hook.
- The SELinux security module can now apply policy controls to many types of data read by the kernel, including firmware images, security policies, certificates, and more. This change drew some criticism from Linus Torvalds, who did not see why it was necessary. It is not clear that the subsequent conversation convinced him of its value, but the feature was merged anyway.
Internal kernel changes
- The minimum version of Python needed for code shipped with the kernel (including the documentation build system) has been raised to 3.9.
- The minimum GCC version (for x86 builds) is now 8.1, while the minimum Clang version is 15.0.0.
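A build script that wants to fail early on an old toolchain could check the minimums listed above along these lines. This is only a sketch: the table and helper names are hypothetical, version strings are assumed to be purely numeric and dot-separated, and the GCC figure applies to x86 builds.

```python
import sys

# Minimums quoted above for 6.15, padded to three components.
# The GCC minimum is stated for x86 builds only.
MINIMUMS = {
    "python": (3, 9, 0),
    "gcc": (8, 1, 0),
    "clang": (15, 0, 0),
}


def parse_version(text):
    """Turn '8.1' into (8, 1, 0); assumes numeric dotted components."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions with zeros
    return tuple(parts[:3])


def meets_minimum(tool, version):
    """True if the dotted version string satisfies the minimum for tool."""
    return parse_version(version) >= MINIMUMS[tool]


if __name__ == "__main__":
    running = ".".join(str(n) for n in sys.version_info[:3])
    verdict = "ok" if meets_minimum("python", running) else "too old"
    print("python", running, verdict)
```

Padding to three components keeps a string like "15.0" from comparing as older than 15.0.0.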
The 6.15 merge window can be expected to remain open through April 6,
after which it will be time to stabilize all of that new work. There are
still nearly 6,000 commits sitting in linux-next, so the list of features
for the next kernel release is far from complete. As usual, we will post
another summary once the merge window closes describing those remaining
changes.
Posted Mar 28, 2025 22:13 UTC (Fri)
by nijhof (subscriber, #4034)
[Link]
Posted Mar 29, 2025 5:09 UTC (Sat)
by koverstreet (✭ supporter ✭, #4296)
[Link] (8 responses)
Posted Mar 30, 2025 22:59 UTC (Sun)
by motk (subscriber, #51120)
[Link] (5 responses)
Posted Mar 31, 2025 3:09 UTC (Mon)
by koverstreet (✭ supporter ✭, #4296)
[Link] (4 responses)
We regularly recover from extreme disaster scenarios today - I've been looking at a metadata dump where it looked like a head just skated across the platter, which created some very... particular alloc info inconsistencies, but that's been the only failure to repair in ~6 months, and I've seen logs of some good ones. So that's largely done.
Once the mount API extension happens, plus better communication between the mount helper and systemd/plymouth (because of course communicating things to the user has been getting more complicated), we'll even be able to tell the user "hey, your SSD crapped itself (X IO errors, toast btree nodes), please wait while we reconstruct btree roots/alloc/what have you, here's a progress bar"
And this stuff is pretty fast, too - post 6.14 that dealt with backpointers check/repair. Even btree node scan is fast thanks to a small bitmap in the superblock, if we lose btree roots.
Further off, post experimental, will be finishing off online fsck - and then we'll be able to recover from slightly absurd levels of damage in the background while your filesystem is RW. (People with huge arrays really want this).
Posted Mar 31, 2025 4:23 UTC (Mon)
by jmalcolm (subscriber, #8876)
[Link] (1 responses)
Posted Mar 31, 2025 4:57 UTC (Mon)
by koverstreet (✭ supporter ✭, #4296)
[Link]
Posted Apr 3, 2025 8:06 UTC (Thu)
by DemiMarie (subscriber, #164188)
[Link] (1 responses)
Posted Apr 3, 2025 13:52 UTC (Thu)
by koverstreet (✭ supporter ✭, #4296)
[Link]
Posted Apr 3, 2025 8:03 UTC (Thu)
by DemiMarie (subscriber, #164188)
[Link] (1 responses)
Posted Apr 3, 2025 13:59 UTC (Thu)
by koverstreet (✭ supporter ✭, #4296)
[Link]
There are cases where fsck will delete things, but for the most part that's only if we have another piece of metadata that says "this shouldn't exist".
e.g., extent past the end of an inode - something went wrong with truncate.
If a reflink pointer points to a missing indirect extent, we just mark it as poisoned, so on future attempts to read from it we don't have to print out the same error, and we can un-poison it if the indirect extent comes back; this guards against a temporary lookup error in the reflink btree.
For the snapshots btree, a key for a snapshot node that doesn't exist generally indicates a problem with snapshot deletion, and the key will be deleted. But we also track when a btree has lost data (topology error, IO error), and if the snapshots btree has lost data we'll instead try to reconstruct snapshot tree nodes (and also subvolume keys, etc.).
We can reconstruct inodes if the inodes btree has lost data (permissions, ownership, timestamps etc. will all be wrong, and i_size will be a bit off but you'll still have the correct file contents).
This topic is an area of future research, but for all practical purposes we're in good shape.
Posted Mar 29, 2025 5:19 UTC (Sat)
by alison (subscriber, #63752)
[Link] (2 responses)
Meticulously undocumented
the self healing work continues in bcachefs
Is self-healing always wanted? My concerns are:
Is self-healing always good?
If the filesystem can’t tell if file X should be there or not, or is uncertain as to what its contents should be, I would prefer that all attempts to access X fail with something other than -ENOENT until and unless the administrator tells the filesystem to use its best guess of what the pre-corruption situation was, or X is overwritten by an operation that makes that state irrelevant. Silently returning wrong data is the worst possible outcome.
bcachefs news
