The ABI status of filesystem formats
Users reporting kernel regressions will receive a varying amount of sympathy, depending on where the regression is. For normal user-space programs using the system-call API, that sympathy is nearly absolute, and changes that break things will need to be redone. This view of the ABI also extends to the virtual filesystems, such as /proc and sysfs, exported by the kernel. Changes that break things are deemed a little more tolerable when they apply to low-level administrative tools; if there is only one program that is known to use an interface, and that program has been updated, the change may be allowed. On the other hand, nobody will be concerned about changes that break out-of-tree kernel modules; the interface they used is considered to be internal to the kernel and not subject to any stability guarantee.
But those are not the only places where user space interfaces with the kernel. Consider, for example, this regression report from Josh Triplett. It seems that an ext4 filesystem bug fix merged for 5.9-rc2 breaks the mounting of some ext4 filesystems that he works with.
These filesystems are created for some unspecified internal purpose with an unreleased internal tool. They are meant to be read-only, so there will never be any need (or ability) to create new files on them. As a space-saving measure, this tool overlays all of the block and inode bitmaps onto a single block, set to all ones, indicating that all blocks and inodes are already allocated. To indicate that this has been done, this tool marks the filesystem with a special flag (EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS). This flag is defined by the ext4 tools, but is not used by the kernel in any way. It is placed in the set of read-only compatibility flags, though, meaning that a kernel that sees it in a filesystem will know that said filesystem can be safely mounted, but only in read-only mode.
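The read-only compatibility rule described above can be sketched in a few lines. This is a simplified Python model of the idea, not kernel code: the function `mount_decision`, the `KERNEL_KNOWN_RO_COMPAT` mask, and the returned strings are all hypothetical names invented for illustration, and the 0x4000 value for the flag is an assumption based on the e2fsprogs headers.

```python
# Illustrative model of ext4 read-only compatibility flags (not kernel code).

# Assumed value, based on the e2fsprogs headers; treat it as illustrative.
EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS = 0x4000

# Hypothetical mask of ro_compat flags that this imaginary kernel understands.
KERNEL_KNOWN_RO_COMPAT = 0x0001 | 0x0002 | 0x0008


def mount_decision(ro_compat_flags: int, want_read_write: bool) -> str:
    """Decide how a filesystem with the given ro_compat flags may be mounted."""
    unknown = ro_compat_flags & ~KERNEL_KNOWN_RO_COMPAT
    if not unknown:
        # Every flag is understood, so either mount mode is acceptable.
        return "read-write" if want_read_write else "read-only"
    if want_read_write:
        # An unknown ro_compat flag makes writing unsafe; refuse the mount.
        return "refused"
    # Reading remains safe even when the flag is not understood.
    return "read-only"
```

In this model, a filesystem carrying only the shared-blocks flag can be mounted read-only by a kernel that has never heard of the flag, which matches the behavior the tool's authors were relying on.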
Until 5.9-rc2, mainline kernels were happy to mount these filesystems. The commit highlighted by Triplett changed that situation by adding some checks to the mount-time ext4 verifier that ensures that the filesystem image is valid. As a result of these new checks, the overlapping bitmaps are detected, the kernel complains, and any attempt to mount the filesystem fails. Something that used to work no longer does — the definition of a regression. Triplett included in his report a small patch that disables the validity check when the EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS flag is present, rendering his filesystems mountable again.
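The interaction between the new check and Triplett's fix can be pictured roughly as follows. This is a loose Python sketch under stated assumptions, not the actual ext4 code: the `GroupDescriptor` class and the `descriptors_valid` function are hypothetical, the real verifier examines far more than bitmap locations, and the 0x4000 flag value is an assumption based on the e2fsprogs headers.

```python
# Illustrative model of a mount-time bitmap-overlap check (not kernel code).
from dataclasses import dataclass

# Assumed value, based on the e2fsprogs headers; treat it as illustrative.
EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS = 0x4000


@dataclass
class GroupDescriptor:
    """Hypothetical, pared-down view of one block group's metadata."""
    block_bitmap: int  # block number holding the block bitmap
    inode_bitmap: int  # block number holding the inode bitmap


def descriptors_valid(groups, ro_compat_flags):
    """Return True if no two bitmaps share a block, or if sharing is declared."""
    if ro_compat_flags & EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS:
        # The image declares that blocks are shared on purpose; skip the check.
        return True
    seen = set()
    for group in groups:
        for block in (group.block_bitmap, group.inode_bitmap):
            if block in seen:
                return False  # two bitmaps overlap: reject the image
            seen.add(block)
    return True
```

An image whose bitmaps all point at one block fails this check when the flag is absent and passes when it is present, which is the effect the patch is described as having.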
That change is likely to be merged, but it has not brought great joy to the filesystem developers, who see it as a sign of things going wrong. XFS maintainer Darrick Wong argued that "unofficial" filesystem variants should not be supported.
Triplett responded that filesystem images should fall within the realm of the kernel ABI.
Ext4 maintainer Ted Ts'o said that he was not opposed to Triplett's patch, but suggested that further patches to support this tool might receive a chillier reception. He then told the story of the make_ext4fs tool, an independently written ext4 filesystem creator used for years by Android. It created a long list of compatibility and corruption problems that took years to iron out, he said, and third-party filesystem tools are prone to such problems.
Filesystem developers have a certain natural aversion to "mischief", so it is unsurprising that Ts'o would prefer that these outside tools simply not exist. At a minimum, he suggested, future policy should say that filesystem-image regressions would only need to be addressed for images that were created and managed by the designated official tools. He requested that Triplett find a way to get rid of his custom tool.
Triplett disagreed with that policy suggestion, saying that a more reasonable approach would be that any filesystem images that pass the e2fsck checker should be supported. Not all tools that work with the ext4 format should have to live in the e2fsprogs repository; to say otherwise, he added, would be tantamount to saying that the FreeBSD kernel, which has an ext4 filesystem driver, should live in e2fsprogs too. The tool in question here, which Triplett finally described late in the conversation, appears equally unsuited to inclusion in the e2fsprogs repository.
Triplett concluded that message with a restatement of his request.
Ts'o's response made it clear that he is disinclined to grant that wish; doing so, he said, would make it impossible to fix security problems caused by invalid filesystem images, of which there are many. Many of these invalid images, often generated by fuzzing tools or attackers, pass e2fsck until the problems are found and fixed. Grandfathering in any image that passes e2fsck would thus, he said, require invalid filesystems to be supported forever. He concluded with some suggestions for other ways to solve Triplett's problem and requested that Triplett work more closely with the ext4 developers in the future.
As of this writing, that is where the discussion has stopped. Ts'o's
willingness to apply the fix for the immediate problem means that there is
no pressing need to resolve the larger issue of regressions involving
filesystems; that, in turn, means that the issue is likely to come up again
at some point. The kernel ABI is a large and amorphous thing, and many of
the boundaries are fuzzy at best. Filesystems are one area where that
boundary has not yet been fully explored; somebody is likely to
inadvertently end up on the wrong side of it sooner or later.
