A setback for fs-verity

By Jonathan Corbet
January 3, 2019

The fs-verity mechanism, created to protect files on Android devices from hostile modification by attackers, seemed to be on track for inclusion into the mainline kernel during the current merge window when the patch set was posted at the beginning of November. Indeed, it wasn't until mid-December that some other developers started to raise objections. The resulting conversation has revealed a deep difference of opinion regarding what makes a good filesystem-related API and may have implications for how similar features are implemented in the future.

The core idea behind fs-verity is the use of a Merkle tree to record a hash value associated with every block in a file. Whenever data from a protected file is read, the kernel first verifies the relevant block(s) against the hashes, and only allows the operation to proceed if there is a match. An attacker may find a way to change a critical file, but there is no way to change the Merkle tree after its creation, so any changes made would be immediately detected. In this way, it is hoped, Android systems can be protected against certain kinds of persistent malware attacks.
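To make the mechanism concrete, here is a minimal sketch of building a hash tree over a file's blocks and checking a single block against a trusted root hash. It is an illustration only, not the fs-verity implementation: the 4096-byte block size, the use of SHA-256, the binary (two-children-per-node) tree shape, and all function names are assumptions chosen for brevity.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def hash_block(block):
    """SHA-256 digest of one block (hash choice is an assumption)."""
    return hashlib.sha256(block).digest()


def build_levels(data, block_size=BLOCK_SIZE):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    level = [hash_block(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Hash adjacent pairs; an odd node at the end is hashed on its own.
        level = [hash_block(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(block, index, levels, trusted_root):
    """Check one data block against the stored tree and a trusted root hash."""
    digest = hash_block(block)
    if digest != levels[0][index]:
        return False
    for depth in range(len(levels) - 1):
        pair_start = index - (index % 2)
        siblings = levels[depth][pair_start:pair_start + 2]
        digest = hash_block(b"".join(siblings))
        index //= 2
        if digest != levels[depth + 1][index]:
            return False
    # The stored tree is not trusted by itself; only the root is.
    return digest == trusted_root
```

The important property is the final comparison: even if an attacker rewrites both a data block and every stored hash above it, the recomputed root will no longer match the trusted value.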

There is no opposition to the idea of adding functionality to the kernel to detect hostile modifications to files. It turns out, though, that there is indeed some opposition to how this functionality has been implemented in the current patch set. See the above-linked article and this documentation patch for details of how fs-verity is meant to work. In short, user space is responsible for the creation of the Merkle tree, which must be surrounded by header structures and carefully placed at the beginning of a block after the end of the file data. An ioctl() call tells the kernel that fs-verity is to be invoked on the file; after that, the location of the end of the file (from a user-space point of view) is changed to hide the Merkle tree from user space, and the file itself becomes read-only.
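The "metadata after EOF" placement described above can be modeled in a few lines. This is only a sketch of the layout idea: the 4096-byte block size and the function names are assumptions, the header structures are left out entirely, and treating an already-aligned file as needing no padding is a guess rather than something stated in the patch set.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size


def metadata_offset(data_len, block_size=BLOCK_SIZE):
    """First block boundary at or after the end of the file data."""
    return -(-data_len // block_size) * block_size


def append_metadata(data, metadata, block_size=BLOCK_SIZE):
    """Pad the data to a block boundary, then append the metadata."""
    offset = metadata_offset(len(data), block_size)
    return data + b"\0" * (offset - len(data)) + metadata


def visible_contents(stored, data_len):
    """What a reader would see once the metadata is hidden past EOF."""
    return stored[:data_len]
```

In this model, user space writes the output of append_metadata() to the file, and after the ioctl() the kernel reports only the original data length as the file size.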

Christoph Hellwig was the first to oppose the work, less than two weeks before the opening of the merge window. The storage of the Merkle tree inline was, he said, "simply not acceptable" and the interface should not require a specific way of storing this data. He later suggested that the hash data should be passed separately to the ioctl() call, rather than being placed after the file data. Darrick Wong suggested a similar interface, noting that it would give the filesystem a lot of flexibility in terms of how the hash data would be stored.

Dave Chinner complained that storing the Merkle tree after the end of the file was incompatible with how some filesystems (XFS in particular) use that space. He described the approach as being "gross", arguing that it "bleeds implementation details all over the API" and creates problems far beyond the filesystems that actually implement fs-verity:

That's the problem here - fsverity completely redefines the layout of user data files for everyone, not just fsverity, and not just the filesystems that implement fsverity. You've taken an ext4 fsverity implementation feature and promoted it to being a linux-wide file data layout standard that is encoded into the kernel/user ABI/API forever more.

Chinner, too, argued that the Merkle-tree data should be provided separately to the kernel, rather than being stored in the file itself using a specific format. Filesystem implementations could still put the data after the end of the existing data, but that is a detail that should, according to Chinner, be hidden from user space.

Eric Biggers, the developer of fs-verity, responded that, while the API requires user space to place the Merkle tree after the end of user data, there is no actual need for filesystems to keep it there:

As explained in the documentation, the core code uses the "metadata after EOF" format for the API, but not necessarily the on-disk format. I.e., FS_IOC_ENABLE_VERITY requires it, but during the ioctl the filesystem can choose to move the metadata into a different location, such as a file stream.

He also said that passing the Merkle tree in as a memory buffer is problematic, since it could be too large to fit into memory on a small system. (The size of this data also prevents it from being stored as an extended attribute as some have suggested.) Generating the hash data in the kernel was also considered, Biggers said, but it was concluded that this task was better handled in user space.
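A rough calculation shows why the hash data outgrows extended attributes. The sketch below assumes 4096-byte blocks, 32-byte (SHA-256) digests, and tree levels that are each rounded up to whole blocks; those parameters and the function name are illustrative assumptions, not values taken from the patch set. The 64KiB figure is the kernel's XATTR_SIZE_MAX limit on a single attribute value.

```python
XATTR_SIZE_MAX = 65536  # Linux limit on a single extended-attribute value


def merkle_tree_size(file_size, block_size=4096, digest_size=32):
    """Estimate the bytes needed for a block-based hash tree over a file."""
    hashes_per_block = block_size // digest_size
    nodes = -(-file_size // block_size)  # number of data blocks
    total_blocks = 0
    while True:
        # Each tree block holds hashes_per_block digests of the level below.
        nodes = -(-nodes // hashes_per_block)
        total_blocks += nodes
        if nodes <= 1:
            break
    return total_blocks * block_size


size = merkle_tree_size(1 << 30)  # a 1GiB file
print(size, size > XATTR_SIZE_MAX)  # 8458240 True
```

Under these assumptions a 1GiB file needs about 8MiB of hash data, over a hundred times more than a single extended attribute can hold.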

Ted Ts'o claimed repeatedly that there would be no value to be had by changing the API for creating protected files; he described the complaints as "really more of a philosophical objection than anything else". The requested API, he said, could be added later (in addition to the proposed API, which would have to be maintained indefinitely) if it turned out to be necessary. After the discussion continued for a while, he escalated the discussion to Linus Torvalds, asking for a decision:

Linus --- we're going round and round, and I don't think this is really a technical dispute at this point, but rather an aesthetics one. Will you be willing to accept my pull request for a feature which is being shipped on millions of Android phones, has been out for review for months, and for which, if we *really* need to add uselessly complicated interface later, we can do that?

Correction: I've been reminded that there was an extensive discussion of this work in early 2018 where many of the same objections were raised.

Complaining that the code had been out for review makes some sense; it is true that the objections surfaced at something close to the last minute. But that often happens in kernel development; the imminent merging of controversial code can concentrate developers' minds in that direction. Arguing that the API is already being shipped is definitely not a way to win favor. That notwithstanding, Ts'o had clearly hoped for a ruling from Torvalds that the current API was good enough and that the code could be merged.

What came back might well have failed to please anybody in the discussion, though. It turns out that Torvalds has no real objection to the model of storing the hash data at the end of the file itself:

So honestly, I personally *like* the model of "the file contains its own validation data" model. I think that's the right model, so that you can then basically just do "enable verification on this file, and verify that the root hash is this".

So that part I like. I think the people who argue for "let's have a separate interface that writes the merkle tree data" are completely wrong.

From there, though, he made it clear that he was not happy with the current implementation. This model, he said, should be independent of any specific filesystem, so it should be entirely implemented in the virtual filesystem layer. At that point, filesystems like XFS would never even see the fs-verity layer, so its implementation could not be a problem for them. A generic implementation would require no filesystem-specific code and would just work universally. He also disliked the trick that hides the Merkle tree after the fs-verity mode has been set; the validation data for the file should just be a part of the file itself, he said.

As Ts'o pointed out, keeping the hash data visible in the file would create confusion for higher-level software that has its own ideas about the format of any given file. He also provided some reasons for why he thinks filesystems need to be aware of fs-verity; they include ensuring that the right thing happens if a filesystem containing protected files is mounted by an older version of the filesystem code. Making fs-verity fully generic would, he said, have forced low-level API changes that would have affected "dozens of filesystems", a cost that he doesn't think is justified by the benefits.

The last message from Ts'o was sent on December 22; Torvalds has not responded to it. ~~There has not, however, been a pull request for fs-verity sent, and it is getting late in the merge window for such a thing to show up.~~ [Correction: a pull request was sent copied only to the linux-fscrypt mailing list; it has not received a response as of this writing.] It seems likely that fs-verity is going to have to skip this development cycle while the patches are reworked to address some of the objections that have been raised — those from Torvalds, at least. Even then, the work might be controversial; it is rare for the kernel to interpret the contents of files, rather than just serving as a container for them, and some developers are likely to dislike an implementation that depends on that sort of interpretation. But if Torvalds remains in favor of such an approach, it is likely to find its way into the kernel eventually.

A setback for fs-verity

Posted Jan 3, 2019 20:13 UTC (Thu) by simcop2387 (subscriber, #101710) [Link] (30 responses)

I wonder if this would do better to store it in part of the xattr data instead?

Extended attributes

Posted Jan 3, 2019 20:37 UTC (Thu) by corbet (editor, #1) [Link] (29 responses)

As noted in the article, extended attributes (at least as implemented in Linux) won't work. The Merkle-tree data is simply too large to be stored that way.

Extended attributes

Posted Jan 3, 2019 22:13 UTC (Thu) by TheGopher (subscriber, #59256) [Link] (28 responses)

Just to iterate a bit on this: Why aren't xattrs changed then to support this size of data? This seems to be a case of "The current mechanism doesn't support storing metadata of this size, so let's create a new one..." instead of the general way of Linux, which is to fix things.

Extended attributes

Posted Jan 3, 2019 22:35 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Alternate data streams were suggested but shot down.

Extended attributes

Posted Jan 3, 2019 22:38 UTC (Thu) by foom (subscriber, #14868) [Link] (4 responses)

A generic mechanism would seem to be a better idea...

But, to make xattrs support large data would effectively also require creating a brand new mechanism. It's not quite simple. As the tip of the iceberg, "getxattr" and "setxattr" can only deal with the entire value at once -- not a good idea for a large data stream.

However, other OSes do support this sort of thing, allowing "forks" of the file to be opened for reading/writing just as a normal file. E.g., Windows NTFS has "alternate data streams", and Solaris has "fsattr". (https://docs.oracle.com/cd/E19253-01/816-5175/6mbba7f02/)

Extended attributes

Posted Jan 4, 2019 8:43 UTC (Fri) by epa (subscriber, #39769) [Link] (3 responses)

That doesn’t really help, since the alternate data streams can be used by applications, so the fs-verity Merkle tree needs to be for the whole file, including all its forks?

Extended attributes

Posted Jan 4, 2019 9:01 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Merkle tree is self-verifying, so it doesn't need to be further checksummed.

Extended attributes

Posted Jan 5, 2019 21:22 UTC (Sat) by epa (subscriber, #39769) [Link] (1 responses)

What I mean is that since the alternate data streams are exposed to user space, each one needs its Merkle tree. So it would appear you need to implement the verifying at some lower level of filesystem code where you are dealing with a single piece of data. Either that or generalize it to add one extra stream for each stream that exists - and then arrange to hide these extra ones from user space.

Extended attributes

Posted Jan 7, 2019 2:40 UTC (Mon) by marcH (subscriber, #57642) [Link]

> or generalize it to add one extra stream for each stream that exists

Sounds good.

> and then arrange to hide these extra ones from user space.

Why? Aren't streams typed in some way?

Extended attributes

Posted Jan 4, 2019 12:20 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (21 responses)

What about defining a special xattr that just points (via path, like a symlink) to a separate file containing the validation data? This way, you obviate the xattr size restriction. VFS, when it opened a file protected by this symlink-to-validation-blob xattr, would just transparently open the pointed-to file as well. You could even arrange for unlink(2) to delete the pointed-to validation file as well, if you wanted to remain compatible with today's file management code.

Extended attributes

Posted Jan 4, 2019 14:20 UTC (Fri) by dskoll (subscriber, #1630) [Link] (4 responses)

I like this idea, but on the other hand, you now use up two file descriptors each time you open a file, and something has to manage the hidden descriptor. You could also run into weird issues if the second open fails, but then I guess you'd just fail everything.

Extended attributes

Posted Jan 4, 2019 19:02 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (3 responses)

Not two file descriptors --- from userspace's POV, there's just one file and its descriptor. That VFS internally would maintain an internal pointer to a different struct file is just an implementation detail. And yes, VFS could just fail the open if the authentication blob file couldn't be opened.

Extended attributes

Posted Jan 4, 2019 20:26 UTC (Fri) by dskoll (subscriber, #1630) [Link] (2 responses)

You could generalize this idea to allow multiple data forks. The fs-verity Merkle tree would be a special data fork that could be set once only and then never changed. I think this is a much nicer approach than shoving the verification data at the end of the file.

Extended attributes

Posted Jan 4, 2019 20:34 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (1 responses)

Generalization is the enemy of LKML patch acceptance though. :-) I think starting simple might be a good approach. I'll send an LKML email.

Extended attributes

Posted Jan 4, 2019 20:40 UTC (Fri) by dskoll (subscriber, #1630) [Link]

Haha. :) I'm not a kernel developer and have only recently started a job that requires looking deeply into the kernel, so I'm a newbie at this.

Extended attributes

Posted Jan 5, 2019 1:44 UTC (Sat) by himi (subscriber, #340) [Link] (13 responses)

Another option for generalising xattrs to support larger data sizes would be to allow for an xattr that pointed to an inode which would contain the extra data, with an extension to the API that allowed the inode (which would otherwise not have any path associated) to be opened to give a regular file descriptor that could be treated the same as any other open file. Voila - "multiple file streams" in a way that's completely transparent to the rest of the filesystem . . .

Extended attributes

Posted Jan 5, 2019 7:46 UTC (Sat) by alonz (subscriber, #815) [Link] (8 responses)

It can't be 100% transparent: you need the various integrity-validation mechanisms (fsck and its online brethren) to be aware, so they won't consider these inodes to be orphaned.

Extended attributes

Posted Jan 5, 2019 8:08 UTC (Sat) by bof (subscriber, #110741) [Link] (7 responses)

Hmm.

Keeping the extra streams could be as easy as putting them in a hidden directory of the same filesystem, e.g. ..forx at the root (each stream named after the original inode plus some discriminator for multiple streams of a given file). That should be transparent to any existing fsck.

Instead of xattrs, these /..forx/inum things could even just be directories by themselves, with each (named or whatever) fork getting a regular inode inside. Which would even allow for nested forks, a specific fork being a symlink, device node, have special ownership and permissions, times, .....

A special mount option to ignore the magic, would make the filesystem wholly-copyable/clonable using the usual tools, too.

However, whatever the exact approach, there's the little issue of stuff like "du" reporting underestimated sizes, the bigger issue of teaching any kind of "cp" like command to cope with the forks (including changes to archiver file formats....) - and all of that would only just simply work out for local filesystems, not NFS + friends.

Extended attributes

Posted Jan 5, 2019 9:18 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> Instead of xattrs, these /..forx/inum things could even just be directories by themselves, with each (named or whatever) fork getting a regular inode inside. Which would even allow for nested forks, a specific fork being a symlink, device node, have special ownership and permissions, times, .....

There were several attempts to make "file-as-directory" thingies (I remember one in Reiser4). Whatever happened to them?

Extended attributes

Posted Jan 5, 2019 22:21 UTC (Sat) by neilbrown (subscriber, #359) [Link] (5 responses)

> There were several attempts to make "file-as-directory" thingies (I remember one in Reiser4). Whatever happened to them?

They all hit the cold hard wall of practicality. You cannot create semantics that do what you want.

If you want a file to start acting like a directory, it has to stop acting like a file. One way to achieve this is to "-o loop" mount it somewhere else with an appropriate filesystem driver. You could come up with other approaches, such as an overlay filesystem which presents select files as directories.

Extended attributes

Posted Jan 7, 2019 2:48 UTC (Mon) by marcH (subscriber, #57642) [Link] (4 responses)

> If you want a file to start acting like a directory, it has to stop acting like a file.

Mac OS X plays tricks like this but I guess they're just directories from its filesystem perspective, all magic in userspace?

https://www.oreilly.com/library/view/mac-os-x/0596004605/...

Resource Forks, Bundles, etc.

Posted Jan 7, 2019 19:32 UTC (Mon) by jccleaver (guest, #127418) [Link] (3 responses)

Well, yes and no.

HFS (and HFS+, and MFS) all have a native understanding of dual-forked files. It's been a part of the Mac OS world ever since the original 128k Mac in 1984. The problem is that the rest of the computing world mostly had adopted the "a file is a data stream and that's it" model of unix. This is what led to years of problems transferring Mac files around via FTP on the internet and the creation and adoption of formats like MacBinary and BinHex. For MIME purposes, AppleSingle and AppleDouble were used. From RFC1740 https://www.ietf.org/rfc/rfc1740.txt

> Files on the Macintosh consists of two parts, called forks:
>
> Data fork: The actual data included in the file. The Data
> fork is typically the only meaningful part of a
> Macintosh file on a non-Macintosh computer system.
> For example, if a Macintosh user wants to send a
> file of data to a user on an IBM-PC, she would only
> send the Data fork.
>
> Resource fork: Contains a collection of arbitrary attribute/value
> pairs, including program segments, icon bitmaps,
> and parametric values.
>
> Additional information regarding Macintosh files is stored by the
> Finder in a hidden file, called the "Desktop Database".
>
> Because of the complications in storing different parts of a
> Macintosh file in a non-Macintosh filesystem that only handles
> consecutive data in one part, it is common to convert the Macintosh
> file into some other format before transferring it over the network.
>
> The two styles of use are [APPL90]:
>
> AppleSingle: Apple's standard format for encoding Macintosh files
> as one byte stream.
> AppleDouble: Similar to AppleSingle except that the Data fork is
> separated from the Macintosh-specific parts by the
> AppleDouble encoding.
>
> AppleDouble is the preferred format for a Macintosh file that is to
> be included in an Internet mail message, because it provides
> recipients with Macintosh computers the entire document, including
> Icons and other Macintosh specific information, while other users
> easily can extract the Data fork (the actual data) as it is separated
> from the AppleDouble encoding.

It was not uncommon for some files to be entirely resource forks and with empty data forks.

In fact, before the PowerPC era and the increasing complexity of shared libraries and apps generally, it was not uncommon for application programs in the System 6 era to consist entirely of a single file, which could be located wherever you wanted to. (Imagine if, on the *nix side, all your gettext files and support directories and other miscellaneous crap even console programs might have strewn about the file system were all in a single file on the system.)

Eventually, things got more and more complicated (especially once shared libraries became A Thing on the Mac). Office 6 was a notable monstrosity of filesystem complexity, which made Office 98 a thing of beauty -- you could simply drag the entire Office folder over and that was it.

Note: All of this below is spelled out in Apple technote: https://developer.apple.com/library/archive/technotes/tn/...

By the Mac OS 9 timeframe, a solution was wanted. What ended up being used was a "bundle" bit being set in the fs metadata on certain folders to make them look like files. This hid most of the unnecessary complexity from the user and let them drag things around and manipulate the application as a single entity. If the app didn't need to install something into the System Folder, this meant you could once again think of the app as a single item and treat it accordingly.

It should be pointed out, though, that resource forks were still being used here. The bundle-as-single-icon was more for associating multiple *files*, all of which could have both resource and data forks (and other named forks, but this was rarely used). That said, PPC code was located in the data fork now instead of the resource fork, so it was loaded as more or less an unstructured blob rather than structured CODE blocks handled by the Resource Manager.

When Mac OS became Mac OS X, the de-emphasis of resource forks continued. UFS didn't know what to do with them, more and more file transfer on the internet was happening, which meant lots of inefficient BinHex encoding, and command line tools had barely any concept of the closest parallel: Windows NTFS "streams", which were barely used as well. The solution eventually developed was to treat Resource Forks on filesystems that didn't support them the same way as had been done on FS.

The NeXT-ization of Apple had many benefits, but IMO this was a step backwards. Rich metadata like resources provided something few other OS's had, but made things actually *simpler* on disk as long as the tools and apps knew what to expect. Sadly, Apple didn't even patch the low-level BSD cp/mv commands to understand multiple fork files until 10.4, so it's clear this was the direction things were going to go toward. Thus we end up in a world where Bundles and Packages exist and native multiple-fork files are rare. See https://developer.apple.com/library/archive/documentation... for details on how it looks from an OS X perspective.

> Mac OS X plays tricks like this but I guess they're just directories from its filesystem perspective, all magic in userspace?

TL;DR: Folders that appear as a single file, but can be drilled down into in some places (including the Finder).

Hope that helped.

Resource Forks, Bundles, etc.

Posted Jan 8, 2019 3:13 UTC (Tue) by ghane (guest, #1805) [Link] (1 responses)

Ah ...

So AppImage as done by Apple many years earlier? :-)

--
Sanjeev

Resource Forks, Bundles, etc.

Posted Jan 8, 2019 8:38 UTC (Tue) by jccleaver (guest, #127418) [Link]

> So AppImage as done by Apple many years earlier? :-)

Well, more or less. Apple obviously didn't have the true chaos of distros and *nix variants to deal with, just the (brilliant) 68k->PPC transition and the (very well done) PPC->Intel transition, so packaging was never *truly* horrible on the Mac compared to pretty much any other system (especially Windows). OS X application bundles handled multiple architectures with universal binaries almost as a trivial after-thought compared to all the other stuff that was now existing as a separate (hidden) file.

Having complex (eg, multi-stream, "Resource Manager"-accessed) files but far fewer of them made for a much more grokable operating system than what others have had to deal with. The classic Mac system software had no command line, and while graphical linux environments try to get by with that, *nix systems are still dealing with thousands or tens of thousands of files on a fresh install. That's a lot of complexity to try to paper back over. Even at the worst "7.5.3 Update 2" era Mac OS complexity, you were still only dealing with a couple of hundred files on a new install, max. And if it weren't for System Enablers (basically hardware support files for each released Apple computer family model) it would have been far less.

The Mac OS -> OS X transition was rued by many a classic Mac fan for the interface changes, but more fundamental was the knowledge that we were going from a fundamentally *more simple* system to a complex one that would have to sort of simulate a simple one. FlatPacks, AppImage, and containers generally are all ways to try to get back to that sort of mental simplicity (at the expense of system-management issues such as duplicated libraries or fully static binaries). But there's a place for all kinds of paradigms out there, and smashing them together because Devs can't grok *nix can't really ever achieve the best of either world: the fine-tuning of professional system administration of a complex system, or the ease-of-use and only-a-certain-number-of-things-that-could-be-going-wrongness of a system with fewer parts.

Resource Forks, Bundles, etc.

Posted Jan 17, 2019 19:21 UTC (Thu) by kevinkrouse (guest, #86616) [Link]

I grew up with a Macintosh II in the '90s and your post brought me much nostalgia. I fondly recall changing the resource fork of games with ResEdit to swap out the built-in sounds and sprites with my own.

Extended attributes

Posted Jan 8, 2019 15:41 UTC (Tue) by mina86 (guest, #68442) [Link] (3 responses)

If I understand you correctly, you’re suggesting an iopen(int inode, int flags, mode_t mode) syscall. If that’s the case, the problem is that it would allow bypassing filesystem permissions. Namely, it would render execution bit of a directory useless since user would be able to read a world-readable file even if it resides in directory they have no access to.

Extended attributes

Posted Jan 8, 2019 20:55 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (2 responses)

I'm envisioning something more like openxattrat(int dirfd, const char *path, const char *name, int flags, mode_t mode)—the link to the internal xattr inode would be hidden in the filesystem and you would need at least search access to the file to open the linked xattr inode. User-mode software would never handle the raw inode numbers.

The resulting FD could then be passed as dirfd to openxattrat() (with an empty path) or to flistxattr()/fgetxattr()/fsetxattr() to access the xattrs of the resulting inode, recursively.

Extended attributes

Posted Jan 9, 2019 1:26 UTC (Wed) by foom (subscriber, #14868) [Link] (1 responses)

It sounds like you're reinventing the same design Solaris already created, but with a needlessly different API.

Extended attributes

Posted Jan 10, 2019 4:05 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

I wouldn't say *needlessly* different, since in the Solaris version openat(O_XATTR) can only open attributes for files which are already open, while attropen() lacks a dirfd argument and thus can only use the current working directory for relative paths. My proposed openxattrat() function would basically be attropen() + dirfd. (Perhaps attropenat() would be more fitting?) In general, though, I agree that the concepts are very similar and there is no reason not to adopt the Solaris interface.

Extended attributes

Posted Jan 8, 2019 15:45 UTC (Tue) by mina86 (guest, #68442) [Link] (1 responses)

Changes to unlink(2) would break rm * though. E.g. if I run rm file file.xattr I’ll get an error deleting the second file since it got deleted transparently.

Extended attributes

Posted Jan 9, 2019 17:16 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

I was imagining that we'd keep the validation blobs in a separate hidden directory where normal user activity (e.g., "rm *") wouldn't find them.

A setback for fs-verity

Posted Jan 3, 2019 23:23 UTC (Thu) by ohrn (subscriber, #5509) [Link] (2 responses)

Dumb question. What is the purpose of using a hash tree?

Seems to me a single level list of hashes for the data blocks would be enough?

If someone malicious can modify a hash in the list they can surely modify the entire tree, so hashing the hashes doesn't seem to give much of a benefit?

A setback for fs-verity

Posted Jan 3, 2019 23:29 UTC (Thu) by neilbrown (subscriber, #359) [Link]

> If someone malicious can modify a hash in the list they can surely modify the entire tree, so hashing the hashes doesn't seem to give much of a benefit?

A benefit is that you can crypto-sign the root hash.

A setback for fs-verity

Posted Jan 10, 2019 4:51 UTC (Thu) by thestinger (guest, #91827) [Link]

The hashes of the blocks need to be verified too. The information on the disk isn't trusted. The hashes/signatures aren't generated locally but rather are shipped with the updates for those components. The fs-verity code is only used for dynamically updated components outside the base OS partitions, which are verified via a signature (vbmeta), hashes in vbmeta (boot/dtbo) and dm-verity (bootstrapped from vbmeta). Their fs-verity approach lets them extend the verification to components in the userdata partition.

A setback for fs-verity

Posted Jan 3, 2019 23:26 UTC (Thu) by neilbrown (subscriber, #359) [Link] (1 responses)

> it is rare for the kernel to interpret the contents of files, rather than just serving as a container for them

???
- Directories
- symlinks
- device-files
- ELF files
Thanks to "-o loop" mounts, filesystem images in files could almost be seen as the kernel interpreting the contents of a file too.

I quite like Linus' idea that the integrity info should be part of the file (I never liked xattrs or streams).
Maybe if we had a filesystem full of files with integrity info, then an overlay filesystem on top of that which rejected writes, and performed integrity checks on all reads.

A setback for fs-verity

Posted Jan 3, 2019 23:38 UTC (Thu) by corbet (editor, #1) [Link]

OK, so I could have written that as "it is rare for the kernel to interpret the contents of plain files" - but I think the meaning was already somewhat clear...

A setback for fs-verity

Posted Jan 4, 2019 1:49 UTC (Fri) by doublez13 (guest, #122213) [Link] (4 responses)

Ted did indeed send a pull request on 12/31/18 for fs-verity.

Pull request

Posted Jan 4, 2019 2:34 UTC (Fri) by corbet (editor, #1) [Link] (2 responses)

Do you have a link? I'm not able to find that pull request.

Pull request

Posted Jan 4, 2019 3:45 UTC (Fri) by doublez13 (guest, #122213) [Link] (1 responses)

https://patchwork.kernel.org/patch/10745561/
:)

Pull request

Posted Jan 4, 2019 15:44 UTC (Fri) by corbet (editor, #1) [Link]

Ah...so the only CC was sent to the linux-fscrypt mailing list — a good destination if you want to avoid inconvenient discussion, but one that makes it hard for LWN editors to find the request! Thanks for the correction.

A setback for fs-verity

Posted Jan 7, 2019 2:51 UTC (Mon) by marcH (subscriber, #57642) [Link]

> send a pull request on 12/31/18

I'm sure you meant 18-12-31 or 31/12/18 :-)

A setback for fs-verity

Posted Jan 4, 2019 9:47 UTC (Fri) by corsac (subscriber, #49696) [Link] (4 responses)

I have to admit it's a bit confusing to me what's the difference between dm-verity and fs-verity (besides the fact that one is at the device-mapper level and the other at the filesystem-level). Is there some information on this somewhere?

vs. dm-verity

Posted Jan 4, 2019 15:08 UTC (Fri) by corbet (editor, #1) [Link] (3 responses)

dm-verity protects an entire block device, while fs-verity protects single files. In the Android use case, they don't want to make the entire block device read-only; they just want to keep specific files from being messed with.

vs. dm-verity

Posted Jan 7, 2019 2:56 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

Please correct me; I believe the focus of *both* is on executables: think 1. dm-verity for the entirely immutable partition part of the "OS image" that all users get exactly the same; 2. fs-verity for the partially writable partition where users download and install their own choice of applications from a trusted source like the Play Store.

vs. dm-verity

Posted Jan 10, 2019 4:45 UTC (Thu) by thestinger (guest, #91827) [Link]

Yes, that's correct. The early firmware is fully verified from a hardware root of trust, followed by vbmeta being verified by the late stage bootloader which then verifies the hashes of boot (and dtbo if present) from there. The system and vendor partitions are verified via dm-verity, bootstrapped from hashes in vbmeta. That's all read-only at boot, and is never directly written. Updates write to the alternate partition set (for firmware and the OS partitions), and then a reboot into the new version is done by switching to the new partition set, which has already been verified by the updater before marking it as usable (in case a write error occurred) and then gets verified again by verified boot. The fs-verity support is used to verify dynamically updated components in the rw userdata partition.

vs. dm-verity

Posted Jan 10, 2019 4:47 UTC (Thu) by thestinger (guest, #91827) [Link]

fs-verity is used to implement ro.apk_verity.mode for the rw userdata partition. The OS itself is fully verified via dm-verity (system/vendor), hashes (boot/dtbo) and a signature (vbmeta). Those are all read-only at runtime, and updated by writing to the alternate partition set.

A setback for fs-verity

Posted Jan 7, 2019 2:51 UTC (Mon) by bfields (subscriber, #19510) [Link] (1 responses)

"it is true that the objections surfaced at something close to the last minute."

Did they really? I thought that at the very least similar objections had been raised at LSFMM. My memories are vague, though.

A setback for fs-verity

Posted Jan 10, 2019 7:47 UTC (Thu) by dgc (subscriber, #6611) [Link]

Yes, they were raised at LSFMM. The LWN report doesn't really convey the "hell no" response that some of us had at LSFMM to putting special data beyond EOF in user files and adding special mechanisms to manage and then read beyond EOF....

https://lwn.net/Articles/752614/

-Dave.

A setback for fs-verity

Posted Jan 7, 2019 3:00 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

I wonder if *any* discussion about meta-data has ever seen some consensus :-)

A setback for fs-verity

Posted Jan 7, 2019 4:09 UTC (Mon) by neilbrown (subscriber, #359) [Link]

> I wonder if *any* discussion about meta-data has ever seen some consensus :-)

I'm fairly sure the only way to achieve consensus, is to only have one voice.
(There is no "metadata" - there is only data, some of which you haven't met yet).