A setback for fs-verity
The core idea behind fs-verity is the use of a Merkle tree to record a hash value associated with every block in a file. Whenever data from a protected file is read, the kernel first verifies the relevant block(s) against the hashes, and only allows the operation to proceed if there is a match. An attacker may find a way to change a critical file, but there is no way to change the Merkle tree after its creation, so any changes made would be immediately detected. In this way, it is hoped, Android systems can be protected against certain kinds of persistent malware attacks.
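The verification scheme described above can be sketched in a few lines. This is only an illustration, assuming SHA-256, 4096-byte blocks, and a simple binary tree; the names build_tree() and verify_block() are hypothetical and do not come from the fs-verity patches, which pack many hashes into each tree block.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_tree(data: bytes, block_size: int = BLOCK_SIZE) -> List[List[bytes]]:
    """Return the tree levels: levels[0] holds leaf hashes, levels[-1] is [root]."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of (up to) two concatenated child hashes.
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(block: bytes, index: int, levels: List[List[bytes]],
                 trusted_root: bytes) -> bool:
    """Check one data block against the stored tree, walking up to the root."""
    node = h(block)
    for level in levels[:-1]:
        if level[index] != node:
            return False  # stored hash does not match what was computed
        pair_start = index - (index % 2)
        node = h(b"".join(level[pair_start:pair_start + 2]))
        index //= 2
    return node == trusted_root
```

Only the root hash needs to be trusted: a change to a data block, or to any stored hash along its path, produces a mismatch somewhere on the way up.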
There is no opposition to the idea of adding functionality to the kernel to detect hostile modifications to files. It turns out, though, that there is indeed some opposition to how this functionality has been implemented in the current patch set. See the above-linked article and this documentation patch for details of how fs-verity is meant to work. In short, user space is responsible for the creation of the Merkle tree, which must be surrounded by header structures and carefully placed at the beginning of a block after the end of the file data. An ioctl() call tells the kernel that fs-verity is to be invoked on the file; after that, the location of the end of the file (from a user-space point of view) is changed to hide the Merkle tree from user space, and the file itself becomes read-only.
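The placement rule can be illustrated with a small calculation. This is a sketch only: the 4096-byte block size and the function name are assumptions, and the header structures that surround the tree are not modeled here.

```python
def merkle_tree_offset(file_size: int, block_size: int = 4096) -> int:
    """Return the first block-aligned offset at or after the end of the data.

    Hypothetical helper: it rounds the file size up to the next block
    boundary, which is where user space would start writing the tree.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return -(-file_size // block_size) * block_size  # ceiling division
```

A 10,000-byte file with 4096-byte blocks, for example, would have its verification data start at offset 12288.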
Christoph Hellwig was the first to oppose the work, less than two weeks before the opening of the merge window. The storage of the Merkle tree inline was, he said, "simply not acceptable" and the interface should not require a specific way of storing this data. He later suggested that the hash data should be passed separately to the ioctl() call, rather than being placed after the file data. Darrick Wong suggested a similar interface, noting that it would give the filesystem a lot of flexibility in terms of how the hash data would be stored.
Dave Chinner complained that storing the Merkle tree after the end of the file was incompatible with how some filesystems (XFS in particular) use that space. He described the approach as being "gross", arguing that it "bleeds implementation details all over the API" and creates problems far beyond the filesystems that actually implement fs-verity:
Chinner, too, argued that the Merkle-tree data should be provided separately to the kernel, rather than being stored in the file itself using a specific format. Filesystem implementations could still put the data after the end of the existing data, but that is a detail that should, according to Chinner, be hidden from user space.
Eric Biggers, the developer of fs-verity, responded that, while the API requires user space to place the Merkle tree after the end of user data, there is no actual need for filesystems to keep it there:
He also said that passing the Merkle tree in as a memory buffer is problematic, since it could be too large to fit into memory on a small system. (The size of this data also prevents it from being stored as an extended attribute as some have suggested.) Generating the hash data in the kernel was also considered, Biggers said, but it was concluded that this task was better handled in user space.
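A rough calculation shows why the size matters. The sketch below assumes SHA-256 digests (32 bytes), 4096-byte blocks, and a tree whose blocks are each filled with child hashes; those parameters and the function name are assumptions made for illustration. The 64KB figure is the generic Linux limit on a single extended-attribute value (XATTR_SIZE_MAX).

```python
XATTR_SIZE_MAX = 65536  # generic Linux limit on one extended-attribute value


def merkle_tree_size(file_size: int, block_size: int = 4096,
                     digest_size: int = 32) -> int:
    """Estimate the bytes of hash blocks needed to cover a file.

    Hypothetical helper: each tree block holds block_size // digest_size
    hashes, and levels are added until a single block remains. A file
    that fits in one block needs no tree blocks in this model.
    """
    hashes_per_block = block_size // digest_size
    blocks = max(1, -(-file_size // block_size))  # data blocks, rounded up
    total = 0
    while blocks > 1:
        blocks = -(-blocks // hashes_per_block)  # hash blocks at this level
        total += blocks * block_size
    return total


one_gib = 1 << 30
tree_bytes = merkle_tree_size(one_gib)
print(tree_bytes, tree_bytes > XATTR_SIZE_MAX)
```

Under these assumptions the tree is a little over 1/128 of the file size, so a 1GB file carries roughly 8MB of hash data, far beyond what a single extended attribute can hold.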
Ted Ts'o claimed repeatedly that there would be no value to be had by changing the API for creating protected files; he described the complaints as "really more of a philosophical objection than anything else". The requested API, he said, could be added later (in addition to the proposed API, which would have to be maintained indefinitely) if it turned out to be necessary.
After the discussion continued for a while, he escalated the discussion to Linus Torvalds, asking for a decision:
What came back might well have failed to please anybody in the discussion, though. It turns out that Torvalds has no real objection to the model of storing the hash data at the end of the file itself:
So that part I like. I think the people who argue for "let's have a separate interface that writes the merkle tree data" are completely wrong.
From there, though, he made it clear that he was not happy with the current implementation. This model, he said, should be independent of any specific filesystem, so it should be entirely implemented in the virtual filesystem layer. At that point, filesystems like XFS would never even see the fs-verity layer, so its implementation could not be a problem for them. A generic implementation would require no filesystem-specific code and would just work universally. He also disliked the trick that hides the Merkle tree after the fs-verity mode has been set; the validation data for the file should just be a part of the file itself, he said.
As Ts'o pointed out, keeping the hash data visible in the file would create confusion for higher-level software that has its own ideas about the format of any given file. He also provided some reasons for why he thinks filesystems need to be aware of fs-verity; they include ensuring that the right thing happens if a filesystem containing protected files is mounted by an older version of the filesystem code. Making fs-verity fully generic would, he said, have forced low-level API changes that would have affected "dozens of filesystems", a cost that he doesn't think is justified by the benefits.
The last message from Ts'o was sent on December 22; Torvalds has not responded to it. There has not, however, been a pull request for fs-verity sent, and it is getting late in the merge window for such a thing to show up. [Correction: a pull request was sent copied only to the linux-fscrypt mailing list; it has not received a response as of this writing.] It seems likely that fs-verity is going to have to skip this development cycle while the patches are reworked to address some of the objections that have been raised (those from Torvalds, at least). Even then, the work might be controversial; it is rare for the kernel to interpret the contents of files, rather than just serving as a container for them, and some developers are likely to dislike an implementation that depends on that sort of interpretation. But if Torvalds remains in favor of such an approach, it is likely to find its way into the kernel eventually.
Posted Jan 3, 2019 20:13 UTC (Thu) by simcop2387 (subscriber, #101710)
Posted Jan 3, 2019 20:37 UTC (Thu) by corbet (editor, #1)
Posted Jan 3, 2019 22:13 UTC (Thu) by TheGopher (subscriber, #59256)
Posted Jan 3, 2019 22:35 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
Posted Jan 3, 2019 22:38 UTC (Thu) by foom (subscriber, #14868)
But, to make xattrs support large data would effectively also require creating a brand new mechanism. It's not quite simple. As the tip of the iceberg, "getxattr" and "setxattr" can only deal with the entire value at once -- not a good idea for a large data stream.
However, other OSes do support this sort of thing, allowing "forks" of the file to be opened for reading/writing just as a normal file. E.g., Windows NTFS has "alternate data streams", and Solaris has "fsattr". (https://docs.oracle.com/cd/E19253-01/816-5175/6mbba7f02/)
Posted Jan 4, 2019 8:43 UTC (Fri) by epa (subscriber, #39769)
Posted Jan 4, 2019 9:01 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)
Posted Jan 5, 2019 21:22 UTC (Sat) by epa (subscriber, #39769)
Posted Jan 7, 2019 2:40 UTC (Mon) by marcH (subscriber, #57642)
Sounds good.
> and then arrange to hide these extra ones from user space.
Posted Jan 4, 2019 12:20 UTC (Fri) by quotemstr (subscriber, #45331)
Posted Jan 4, 2019 14:20 UTC (Fri) by dskoll (subscriber, #1630)
I like this idea, but on the other hand, you now use up two file descriptors each time you open a file, and something has to manage the hidden descriptor. You could also run into weird issues if the second open fails, but then I guess you'd just fail everything.
Posted Jan 4, 2019 19:02 UTC (Fri) by quotemstr (subscriber, #45331)
Posted Jan 4, 2019 20:26 UTC (Fri) by dskoll (subscriber, #1630)
You could generalize this idea to allow multiple data forks. The fs-verity Merkle tree would be a special data fork that could be set once only and then never changed. I think this is a much nicer approach than shoving the verification data at the end of the file.
Posted Jan 4, 2019 20:34 UTC (Fri) by quotemstr (subscriber, #45331)
Posted Jan 4, 2019 20:40 UTC (Fri) by dskoll (subscriber, #1630)
Haha. :) I'm not a kernel developer and have only recently started a job that requires looking deeply into the kernel, so I'm a newbie at this.
Posted Jan 5, 2019 1:44 UTC (Sat) by himi (subscriber, #340)
Posted Jan 5, 2019 7:46 UTC (Sat) by alonz (subscriber, #815)
Posted Jan 5, 2019 8:08 UTC (Sat) by bof (subscriber, #110741)
Keeping the extra streams could be as easy as putting them in a hidden directory of the same filesystem, e.g. ..forx at the root (each stream named after the original inode plus some discriminator for multiple streams of a given file). That should be transparent to any existing fsck.
Instead of xattrs, these /..forx/inum things could even just be directories by themselves, with each (named or whatever) fork getting a regular inode inside. Which would even allow for nested forks, a specific fork being a symlink, device node, have special ownership and permissions, times, .....
A special mount option to ignore the magic, would make the filesystem wholly-copyable/clonable using the usual tools, too.
However, whatever the exact approach, there's the little issue of stuff like "du" reporting underestimated sizes, the bigger issue of teaching any kind of "cp" like command to cope with the forks (including changes to archiver file formats....) - and all of that would only just simply work out for local filesystems, not NFS + friends.
Posted Jan 5, 2019 9:18 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)
Posted Jan 5, 2019 22:21 UTC (Sat) by neilbrown (subscriber, #359)
They all hit the cold hard wall of practicality. You cannot create semantics that do what you want.
If you want a file to start acting like a directory, it has to stop acting like a file. One way to achieve this is to "-o loop" mount it somewhere else with an appropriate filesystem driver. You could come up with other approaches, such as an overlay filesystem which presents select files as directories.
Posted Jan 7, 2019 2:48 UTC (Mon) by marcH (subscriber, #57642)
Mac OS X plays tricks like this but I guess they're just directories from its filesystem perspective, all magic in userspace?
https://www.oreilly.com/library/view/mac-os-x/0596004605/...
Posted Jan 7, 2019 19:32 UTC (Mon) by jccleaver (guest, #127418)
HFS (and HFS+, and MFS) all have a native understanding of dual-forked files. It's been a part of the Mac OS world ever since the original 128k Mac in 1984. The problem is that the rest of the computing world mostly had adopted the "a file is a data stream and that's it" model of unix. This is what led to years of problems transferring Mac files around via FTP on the internet and the creation and adoption of formats like MacBinary and BinHex. For MIME purposes, AppleSingle and AppleDouble were used. From RFC1740 https://www.ietf.org/rfc/rfc1740.txt
> Files on the Macintosh consists of two parts, called forks:
It was not uncommon for some files to be entirely resource forks and with empty data forks.
In fact, before the PowerPC era and the increasing complexity of shared libraries and apps generally, it was not uncommon for application programs in the System 6 era to consist entirely of a single file, which could be located wherever you wanted. (Imagine if, on the *nix side, all your gettext files and support directories and other miscellaneous crap even console programs might have strewn about the file system were all in a single file on the system.)
Eventually, things got more and more complicated (especially once shared libraries became A Thing on the Mac). Office 6 was a notable monstrosity of filesystem complexity, which made Office 98 a thing of beauty -- you could simply drag the entire Office folder over and that was it.
Note: All of this below is spelled out in Apple technote: https://developer.apple.com/library/archive/technotes/tn/...
By the Mac OS 9 timeframe, a solution was wanted. What ended up being used was a "bundle" bit being set in the fs metadata on certain folders to make them look like files. This hid most of the unnecessary complexity from the user and let them drag things around and manipulate the application as a single entity. If the app didn't need to install something into the System Folder, this meant you could once again think of the app as a single item and treat it accordingly.
It should be pointed out, though, that resource forks were still being used here. The bundle-as-single-icon was more for associating multiple *files*, all of which could have both resource and data forks (and other named forks, but this was rarely used). That said, PPC code was located in the data fork now instead of the resource fork, so it was loaded as more or less an unstructured blob rather than structured CODE blocks handled by the Resource Manager.
When Mac OS became Mac OS X, the de-emphasis of resource forks continued. UFS didn't know what to do with them, more and more file transfer on the internet was happening, which meant lots of inefficient BinHex encoding, and command line tools had barely any concept of the closest parallel: Windows NTFS "streams", which were barely used as well. The solution eventually developed was to treat Resource Forks on filesystems that didn't support them the same way as had been done on FS.
The NeXT-ization of Apple had many benefits, but IMO this was a step backwards. Rich metadata like resources provided something few other OS's had, but made things actually *simpler* on disk as long as the tools and apps knew what to expect. Sadly, Apple didn't even patch the low-level BSD cp/mv commands to understand multiple fork files until 10.4, so it's clear this was the direction things were going to go toward. Thus we end up in a world where Bundles and Packages exist and native multiple-fork files are rare. See https://developer.apple.com/library/archive/documentation... for details on how it looks from an OS X perspective.
> Mac OS X plays tricks like this but I guess they're just directories from its filesystem perspective, all magic in userspace?
TL;DR: Folders that appear as a single file, but can be drilled down into in some places (including the Finder).
Hope that helped.
Posted Jan 8, 2019 3:13 UTC (Tue) by ghane (guest, #1805)
So AppImage as done by Apple many years earlier? :-)
--
Posted Jan 8, 2019 8:38 UTC (Tue) by jccleaver (guest, #127418)
Well, more or less. Apple obviously didn't have the true chaos of distros and *nix variants to deal with, just the (brilliant) 68k->PPC transition and the (very well done) PPC->Intel transition, so packaging was never *truly* horrible on the Mac compared to pretty much any other system (especially Windows). OS X application bundles handled multiple architectures with universal binaries almost as a trivial after-thought compared to all the other stuff that was now existing as a separate (hidden) file.
Having complex (eg, multi-stream, "Resource Manager"-accessed) files but far fewer of them made for a much more grokable operating system than what others have had to deal with. The classic Mac system software had no command line, and while graphical linux environments try to get by with that, *nix systems are still dealing with thousands or 10's of thousands of files on a fresh install. That's a lot of complexity to try to paper back over. Even at the worst "7.5.3 Update 2" era Mac OS complexity, you were still only dealing with a couple of hundred files on a new install, max. And if it weren't for System Enablers (basically hardware support files for each released Apple computer family model) it would have been far less.
The Mac OS -> OS X transition was rued by many a classic Mac fan for the interface changes, but more fundamental was the knowledge that we were going from a fundamentally *more simple* system to a complex one that would have to sort of simulate a simple one. FlatPacks, AppImage, and containers generally are all ways to try to get back to that sort of mental simplicity (at the expense of system-management issues such as duplicated libraries or fully static binaries). But there's a place for all kinds of paradigms out there, and smashing them together because Devs can't grok *nix can't really ever achieve the best of either world: the fine-tuning of professional system administration of a complex system, or the ease-of-use and only-a-certain-number-of-things-that-could-be-going-wrongness of a system with fewer parts.
Posted Jan 17, 2019 19:21 UTC (Thu) by kevinkrouse (guest, #86616)
Posted Jan 8, 2019 15:41 UTC (Tue) by mina86 (guest, #68442)
Posted Jan 8, 2019 20:55 UTC (Tue) by nybble41 (subscriber, #55106)
I'm envisioning something more like an openxattrat() call. The resulting FD could then be passed as dirfd to openxattrat() (with an empty path) or to flistxattr()/fgetxattr()/fsetxattr() to access the xattrs of the resulting inode, recursively.
Posted Jan 9, 2019 1:26 UTC (Wed) by foom (subscriber, #14868)
Posted Jan 10, 2019 4:05 UTC (Thu) by nybble41 (subscriber, #55106)
Posted Jan 8, 2019 15:45 UTC (Tue) by mina86 (guest, #68442)
Posted Jan 9, 2019 17:16 UTC (Wed) by quotemstr (subscriber, #45331)
Posted Jan 3, 2019 23:23 UTC (Thu) by ohrn (subscriber, #5509)
Seems to me a single level list of hashes for the data blocks would be enough?
If someone malicious can modify a hash in the list they can surely modify the entire tree, so hashing the hashes doesn't seem to give much of a benefit?
Posted Jan 3, 2019 23:29 UTC (Thu) by neilbrown (subscriber, #359)
A benefit is that you can crypto-sign the root hash.
Posted Jan 10, 2019 4:51 UTC (Thu) by thestinger (guest, #91827)
Posted Jan 3, 2019 23:26 UTC (Thu) by neilbrown (subscriber, #359)
???
I quite like Linus' idea that the integrity info should be part of the file (I never liked xattrs or streams).
Posted Jan 3, 2019 23:38 UTC (Thu) by corbet (editor, #1)
Posted Jan 4, 2019 1:49 UTC (Fri) by doublez13 (guest, #122213)
Posted Jan 4, 2019 2:34 UTC (Fri) by corbet (editor, #1)
Posted Jan 4, 2019 3:45 UTC (Fri) by doublez13 (guest, #122213)
Posted Jan 4, 2019 15:44 UTC (Fri) by corbet (editor, #1)
Posted Jan 7, 2019 2:51 UTC (Mon) by marcH (subscriber, #57642)
I'm sure you meant 18-12-31 or 31/12/18 :-)
Posted Jan 4, 2019 9:47 UTC (Fri) by corsac (subscriber, #49696)
Posted Jan 4, 2019 15:08 UTC (Fri) by corbet (editor, #1)
Posted Jan 7, 2019 2:56 UTC (Mon) by marcH (subscriber, #57642)
Posted Jan 10, 2019 4:45 UTC (Thu) by thestinger (guest, #91827)
Posted Jan 10, 2019 4:47 UTC (Thu) by thestinger (guest, #91827)
Posted Jan 7, 2019 2:51 UTC (Mon) by bfields (subscriber, #19510)
Did they really? I thought that at the very least similar objections had been raised at LSFMM. My memories are vague, though.
Posted Jan 10, 2019 7:47 UTC (Thu) by dgc (subscriber, #6611)
https://lwn.net/Articles/752614/
-Dave.
Posted Jan 7, 2019 3:00 UTC (Mon) by marcH (subscriber, #57642)
Posted Jan 7, 2019 4:09 UTC (Mon) by neilbrown (subscriber, #359)
I'm fairly sure the only way to achieve consensus, is to only have one voice.
As noted in the article, extended attributes (at least as implemented in Linux) won't work. The Merkle-tree data is simply too large to be stored that way.
Extended attributes
Why? Aren't streams typed in some way?
It can't be 100% transparent: you need the various integrity-validation mechanisms (fsck and its online brethren) to be aware, so they won't consider these inodes to be orphaned.
There were several attempts to make "file-as-directory" thingies (I remember one in Reiser4). Whatever happened to them?
Resource Forks, Bundles, etc.
>
> Data fork: The actual data included in the file. The Data
> fork is typically the only meaningful part of a
> Macintosh file on a non-Macintosh computer system.
> For example, if a Macintosh user wants to send a
> file of data to a user on an IBM-PC, she would only
> send the Data fork.
>
> Resource fork: Contains a collection of arbitrary attribute/value
> pairs, including program segments, icon bitmaps,
> and parametric values.
>
> Additional information regarding Macintosh files is stored by the
> Finder in a hidden file, called the "Desktop Database".
>
> Because of the complications in storing different parts of a
> Macintosh file in a non-Macintosh filesystem that only handles
> consecutive data in one part, it is common to convert the Macintosh
> file into some other format before transferring it over the network.
>
> The two styles of use are [APPL90]:
>
> AppleSingle: Apple's standard format for encoding Macintosh files
> as one byte stream.
> AppleDouble: Similar to AppleSingle except that the Data fork is
> separated from the Macintosh-specific parts by the
> AppleDouble encoding.
>
> AppleDouble is the preferred format for a Macintosh file that is to
> be included in an Internet mail message, because it provides
> recipients with Macintosh computers the entire document, including
> Icons and other Macintosh specific information, while other users
> easily can extract the Data fork (the actual data) as it is separated
> from the AppleDouble encoding.
Sanjeev
If I understand you correctly, you’re suggesting an iopen(int inode, int flags, mode_t mode) syscall. If that’s the case, the problem is that it would allow bypassing filesystem permissions. Namely, it would render the execute bit of a directory useless, since a user would be able to read a world-readable file even if it resides in a directory they have no access to.
openxattrat(int dirfd, const char *path, const char *name, int flags, mode_t mode)
The link to the internal xattr inode would be hidden in the filesystem, and you would need at least search access to the file to open the linked xattr inode. User-mode software would never handle the raw inode numbers.
Changes to unlink(2) would break rm * though. E.g. if I run rm file file.xattr I’ll get an error deleting the second file since it got deleted transparently.
- Directories
- symlinks
- device-files
- ELF files
Thanks to "-o loop" mounts, filesystem images in files could almost be seen as the kernel interpreting the contents of a file too.
Maybe if we had a filesystem full of files with integrity info, then an overlay filesystem on top of that which rejected writes, and performed integrity check on all reads.
OK, so I could have written that as "it is rare for the kernel to interpret the contents of plain files" - but I think the meaning was already somewhat clear...
Do you have a link? I'm not able to find that pull request.
Pull request
Ah...so the only CC was sent to the linux-fscrypt mailing list — a good destination if you want to avoid inconvenient discussion, but one that makes it hard for LWN editors to find the request! Thanks for the correction.
dm-verity protects an entire block device, while fs-verity protects single files. In the Android use case, they don't want to make the entire block device read-only; they just want to keep specific files from being messed with.
vs. dm-verity
(There is no "metadata" - there is only data, some of which you haven't met yet).