Extended attributes [LWN.net]

Extended attributes

Posted Jan 3, 2019 22:35 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Alternate data streams were suggested but shot down.

Extended attributes

Posted Jan 3, 2019 22:38 UTC (Thu) by foom (subscriber, #14868) [Link] (4 responses)

A generic mechanism would seem to be a better idea...

But, to make xattrs support large data would effectively also require creating a brand new mechanism. It's not quite simple. As the tip of the iceberg, "getxattr" and "setxattr" can only deal with the entire value at once -- not a good idea for a large data stream.
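To make the whole-value limitation concrete, here is a minimal sketch of how a caller reads an xattr with Linux `getxattr(2)`. It first asks for the size, then fetches everything in one call, because the API has no offset argument for reading part of a value. `read_whole_xattr()` is a hypothetical helper. The kernel also caps a single value at `XATTR_SIZE_MAX` (64 KiB), and individual filesystems may allow less.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Read an entire xattr value into a freshly malloc'd buffer (*out).
 * Returns the value length, or -1 with errno set. */
static ssize_t read_whole_xattr(const char *path, const char *name, char **out)
{
    for (;;) {
        /* A size of 0 asks only for the current length of the value. */
        ssize_t len = getxattr(path, name, NULL, 0);
        if (len < 0)
            return -1;
        char *buf = malloc(len > 0 ? (size_t)len : 1);
        if (!buf)
            return -1;
        /* The whole value comes back at once; there is no offset argument. */
        ssize_t got = getxattr(path, name, buf, (size_t)len);
        if (got >= 0) {
            *out = buf;
            return got;
        }
        free(buf);
        if (errno != ERANGE)   /* ERANGE: the value grew between the calls */
            return -1;
    }
}
```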

However, other OSes do support this sort of thing, allowing "forks" of the file to be opened for reading/writing just like a normal file. E.g., Windows NTFS has "alternate data streams", and Solaris has "fsattr". (https://docs.oracle.com/cd/E19253-01/816-5175/6mbba7f02/)

Extended attributes

Posted Jan 4, 2019 8:43 UTC (Fri) by epa (subscriber, #39769) [Link] (3 responses)

That doesn’t really help, since the alternate data streams can be used by applications, so the fs-verity Merkle tree needs to be for the whole file, including all its forks?

Extended attributes

Posted Jan 4, 2019 9:01 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

A Merkle tree is self-verifying, so it doesn't need to be checksummed further.
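A minimal sketch of why a Merkle tree is self-verifying: once the root hash is trusted, any single block can be checked using only the sibling hashes on its path to the root, and tampering with any tree node changes the computed root. Assumptions: a power-of-two number of blocks, and 64-bit FNV-1a as a stand-in hash. FNV-1a is not cryptographic; fs-verity uses a real cryptographic hash such as SHA-256. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define MAX_LEAVES 64

/* Placeholder hash (64-bit FNV-1a). NOT cryptographic. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

static uint64_t hash_leaf(const void *block, size_t len)
{
    return fnv1a(block, len, FNV_OFFSET);
}

static uint64_t hash_pair(uint64_t left, uint64_t right)
{
    uint64_t pair[2] = { left, right };
    return fnv1a(pair, sizeof pair, FNV_OFFSET);
}

/* Compute the root over n leaf hashes (n a power of two, <= MAX_LEAVES),
 * recording the sibling hashes needed later to verify leaf `index`. */
static uint64_t merkle_root(const uint64_t *leaves, size_t n, size_t index,
                            uint64_t *path, size_t *path_len)
{
    uint64_t level[MAX_LEAVES];
    memcpy(level, leaves, n * sizeof *leaves);
    *path_len = 0;
    while (n > 1) {
        path[(*path_len)++] = level[index ^ 1];   /* sibling of our node */
        for (size_t i = 0; i < n / 2; i++)         /* build the next level up */
            level[i] = hash_pair(level[2 * i], level[2 * i + 1]);
        n /= 2;
        index /= 2;
    }
    return level[0];
}

/* Verify one block against a trusted root using only its sibling path. */
static int merkle_verify(const void *block, size_t len, size_t index,
                         const uint64_t *path, size_t path_len, uint64_t root)
{
    uint64_t h = hash_leaf(block, len);
    for (size_t i = 0; i < path_len; i++) {
        h = (index & 1) ? hash_pair(path[i], h) : hash_pair(h, path[i]);
        index /= 2;
    }
    return h == root;
}
```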

Extended attributes

Posted Jan 5, 2019 21:22 UTC (Sat) by epa (subscriber, #39769) [Link] (1 responses)

What I mean is that since the alternate data streams are exposed to user space, each one needs its own Merkle tree. So it would appear you need to implement the verification at some lower level of filesystem code, where you are dealing with a single piece of data. Either that, or generalize it to add one extra stream for each stream that exists, and then arrange to hide these extra ones from user space.

Extended attributes

Posted Jan 7, 2019 2:40 UTC (Mon) by marcH (subscriber, #57642) [Link]

> or generalize it to add one extra stream for each stream that exists

Sounds good.

> and then arrange to hide these extra ones from user space.

Why? Aren't streams typed in some way?

Extended attributes

Posted Jan 4, 2019 12:20 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (21 responses)

What about defining a special xattr that just points (via path, like a symlink) to a separate file containing the validation data? This way, you obviate the xattr size restriction. VFS, when it opened a file protected by this symlink-to-validation-blob xattr, would just transparently open the pointed-to file as well. You could even arrange for unlink(2) to delete the pointed-to validation file as well, if you wanted to remain compatible with today's file management code.

Extended attributes

Posted Jan 4, 2019 14:20 UTC (Fri) by dskoll (subscriber, #1630) [Link] (4 responses)

I like this idea, but on the other hand, you now use up two file descriptors each time you open a file, and something has to manage the hidden descriptor. You could also run into weird issues if the second open fails, but then I guess you'd just fail everything.

Extended attributes

Posted Jan 4, 2019 19:02 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (3 responses)

Not two file descriptors --- from userspace's POV, there's just one file and its descriptor. That VFS internally would maintain an internal pointer to a different struct file is just an implementation detail. And yes, VFS could just fail the open if the authentication blob file couldn't be opened.

Extended attributes

Posted Jan 4, 2019 20:26 UTC (Fri) by dskoll (subscriber, #1630) [Link] (2 responses)

You could generalize this idea to allow multiple data forks. The fs-verity Merkle tree would be a special data fork that could be set once only and then never changed. I think this is a much nicer approach than shoving the verification data at the end of the file.

Extended attributes

Posted Jan 4, 2019 20:34 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (1 responses)

Generalization is the enemy of LKML patch acceptance, though. :-) I think starting simple might be a good approach. I'll send an LKML email.

Extended attributes

Posted Jan 4, 2019 20:40 UTC (Fri) by dskoll (subscriber, #1630) [Link]

Haha. :) I'm not a kernel developer and have only recently started a job that requires looking deeply into the kernel, so I'm a newbie at this.

Extended attributes

Posted Jan 5, 2019 1:44 UTC (Sat) by himi (subscriber, #340) [Link] (13 responses)

Another option for generalising xattrs to support larger data sizes would be to allow for an xattr that pointed to an inode containing the extra data, with an extension to the API that allowed that inode (which would otherwise have no associated path) to be opened to give a regular file descriptor that could be treated the same as any other open file. Voila - "multiple file streams" in a way that's completely transparent to the rest of the filesystem . . .

Extended attributes

Posted Jan 5, 2019 7:46 UTC (Sat) by alonz (subscriber, #815) [Link] (8 responses)

It can't be 100% transparent: you need the various integrity-validation mechanisms (fsck and its online brethren) to be aware, so they won't consider these inodes to be orphaned.

Extended attributes

Posted Jan 5, 2019 8:08 UTC (Sat) by bof (subscriber, #110741) [Link] (7 responses)

Hmm.

Keeping the extra streams could be as easy as putting them in a hidden directory of the same filesystem, e.g. ..forx at the root (each stream named after the original inode plus some discriminator for multiple streams of a given file). That should be transparent to any existing fsck.

Instead of xattrs, these /..forx/inum things could even just be directories by themselves, with each (named or whatever) fork getting a regular inode inside. Which would even allow for nested forks, a specific fork being a symlink, device node, have special ownership and permissions, times, .....
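A sketch of the naming scheme described above. It assumes the hidden directory sits at `<mount point>/..forx/<inode number>/<fork name>`; both that layout and `fork_path()` are hypothetical. One caveat the scheme would have to handle: inode numbers are reused after a file is deleted, so a file's fork directory must go away with it.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build the hidden path for fork `fork_name` of `file`, where `fs_root`
 * is the mount point of the filesystem holding `file`.
 * Returns the snprintf() result (>= size means truncated), or -1. */
static int fork_path(const char *fs_root, const char *file,
                     const char *fork_name, char *buf, size_t size)
{
    struct stat st;
    if (stat(file, &st) != 0)
        return -1;
    /* Avoid a doubled slash when the mount point is "/" itself. */
    const char *root = strcmp(fs_root, "/") == 0 ? "" : fs_root;
    return snprintf(buf, size, "%s/..forx/%llu/%s", root,
                    (unsigned long long)st.st_ino, fork_name);
}
```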

A special mount option to ignore the magic would make the filesystem wholly copyable/clonable using the usual tools, too.

However, whatever the exact approach, there's the little issue of stuff like "du" reporting underestimated sizes, the bigger issue of teaching any kind of "cp" like command to cope with the forks (including changes to archiver file formats....) - and all of that would only just simply work out for local filesystems, not NFS + friends.

Extended attributes

Posted Jan 5, 2019 9:18 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> Instead of xattrs, these /..forx/inum things could even just be directories by themselves, with each (named or whatever) fork getting a regular inode inside. Which would even allow for nested forks, a specific fork being a symlink, device node, have special ownership and permissions, times, .....

There were several attempts to make "file-as-directory" thingies (I remember one in Reiser4). Whatever happened to them?

Extended attributes

Posted Jan 5, 2019 22:21 UTC (Sat) by neilbrown (subscriber, #359) [Link] (5 responses)

> There were several attempts to make "file-as-directory" thingies (I remember one in Reiser4). Whatever happened to them?

They all hit the cold hard wall of practicality. You cannot create semantics that do what you want.

If you want a file to start acting like a directory, it has to stop acting like a file. One way to achieve this is to "-o loop" mount it somewhere else with an appropriate filesystem driver. You could come up with other approaches, such as an overlay filesystem which presents select files as directories.

Extended attributes

Posted Jan 7, 2019 2:48 UTC (Mon) by marcH (subscriber, #57642) [Link] (4 responses)

> If you want a file to start acting like a directory, it has to stop acting like a file.

Mac OS X plays tricks like this but I guess they're just directories from its filesystem perspective, all magic in userspace?

https://www.oreilly.com/library/view/mac-os-x/0596004605/...

Resource Forks, Bundles, etc.

Posted Jan 7, 2019 19:32 UTC (Mon) by jccleaver (guest, #127418) [Link] (3 responses)

Well, yes and no.

HFS (and HFS+, and MFS) all have a native understanding of dual-forked files. It has been part of the Mac OS world ever since the original 128k Mac in 1984. The problem is that the rest of the computing world had mostly adopted the Unix model of "a file is a data stream and that's it". This led to years of problems transferring Mac files around the internet via FTP, and to the creation and adoption of formats like MacBinary and BinHex. For MIME purposes, AppleSingle and AppleDouble were used. From RFC 1740 (https://www.ietf.org/rfc/rfc1740.txt):

> Files on the Macintosh consists of two parts, called forks:
>
> Data fork: The actual data included in the file. The Data
> fork is typically the only meaningful part of a
> Macintosh file on a non-Macintosh computer system.
> For example, if a Macintosh user wants to send a
> file of data to a user on an IBM-PC, she would only
> send the Data fork.
>
> Resource fork: Contains a collection of arbitrary attribute/value
> pairs, including program segments, icon bitmaps,
> and parametric values.
>
> Additional information regarding Macintosh files is stored by the
> Finder in a hidden file, called the "Desktop Database".
>
> Because of the complications in storing different parts of a
> Macintosh file in a non-Macintosh filesystem that only handles
> consecutive data in one part, it is common to convert the Macintosh
> file into some other format before transferring it over the network.
>
> The two styles of use are [APPL90]:
>
> AppleSingle: Apple's standard format for encoding Macintosh files
> as one byte stream.
> AppleDouble: Similar to AppleSingle except that the Data fork is
> separated from the Macintosh-specific parts by the
> AppleDouble encoding.
>
> AppleDouble is the preferred format for a Macintosh file that is to
> be included in an Internet mail message, because it provides
> recipients with Macintosh computers the entire document, including
> Icons and other Macintosh specific information, while other users
> easily can extract the Data fork (the actual data) as it is separated
> from the AppleDouble encoding.
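The AppleSingle/AppleDouble header that RFC 1740 relies on is essentially a small table of contents. The sketch below assumes the version 2 layout: a magic number, a version, 16 filler bytes, a 2-byte entry count, then 12-byte entries of ID, offset and length, all big-endian. It also assumes entry ID 1 is the data fork and 2 is the resource fork. `find_entry()` is a hypothetical helper and does no further validation.

```c
#include <stddef.h>
#include <stdint.h>

#define APPLESINGLE_MAGIC   0x00051600u
#define APPLEDOUBLE_MAGIC   0x00051607u
#define ENTRY_DATA_FORK     1u
#define ENTRY_RESOURCE_FORK 2u

static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Look up entry `id` in an AppleSingle/AppleDouble header.
 * Layout: magic(4) version(4) filler(16) count(2), then `count` entries
 * of id(4) offset(4) length(4). Returns 0 and fills offset/length on
 * success, or -1 if the entry is missing or the header is malformed. */
static int find_entry(const unsigned char *buf, size_t len, uint32_t id,
                      uint32_t *offset, uint32_t *length)
{
    if (len < 26)
        return -1;
    uint32_t magic = be32(buf);
    if (magic != APPLESINGLE_MAGIC && magic != APPLEDOUBLE_MAGIC)
        return -1;
    unsigned count = (unsigned)buf[24] << 8 | buf[25];
    for (unsigned i = 0; i < count; i++) {
        size_t pos = 26 + (size_t)i * 12;
        if (pos + 12 > len)
            return -1;
        if (be32(buf + pos) == id) {
            *offset = be32(buf + pos + 4);
            *length = be32(buf + pos + 8);
            return 0;
        }
    }
    return -1;
}
```

In an AppleDouble header, the data fork entry is normally absent because, as the RFC says, the data fork travels separately.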

It was not uncommon for some files to consist entirely of a resource fork, with an empty data fork.

In fact, before the PowerPC era and the general rise in complexity of shared libraries and apps, it was not uncommon for System 6-era applications to consist of a single file, which could be located wherever you wanted. (Imagine if, on the *nix side, all the gettext files, support directories, and other miscellaneous crap that even console programs strew about the filesystem were in a single file.)

Eventually, things got more and more complicated (especially once shared libraries became A Thing on the Mac). Office 6 was a notable monstrosity of filesystem complexity, which made Office 98 a thing of beauty -- you could simply drag the entire Office folder over and that was it.

Note: All of the following is spelled out in an Apple technote: https://developer.apple.com/library/archive/technotes/tn/...

By the Mac OS 9 timeframe, a solution was wanted. What ended up being used was a "bundle" bit being set in the fs metadata on certain folders to make them look like files. This hid most of the unnecessary complexity from the user and let them drag things around and manipulate the application as a single entity. If the app didn't need to install something into the System Folder, this meant you could once again think of the app as a single item and treat it accordingly.

It should be pointed out, though, that resource forks were still being used here. The bundle-as-single-icon was more for associating multiple *files*, all of which could have both resource and data forks (and other named forks, but this was rarely used). That said, PPC code was now located in the data fork instead of the resource fork, so it was loaded as more or less an unstructured blob rather than as structured CODE blocks handled by the Resource Manager.

When Mac OS became Mac OS X, the de-emphasis of resource forks continued. UFS didn't know what to do with them; more and more file transfer was happening over the internet, which meant lots of inefficient BinHex encoding; and command-line tools had barely any concept of the closest parallel, Windows NTFS "streams", which were barely used as well. The solution eventually developed was to treat Resource Forks on filesystems that didn't support them the same way as had been done on FS.

The NeXT-ization of Apple had many benefits, but IMO this was a step backwards. Rich metadata like resources provided something few other OSes had, yet actually made things *simpler* on disk, as long as the tools and apps knew what to expect. Sadly, Apple didn't even patch the low-level BSD cp/mv commands to understand multiple-fork files until 10.4, so it's clear which direction things were going. Thus we ended up in a world where Bundles and Packages exist and native multiple-fork files are rare. See https://developer.apple.com/library/archive/documentation... for details on how it looks from an OS X perspective.

> Mac OS X plays tricks like this but I guess they're just directories from its filesystem perspective, all magic in userspace?

TL;DR: Folders that appear as a single file, but can be drilled down into in some places (including the Finder).

Hope that helped.

Resource Forks, Bundles, etc.

Posted Jan 8, 2019 3:13 UTC (Tue) by ghane (guest, #1805) [Link] (1 responses)

Ah ...

So AppImage as done by Apple many years earlier? :-)

--
Sanjeev

Resource Forks, Bundles, etc.

Posted Jan 8, 2019 8:38 UTC (Tue) by jccleaver (guest, #127418) [Link]

> So AppImage as done by Apple many years earlier? :-)

Well, more or less. Apple obviously didn't have the true chaos of distros and *nix variants to deal with, just the (brilliant) 68k->PPC transition and the (very well done) PPC->Intel transition, so packaging was never *truly* horrible on the Mac compared to pretty much any other system (especially Windows). OS X application bundles handled multiple architectures with universal binaries almost as a trivial afterthought compared to all the other stuff that now existed as separate (hidden) files.

Having complex (e.g., multi-stream, "Resource Manager"-accessed) files, but far fewer of them, made for a much more grokable operating system than what others have had to deal with. The classic Mac system software had no command line, and while graphical Linux environments try to get by without one, *nix systems are still dealing with thousands or tens of thousands of files on a fresh install. That's a lot of complexity to try to paper back over. Even in the worst "7.5.3 Update 2" era of Mac OS complexity, you were still only dealing with a couple of hundred files on a new install, max. And if it weren't for System Enablers (basically hardware support files for each released Apple computer family model), it would have been far fewer.

The Mac OS -> OS X transition was rued by many a classic Mac fan for the interface changes, but more fundamental was the knowledge that we were going from a fundamentally *simpler* system to a complex one that would have to sort of simulate a simple one. Flatpak, AppImage, and containers generally are all ways to try to get back to that sort of mental simplicity (at the expense of system-management issues such as duplicated libraries or fully static binaries). But there's a place for all kinds of paradigms out there, and smashing them together because devs can't grok *nix can't ever really achieve the best of either world: the fine-tuning of professional system administration of a complex system, or the ease-of-use and only-a-certain-number-of-things-that-could-be-going-wrongness of a system with fewer parts.

Resource Forks, Bundles, etc.

Posted Jan 17, 2019 19:21 UTC (Thu) by kevinkrouse (guest, #86616) [Link]

I grew up with a Macintosh II in the '90s and your post brought me much nostalgia. I fondly recall changing the resource fork of games with ResEdit to swap out the built-in sounds and sprites with my own.

Extended attributes

Posted Jan 8, 2019 15:41 UTC (Tue) by mina86 (guest, #68442) [Link] (3 responses)

If I understand you correctly, you’re suggesting an iopen(int inode, int flags, mode_t mode) syscall. If that’s the case, the problem is that it would allow bypassing filesystem permissions. Namely, it would render the execute bit of a directory useless, since a user would be able to read a world-readable file even if it resides in a directory they have no access to.
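The check an inode-number open would skip is search (execute) permission on every directory along the path. A minimal sketch of that check using classic owner/group/other mode bits follows. `path_searchable()` is hypothetical: it ignores supplementary groups, ACLs, and root's override, and it is not how the kernel's path walk is actually implemented.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Would `uid`/`gid` get through every directory on the way to the
 * absolute `path`? Returns 1 if yes, 0 if some directory lacks the
 * relevant execute bit, -1 on error. */
static int path_searchable(const char *path, uid_t uid, gid_t gid)
{
    char prefix[4096];
    size_t len = strlen(path);
    if (path[0] != '/' || len >= sizeof prefix)
        return -1;
    /* Each '/' ends a directory prefix: "/", "/tmp", "/tmp/dir", ... */
    for (size_t i = 0; i < len; i++) {
        if (path[i] != '/')
            continue;
        size_t n = (i == 0) ? 1 : i;   /* keep "/" for the root itself */
        memcpy(prefix, path, n);
        prefix[n] = '\0';
        struct stat st;
        if (stat(prefix, &st) != 0)
            return -1;
        /* POSIX picks exactly one class: owner, else group, else other. */
        mode_t need = (st.st_uid == uid) ? S_IXUSR
                    : (st.st_gid == gid) ? S_IXGRP : S_IXOTH;
        if (!(st.st_mode & need))
            return 0;
    }
    return 1;
}
```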

Extended attributes

Posted Jan 8, 2019 20:55 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (2 responses)

I'm envisioning something more like openxattrat(int dirfd, const char *path, const char *name, int flags, mode_t mode)—the link to the internal xattr inode would be hidden in the filesystem and you would need at least search access to the file to open the linked xattr inode. User-mode software would never handle the raw inode numbers.

The resulting FD could then be passed as dirfd to openxattrat() (with an empty path) or to flistxattr()/fgetxattr()/fsetxattr() to access the xattrs of the resulting inode, recursively.
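The recursion relies on fd-based calls that already exist. As a minimal sketch, this shows how `flistxattr(2)` reports names: a first call with zero size returns the length, and a second call fills one buffer with the names, each terminated by '\0'. `count_xattrs()` is hypothetical, and `openxattrat()` itself does not exist in Linux; the sketch only covers the part that already works on any fd.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Count the xattr names visible on an open file.
 * Returns the count, or -1 with errno set. */
static ssize_t count_xattrs(int fd)
{
    ssize_t len = flistxattr(fd, NULL, 0);   /* size query only */
    if (len <= 0)
        return len;
    char *list = malloc((size_t)len);
    if (!list)
        return -1;
    len = flistxattr(fd, list, (size_t)len);
    ssize_t count = 0;
    /* Names are packed back to back, each ending in '\0'. */
    for (ssize_t off = 0; off < len; off += (ssize_t)strlen(list + off) + 1)
        count++;
    int saved = errno;
    free(list);
    errno = saved;
    return len < 0 ? -1 : count;
}
```

The count may include attributes the caller never set, such as a `security.*` label on systems that use one.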

Extended attributes

Posted Jan 9, 2019 1:26 UTC (Wed) by foom (subscriber, #14868) [Link] (1 responses)

It sounds like you're reinventing the same design Solaris already created, but with a needlessly different API.

Extended attributes

Posted Jan 10, 2019 4:05 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

I wouldn't say *needlessly* different, since in the Solaris version openat(O_XATTR) can only open attributes for files which are already open, while attropen() lacks a dirfd argument and thus can only use the current working directory for relative paths. My proposed openxattrat() function would basically be attropen() + dirfd. (Perhaps attropenat() would be more fitting?) In general, though, I agree that the concepts are very similar and there is no reason not to adopt the Solaris interface.

Extended attributes

Posted Jan 8, 2019 15:45 UTC (Tue) by mina86 (guest, #68442) [Link] (1 responses)

Changes to unlink(2) would break rm *, though. For example, if I run rm file file.xattr, I’ll get an error deleting the second file, since it was already deleted transparently.
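A minimal simulation of that failure mode. A first `unlink()` of the sidecar file stands in for the hypothetical transparent deletion; `second_unlink_errno()` and the file name are made up. The second `unlink()`, which is what `rm` issues next, fails with `ENOENT`.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create `sidecar`, delete it once (standing in for the transparent
 * deletion), then delete it again as rm would. Returns the errno of the
 * second unlink(), or 0 if it unexpectedly succeeded. */
static int second_unlink_errno(const char *sidecar)
{
    int fd = open(sidecar, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return errno;
    close(fd);
    unlink(sidecar);            /* the hypothetical transparent deletion */
    if (unlink(sidecar) == 0)   /* what rm does next */
        return 0;
    return errno;
}
```

`rm -f` ignores nonexistent files, so only plain `rm` would report the error.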

Extended attributes

Posted Jan 9, 2019 17:16 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

I was imagining that we'd keep the validation blobs in a separate hidden directory where normal user activity (e.g., "rm *") wouldn't find them.