Rethinking fsinfo()

By Jonathan Corbet
August 21, 2020

The proposed fsinfo() system call, which returns extended information about mounted filesystems, was first covered here just over one year ago. The form of fsinfo() has not changed much in that year, but the debate over merging it continues. To some, fsinfo() is needed to efficiently obtain information about filesystems; to others, it is an unnecessary and over-engineered mechanism. Changes will probably be necessary if this feature is ever to make it into the mainline kernel.

Linux has long supported the statfs() system call (usually seen from user space as statvfs()) as a way of obtaining information about mounted filesystems. As has happened so often, though, the designers of statfs() made a list of all the filesystem attributes they thought might be interesting and limited the call to those attributes; there is no way to extend it with new attributes. Filesystem designers, though, have stubbornly refused to stop designing new features in the decades since statfs() was set in stone, so there is now a lot of relevant information that cannot be obtained from statfs(). Such details include mount options, timestamp granularity, associated labels and UUIDs, and whether the filesystem supports features like extended attributes, access-control lists, and case-insensitive lookups.
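To show how fixed the statvfs() interface is, here is a small Python sketch (an illustration only, not part of any proposal) that reads it through os.statvfs(). The result carries a fixed set of numeric fields such as f_frsize, f_blocks, f_bavail, f_flag, and f_namemax. Nothing in it can hold mount options, labels, or timestamp granularity.

```python
import os

def fs_summary(path: str) -> dict:
    """Collect what statvfs() can tell us about the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,                 # fundamental block size
        "total_bytes": st.f_blocks * st.f_frsize,
        "avail_bytes": st.f_bavail * st.f_frsize,  # free space for unprivileged users
        "max_name_len": st.f_namemax,
        "read_only": bool(st.f_flag & os.ST_RDONLY),
    }

# statvfs_result has no field for mount options, labels, UUIDs,
# or timestamp granularity; those simply cannot be asked for.
```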

As it happens, the kernel does make much of that information available now by way of the /proc/mounts virtual file. The problem with /proc/mounts, beyond the fact that some information is still missing, is that it is inefficient to access. Reading the contents of that file requires the kernel to query every mounted filesystem for the relevant information; on systems with a lot of mounted filesystems, that can get expensive. Systems running containerized workloads, in particular, can have vast numbers of mounts — thousands in some cases — so reading /proc/mounts can be painful indeed. For extra fun, the only way to know about newly mounted filesystems with current kernels is to poll /proc/mounts and look for new entries.
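The polling approach can be sketched in Python as follows. This is an illustration only: it compares two snapshots of /proc/self/mounts and reports the lines that appeared or disappeared. proc(5) documents that a file descriptor open on this file reports POLLPRI to poll() when the mount table changes, so a watcher need not sleep blindly. It still has to reread and diff the whole table each time, though, and that cost grows with the number of mounts.

```python
from collections import Counter
from typing import List, Tuple

def mount_changes(old: str, new: str) -> Tuple[List[str], List[str]]:
    """Diff two snapshots of /proc/self/mounts; return (added, removed) lines.

    A Counter (multiset) is used because the same filesystem can be mounted
    more than once at the same point, producing identical lines that a plain
    set would merge.
    """
    before = Counter(old.splitlines())
    after = Counter(new.splitlines())
    added = list((after - before).elements())
    removed = list((before - after).elements())
    return added, removed

def read_mounts(path: str = "/proc/self/mounts") -> str:
    # Every call rereads the whole table: O(number of mounts) per check.
    # surrogateescape keeps non-UTF-8 path bytes instead of raising.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        return f.read()
```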

David Howells proposes to solve the polling problem with a new notification mechanism, but that mechanism, in turn, relies on fsinfo(), the 21st revision of which was posted on August 3. Howells requested that both notifications and fsinfo() be pulled during the 5.9 merge window, but that did not happen. Instead, the request resulted in yet another discussion about whether fsinfo() makes sense in its current form.

fsinfo()

The API for fsinfo() is comprehensive and extensible; there should never be a need for an fsinfo2() to add new attributes in the future. But it is also complex. On the surface, the interface looks like this:

    int fsinfo(int dfd, const char *pathname, const struct fsinfo_params *params,
	       size_t params_size, void *result_buffer, size_t result_buf_size);

Where the params structure is defined as:

    struct fsinfo_params {
	__u64	resolve_flags;	/* RESOLVE_* flags */
	__u32	at_flags;	/* AT_* flags */
	__u32	flags;		/* Flags controlling fsinfo() specifically */
	__u32	request;	/* ID of requested attribute */
	__u32	Nth;		/* Instance of it (some may have multiple) */
	__u32	Mth;		/* Subinstance of Nth instance */
    };
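To make the layout concrete, here is a Python sketch that packs the structure above with the struct module. It assumes x86-64 alignment rules: a C compiler there pads the 28 bytes of fields to 32 so that the struct stays 8-byte aligned. It also assumes native byte order. The request value is a placeholder, because the numeric FSINFO_ATTR_* constants are not reproduced here. fsinfo() is only a proposal, so there is no system-call number to invoke.

```python
import struct

# struct fsinfo_params as shown above: one __u64 then five __u32.
# "=" = native byte order with standard sizes and no automatic alignment;
# "4x" adds the tail padding a C compiler would insert on x86-64
# (an assumption: on i386, __u64 is 4-byte aligned and sizeof would be 28).
FSINFO_PARAMS = struct.Struct("=QIIIII4x")

def pack_fsinfo_params(resolve_flags=0, at_flags=0, flags=0,
                       request=0, nth=0, mth=0) -> bytes:
    """Build the params buffer; len() of the result is what params_size carries."""
    return FSINFO_PARAMS.pack(resolve_flags, at_flags, flags, request, nth, mth)

def unpack_fsinfo_params(buf: bytes) -> dict:
    names = ("resolve_flags", "at_flags", "flags", "request", "Nth", "Mth")
    return dict(zip(names, FSINFO_PARAMS.unpack(buf)))
```

Passing the size alongside the buffer is what would let a later kernel accept a longer structure without breaking older callers.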

There are four different ways to use dfd, pathname, and params->at_flags to specify which filesystem should be queried; see this patch changelog for details. The rest of the params structure describes the actual information request; the results end up in result_buffer.

There are numerous possibilities for params->request, including:

FSINFO_ATTR_STATFS returns more-or-less the same information that would be obtained from statfs().
FSINFO_ATTR_LIMITS returns various limits of the filesystem, including maximum file size, inode number, user ID number, hard links to a file, file-name length, etc. These are returned in an fsinfo_limits structure.
FSINFO_ATTR_TIMESTAMP_INFO yields information about timestamps on files as a set of binary structures; this information includes the maximum values and granularity of timestamps expressed in a unique (to the kernel) mantissa-and-exponent format.
FSINFO_ATTR_MOUNT_POINT generates a string showing where the filesystem is mounted.
FSINFO_ATTR_MOUNT_CHILDREN gives an array of structures identifying the filesystems mounted below the filesystem being queried.
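Decoding the mantissa-and-exponent encoding takes only a line of arithmetic. The Python sketch below assumes the granularity means mantissa * 10**exponent seconds, so one nanosecond is (1, -9) and FAT's two-second resolution is (2, 0). The parameter names are hypothetical stand-ins for the fields in the patch. Using Fraction keeps sub-nanosecond values exact.

```python
from fractions import Fraction

def granularity_seconds(mantissa: int, exponent: int) -> Fraction:
    """Timestamp granularity, assuming the value is mantissa * 10**exponent seconds."""
    return mantissa * Fraction(10) ** exponent

def granularity_ns(mantissa: int, exponent: int) -> Fraction:
    return granularity_seconds(mantissa, exponent) * 10**9
```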

The full list of possible requests is rather longer than the above. Each returns data in a different format, usually a specific binary structure for the information requested. For some attributes, a query might return an arbitrary number of elements; in this case, the Nth and Mth fields in the fsinfo_params structure can be used to identify which should be returned. This patch contains a sample program that exercises a number of fsinfo() features to produce a listing showing the mount topography of the current system.
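A caller that wants every instance of a multi-valued attribute would presumably walk Nth and Mth until no such instance exists. The Python sketch below models that loop. query() is a hypothetical callback standing in for a real fsinfo() call, returning None where the real call would fail. The rule of stopping at the first missing instance is an assumption about how a caller might iterate, not a documented contract.

```python
from typing import Callable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")

def enumerate_instances(
    query: Callable[[int, int], Optional[T]],
) -> Iterator[Tuple[int, int, T]]:
    """Yield (Nth, Mth, value) for every instance the hypothetical query() returns."""
    nth = 0
    while True:
        value = query(nth, 0)
        if value is None:          # no Nth instance at all: we are done
            return
        mth = 0
        while value is not None:   # walk the sub-instances of instance Nth
            yield nth, mth, value
            mth += 1
            value = query(nth, mth)
        nth += 1
```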

Complaints and alternatives

There are a couple of points of resistance to the fsinfo() proposal, starting with whether it is needed at all. Linus Torvalds called it "engineering for its own sake, rather than responding to actual user concerns" and wondered why it was needed now after Linux has done without it for so many years. Torvalds tends to worry about adding system calls that end up being used by nobody, so it is not unusual for him to push for justification for the addition of new interfaces. It didn't take long for potential users to make their needs clear; Steven Whitehouse described it this way:

The overall aim is to solve some issues relating to scaling to large numbers of mount in systemd and autofs, and also to provide a generically useful interface that other tools may use to monitor mounts in due course too. Currently parsing /proc/mounts is the only option, and that tends to be slow and is certainly not atomic.

Karel Zak, maintainer of the util-linux package, described the needs of systems with thousands of mount points. Lennart Poettering provided a long list of attributes he would like to learn about filesystems and why they would be useful. The end result of all this discussion is that the need for some sort of filesystem-information system call is not really in doubt.

The complexity of fsinfo() still gives some developers something to worry about, though; to them, it looks like yet another multiplexer system call that tries to do a large number of things. But it's not entirely clear what an alternative would look like. There was a brief digression in which Torvalds suggested an API where attributes of a file could be opened as if that file were actually a directory; so, for example, opening (with a special flag) foo/max_file_size would allow the reading of the maximum file size supported by the filesystem hosting the plain file foo. This idea strongly resembles the controversial approach to metadata implemented by the reiser4 filesystem back in 2004, though nobody seemed to think it was politic to point that out in the discussion.

What was pointed out was that there are numerous practical difficulties associated with implementing this sort of mechanism. Even precisely defining its semantics turns out to be hard. So this idea was put aside; it will languish until somebody else surely suggests it again several years from now.

That leaves open the question of what a new API for obtaining filesystem information should look like. Torvalds called fsinfo() "confusing and over-engineered" and asked: "Can we just make a simple extended statfs() and be done with it, instead of this hugely complex thing that does five different things with the same interface and makes it really odd as a result?" He further suggested that a number of the binary structures used by fsinfo() could be replaced by ASCII data. He pointed out that a number of filesystem interfaces use ASCII for the more complex attributes already and expressed hope that a kernel interface exporting information in ASCII would make life easier for code that is parsing that information out of /proc/mounts now.

So the end result of this discussion is likely to be an attempt to redesign fsinfo() along those lines. There is a problem here, though: the information needed is, like the systems it is representing, inherently complex. By the time a statfs()-like API that can represent all of this information and which can be extended in the future is designed, chances are that this design will start to look a lot like what fsinfo() is now. Replacing a few binary structures with ASCII seems unlikely to change the picture significantly. The end result of this whole exercise may be something that strongly resembles the current design.


Posted Aug 21, 2020 17:55 UTC (Fri) by zyga (subscriber, #81533)

This is not directly related to fsinfo, but it is related to parsing ASCII. The problem is not parsing one specific file; it's that there is no consistency anywhere in the kernel in how structured data is represented. Having written and maintained a parser for /proc/PID/mountinfo for several years, I found:

- A few bugs, over time, in rather small and well-tested code.
- An edge case that affects extremely common and battle-tested implementations (mount/systemd).
- I had to venture into the belly of the kernel to understand the precise implementation and escaping rules.

Perhaps it would be worth recognizing that the format is neither simple nor common. Space-escaping rules differ from file to file. There is no libkernel with reference parsers; everyone rolls their own.

Could the kernel, just maybe, adopt something (anything) standard and simple, so that parsing the next "simple" text file is done with a library function available in all modern programming languages? Could we just use JSON or something of the kind?

If the counter argument is that parsing JSON is hard, I will only say that there are a few high-quality implementations, including the one that everyone reading this comment is using at the time. I don't think we need to invent a new format for fsinfo.
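To illustrate the escaping problem: /proc/PID/mountinfo is space-separated, so the kernel writes spaces, tabs, newlines, and backslashes inside its fields as three-digit octal escapes (a space becomes \040). The number of optional fields before the lone "-" separator also varies. Below is a minimal Python sketch of a parser that follows the field layout documented in proc(5). It keeps paths as bytes because they need not be valid UTF-8.

```python
import re
from typing import List, NamedTuple

# Inside these fields the kernel writes ' ', '\t', '\n' and '\\' as \ooo octal.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")

def unescape(field: bytes) -> bytes:
    """Undo octal escapes, e.g. b'/mnt/my\\040disk' -> b'/mnt/my disk'."""
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), field)

class MountInfo(NamedTuple):
    mount_id: int
    parent_id: int
    major_minor: str
    root: bytes
    mount_point: bytes
    mount_options: bytes
    optional_fields: List[bytes]   # e.g. [b"shared:1"]; may be empty
    fs_type: bytes
    source: bytes
    super_options: bytes

def parse_mountinfo_line(line: bytes) -> MountInfo:
    fields = line.rstrip(b"\n").split(b" ")
    # Six fixed fields, then zero or more optional fields ended by a lone "-".
    sep = fields.index(b"-", 6)          # ValueError if the separator is missing
    head, tail = fields[:sep], fields[sep + 1:]
    if len(tail) != 3:
        raise ValueError("malformed mountinfo line: %r" % (line,))
    return MountInfo(
        mount_id=int(head[0]),
        parent_id=int(head[1]),
        major_minor=head[2].decode("ascii"),
        root=unescape(head[3]),
        mount_point=unescape(head[4]),
        mount_options=head[5],
        optional_fields=head[6:],
        fs_type=unescape(tail[0]),
        source=unescape(tail[1]),
        super_options=tail[2],
    )
```

To read the whole table, call parse_mountinfo_line() on each line of a file opened with open("/proc/self/mountinfo", "rb").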

Posted Aug 21, 2020 18:09 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

I often wish the kernel would use something like protobufs for syscalls. It would save a lot of issues with marshalling/unmarshalling.

Posted Aug 22, 2020 12:57 UTC (Sat) by ibukanov (subscriber, #3942)

This still adds friction, as suddenly one needs to account for generated headers in the build system and add an extra dependency. I just wish the Linux kernel had made more use of the old trick of passing system-call arguments in a struct, along with the struct's size as version information. Then new fields can be added as necessary, and the code will be both forward- and backward-compatible.
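The kernel already has a helper for this convention, copy_struct_from_user(), which is used by clone3() and openat2(). The Python function below only models its rules and is not the kernel code. A shorter (older) structure is zero-extended. A longer (newer) one is accepted only if every byte the kernel does not understand is zero; otherwise the call fails with E2BIG.

```python
import errno

def copy_struct_from_user(ksize: int, user_buf: bytes) -> bytes:
    """Model of size-as-version struct copying; returns exactly ksize bytes."""
    usize = len(user_buf)
    if usize <= ksize:
        # Older caller: fields it doesn't know about are treated as zero.
        return user_buf + bytes(ksize - usize)
    # Newer caller: tolerate extra fields only if they are all zero.
    if any(user_buf[ksize:]):
        raise OSError(errno.E2BIG, "unknown non-zero trailing fields")
    return user_buf[:ksize]
```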

Posted Aug 22, 2020 15:31 UTC (Sat) by dezgeg (subscriber, #92243)

Reusing the tracepoint infrastructure where every event can be either read as a binary struct (with the struct layout described by some sysfs file) or as a formatted string would be a nice solution, I think.
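For reference, each tracepoint's format file in tracefs (events/<system>/<event>/format) describes its binary record with lines such as "field:int common_pid; offset:4; size:4; signed:1;". Below is a rough Python sketch of the decoding side of that idea. It assumes the record was produced on the same machine (native byte order) and returns array fields as raw bytes.

```python
import re
import sys
from typing import Dict, List, NamedTuple, Union

class Field(NamedTuple):
    name: str
    offset: int
    size: int
    signed: bool
    is_array: bool

_FIELD_RE = re.compile(
    r"field:([^;]+);\s*offset:(\d+);\s*size:(\d+);\s*signed:(\d+);")

def parse_format(text: str) -> List[Field]:
    """Extract field descriptions from a tracepoint 'format' file."""
    fields = []
    for decl, offset, size, signed in _FIELD_RE.findall(text):
        last = decl.split()[-1]              # e.g. "common_pid" or "comm[16]"
        fields.append(Field(last.split("[")[0], int(offset), int(size),
                            signed == "1", "[" in last))
    return fields

def decode_record(fields: List[Field], record: bytes) -> Dict[str, Union[int, bytes]]:
    """Decode one binary record using the parsed layout."""
    values: Dict[str, Union[int, bytes]] = {}
    for f in fields:
        raw = record[f.offset:f.offset + f.size]
        if f.is_array:
            values[f.name] = raw                       # e.g. char comm[16]
        else:
            values[f.name] = int.from_bytes(raw, sys.byteorder, signed=f.signed)
    return values
```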

Posted Aug 24, 2020 15:45 UTC (Mon) by SEJeff (guest, #51588)

The double copy required to read protobufs would probably be terrible for user/kernel interaction. FlatBuffers, with its zero-copy deserialization, would be a better fit, even if it isn't quite as well known.

Posted Aug 24, 2020 18:29 UTC (Mon) by josh (subscriber, #17465)

"double copy"? I understand why it might require one copy (to deserialize to an in-memory format), but why two?

Posted Aug 24, 2020 19:07 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

I would guess it's to avoid racing with user space while unmarshalling the data? Still, it might be possible to create a race-safe parser for protobufs; they are not complicated.

Posted Aug 24, 2020 20:15 UTC (Mon) by excors (subscriber, #95769)

I think Nanopb (a protobuf implementation for embedded systems) is already race-safe, since the decoder just reads linearly from an input stream, so that does seem possible.

When decoding strings or byte arrays, it can read directly from the stream into the decoded message struct (if the field is configured with a fixed max size) or into a malloced buffer (if configured with variable size) or can pass a substream object representing the value into a callback function. Using the callback interface would let the kernel copy directly from the userspace input buffer into the appropriate internal kernel struct.
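A bare-bones decoder for the protobuf wire format makes the "reads linearly" property easy to see. The Python sketch below is not Nanopb's API and handles only varint (wire type 0) and length-delimited (wire type 2) fields. It does, however, visit each input byte exactly once, front to back, which is the property that makes single-pass decoding from a shared buffer race-safe.

```python
from typing import Iterator, Tuple, Union

def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift       # low 7 bits, least significant first
        if not byte & 0x80:                   # high bit clear: last byte
            return value, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")

def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value), reading each byte once, in order."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                  # length-delimited: strings, bytes, messages
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated field")
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        yield field_number, wire_type, value
```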

It looks like FlatBuffers can't do that, because verification (to avoid out-of-bounds reads etc) is a separate operation from reading fields. You'd have to memcpy the whole buffer from userspace to kernel memory before verifying and then copying strings again into kernel structs.

For kernel-to-userspace messages, you don't need to worry about race conditions and you probably don't need the verification step (since you have to trust the kernel anyway), so FlatBuffers could work better there.

For userspace-to-kernel in both protocols, if you really don't want to force the user to pack all their data into a single buffer, you could always encode userspace pointers as integers (like a "fixed64" in protobuf) to point to raw data or encoded messages at other addresses, and the kernel can traverse those pointers manually like it does today. You'd still get the benefit of automatic marshalling for the majority of structs and fields.

Posted Aug 27, 2020 16:50 UTC (Thu) by esemwy (guest, #83963)

Anybody remember VMS? DEC solved this by passing arguments by *descriptor*, which was a self-describing array of parameters. Given how often the syscall vs kernel feature mismatch comes up, it seems it wasn’t such a weird idea after all.

http://h30266.www3.hpe.com/odl/axpos/opsys/vmsos84/5841/5...

Posted Aug 21, 2020 22:59 UTC (Fri) by chris_se (subscriber, #99706)

A couple of years or so ago I had to write some portable code (Windows, macOS, Linux) to figure out the number of physical CPU cores in the system (in order to provide the user with a sane default for the number of processing threads for a workload that doesn't jibe well with hyperthreading/SMT). This was near-trivial on macOS/Windows, but a plain awful experience on Linux.

macOS: 1 direct system call, 5 LOC with error handling
Windows: 2 low-level API calls (that probably translate directly into system calls? not sure) + some struct processing in a loop, 12 LOC with error handling

Linux: I had to parse /sys/devices/system/cpu/cpuX/topology/thread_siblings_list, with X starting at 0, as well as check /sys/devices/system/cpu/cpuX/online to see whether the core was actually active. I wrote my first implementation on an Intel system, where the siblings lists of a 2-core system with hyperthreading were {"0,2", "1,3", "0,2", "1,3"}, because Intel puts all of their hyperthreaded cores _after_ all of the physical cores. I tried that a year later on an AMD system with SMT, and because AMD groups its cores differently, the contents were {"0-1", "0-1", "2-3", "2-3"}. My code couldn't interpret that because it didn't consider that the cores could be specified as ranges, so I had to go back and change it. Sure, if you read the documentation _really_ carefully there is an indication of how to parse the format, so yeah, that was my fault. But in the end, my (now correct) implementation is > 20 LOC just for parsing the thread_siblings_list file, not including the logic to obtain the result I actually want. A simple system call to obtain information about the CPUs in the current system as some kind of struct would have made my life a _lot_ easier in that case.
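For the record, the kernel's CPU-list format is a comma-separated mix of single CPU numbers and inclusive ranges, which covers both the Intel and the AMD output quoted above. Below is a short Python sketch. It assumes the caller has already read the thread_siblings_list file of each online CPU; skipping offline CPUs is left to the caller. Each distinct set of siblings corresponds to one physical core.

```python
from typing import Iterable, Set

def parse_cpulist(text: str) -> Set[int]:
    """Parse a CPU list such as "0,2", "0-1" or "0-3,8-11" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))   # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus

def count_physical_cores(sibling_lists: Iterable[str]) -> int:
    """Each distinct thread_siblings_list value corresponds to one physical core."""
    return len({frozenset(parse_cpulist(s)) for s in sibling_lists})
```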

I have to do a lot of low-level OS programming on various OS on a daily basis -- and in general I like the Linux APIs a lot better than the APIs of most other operating systems -- but as soon as I have to parse or generate ASCII, I start to hate it with a vengeance. (Thankfully this is not the case for most things I have to do.) I consider the proposed fsinfo() system call interface to be VASTLY superior to any ASCII-based interface. The underlying user-space code would be a LOT less error-prone for me. For me as a user-space developer, I could see myself using the proposed fsinfo() call (with the structs) in the future to obtain some information about the filesystem, but unless I desperately needed a piece of information for some reason I would completely avoid this system call if it required me to parse an ASCII string. (Though, to be fair, I'm definitely not one of the people who is the main target of this functionality at all; for my software this currently falls more into the "nice to have" category.)

Posted Aug 22, 2020 15:03 UTC (Sat) by vadim (subscriber, #35271)

I concur with the desire for JSON or something similar.

The worst thing about the text formats is that they are brittle. They're prone to failure when some unexpected character sneaks in, to ad-hoc implementations that are based on a programmer looking at 'cat /proc/whatever' and writing whatever comes to mind ("oh, so this is a list with one entry per line, with elements separated by spaces") and to being inflexible for expansion (can you add anything to /proc/partitions and not break a lot of stuff?)

The second worst is that different philosophies are being followed.

/proc/swaps might as well be the output of a command like 'df'

/proc/meminfo is half-written for human consumption, with amounts in kB and aligned columns, and half written for machine consumption with obscure labels like "Committed_AS".

/proc/mounts is very machine oriented.

Some files explicitly tell you the units (and it can be unclear whether another unit could ever be used). Some files have column names. Some, like /proc/ioports, have meaningful indentation.

Many are uncomfortable for machine parsing. For example, in /proc/cpuinfo you get "address sizes : 39 bits physical, 48 bits virtual"; you have to parse that comma, and it's unclear whether a third item could ever appear there. Many files lack unique identifiers; for example, my /proc/ioports has 3 levels of "0000-0000 : pnp 00:00", one under another.
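As an example of the implicit knowledge these files demand, a /proc/meminfo parser has to know that "kB" there means 1024 bytes and that some lines, such as HugePages_Total, are bare counts with no unit. Here is a Python sketch:

```python
from typing import Dict

# In /proc/meminfo, "kB" means 1024 bytes; some lines carry no unit at all.
_UNIT_BYTES = {"kB": 1024}

def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo name to an int: bytes for sized lines, else a count."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, rest = line.partition(":")
        parts = rest.split()
        value = int(parts[0])
        if len(parts) > 1:
            value *= _UNIT_BYTES[parts[1]]   # KeyError on an unknown unit, deliberately
        info[name.strip()] = value
    return info
```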

I'm amazed that, although the computing industry is pretty old by now and we have plentiful RAM, storage, and CPU power, the issues of escaping data, representing arrays and trees, and allowing extra information to be added without breaking existing software are still with us, despite there being things like JSON that solve the vast majority of them.

Posted Aug 22, 2020 15:26 UTC (Sat) by dezgeg (subscriber, #92243)

Sadly JSON (and many other text formats these days) assume Unicode strings though, while for example mountinfo may contain non-Unicode data like pathnames.

Posted Aug 22, 2020 16:36 UTC (Sat) by zyga (subscriber, #81533)

Well, is that a reason for all the ad-hoc formats? We could pass paths as byte arrays. In reality, most software will have issues with non-UTF8-friendly things anyway, because they may want to display it. It's nice that open(2) does not complain but it's pretty rubbish if no application can ever display that thing without "here are some bytes".

Posted Aug 22, 2020 18:50 UTC (Sat) by NYKevin (subscriber, #129325)

Realistically, if you want to pass raw bytes through a text-formatted thing, you should be using Base64 or something similar. This of course means that paths become unreadable to humans without decoding, but you could have a flag indicating whether a path has been escaped, and then only escape things that aren't valid UTF-8. Alternatively, you could encode the "bad bytes" with \\x00 through \\xFF, which is valid in a JSON string (the backslash is escaped, so it's "just" a backslash followed by three letters), but could be confused with a real filename (so you would need to invent further escaping for that case, as described in https://xkcd.com/1638/).
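The flag-based variant might look like the Python sketch below. Paths that decode as UTF-8 are emitted as ordinary JSON strings. Anything else is base64-encoded under a different key, which acts as the "escaped" flag. The key names path and path_b64 are made up for illustration.

```python
import base64
import json
from typing import List

def path_to_obj(path: bytes) -> dict:
    try:
        # Python's strict UTF-8 decoder also rejects encoded surrogates,
        # so every plain string emitted here is interoperable Unicode.
        return {"path": path.decode("utf-8")}
    except UnicodeDecodeError:
        return {"path_b64": base64.b64encode(path).decode("ascii")}

def path_from_obj(obj: dict) -> bytes:
    if "path_b64" in obj:
        return base64.b64decode(obj["path_b64"])
    return obj["path"].encode("utf-8")

def dumps_paths(paths: List[bytes]) -> str:
    return json.dumps([path_to_obj(p) for p in paths])

def loads_paths(text: str) -> List[bytes]:
    return [path_from_obj(o) for o in json.loads(text)]
```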

Or, if you think letting people create files with ridiculous names was a bad idea to begin with, you could simply declare non-UTF-8 paths unsupported and spit out invalid JSON if the user tries to create one. Much userspace software already does something like that anyway (see for example Python 3's surrogateescape hack). But then a lot of parsers will work just fine the vast majority of the time, and break on an obscure condition that the average engineer may not even realize is possible. So that's probably not ideal...

Posted Aug 23, 2020 15:22 UTC (Sun) by abo (subscriber, #77288)

Surrogate escapes can be used to encode arbitrary bytes in JSON:

(python)

>>> b = bytes(range(256))
>>> u = b.decode("UTF-8", errors="surrogateescape")
>>> import json
>>> j = json.dumps(u)
>>> uin = json.loads(j)
>>> bin = uin.encode("UTF-8", errors="surrogateescape")
>>> [n for n in bin]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255]
>>> bin == b
True

The JSON looks like this:

"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f\udc80\udc81\udc82\udc83\udc84\udc85\udc86\udc87\udc88\udc89\udc8a\udc8b\udc8c\udc8d\udc8e\udc8f\udc90\udc91\udc92\udc93\udc94\udc95\udc96\udc97\udc98\udc99\udc9a\udc9b\udc9c\udc9d\udc9e\udc9f\udca0\udca1\udca2\udca3\udca4\udca5\udca6\udca7\udca8\udca9\udcaa\udcab\udcac\udcad\udcae\udcaf\udcb0\udcb1\udcb2\udcb3\udcb4\udcb5\udcb6\udcb7\udcb8\udcb9\udcba\udcbb\udcbc\udcbd\udcbe\udcbf\udcc0\udcc1\udcc2\udcc3\udcc4\udcc5\udcc6\udcc7\udcc8\udcc9\udcca\udccb\udccc\udccd\udcce\udccf\udcd0\udcd1\udcd2\udcd3\udcd4\udcd5\udcd6\udcd7\udcd8\udcd9\udcda\udcdb\udcdc\udcdd\udcde\udcdf\udce0\udce1\udce2\udce3\udce4\udce5\udce6\udce7\udce8\udce9\udcea\udceb\udcec\udced\udcee\udcef\udcf0\udcf1\udcf2\udcf3\udcf4\udcf5\udcf6\udcf7\udcf8\udcf9\udcfa\udcfb\udcfc\udcfd\udcfe\udcff"

Posted Aug 23, 2020 18:05 UTC (Sun) by excors (subscriber, #95769)

That goes against the interoperability recommendations of the JSON RFC, which says (https://tools.ietf.org/html/rfc8259#section-8.2):

> the ABNF in this specification allows member names and string values to contain bit sequences that cannot encode Unicode characters; for example, "\uDEAD" (a single unpaired UTF-16 surrogate). Instances of this have been observed, for example, when a library truncates a UTF-16 string without checking whether the truncation split a surrogate pair. The behavior of software that receives JSON texts containing such values is unpredictable; for example, implementations might return different values for the length of a string value or even suffer fatal runtime exceptions.

so it seems a bad idea to rely on unpaired surrogates (like surrogateescape) if you're choosing JSON specifically for its interoperability.

Surely the simplest way to encode Linux's 8-bit paths in JSON is to map the bytes 0x00..0xFF onto U+0000..U+00FF and then proceed as normal. When decoding, treat any element >=U+0100 as a syntax error. That should be interoperable between all JSON implementations, and very easy to handle in both Unicode-aware and -unaware applications.

If the application wants to display the path to a user, do a potentially-lossy UTF-8 decode in the UI layer, which is about the best you can ever do with Linux paths regardless of how they're encoded for transport. For all non-display-related processing of paths (which I think is more common and more important than displaying paths), keep them in the simple lossless U+0000..U+00FF representation.
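The byte-to-code-point mapping described here is exactly what the Latin-1 codec does, so in Python the whole scheme fits in a few lines. This sketches the proposal in the comment; it is not an established convention. A reader who skips the final step will display a UTF-8 name such as "é" as "Ã©".

```python
import json

def latin1_path_to_json(path: bytes) -> str:
    # Latin-1 maps byte 0xNN to code point U+00NN, so every byte survives.
    return json.dumps(path.decode("latin-1"))

def latin1_path_from_json(text: str) -> bytes:
    s = json.loads(text)
    try:
        # Only U+0000..U+00FF can map back to a single byte.
        return s.encode("latin-1")
    except UnicodeEncodeError:
        raise ValueError("code point above U+00FF in encoded path") from None
```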

(Windows' 16-bit paths are more complicated, if you want to handle them pedantically correctly: they can contain unpaired surrogates so you can't simply interpret them as JSON-compatible Unicode strings. In that case it's probably safer to treat them as binary data and encode with base64.)

Posted Aug 23, 2020 19:08 UTC (Sun) by NYKevin (subscriber, #129325)

> Surely the simplest way to encode Linux's 8-bit paths in JSON is to map the bytes 0x00..0xFF onto U+0000..U+00FF and then proceed as normal. When decoding, treat any element >=U+0100 as a syntax error. That should be interoperable between all JSON implementations, and very easy to handle in both Unicode-aware and -unaware applications.

So, basically, pretend we have LC_ALL="[whatever].ISO-8859-1" at both ends, and then require userspace to clean up the mess if LC_ALL is actually set to a different value (which, on modern systems, is typically the case). The problem, of course, is that if you ever try to decode that JSON with a naive implementation, you will get mojibake since they will skip the "clean up the mess" step. So you still need non-naive implementations, which makes me wonder, why bother with JSON in the first place?

> If the application wants to display the path to a user, do a potentially-lossy UTF-8 decode in the UI layer, which is about the best you can ever do with Linux paths regardless of how they're encoded for transport.

Strictly, you should be consulting the locale information rather than just assuming UTF-8. UTF-8 is the most common encoding, but its use in pathnames is not required by any standard that I'm aware of. Now, you can't use something too weird such as UTF-16 (null bytes not allowed), but legacy 8-bit encodings are very much legal and valid on some older systems.

Posted Aug 23, 2020 22:44 UTC (Sun) by shemminger (subscriber, #5739)

Text interfaces to userspace are brittle (easily broken) and suck. If you look at some of the interface in /proc/net there are columns filled with zeros because some field existed in 2.2 and can never change.

Message-based interfaces like netlink are slightly more difficult to program but offer room for expansion.
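Netlink's extensibility comes from its attribute encoding. Each attribute is a small header (struct nlattr: a 16-bit length and a 16-bit type) followed by its payload, padded to a 4-byte boundary, so a parser can skip any type it does not recognize. Here is an illustrative Python sketch of that walk; it is not a replacement for a real netlink library.

```python
import struct
from typing import Iterator, Tuple

NLA_ALIGNTO = 4
NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
NLA_TYPE_MASK = ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER) & 0xFFFF
NLA_HDR = struct.Struct("=HH")  # struct nlattr { __u16 nla_len; __u16 nla_type; }

def nla_align(n: int) -> int:
    return (n + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)

def build_attr(nla_type: int, payload: bytes) -> bytes:
    """Encode one attribute: header, payload, then zero padding to 4 bytes."""
    nla_len = NLA_HDR.size + len(payload)   # the length excludes the padding
    return NLA_HDR.pack(nla_len, nla_type) + payload + bytes(nla_align(nla_len) - nla_len)

def iter_attrs(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, payload) pairs; callers simply ignore types they don't know."""
    pos = 0
    while pos + NLA_HDR.size <= len(buf):
        nla_len, nla_type = NLA_HDR.unpack_from(buf, pos)
        if nla_len < NLA_HDR.size or pos + nla_len > len(buf):
            raise ValueError("malformed attribute at offset %d" % pos)
        yield nla_type & NLA_TYPE_MASK, buf[pos + NLA_HDR.size:pos + nla_len]
        pos += nla_align(nla_len)
```

An old parser that only understands types 1 and 2 keeps working when a sender adds an unknown type 99; it just never looks at that payload.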

Posted Aug 24, 2020 4:51 UTC (Mon) by mathstuf (subscriber, #69389)

> In that case it's probably safer to treat them as binary data and encode with base64

What endianness do you assume for the incoming 16-bit data? Big? Little? Native? Native is easy, but it means you need to know what the host system is before archiving the raw data. Little is easy, but it can be confusing in raw data viewers (which could render it backwards). A BOM is OK? But you could also start a filename with a BOM and…blah.

Posted Aug 24, 2020 12:30 UTC (Mon) by excors (subscriber, #95769)

Since this is about Windows, and Windows is always little-endian (except on Xbox 360, as far as I can tell), it seems obvious to use little-endian here. Since it's not Unicode (it's just an array of 16-bit values) there's no reason to even think about BOMs. You'd simply take the LPCWSTR path (i.e. const wchar_t*, where sizeof(wchar_t)==2) which is used by the Win32 APIs, then cast to uint8_t* and base64-encode as normal. That seems easy.
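The scheme described here is short in Python as well. This sketch assumes the path arrives as a list of raw 16-bit code units (what a wchar_t buffer holds on Windows), possibly including unpaired surrogates. It fixes little-endian as the wire order. display_name() is the lossy, display-only decode.

```python
import base64
import struct
from typing import List

def _to_le_bytes(units: List[int]) -> bytes:
    # Fixed little-endian wire order, regardless of the host's byte order.
    return struct.pack("<%dH" % len(units), *units)

def wide_path_to_b64(units: List[int]) -> str:
    """Encode raw 16-bit code units (surrogates and all) as base64 text."""
    return base64.b64encode(_to_le_bytes(units)).decode("ascii")

def b64_to_wide_path(text: str) -> List[int]:
    raw = base64.b64decode(text)
    if len(raw) % 2:
        raise ValueError("odd byte count for a UTF-16 code-unit array")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def display_name(units: List[int]) -> str:
    # Lossy, display-only: unpaired surrogates turn into U+FFFD.
    return _to_le_bytes(units).decode("utf-16-le", errors="replace")
```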

Most code that processes the path should treat it as an opaque blob or decode it to wchar_t*, and wouldn't need to care about Unicode or surrogates etc.

When you want to display the path to a user, you'd need to do a lossy UTF-16LE decode to get a real Unicode string to pass into your UI system. (Lossy because the path might contain unpaired surrogates which you can't decode safely). (If you're using the Win32 UI APIs, that decoding will probably happen implicitly inside the API implementation; otherwise you might need to do it in the application). The important thing is to avoid trying to decode into a real Unicode string in any context where the lossiness will cause worse than a cosmetic glitch. (So you shouldn't try to store Windows paths directly in JSON, because interoperable JSON requires real Unicode strings, hence the base64 encoding.)

(Linux is the same except 8-bit instead of 16-bit, and probably UTF-8 (or the user's current locale, as NYKevin mentioned, though of course they might have files created under a different locale and there's no way to be sure what they were meant to be) instead of almost always UTF-16LE, and you can encode arbitrary 8-bit strings as JSON strings much more easily than encoding arbitrary 16-bit strings (where you need base64 etc). On both platforms it's a mistake to think that a path is simply an encoded Unicode string, and that you can decode/encode at the edge and do all your internal processing with Unicode.)

Posted Aug 25, 2020 19:02 UTC (Tue) by mathstuf (subscriber, #69389)

> Most code that processes the path should treat it as an opaque blob or decode it to wchar_t*, and wouldn't need to care about Unicode or surrogates etc.

I agree that just stuffing paths into binary storage is the best solution. However, paths usually need to be displayed, or the storage you're using has a human caring about it at some point in its lifetime, especially if you're using a container format like JSON. It's nice and all, but a way to store arbitrary binary data without having to figure out how to encode it so that it is Unicode-safe would have been much appreciated. (No, BSON doesn't fix this; it just changes the window dressing from `{:"",}` into type-and-length-prefixed fields or type-and-NUL-terminated sequences.) CBOR has binary data, but library support for it is much less widespread.

FWIW, I've spent a lot of time thinking about how to stuff paths into JSON: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p...

Posted Aug 26, 2020 19:43 UTC (Wed) by unilynx (guest, #114305)

I hope for a future where someone introduces a 'sane-names' filesystem mount option, which will forbid the use of invalid UTF8, filenames starting with a dash or space, containing dollar signs, and all the other funny things that make processing filenames hard or dangerous. Spaces in filenames we probably have to live with.

Distributions might slowly make that option the default for new systems, and sysadmins could opt in sooner themselves, unless they really have to deal with those few applications (which will hopefully disappear or become obsolete fast) that really, really want to create weird filenames.
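No such mount option exists. Purely as an illustration of what it would have to enforce, here is a Python sketch of a name check that uses the rules listed above (valid UTF-8, no leading dash or space, no dollar sign), plus a ban on control characters. is_sane_name() is a hypothetical helper, and the rule set is an assumption; spaces elsewhere in a name are still allowed.

```python
import unicodedata

def is_sane_name(name: bytes) -> bool:
    """Hypothetical 'sane-names' check; the rule set is illustrative only."""
    try:
        text = name.decode("utf-8")            # rule: valid UTF-8
    except UnicodeDecodeError:
        return False
    if not text or text[0] in "- ":            # rule: no leading dash or space
        return False
    if "$" in text:                            # rule: no dollar signs
        return False
    # Extra rule: no control characters (category Cc covers \n, \t, DEL, ...).
    return not any(unicodedata.category(ch) == "Cc" for ch in text)
```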

Posted Aug 27, 2020 4:48 UTC (Thu) by neilbrown (subscriber, #359)

Requiring valid UTF-8 is probably sensible for a new filesystem.
Excluding end-of-line characters is probably justifiable too (or any control character; I don't think we need TAB or DEL).
Anything else is parochial.
When I'm choosing a name to save my document from my GUI, why should I care about your inability to write safe shell scripts, or even have any understanding that "the shell" exists?
It is bad enough that I cannot put a '/' in my file names; why would you prevent me from using '$'??

Rethinking fsinfo()

Posted Aug 29, 2020 11:29 UTC (Sat) by flussence (guest, #85566)

We're slowly getting there, the kernel has Unicode normalisation for filesystems at long last. I think we could live without ASCII control chars next, though I don't agree that we should forbid filenames from containing strings that an average person at a regular keyboard could type. Computers are meant to serve people, not the other way around.

Rethinking fsinfo()

Posted Aug 26, 2020 14:09 UTC (Wed) by flussence (guest, #85566)

JSON looks simple until you want to add a human-readable comment (syntax error), or escape some non-ASCII characters in text (only \uXXXX escapes, in UTF-16 code units), or even just emit it without counting and cleaning up trailing commas (syntax error).

Anything that requires more than about half a KB of parsing code in bash builtins is too complicated, IMHO.
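The comment and trailing-comma complaints are easy to check with Python's standard `json` module, which rejects both; the escaping complaint shows up in how a character outside the Basic Multilingual Plane gets escaped as a UTF-16 surrogate pair. The `parses()` helper is just for illustration.

```python
import json


def parses(text: str) -> bool:
    """Return True if Python's standard json module accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Comments are not part of the JSON grammar.
print(parses('{"a": 1 /* note */}'))  # False
# Neither are trailing commas.
print(parses('[1, 2, 3,]'))           # False
print(parses('[1, 2, 3]'))            # True
# \u escapes are UTF-16 code units, so U+1F600 becomes a surrogate pair.
print(json.dumps("\U0001F600"))       # "\ud83d\ude00"
```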

Rethinking fsinfo()

Posted Aug 26, 2020 19:36 UTC (Wed) by unilynx (guest, #114305)

That is what `jq` is for. It turns bash into a very capable automation environment around, e.g., AWS or DigitalOcean, since both ship CLI tools that produce JSON output.

Rethinking fsinfo()

Posted Aug 27, 2020 12:08 UTC (Thu) by zyga (subscriber, #81533)

We are talking about kernel-userspace interfaces, not configuration files. If you need comments, just describe them like any other data; it's not something you will edit by hand. In addition, the current formats don't support comments either, so I don't quite know whether you are confusing this with something else or I'm missing your point.

As for parsing, please show me a correct /proc/self/mountinfo parser in shell. I'll wait. As another poster commented, jq handles that for shell scripts in a single line, correctly and simply. The moment we step away from the custom formats the kernel forces on us, we gain a really rich set of tools for processing data.

And it doesn't have to be JSON. It just shouldn't be an ad-hoc, per-file convention with a custom, brittle parser.
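For comparison, here is a rough sketch of what a correct mountinfo parser involves, in Python rather than shell. It follows the line format documented in the proc(5) man page: a variable number of optional fields terminated by a lone `-`, and octal escapes such as `\040` for spaces in paths. Fields are kept as bytes because paths need not be valid UTF-8. The class and function names are made up for this sketch, and it only undoes the octal escapes; it does not interpret filesystem-specific option syntax.

```python
import re
from dataclasses import dataclass
from typing import List

# Octal escapes the kernel uses for awkward bytes, e.g. \040 for a space.
_OCTAL = re.compile(rb"\\([0-7]{3})")


def unescape(raw: bytes) -> bytes:
    """Undo the kernel's octal escapes (e.g. b'\\040' -> b' ')."""
    return _OCTAL.sub(lambda m: bytes([int(m.group(1), 8)]), raw)


@dataclass
class MountInfo:
    mount_id: int
    parent_id: int
    major: int
    minor: int
    root: bytes
    mount_point: bytes
    mount_options: bytes
    optional_fields: List[bytes]
    fs_type: bytes
    source: bytes
    super_options: bytes


def parse_mountinfo_line(line: bytes) -> MountInfo:
    # Spaces inside fields are escaped, so a plain split is safe.
    fields = line.rstrip(b"\n").split(b" ")
    # Optional fields vary in number; a lone "-" marks their end.
    sep = fields.index(b"-", 6)
    major, minor = fields[2].split(b":")
    return MountInfo(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        major=int(major),
        minor=int(minor),
        root=unescape(fields[3]),
        mount_point=unescape(fields[4]),
        mount_options=fields[5],
        optional_fields=fields[6:sep],
        fs_type=unescape(fields[sep + 1]),
        source=unescape(fields[sep + 2]),
        super_options=unescape(fields[sep + 3]),
    )


def read_mountinfo(path: str = "/proc/self/mountinfo") -> List[MountInfo]:
    with open(path, "rb") as f:
        return [parse_mountinfo_line(line) for line in f]
```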

Rethinking fsinfo()

Posted Aug 27, 2020 13:10 UTC (Thu) by kpfleming (subscriber, #23250)

YAML is a superset of JSON, using whitespace instead of (most) punctuation to indicate structure, and it supports comments.

Rethinking fsinfo()

Posted Aug 28, 2020 13:13 UTC (Fri) by mathstuf (subscriber, #69389)

And what if we could strip away things like type annotations, the umpteen ways of quoting text blocks, and the other incidental syntax you can trip over without a complicated escape generator? Sure, you could call it YAML, but the only safe way to actually generate it from arbitrary data is to treat it like JSON, because no one wants to write escape detectors for all of YAML's arcane syntax features.

Don't forget that some parsers have baked the extension proposals(!) in, so you have to watch out for things like accidentally generating merge keys ("<<").

Rethinking fsinfo()

Posted Aug 28, 2020 12:13 UTC (Fri) by mvdwege (guest, #113583)

Best solution: discard the dogmatic adherence to 'it must be text'. If the primary consumer of the information is not a sysadmin at the console, use binary data and cut out the redundant parsing step.

Yes, 'everything is an ASCII stream' makes things easily readable for humans, and it is great if you have text tools to write ad-hoc parsers for it. The problem is that you keep writing ad-hoc parsers. A little more pragmatism on this old UNIX dogma would be appreciated.

I'm speaking as a sysadmin/developer who ran into this when I wanted to verify whether my local MTA had successfully sent an email (i.e., received a 250 reply). The only way to do that was to parse fscking syslog. In 2018. When mechanisms like D-Bus notifications had already existed for over a decade.
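To illustrate the binary alternative suggested in this comment: with a fixed record layout, the consumer reads fields directly instead of writing yet another text parser. The layout below (four little-endian unsigned 32-bit integers: mount ID, parent ID, device major, device minor) and the helper names are invented for this sketch and do not correspond to fsinfo() or any other real kernel interface.

```python
import struct

# Hypothetical fixed-layout record: mount ID, parent ID, device major,
# device minor, each an unsigned 32-bit little-endian integer.
RECORD = struct.Struct("<IIII")


def pack_record(mount_id: int, parent_id: int, major: int, minor: int) -> bytes:
    """Serialize one hypothetical mount record."""
    return RECORD.pack(mount_id, parent_id, major, minor)


def unpack_record(buf: bytes) -> tuple:
    """Decode one record; no tokenizing, quoting, or escaping involved."""
    return RECORD.unpack(buf)
```

The trade-off is the one the comment concedes: the layout has to be documented and versioned, and a human can no longer just read the data with `cat`.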