LWN: Comments on "Rethinking fsinfo()"

Rethinking fsinfo()

flussence — Sat, 29 Aug 2020 11:29:01 +0000

We're slowly getting there: the kernel has Unicode normalisation for filesystems at long last. I think we could live without ASCII control characters next, though I don't agree that we should forbid filenames from containing strings that an average person at a regular keyboard could type. Computers are meant to serve people, not the other way around.
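
To make the normalisation point concrete: the same visible name can be spelled with different code points, and therefore with different bytes on disk. Below is a minimal Python sketch using the standard `unicodedata` module. It is not the kernel's implementation, which is in C, and the normalisation form a given filesystem actually uses may differ from the NFC chosen here.

```python
import unicodedata

# Two spellings of "café": precomposed U+00E9 vs. "e" followed by combining acute U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

def same_name(a: str, b: str) -> bool:
    """Compare two names after NFC normalisation, roughly as a normalising filesystem might."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False: different bytes
print(same_name(composed, decomposed))                          # True: same name after NFC
```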

mathstuf — Fri, 28 Aug 2020 13:13:25 +0000

And if we could strip away things like type annotations and the umpteen ways of quoting text blocks (and other incidental possibilities), we might end up with something that can be written without a complicated escape generator. Sure, you could call it YAML, but the only safe way to actually generate it with arbitrary data is to treat it like JSON, because no one wants to write escape detectors for all of YAML's arcane syntax features.

Don't forget that some parsers have baked the extension proposals(!) in, so you have to watch out for things like accidentally generating merge keys ("<<").
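
A minimal Python sketch of the "treat it like JSON" approach: serialise with the standard `json` module, so every key and string is double-quoted and escaped and no plain scalar is left for a YAML parser to reinterpret. This relies on YAML 1.2 being designed as a superset of JSON; whether a particular YAML parser treats a quoted `"<<"` key as an ordinary string is an assumption worth checking against that parser.

```python
import json

def emit_yaml_safely(data) -> str:
    """Emit YAML by emitting JSON: every string is double-quoted and escaped."""
    return json.dumps(data, ensure_ascii=True, indent=2)

# As plain YAML 1.1 scalars, yes would be a boolean and 022 an octal number;
# quoted, they stay strings, and "<<" is just a quoted key.
doc = {"<<": "not a merge key", "name": "yes", "port": "022"}
print(emit_yaml_safely(doc))
```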

mvdwege — Fri, 28 Aug 2020 12:13:58 +0000

Best solution: discard the dogmatic adherence to 'it must be text'. If the primary consumer of the information is not a sysadmin at the console, use binary data and cut out the redundant parsing step.

Yes, 'everything is an ASCII stream' makes things easily readable for humans, and it is great if you have text tools to write ad-hoc parsers for it. The problem is that you keep writing ad-hoc parsers. A little more pragmatism on this old UNIX dogma would be appreciated.

I'm speaking as a sysadmin/developer who ran into this when I wanted to verify whether my local MTA had successfully sent an email (i.e. received a 250 reply). The only way to do that was to parse fscking syslog. In 2018. When tools like D-Bus notifications had already existed for over a decade.

esemwy — Thu, 27 Aug 2020 16:50:07 +0000

Anybody remember VMS? DEC solved this by passing arguments by *descriptor*, which was a self-describing array of parameters. Given how often the syscall vs. kernel feature mismatch comes up, it seems it wasn’t such a weird idea after all.

http://h30266.www3.hpe.com/odl/axpos/opsys/vmsos84/5841/5...
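
To show what "self-describing" means in practice, here is a rough Python `ctypes` model of a VMS fixed-length string descriptor: a length, a data-type code, a class code and a pointer travel together, so the callee can check what it was handed instead of trusting a bare pointer. The field names mirror the `dsc$` naming, but the constant values are quoted from memory of `<descrip.h>` and should be checked against the VMS documentation; the Python functions are hypothetical helpers for illustration only.

```python
import ctypes

# Values from VMS <descrip.h> as I recall them; verify before relying on them.
DSC_K_DTYPE_T = 14  # data type: character-coded text
DSC_K_CLASS_S = 1   # class: scalar / fixed-length string

class Descriptor(ctypes.Structure):
    """Rough model of a VMS fixed-length string descriptor."""
    _fields_ = [
        ("length", ctypes.c_uint16),   # dsc$w_length: size of the data in bytes
        ("dtype", ctypes.c_uint8),     # dsc$b_dtype: what kind of data it is
        ("dclass", ctypes.c_uint8),    # dsc$b_class: how the data is laid out
        ("pointer", ctypes.c_void_p),  # dsc$a_pointer: address of the data
    ]

def make_string_descriptor(data: bytes):
    """Hypothetical helper: return (descriptor, buffer); keep the buffer alive."""
    buf = ctypes.create_string_buffer(data)
    desc = Descriptor(len(data), DSC_K_DTYPE_T, DSC_K_CLASS_S, ctypes.addressof(buf))
    return desc, buf

def read_string_descriptor(desc: Descriptor) -> bytes:
    """Hypothetical callee: validate the description before touching the data."""
    if desc.dtype != DSC_K_DTYPE_T or desc.dclass != DSC_K_CLASS_S:
        raise TypeError("unsupported descriptor")
    return ctypes.string_at(desc.pointer, desc.length)

desc, buf = make_string_descriptor(b"SYS$LOGIN")
print(read_string_descriptor(desc))  # b'SYS$LOGIN'
```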

kpfleming — Thu, 27 Aug 2020 13:10:21 +0000

YAML (as of version 1.2) is a superset of JSON, using whitespace instead of (most) punctuation to indicate structure, and it supports comments.

zyga — Thu, 27 Aug 2020 12:08:21 +0000

We are not talking about configuration files but about kernel-userspace interfaces. If you need comments, just represent them like any other data. It's not something you will edit by hand. In addition, the current syntax doesn't support comments either, so I don't quite know whether you are confusing this with something else or I'm missing your point.

As for parsing, please show me a correct /proc/self/mountinfo parser in shell. I'll wait. As another poster commented, jq handles JSON for shell scripts in a single line that is both correct and simple. The moment we step out of the custom formats the kernel forces on us is the moment we start to have a really rich set of tools for processing data.

And it doesn't have to be JSON. It should just not be ad-hoc, per-file convention with custom, brittle parser.

neilbrown — Thu, 27 Aug 2020 04:48:26 +0000

Requiring valid UTF-8 is probably sensible for a new filesystem.
Excluding end-of-line characters is probably justifiable too (or any control character ... I don't think we need TAB or DEL).
Anything else is parochial.
When I'm choosing a name to save my document from my GUI, why should I care about your inability to write safe shell scripts, or even have any understanding that "the shell" exists?
It is bad enough that I cannot put a '/' in my file names, why would you prevent me using '$'??
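
The point that a '$' in a name only bites badly written shell code can be illustrated with Python's standard `subprocess` and `shlex` modules: pass names as separate argv elements so no shell is involved, put `--` before them so a leading dash isn't read as an option, and quote every word only when a shell string is unavoidable. A minimal sketch; `rm` is just a stand-in command.

```python
import shlex
import subprocess

def remove_file(name: str) -> None:
    """Delete a file whose name may contain '$', spaces or a leading '-'.

    No shell is involved: the name is its own argv element, and '--'
    stops rm from treating a name like '-rf' as options.
    """
    subprocess.run(["rm", "--", name], check=True)

def shell_command_for(name: str) -> str:
    """If a shell string is unavoidable, quote every untrusted word."""
    return shlex.join(["rm", "--", name])

print(shell_command_for("$HOME is here"))  # rm -- '$HOME is here'
```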

unilynx — Wed, 26 Aug 2020 19:43:45 +0000

I hope for a future where someone introduces a 'sane-names' filesystem mount option, which will forbid invalid UTF-8, filenames starting with a dash or space, filenames containing dollar signs, and all the other funny things that make processing filenames hard or dangerous. Spaces in filenames we probably have to live with.

Distributions might slowly make that option the default for new systems, and sysadmins can opt in sooner themselves, unless they really have to deal with the few applications (which will hopefully disappear or become obsolete fast) that really, really want to create weird filenames.
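
As far as I know no such 'sane-names' mount option exists. As a hypothetical sketch, the rules listed above could be checked for a single path component in userspace like this; the function name and the extra rules (rejecting `.`, `..` and control characters) are assumptions, not part of the proposal.

```python
def is_sane_name(name: bytes) -> bool:
    """Check one path component against the hypothetical 'sane-names' rules."""
    if not name or name in (b".", b".."):  # assumed: empty and reserved names rejected
        return False
    try:
        text = name.decode("utf-8")        # rule: must be valid UTF-8
    except UnicodeDecodeError:
        return False
    if text[0] in "- ":                   # rule: no leading dash or space
        return False
    if "$" in text or "/" in text:        # rule: no dollar signs ('/' is never allowed)
        return False
    # Assumed extra rule: no control characters (C0 range and DEL).
    # Spaces inside the name stay allowed.
    return not any(ord(c) < 0x20 or ord(c) == 0x7F for c in text)
```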

unilynx — Wed, 26 Aug 2020 19:36:40 +0000

That is what `jq` is for. It turns bash into a very capable automation environment around e.g. AWS or DigitalOcean, as they ship CLI tools that produce JSON output.
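
For mounts specifically, util-linux's `findmnt` can already emit JSON (`findmnt -J`). A minimal Python sketch of consuming it; the `filesystems`, `children`, `target` and `fstype` keys are what I would expect from its default tree output, but treat that layout as an assumption and check it against your version.

```python
import json
import subprocess

def iter_mounts(tree):
    """Walk findmnt's nested JSON tree, yielding (target, fstype) pairs."""
    for fs in tree:
        yield fs["target"], fs["fstype"]
        yield from iter_mounts(fs.get("children", []))

def list_mounts():
    """Run findmnt and return a flat list of (target, fstype) pairs."""
    out = subprocess.run(["findmnt", "-J"], capture_output=True, text=True, check=True).stdout
    return list(iter_mounts(json.loads(out)["filesystems"]))
```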

flussence — Wed, 26 Aug 2020 14:09:51 +0000

JSON looks simple until you want to add a human-readable comment (syntax error), or escape some non-ASCII characters in text (\u escapes are UTF-16 code units only), or even just shoot it out without counting and cleaning up trailing commas (syntax error).

Anything that requires more than about half a kB of parsing code using bash builtins is too complicated, IMHO.
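
Two of those complaints are easy to reproduce with Python's standard `json` module: a character outside the Basic Multilingual Plane can only be `\u`-escaped as a UTF-16 surrogate pair, and a strict parser rejects comments and trailing commas.

```python
import json

# U+1F600 can only be \u-escaped as a UTF-16 surrogate pair.
print(json.dumps("\U0001F600"))  # "\ud83d\ude00"

# Comments and trailing commas are not JSON; a strict parser rejects both.
for text in ('[1, 2,]', '{"a": 1 /* note */}'):
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print("rejected:", err.msg)
```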

mathstuf — Tue, 25 Aug 2020 19:02:36 +0000

> Most code that processes the path should treat it as an opaque blob or decode it to wchar_t*, and wouldn't need to care about Unicode or surrogates etc.

I agree that just stuffing paths into binary storage is the best solution. However, paths usually need to be displayed, or the storage you're using has a human caring about it at some point in its lifetime, especially if you're using a container format like JSON. JSON is nice and all, but a way to store arbitrary binary data without having to figure out how to encode it so that it is Unicode-safe would have been much appreciated. (No, BSON doesn't fix this; it just changes the window dressing from `{:"",}` into type-and-length-prefixed fields or type-and-NUL-terminated sequences.) CBOR has binary data, but library support for it is much sparser.

FWIW, I've spent a lot of time thinking about how to stuff paths into JSON: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p...
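
For a sense of how little CBOR needs to carry a path as raw bytes, here is a hand-rolled Python sketch of its byte-string item (major type 2, as defined in RFC 8949): a header with the length, then the bytes verbatim, with no text encoding involved. It is not a full CBOR implementation; the function names are hypothetical.

```python
import struct

def cbor_bytes(data: bytes) -> bytes:
    """Encode data as a CBOR byte string (major type 2): header + raw bytes."""
    n = len(data)
    if n < 24:
        header = bytes([0x40 | n])                # length fits in the initial byte
    elif n < 0x100:
        header = bytes([0x58, n])                 # 1-byte length follows
    elif n < 0x10000:
        header = b"\x59" + struct.pack(">H", n)   # 2-byte big-endian length
    elif n < 0x100000000:
        header = b"\x5a" + struct.pack(">I", n)   # 4-byte length
    else:
        header = b"\x5b" + struct.pack(">Q", n)   # 8-byte length
    return header + data

def cbor_bytes_decode(item: bytes) -> bytes:
    """Decode a single CBOR byte-string item such as one built by cbor_bytes()."""
    if item[0] >> 5 != 2:
        raise ValueError("not a byte string")
    info = item[0] & 0x1F
    if info < 24:
        return item[1:1 + info]
    sizes = {24: 1, 25: 2, 26: 4, 27: 8}
    if info not in sizes:
        raise ValueError("unsupported length encoding")
    size = sizes[info]
    n = int.from_bytes(item[1:1 + size], "big")
    return item[1 + size:1 + size + n]
```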

excors — Mon, 24 Aug 2020 20:15:53 +0000

I think Nanopb (a protobuf implementation for embedded systems) is already race-safe, since the decoder just reads linearly from an input stream, so that does seem possible.

When decoding strings or byte arrays, it can read directly from the stream into the decoded message struct (if the field is configured with a fixed max size) or into a malloced buffer (if configured with variable size) or can pass a substream object representing the value into a callback function. Using the callback interface would let the kernel copy directly from the userspace input buffer into the appropriate internal kernel struct.

It looks like FlatBuffers can't do that, because verification (to avoid out-of-bounds reads etc) is a separate operation from reading fields. You'd have to memcpy the whole buffer from userspace to kernel memory before verifying and then copying strings again into kernel structs.

For kernel-to-userspace messages, you don't need to worry about race conditions and you probably don't need the verification step (since you have to trust the kernel anyway), so FlatBuffers could work better there.

For userspace-to-kernel in both protocols, if you really don't want to force the user to pack all their data into a single buffer, you could always encode userspace pointers as integers (like a "fixed64" in protobuf) to point to raw data or encoded messages at other addresses, and the kernel can traverse those pointers manually like it does today. You'd still get the benefit of automatic marshalling for the majority of structs and fields.
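
The "read linearly from a stream" property can be illustrated with a minimal Python sketch of a protobuf wire-format reader (varint and length-delimited fields only). Each input byte is read exactly once and lengths are bounded before copying, so a value can't change between validation and use, which is the double-fetch problem a kernel decoder would have to avoid. This is not Nanopb's API; the function names and the `max_len` limit are hypothetical.

```python
import io

def read_varint(stream, allow_eof=False):
    """Read a base-128 varint, least-significant 7-bit group first."""
    result, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            if allow_eof and shift == 0:
                return None                     # clean end of message
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")

def decode_fields(stream, max_len=4096):
    """Yield (field_number, value) pairs, copying each value out exactly once."""
    while True:
        tag = read_varint(stream, allow_eof=True)
        if tag is None:
            return
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:                      # varint
            yield field_number, read_varint(stream)
        elif wire_type == 2:                    # length-delimited: bytes, strings, sub-messages
            length = read_varint(stream)
            if length > max_len:                # bound the copy before doing it
                raise ValueError("field too long")
            data = stream.read(length)          # single copy into our own buffer
            if len(data) != length:
                raise EOFError("truncated field")
            yield field_number, data
        else:
            raise ValueError(f"unsupported wire type {wire_type}")

msg = b"\x0a\x03abc\x10\x96\x01"  # field 1 = b"abc", field 2 = 150
print(list(decode_fields(io.BytesIO(msg))))  # [(1, b'abc'), (2, 150)]
```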

Cyberax — Mon, 24 Aug 2020 19:07:31 +0000

I would guess it's to avoid racing with userspace while unmarshalling the data? That said, it might be possible to create a race-safe parser for protobufs; they are not complicated.

josh — Mon, 24 Aug 2020 18:29:42 +0000

"double copy"? I understand why it might require one copy (to deserialize to an in-memory format), but why two?

SEJeff — Mon, 24 Aug 2020 15:45:54 +0000

The double copy required to read protobufs would probably be terrible for user/kernel interaction. FlatBuffers, with its zero-copy deserialization, would be a better fit, even if it isn't quite as well known.
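
The zero-copy idea can be sketched in Python with `memoryview` and `struct.unpack_from`: fields are read in place at known offsets instead of first deserialising the whole message into separate objects. The record layout below is made up for illustration; real FlatBuffers adds vtables and offset tables, and its generated accessors look different.

```python
import struct

# Hypothetical fixed layout: u64 block_size, u64 total_blocks, u32 name_len, then name bytes.
TOTAL_BLOCKS_OFF, NAME_LEN_OFF, NAME_OFF = 8, 16, 20

def total_blocks(buf: memoryview) -> int:
    """Read one field in place; nothing else in the buffer is decoded."""
    return struct.unpack_from("<Q", buf, TOTAL_BLOCKS_OFF)[0]

def name(buf: memoryview) -> memoryview:
    """Return a view of the name; slicing a memoryview does not copy."""
    (n,) = struct.unpack_from("<I", buf, NAME_LEN_OFF)
    if NAME_OFF + n > len(buf):  # the bounds check a verifier would do
        raise ValueError("name runs past end of buffer")
    return buf[NAME_OFF:NAME_OFF + n]

raw = struct.pack("<QQI", 4096, 1000, 4) + b"ext4"
buf = memoryview(raw)
print(total_blocks(buf), bytes(name(buf)))  # 1000 b'ext4'
```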

excors — Mon, 24 Aug 2020 12:30:34 +0000

Since this is about Windows, and Windows is always little-endian (except on Xbox 360, as far as I can tell), it seems obvious to use little-endian here. Since it's not Unicode (it's just an array of 16-bit values) there's no reason to even think about BOMs. You'd simply take the LPCWSTR path (i.e. const wchar_t*, where sizeof(wchar_t)==2) which is used by the Win32 APIs, then cast to uint8_t* and base64-encode as normal. That seems easy.

Most code that processes the path should treat it as an opaque blob or decode it to wchar_t*, and wouldn't need to care about Unicode or surrogates etc.

When you want to display the path to a user, you'd need to do a lossy UTF-16LE decode to get a real Unicode string to pass into your UI system. (Lossy because the path might contain unpaired surrogates which you can't decode safely). (If you're using the Win32 UI APIs, that decoding will probably happen implicitly inside the API implementation; otherwise you might need to do it in the application). The important thing is to avoid trying to decode into a real Unicode string in any context where the lossiness will cause worse than a cosmetic glitch. (So you shouldn't try to store Windows paths directly in JSON, because interoperable JSON requires real Unicode strings, hence the base64 encoding.)

(Linux is the same except 8-bit instead of 16-bit, and probably UTF-8 (or the user's current locale, as NYKevin mentioned, though of course they might have files created under a different locale and there's no way to be sure what they were meant to be) instead of almost always UTF-16LE, and you can encode arbitrary 8-bit strings as JSON strings much more easily than encoding arbitrary 16-bit strings (where you need base64 etc). On both platforms it's a mistake to think that a path is simply an encoded Unicode string, and that you can decode/encode at the edge and do all your internal processing with Unicode.)
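
A minimal Python sketch of the Windows suggestion: treat the path as an opaque array of 16-bit code units, serialise it little-endian and base64 it for JSON, and only do a lossy decode when displaying. Here the path is modelled as a list of ints standing in for the `wchar_t` array; that representation and the function names are assumptions for the sketch.

```python
import base64
import struct

def encode_wide_path(units: list) -> str:
    """Pack 16-bit code units as UTF-16LE bytes, then base64 them (lossless)."""
    raw = struct.pack(f"<{len(units)}H", *units)
    return base64.b64encode(raw).decode("ascii")

def decode_wide_path(text: str) -> list:
    """Inverse of encode_wide_path: recover the exact code units."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

def display_wide_path(units: list) -> str:
    """Lossy, display-only decode: unpaired surrogates become U+FFFD."""
    raw = struct.pack(f"<{len(units)}H", *units)
    return raw.decode("utf-16-le", errors="replace")

# "C:\a", then an unpaired high surrogate (0xD800), then "b".
path = [ord("C"), ord(":"), ord("\\"), ord("a"), 0xD800, ord("b")]
assert decode_wide_path(encode_wide_path(path)) == path  # lossless round trip
print(ascii(display_wide_path(path)))
```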

mathstuf — Mon, 24 Aug 2020 04:51:59 +0000

> In that case it's probably safer to treat them as binary data and encode with base64

In what endianness do you treat the incoming 16-bit data? Big? Little? Native? Native is easy, but it means you need to know what the host system is before archiving the raw data. Little is easy, but it can be confusing in raw data viewers (which could render it backwards). A BOM is OK? But you could also start a filename with a BOM and…blah.

shemminger — Sun, 23 Aug 2020 22:44:42 +0000

Text interfaces to userspace are brittle (easily broken) and suck. If you look at some of the interfaces in /proc/net, there are columns filled with zeros because some field existed in 2.2 and can never change.

Message-based interfaces like netlink are slightly more difficult to program but offer opportunity for expansion.
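
The expansion property comes from netlink's type-length-value attributes: each attribute carries its own length, so a parser can skip types it doesn't know about. A minimal Python sketch of walking a buffer laid out like `struct nlattr` (16-bit length including the 4-byte header, 16-bit type, payload padded to 4 bytes, host byte order); the attribute type numbers and names used here are made up.

```python
import struct

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type in host byte order
NLA_ALIGNTO = 4
NLA_TYPE_MASK = 0x3FFF          # drops the NLA_F_NESTED / NLA_F_NET_BYTEORDER bits

def nla_align(n: int) -> int:
    return (n + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)

def parse_attrs(buf: bytes, known: dict) -> dict:
    """Return {name: payload} for known attribute types; skip everything else."""
    attrs, off = {}, 0
    while off + NLA_HDR.size <= len(buf):
        nla_len, nla_type = NLA_HDR.unpack_from(buf, off)
        if nla_len < NLA_HDR.size or off + nla_len > len(buf):
            raise ValueError("malformed attribute")
        payload = buf[off + NLA_HDR.size:off + nla_len]
        nla_type &= NLA_TYPE_MASK
        if nla_type in known:           # unknown (e.g. newer) attributes are ignored
            attrs[known[nla_type]] = payload
        off += nla_align(nla_len)
    return attrs

def make_attr(nla_type: int, payload: bytes) -> bytes:
    """Build one attribute, padding the payload to a 4-byte boundary."""
    nla_len = NLA_HDR.size + len(payload)
    return NLA_HDR.pack(nla_len, nla_type) + payload + bytes(nla_align(nla_len) - nla_len)

# Type 99 stands in for an attribute added by a newer kernel; it is skipped.
buf = make_attr(1, b"eth0\x00") + make_attr(99, b"future") + make_attr(2, struct.pack("=I", 1500))
print(parse_attrs(buf, {1: "ifname", 2: "mtu"}))
```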

NYKevin — Sun, 23 Aug 2020 19:08:00 +0000

> Surely the simplest way to encode Linux's 8-bit paths in JSON is to map the bytes 0x00..0xFF onto U+0000..U+00FF and then proceed as normal. When decoding, treat any element >=U+0100 as a syntax error. That should be interoperable between all JSON implementations, and very easy to handle in both Unicode-aware and -unaware applications.

So, basically, pretend we have LC_ALL="[whatever].ISO-8859-1" at both ends, and then require userspace to clean up the mess if LC_ALL is actually set to a different value (which, on modern systems, is typically the case). The problem, of course, is that if you ever try to decode that JSON with a naive implementation, you will get mojibake since they will skip the "clean up the mess" step. So you still need non-naive implementations, which makes me wonder, why bother with JSON in the first place?

> If the application wants to display the path to a user, do a potentially-lossy UTF-8 decode in the UI layer, which is about the best you can ever do with Linux paths regardless of how they're encoded for transport.

Strictly, you should be consulting the locale information rather than just assuming UTF-8. UTF-8 is the most common encoding, but its use in pathnames is not required by any standard that I'm aware of. Now, you can't use something too weird such as UTF-16 (null bytes not allowed), but legacy 8-bit encodings are very much legal and valid on some older systems.
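
A minimal Python sketch of the display step that consults the locale rather than assuming UTF-8. `locale.getpreferredencoding(False)` is the standard-library query; the encoding is also accepted as a parameter so the example's results don't depend on the machine it runs on. The function name is hypothetical.

```python
import locale

def display_name(raw: bytes, encoding=None) -> str:
    """Decode a path for display only; undecodable bytes become U+FFFD."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")

name = b"caf\xe9"                                # "café" written under a Latin-1 locale
print(ascii(display_name(name, "iso-8859-1")))   # 'caf\xe9'
print(ascii(display_name(name, "utf-8")))        # 'caf\ufffd'
```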

excors — Sun, 23 Aug 2020 18:05:50 +0000

That goes against the interoperability recommendations of the JSON RFC, which says (https://tools.ietf.org/html/rfc8259#section-8.2):

> the ABNF in this specification allows member names and string values to contain bit sequences that cannot encode Unicode characters; for example, "\uDEAD" (a single unpaired UTF-16 surrogate). Instances of this have been observed, for example, when a library truncates a UTF-16 string without checking whether the truncation split a surrogate pair. The behavior of software that receives JSON texts containing such values is unpredictable; for example, implementations might return different values for the length of a string value or even suffer fatal runtime exceptions.

so it seems a bad idea to rely on unpaired surrogates (like surrogateescape) if you're choosing JSON specifically for its interoperability.

Surely the simplest way to encode Linux's 8-bit paths in JSON is to map the bytes 0x00..0xFF onto U+0000..U+00FF and then proceed as normal. When decoding, treat any element >=U+0100 as a syntax error. That should be interoperable between all JSON implementations, and very easy to handle in both Unicode-aware and -unaware applications.

If the application wants to display the path to a user, do a potentially-lossy UTF-8 decode in the UI layer, which is about the best you can ever do with Linux paths regardless of how they're encoded for transport. For all non-display-related processing of paths (which I think is more common and more important than displaying paths), keep them in the simple lossless U+0000..U+00FF representation.

(Windows' 16-bit paths are more complicated, if you want to handle them pedantically correctly: they can contain unpaired surrogates so you can't simply interpret them as JSON-compatible Unicode strings. In that case it's probably safer to treat them as binary data and encode with base64.)
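
The byte-to-code-point mapping proposed above for Linux paths is exactly what Python's `latin-1` codec does, so a minimal sketch needs only the standard library. The check for code points above U+00FF on decode is the "syntax error" rule from the comment; the function names are hypothetical.

```python
import json

def path_to_json(path: bytes) -> str:
    """Map each byte 0x00..0xFF to U+0000..U+00FF, then encode as a JSON string."""
    return json.dumps(path.decode("latin-1"))

def path_from_json(text: str) -> bytes:
    """Reverse the mapping; any code point >= U+0100 is rejected."""
    s = json.loads(text)
    if not isinstance(s, str) or any(ord(c) > 0xFF for c in s):
        raise ValueError("not a byte-mapped path")
    return s.encode("latin-1")

def path_for_display(path: bytes) -> str:
    """Potentially lossy UTF-8 decode, for the UI layer only."""
    return path.decode("utf-8", errors="replace")

raw = b"/srv/caf\xc3\xa9/\xff"
assert path_from_json(path_to_json(raw)) == raw  # lossless round trip
```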

abo — Sun, 23 Aug 2020 15:22:37 +0000

Surrogate escapes can be used to encode arbitrary bytes in JSON:

(python)

>>> b = bytes(range(256))
>>> u = b.decode("UTF-8", errors="surrogateescape")
>>> import json
>>> j = json.dumps(u)
>>> uin = json.loads(j)
>>> bin = uin.encode("UTF-8", errors="surrogateescape")
>>> [n for n in bin]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255]
>>> bin == b
True

The JSON looks like this:

"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f\udc80\udc81\udc82\udc83\udc84\udc85\udc86\udc87\udc88\udc89\udc8a\udc8b\udc8c\udc8d\udc8e\udc8f\udc90\udc91\udc92\udc93\udc94\udc95\udc96\udc97\udc98\udc99\udc9a\udc9b\udc9c\udc9d\udc9e\udc9f\udca0\udca1\udca2\udca3\udca4\udca5\udca6\udca7\udca8\udca9\udcaa\udcab\udcac\udcad\udcae\udcaf\udcb0\udcb1\udcb2\udcb3\udcb4\udcb5\udcb6\udcb7\udcb8\udcb9\udcba\udcbb\udcbc\udcbd\udcbe\udcbf\udcc0\udcc1\udcc2\udcc3\udcc4\udcc5\udcc6\udcc7\udcc8\udcc9\udcca\udccb\udccc\udccd\udcce\udccf\udcd0\udcd1\udcd2\udcd3\udcd4\udcd5\udcd6\udcd7\udcd8\udcd9\udcda\udcdb\udcdc\udcdd\udcde\udcdf\udce0\udce1\udce2\udce3\udce4\udce5\udce6\udce7\udce8\udce9\udcea\udceb\udcec\udced\udcee\udcef\udcf0\udcf1\udcf2\udcf3\udcf4\udcf5\udcf6\udcf7\udcf8\udcf9\udcfa\udcfb\udcfc\udcfd\udcfe\udcff"

NYKevin — Sat, 22 Aug 2020 18:50:39 +0000

Realistically, if you want to pass raw bytes through a text-formatted thing, you should be using Base64 or something similar. This of course means that paths become unreadable to humans without decoding, but you could have a flag indicating whether a path has been escaped, and then only escape things that aren't valid UTF-8. Alternatively, you could encode the "bad bytes" with \\x00 through \\xFF, which is valid in a JSON string (the backslash is escaped, so it's "just" a backslash followed by three characters), but it could be confused with a real filename (so you would need to invent further escaping for that case, as described in https://xkcd.com/1638/).
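
A minimal Python sketch of the flag approach: emit the path as a plain JSON string when it is valid UTF-8, and base64 it under a differently named key otherwise, so the key itself acts as the flag. The field names `path` and `path_b64` are hypothetical.

```python
import base64
import json

def encode_path(path: bytes) -> dict:
    """Readable when possible, lossless always."""
    try:
        return {"path": path.decode("utf-8")}
    except UnicodeDecodeError:
        return {"path_b64": base64.b64encode(path).decode("ascii")}

def decode_path(obj: dict) -> bytes:
    """Undo encode_path(); the key says which representation was used."""
    if "path_b64" in obj:
        return base64.b64decode(obj["path_b64"])
    return obj["path"].encode("utf-8")

for p in (b"/home/me/notes.txt", b"/home/me/caf\xe9.txt"):
    doc = json.dumps(encode_path(p))
    assert decode_path(json.loads(doc)) == p
    print(doc)
```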

Or, if you think letting people create files with ridiculous names was a bad idea to begin with, you could simply declare non-UTF-8 paths unsupported and spit out invalid JSON if the user tries to create one. Much userspace software already does something like that anyway (see for example Python 3's surrogateescape hack). But then a lot of parsers will work just fine the vast majority of the time, and break on an obscure condition that the average engineer may not even realize is possible. So that's probably not ideal...

zyga — Sat, 22 Aug 2020 16:36:26 +0000

Well, is that a reason for all the ad-hoc formats? We could pass paths as byte arrays. In reality, most software will have issues with non-UTF8-friendly things anyway, because they may want to display it. It's nice that open(2) does not complain but it's pretty rubbish if no application can ever display that thing without "here are some bytes".

dezgeg — Sat, 22 Aug 2020 15:31:25 +0000

Reusing the tracepoint infrastructure where every event can be either read as a binary struct (with the struct layout described by some sysfs file) or as a formatted string would be a nice solution, I think.
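
As a rough illustration of that idea: tracefs publishes a `format` file for each event, describing its binary layout with lines like `field:int common_pid; offset:4; size:4; signed:1;`. A minimal Python sketch that uses such a description to decode a raw record; it assumes a little-endian host, handles only integer and fixed-size array fields, and its function names are hypothetical.

```python
import re

FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<off>\d+);\s*size:(?P<size>\d+);\s*signed:(?P<signed>\d);"
)

def parse_format(text: str) -> list:
    """Turn 'field:...; offset:...; size:...; signed:...;' lines into a field list."""
    fields = []
    for m in FIELD_RE.finditer(text):
        decl = m.group("decl").strip()
        name = decl.split()[-1].split("[")[0]  # 'char comm[16]' -> 'comm'
        fields.append((name, int(m.group("off")), int(m.group("size")),
                       m.group("signed") == "1", "[" in decl))
    return fields

def decode_record(fields: list, record: bytes) -> dict:
    """Decode one binary event using the published layout (little-endian assumed)."""
    out = {}
    for name, off, size, signed, is_array in fields:
        chunk = record[off:off + size]
        out[name] = chunk if is_array else int.from_bytes(chunk, "little", signed=signed)
    return out
```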

dezgeg — Sat, 22 Aug 2020 15:26:58 +0000

Sadly JSON (and many other text formats these days) assume Unicode strings though, while for example mountinfo may contain non-Unicode data like pathnames.

vadim — Sat, 22 Aug 2020 15:03:22 +0000

I concur with the desire for JSON or something similar.

The worst thing about the text formats is that they are brittle. They're prone to failure when some unexpected character sneaks in, to ad-hoc implementations that are based on a programmer looking at 'cat /proc/whatever' and writing whatever comes to mind ("oh, so this is a list with one entry per line, with elements separated by spaces") and to being inflexible for expansion (can you add anything to /proc/partitions and not break a lot of stuff?)

The second worst is that different philosophies are being followed.

/proc/swaps might as well be the output of a command like 'df'

/proc/meminfo is half-written for human consumption, with amounts in kB and aligned columns, and half written for machine consumption with obscure labels like "Committed_AS".

/proc/mounts is very machine oriented.

Some files explicitly tell you the units (and it can be unclear whether another unit could ever be used). Some files have column names. Some, like /proc/ioports, have meaningful indentation.

Many are uncomfortable for machine parsing. E.g., in /proc/cpuinfo you get "address sizes : 39 bits physical, 48 bits virtual": you have to parse that comma, and it's unclear whether a third item could ever appear. Many files lack unique identifiers; e.g., my /proc/ioports has 3 levels of "0000-0000 : pnp 00:00", one under another.
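
To show the kind of per-file handling this forces, here is a minimal Python sketch that parses /proc/meminfo-style text and converts values to bytes. It assumes "kB" means 1024 bytes and that a line without a unit (such as `HugePages_Total`) is a plain count; both match how the file is commonly read, but they are conventions rather than a published schema, which is exactly the complaint.

```python
def parse_meminfo(text: str) -> dict:
    """Parse 'Name:   value [kB]' lines; values with a kB suffix become bytes."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, rest = line.partition(":")
        parts = rest.split()
        value = int(parts[0])
        if len(parts) == 2 and parts[1] == "kB":
            value *= 1024                 # /proc/meminfo's "kB" is 1024 bytes
        elif len(parts) != 1:
            raise ValueError(f"unexpected format: {line!r}")  # e.g. a new unit
        info[key] = value
    return info

def read_meminfo(path="/proc/meminfo") -> dict:
    with open(path) as f:
        return parse_meminfo(f.read())
```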

I'm amazed that, although the computing industry is pretty old by now and we have plentiful RAM, storage and CPU power, the issues of escaping data, representing arrays and trees, and adding extra information without breaking existing software are still with us, despite there being formats like JSON that solve the vast majority of them.

ibukanov — Sat, 22 Aug 2020 12:57:32 +0000

This still adds friction, as suddenly one needs to account for generated headers in the build system and add an extra dependency. I just wish the Linux kernel had made more use of that old trick of using a struct for syscall arguments and passing the struct size as version information. Then new fields can be added as necessary, and the code will be both forward and backward compatible.
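
A minimal Python model of the size-as-version trick, following the rules of the kernel's `copy_struct_from_user()` helper used by `openat2()` and `clone3()`: a shorter (older) struct is zero-extended, and a longer (newer) one is accepted only if the bytes the kernel doesn't understand are all zero. The three-field layout is hypothetical.

```python
import errno
import struct

# Hypothetical "current" argument struct known to the kernel: three u64 fields.
KERNEL_ARGS = struct.Struct("<QQQ")  # flags, mode, resolve

def copy_struct_from_user(user_buf: bytes) -> tuple:
    ksize, usize = KERNEL_ARGS.size, len(user_buf)
    if usize < ksize:
        # Older userspace: fields it doesn't know about default to zero.
        user_buf = user_buf + bytes(ksize - usize)
    elif usize > ksize and any(user_buf[ksize:]):
        # Newer userspace asked for something this kernel doesn't support.
        raise OSError(errno.E2BIG, "unsupported non-zero trailing fields")
    return KERNEL_ARGS.unpack_from(user_buf, 0)
```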

chris_se — Fri, 21 Aug 2020 22:59:12 +0000

A couple of years or so ago I had to write some portable code (Windows, macOS, Linux) to figure out the number of physical CPU cores in the system (in order to provide the user with a sane default for the number of processing threads for a workload that doesn't jibe well with hyperthreading/SMT). This was near-trivial on macOS/Windows, but a plain awful experience on Linux.

macOS: 1 direct system call, 5 LOC with error handling
Windows: 2 low-level API calls (that probably translate directly into system calls? not sure) + some struct processing in a loop, 12 LOC with error handling

Linux: have to parse /sys/devices/system/cpu/cpuX/topology/thread_siblings_list, X starting at 0, as well as check /sys/devices/system/cpu/cpuX/online to see whether the core was actually active. I wrote my first implementation on an Intel system, where the siblings lists of a 2-core system with hyperthreading would be {"0,2", "1,3", "0,2", "1,3"}, because Intel puts all of their hyperthreaded cores _after_ all of the physical cores. I tried that a year later on an AMD system with SMT, and because AMD groups its cores differently, the contents were {"0-1", "0-1", "2-3", "2-3"}. My code couldn't interpret that because it didn't consider that the cores could be specified as ranges, so I had to go back and change it. Sure, if you read the documentation _really_ carefully there is an indication of how to parse the format, so yeah, that was my fault. But in the end, my (now correct) implementation is more than 20 LOC just for parsing the thread_siblings_list file, not including the logic to obtain the result I actually want. A simple system call to obtain information about the CPUs in the current system as some kind of struct would have made my life a _lot_ easier in that case.
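
The list format in question is the kernel's cpulist syntax: comma-separated entries, each a single CPU number or an inclusive `a-b` range. A minimal Python sketch that parses it and counts physical cores by collecting the distinct sibling sets; reading the sysfs files, and checking each CPU's `online` file as described above, is left to the caller. The function names are hypothetical.

```python
def parse_cpulist(text: str) -> frozenset:
    """Parse '0-1,4,6-7' style lists into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, sep, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi) + 1) if sep else [int(lo)])
    return frozenset(cpus)

def count_physical_cores(sibling_lists) -> int:
    """Every hardware thread of a core reports the same sibling set; count distinct sets."""
    return len({parse_cpulist(s) for s in sibling_lists})

# Both layouts from the comment give the same answer.
print(count_physical_cores(["0,2", "1,3", "0,2", "1,3"]))  # 2
print(count_physical_cores(["0-1", "0-1", "2-3", "2-3"]))  # 2
```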

I have to do a lot of low-level OS programming on various OSes on a daily basis -- and in general I like the Linux APIs a lot better than those of most other operating systems -- but as soon as I have to parse or generate ASCII, I start to hate it with a vengeance. (Thankfully this is not the case for most things I have to do.) I consider the proposed fsinfo() system call interface to be VASTLY superior to any ASCII-based interface. The underlying user-space code would be a LOT less error-prone for me. As a user-space developer, I could see myself using the proposed fsinfo() call (with the structs) in the future to obtain some information about the filesystem, but unless I desperately needed a piece of information for some reason I would completely avoid this system call if it required me to parse an ASCII string. (Though, to be fair, I'm definitely not one of the people who are the main target of this functionality at all; for my software this currently falls more into the "nice to have" category.)

Cyberax — Fri, 21 Aug 2020 18:09:40 +0000

I often wish the kernel would use something like protobufs for syscalls. It would save a lot of issues with marshalling/unmarshalling.

zyga — Fri, 21 Aug 2020 17:55:23 +0000

Unrelated to fsinfo directly, but related to parsing ASCII: the problem is not the parsing of one specific file. It's that there's no consistency anywhere in the kernel in how structured data is represented. Having written and maintained a parser for /proc/PID/mountinfo for several years, I found:

- A few bugs, over time, in rather small and well-tested code.
- An edge case that affects extremely common and battle-tested implementations (mount/systemd).
- That I had to venture into the belly of the kernel to understand the precise implementation and escaping rules.

Perhaps it would be worth recognizing that the format is neither simple nor common. Space escaping rules differ from file to file. There is no libkernel, with reference parsers, everyone rolls their own.
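
For reference, a minimal Python sketch of a mountinfo line parser that handles two of the traps involved: the variable number of optional fields before the lone `-` separator, and octal escapes such as `\040` for a space in a path. It works on bytes because paths need not be UTF-8, follows the field layout described in the kernel's /proc documentation, and returns the option strings raw rather than unescaping them. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")

def unescape(field: bytes) -> bytes:
    """Undo the kernel's octal escaping, e.g. b'\\040' -> b' '."""
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), field)

@dataclass
class Mount:
    mount_id: int
    parent_id: int
    major: int
    minor: int
    root: bytes
    mount_point: bytes
    mount_options: bytes
    optional_fields: list
    fs_type: bytes
    source: bytes
    super_options: bytes

def parse_mountinfo_line(line: bytes) -> Mount:
    fields = line.rstrip(b"\n").split(b" ")
    # Zero or more optional fields (shared:N, master:N, ...) end at a lone "-".
    sep = fields.index(b"-", 6)
    major, minor = fields[2].split(b":")
    return Mount(
        mount_id=int(fields[0]), parent_id=int(fields[1]),
        major=int(major), minor=int(minor),
        root=unescape(fields[3]), mount_point=unescape(fields[4]),
        mount_options=fields[5], optional_fields=fields[6:sep],
        fs_type=unescape(fields[sep + 1]), source=unescape(fields[sep + 2]),
        super_options=fields[sep + 3],
    )

def read_mountinfo(path="/proc/self/mountinfo"):
    with open(path, "rb") as f:
        return [parse_mountinfo_line(line) for line in f]
```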

Could the kernel, just maybe, adopt something (whatever) standard and simple. So that parsing the next "simple" text file, is done from a library function available in all the modern programming languages? Could we just use JSON or something of the kind?

If the counter-argument is that parsing JSON is hard, I will only say that there are a few high-quality implementations, including the one that everyone reading this comment is using right now. I don't think we need to invent a new format for fsinfo.