|
|
Subscribe / Log in / New account

Rethinking fsinfo()

Rethinking fsinfo()

Posted Aug 23, 2020 19:08 UTC (Sun) by NYKevin (subscriber, #129325)
In reply to: Rethinking fsinfo() by excors
Parent article: Rethinking fsinfo()

> Surely the simplest way to encode Linux's 8-bit paths in JSON is to map the bytes 0x00..0xFF onto U+0000..U+00FF and then proceed as normal. When decoding, treat any element >=U+0100 as a syntax error. That should be interoperable between all JSON implementations, and very easy to handle in both Unicode-aware and -unaware applications.

So, basically, pretend we have LC_ALL="[whatever].ISO-8859-1" at both ends, and then require userspace to clean up the mess if LC_ALL is actually set to a different value (which, on modern systems, is typically the case). The problem, of course, is that if you ever try to decode that JSON with a naive implementation, you will get mojibake since they will skip the "clean up the mess" step. So you still need non-naive implementations, which makes me wonder, why bother with JSON in the first place?

> If the application wants to display the path to a user, do a potentially-lossy UTF-8 decode in the UI layer, which is about the best you can ever do with Linux paths regardless of how they're encoded for transport.

Strictly, you should be consulting the locale information rather than just assuming UTF-8. UTF-8 is the most common encoding, but its use in pathnames is not required by any standard that I'm aware of. Now, you can't use something too weird such as UTF-16 (null bytes not allowed), but legacy 8-bit encodings are very much legal and valid on some older systems.


to post comments

Rethinking fsinfo()

Posted Aug 23, 2020 22:44 UTC (Sun) by shemminger (subscriber, #5739) [Link]

Text interfaces to userspace are brittle (easily broken) and suck. If you look at some of the interface in /proc/net there are columns filled with zeros because some field existed in 2.2 and can never change.

Message based interfaces like netlink are more slightly more difficult to program but offer opportunity for expansion.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds