
Filesystems and case-insensitivity


Posted Nov 30, 2018 9:09 UTC (Fri) by jezuch (subscriber, #52988)
In reply to: Filesystems and case-insensitivity by perennialmind
Parent article: Filesystems and case-insensitivity

I guess the concept of control characters should have been retired a long time ago. I also think that it was a huge mistake to bring them into UTF-8 along with the rest of ASCII. But I'm pretty sure someone will explain to me that they are in fact critical, and that there are further control characters in the Unicode spec anyway :)



Filesystems and case-insensitivity

Posted Nov 30, 2018 16:25 UTC (Fri) by perennialmind (guest, #45817) [Link] (2 responses)

You mean end-of-string delimiters, end-of-line delimiters, tabs, and the codes needed for controlling a terminal such as escape and erase? Setting aside hurdles to adoption, one can imagine hoisting those into markup. Perhaps there's even a spec for plainer-than-plain-text for when such markup exists (i.e. HTML). If so, it might be perfect for filenames.

ASCII compatibility was the selling point for UTF-8. Beyond the above, even the oddballs are still in use. Take for example "group separator" which stands in for FNC1 in barcodes.

Somebody else will have to defend the C1 block though.
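
To illustrate the "group separator" remark above: scanners commonly transmit FNC1 as GS (0x1D) to end a variable-length field in a GS1 element string. What follows is only a rough sketch under stated assumptions. The tiny application-identifier table, the function name parse_elements, and the sample data are all made up for illustration; a real parser would need the full GS1 table.

```python
GS = "\x1d"  # ASCII group separator, standing in for FNC1

# Assumption: a deliberately tiny subset of GS1 application identifiers.
FIXED = {"01": 14, "17": 6}   # GTIN (14 digits), expiry date (YYMMDD)
VARIABLE = {"10", "21"}       # batch/lot, serial number (ended by GS or end of data)

def parse_elements(data: str) -> dict[str, str]:
    """Split a scanned element string into {identifier: value}."""
    fields = {}
    i = 0
    while i < len(data):
        if data[i] == GS:          # separator left over from a variable field
            i += 1
            continue
        ai = data[i:i + 2]
        i += 2
        if ai in FIXED:
            end = i + FIXED[ai]
        elif ai in VARIABLE:
            end = data.find(GS, i)
            if end == -1:          # last field: runs to the end of the data
                end = len(data)
        else:
            raise ValueError(f"identifier {ai!r} is not in this sketch's table")
        fields[ai] = data[i:end]
        i = end
    return fields

# Hypothetical scan: GTIN, then a lot number ended by GS, then a serial.
scan = "01" + "09506000134352" + "10" + "ABC123" + GS + "21" + "XYZ"
print(parse_elements(scan))
```

Without the separator there would be no way to tell where the lot number ends and the next identifier begins, which is why this particular control character is hard to retire.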

Filesystems and case-insensitivity

Posted Dec 1, 2018 11:24 UTC (Sat) by jezuch (subscriber, #52988) [Link] (1 responses)

I mean all the bytes below 0x20. These are not text; they have no place in a *character* encoding. Apart from that I'm totally fine with ASCII compatibility, even though it's a typically American, culturally insensitive invention ;)
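
For what "all the bytes below 0x20" means in practice for filenames, here is a minimal sketch of a check. The function name c0_controls and the sample names are hypothetical. It works on raw bytes, which is safe for UTF-8 because every byte of a multi-byte sequence is 0x80 or above.

```python
def c0_controls(name: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte below 0x20 in name."""
    return [(i, b) for i, b in enumerate(name) if b < 0x20]

# A name with an embedded tab and newline.
print(c0_controls(b"report\t2018\n.txt"))          # [(6, 9), (11, 10)]

# Non-ASCII text encoded as UTF-8 never trips the check.
print(c0_controls("zażółć.txt".encode("utf-8")))   # []
```

Note that this follows the cutoff named in the comment, so DEL (0x7F) and the C1 block are not flagged.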

Filesystems and case-insensitivity

Posted Dec 6, 2018 10:16 UTC (Thu) by Wol (subscriber, #4433) [Link]

I believe there are two control characters RS1 and RS2? Basically standing for "Repeat String"? Which were used on a system I worked on, and actually were a damn good fix for "how many characters does a tab stand for?". So most lines in my FORTRAN source code would have been physically stored on disk as "<RS1><6>code..."

And if you had a lot of spaces it saved a fair few bytes over tab-encoding, plus being completely unambiguous.

Cheers,
Wol
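
The scheme described above (a marker character followed by a repeat count) can be sketched roughly as follows. This is not the format of the system Wol mentions, whose details are not given; the marker value 0x1E, the one-byte count, the three-space threshold, and the function names are all assumptions made for illustration.

```python
import re

MARKER = 0x1E  # assumption: one control byte standing in for the "<RS1>" code

def pack_spaces(line: bytes) -> bytes:
    """Replace each run of 3+ spaces with MARKER followed by a count byte."""
    if MARKER in line:
        raise ValueError("input already contains the marker byte")

    def repl(match: re.Match) -> bytes:
        n = len(match.group())
        out = bytearray()
        while n > 0:               # a count byte holds at most 255
            chunk = min(n, 255)
            out += bytes([MARKER, chunk])
            n -= chunk
        return bytes(out)

    return re.sub(rb" {3,}", repl, line)

def unpack_spaces(data: bytes) -> bytes:
    """Inverse of pack_spaces: expand every MARKER/count pair into spaces."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            if i + 1 >= len(data):
                raise ValueError("marker without a count byte")
            out += b" " * data[i + 1]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

line = b"      CALL FOO(X)"            # six leading spaces, FORTRAN style
packed = pack_spaces(line)
print(packed)                          # b'\x1e\x06CALL FOO(X)'
print(unpack_spaces(packed) == line)   # True
```

Unlike a tab, the count is stored explicitly, so the width is unambiguous. One caveat of this sketch: a count byte can take any value, including 0x0A, so a real line-oriented format would have to keep counts from colliding with line terminators.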


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds