
Filesystems and case-insensitivity

Posted Nov 28, 2018 22:28 UTC (Wed) by perennialmind (guest, #45817)
In reply to: Filesystems and case-insensitivity by smurf
Parent article: Filesystems and case-insensitivity

Newline, tab, and bel codepoints are perfectly valid UTF-8 plain text, but I'd prefer to push that out to userspace as well. I don't much care whether curl -O gives me filenames with spaces or %20s, but I do object if I see files with newlines in the names. I don't mind if I'm left with sneaky left-to-right, right-to-left marks or explicitly red hearts. I see the need for parentheses and question marks...

... but not control characters. To me, a natural language filename would comprise user-perceived characters and the one true space character (U+0020). Flexibility beyond that does more harm than good. Leave those footguns to the bytestring paths. 😉



Filesystems and case-insensitivity

Posted Nov 29, 2018 13:41 UTC (Thu) by utoddl (guest, #1232) [Link] (3 responses)

I was with you until you got to spaces. It's only wishing of course, but I wish spaces in file names would go away. Personal peeve.

Filesystems and case-insensitivity

Posted Nov 30, 2018 9:04 UTC (Fri) by jezuch (subscriber, #52988) [Link] (2 responses)

Spaces in file names are wonderful. You can name your files like a human being, not a slave of poorly written shell scripts with broken quoting :) (I have a pet peeve too)

Filesystems and case-insensitivity

Posted Dec 3, 2018 12:30 UTC (Mon) by ale2018 (guest, #128727) [Link] (1 responses)

Ah, poorly written shell scripts, eh? Because you obviously think that being a slave of over-complicated command lines is fine? A good percentage of my command lines start with find . -name whatever | xargs... Yes, I know I can write -print0 and -0, and I do that when I write shell scripts.
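
For readers who have not met the two flags mentioned above, here is a minimal sketch of the difference they make. The scratch directory and file names are made up for illustration, and -print0 and -0 are assumed to be available (GNU and BSD find and xargs have them).

```shell
#!/bin/sh
# Hypothetical scratch directory, used only for this demonstration.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"

# Default pipeline: xargs splits its input on blanks and newlines,
# so "with space.txt" reaches the command as two separate words.
naive=$(find "$demo" -name '*.txt' | xargs sh -c 'echo "$#"' sh)

# NUL-delimited pipeline: each path reaches the command as one argument.
safe=$(find "$demo" -name '*.txt' -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "naive: $naive arguments, safe: $safe arguments"
rm -rf "$demo"
```

Where xargs is not needed at all, find . -name whatever -exec cmd {} + sidesteps the delimiter question too.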

When I find a filename with spaces I just move it away.

For the record, the normalization step and control characters were never taken care of. For example:

    ~$ touch aaabd $(printf 'aaabc\bd')  "$(printf 'aaabc\nd')"
    ~$ ls -lt | head -5
    total 3686968
    -rw-r--r--  1 ale ale                 0 Dec  3 13:21 aaabd
    -rw-r--r--  1 ale ale                 0 Dec  3 13:21 aaabc
    d
    -rw-r--r--  1 ale ale                 0 Dec  3 13:21 aaabd

Control characters were never forbidden. Consider that human beings are sometimes uncertain about the name they're typing and type a backspace (\b) in it. So, why isn't that beautiful too? Perhaps users should have a clue. In the words of the Ancient Philosophy, rubbish in, rubbish out.

Filesystems and case-insensitivity

Posted Dec 3, 2018 20:24 UTC (Mon) by flussence (guest, #85566) [Link]

ls took care of that a few years ago…

    ~/test $ ls
    'aaabc'$'\b''d'  'aaabc'$'\n''d'   aaabd
    ~/test $ ls --version
    ls (GNU coreutils) 8.30
    Packaged by Gentoo (8.30 (p01))
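
As far as I know, GNU ls applies that quoting by default only when it writes to a terminal; in a pipeline the names come out raw unless escaping is requested. A small sketch, assuming an ls that supports -b (GNU and BSD do) and using a hypothetical scratch directory:

```shell
#!/bin/sh
# Hypothetical scratch directory, used only for this demonstration.
demo=$(mktemp -d)
# printf expands \b and \n; the double quotes keep each result as one name.
touch "$demo/aaabd" "$demo/$(printf 'aaabc\bd')" "$demo/$(printf 'aaabc\nd')"

# Piped without options: the embedded newline splits one name over two lines.
raw_lines=$(($(ls "$demo" | wc -l)))

# -b prints C-style escapes instead, so every name stays on a single line.
escaped_lines=$(($(ls -b "$demo" | wc -l)))

ls -b "$demo"
echo "raw: $raw_lines lines, escaped: $escaped_lines lines"
rm -rf "$demo"
```

Counting lines of raw ls output is exactly the kind of thing that goes wrong with such names, which is the point of the comparison.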

Filesystems and case-insensitivity

Posted Nov 30, 2018 9:09 UTC (Fri) by jezuch (subscriber, #52988) [Link] (3 responses)

I guess the concept of control characters should have been retired a long time ago. I also think that it was a huge mistake to bring them to UTF-8 along with the rest of ASCII. But I'm pretty sure someone will explain to me that they are in fact critical and there are further control characters in the Unicode spec anyway :)

Filesystems and case-insensitivity

Posted Nov 30, 2018 16:25 UTC (Fri) by perennialmind (guest, #45817) [Link] (2 responses)

You mean end-of-string delimiters, end-of-line delimiters, tabs, and the codes needed for controlling a terminal such as escape and erase? Setting aside hurdles to adoption, one can imagine hoisting those into markup. Perhaps there's even a spec for plainer-than-plain-text for when such markup exists (i.e. HTML). If so, it might be perfect for filenames.

ASCII compatibility was the selling point for UTF-8. Beyond the above, even the oddballs are still in use. Take for example "group separator" which stands in for FNC1 in barcodes.

Somebody else will have to defend the C1 block though.

Filesystems and case-insensitivity

Posted Dec 1, 2018 11:24 UTC (Sat) by jezuch (subscriber, #52988) [Link] (1 responses)

I mean all the bytes below 0x20. These are not text; they have no place in a *character* encoding. Apart from that I'm totally fine with ASCII compatibility, even though it's a typically American, culturally insensitive invention ;)
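
A userspace check for such bytes is short. This is only a sketch: has_control_chars is a hypothetical helper name, not an existing tool, and the [[:cntrl:]] class it relies on covers the bytes below 0x20 plus DEL (0x7F) for ASCII input.

```shell
#!/bin/sh
# has_control_chars NAME: succeed if NAME contains a control character.
# (Hypothetical helper for illustration only.)
has_control_chars() {
    case $1 in
        # [[:cntrl:]] matches the ASCII control range and DEL; NUL cannot
        # appear in a shell string or a filename in the first place.
        *[[:cntrl:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in 'aaabd' 'with space' "$(printf 'aaabc\bd')" "$(printf 'aaabc\nd')"; do
    if has_control_chars "$name"; then
        echo "rejected a name containing control characters"
    else
        echo "ok: $name"
    fi
done
```

A plain space (U+0020) passes, since it is not in the control range.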

Filesystems and case-insensitivity

Posted Dec 6, 2018 10:16 UTC (Thu) by Wol (subscriber, #4433) [Link]

I believe there are two control characters RS1 and RS2? Basically standing for "Repeat String"? Which were used on a system I worked on, and actually were a damn good fix for "how many characters does a tab stand for?". So most lines in my FORTRAN source code would have been physically stored on disk as "<RS1><6>code..."

And if you had a lot of spaces it saved a fair few bytes over tab-encoding, plus being completely unambiguous.

Cheers,
Wol

Filesystems and case-insensitivity

Posted Dec 4, 2018 8:13 UTC (Tue) by pr1268 (guest, #24648) [Link] (4 responses)

one true space character (U+0020)

Um, there's more than one space: ' ' and ' '. One is \u0020 (good ol' ASCII 0x20) and the other is \u00a0.

I was personally burned by the second "space" above appearing in an Excel spreadsheet (to the exclusion of the "one true space character" you mentioned). >:-(

Filesystems and case-insensitivity

Posted Dec 4, 2018 10:24 UTC (Tue) by smurf (subscriber, #17840) [Link] (3 responses)

Don't worry – Unicode has a bunch more spaces, including zero-width ones.

On second thought: do worry.

Filesystems and case-insensitivity

Posted Dec 4, 2018 13:27 UTC (Tue) by hummassa (subscriber, #307) [Link] (2 responses)

The initial argument still holds: there is no reasonable rationale for those other space characters (including U+00a0) in file names.
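
For names that already contain U+00A0, a cleanup pass might look like the sketch below. It assumes UTF-8 encoded names, where U+00A0 is the two bytes 0xC2 0xA0; normalize_spaces is a hypothetical helper name, and the other Unicode spaces are deliberately not handled.

```shell
#!/bin/sh
# U+00A0 NO-BREAK SPACE in UTF-8 is 0xC2 0xA0, written here as octal escapes.
nbsp=$(printf '\302\240')

# normalize_spaces NAME: print NAME with every U+00A0 replaced by U+0020.
# (Hypothetical helper for illustration only.)
normalize_spaces() {
    printf '%s\n' "$1" | sed "s/$nbsp/ /g"
}

# Made-up example name containing one no-break space.
name="report${nbsp}2018.xls"
fixed=$(normalize_spaces "$name")

case $name in
    *"$nbsp"*) echo "contains U+00A0, normalized to: $fixed" ;;
    *) echo "clean: $name" ;;
esac
```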

Filesystems and case-insensitivity

Posted Dec 12, 2018 23:45 UTC (Wed) by pr1268 (guest, #24648) [Link] (1 responses)

there is no reasonable rationale for those other space characters (including U+00a0) in file names.

Agreed, but try telling that to those fools who auto-generated the spreadsheet with \u00a0 spaces. </angry rant>

Filesystems and case-insensitivity

Posted Dec 13, 2018 10:33 UTC (Thu) by james (subscriber, #1325) [Link]

Would it calm your anger to point out that LibreOffice can search using regexps?


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds