Filesystems and case-insensitivity
Posted Nov 28, 2018 22:28 UTC (Wed) by perennialmind (guest, #45817)
In reply to: Filesystems and case-insensitivity by smurf
Parent article: Filesystems and case-insensitivity
Newline, tab, and bel codepoints are perfectly valid UTF-8 plain text, but I'd prefer to push that out to userspace as well. I don't much care whether curl -O gives me filenames with spaces or %20s, but I do object if I see files with newlines in the names. I don't mind if I'm left with sneaky left-to-right, right-to-left marks or explicitly red hearts. I see the need for parentheses and question marks...
... but not control characters. To me, a natural language filename would comprise user-perceived characters and the one true space character (U+0020). Flexibility beyond that does more harm than good. Leave those footguns to the bytestring paths. 😉
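A minimal sketch of the control-character half of that rule, assuming a POSIX-style shell. The function name is_plain_name is made up for illustration. It rejects only the empty name and names containing control characters (newline, tab, BEL, backspace and friends); it does not attempt the "U+0020 is the only space" part, which would need real Unicode tables.

```shell
# is_plain_name NAME (hypothetical helper, not an existing tool)
# Succeeds if NAME is non-empty and contains no control characters.
# Ordinary spaces, parentheses, question marks and non-ASCII text pass.
is_plain_name() {
    case $1 in
        '' | *[[:cntrl:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# A name with spaces and punctuation is accepted...
is_plain_name 'holiday photos (2018)?.txt' && echo "accepted"
# ...while a name with an embedded newline is not.
is_plain_name "$(printf 'aaabc\nd')" || echo "rejected"
```

Userspace tools could run a check like this before creating a file, leaving the kernel's byte-string paths untouched.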
Posted Nov 29, 2018 13:41 UTC (Thu)
by utoddl (guest, #1232)
Posted Nov 30, 2018 9:04 UTC (Fri)
by jezuch (subscriber, #52988)
Posted Dec 3, 2018 12:30 UTC (Mon)
by ale2018 (guest, #128727)
Ah, poorly written shell scripts, eh? Because you obviously think that being a slave to over-complicated command lines is fine? A good percentage of my command lines start with find . -name whatever | xargs... When I find a filename with spaces I just move it away.
For the record, the normalization step and control characters were never taken care of. For example:
Posted Dec 3, 2018 20:24 UTC (Mon)
by flussence (guest, #85566)
Posted Nov 30, 2018 9:09 UTC (Fri)
by jezuch (subscriber, #52988)
Posted Nov 30, 2018 16:25 UTC (Fri)
by perennialmind (guest, #45817)
You mean end-of-string delimiters, end-of-line delimiters, tabs, and the codes needed for controlling a terminal such as escape and erase? Setting aside hurdles to adoption, one can imagine hoisting those into markup. Perhaps there's even a spec for plainer-than-plain-text for when such markup exists (i.e. HTML). If so, it might be perfect for filenames.
ASCII compatibility was the selling point for UTF-8. Beyond the above, even the oddballs are still in use. Take for example "group separator" which stands in for FNC1 in barcodes.
Somebody else will have to defend the C1 block though.
Posted Dec 1, 2018 11:24 UTC (Sat)
by jezuch (subscriber, #52988)
Posted Dec 6, 2018 10:16 UTC (Thu)
by Wol (subscriber, #4433)
And if you had a lot of spaces it saved a fair few bytes over tab-encoding, plus being completely unambiguous.
Cheers,
Posted Dec 4, 2018 8:13 UTC (Tue)
by pr1268 (guest, #24648)
Um, there's more than one space: ' ' and ' '. One is \u0020 (good ol' ASCII 0x20) and the other is \u00a0. I was personally burned by the second "space" above appearing in an Excel spreadsheet (to the exclusion of the "one true space character" you mentioned). >:-(
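The two look identical on screen but differ at the byte level: in UTF-8, U+0020 is the single byte 20 while U+00A0 is the pair C2 A0. A rough sketch for spotting the latter, assuming UTF-8 text and a POSIX-style shell; has_nbsp is a made-up name used only for illustration.

```shell
# has_nbsp STRING (hypothetical helper)
# Succeeds if STRING contains U+00A0, which UTF-8 encodes as bytes C2 A0.
has_nbsp() {
    nbsp=$(printf '\302\240')
    case $1 in
        *"$nbsp"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Dump the bytes behind each "space" in hex.
printf ' ' | od -An -tx1          # the plain space, U+0020
printf '\302\240' | od -An -tx1   # the no-break space, U+00A0

has_nbsp "a$(printf '\302\240')b" && echo "found a no-break space"
```

The same byte pattern could be fed to grep or a spreadsheet's search box to hunt down stray no-break spaces.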
Posted Dec 4, 2018 10:24 UTC (Tue)
by smurf (subscriber, #17840)
On second thought: do worry.
Posted Dec 4, 2018 13:27 UTC (Tue)
by hummassa (subscriber, #307)
Posted Dec 12, 2018 23:45 UTC (Wed)
by pr1268 (guest, #24648)
Agreed, but try telling that to those fools who auto-generated the spreadsheet with \u00a0 spaces. </angry rant>
Posted Dec 13, 2018 10:33 UTC (Thu)
by james (subscriber, #1325)
find . -name whatever | xargs... Yes, I know I can write -print0 and -0, I do that when I write shell scripts.
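For comparison, here is a small sketch of what the plain pipeline and the -print0/-0 variant each hand to the command, assuming a find and xargs that support those options (GNU and BSD both do). The helper names and the demo files are invented for this illustration.

```shell
# Both helpers are hypothetical names used only for this illustration.
# Each prints how many arguments xargs hands to the command it runs.
count_args_naive() {
    find "$1" -type f -name "$2" | xargs sh -c 'echo "$#"' sh
}
count_args_safe() {
    find "$1" -type f -name "$2" -print0 | xargs -0 sh -c 'echo "$#"' sh
}

# Three files: one with a space, one with a newline, one boring.
demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/$(printf 'c\nd.txt')" "$demo/e.txt"

count_args_naive "$demo" '*.txt'   # names are split on whitespace
count_args_safe  "$demo" '*.txt'   # one argument per file

rm -rf "$demo"
```

The naive count comes out higher than the number of files because each space or newline starts a new argument, which is exactly the convenience-versus-correctness trade-off being argued about here.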
~$ touch aaabd $(printf 'aaabc\bd') "$(printf 'aaabc\nd')"
~$ ls -lt | head -5
total 3686968
-rw-r--r-- 1 ale ale 0 Dec 3 13:21 aaabd
-rw-r--r-- 1 ale ale 0 Dec 3 13:21 aaabc
d
-rw-r--r-- 1 ale ale 0 Dec 3 13:21 aaabd
Control characters were never forbidden. Consider that human beings are sometimes uncertain about the name they're typing and type a backspace (\b) in it. So, why isn't that beautiful too? Perhaps users should have a clue. In the words of the Ancient Philosophy, rubbish in, rubbish out.
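Names like these can at least be hunted down after the fact. A rough sketch, assuming a find that supports -print0 and POSIX character classes in -name patterns; list_control_names is a made-up name.

```shell
# list_control_names DIR (hypothetical helper)
# Prints, NUL-terminated, every path under DIR whose final component
# contains a control character such as backspace or newline.
list_control_names() {
    find "$1" -name '*[[:cntrl:]]*' -print0
}

# Make the hidden characters visible; cat -v shows backspace as ^H.
list_control_names . | tr '\0' '\n' | cat -v
```

Run in the directory from the transcript above, this would pick out the two odd files and leave the ordinary aaabd alone.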
ls took care of that a few years ago…
~/test $ ls
'aaabc'$'\b''d' 'aaabc'$'\n''d' aaabd
~/test $ ls --version
ls (GNU coreutils) 8.30
Packaged by Gentoo (8.30 (p01))
Wol
one true space character (U+0020)
there is no reasonable rationale for those other space characters (including U+00a0) in file names.
Would it calm your anger to point out that LibreOffice can search using regexps?
