
Case-insensitive filesystem lookups


Posted May 24, 2018 15:09 UTC (Thu) by hkario (guest, #94864)
Parent article: Case-insensitive filesystem lookups

Not only is there the problem of different locales treating different code points as the same letter (the already-mentioned Turkish is the common example),
but filenames are quite explicitly NOT Unicode; they don't even have to be printable characters, in any encoding!
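
To make the Turkish case concrete, here is a minimal Python sketch. Python's `str.lower()` applies the locale-independent Unicode default mappings; `turkish_lower` is a hypothetical helper of mine that only approximates Turkish rules for the two special letters, not a real library function.

```python
# Python's str.lower() uses the locale-independent Unicode default mappings.
default_lower_of_I = "I".lower()  # 'i' (U+0069)

def turkish_lower(s: str) -> str:
    """Hypothetical helper: approximate Turkish lowercasing.

    In Turkish, dotted capital I (U+0130) lowercases to 'i', and plain
    capital 'I' lowercases to dotless i (U+0131).
    """
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()

# The same pair of names matches under one rule set and not the other.
matches_default = "FILE".lower() == "file"
matches_turkish = turkish_lower("FILE") == "file"
```

Under the default mappings "FILE" and "file" are the same name; under the Turkish approximation they are not, so the answer depends on whose rules the filesystem picks.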

How do you handle case comparison for a file whose name is broken UTF-8?
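
One possible answer, sketched in Python under my own assumptions: fold names that decode as strict UTF-8 and fall back to exact byte comparison for everything else. `fold_name` is a hypothetical policy function, not something any kernel or library implements.

```python
def fold_name(raw: bytes) -> bytes:
    """Hypothetical policy: case-fold valid UTF-8 names, leave the rest alone."""
    try:
        text = raw.decode("utf-8")  # strict mode raises on malformed input
    except UnicodeDecodeError:
        return raw                  # no defined case, so compare bytes exactly
    return text.casefold().encode("utf-8")

valid = b"README.TXT"
broken = b"READ\xffME.TXT"  # 0xFF never appears in well-formed UTF-8

folded_valid = fold_name(valid)    # folded to lower case
folded_broken = fold_name(broken)  # returned unchanged
```

The catch is that broken names silently stay case-sensitive, so a single directory ends up with two different comparison rules.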

What if the files were written on a system that used the ISO-8859-2 code page? Let alone one of the CJK systems?
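
A short Python illustration of why the encoding matters: the same two byte strings are a case pair under ISO-8859-2 and unrelated punctuation under ISO-8859-1. The filenames and the `same_ignoring_case` helper are made up for the example.

```python
name_a = b"\xa1.txt"
name_b = b"\xb1.txt"

def same_ignoring_case(x: bytes, y: bytes, encoding: str) -> bool:
    """Compare two raw names case-insensitively under an assumed encoding."""
    return x.decode(encoding).casefold() == y.decode(encoding).casefold()

# ISO-8859-2: 0xA1 is 'Ą' and 0xB1 is 'ą', so the names differ only in case.
as_latin2 = same_ignoring_case(name_a, name_b, "iso-8859-2")

# ISO-8859-1: 0xA1 is '¡' and 0xB1 is '±', which have no case relation.
as_latin1 = same_ignoring_case(name_a, name_b, "iso-8859-1")
```

Neither name is valid UTF-8, and nothing stored with the file says which code page the writer had in mind.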



Case-insensitive filesystem lookups

Posted May 25, 2018 9:16 UTC (Fri) by MarcB (subscriber, #101804)

Exactly. Before you can start with case-insensitive lookups, you first need a definition of what case even means. And currently, Linux filesystems do *not* have this. Those semantics exist only at the application level, and nothing on a stock Linux system forces applications to be consistent about them.

I haven't found any detailed information on Android's "wrapfs", but I assume it does more than just provide case-insensitive lookups. Likely it also enforces an encoding and perhaps even a Unicode normalization.

If case-insensitive lookups are added, what is preventing a bad or careless actor from creating "duplicate" files, perhaps "superseding" the original file? (But this is already possible in Linux today, as no Unicode normalization is enforced, i.e. the same directory can hold files whose names are different UTF-8 byte sequences that decode to canonically equivalent strings, such as the precomposed and decomposed forms of an accented letter.)
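
A minimal Python sketch of the normalization problem, using a dict keyed by raw name bytes as a stand-in for a directory (the filenames are invented for illustration):

```python
import unicodedata

nfc_name = "caf\u00e9.txt"   # 'é' as one precomposed code point
nfd_name = "cafe\u0301.txt"  # 'e' followed by a combining acute accent

# Stand-in for a directory: raw name bytes -> file content.
directory = {}
directory[nfc_name.encode("utf-8")] = "original"
directory[nfd_name.encode("utf-8")] = "impostor"

entry_count = len(directory)  # the byte strings differ, so both entries exist
equal_after_nfc = unicodedata.normalize("NFC", nfd_name) == nfc_name
```

Both names render identically in most terminals and file managers, yet byte-wise they are distinct entries.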

Either you rely on some sanitization layer on top of the filesystem through which every access must pass, or you add new, stricter versions of all syscalls dealing with filenames.
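
As a rough sketch of what such a sanitization layer might check, assuming a policy of "valid UTF-8, NFC-normalized, unique after case folding": all names here (`NameRejected`, `canonical_key`, `checked_create`) are hypothetical, and a set of keys stands in for the directory.

```python
import unicodedata

class NameRejected(ValueError):
    """Raised when a proposed filename violates the assumed policy."""

def canonical_key(raw: bytes) -> str:
    """Return the comparison key for a name, or reject the name."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise NameRejected("not valid UTF-8") from exc
    if unicodedata.normalize("NFC", text) != text:
        raise NameRejected("not NFC-normalized")
    # Case folding can denormalize some strings, so normalize the key again.
    return unicodedata.normalize("NFC", text.casefold())

def checked_create(existing_keys: set, raw: bytes) -> None:
    """Record a new name only if no existing name has the same key."""
    key = canonical_key(raw)
    if key in existing_keys:
        raise NameRejected("collides with an existing name")
    existing_keys.add(key)
```

With this policy, creating `b"Makefile"` succeeds and a later `b"MAKEFILE"` is rejected. It only helps if every access really goes through the layer, which is exactly the difficulty.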


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds