Filesystems and case-insensitivity

Posted Dec 2, 2018 16:42 UTC (Sun) by epa (subscriber, #39769)
In reply to: Filesystems and case-insensitivity by gioele
Parent article: Filesystems and case-insensitivity

Is there any reason not to treat i, İ, I, and ı the same for case-folding purposes on the file system?

I am not asking whether they are the same in all uses. I know that in Turkish i and ı are different letters. What I'm suggesting is that for a case-insensitive filesystem lookup -- where you have already waved goodbye to a strict 1-1 mapping between byte sequences and directory entries -- it surely doesn't matter much to gloss over the distinction and treat all four characters the same. Similarly, I would consider it a feature, not a bug, if accented characters were preserved in filenames but ignored when matching. There are pairs of words in German that differ only in accent, but it's very unlikely that an accent would be the only difference between two human-written document names.
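A minimal sketch of the loose matching key described above, assuming Python's `unicodedata` module and the full Unicode case folding done by `str.casefold()`. The function name `loose_key` is hypothetical. Default case folding does not merge dotless ı with i, so that one mapping has to be added by hand; accents are dropped by decomposing to NFD and removing nonspacing marks.

```python
import unicodedata

def loose_key(name: str) -> str:
    """Hypothetical matching key: ignores case and accents, and treats
    i, I, İ and ı as one letter. The stored filename itself is untouched."""
    # Decompose precomposed letters (é, ü, İ) into base letter + combining marks.
    s = unicodedata.normalize("NFD", name)
    # Full Unicode case folding: "I" -> "i", "ß" -> "ss".
    s = s.casefold()
    # Default folding leaves dotless ı (U+0131) alone, so merge it explicitly.
    s = s.replace("\u0131", "i")
    # Decompose again, then drop nonspacing marks (category Mn):
    # accents, and the dot above left over from İ.
    s = unicodedata.normalize("NFD", s)
    return "".join(c for c in s if unicodedata.category(c) != "Mn")

for name in ["i", "I", "İ", "ı"]:
    print(name, "->", loose_key(name))   # every line ends in "-> i"
print(loose_key("Résumé.txt") == loose_key("RESUME.TXT"))  # True
```

Because the key throws information away, distinct names such as `résumé` and `resume` collide, so a lookup through it may match more than one directory entry.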

Now, you may with some justice argue that loose matching like this belongs in user space, not the kernel. But in the end it's not my preferences or anyone else's that matter. What matters is to efficiently implement the existing (de facto or de jure) standards. What behaviour is Samba required to support with the Turkish uppercase and lowercase letters? The kernel should provide the semantics that Samba needs so it doesn't have to laboriously scan the whole directory to match a filename.
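To illustrate the efficiency point: once every name has a normalized key, a lookup can be a single hash probe instead of a comparison against every entry in the directory. A minimal sketch, assuming a directory is just a list of stored names; `CaseInsensitiveDir` and `scan_lookup` are hypothetical, and plain `str.casefold()` stands in for whatever folding rule a filesystem would actually adopt.

```python
from collections import defaultdict

class CaseInsensitiveDir:
    """Hypothetical directory: names are stored exactly as created,
    but indexed by their case-folded form."""

    def __init__(self) -> None:
        # folded key -> stored names with that key (normally just one)
        self._index: dict[str, list[str]] = defaultdict(list)

    def add(self, name: str) -> None:
        self._index[name.casefold()].append(name)

    def lookup(self, name: str) -> list[str]:
        # One hash probe; .get() does not insert missing keys into the defaultdict.
        return list(self._index.get(name.casefold(), []))

def scan_lookup(entries: list[str], name: str) -> list[str]:
    # Without an index: fold and compare every entry in the directory.
    key = name.casefold()
    return [e for e in entries if e.casefold() == key]

names = ["Report.odt", "notes.TXT", "Straße.md"]
d = CaseInsensitiveDir()
for n in names:
    d.add(n)
print(d.lookup("REPORT.ODT"))            # ['Report.odt']
print(scan_lookup(names, "strasse.MD"))  # ['Straße.md']
```

Both functions return the same answer; the difference is that the scan costs time proportional to the directory size on every lookup.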


Filesystems and case-insensitivity

Posted Dec 2, 2018 17:19 UTC (Sun) by gioele (subscriber, #61675)

> Is there any reason not to treat i, İ, I, and ı the same for case-folding purposes on the file system?

Sure, they could. But doing it is hard (and computationally expensive).

This is what I meant by:

> What the developers could do is a kind of case-insensitive look-up that also clusters together "similar" letters. Defining which characters are similar opens, however, another can of worms (see `confusables.txt` from Unicode or all the discussions around IDNA and its Nameprep algorithm).
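The clustering idea is roughly what Unicode's confusable detection (UTS #39) does: a string is reduced to a "skeleton" by converting it to NFD, mapping every character to a prototype, and normalizing to NFD again; two strings whose skeletons match are considered confusable. A minimal sketch, assuming a tiny hand-written table: `CONFUSABLES` below is a hypothetical stand-in for a few entries of the real `confusables.txt`, which is far larger.

```python
import unicodedata

# Hypothetical, hand-picked stand-in for a handful of confusables.txt-style
# entries (letters that look like Latin ones). Not the real Unicode data.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I
}

def skeleton(s: str) -> str:
    # UTS #39-style skeleton: NFD, map each character to its prototype, NFD again.
    s = unicodedata.normalize("NFD", s)
    s = "".join(CONFUSABLES.get(c, c) for c in s)
    return unicodedata.normalize("NFD", s)

latin = "notes"
mixed = "n\u043et\u0435s"  # looks like "notes" but uses Cyrillic о and е
print(latin == mixed)                      # False
print(skeleton(latin) == skeleton(mixed))  # True
```

Every lookup now pays for a per-character table lookup, and deciding which entries belong in the table is exactly the policy question the quote warns about.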


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds