
Filesystems and case-insensitivity

Posted Nov 29, 2018 12:21 UTC (Thu) by Sesse (subscriber, #53779)
Parent article: Filesystems and case-insensitivity

So, one thing is encoding, but what about collation? If you want correct Unicode case handling, you absolutely need to know which locale you're in. The common example: In English, i and I are the same letter with different case. In Turkish, they are emphatically not (the lowercase of I is ı, the uppercase of i is İ, and ı and i are as distinct from each other as v and w are in English).
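To make the Turkish case concrete, here is a minimal Python sketch. The two helper names are hypothetical and exist only for illustration; `lower_turkish` hard-codes just the two mappings mentioned above and is not a full locale-aware implementation.

```python
# Hypothetical helpers for illustration only; not part of any filesystem API.

def lower_default(s: str) -> str:
    """Locale-independent lowercasing, which is what Python's str.lower() does."""
    return s.lower()


def lower_turkish(s: str) -> str:
    """Lowercasing under Turkish rules: I -> ı (dotless), İ -> i (dotted).

    Assumption: only these two special cases are handled; everything else
    falls through to the default mapping.
    """
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


# The same uppercase letter lowercases to two different letters.
print(lower_default("I"), lower_turkish("I"))
```

Under the default rules "I" and "i" fold together; under the Turkish rules "I" folds to "ı", which is a different letter from "i".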

The only way I know of to deal with these kinds of issues is to specify a collation when creating the filesystem. Windows (and many other things) does this based on installation language, which causes all kinds of funky issues on large installations where you could have multiple users with different languages.



Filesystems and case-insensitivity

Posted Nov 29, 2018 14:23 UTC (Thu) by willy (subscriber, #9762)

Collation is handled in userspace. There's no guarantee what order getdents() will return filenames in.

Filesystems and case-insensitivity

Posted Nov 29, 2018 14:26 UTC (Thu) by Sesse (subscriber, #53779)

There are two parts to collation: ordering and equality. (If you have the former, you also have the latter.) I'm fine with ordering not being handled by the kernel, but equality needs to be. And if you want case-insensitivity, equality is locale-dependent.
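A small Python sketch of why ordering gives you equality for free: two names are equal exactly when neither sorts before the other. The `make_equal` helper is hypothetical, and `str.casefold` stands in for a real collation key, which it is not.

```python
from typing import Callable


def make_equal(key: Callable[[str], str]) -> Callable[[str, str], bool]:
    """Derive an equality test from a sort key (hypothetical helper)."""
    def equal(a: str, b: str) -> bool:
        ka, kb = key(a), key(b)
        # Equal under the ordering iff neither key sorts before the other.
        return not (ka < kb) and not (kb < ka)
    return equal


names = ["b.txt", "A.txt", "a.txt"]

# Ordering: something userspace can do after getdents() returns.
ordered = sorted(names, key=str.casefold)

# Equality: derived from the same key, and the part a lookup depends on.
same = make_equal(str.casefold)
print(ordered, same("A.txt", "a.txt"), same("a.txt", "b.txt"))
```

Swapping in a different key function changes both the order and which names count as the same file.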

Filesystems and case-insensitivity

Posted Nov 30, 2018 17:55 UTC (Fri) by k8to (guest, #15413)

Indeed, the existence of the file accessed by a given byte sequence will vary depending on locale. This is true for many situations, not just the rather clear Turkish one.
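As a toy illustration of a file existing in one locale and not in another, here is a Python sketch of a case-insensitive lookup parameterized by a folding function. All names here are hypothetical, the directory is just a list of strings, and the Turkish fold only handles the I/ı and İ/i pairs.

```python
from typing import Callable, Optional, Sequence


def fold_default(name: str) -> str:
    """Locale-independent fold (assumption: plain str.lower())."""
    return name.lower()


def fold_turkish(name: str) -> str:
    """Turkish-style fold: I -> ı and İ -> i before lowercasing the rest."""
    return name.replace("I", "\u0131").replace("\u0130", "i").lower()


def lookup(entries: Sequence[str], name: str,
           fold: Callable[[str], str]) -> Optional[str]:
    """Return the stored entry equal to `name` under `fold`, or None."""
    wanted = fold(name)
    for entry in entries:
        if fold(entry) == wanted:
            return entry
    return None


entries = ["MAIL.TXT"]  # toy directory contents

print(lookup(entries, "mail.txt", fold_default))       # found
print(lookup(entries, "mail.txt", fold_turkish))       # not found
print(lookup(entries, "ma\u0131l.txt", fold_turkish))  # found via dotless ı
```

The same name, "mail.txt", resolves to an existing file under one fold and to nothing under the other.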

This leads to a problem where a user or process in one locale should get different results from the kernel than a user or process in another. This has traditionally been viewed as a rathole, and I've seen many situations where OS X behaves in bizarre ways due to this sort of thing.

The proposal here seems to be to push the rules into the filesystem or directory, which effectively means having locale behavior independent of the user / process. That in turn means we will get a fun matrix of file name locale vs user locale. I'm not a fan.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds