Case-insensitive filesystem lookups

Posted May 24, 2018 10:34 UTC (Thu) by epa (subscriber, #39769)
In reply to: Case-insensitive filesystem lookups by Sesse
Parent article: Case-insensitive filesystem lookups

What I mean is, the more you get into these distinctions, the further you move from what makes case insensitivity useful to start with. I appreciate the convenience of having a file called Sandia.txt on disk and being able to load it by the name sandia.txt, so I can save the effort of pressing the shift key or remembering exactly what the capitalization was. I would be less happy to get a file-not-found error because it was called Sandía.txt and I forgot to include the accent on the letter i. But then, in Turkish the distinction between i and ı is probably a lot stronger than the difference an accent makes in Spanish.
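To make the Sandia.txt example concrete, here is a minimal Python sketch of the two levels of matching being contrasted. The function names are made up for illustration, and stripping combining marks after NFD normalization is only one rough way to approximate "ignore accents", not an established filesystem rule.

```python
import unicodedata


def fold_case(name: str) -> str:
    """Locale-independent Unicode case folding only."""
    return name.casefold()


def fold_case_and_accents(name: str) -> str:
    """Case folding plus removal of combining marks (an assumed policy)."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Plain case folding handles the shift-key case...
print(fold_case("Sandia.txt") == fold_case("sandia.txt"))  # True
# ...but not the forgotten accent.
print(fold_case("Sandía.txt") == fold_case("sandia.txt"))  # False
# Stripping combining marks makes the accented name match too.
print(fold_case_and_accents("Sandía.txt") == fold_case_and_accents("sandia.txt"))  # True
# Dotless ı has no decomposition, so it still differs from i.
print(fold_case_and_accents("ı") == fold_case_and_accents("i"))  # False
```

Each extra rule makes more names equivalent, and whether a given equivalence is welcome depends on the language of the person typing.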

All in all it's a knottier problem than it appears (https://bugzilla.mozilla.org/show_bug.cgi?id=202251 has been going on for 15 years) and I sympathize with the view that these things should be handled in the user interface, not the filesystem. If you have to put locale code in the filesystem itself you've surely taken a wrong turning.



Case-insensitive filesystem lookups

Posted May 24, 2018 17:45 UTC (Thu) by excors (subscriber, #95769)

> I sympathize with the view that these things should be handled in the user interface, not the filesystem. If you have to put locale code in the filesystem itself you've surely taken a wrong turning.

In many cases, I think filenames really are a UI concept that is being used directly as a core part of the filesystem (the disk format plus the associated APIs and protocols like SMB), which feels like a serious layering violation. When a user saves a document, they give it a human-readable name so they can find it later in a list of all their saved documents. They don't care if it's stored with that name as its filename, or if it's stored as "cff5f247-64bd-4066-ab2f-66ff8aed2322.doc" and the name is in some metadata, or if it's stored in a special database and not as a separate file at all - the UI could be the same for all of those. But since we choose to implement it with human-readable filenames, the UI is complicated by filesystem restrictions (why can't the user put "/" in a document name?), and the filesystems(/APIs/protocols/etc) are complicated by UI issues (Unicode, case sensitivity, locale dependence, etc). It seems particularly bad given that Unicode changes over time, and locales differ between users, while filesystems are persistent and shared - there's a fundamental mismatch there.

Surely there must be a better way to design the system, if legacy compatibility didn't matter, where the implementation details of storing and referencing files are more cleanly separated from the UI concept of naming files? (Though of course legacy compatibility does matter more than almost anything else, so this is hypothetical and probably pointless.)
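As a rough illustration of that separation, the sketch below stores each document under an opaque UUID-based filename and keeps the user-visible name in a small index. `DocumentStore`, the `index.json` file, and the `.doc` suffix are all hypothetical choices made for this example, not a description of any existing system.

```python
import json
import tempfile
import uuid
from pathlib import Path


class DocumentStore:
    """Hypothetical store: opaque on-disk names, display names in an index."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.index_path = root / "index.json"
        self.index: dict[str, str] = {}  # display name -> opaque filename
        if self.index_path.exists():
            self.index = json.loads(self.index_path.read_text(encoding="utf-8"))

    def save(self, display_name: str, data: bytes) -> str:
        # Reuse the existing opaque name when the document is saved again.
        stored = self.index.get(display_name) or f"{uuid.uuid4()}.doc"
        (self.root / stored).write_bytes(data)
        self.index[display_name] = stored
        self.index_path.write_text(
            json.dumps(self.index, ensure_ascii=False), encoding="utf-8"
        )
        return stored

    def load(self, display_name: str) -> bytes:
        return (self.root / self.index[display_name]).read_bytes()


store = DocumentStore(Path(tempfile.mkdtemp()))
# The display name may contain "/" because it never reaches the filesystem.
stored_as = store.save("Budget 2018/2019 (draft)", b"hello")
print(stored_as)                                # a UUID-based name ending in .doc
print(store.load("Budget 2018/2019 (draft)"))   # b'hello'
```

How display names are compared (case, accents, locale) then becomes a decision for the index and the UI, and the filesystem only ever sees ASCII identifiers.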

(There are other cases where filenames aren't UI, they're well-known identifiers like "/etc/passwd" or "c:\autoexec.bat" - the name is needed as a portable way for programs to refer to a particular file. But they have very different requirements to user-chosen document names, e.g. ASCII is probably fine, and it's not obvious that the same solution should be used for them.)

Case-insensitive filesystem lookups

Posted May 25, 2018 19:03 UTC (Fri) by drag (guest, #31333)

> Surely there must be a better way to design the system, if legacy compatibility didn't matter, where the implementation details of storing and referencing files are more cleanly separated from the UI concept of naming files?

Since Unix file names can be arbitrary strings, just use a hash of the file as the name it is stored under in the file system. Then you manage names at the application layer by providing a handy dandy API for everybody to use.
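A minimal sketch of that idea, under some assumptions: "hash of the file" is taken here to mean a SHA-256 digest of the contents, and `HashedStore` with its pluggable `fold` function is a hypothetical API invented for illustration. The name table lives only in memory in this sketch.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class HashedStore:
    """Hypothetical sketch: data is stored under its digest, names live above."""

    def __init__(self, root: Path, fold: Callable[[str], str] = str.casefold) -> None:
        self.root = root
        self.fold = fold                  # name-matching policy chosen by the application
        self.names: dict[str, str] = {}   # folded name -> hex digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        (self.root / digest).write_bytes(data)  # on-disk name is plain ASCII hex
        self.names[self.fold(name)] = digest
        return digest

    def get(self, name: str) -> bytes:
        return (self.root / self.names[self.fold(name)]).read_bytes()


store = HashedStore(Path(tempfile.mkdtemp()))
store.put("Sandia.txt", b"notes")
print(store.get("SANDIA.TXT"))  # b'notes'
```

The filesystem stays strictly byte-exact, and swapping `fold` for a different function changes the matching rules without touching anything on disk.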

Because just imagine that instead of one locale you have to make insensitivity work for ALL locales. A lot of Linux file systems house data that is globally sourced, with names drawn from dozens, if not hundreds, of different languages.

Good luck making that work at the file-system layer.

I mean: what are you going to do?

The only remote chance of making it work in a case-insensitive manner is to have the locale of each file embedded right there in the file system's metadata, so it can be managed the way it was intended. And then what are you going to do when an English-speaking user from North America edits a file somebody in Greece made? Change the locale? Make the insensitivity work differently, or force the English user to understand the character set used by the person from Greece? How are you going to deal with file names that don't conflict in the original locale, but do after somebody edits it?
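The conflict question can be shown with a small Python sketch. Python has no built-in locale-aware case mapping, so `fold_turkish` below is a hand-written approximation of the Turkish I/ı and İ/i pairing, assumed purely for illustration; `collisions` is likewise a hypothetical helper.

```python
from typing import Callable


def fold_default(name: str) -> str:
    """Locale-independent lowercasing: I pairs with i."""
    return name.lower()


def fold_turkish(name: str) -> str:
    """Approximate Turkish rules: I pairs with ı, and İ pairs with i."""
    return name.replace("I", "ı").replace("İ", "i").lower()


def collisions(names: list[str], fold: Callable[[str], str]) -> list[tuple[str, str]]:
    """Return pairs of names that become identical under the given folding."""
    seen: dict[str, str] = {}
    clashes = []
    for name in names:
        key = fold(name)
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes


names = ["FILE.txt", "file.txt"]
print(collisions(names, fold_default))  # [('FILE.txt', 'file.txt')]
print(collisions(names, fold_turkish))  # []
```

The same two names are one file under one set of rules and two files under the other, so a directory that is consistent for one user can contain a clash for the next.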

So the choice is really:

1. Have a case-sensitive file system that is simple, robust, and fast, and that works under all circumstances.

2. Have a case insensitive system with massive amounts of extra code and logic that will never actually have a chance of working.

YES, having a case-sensitive file system is bad UI. But it's impossible to make it actually work otherwise.

Therefore: if you are looking for a very good user interface, exposing a Unix-style file system to the user is not a good solution. You have to do something else.

Case-insensitive filesystem lookups

Posted May 26, 2018 15:48 UTC (Sat) by eru (subscriber, #2753)

> But then, in Turkish the distinction between i and ı is probably a lot stronger than the difference of an accent in Spanish.

I don't know how it is in Turkish, but in my native Finnish you cannot be careless with the dieresis on top of "a" or "o". Dropping it can change a word into a different word. For example, "sää" and "saa" are both valid Finnish words with entirely different meanings. Of course, humans can usually figure out words with omitted dots from context, the same way one can mentally correct other kinds of misspellings.

By the way, someone jokingly mentioned making "v" and "w" equivalent. Actually the normal alphabetical ordering rules of Finnish specify precisely that. However, Linux "ls", "bash" and so on under the Finnish locale do not obey this particular rule. Probably a good thing, I won't be filing a bug report...

Case-insensitive filesystem lookups

Posted May 27, 2018 9:39 UTC (Sun) by epa (subscriber, #39769)

Even in English the difference of upper and lower case can change one word to another, from polish to Polish.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds