Of bytes and encoded strings
Posted Jan 27, 2014 17:16 UTC (Mon) by njwhite (guest, #51848)
Posted Jan 31, 2014 12:06 UTC (Fri) by tialaramex (subscriber, #21167)
The Linux kernel does not care whether you have files whose names are canonically equivalent in Unicode, so long as each name's byte sequence is distinct. This is convenient for programmers, but it means that you, the user, may know precisely what a file is called and still be unable to specify that name to Linux without trial and error, because you have to guess which of potentially many Unicode "spellings" was used. Suppose the file is named café: was that last character U+00E9, or U+0065 followed by U+0301? Unicode says the two are equivalent, but Linux treats every name as an uninterpreted series of bytes, so calling open() with the wrong "spelling" will fail, whereas on OS X it would always succeed.
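To make the café case concrete, here is a minimal Python sketch. The helper names `same_name` and `find_spellings` are made up for illustration, and comparing in NFC form is just one reasonable choice, not something the kernel or any particular tool does for you.

```python
import os
import unicodedata

composed = "caf\u00e9"      # é as a single code point, U+00E9
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The two spellings encode to different bytes, so a byte-oriented
# kernel sees two unrelated names.
composed_bytes = composed.encode("utf-8")      # b'caf\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")  # b'cafe\xcc\x81'


def same_name(a, b):
    """Hypothetical helper: compare two names after NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_spellings(directory, wanted):
    """Hypothetical helper: list the entries in `directory` that are
    canonically equivalent to `wanted`, whatever bytes are on disk."""
    return [entry for entry in os.listdir(directory) if same_name(entry, wanted)]
```

Something like `find_spellings(".", "café")` replaces the trial and error with a directory scan: it returns the exact on-disk spelling (or spellings, since on Linux both can exist side by side), which can then be passed to open().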
In an OS that's case insensitive, the Apple choice here makes sense: you're already carrying around huge case conversion tables, so why not do full normalisation while you're at it? So what if your filesystem is now a tiny bit slower? The vast majority of your customers will never notice. But it does mean that new precomposed characters in Unicode are a big problem, because either you can never support the new Unicode release, or the set of filenames permitted to exist on disk changes from one release to another, which is a nightmare.
Posted Feb 2, 2014 23:09 UTC (Sun) by sdalley (subscriber, #18550)
I puzzled for ages over why an attachment file name that I'd copy-pasted from the mail window didn't overwrite the same name that I'd typed into the "Save File" dialog. The copy-paste had copied a "real" hyphen (breaking or non-breaking, I can't recall), while typing produced the old-fashioned ersatz ASCII hyphen. They differed in appearance by about one pixel.
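When two names look identical on screen, printing the code points settles it. A small Python sketch follows; `describe` is a made-up helper name, and the choice of U+2010 and U+2011 is a guess at which "real" hyphen was involved. Note that normalisation would not have helped in this case: U+2010 HYPHEN has no decomposition, so neither NFC nor NFKC turns it into the ASCII hyphen-minus.

```python
import unicodedata


def describe(name):
    """Hypothetical helper: list each character's code point and Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in name]


typed = "a-b"         # ASCII hyphen-minus, as typed on a keyboard
pasted = "a\u2010b"   # U+2010 HYPHEN, assumed here to be what the mail client copied

for label, name in (("typed", typed), ("pasted", pasted)):
    print(label, describe(name))

# Compatibility normalisation folds the non-breaking hyphen into U+2010,
# but nothing maps U+2010 to the ASCII character.
still_different = unicodedata.normalize("NFKC", pasted) != typed
```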