Working with UTF-8 in the kernel
Posted Mar 29, 2019 10:05 UTC (Fri) by smurf (subscriber, #17840)
In reply to: Working with UTF-8 in the kernel by ikm
Parent article: Working with UTF-8 in the kernel
There's also the problem of composites. Unicode, in its infinite wisdom(*), has multiple ways to store the same character (an 'ä' is either a single precomposed code point, U+00E4, inherited from latin1, or an 'a' followed by a combining diaeresis, U+0308 – any sane designer would have stored the modifiers first, but I digress). You need to agree on one form with which to represent file names, because the user typically can't easily generate the other one, and even copy+paste tends to mangle it.
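
To make the two spellings concrete, here is a minimal userspace sketch in Python (not kernel code, and not what any particular filesystem does). The helper name same_name is made up for this illustration; unicodedata.normalize is the standard library call. It assumes you settle on one normalization form, NFC here, before comparing names.

```python
import unicodedata

# Two byte-wise different spellings of the same visible character.
precomposed = "\u00e4"   # 'ä' as a single code point
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

def same_name(a, b, form="NFC"):
    """Hypothetical helper: compare two names after normalizing both
    to the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A plain comparison treats them as different names...
raw_equal = precomposed == decomposed
# ...while a normalized comparison treats them as the same name.
normalized_equal = same_name(precomposed, decomposed)
```

Which form you pick matters less than picking exactly one and applying it on every lookup and every create.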
There's another problem here. Correct case folding is locale dependent. One example: Turkish has an i and an ı (i without the dot). Unicode helpfully has an İ (capital I with a dot) right next to it. Guess what happens when you case-fold these in Turkey vs. everywhere else. (In Turkish, 'I' lowercases to 'ı' and 'i' uppercases to 'İ'; everywhere else, 'I' and 'i' are a pair.)
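
A rough Python sketch of the difference, again in userspace and purely for illustration. Python's str.lower() ignores the locale, so the Turkish rule has to be added by hand; lower_tr and lower_default are hypothetical names, not library functions, and the mapping covers only the two letters discussed here.

```python
# Turkish-specific lowercase mappings, applied before the default rules:
#   'I' (U+0049) -> 'ı' (U+0131, dotless i)
#   'İ' (U+0130) -> 'i' (U+0069)
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})

def lower_tr(s):
    """Hypothetical helper: lowercase using the Turkish dotted/dotless i rule."""
    return s.translate(_TR_LOWER).lower()

def lower_default(s):
    """Locale-independent lowercase, as Python's str.lower() does it."""
    return s.lower()

name = "TITLE"
default_result = lower_default(name)   # 'title'
turkish_result = lower_tr(name)        # 'tıtle', with two dotless i's
```

So two names that fold to the same string under one set of rules fold to different strings under the other, and a filesystem has to pick one set of rules for everybody.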
