A report from the documentation maintainer
Posted Nov 2, 2016 21:28 UTC (Wed) by farnz (subscriber, #17727)
In reply to: A report from the documentation maintainer by nybble41
Parent article: A report from the documentation maintainer
It's not too much to ask, but it's a hard problem. To take a couple of examples, using == for "insensitive comparison":
- Should ß == SS? In a German filename, the answer is "yes", because ß is just a different way to write ss; in an English filename, however, it's a symbol, not a letter.
- Should a == ä? In an English or Afrikaans filename, yes; the diacritic is a pronunciation guide only. In a Swedish filename, no, because ä is a different letter to a.
- Should i == I? In most Latin-alphabet languages, yes; lowercase i is dotted, uppercase I is not. In Turkish, however, the uppercase form of i is İ, not I.
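To make the three cases concrete, here is a minimal Python sketch. It assumes that str.casefold() (Python's implementation of Unicode full case folding) is an acceptable stand-in for "insensitive comparison"; the helper names caseless_equal and strip_marks are hypothetical, and the diacritic stripping is an extra step of my own for illustration, not part of case folding.

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings using Unicode full case folding (locale-independent)."""
    return a.casefold() == b.casefold()


def strip_marks(s: str) -> str:
    """Hypothetical helper: drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# German sharp s: full case folding maps it to "ss".
print(caseless_equal("ß", "SS"))

# Diacritics: case folding alone keeps a and ä distinct;
# treating them as equal needs a separate, lossy step.
print(caseless_equal("a", "ä"))
print(strip_marks("ä") == "a")

# Turkish: default folding pairs I with i and has no way to know
# that a Turkish user expects I to pair with dotless ı instead.
print(caseless_equal("i", "I"))
print(caseless_equal("ı", "I"))
```

None of these calls takes a language parameter, which is the crux: the right answer depends on the language of the filename, and the comparison has no way to know it.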
Unicode makes a decent stab at a solution (section 5.18), but then explicitly calls out Lithuanian and Turkic languages as cases where the default algorithm will not include something that users expect it to include. Further, the Unicode solution is based on the principle that it's better for the algorithm to match things it shouldn't than to miss things it should match. Thus, an English user will be surprised that the glob S* matches ß, but that's better than a German user being surprised when s* does not match ß. Similarly, a Swede is going to be surprised when na* matches nävi, specifically so that an English user isn't surprised when na* doesn't match nävi.
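A rough sketch of that "match too much rather than too little" behaviour in Python follows. It assumes a hypothetical loose_key that case-folds and then strips combining marks, and a hypothetical loose_glob built on the standard fnmatch module. This is an illustration of the principle only, not the algorithm in section 5.18 and not what any real filesystem does.

```python
import fnmatch
import unicodedata


def loose_key(s: str) -> str:
    """Hypothetical comparison key: full case folding, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def loose_glob(pattern: str, name: str) -> bool:
    """Match name against a shell-style pattern after applying loose_key to both."""
    return fnmatch.fnmatchcase(loose_key(name), loose_key(pattern))


# Surprises an English user, keeps a German user happy.
print(loose_glob("S*", "ß"))
print(loose_glob("s*", "ß"))

# Surprises a Swede, keeps an English user happy.
print(loose_glob("na*", "nävi"))
```

All three calls report a match; a strict, codepoint-exact glob would reject every one of them.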
