The kernel and character set encodings
Posted Feb 20, 2004 22:19 UTC (Fri) by spitzak
In reply to: The kernel and character set encodings
Parent article: The kernel and character set encodings
There is no problem with UTF-8 filenames. The bytes should be stored
unchanged, and the same unchanged bytes should be used to look up the
file. It does not matter whether those bytes are a legal UTF-8 string or
not, to say nothing of what normalization form they are in.
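To make that concrete, here is a rough sketch in Python, with a plain dict standing in for a directory. The ByteDirectory class and its method names are made up for illustration and are not any kernel or filesystem interface; the point is only that names are compared byte for byte.

```python
import unicodedata


class ByteDirectory:
    """Toy directory: names are opaque byte strings, compared byte for byte."""

    def __init__(self):
        self._entries = {}

    def create(self, name: bytes, data: bytes) -> None:
        # Store the name exactly as given: no decoding, no normalization.
        self._entries[name] = data

    def lookup(self, name: bytes) -> bytes:
        # Only an identical byte sequence finds the entry.
        return self._entries[name]

    def exists(self, name: bytes) -> bool:
        return name in self._entries


d = ByteDirectory()

# The same visible name in two normalization forms, plus one illegal sequence.
nfc = unicodedata.normalize("NFC", "caf\u00e9").encode("utf-8")
nfd = unicodedata.normalize("NFD", "caf\u00e9").encode("utf-8")
invalid = b"report-\xff.txt"  # not legal UTF-8

d.create(nfc, b"composed")
d.create(nfd, b"decomposed")
d.create(invalid, b"still just bytes")
```

The two forms of the name end up as two separate entries, and the illegal sequence is stored and found like any other name, because nothing in the lookup path ever interprets the bytes.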
Unfortunately there are hordes of people out there who think dumb ideas
like case-insensitivity should be applied at low levels to stuff that
really is binary data. This kind of thinking is what causes complexity,
and complexity causes bugs and security holes.
Any program that takes a string it thinks is UTF-8 and does
ANYTHING other than pass the exact bytes unchanged to another
interface that wants UTF-8 is by definition broken. This simple rule will
completely eliminate all ambiguity about UTF-8.
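A small Python sketch of why the rule matters. The function names pass_through and lossy_cleanup are hypothetical; lossy_cleanup stands for any well-meaning layer that decodes and re-encodes a name on its way through.

```python
def pass_through(name: bytes) -> bytes:
    # The rule: hand the exact bytes on, untouched.
    return name


def lossy_cleanup(name: bytes) -> bytes:
    # A well-meaning "fix": decode, replacing bad bytes with U+FFFD,
    # then re-encode. The result is no longer the name we were given.
    return name.decode("utf-8", errors="replace").encode("utf-8")


first = b"report-\xff.txt"   # not legal UTF-8
second = b"report-\xfe.txt"  # a different name, also not legal UTF-8

kept_first = pass_through(first)
kept_second = pass_through(second)

cleaned_first = lossy_cleanup(first)
cleaned_second = lossy_cleanup(second)
```

With pass_through the two names stay distinct and identical to what came in. With lossy_cleanup both collapse to the same bytes, so two different files now appear to have one name, which is exactly the kind of ambiguity the rule avoids.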