The kernel and character set encodings

Posted Feb 19, 2004 11:18 UTC (Thu) by ibukanov (subscriber, #3942)
In reply to: The kernel and character set encodings by Cato
Parent article: The kernel and character set encodings

> These strings result in exactly the same visual appearance on screen, yet they can't be compared with a byte comparison.

You do not even need Unicode normalization for that. In most fonts the following two lines have exactly the same visual presentation (you have to view the page with UTF-8 encoding, as LWN does not allow РОТ to be entered in HTML comments because of bugs in its recognition of &code; escapes):
POT
РОТ
yet the first uses pure ASCII, while the second uses only Cyrillic characters and means "mouth" in Russian.
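
To make the point concrete, here is a minimal Python sketch (my own illustration, not anything the kernel or LWN does). The names `latin`, `cyrillic`, `describe` and `same_after_normalization` are made up for the example. It assumes the Cyrillic line is U+0420 U+041E U+0422, and it shows that the two strings differ byte for byte and that none of the four standard normalization forms makes them equal.

```python
import unicodedata

# The two look-alike strings from the example above.
latin = "POT"                    # ASCII: U+0050 U+004F U+0054
cyrillic = "\u0420\u041e\u0422"  # Cyrillic: U+0420 U+041E U+0422 (assumed code points)


def describe(s):
    """Return one 'U+XXXX NAME' entry per character of s."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


def same_after_normalization(a, b):
    """True if any standard Unicode normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


if __name__ == "__main__":
    for label, s in (("latin", latin), ("cyrillic", cyrillic)):
        print(label, s.encode("utf-8"), describe(s))
    print("equal after normalization:", same_after_normalization(latin, cyrillic))
```

A byte comparison of the UTF-8 encodings fails, and so does a comparison after normalization, because normalization only unifies different encodings of the same character, not different characters that happen to share a glyph.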

IMHO such examples support the notion that the kernel should not impose any policy on file name encoding: in practice there is always more than one way to encode the same visual presentation, and UTF-8 with Unicode does not help here.