Wheeler: Fixing Unix/Linux/POSIX Filenames
Posted Apr 2, 2009 15:54 UTC (Thu) by
forthy (guest, #1525)
In reply to:
Wheeler: Fixing Unix/Linux/POSIX Filenames by nix
Parent article:
Wheeler: Fixing Unix/Linux/POSIX Filenames
It is actually not that bad. As a collating sequence, ß=ss (i.e. Mass
and Maß sort to the same bin). Except for Austrian telephone books, where
ß follows ss, but comes before st (though St. follows Sankt ;-).
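To make the ß=ss rule concrete, here is a minimal sketch in Python. The helper name collation_key is made up for illustration, and it leans on str.casefold(), which folds "ß" to "ss". It ignores real locale collation entirely, so the Austrian telephone book variant is not covered.

```python
def collation_key(name: str) -> str:
    # Hypothetical helper: casefold() lowercases and maps "ß" to "ss",
    # so "Maß" and "Mass" end up with the same sort key.
    return name.casefold()


names = ["Mast", "Maß", "Mass"]

# "Maß" and "Mass" share a key, so the stable sort keeps their input order.
print(sorted(names, key=collation_key))  # ['Maß', 'Mass', 'Mast']
```

For real sorting you would want a proper collation library with locale tailoring; this only shows the "same bin" idea.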
However, there's a huge mess in the CJK part of UCS: short and long
forms of the same character (sometimes even a special variant for the
Japanese character). This should never have happened; the different forms
of the same character should be encoded in fonts, not in UCS. So far, not
even Mac OS X normalizes these characters, but it is obvious that a
mainland China file called "中国" and a Taiwan file called "中國" not only
mean the same, but they also refer to the same word, and can be
interchanged at will (see for example the Chinese wikipedia entry: the
lemma is the short form, the headline is the long form). And it is not
easy to access long and short forms with usual input methods (mainland
China: Pinyin, Canton: Cantonese Pinyin (gives traditional characters,
but you need to know Cantonese), etc.).
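As a rough illustration of the point about normalization, the Python sketch below checks the two names under all four standard Unicode normalization forms; none of them makes "中国" and "中國" equal, because 国 and 國 are separate code points with no decomposition linking them. The VARIANTS table and fold_variants function are hypothetical, a one-entry stand-in for the kind of mapping a filesystem would need, not an existing API.

```python
import unicodedata

SIMPLIFIED = "中国"   # mainland China short form
TRADITIONAL = "中國"  # Taiwan long form

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def equal_under_any_form(a: str, b: str) -> bool:
    # True if any standard Unicode normalization form makes a and b identical.
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


# Hypothetical, deliberately tiny variant table (assumption for illustration):
# maps a long form to its short form. A real table would have thousands of entries.
VARIANTS = {"國": "国"}


def fold_variants(name: str) -> str:
    # Replace each long-form character listed in VARIANTS with its short form.
    return name.translate(str.maketrans(VARIANTS))


print(equal_under_any_form(SIMPLIFIED, TRADITIONAL))              # False
print(fold_variants(SIMPLIFIED) == fold_variants(TRADITIONAL))    # True
```

Note that a one-to-one table like this is a simplification: some short forms correspond to several long forms, so folding in the other direction is not that easy.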