The kernel and character set encodings
Posted Feb 19, 2004 9:54 UTC (Thu) by
one2team (guest, #7316)
In reply to:
The kernel and character set encodings by Cato
Parent article:
The kernel and character set encodings
« You say that the only practical choices for character encodings are ISO-8859-1 and UTF-8. In fact, there is a vast range of encodings that will work (basically any encoding that doesn't use NUL and '/' for some other purpose than ASCII semantics). For a start there is ISO-8859-*, KOI8-* (for Cyrillic), EUC-JP, Shift-JIS (both popular in Japan), and so on. »
These encodings are mostly useless on a true multi-user system. Why? Because they are all mutually incompatible. A user working in encoding A has no way to read text (including filenames) written by another user working in encoding B. This holds even for closely related encodings (KOI8-U and KOI8-R, for example). Not to mention the poor users who may want to mix two languages (French + Russian, Welsh + Greek, etc.).
The only thing all those encodings are compatible with is English, which restricts the second language to English and English only.
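To make the incompatibility concrete, here is a minimal Python sketch. The filenames are made up for illustration, and the codec names are the ones Python's standard library happens to use. It stores a Ukrainian name as KOI8-U bytes, reads it back as KOI8-R, and then checks that a plain English name survives under several of the encodings mentioned above.

```python
# Illustrative sketch only: both filenames are hypothetical examples.
name = "їжак.txt"  # Ukrainian; "ї" exists in KOI8-U but not in KOI8-R

# User A stores the name as KOI8-U bytes; user B assumes KOI8-R.
raw = name.encode("koi8_u")
seen_by_b = raw.decode("koi8_r", errors="replace")

# The two users do not see the same name, even with such close encodings.
names_match = (seen_by_b == name)
print(names_match)

# A plain English name decodes identically under all of these encodings.
ascii_raw = "README.txt".encode("ascii")
encodings = ["koi8_r", "koi8_u", "iso8859_1", "iso8859_7", "euc_jp"]
ascii_views = {enc: ascii_raw.decode(enc) for enc in encodings}
print(set(ascii_views.values()))
```

Only the English name comes back unchanged everywhere; the Ukrainian one already breaks between two encodings that differ in a handful of letters.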
One could argue that userspace would just have to use a Greek encoding for Greek filenames, a Russian one for Russian filenames, and so on. But the crux of the problem is that userspace has no way to request or guess which encoding was used to write a filename, since the kernel neither enforces any particular encoding nor provides encoding information to userspace.
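A rough sketch of why guessing does not work, again in Python. The helper `plausible_decodings` and the candidate list are hypothetical, chosen only for this example. The same four bytes decode without any error under several single-byte encodings, so nothing tells the application which guess was right.

```python
def plausible_decodings(raw, candidates):
    """Return {encoding: text} for every candidate that decodes raw without error."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This candidate is ruled out; the others remain equally plausible.
            pass
    return results


# Bytes of a Russian filename as a KOI8-R user would have written them.
raw_name = "файл".encode("koi8_r")

candidates = ["koi8_r", "koi8_u", "iso8859_1", "iso8859_5", "iso8859_7", "utf_8"]
guesses = plausible_decodings(raw_name, candidates)

# Print only the encoding names, to stay independent of terminal encoding.
print(sorted(guesses))
```

Several candidates succeed, each yielding a different string, and the bytes themselves carry no hint as to which one the writer intended.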
One additional problem is that some byte strings are invalid UTF-8 and cause applications to barf if they try to decode them.
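The failure mode looks roughly like this in Python. The helper `display_name` is a hypothetical name, and falling back to replacement characters is just one possible policy, not something the kernel or any particular application is claimed to do.

```python
def display_name(raw):
    """Decode a filename as UTF-8, falling back to replacement characters."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Strict decoding failed: the bytes are not valid UTF-8.
        # Substitute U+FFFD for each undecodable sequence instead of crashing.
        return raw.decode("utf-8", errors="replace")


# A filename written by a KOI8-R user: not valid UTF-8.
legacy_raw = "файл".encode("koi8_r")

try:
    legacy_raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(strict_failed)
print(ascii(display_name(legacy_raw)))
```

An application that decodes strictly and does not catch the error is the one that barfs; the fallback keeps it running but loses the original name.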