A few problems

Posted Feb 20, 2004 17:35 UTC (Fri) by Ross (guest, #4065)
In reply to: Method for (mostly) kernel-independant Unicode filenames? by Max.Hyre
Parent article: The kernel and character set encodings

1) What filesystems support per-file character set selection? Which ones
can handle embedded NUL characters? What about maximum filename length?
Limits are set in bytes, and the number of bytes a character takes depends
on the encoding, so you can no longer state the limit in characters.
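
To make the length point concrete, here is a rough sketch in Python. The
255-byte limit is an assumption (it is a common NAME_MAX on Linux
filesystems, but it varies), and encoded_length() is just a helper made up
for the illustration.

```python
# Assumed limit: many Linux filesystems cap a name at 255 bytes,
# but this is filesystem-dependent.
NAME_MAX = 255

def encoded_length(name: str, encoding: str) -> int:
    """Return how many bytes the name occupies in the given encoding."""
    return len(name.encode(encoding))

# 120 characters, the same in every encoding.
name = "日本語" * 40

for enc in ("utf-8", "utf-16-le", "euc-jp"):
    size = encoded_length(name, enc)
    verdict = "fits" if size <= NAME_MAX else "too long"
    print(f"{enc}: {len(name)} chars, {size} bytes, {verdict}")
```

The same 120-character name fits under the assumed limit in two of the
encodings and not in the third, so a limit expressed in characters means
nothing unless the encoding is known.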

2) There are a whole lot of system calls receiving or returning filenames
(the libc routines like fopen() are a different layer): open(),
getdirentries(), readlink(), stat(), lstat(), rename(), unlink(), link(),
mknod(), chown(), chmod(), utime(), mount() etc. (not to mention Unix
domain sockets). These would all have to change. But POSIX defines them
as taking certain parameters and having certain return types. So you
either have to drop Unix compatibility or you have to add duplicate
versions of each one, much like Microsoft did when converting to UCS-2.

3) What about Unix applications and old Linux apps? They won't even
compile if you change the prototypes. If you leave the prototypes alone
and make the old system calls default to UTF-8 or something, you still
have to make them work with filenames in other encodings.

4) It won't fix the policy problem without involving the kernel anyway.
What about case insensitivity, canonicalizing characters, path delimiters
etc.? You removed the need for the terminating NUL but what about the "/"
character? What about character sets with no slash, or with multiple
slashes? The kernel will need to know what these are and that will depend
on the character set.
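
A quick way to see the "/" and NUL problem is to look at the raw bytes a
name turns into. This is only a sketch: reserved_bytes() is a made-up
helper, and the encodings are ones Python happens to ship, picked as
examples.

```python
SLASH = 0x2F  # the byte the kernel treats as the path separator
NUL = 0x00    # the byte that terminates a C string

def reserved_bytes(name: str, encoding: str) -> dict:
    """Report whether the encoded name contains a 0x2F or 0x00 byte."""
    data = name.encode(encoding)
    return {"slash": SLASH in data, "nul": NUL in data}

# U+2F00 has nothing to do with "/", yet UTF-16 encodes it as 2F 00.
print(reserved_bytes("\u2f00", "utf-16-be"))
# The same character in UTF-8 uses neither byte.
print(reserved_bytes("\u2f00", "utf-8"))
# In EBCDIC (cp037) a real "/" is not byte 0x2F at all.
print(reserved_bytes("a/b", "cp037"))
```

A kernel that only scans for byte 0x2F would split the first name in the
wrong place and miss the separator in the last one, which is why it would
need to know the character set.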