Wheeler: Fixing Unix/Linux/POSIX Filenames
Posted Mar 25, 2009 18:36 UTC (Wed) by njs (subscriber, #40338)
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames
The section on Unicode-in-the-filesystem seemed quite incomplete. We know this can work, since the most widely used Unix *already* does it: OS X basically extends POSIX to say "all those char * pathnames you give me, those are UTF-8". However, there are a lot of complexities not mentioned here. You need to worry about Unicode normalization, i.e. whether or not to allow different files to have names that contain the same characters but have different bytestring representations. And if there is any normalization, then you need a new API to say "hey filesystem, what did you actually call that file I just opened?" (OS X has this, but it's very well hidden). And so on.
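To make the normalization issue concrete, here is a rough Python sketch of how one visible name can have two different UTF-8 bytestrings. The helper names (utf8_forms, same_name_after_normalization) are made up for illustration and are not any filesystem's API; the choice of NFD as the comparison form is also just an assumption for the example.

```python
import unicodedata


def utf8_forms(name: str) -> dict[str, bytes]:
    """Return the UTF-8 bytes of `name` in composed (NFC) and decomposed (NFD) form."""
    return {
        form: unicodedata.normalize(form, name).encode("utf-8")
        for form in ("NFC", "NFD")
    }


def same_name_after_normalization(a: bytes, b: bytes, form: str = "NFD") -> bool:
    """Hypothetical check a normalizing filesystem might apply to two byte names.

    Both names are assumed to be valid UTF-8; `form` is an illustrative choice.
    """
    ua = unicodedata.normalize(form, a.decode("utf-8"))
    ub = unicodedata.normalize(form, b.decode("utf-8"))
    return ua == ub


# "cafe" with an acute accent on the final letter, written with an escape
# so the source file's own encoding does not matter.
forms = utf8_forms("caf\u00e9")

print(forms["NFC"])  # one code point for the accented letter
print(forms["NFD"])  # plain "e" followed by a combining accent
print(forms["NFC"] == forms["NFD"])
print(same_name_after_normalization(forms["NFC"], forms["NFD"]))
```

A filesystem that compares raw bytes would treat these as two distinct files; one that normalizes would treat them as the same file, which is exactly why you then need some way to ask what name actually got stored.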
But these problems all exist now, they're just overshadowed by the terrible awful even worse problems caused by filenames all being in random unguessable charsets. I really dislike many things about Apple, but in this case we could do worse than to sit down and steal (with appropriate modification) most of the stuff in http://developer.apple.com/technotes/tn/tn1150.html#Unico...
Maybe the ext4 folks could add Unicode filenames as a mount option -- they haven't done anything controversial lately ;-).
