Wheeler: Fixing Unix/Linux/POSIX Filenames
Posted Mar 28, 2009 16:41 UTC (Sat) by zooko
In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames
"The filenames you get from NT will be sequences of 16-bit code units. They might be Unicode. The filenames you get from Linux will be sequences of 8-bit code units. They might be Unicode (in this case UTF-8) too."
I don't think this is true. The bytes in the filenames in NT are defined to be UTF-16 encodings of characters. The bytes in the filenames in Mac are defined to be UTF-8 encodings. The bytes in the filenames in Linux are not defined to be any particular encoding. It isn't just a definitional issue -- the result is that reading a filename from the Windows or Mac filesystem can't cause you to lose information -- the filename you get in your application is guaranteed to be the same as the filename that is stored in the filesystem. On the other hand, when you read a filename from the filesystem in Linux, then you need to decide how to attempt to decode it, and there is no way to guarantee that your attempt won't corrupt the filename.
Please correct me if I'm wrong, because I'm intending to make the Tahoe p2p disk-sharing app depend on this guarantee from Windows (and from Mac), and to make it painfully work around the lack of this guarantee in Linux.
to post comments)