Wheeler: Fixing Unix/Linux/POSIX Filenames
Wheeler: Fixing Unix/Linux/POSIX Filenames
Posted Mar 28, 2009 18:54 UTC (Sat) by foom (subscriber, #14868)In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames by zooko
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames
That's not actually true. The windows APIs take arrays of 16-bit "things". Those are supposed to be
UTF-16, but none of the APIs will check that. So, you can easily create invalid surrogate pair
sequences. Now, it's a *lot* easier to ignore this issue on windows than on linux, because:
a) The set of invalid sequences in UTF-16 is a lot smaller than in UTF-8.
b) Nobody creates those by accident. It won't happen just because you set your LOCALE wrong.
c) the windows Unicode APIs are all 16-bit unicode, so they never try decoding the surrogate pair
sequences anyways
d) Even UTF-16->UTF-32 decoders often decode a lone surrogate pair in UTF-16 into a lone
surrogate pair value in UTF-32 (even though it's theoretically not supposed to do that).
