|
|
Log in / Subscribe / Register

Wheeler: Fixing Unix/Linux/POSIX Filenames

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 18:54 UTC (Sat) by foom (subscriber, #14868)
In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames by zooko
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

> The bytes in the filenames in NT are defined to be UTF-16 encodings of characters.

That's not actually true. The windows APIs take arrays of 16-bit "things". Those are supposed to be
UTF-16, but none of the APIs will check that. So, you can easily create invalid surrogate pair
sequences. Now, it's a *lot* easier to ignore this issue on windows than on linux, because:

a) The set of invalid sequences in UTF-16 is a lot smaller than in UTF-8.
b) Nobody creates those by accident. It won't happen just because you set your LOCALE wrong.
c) the windows Unicode APIs are all 16-bit unicode, so they never try decoding the surrogate pair
sequences anyways
d) Even UTF-16->UTF-32 decoders often decode a lone surrogate pair in UTF-16 into a lone
surrogate pair value in UTF-32 (even though it's theoretically not supposed to do that).


to post comments


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds