> The bytes in the filenames in NT are defined to be UTF-16 encodings of characters.
That's not actually true. The Windows APIs take arrays of 16-bit "things" (code units). Those are supposed to be
UTF-16, but none of the APIs check that, so you can easily create invalid sequences such as unpaired
surrogates. Now, it's a *lot* easier to ignore this issue on Windows than on Linux, because:
a) The set of invalid sequences in UTF-16 is a lot smaller than in UTF-8 (the only ill-formed case is an unpaired surrogate).
b) Nobody creates those by accident. It won't happen just because you set your locale wrong.
c) The Windows Unicode APIs all work on 16-bit code units, so they never try to decode the surrogate pair
sequences anyway.
d) Even UTF-16->UTF-32 decoders often decode a lone surrogate in UTF-16 into the same lone
surrogate value in UTF-32 (even though, strictly speaking, they aren't supposed to do that).
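To make points a) and d) concrete, here is a minimal Python sketch that treats a filename the way the Windows wide-character APIs do: as a plain list of 16-bit code units. The function names `decode_utf16_units` and `unpaired_surrogates` are hypothetical, not part of any Windows or Python API. The first decodes leniently, passing lone surrogates through as code points the way point d) describes; the second reports the only kind of ill-formed UTF-16 there is, an unpaired surrogate.

```python
from typing import List, Sequence


def _is_pair(units: Sequence[int], i: int) -> bool:
    """True if units[i] is a high surrogate immediately followed by a low surrogate."""
    return (
        0xD800 <= units[i] <= 0xDBFF
        and i + 1 < len(units)
        and 0xDC00 <= units[i + 1] <= 0xDFFF
    )


def decode_utf16_units(units: Sequence[int]) -> List[int]:
    """Leniently decode 16-bit code units into code points (hypothetical helper).

    A valid high/low surrogate pair becomes one supplementary code point.
    A lone surrogate is passed through unchanged instead of raising an error.
    """
    out: List[int] = []
    i = 0
    while i < len(units):
        if _is_pair(units, i):
            # Combine high and low surrogate into a code point >= U+10000.
            hi, lo = units[i], units[i + 1]
            out.append(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        else:
            # BMP character, or a lone surrogate passed through as-is.
            out.append(units[i])
            i += 1
    return out


def unpaired_surrogates(units: Sequence[int]) -> List[int]:
    """Return indices of surrogates that are not part of a valid pair (hypothetical helper)."""
    bad: List[int] = []
    i = 0
    while i < len(units):
        if _is_pair(units, i):
            i += 2
        else:
            if 0xD800 <= units[i] <= 0xDFFF:
                bad.append(i)
            i += 1
    return bad
```

For example, the units `[0x0061, 0xD83D, 0xDE00, 0xD800, 0x0062]` ("a", U+1F600 as a pair, a lone high surrogate, "b") decode to `[0x61, 0x1F600, 0xD800, 0x62]`, and `unpaired_surrogates` flags only index 3. A strict decoder would reject the whole name at that point instead.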