
Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 17:45 UTC (Sat) by tialaramex (subscriber, #21167)
In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames by zooko
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

To be quite clear about what I'm saying and what I'm not saying:

* I am saying that you can't guarantee that the filenames Windows gives you are all legal UTF-16 Unicode strings. Windows makes no such promise. Programs that bypass Win32 (including Win32 programs which also call the native low-level APIs) may create files which don't obey the convention, and filenames read from disk or from a network filesystem are not checked to see if they are valid UTF-16.

* I am NOT saying that there are people running Windows whose filenames are all in SJIS or ISO-8859-8 or even Windows codepage 1252. That would be silly because those encodings (and indeed practically all legacy encodings) are built from 8-bit units, while all filenames in Windows are sequences of 16-bit units. When a Windows filename "means something" at all, the meaning will be encoded as UTF-16, or perhaps if you're really unlucky, UCS-2.
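
To make "legal UTF-16" concrete, here is a minimal Python sketch of the check that, per the first point above, nobody performs on your behalf. The function name `is_well_formed_utf16` is made up for illustration; it takes a filename as a list of 16-bit code units and only verifies that surrogates are correctly paired, nothing more.

```python
def is_well_formed_utf16(units):
    """Return True if a sequence of 16-bit code units is well-formed UTF-16.

    Hypothetical helper for illustration: it only checks surrogate pairing.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be immediately followed by a low surrogate.
            if i + 1 >= n or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is invalid.
            return False
        else:
            i += 1
    return True


# "AB" and a correctly paired surrogate sequence are fine.
ok_plain = is_well_formed_utf16([0x0041, 0x0042])
ok_pair = is_well_formed_utf16([0xD83D, 0xDE00])
# A lone high surrogate followed by "A" is a perfectly possible 16-bit
# filename, but it is not a legal UTF-16 string.
bad_lone_high = is_well_formed_utf16([0xD800, 0x0041])
bad_lone_low = is_well_formed_utf16([0xDC00])
```

A name that fails this check still identifies a real file, so a program needs some way to carry it around unchanged rather than rejecting or "repairing" it.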

So if your problem is "People keep running my program with crazy locale settings and legacy encodings of filenames", well, you have my sympathy. Yes, you will need to handle this on Linux (even if only by writing a FAQ entry telling them to switch to UTF-8), and you might get away without handling it on Windows.

But if the problem is "My program blindly assumes filenames are legal Unicode strings", then you're in a bad way. Stop doing that: it's a bug on Linux and Windows at least, and IMO most likely on Mac OS X too (though their documentation claims otherwise).
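
As one way to avoid that bug on the Linux side, here is a rough Python sketch that treats a filename as raw bytes and never assumes it decodes cleanly. The helper names (`name_to_text`, `text_to_name`, `display_name`) are invented for this example. It relies on Python's standard `surrogateescape` and `backslashreplace` error handlers, and it assumes UTF-8 is the convention you hope for, not one you can rely on.

```python
def name_to_text(raw: bytes) -> str:
    """Decode a filename, smuggling undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    """Inverse of name_to_text: recover the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def display_name(raw: bytes) -> str:
    """A printable form for logs or UI; NOT suitable for reopening the file."""
    return raw.decode("utf-8", errors="backslashreplace")


# "cafe" with a Latin-1 e-acute (0xE9): a legal Linux filename, invalid as UTF-8.
legacy = b"caf\xe9.txt"

internal = name_to_text(legacy)
round_tripped = text_to_name(internal)  # identical to the original bytes
shown = display_name(legacy)            # 'caf\\xe9.txt', safe to print
```

The point of the split is that the lossless form is used for every file operation, and the lossy, printable form is used only when showing the name to a person.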




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds