
Thoughts from LWN's UTF8 conversion


Posted Feb 3, 2012 16:18 UTC (Fri) by cortana (subscriber, #24596)
In reply to: Thoughts from LWN's UTF8 conversion by Cyberax
Parent article: Thoughts from LWN's UTF8 conversion

It's wonderful indeed (at least in its v3 incarnation)--but it's not sufficient. You can't open an ifstream with a boost::filesystem::path, only with a char*, and the fundamental problem with Windows is that its API accepts char* only as a backwards-compatibility convenience; there is no way to represent every valid path as a char* string.



Thoughts from LWN's UTF8 conversion

Posted Feb 3, 2012 16:24 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

I seem to remember that there's a wrapper somewhere in Boost that does that. Personally, I have a simple wrapper that calls fopen or _wfopen depending on whether _WIN32 is defined.

(oh, and C++ iostreams are a pile of $SWEAR_WORD)
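A wrapper along the lines Cyberax describes might look like the following. It is a sketch, not his actual code: utf8_fopen is a hypothetical name, and it assumes callers always pass file names encoded as UTF-8. On Windows both the name and the mode string are converted to UTF-16 with MultiByteToWideChar and handed to _wfopen; elsewhere the bytes go straight to fopen.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from any library): open a file whose name is UTF-8.
// On Windows the name and mode are converted to UTF-16 and passed to _wfopen;
// elsewhere they go straight to fopen, which assumes (as on Linux) that the
// filesystem treats names as opaque byte strings.
std::FILE* utf8_fopen(const std::string& utf8_path, const std::string& mode) {
#ifdef _WIN32
    // Convert UTF-8 to UTF-16; yields an empty string on invalid UTF-8.
    auto widen = [](const std::string& s) -> std::wstring {
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    s.c_str(), -1, nullptr, 0);
        if (n <= 1) return std::wstring();  // conversion error, or empty input
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            s.c_str(), -1, &w[0], n);
        w.resize(static_cast<std::size_t>(n - 1));  // drop the copied NUL
        return w;
    };
    std::wstring wpath = widen(utf8_path);
    std::wstring wmode = widen(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    return std::fopen(utf8_path.c_str(), mode.c_str());
#endif
}
```

This only helps for files your own code opens; any third-party function that takes a char* name still goes through the narrow API.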

Thoughts from LWN's UTF8 conversion

Posted Feb 3, 2012 16:40 UTC (Fri) by cortana (subscriber, #24596)

You're probably thinking of boost::filesystem::{i,o,}fstream; however, you can only use it with wchar_t* paths if you are building with MS Visual C++, because it relies on std::{i,o,}fstream having overloads that take const wchar_t* as well as the standard const char* ones, and only MSVC provides those non-standard extensions.

iostreams are only one example from the standard library. The C standard fopen function is another: it takes a char* and not a wchar_t*, so if you use it on Windows you're screwed, because the name gets interpreted in the current ANSI code page. Working around this by, say, calling _wfopen if _WIN32 is defined only gets you so far--as soon as you hit a library that has a foo_open function taking char* and not wchar_t*, you hit the same problem.

Summary: if you write a library that deals with files, you are only allowed to take filesystem paths as arguments to your functions if you've ported the library to several different platforms and made sure it can deal with Chinese and Runic filenames at the same time. :)

Thoughts from LWN's UTF8 conversion

Posted Feb 3, 2012 16:50 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

>Working around this by, say, calling _wfopen if _WIN32 is defined only gets you so far--as soon as you hit a library that has a foo_open function taking char* and not wchar_t*, you hit the same problem.

Yeah, so I mostly use libraries that can accept file descriptors or a FILE* instead of file names. That actually covers quite a lot of functionality.

>Summary: if you write a library that deals with files, you are only allowed to take filesystem paths as arguments to your functions if you've ported the library to several different platforms and made sure it can deal with Chinese and Runic filenames at the same time. :)

Well, no arguments here. I'd also add working with filenames in an 8-bit encoding that is not the same as the system encoding.
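Cyberax's approach of preferring libraries that take an already-open FILE* can be sketched as follows. count_lines is a hypothetical stand-in for such a library function, not a real API; the point is that the library never sees or interprets a file name, so the caller alone decides how the name is encoded and whether fopen, _wfopen, or something else opens it.

```cpp
#include <cstdio>

// Hypothetical library entry point: it takes an already-open FILE* rather
// than a filename, so the library never has to interpret the name's encoding.
long count_lines(std::FILE* f) {
    long lines = 0;
    int c;
    // Read byte by byte until end of file, counting newline characters.
    while ((c = std::fgetc(f)) != EOF) {
        if (c == '\n') ++lines;
    }
    return lines;
}
```

On Windows, the caveat in the following reply applies: the FILE* has to come from the same C runtime the library was built against.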

Thoughts from LWN's UTF8 conversion

Posted Feb 3, 2012 17:19 UTC (Fri) by cortana (subscriber, #24596)

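Be careful with FILE*. Your application might use a different C runtime (say, msvcr90.dll versus msvcr100.dll) from the DLL you're passing the FILE* to, and a FILE* from one runtime is meaningless to the other (Windows! It's great!). Best to stick with HANDLE, I think. :)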

Thoughts from LWN's UTF8 conversion

Posted Feb 3, 2012 19:32 UTC (Fri) by cmccabe (guest, #60281)

Maybe what ought to happen is that you write some kind of shim library that has all the C standard library functions in it (open, fopen, etc.). You have some code in the shim library that translates all the UTF-8 filenames to UTF-16 (what the Windows wide-character APIs use). Then you rebuild all the libraries you're using against this shim library.
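The conversion at the heart of such a shim is UTF-8 to UTF-16. Here is a minimal hand-rolled sketch; utf8_to_utf16 is a hypothetical name, and a real Windows shim would more likely call MultiByteToWideChar and pass the result (wchar_t is 16 bits on Windows) to _wfopen and friends. It rejects malformed input instead of guessing, since silently mangling a file name is worse than failing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 and re-encode it as UTF-16. Throws std::invalid_argument on
// bad lead/continuation bytes, truncated sequences, overlong encodings,
// surrogate code points, and values above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)               { cp = b0;        len = 1; }
        else if ((b0 >> 5) == 0x6)   { cp = b0 & 0x1F; len = 2; }
        else if ((b0 >> 4) == 0xE)   { cp = b0 & 0x0F; len = 3; }
        else if ((b0 >> 3) == 0x1E)  { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b >> 6) != 0x2) throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // code points above the BMP become a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```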

Of course, this approach basically forces you to bundle all your libraries with your application. But I was under the impression that this was standard operating procedure on Windows anyway, because some evil guy could overwrite the shared copy of a library with an older version, etc.

Does that make sense at all? I've never developed on Windows, so it might be nonsense.

Thoughts from LWN's UTF8 conversion

Posted Feb 4, 2012 0:15 UTC (Sat) by cortana (subscriber, #24596)

It can work, and it's one way to do it. You will run into problems, though, if you need to use a DLL that isn't itself linked against your replacement library (think binary-only third-party DLLs).

Thoughts from LWN's UTF8 conversion

Posted Feb 6, 2012 17:02 UTC (Mon) by jwakely (subscriber, #60262)

Boost.Filesystem will most likely be in the next C++ standard, at which point std::fstream will get new constructors taking std::path objects. I realise that's not much use to anyone right now, unless you're planning to go into a coma for a few years.
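As it turned out, this arrived in C++17 as std::filesystem::path, and std::ifstream, std::ofstream and std::fstream gained constructors and open() overloads taking a path. A minimal sketch, assuming a C++17 compiler and library; round_trip is a hypothetical helper, and the file name in the usage is an arbitrary example:

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Write `text` to a file whose name is given as UTF-8, read the first line
// back, then delete the file. Because the streams accept fs::path directly,
// a Windows implementation can open the native wide-character name instead
// of squeezing it through char*.
std::string round_trip(const std::string& utf8_name, const std::string& text) {
    // u8path is C++17; it still exists (deprecated) in C++20, where
    // constructing fs::path from a std::u8string is the preferred spelling.
    const fs::path p = fs::u8path(utf8_name);
    {
        std::ofstream out(p);  // C++17 overload taking const fs::path&
        out << text << '\n';
    }
    std::string line;
    {
        std::ifstream in(p);
        std::getline(in, line);
    }  // close the stream before removing the file
    fs::remove(p);
    return line;
}
```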

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds