Control characters in file names
Posted Nov 23, 2010 22:29 UTC (Tue) by foom (subscriber, #14868)
(If you didn't have any ASCII locales, you could use an EBCDIC locale -- your system just needs to be self-consistent for all the characters in the Portable Character Set, across locales. UTF-7/16/32 are right out, though, since all characters in the Portable Character Set need to be encoded by a single byte.)
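To illustrate the single-byte point, here is a rough sketch that counts how many bytes a few Portable Character Set characters take under different encodings. The three-character sample and the helper name encoded_lengths are made up for this illustration, and cp037 is just one EBCDIC code page that Python happens to ship.

```python
# A tiny sample of Portable Character Set characters (illustrative only).
SAMPLE = "A/+"


def encoded_lengths(encoding):
    """Map each sample character to the number of bytes it takes in `encoding`."""
    return {ch: len(ch.encode(encoding)) for ch in SAMPLE}


# ASCII and EBCDIC (cp037) are single-byte for these characters;
# UTF-16 and UTF-32 are not, and UTF-7 has to escape '+' as "+-".
for name in ("ascii", "cp037", "utf-7", "utf-16-le", "utf-32-le"):
    print(name, encoded_lengths(name))
```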
Posted Nov 25, 2010 16:19 UTC (Thu) by Spudd86 (guest, #51683)
Posted Nov 25, 2010 21:03 UTC (Thu) by iabervon (subscriber, #722)
In any case, it still wouldn't use bytes in the 0x00-0x1f range.
Posted Nov 29, 2010 10:09 UTC (Mon) by jamesh (guest, #1159)
Of course, once you start working with Unicode it isn't really enough to just require unique representations for each code point. You can have multiple sequences of Unicode code points that have the same meaning. So you really want a normalised code point sequence encoded in a canonical form.
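As a small illustration of two code point sequences with the same meaning, using Python's standard unicodedata module (the variable names are just for this example):

```python
import unicodedata

# "é" as one precomposed code point, and as "e" plus a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

# The sequences differ, and so do their UTF-8 bytes, so a byte-oriented
# file system treats them as two distinct names.
differ_as_strings = composed != decomposed
differ_as_bytes = composed.encode("utf-8") != decomposed.encode("utf-8")

# Normalising both to the same form makes them compare equal.
equal_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(differ_as_strings, differ_as_bytes, equal_after_nfc)
```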
Posted Nov 29, 2010 18:18 UTC (Mon) by iabervon (subscriber, #722)
The code point sequence issue is real (which is why I was careful not to say "character" anywhere), and unfortunately, there are multiple possible normalizations. So not only do you need a normalized code point sequence, you need one with a particular normalization that everything will agree on. (Also, since the availability of characters may affect the normalization, you might in principle have to specify the version of Unicode, although I think they're careful not to introduce new ways of getting the same character.) And, of course, you have to avoid using Apple products, because they silently rename your files to have a different normalization from what everybody else uses.
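To make the "multiple possible normalizations" point concrete, here is a sketch that runs one name through the four standard normalization forms and checks whether a name is already in an agreed-upon form. The sample name and the helper is_in_form are invented for this illustration; which form a real system should pick is exactly the open question.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Hypothetical file name: an "fi" ligature, "le-", then a precomposed "é".
name = "\ufb01le-\u00e9"

# Each form yields a different code point sequence for this name.
normalized = {form: unicodedata.normalize(form, name) for form in FORMS}


def is_in_form(s, form):
    """True if `s` is unchanged by normalizing to `form`."""
    return unicodedata.normalize(form, s) == s


for form in FORMS:
    print(form, [hex(ord(c)) for c in normalized[form]], is_in_form(name, form))
```

A name that passes the check for one form can fail it for another, so "normalized" only means something once everybody agrees on the form.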
Posted Dec 1, 2010 2:32 UTC (Wed) by jamesh (guest, #1159)
My point was that if you picked a canonical representation for UTF-7, and required that file names used it, then it would work okay as a file name encoding. That said, it still isn't a very good idea ...
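A rough sketch of what such a canonical-form check could look like. It assumes, purely for illustration, that "canonical" means "whatever Python's utf-7 encoder produces"; a real rule would have to be specified independently of any one implementation. The function name is_canonical_utf7 is hypothetical.

```python
def is_canonical_utf7(raw):
    """True if `raw` is exactly what re-encoding its decoded text produces.

    Assumption: the output of Python's "utf-7" encoder stands in for the
    canonical form here.
    """
    try:
        text = raw.decode("utf-7")
    except UnicodeDecodeError:
        return False
    return text.encode("utf-7") == raw


# "A" written directly, and the same "A" written as a base64 escape.
direct = b"A"
escaped = b"+AEE-"

print(direct.decode("utf-7") == escaped.decode("utf-7"))
print(is_canonical_utf7(direct), is_canonical_utf7(escaped))
```

Both byte strings decode to the same text, but only one survives the round trip, which is the kind of rule that would let UTF-7 names be compared byte for byte.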
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds