
Control characters in file names


Posted Nov 27, 2010 13:10 UTC (Sat) by Cato (subscriber, #7643)
In reply to: Control characters in file names by ballombe
Parent article: Ghosts of Unix past, part 4: High-maintenance designs

ISO 2022 is a truly horrible encoding that should never be used, and certainly should not be supported: bytes in the ASCII range can appear inside a "wide" (multi-byte) character, which makes it very difficult to process.
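To make the processing problem concrete, here is a minimal Python sketch. It assumes the ISO-2022-JP variant as exposed by Python's standard `iso2022_jp` codec; the sample string and the variable names are illustrative choices, not anything from the original discussion.

```python
# Illustrative sample: three kanji, no ASCII characters at all.
sample = "日本語"

# Encode with Python's built-in ISO-2022-JP codec.
data = sample.encode("iso2022_jp")

# ISO-2022-JP is a 7-bit encoding, so every byte falls in the ASCII range,
# including the bytes that make up the double-byte characters.
all_seven_bit = all(b < 0x80 for b in data)

# A naive byte-level scan for a backslash finds one, even though the
# text contains none: 0x5C is the second byte of one of the kanji.
naive_hit = b"\\" in data
real_hit = "\\" in sample

# Decoding with the right codec still recovers the original text.
round_trip = data.decode("iso2022_jp")
```

Any code that searches raw bytes for separators such as `/` or `\` has to track the escape sequences to know whether a given byte is really ASCII, which is the difficulty described above.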

Having looked into many different encodings, I'd agree with the suggestion to use UTF-*, but in reality systems still need to support legacy 8-bit and 16-bit encodings - there are many filesystems out there with filenames in legacy encodings, and often a mix of encodings.

The ability to mix legacy encodings in a single filesystem is sometimes useful for applications, but when users actually do this it creates major data conversion problems.
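As a rough illustration of the conversion problem, the following Python sketch tries to decode a raw filename as UTF-8 and falls back to a legacy encoding. The function name `decode_filename` and the choice of Latin-1 as the fallback are assumptions made for the example; in practice the legacy encoding usually cannot be determined from the bytes alone.

```python
def decode_filename(raw, legacy="latin-1"):
    """Guess how to decode a raw filename.

    Returns (text, encoding_used). The UTF-8 check is only a heuristic:
    some legacy byte sequences also happen to be valid UTF-8.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Hypothetical fallback: assume one known legacy encoding.
        return raw.decode(legacy), legacy


# The same visible name stored two different ways on one filesystem.
utf8_name = "café".encode("utf-8")
latin1_name = "café".encode("latin-1")

utf8_result = decode_filename(utf8_name)
latin1_result = decode_filename(latin1_name)
```

Both results come back as the same text, but only because the fallback guess happened to be right; with several legacy encodings in play, the guess has to be made per file.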

Generally I'd agree with banning control characters by default from pathnames in a new OS, but it's too late to do that now with Linux/Unix.

Putting the encoding into the filesystem is suspect, particularly given the deep unpleasantness of Apple's use of its own two variants of Unicode normalisation form D (NFD) in HFS+ and other filesystems, while the rest of the world, including Linux and the Web, uses normalisation form C (NFC).
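The practical effect of the NFC/NFD split can be sketched in Python with the standard `unicodedata` module. This uses standard NFD as a stand-in, since Apple's variants differ from it in details not covered here; the filename is an illustrative assumption.

```python
import unicodedata

# Illustrative filename containing one accented character (U+00E9).
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # precomposed: single code point for e-acute
nfd = unicodedata.normalize("NFD", name)  # decomposed: 'e' + combining acute accent

# The two forms look identical on screen but differ as byte strings,
# so a byte-wise filename comparison treats them as different names.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# Normalising both to the same form before comparing makes them equal again.
same_text = unicodedata.normalize("NFC", nfd) == nfc
```

A file created under one form and looked up under the other will not be found unless something along the path normalises the name.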



Control characters in file names

Posted Nov 29, 2010 10:03 UTC (Mon) by quotemstr (subscriber, #45331)

> Generally I'd agree with banning control characters by default from pathnames in a new OS, but it's too late to do that now with Linux/Unix.

I don't think it's too late at all. The overwhelming majority of legitimate filenames do not contain characters in the range proposed for blacklisting. As an option that's turned on by default, forbidding control characters would present no practical problems whatsoever. Nobody relies on filenames containing ^V or newline.
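A check of the kind being discussed could look roughly like the Python sketch below. The exact range is an assumption for illustration (bytes 0x01 to 0x1F plus 0x7F, with NUL already forbidden in pathnames), and the function name `has_control_chars` is hypothetical; the thread itself does not spell out the range.

```python
def has_control_chars(name):
    """Return True if a raw filename contains an ASCII control byte.

    Assumed range for illustration: bytes below 0x20, plus DEL (0x7F).
    """
    return any(b < 0x20 or b == 0x7F for b in name)


ordinary = has_control_chars(b"report-2010.txt")    # plain name
with_newline = has_control_chars(b"bad\nname.txt")  # embedded newline
with_ctrl_v = has_control_chars(b"odd\x16name")     # ^V is byte 0x16
```

Because the check works on raw bytes, it does not depend on which encoding the rest of the name is in.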

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds