Control characters in file names

Posted Nov 23, 2010 19:50 UTC (Tue) by vonbrand (subscriber, #4458)
In reply to: Control characters in file names by Yorick
Parent article: Ghosts of Unix past, part 4: High-maintenance designs

Please don't. The "control characters" in the filenames could well be regular characters in other encodings, or be part of e.g. a UTF-8 character. "Not all the world's a VAX, er, ASCII."


Control characters in file names

Posted Nov 23, 2010 20:44 UTC (Tue) by Yorick (subscriber, #19241)

> Please don't. The "control characters" in the filenames could well be regular characters in other encodings, or be part of e.g. a UTF-8 character.

Since you ask me not to, please tell me exactly which encoding you are concerned about. Multi-byte UTF-8 characters never contain bytes in the range 0-127; every byte of a multi-byte sequence has its high bit set.
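As a quick sanity check of that UTF-8 property, here is a minimal Python sketch. The helper name `utf8_multibyte_has_ascii_byte` and the sample string are made up for illustration; only `str.encode` from the standard library is used.

```python
def utf8_multibyte_has_ascii_byte(text: str) -> bool:
    """Return True if any multi-byte UTF-8 sequence in text contains a byte below 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        # Only multi-byte sequences are of interest; plain ASCII encodes to one byte.
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return True
    return False


# Hypothetical sample mixing ASCII, Latin-1 range, CJK and a currency symbol.
sample = "naïve 日本語 €"
print(utf8_multibyte_has_ascii_byte(sample))  # False
```

So a filter on bytes 0-31 cannot accidentally hit the middle of a UTF-8 character.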

Control characters in file names

Posted Nov 23, 2010 21:37 UTC (Tue) by ballombe (subscriber, #9523)

Probably ISO 2022, which is still widely used in Japan (fortunately less than it used to be).

Control characters in file names

Posted Nov 27, 2010 13:10 UTC (Sat) by Cato (subscriber, #7643)

ISO 2022 is a truly horrible encoding that should never be used, and should certainly not be supported: it can embed bytes that look like normal ASCII characters within a "wide" character, making it very difficult to process.
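To make the ISO 2022 problem concrete, here is a small Python sketch using the standard `iso2022_jp` codec as one member of that family. The `describe` helper is hypothetical. It reports whether the encoded text contains the ESC control byte (0x1B) and whether every byte falls in the 7-bit range.

```python
ESC = 0x1B  # the escape control character used for ISO 2022 charset switching


def describe(text: str, encoding: str) -> tuple[bool, bool]:
    """Return (contains an ESC byte, every byte is below 0x80) for text in encoding."""
    data = text.encode(encoding)
    return (ESC in data, all(b < 0x80 for b in data))


print(describe("日本語", "iso2022_jp"))  # (True, True)
print(describe("日本語", "utf-8"))       # (False, False)
```

In the ISO-2022-JP form, the Japanese text is carried entirely by bytes that would be ASCII in another context, bracketed by escape sequences, so stripping control characters would corrupt the name.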

Having looked into many different encodings, I'd agree with the suggestion to use UTF-*, but in reality systems still need to support legacy 8-bit and 16-bit encodings - there are many filesystems out there with filenames in legacy encodings, and often a mix of encodings.

The ability to mix legacy encodings in a single filesystem is sometimes useful for applications but it creates major data conversion issues when users do this.

Generally I'd agree with banning control characters by default from pathnames in a new OS, but it's too late to do that now with Linux/Unix.

Putting the encoding into the filesystem is suspect, particularly considering the deep unpleasantness of Apple's use of their own two variants of Unicode normalisation form D (NFD) in HFS+ and other filesystems, whereas the rest of the world including Linux and the Web uses normalisation form C (NFC).
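For anyone who hasn't run into the NFC/NFD difference, a minimal Python sketch with the standard `unicodedata` module shows it (the file name is just an example, and this uses the standard forms, not Apple's variants):

```python
import unicodedata

name = "caf\u00e9"  # 'é' as a single precomposed code point

nfc = unicodedata.normalize("NFC", name)  # keeps the precomposed U+00E9
nfd = unicodedata.normalize("NFD", name)  # splits into 'e' + combining acute U+0301

print(nfc == nfd)           # False
print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'
```

Both strings display identically, but as byte strings they are different file names.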

Control characters in file names

Posted Nov 29, 2010 10:03 UTC (Mon) by quotemstr (subscriber, #45331)

> Generally I'd agree with banning control characters by default from pathnames in a new OS, but it's too late to do that now with Linux/Unix.

I don't think it's too late at all. The overwhelming majority of legitimate filenames do not contain characters in the range proposed for blacklisting. As an option that's turned on by default, forbidding control characters would present no practical problems whatsoever. Nobody relies on filenames containing ^V or newline.
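A rough sketch of what such a check could look like, in Python. The exact range is my assumption (bytes 0x01-0x1F plus DEL, 0x7F; NUL cannot appear in a file name anyway), and `has_control_bytes` is a made-up name, not anything the kernel provides.

```python
def has_control_bytes(name: bytes) -> bool:
    """Return True if the file name contains a byte below 0x20 or the DEL byte.

    The range is an assumption for illustration: ASCII control characters
    0x01-0x1F and 0x7F. NUL (0x00) is also caught, though it is already illegal.
    """
    return any(b < 0x20 or b == 0x7F for b in name)


print(has_control_bytes(b"notes.txt"))                 # False
print(has_control_bytes(b"bad\nname.txt"))             # True
print(has_control_bytes("日本語.txt".encode("utf-8")))  # False
```

Note that the UTF-8 name passes untouched, since all of its non-ASCII bytes are 0x80 or above.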

Control characters in file names

Posted Nov 25, 2010 16:29 UTC (Thu) by Spudd86 (guest, #51683)

They are also nearly impossible to handle correctly in shell scripts, and you should be using UTF-8 for file names.

No one is suggesting that this be mandatory, but the encodings it would actually break that are also in use on Linux systems are few and far between (probably largely because everything already expects those bytes to be control characters, and they break shell scripts, etc.). Plus we have UTF-8.
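The shell script problem is easy to reproduce in miniature. This Python sketch (file names invented for the example) mimics a tool that emits one name per line and a consumer that splits on newlines, and compares it with NUL-separated output:

```python
# Two files, one of which has a newline embedded in its name.
names = ["report.txt", "bad\nname.txt"]

# Line-oriented exchange: the embedded newline is indistinguishable from a separator.
by_newline = "\n".join(names).split("\n")

# NUL-separated exchange: NUL can never occur inside a file name, so it is unambiguous.
by_nul = "\0".join(names).split("\0")

print(len(by_newline))  # 3
print(by_nul == names)  # True
```

Two files turn into three "names" as soon as anything treats newline as a record separator, which is what most line-based scripts do.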

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds