Control characters in file names
Posted Nov 26, 2010 10:28 UTC (Fri) by Yorick
In reply to: Control characters in file names
Parent article: Ghosts of Unix past, part 4: High-maintenance designs
Of course file names can be handled safely in most languages, but that's not the point. Wheeler describes it better and in more detail, but briefly, the aims are:
- Make it harder to make mistakes and to write brittle and/or exploitable code. Even flawless programmers are affected by other people's errors.
- Eliminate a dangerous class of control character exploits, mainly when displaying file names on terminals.
- Allow for more design options. Remember, restricting data formats can be a way to give the programmer more freedom, not less.
To illustrate the last point: currently the only byte that can safely delimit file names is the null byte, which is not very practical in many languages and in shell scripting in particular. Linefeeds would be much more natural and are supported by many more tools.
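A minimal sketch of the delimiter problem, in Python. The sample names and the helpers split_nul and split_lines are made up for illustration; the NUL-terminated stream is the kind of output that find -print0 produces.

```python
def split_nul(data: bytes) -> list[bytes]:
    """Split a NUL-terminated list of file names."""
    return [name for name in data.split(b"\0") if name]


def split_lines(data: bytes) -> list[bytes]:
    """Split a linefeed-terminated list of file names."""
    return [name for name in data.split(b"\n") if name]


# Two file names; the second legally contains a linefeed today.
names = [b"report.txt", b"two\nlines.txt"]

nul_stream = b"".join(name + b"\0" for name in names)
line_stream = b"".join(name + b"\n" for name in names)

# NUL delimiting round-trips; linefeed delimiting yields three "names".
print(split_nul(nul_stream) == names)
print(split_lines(line_stream) == names)
print(len(split_lines(line_stream)))
```

If linefeeds were forbidden in file names, the second splitter would be just as reliable as the first, and it matches what line-oriented tools already expect.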
The benefits are clear, and the costs appear to be very low. The only serious objection I have seen so far concerns existing file names using an ISO 2022-based encoding. There are several possible solutions: allowing the control character restriction to be lifted as a per-mount option (possibly only allowing ESC, SI and SO), or a mount option that recodes into UTF-8.
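As a rough sketch of what such a restriction could look like, here is a check that rejects control bytes (0x00-0x1F and 0x7F) but can optionally let ESC, SI and SO through. The function is_allowed_name and its allow_iso2022 flag are hypothetical, standing in for the per-mount option; this is not an existing kernel or library interface.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
ISO2022_CONTROLS = frozenset({ESC, SO, SI})


def is_allowed_name(name: bytes, allow_iso2022: bool = False) -> bool:
    """Return True if the name contains no forbidden control bytes.

    allow_iso2022 is a hypothetical stand-in for a per-mount option
    that permits only ESC, SI and SO.
    """
    for byte in name:
        is_control = byte < 0x20 or byte == 0x7F
        if not is_control:
            continue
        if allow_iso2022 and byte in ISO2022_CONTROLS:
            continue
        return False
    return True


print(is_allowed_name(b"report.txt"))
print(is_allowed_name(b"two\nlines.txt"))
print(is_allowed_name(b"\x1b$B name", allow_iso2022=True))
```

Even with the relaxed option, linefeeds and tabs stay forbidden, so line-delimited lists of names would remain safe on such a mount.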