Let's ban 0x80-0xFF too, next to 0x01-0x1F, because they too cannot be accurately displayed (think of the byte sequence "0x20 0xC2 0x20" when used in contemporary Linux systems)!!11
Control characters in file names
Posted Nov 25, 2010 16:37 UTC (Thu) by Spudd86 (guest, #51683)
It has nothing to do with accurately displaying them, and everything to do with the fact that they cause actual problems, and in a utf8 locale you gain NOTHING from being able to use 0x01-0x1F in file names (if you're going to bring up storing 'arbitrary' binary keys in file names, don't, you already CAN'T because you can't use '\0' or '/')
There's an article about exactly this somewhere, IIRC linked from LWN at some point in the past, when the patch that allowed you to disable those chars came up, I think.
Posted Nov 25, 2010 19:15 UTC (Thu) by jengelh (subscriber, #33263)
Aha. So... everybody knows it is possible to have files with odd filenames, and everybody keeps on using shells or shell constructs that cannot deal with this properly? I can see the flaw in that.
>something that can iterate over files and run one command on each when it must handle files that have those in the name?
for i in *; do cmd "$i"; done;
find . -whatever -exec cmd {} \;
find . -whateverelse -print0 | xargs -0 cmd;
There are so many safe ways available. I am really not responsible for people doing UUOC or the like.
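For anyone who wants to try the three patterns above, here is a minimal sketch. The file names and the temporary directory are made up for illustration, and `-print0` / `xargs -0` are assumed to be available (they are extensions in GNU and BSD findutils, not something every system guarantees). Each pattern simply counts how many names it delivers intact.

```sh
#!/bin/sh
# Sketch only: the directory and file names are invented for illustration.
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Three awkward names: a space, an embedded newline, and a tab.
touch 'two words' 'new
line' "tab$(printf '\t')here"

# 1. Glob plus quoted expansion: each name reaches the test intact.
glob_count=0
for i in *; do
    [ -f "$i" ] && glob_count=$((glob_count + 1))
done

# 2. find -exec: find hands each name to the command as one argument.
exec_count=$(find . -type f -exec sh -c '[ -f "$1" ] && printf x' sh {} \; | wc -c)

# 3. find -print0 | xargs -0: names are NUL-delimited inside the pipe.
xargs_count=$(find . -type f -print0 |
    xargs -0 -n 1 sh -c '[ -f "$1" ] && printf x' sh | wc -c)

echo "glob=$glob_count exec=$exec_count xargs=$xargs_count"

cd "$start" && rm -rf "$dir"
```

All three counts should come out as 3, one per file, since none of the patterns ever splits a name on whitespace.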
Posted Nov 25, 2010 20:31 UTC (Thu) by Spudd86 (guest, #51683)
Posted Nov 25, 2010 21:48 UTC (Thu) by jengelh (subscriber, #33263)
Posted Nov 25, 2010 22:00 UTC (Thu) by Spudd86 (guest, #51683)
see here: http://www.dwheeler.com/essays/filenames-in-shell.html and here: http://www.dwheeler.com/essays/fixing-unix-linux-filename... although for some reason I remember it being much worse than that, though being correct everywhere in your script could eventually be a pain.
Posted Dec 2, 2010 19:19 UTC (Thu) by Ross (subscriber, #4065)
Posted Nov 25, 2010 23:30 UTC (Thu) by cmccabe (guest, #60281)
The for loop should be
for i in *; do cmd "./$i"; done;
In case one of the filenames begins with a dash.
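A small sketch of the dash problem, assuming a file literally named `-l` and using `ls` as the stand-in for cmd (both are my choices for illustration). It also shows `--`, which many utilities accept as an end-of-options marker, though not every command honors it, so the `./` prefix is the more general fix.

```sh
#!/bin/sh
# Sketch only: the file name "-l" and the use of ls are illustrative choices.
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# A file literally named "-l"; the ./ prefix stops touch from
# reading the name as an option.
touch ./-l

for i in *; do
    bare=$(ls "$i")          # runs "ls -l": a long listing of the directory
    prefixed=$(ls "./$i")    # runs "ls ./-l": names the file itself
    guarded=$(ls -- "$i")    # runs "ls -- -l": "--" ends option parsing
done

printf 'bare:     %s\nprefixed: %s\nguarded:  %s\n' "$bare" "$prefixed" "$guarded"

cd "$start" && rm -rf "$dir"
```

Quoting alone does not help here: the quoted name still arrives as a single argument, but that argument looks like an option to the command.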
Posted Nov 26, 2010 10:28 UTC (Fri) by Yorick (subscriber, #19241)
To illustrate the last point: The only possible delimiter for file names is currently the null byte, which is not very practical in many languages and in shell scripting in particular. Linefeeds would be much more natural and are supported by many more tools.
The benefits are clear, and the costs appear to be very low. The only serious objection I have seen so far concerns existing file names using an ISO 2022-based encoding. There are several possible solutions: allowing the control character restriction to be lifted as a per-mount option (possibly only allowing ESC, SI and SO), or a mount option that recodes into UTF-8.
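The delimiter point can be sketched as follows. The file names are invented, and `-print0` is assumed to be supported by the local find. The line-based loop is the natural thing to write in shell, but it miscounts as soon as one name contains a newline. The NUL-delimited listing is correct, yet plain sh has no convenient way to loop over it, so this sketch only counts the delimiters.

```sh
#!/bin/sh
# Sketch only: file names are invented; -print0 is assumed to be available.
dir=$(mktemp -d) || exit 1
touch "$dir/a b" "$dir/new
line"

# Line-based loop: a name containing a newline is seen as two entries.
line_count=$(find "$dir" -type f | {
    n=0
    while IFS= read -r f; do n=$((n + 1)); done
    echo "$n"
})

# NUL-delimited listing: keep only the NUL bytes and count them.
nul_count=$(find "$dir" -type f -print0 | tr -dc '\000' | wc -c)

echo "line-based: $line_count  NUL-delimited: $nul_count"

rm -rf "$dir"
```

With two files present, the line-based loop should report 3 and the NUL count 2. If newlines were banned from file names, the first loop would be correct as written.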
Posted Nov 29, 2010 16:30 UTC (Mon) by nix (subscriber, #2304)
Posted Dec 2, 2010 19:17 UTC (Thu) by Ross (subscriber, #4065)
You aren't proposing to remove all the characters that make it difficult to write correct shell scripts. In fact tab and newline are the worst "offenders" in your list of control characters. Most shells don't care about control characters at all. This can't be an argument for implementing the character set limitation because implementing it won't fix the problem -- the same script would still be broken by files with spaces in them (and any number of shell metacharacters).
And even if it did, I'm not sure the features of Bourne shell should dictate how the filesystem interface should work. The existing kernel and shell were designed together -- if you want to redo the filename encoding in the kernel, you should consider how the shell could be changed and also how other tools besides the shell are affected. Looking only at the shell is too narrow a focus.
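The point about spaces can be sketched like this. The file name and the `count_args` helper are hypothetical, made up for the example. The name contains only printable characters, so a ban on control characters would not touch it, yet the unquoted expansion still splits it.

```sh
#!/bin/sh
# Sketch only: the file name and the count_args helper are hypothetical.
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch 'two words'   # printable characters only, no control characters

# Prints how many arguments it received.
count_args() { echo "$#"; }

for i in *; do
    unquoted=$(count_args $i)     # field splitting yields "two" and "words"
    quoted=$(count_args "$i")     # quoting keeps the name as one argument
done

echo "unquoted=$unquoted quoted=$quoted"

cd "$start" && rm -rf "$dir"
```

The unquoted call should see 2 arguments and the quoted call 1, so the broken script stays broken regardless of what the kernel forbids.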
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds