Control characters in file names

Posted Nov 23, 2010 20:11 UTC (Tue) by jengelh (subscriber, #33263)
In reply to: Control characters in file names by Yorick
Parent article: Ghosts of Unix past, part 4: High-maintenance designs

Wheeler: "this lack of limitations [...] makes it impossible to consistently and accurately display filenames". So yeah.

Let's ban 0x80-0xFF too, alongside 0x01-0x1F, because they too cannot be accurately displayed (think of the byte sequence "0x20 0xC2 0x20" on contemporary Linux systems)!!11



Control characters in file names

Posted Nov 25, 2010 16:37 UTC (Thu) by Spudd86 (guest, #51683)

There are strong reasons to disallow 0x01-0x1F in file names. Do you know how many lines of shell it takes to write something that can iterate over files and run one command on each when it must handle files that have those in the name? Hundreds. Nobody does it, so pretty much every shell script ever written will explode if it runs across such a file.

It has nothing to do with accurately displaying them, and everything to do with the fact that they cause actual problems, and in a UTF-8 locale you gain NOTHING from being able to use 0x01-0x1F in file names. (If you're going to bring up storing 'arbitrary' binary keys in file names, don't: you already CAN'T, because you can't use '\0' or '/'.)

There's an article about exactly this somewhere. IIRC it was linked from LWN at some point in the past, when the patch that allowed you to disable those chars came up, I think.

Control characters in file names

Posted Nov 25, 2010 19:15 UTC (Thu) by jengelh (subscriber, #33263)

>do you know how man[y] lines of shell it takes

Aha. So... everybody knows it is possible to have files with odd filenames, and everybody keeps on using shells or shell constructs that cannot deal with this properly? I can see the flaw in that.

>something that can iterate over files and run one command on each when it must handle files that have those in the name?

for i in *; do cmd "$i"; done;
find . -whatever -exec cmd {} \;
find . -whateverelse -print0 | xargs -0 cmd;

There are so many safe ways available. I am really not responsible for people committing UUOC (useless use of cat) or the like.

Control characters in file names

Posted Nov 25, 2010 20:31 UTC (Thu) by Spudd86 (guest, #51683)

The for loop won't work... the find examples only work if you want to run a single, non-shell command.
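One commonly used way around the single-command limitation is to hand the file names to an inline script via -exec sh -c. The following is only a sketch, assuming a POSIX find and sh; the helper name process_each and the per-file commands (wc and printf) are placeholders and not something proposed in this thread.

```shell
# Hypothetical helper: report the size of every regular file under "$1".
# The loop body is a placeholder for whatever per-file shell logic is needed.
process_each() {
    find "$1" -type f -exec sh -c '
        for f do
            # Each name arrives as a separate argument, so newlines and
            # other control characters in it are never re-parsed.
            size=$(wc -c < "$f")
            printf "%s bytes\n" "$((size))"
        done
    ' sh {} +
}
```

Because the names travel as arguments rather than as text on a pipe, no delimiter is involved at all.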

Control characters in file names

Posted Nov 25, 2010 21:48 UTC (Thu) by jengelh (subscriber, #33263)

In which case will the for loop not work? (Other than * not globbing files starting with a dot.)

Control characters in file names

Posted Nov 25, 2010 22:00 UTC (Thu) by Spudd86 (guest, #51683)

If there's a file that starts with - or has any sort of control character, it will break.

See here: http://www.dwheeler.com/essays/filenames-in-shell.html and here: http://www.dwheeler.com/essays/fixing-unix-linux-filename... For some reason I remember it being much worse than that, though being correct everywhere in your script could still eventually be a pain.

Control characters in file names

Posted Dec 2, 2010 19:19 UTC (Thu) by Ross (guest, #4065)

Are you proposing to remove hyphens from filenames too, or is this getting off-topic? :)

Control characters in file names

Posted Nov 25, 2010 23:30 UTC (Thu) by cmccabe (guest, #60281)

> In which case will the for loop not work? (Other than * not globbing files
> starting with a dot.)

The for loop should be

for i in *; do cmd "./$i"; done;

In case one of the filenames begins with a dash.
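A slightly fuller sketch of the same idea, assuming a POSIX shell. The helper name cat_all is hypothetical and cat stands in for any command; the extra existence test covers the empty-directory case, where an unmatched * is left as a literal string.

```shell
# Hypothetical helper: run a command on every non-hidden entry in "$1".
# The body is a subshell, so the cd does not affect the caller.
cat_all() (
    cd "$1" || exit 1
    for i in *; do
        # An unmatched * stays literal in POSIX sh; skip it.
        # (This test also skips dangling symlinks.)
        [ -e "./$i" ] || continue
        # The ./ prefix keeps a name such as "-n" from being read as an option.
        cat "./$i"
    done
)
```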

Control characters in file names

Posted Nov 26, 2010 10:28 UTC (Fri) by Yorick (subscriber, #19241)

Of course file names can be handled safely in most languages, but that's not the point. Wheeler describes it better and in more detail, but briefly, the aim is:
  • Make it harder to make mistakes and to write brittle and/or exploitable code. Even flawless programmers are affected by other people's errors.
  • Eliminate a dangerous class of control character exploits, mainly when displaying file names on terminals.
  • Allow for more design options. Remember, restricting data formats can be a way to give the programmer more freedom, not less.

To illustrate the last point: The only possible delimiter for file names is currently the null byte, which is not very practical in many languages and in shell scripting in particular. Linefeeds would be much more natural and are supported by many more tools.
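The delimiter problem can be seen with a small sketch that counts files two ways. Both function names are hypothetical, and -print0 is assumed to be available (GNU and BSD find provide it).

```shell
# Count regular files under "$1" by trusting newline as the delimiter.
count_by_lines() {
    find "$1" -type f -print | wc -l
}

# Count the NUL terminators emitted by -print0 instead.
count_by_nul() {
    find "$1" -type f -print0 | tr -cd '\000' | wc -c
}
```

For a directory holding a single file whose name contains one linefeed, the line-based count reports two entries while the NUL-based count reports one.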

The benefits are clear, and the costs appear to be very low. The only serious objection I have seen so far concerns existing file names using an ISO 2022-based encoding. There are several possible solutions: allowing the control character restriction to be lifted as a per-mount option (possibly only allowing ESC, SI and SO), or a mount option that recodes into UTF-8.

Control characters in file names

Posted Nov 29, 2010 16:30 UTC (Mon) by nix (subscriber, #2304)

The xargs version only behaves correctly if you have at least one matching file; with no input, GNU xargs still runs cmd once with no arguments. You want -0r. (Of course this is totally GNU-only.)
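A rough side-by-side sketch of the two approaches. The helper names are hypothetical and echo stands in for a real command; the xargs variant assumes GNU-style -0 and -r, while the find variant relies only on POSIX "-exec ... {} +", which does not run the command at all when nothing matches.

```shell
# Both helper names are hypothetical; echo stands in for a real command.

# POSIX find: the command is simply not run when nothing matches.
run_with_find() {
    find "$1" -type f -name "$2" -exec echo found {} +
}

# GNU variant from the comment above: -r (--no-run-if-empty) is a GNU
# extension, and -print0 / -0 are assumed to be available.
run_with_xargs() {
    find "$1" -type f -name "$2" -print0 | xargs -0r echo found
}
```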

Control characters in file names

Posted Dec 2, 2010 19:17 UTC (Thu) by Ross (guest, #4065)

You repeat at least three times in the thread that it takes hundreds of lines to handle control characters in shell scripts. That's just not true. But worse, it's a terrible argument even if it were true.

You aren't proposing to remove all the characters that make it difficult to write correct shell scripts. In fact, tab and newline are the worst "offenders" in your list of control characters. Most shells don't care about control characters at all. This can't be an argument for implementing the character set limitation because implementing it won't fix the problem -- the same script would still be broken by files with spaces in them (and any number of shell metacharacters).

And even if it did, I'm not sure the features of the Bourne shell should dictate how the filesystem interface should work. The existing kernel and shell were designed together -- if you want to redo the filename encoding in the kernel, you should consider how the shell could be changed and also how other tools besides the shell are affected. Only looking at the shell is too narrow a focus.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds