Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 25, 2009 18:09 UTC (Wed) by mrshiny (guest, #4266)
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

You can pry my spaces from my filenames out of my cold dead fingers. But frankly spaces are no different than other shell meta-characters. If a filename is properly handled for spaces, doesn't it automatically work for all the other chars? If not, it should be easy enough to fix the SHELLS in this case.

Mr. Wheeler makes a mistake in the article as well. Windows has no problem with files starting with a dot. It's only Explorer and a handful of other tools that have problems. Otherwise Cygwin would be pretty annoying to use.

Overall, however, I like the idea of restricting certain things, especially the character encoding. The sooner the other encodings can die, the sooner I can be happy.


Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 25, 2009 19:52 UTC (Wed) by emk (subscriber, #1128)

If a filename is properly handled for spaces, doesn't it automatically work for all the other chars?

Unfortunately, no. One example mentioned in the article is files with names like "-rf", which will appear at the start of any glob list. To deal with this, you generally need to add "--" before any globs, but different commands behave differently, and not all commands support "--".
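A minimal sketch of the problem, using "cat" and a file named "-n" instead of "rm" and "-rf" so that it is safe to run. It assumes a POSIX shell plus "mktemp -d" (GNU and BSD both provide it); LC_ALL=C is set only so the glob puts "-n" first on every system.

```shell
#!/bin/sh
# Scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
LC_ALL=C; export LC_ALL   # byte order, so "-n" sorts before "a.txt"

printf 'hello\n' > a.txt
: > ./-n                  # empty file whose name looks like an option

# The glob expands to: cat -n a.txt, so cat numbers the lines.
unsafe=$(cat *)
# "--" ends option parsing: cat -- -n a.txt prints the files as-is.
safe=$(cat -- *)

printf 'without --: %s\nwith --:    %s\n' "$unsafe" "$safe"
cd / && rm -r -- "$demo"  # clean up
```

The "--" fix only helps with commands that honor it, which is the catch described above.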

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 1:12 UTC (Thu) by mrshiny (guest, #4266)

I was actually referring to the other special characters that cause problems, such as shell control characters. The dash is a different case because it's actually the programs (not the shell) that decide which strings are filenames and which are options. There can't really be a generic solution to this because of the way file globbing works: the globbing happens outside the program, so the program has no say in the command line that is passed to it. If filenames can't start with a dash, but a command was ported from DOS and uses backslash as its option separator, shell globbing will confuse that program too.

Not that preventing files like '-rf' is a bad idea. I think it would prevent a number of mistakes.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 30, 2009 16:41 UTC (Mon) by Hawke (guest, #6978)

I don't think any DOS applications use backslash as their option marker. Some use dash, and most use slash. But I'm pretty sure that practically none, if any, use backslash.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 15:38 UTC (Thu) by dwheeler (guest, #1216)

Actually, there is a general solution for the dash: whenever you glob in the current directory, stick "./" in front of the glob. So always use "cat ./*" instead of "cat *". I do mention that in my article.

Problem is, nobody does that. It's too easy to use "*", it's what all the documents say, and it's what all the users actually do. You have to train GUI programs to do this, too. So instead of constantly trying to get developers to do something "unnatural", let's change the system so the "obvious" way is always correct.
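A sketch of the "./" approach under the same assumptions (POSIX shell, "mktemp -d", throwaway file names). Because every expanded name now begins with "./", no argument can start with a dash, which also covers commands that do not understand "--".

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

printf 'hello\n' > a.txt
: > ./-n                  # empty file whose name looks like an option

# The glob expands to: cat ./-n ./a.txt; no argument starts with "-".
result=$(cat ./*)
printf '%s\n' "$result"   # prints: hello

cd / && rm -r -- "$demo"  # clean up
```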

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 25, 2009 22:45 UTC (Wed) by epa (subscriber, #39769)

If not, it should be easy enough to fix the SHELLS in this case.
Three decades of unhappy experience says otherwise. Nobody has a reasonable proposal to fix all the shells, all the scripting languages and all the user applications so that they don't make unsafe assumptions about filenames (e.g. assuming a filename can never begin with - or never contain the \n character).

On the other hand, a kernel-level check for bad characters is simple to implement and obviously solves these problems at a stroke.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 1:16 UTC (Thu) by mrshiny (guest, #4266)

I was actually thinking more along the lines of:

1. Prevent files that start with dash (technically not a shell problem)
2. Prevent files that contain control characters (newline included)
3. Make the shells easy to use in the face of filenames with spaces, semi-colons, colons, quotes, punctuation, etc.

The first item is more of an interaction between programs and the shell and not specifically a shell problem. If a program doesn't support -- then it can never be used securely.

The second item seems like an obvious step to take with no downside.
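Items 1 and 2 are easy to state as a predicate. "is_safe_filename" below is a hypothetical userspace helper, not an existing tool or kernel interface; it only illustrates which names such a rule would reject. It assumes a shell whose case patterns support POSIX character classes such as [[:cntrl:]].

```shell
#!/bin/sh
# is_safe_filename NAME  (hypothetical helper for rules 1 and 2)
# Returns 0 if NAME is acceptable, 1 otherwise.
is_safe_filename() {
    case $1 in
        -*) return 1 ;;              # rule 1: no leading dash
        *[[:cntrl:]]*) return 1 ;;   # rule 2: no control chars (newline, tab, ESC...)
    esac
    return 0
}

for name in 'R.E.M.' 'X&Y' '-rf' "$(printf 'two\nlines')"; do
    if is_safe_filename "$name"; then
        printf 'ok:       %s\n' "$name"
    else
        printf 'rejected: %s\n' "$name"
    fi
done
```

Spaces, ampersands, and dots all pass; only the leading dash and the embedded newline are refused.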

The third item is what I meant by fixing the shells: shells should make it braindead-easy to manipulate filenames without them turning into commands or other nonsense. Once a filename is loaded into a variable you shouldn't have to worry about characters in the name turning into shell commands. Once that's in place we can start fixing scripts. Maybe an environment variable can determine how that instance of the shell works: in secure mode or legacy mode.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 14:45 UTC (Thu) by mjthayer (guest, #39183)

One thing that would help make the shell more solid would be treating -* names as hidden files and skipping over them when expanding wildcards.
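A script can approximate this today. "for_each_visible" is a hypothetical helper written for plain POSIX sh: it skips names starting with "-" the same way "*" already skips names starting with ".". In bash, setting GLOBIGNORE='.*:-*' has a similar effect on every glob; the ".*" entry matters because a non-empty GLOBIGNORE otherwise lets "*" match dotfiles.

```shell
#!/bin/sh
# for_each_visible CMD [ARGS...]  (hypothetical helper)
# Runs "CMD ARGS... NAME" once per entry in the current directory,
# skipping names that begin with "-" as well as the usual dotfiles.
for_each_visible() {
    for f in *; do
        case $f in
            -*) continue ;;                     # treat "-foo" like a hidden file
        esac
        [ -e "$f" ] || [ -L "$f" ] || continue  # empty dir: "*" stays literal
        "$@" "$f"
    done
}
```

For example, in a directory of plain files "for_each_visible wc -c" reports each file's size, and a file named "-rf" is never handed to wc.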

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 15:08 UTC (Thu) by mjthayer (guest, #39183)

It could also recognise the null character as an argument separator as in 'find -print0'. It could even set some environment variable to tell tools like find that this is supported so that they can use it by default when not outputting to the console. And when substituting environment variables and backticked commands to the arguments for other commands, it could sanitise out anything starting with a hyphen. While this would break a few things, it would probably fix many more. While on that subject, the shell could enforce that substitutions that resolve to the arguments for other commands are not allowed to spill over (e.g. VAR='myfile; rm -rf /'; ls $VAR).
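The NUL-separated convention already exists outside the shell: "find -print0" paired with "xargs -0" (available in GNU and BSD find and xargs) keeps each name intact, because NUL is the one byte a filename can never contain. A sketch comparing it with newline-separated output, assuming "mktemp -d" and a made-up file name that contains a newline:

```shell
#!/bin/sh
demo=$(mktemp -d)
nl_name=$(printf 'two\nlines.txt')     # a name with an embedded newline
touch "$demo/plain.txt" "$demo/$nl_name"

# Newline-separated: the embedded newline makes one name look like two.
by_line=$(find "$demo" -type f | wc -l)

# NUL-separated: xargs -0 passes each complete name as one argument,
# and the inline sh script just reports how many it received.
by_nul=$(find "$demo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

printf 'newline-separated: %s\nNUL-separated: %s\n' "$by_line" "$by_nul"
rm -r -- "$demo"
```

Two files exist; the newline-separated count sees three, the NUL-separated count sees two.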

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 19:49 UTC (Thu) by dwheeler (guest, #1216)

[The shell] could also recognise the null character as an argument separator as in 'find -print0'. It could even set some environment variable to tell tools like find that this is supported so that they can use it by default when not outputting to the console.

Yes, I already added the "shell could recognize null as separator". And you're right, adding an environment variable could help (though it could also backfire on older scripts!).

While on that subject, the shell could enforce that substitutions that resolve to the arguments for other commands are not allowed to spill over (e.g. VAR='myfile; rm -rf /'; ls $VAR).

This particular example doesn't do quite what you think; it just passes to ls several values: "myfile;", "rm", "-rf", and "/", and you end up with some error messages and a listing of "/". But with more tweaking, you can definitely get some exploits out of this approach. Which is why removing the space character from IFS is a big help - then VAR would become a single parameter again.
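A safe way to check this is to count arguments instead of running ls or rm. "count_args" is a throwaway helper for the demo, and IFS set to newline plus tab is the change described above; the sketch assumes a POSIX shell.

```shell
#!/bin/sh
count_args() { printf '%s\n' "$#"; }   # print how many arguments arrived

VAR='myfile; rm -rf /'

split_default=$(count_args $VAR)    # 4: "myfile;" "rm" "-rf" "/"
split_quoted=$(count_args "$VAR")   # 1: quoting always prevents splitting

old_ifs=$IFS
IFS=$(printf '\n\t')                # newline and tab only; no space
split_no_space=$(count_args $VAR)   # 1: spaces no longer separate words
IFS=$old_ifs

printf '%s %s %s\n' "$split_default" "$split_quoted" "$split_no_space"
```

The ";" never acts as a command separator in any of the three cases; only the number of words changes.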

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 1:11 UTC (Sat) by nix (subscriber, #2304)

bash implemented an environment variable to tell subprocesses where
arguments began and ended at one point.

It was removed, but I can't remember why: some sort of compatibility
problem?

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 31, 2009 7:47 UTC (Tue) by mjthayer (guest, #39183)

I was wondering now whether to ask about this on the Bash mailing lists. Just out of interest, are you involved with the development of Bash/the GNU tools in any way? You seem well informed about them.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 31, 2009 19:28 UTC (Tue) by nix (subscriber, #2304)

I've contributed fixes now and then, but I just read a lot. :) The
projects are public, after all.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Apr 3, 2009 18:49 UTC (Fri) by anton (subscriber, #25547)

It could also recognise the null character as an argument separator as in 'find -print0'.
A few weeks ago I wanted to process my .ogg files, whose names contain all kinds of characters that the shell, or other programs I use in shell scripts, treat as meta-characters. I eventually ended up writing a new shell, dumbsh, that uses NUL as the argument separator, and feeding it from find, with some intermediate processing in awk (which is quite flexible about meta-characters).

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 21:11 UTC (Thu) by explodingferret (guest, #57530)

Well, this is a good point. There are basically two uses of shell scripts:

1) Portable scripts (of a kind), init scripts, and build scripts. In all these cases the scripts need to have #!/bin/sh at the top, and contain just about every fix for every problem ever, including [ "x$var" = x ] and ${1+"$@"} and various other monstrosities.

In these scripts, the quotes around variables; ./ in front of filenames; IFS= for read; and filename=`foo; printf x`; filename="${filename%x}" crap will *always* have to be there. So no point trying to fix anything for those.

2) The other use is scripts that are used on either one system (personal scripts) or one "class" of system, like "only Debian GNU/Linux".

These scripts can use a particular shell like #!/bin/bash and assume the existence of -print0 and -printf to find and -d '' to read and all the other little conveniences which make a lot of the problem go away.

Well, other than newlines at the end of filenames. That's the only case that I refuse to take account of in my scripts, unless security issues might arise.
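The "printf x" idiom from the first kind of script exists because command substitution deletes every trailing newline, which is exactly how a name ending in a newline gets mangled. A sketch of the sentinel trick, written with $( ) instead of backticks; "produce_name" is a hypothetical stand-in for whatever command prints the name:

```shell
#!/bin/sh
# Hypothetical producer: prints a name that really ends in a newline.
produce_name() { printf 'report\n'; }

naive=$(produce_name)            # trailing newline silently removed
safe=$(produce_name; printf x)   # sentinel "x" shields the newline
safe=${safe%x}                   # strip the sentinel again

printf 'naive: %s chars, safe: %s chars\n' "${#naive}" "${#safe}"
# prints: naive: 6 chars, safe: 7 chars
```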

----

I'm not saying that I disagree with the ideas in this article (although I'd like to keep spaces and shell special characters in my filenames, actually). I'm just saying that as far as shell scripting is concerned, it may not actually help all that much. The main gain for me would be the security fixes and less typing in my interactive shell. Even though I'm pretty sure I don't have any newlines or control characters in any of my filenames, I just can't bring myself to write bad scripts, and that's kinda sad.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 1:18 UTC (Sat) by nix (subscriber, #2304)

At work I co-maintain scripts in a third class: scripts that come with
an 'appliance' (for this purpose, a set of software which is the raison
d'etre of the hardware for which it is bought: this could be a tiny
embedded system or a giant bank database or simulation box). In this case,
they can dictate whatever shell they damn well like.

I dictated zsh 4, simply because for this application C was far too
unpleasant, ksh was too buggy (thanks, Linux, for pdksh, with its broken
propagation of variables out of loops-with-redirection), and there was no
hope of getting the clients' systems people to install Perl: but they were
perfectly happy to install a recent zsh: fewer dependencies and no scary
modules (well, actually zsh *does* have a module system but they didn't
realise that!)

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Nov 15, 2009 1:06 UTC (Sun) by yuhong (guest, #57183)

"ksh was too buggy (thanks, Linux, for pdksh, with its broken
propagation of variables out of loops-with-redirection)"
Was ksh93 tried?

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Nov 15, 2009 13:15 UTC (Sun) by nix (subscriber, #2304)

ksh93 was too sodding hard to require because building it was a nightmare.
At the time it wasn't free enough either.

dot files in Windows and

Posted Mar 25, 2009 23:09 UTC (Wed) by pr1268 (guest, #24648)

Windows has no problem with files starting with a dot.

Oddly enough, Windows will not allow the name of a directory to end in a dot. I discovered this when, back in my Windows days, I had to name an artist directory R.E.M without a final dot. Windows wouldn't allow me to put that trailing dot in the file name. Go figure. Linux doesn't have any issue with it (and since I've abandoned Windows on my home computers, I was able to rename the directory to include that dot).

Going off on a tangent: here are some files in my music directory which would make Mr. Wheeler cringe:

  • Beatles/Help!/01_-_Help!.ogg ('!' in directory and file names)
  • Donald_Fagen/The_Nightfly/01_-_I.G.Y..ogg (two dots before the file extension - not really an issue but interesting)
  • Sugar_Ray/14:59/ (':' in directory name)
  • Coldplay/X&Y/ ('&' in directory name)
  • John_Cage/4'33".ogg (single- and double-quotes - never mind that this is a really quiet song :) )
  • Radiohead/Hail_To_The_Theif/01_-_2_+_2_=_5_(The_Lukewarm.).ogg (A whole bunch of issues here)

In a digital forensics class the professor had us searching through a filesystem that contained directories named "..." (minus quotes). Good times...

dot files in Windows and

Posted Mar 25, 2009 23:42 UTC (Wed) by dwheeler (guest, #1216)

No cringe. I didn't see any control characters there, nor leading dashes. And you don't seem to require non-UTF-8. If we could get those done, the rest are gravy.

dot files in Windows and

Posted Mar 26, 2009 0:07 UTC (Thu) by pr1268 (guest, #24648)

Wow, thanks for the reply! And thank you for the original article--I found myself nodding in agreement many times while reading it.

Of course, even with your non-cringing approval, I certainly had lots of shell escaping to do with these files (and many others--my collection is approaching 10,000 audio files from almost 900 music CDs).

dot files in Windows and

Posted Mar 26, 2009 1:04 UTC (Thu) by nix (subscriber, #2304)

Hah, that's nothing: I saw a directory called '.. ' a while back, while
looking at an attacked system's disk image (for fun, I have no life).

dot files in Windows and

Posted Mar 26, 2009 10:21 UTC (Thu) by mjj29 (guest, #49653)

I've seen ... in that context too
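For anyone doing similar forensics, directories whose names consist only of dots and spaces can be listed with find. A sketch: "dotty_dirs" is a hypothetical helper name, "-mindepth" is a GNU/BSD extension rather than POSIX, and the pattern "*[!. ]*" matches any name containing something other than a dot or a space, so negating it keeps only the suspicious names.

```shell
#!/bin/sh
# dotty_dirs DIR  (hypothetical helper)
# Lists directories below DIR whose names contain only dots and spaces,
# such as "..." or ".. ". The start directory itself is excluded.
dotty_dirs() {
    find "$1" -mindepth 1 -type d ! -name '*[!. ]*'
}

demo=$(mktemp -d)
mkdir "$demo/..." "$demo/.. " "$demo/.config" "$demo/music"
dotty_dirs "$demo"   # lists only the "..." and ".. " directories
```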

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 30, 2009 19:36 UTC (Mon) by rickmoen (subscriber, #6943)

mrshiny wrote:

You can pry my spaces from my filenames out of my cold dead fingers.

ObMenInBlack: "Your offer is acceptable."

(I remember having to write AppleScript to recurse through directories cleaning up files created on network shares by MacOS-using munchkins who put space characters at the ends of filenames, in order for them to become valid filenames when seen by MS-Windows-using employees looking at the same network shares. The converse problem was files, from MS-Windows users, with names containing colon, which is a reserved character in MacOS file namespace. What a pain in the tochis.)

Rick Moen
rick@linuxmafia.com
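A check in the spirit of that cleanup script is short in shell. "windows_unsafe" below is a hypothetical helper, not the AppleScript described above: it flags a trailing space or dot (which Windows does not preserve, as with the R.E.M. directory mentioned earlier) and the characters Win32 reserves in names, < > : " \ | ? *. It does not check reserved device names such as CON or NUL.

```shell
#!/bin/sh
# windows_unsafe NAME  (hypothetical helper)
# Succeeds (returns 0) if NAME would cause trouble on a Windows share.
windows_unsafe() {
    case $1 in
        *' ' | *.)
            return 0 ;;   # trailing space or dot
        *'<'* | *'>'* | *:* | *'"'* | *'\'* | *'|'* | *'?'* | *'*'*)
            return 0 ;;   # characters reserved by Win32
    esac
    return 1
}

for name in 'R.E.M.' '14:59' 'notes ' 'X&Y' 'Help!'; do
    if windows_unsafe "$name"; then
        printf 'rename: [%s]\n' "$name"
    else
        printf 'ok:     [%s]\n' "$name"
    fi
done
```

Each reserved character gets its own quoted case pattern so that "|", "<", and ">" are never parsed as shell operators.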


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds