Safename: restricting "dangerous" file names

By Jake Edge
May 11, 2016

There are few restrictions on file names in Linux—essentially just two: no "/" and no "\0"—but that freedom can lead to various problems, including security problems. Vulnerabilities like arbitrary file deletion and denial of service have resulted from programs mishandling file names with unexpected characters, for example. Most users and administrators do not use file names with control characters or other oddities, to the point where some may not even realize it is possible to construct such file names. Protecting those users from these kinds of unexpected problems and vulnerabilities is the target of the Safename Linux security module (LSM) that is being proposed by David A. Wheeler.

There are a myriad of ways that file names can go "wrong" on Linux. Consider a file name that begins with a "-"; if that name ends up on the command line (perhaps via a shell glob pattern), the name could be interpreted as a command-line switch. File names containing newlines or other control characters can also lead to unexpected results and unexpected output. Beyond that, file names that are illegal in the system encoding (e.g. UTF-8) cannot be displayed sensibly.
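
The leading "-" case can be sketched with Python's standard argparse module. The parser below is a hypothetical stand-in for a tool like rm (the program name demo-rm and the helper parse() are made up for illustration); it only shows how an argument parser sees such a name, not how any real utility is implemented.

```python
import argparse

# Hypothetical stand-in for a tool like rm: one boolean switch plus file operands.
parser = argparse.ArgumentParser(prog="demo-rm")
parser.add_argument("-R", dest="recursive", action="store_true")
parser.add_argument("files", nargs="*")


def parse(argv):
    """Return (recursive, files) as the tool would see them."""
    ns = parser.parse_args(argv)
    return ns.recursive, ns.files


# A glob that expanded to a file literally named "-R" is taken as the switch.
print(parse(["-R", "notes.txt"]))    # (True, ['notes.txt'])
# The same file reached through "./" is treated as an operand.
print(parse(["./-R", "notes.txt"]))  # (False, ['./-R', 'notes.txt'])
```

The parser has no way to know where an argument came from; only the spelling of the argument decides how it is interpreted.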

The problems that come from unexpected file names are described on a web page that Wheeler maintains. On that page, he suggests that allowing system administrators to restrict the kinds of file names that can be created would alleviate a whole raft of problems. Safename would provide a mechanism to do just that.

As Wheeler notes on that page (and in the patches), POSIX defines "portable" file names that are quite a bit more restrictive (only ASCII alphanumeric characters, period, underscore, and hyphen if it is not the first character of the name). Other operating systems and some filesystems on Linux already impose more restrictions on file names, including disallowing space and control characters or mandating a particular encoding.
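
For comparison, the POSIX "portable" rule described above is small enough to state as a regular expression. This is a minimal sketch; the function name is_posix_portable is an assumption made for the example, and it checks a single name component only (it does not treat "." or ".." specially).

```python
import re

# Portable set as described above: ASCII letters, digits, period, underscore,
# and hyphen, with the extra rule that hyphen may not be the first character.
_PORTABLE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_posix_portable(name: str) -> bool:
    """True if name uses only the portable characters and does not start with '-'."""
    return _PORTABLE.fullmatch(name) is not None
```

Under this rule "report_2016.txt" passes, while "-R", "my file", and any name with a non-ASCII character do not.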

There have been security vulnerabilities caused by unexpected file names, including a denial of service in logrotate caused by newlines or backslashes in file names (CVE-2011-1155) and a remote arbitrary file deletion vulnerability in uscan caused by white-space characters in file names (CVE-2013-7085). Undoubtedly, others lurk in various programs, but the bigger problem is probably contained in scripts and other "one-off" programs that administrators write to solve a problem quickly—without considering the ways that "strange" file names can result in bugs, especially when run on user-controlled directories.

In the patch posting, Wheeler outlines three ways that these potentially dangerous file names might come about. A malicious user or application could directly create a file that is then used by some other non-malicious application leading to an exploit. Or a non-malicious, unprivileged application could be tricked by an attacker into creating a dangerous file name, which could then lead to an exploit when some non-malicious, but buggy, script or program uses the file. Similarly, a privileged application could be tricked into creating one of these file names, which could lead to an exploit when some other code handles the file name—which means that administrators may want a way to stop even privileged code from creating them.

Safename will help administrators avoid these kinds of problems by restricting the kinds of file names that can be created. Notably, it does not enforce any restrictions on existing file names, though that could be added as an (expensive) operation at mount time. It uses the LSM hooks for any operation that can create a new file name (file creation, hard or symbolic link creation, rename, and directory or special-file creation) and enforces a set of restrictions on them.

The behavior of Safename is governed by a number of control files that are currently under /proc/sys/kernel/safename, but will be moving to /sys/fs/safename based on a suggestion from Casey Schaufler. Enabling the feature for unprivileged users is done using the mode_for_unprivileged file, while privileged users' file name creation is governed by mode_for_privileged. Currently, "privileged" means having CAP_SYS_ADMIN, though that will change to CAP_MAC_ADMIN, which was also suggested by Schaufler since it is less likely to be given to a process for other purposes (CAP_SYS_ADMIN is something of a catch-all).

There are two settings available that can be combined and written to the two mode files. They are implemented as two bits that govern whether the rules are enforced and whether illegal file names are logged (using printk_ratelimited()). The low-order bit is for the enforcement setting and the other is for the logging setting. So, zero means no enforcement or logging, one is for enforcement without logging, two for logging without enforcement, and three for both actions. For both modes, the default value is zero, which means no enforcement and no reporting (effectively the same as a kernel running without the module loaded).
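
The two-bit encoding can be written out as follows. This is only an illustration of the arithmetic described above; the constant and function names are invented for the example and do not come from the patches.

```python
ENFORCE = 0b01  # low-order bit: reject names that fail the rules
LOG = 0b10      # second bit: log names that fail the rules


def encode_mode(enforce, log):
    """Combine the two settings into the value written to a mode file."""
    return (ENFORCE if enforce else 0) | (LOG if log else 0)


def decode_mode(value):
    """Split a mode value back into (enforce, log)."""
    return bool(value & ENFORCE), bool(value & LOG)


print(encode_mode(enforce=True, log=True))  # 3
print(decode_mode(2))                       # (False, True)
```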

In addition, there are configuration files to alter the rules for file names. The boolean utf8 file governs whether the file names must be valid UTF-8; it defaults to zero, which turns off UTF-8 checking. There are also three files that govern the character values allowed in various parts of the file name: first character, last character, and the characters in between. Those files are:

permitted_bytes_initial: The permitted set of characters for the first byte of the file name; the default is 33-44,46-125,128-254, which omits control characters, space, hyphen, tilde, delete (0x7f), and 0xff.
permitted_bytes_middle: The permitted set for the characters of the file name that are not the first or last (so file names of one or two characters are not subject to these requirements). By default, the value is 32-126,128-254, which leaves out control characters, delete, and 0xff.
permitted_bytes_final: The set of characters allowed for the last byte of a file name (a one-character file name must pass the initial and final tests). The default is 33-126,128-254, which removes control characters, space, delete, and 0xff.
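
The three default sets can be applied to a candidate name as in the sketch below. It is a user-space illustration of the rules as described here, not the kernel code: the helper names parse_ranges and name_permitted are hypothetical, support for single values such as "45" in a range list is an assumption, and so is rejecting the empty name.

```python
def parse_ranges(spec: str) -> frozenset:
    """Turn a spec such as "33-44,46-125" into a set of byte values."""
    allowed = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        # Assumption: a lone number such as "45" means just that one value.
        allowed.update(range(int(lo), int(hi or lo) + 1))
    return frozenset(allowed)


# Default values quoted in the article.
INITIAL = parse_ranges("33-44,46-125,128-254")
MIDDLE = parse_ranges("32-126,128-254")
FINAL = parse_ranges("33-126,128-254")


def name_permitted(name: bytes) -> bool:
    """Check the first byte, the last byte, and everything in between."""
    if not name:
        return False  # assumption: an empty name is never acceptable
    if name[0] not in INITIAL or name[-1] not in FINAL:
        return False
    # For one- and two-byte names this slice is empty, so only the
    # initial and final tests apply.
    return all(b in MIDDLE for b in name[1:-1])
```

With these defaults, b"-R", b"~backup", b"trailing " and b"a\nb" are refused, while b"two words" and b"file~" are accepted.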

Any attempt to create a file name that fails the tests will be rejected with an EPERM error, though Schaufler pointed out that EINVAL might be a better choice.

The comments on the patches have been fairly sparse to date, but the proposal is an indicator that the security module stacking feature is leading to more special-purpose LSMs being developed. When the single LSM slot was generally occupied by one of the monolithic LSMs (e.g. SELinux, AppArmor, Smack), there was little point in creating smaller modules that catered to a specific security concern. With the ability to add multiple LSMs that came with module stacking, efforts like LoadPin and Safename will be able to offer specialized tools for administrators who want them.

Safename: restricting "dangerous" file names

Posted May 12, 2016 6:35 UTC (Thu) by ttonino (guest, #4073) [Link] (60 responses)

I feel that having the name on disk is not the real problem: the problem occurs when the name is processed or used and triggers a bug.

Now, the names are generated by circumstances or by attackers.

- attacker submits a web form. Broken server side script splits name in 2 and executes half of it as a command.
- user tries to delete a file named -R and encounters surprising results.

The second problem would be foiled, because that file could not have been created in the first place. The first problem still exists.

A risk that the file name filtering can trigger is seen in a recent Windows trojan which installs a directory with a reserved name (I think named com1: or similar). With the result that directory cannot be deleted. But this sounds less of a risk if the filtering is done cleanly.

Safename: restricting "dangerous" file names

Posted May 12, 2016 7:58 UTC (Thu) by NAR (subscriber, #1313) [Link]

I think that broken script problem can't be fixed in the kernel. But it's good that the filename problem is being fixed.

Safename: restricting "dangerous" file names

Posted May 12, 2016 12:54 UTC (Thu) by matthias (subscriber, #94967) [Link] (5 responses)

Windows is quite funny in handling the reserved names. This happens if you do not have a directory with device nodes (/dev), but instead interpret every occurrence of com1, lpt1, nul, con etc. as "the user wants to access the device", regardless of where the filename appears. This already was fun in DOS times.

This security module should be much better, as it only disallows creation of reserved names. It does not forbid unlink() or open(), so you cannot have files that cannot be accessed or deleted.

Safename: restricting "dangerous" file names

Posted May 12, 2016 13:07 UTC (Thu) by farnz (subscriber, #17727) [Link] (4 responses)

To be fair to DOS here, it's back-compat with DOS 1.0 that gives it this problem. In DOS 1.0, you didn't have directories, thus you could not have a /dev equivalent. When DOS 2.0 added directories, you could not assume that the programs users ran were directory-aware; if I did B: CHDIR AWESOME then A:\SAUCE, there was no way to tell whether an access to "COM1" was intended to be a DOS 1.0-style access to the serial port, or a DOS 2.0-style access to B:\AWESOME\COM1. DOS chose to assume that you always meant a DOS 1.0-style access to the device, and Windows chose to keep that constraint.

Safename: restricting "dangerous" file names

Posted May 12, 2016 13:40 UTC (Thu) by khim (subscriber, #9252) [Link] (3 responses)

> if I did B: CHDIR AWESOME then A:\SAUCE, there was no way to tell whether an access to "COM1" was intended to be a DOS 1.0-style access to the serial port, or a DOS 2.0-style access to B:\AWESOME\COM1

Really? The fact that a DOS 1.0 program would use function 0F and a DOS 2.0 program would use function 3D is not a big enough clue?

The fact that MS DOS 2.0 kept all these special files accessible from all places is just laziness, plain and simple. MS DOS 2.0 could have fixed that problem easily. Now, with MS DOS 3.0 or MS DOS 5.0 it would have been problematic because at that point there were programs which used the old behavior - but that's a different story.

What MS DOS 2.0 did and what Windows did was clearly a mistake which haunts us to this day. That OS also made another mistake: it kept "/" as the command-line switch character by default. Sure, it introduced the ability to change it, but it kept "/" as the default, so we are still struggling with the crazy "/" vs "\" mixup.

Now, it's easy to say that all these things were clearly mistakes - but they kept MS-DOS usable back then and that's why we fight them now. In an alternate universe "proper" MS-DOS would have just died and we would have blamed some other OS for our miserable present.

Safename: restricting "dangerous" file names

Posted May 12, 2016 14:16 UTC (Thu) by farnz (subscriber, #17727) [Link] (2 responses)

At least one program I used (and ported to Linux) used functions 0Fh/14h/15h for all file access - it had been written for DOS 1.0, and a small amount of DOS 2.0 functionality was added as later extensions, using function 3Bh to change directory, but still using the old (working) core to access files. It massively reduced the scope of the rework required if 0Fh still worked as the "DOS 2.0" file access function - in this particular case, it meant that the program worked on both DOS 1.0 and DOS 2.0, but the extra functionality for DOS 2.0 machines simply didn't work (the program ate the errors).

And note that internally, this program was messy (as so many in-house programs are); once DOS 2.0 was deployed company-wide, later functionality was added using function 3Dh/3Fh/40h to access files, except where it was interfacing with the old core, where it used 0Fh/14h/15h. As a result, we needed an access to a file via 3Dh/3Fh/40h to access the same thing that 0Fh/14h/15h would, because otherwise whether (e.g.) COM2 was the second serial port or a file would depend on whether you were using the old bit of the codebase or the new...

Windows maintaining this in the Win16/Win32 API, however, has no excuse.

Safename: restricting "dangerous" file names

Posted May 17, 2016 14:15 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> Windows maintaining this in the Win16/Win32 API, however, has no excuse.

I'm not exactly sure what you mean by that sentence. Did people stop writing messy programs when Windows was introduced? Or did they stop using old code? Win16 programs had access to the very same DOS functions in Windows 1.0/2.0, you know. And even with 3.x many INT21 functions were available.

If it made no sense to break compatibility with MS DOS 2.0 then I don't see how Windows could have made any other choice.

> As a result, we needed an access to a file via 3Dh/3Fh/40h to access the same thing that 0Fh/14h/15h would, because otherwise whether (e.g.) COM2 was the second serial port or a file would depend on whether you were using the old bit of the codebase or the new...

You mean your code was so convoluted and obfuscated that it was impossible to call a wrapper which would give you a "compatible" interface on top of the 3Dh/3Fh/40h functions? Were you writing your programs in Malbolge?

Yes, I could easily understand why some programs needed that kludge. They could have easily added it to their own codebase, as a temporary measure or a permanent one, but it would have been their choice. MS DOS 2.0 had no need for it; it could have easily fixed that problem with MS DOS 2.0 but it chose to be compatible. Windows suffered the same fate for the same reason.

Safename: restricting "dangerous" file names

Posted May 17, 2016 14:26 UTC (Tue) by farnz (subscriber, #17727) [Link]

I mean that when you're redoing the program to use Windows file APIs, instead of calling the DOS file APIs, you can lose the back compat behaviour - you're having to do major surgery to bring in a GUI instead of a CLI or TUI anyway, so this is just one more "port from DOS to a GUI" tax that shouldn't cost you much, relative to the huge cost of having to redo every user interface decision in the program to use Win16 calls instead of DOS calls; after all, when I rewrote the program for Linux (in C, instead of a mix of 8080 CP/M assembler that went through an automated translator to get 8086 DOS 1.0 code, hand-written 8086 machine code using DOS APIs - co-worker who thought that assemblers were likely to introduce bugs to his code - and 8086 assembler written for DOS native APIs), I had to find and change all these API calls anyway. In contrast, for a lot of apps, the move from DOS 1.0 to DOS 2.0 was supposed to just bring in directory support, nothing else.

And yes, we could have added it internally - but then we wouldn't have bothered adding in directory support (costs too much for the value provided), and would have stuck with the DOS 1.0 API until we ported to another platform. Arguably, had they done that, we'd have ported to a UNIX earlier than we did - when we looked at adding a GUI, the decision was made to move to Linux/X11 instead of Windows, because we already had X11 servers running on client machines, and we could thus run the program on controlled Linux servers instead of having users able to copy it to floppies and ruin our careful deployment strategy.

Safename: restricting "dangerous" file names

Posted May 12, 2016 13:27 UTC (Thu) by ksandstr (guest, #60862) [Link] (2 responses)

>I feel that having the name on disk is not the real problem: the problem occurs when the name is processed or used and triggers a bug.

Indeed. This is the same issue as UTF-8 filenames that encode slashes and zeroes without using either the byte for the forward slash, or the zero: programs that consume the name by decoding it to latin1 or normalized UTF-8 end up with slashes and early terminators in paths they generate.

I don't see how filtering the bytes could fix this, nor how the UTF-8 case could be addressed without decoding UTF-8 within the kernel (either to pop EINVAL, or to cause a program to be unable to refer to a denormal name it knows from having created it).

Safename: restricting "dangerous" file names

Posted May 12, 2016 18:00 UTC (Thu) by mbunkus (subscriber, #87248) [Link]

I would hope that the "is valid UTF-8" check takes care of overlong encodings as overlong encodings aren't allowed by UTF-8. See https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings

Safename: restricting "dangerous" file names

Posted May 12, 2016 18:03 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

> This is the same issue as UTF-8 filenames that encode slashes and zeroes without using either the byte for the forward slash, or the zero: programs that consume the name by decoding it to latin1 or normalized UTF-8 end up with slashes and early terminators in paths they generate.

> I don't see how filtering the bytes could fix this, nor how the UTF-8 case could be addressed without decoding UTF-8 within the kernel (either to pop EINVAL, or to cause a program to be unable to refer to a denormal name it knows from having created it).

Just to be clear, safename does counter overlong encodings; just turn on the UTF-8 checking. If someone tries to create an overlong version (e.g., of '/') it will be rejected. That means that safename has to check every byte in the string, but it's quite fast. It only needs to check the new filename (not the whole path), and only when the filename is created (once it's created you're fine).
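
As a rough illustration of what rejecting overlong encodings means, the sketch below uses Python's strict UTF-8 decoder as a stand-in for the check. It is not Safename's code, and the helper name is_valid_utf8 is made up here; Python's decoder may also differ from the kernel check in other details.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if name decodes as strict UTF-8; overlong forms are not valid UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"/"))             # True: the ordinary one-byte form
print(is_valid_utf8(b"\xc0\xaf"))      # False: overlong two-byte form of "/"
print(is_valid_utf8(b"\xe0\x80\xaf"))  # False: overlong three-byte form of "/"
```

A decoder that accepted the last two forms would produce a "/" that byte-level filtering for 0x2f never saw, which is the problem raised earlier in the thread.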

Safename: restricting "dangerous" file names

Posted May 12, 2016 17:59 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

> I feel that having the name on disk is not the real problem: the problem occurs when the name is processed or used and triggers a bug.

That's true to an extent, but Unix was created in 1971, and we still haven't managed to train users and developers to never make a mistake. Constructs like the glob "*.c" are actually dangerous in some circumstances; you're supposed to use ./*.c instead, but people still use the dangerous versions (many don't even know that they are dangerous). Since people still make mistakes, including mistakes caused by their failure to be omniscient, let's allow people to configure their system so the mistake can't have any bad effects.
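
The difference between the two glob spellings can be seen with Python's glob module, used here as an approximation of shell expansion on a POSIX system. The helper expand() and the file names are invented for the example.

```python
import glob
import os
import tempfile


def expand(pattern, directory):
    """Expand a glob pattern relative to directory, roughly as a shell running there would."""
    old = os.getcwd()
    os.chdir(directory)
    try:
        return sorted(glob.glob(pattern))
    finally:
        os.chdir(old)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("-R.c", "main.c"):
        open(os.path.join(tmp, name), "w").close()
    print(expand("*.c", tmp))    # ['-R.c', 'main.c']
    print(expand("./*.c", tmp))  # ['./-R.c', './main.c']
```

With the "./" prefix no expanded name can begin with "-", so a command receiving the list cannot mistake a file for an option.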

Obligatory car example: Cars designed around the same time that Unix was designed ran just fine when people didn't make mistakes. However, those cars were death traps in an accident. Modern cars are designed to reduce damage when the inevitable accident occurs. I'm trying to bring this kind of thinking to Linux. This module means that when people make mistakes, the system is no longer a death trap; instead, it actively works to reduce or prevent the damage.

> user tries to delete a file named -R and encounters surprising results.... would be foiled, because that file could not have been created in the first place.

Exactly! Hopefully that makes it clear why people might want this.

> attacker submits a web form. Broken server side script splits name in 2 and executes half of it as a command... [this] problem still exists.

Correct, safename doesn't prevent that problem. But no single countermeasure solves all problems. What you really need is a set of countermeasures, each of which reduces risks. There are other countermeasures for accidental name-splitting, e.g., modifying shell scripts to eliminate space from IFS. But that IFS-based countermeasure doesn't handle leading "-" in filenames, while safename does. Combining countermeasures, such as safename and removing space from IFS, eliminates both of the problems you listed.

> A risk that the file name filtering can trigger is seen in a recent Windows trojan which installs a directory with a reserved name (I think named com1: or similar). With the result that directory cannot be deleted. But this sounds less of a risk if the filtering is done cleanly.

That's a Windows-specific problem (that I do discuss on my page!). It's a nasty problem, but one that Unix and Linux don't have. I'll let Microsoft figure out how they want to deal with that problem in Windows :-).

Safename: restricting "dangerous" file names

Posted May 12, 2016 19:40 UTC (Thu) by dlang (guest, #313) [Link] (48 responses)

This wouldn't help the broken script/website problem. That happens before the filename is passed to the kernel.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:26 UTC (Thu) by viro (subscriber, #7872) [Link] (47 responses)

... and you end up with harder to understand rules, but hey - it's for chiiildrun^Wsafety reasons. Nevermind that it doesn't really fix the problem, every layer of obfuscation helps. Repeat it until you believe it, you'll feel so much safer... </sarcasm> IMO it's complete garbage. For example, it does not protect you from e.g. NFS server containing Bad Words(tm). Or fuse, for that matter. And that's aside of the lovely issues with the scripts dealing with e.g. tar t output, etc. <sarcasm> But it does make you feel good, and that's what security is all about, innit?

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:32 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

So what do you suggest to fix this? It's a very clear problem and bites pretty much everybody at least once.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:46 UTC (Thu) by viro (subscriber, #7872) [Link] (4 responses)

a) use proper quoting to deal with the situation when (it's when, not if) you run into weird names.
b) regression testing, including "how does it handle weird names" (c.f. the story about a directory on a Bell Labs system that kept acting as a minefield for all kinds of tree-walking code - very valuable thing, that).
c) don't use such names yourself outside of regression testing.

Incidentally, I would not assume that this EPERM/EINVAL/whatnot will be gracefully handled by the same kind of code. As to which class of misbehaviour ends up nastier... If you have any data (as opposed to anecdotes) concerning that, I'd love to see it.
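
The regression-testing idea in (b) could look something like the following sketch, which fills a scratch directory with awkward names for a tree-walking tool to be run against. The list and the function name populate are illustrative assumptions, not an established test suite; names the filesystem (or a policy such as Safename) refuses are simply skipped.

```python
import os
import tempfile

# A few names that commonly trip up scripts; illustrative, not exhaustive.
AWKWARD_NAMES = [
    b"-rf",        # looks like an option
    b"two words",  # embedded space
    b"new\nline",  # embedded newline
    b"tab\there",  # embedded tab
    b"star*",      # glob metacharacter
    b"\xff\xfe",   # not valid UTF-8
]


def populate(directory: bytes) -> list:
    """Create an empty file for each awkward name; return the names actually created."""
    created = []
    for name in AWKWARD_NAMES:
        try:
            with open(os.path.join(directory, name), "wb"):
                pass
        except OSError:
            # Creation was refused; record nothing and move on.
            continue
        created.append(name)
    return created


with tempfile.TemporaryDirectory() as tmp:
    made = populate(os.fsencode(tmp))
    print(len(made), "of", len(AWKWARD_NAMES), "awkward names created")
```

Byte strings are used throughout so that names which are not valid in any text encoding can still be created and listed.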

Safename: restricting "dangerous" file names

Posted May 14, 2016 16:46 UTC (Sat) by nix (subscriber, #2304) [Link] (3 responses)

It at least means that shell scripts that use set -e don't need to care about insane names breaking things, nor do makefiles (which *cannot* compensate for e.g. filenames containing spaces, even if they want to: the tool will not allow it).

Safename: restricting "dangerous" file names

Posted May 14, 2016 17:33 UTC (Sat) by viro (subscriber, #7872) [Link] (2 responses)

If the tree where makefile lives contains attacker-writable directories, you are already almost certainly FUBAR; what attack scenario do you have in mind?

Safename: restricting "dangerous" file names

Posted May 18, 2016 22:24 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

I didn't have one, in particular: I was just pointing out that not all applications *can* use proper quoting, so you're reduced to hoping that people don't use names containing spaces, which seems a forlorn hope: I have multiple source trees on this system alone that contain such names (mostly thanks to MacOS X developers, I think).

Safename: restricting "dangerous" file names

Posted May 18, 2016 22:24 UTC (Wed) by nix (subscriber, #2304) [Link]

(Not, of course, that banning such names would help in the least: it would just stop me untarring the trees at all! And, of course, this proposal doesn't ban spaces in filenames by default in any case.)

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:40 UTC (Thu) by roc (subscriber, #30627) [Link] (40 responses)

Measures that block the exploitation of some bugs on some systems have some value. Whether that value is worth the extra complexity is a judgement call, but dismissing such measures as "doesn't really fix the problem" doesn't make sense unless you believe there's a measure of equal or lower complexity that *does* "really fix the problem". Is there?

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:42 UTC (Thu) by roc (subscriber, #30627) [Link]

And "users stop making mistakes" or "developers stop making mistakes" are not "equal or lower complexity".

Safename: restricting "dangerous" file names

Posted May 12, 2016 22:12 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

> ... unless you believe there's a measure of equal or lower complexity that *does* "really fix the problem". Is there?

Well, we could take the opposite approach and create a kernel option to force *unsafe* filenames, randomly adding leading hyphens and embedded control characters, spaces, and other problematic byte sequences. That would quickly demonstrate which programs have issues in their filename-handling logic, perhaps allowing us to finally get to the root of the issue.

Only half joking. I have no problem with the "safename" concept so long as it isn't enabled by default, or used by applications as an excuse for poor filename hygiene.

Safename: restricting "dangerous" file names

Posted May 13, 2016 14:52 UTC (Fri) by dgm (subscriber, #49227) [Link] (37 responses)

Yes, it does make sense. Any non-solution to a problem becomes a problem by itself, just adding to the pile.

If you want to check file names for safety, then you have to check close to where the problem is. In the example about rm and a file called "-R", the right place to check is the shell. No other place makes sense: rm cannot do it (too late), and neither can the kernel (which cannot know about all the possible kinds of applications and their name restrictions).

So no, a half-assed solution is not better than no solution at all. Bugs must be fixed at the right place or they really are not.

Safename: restricting "dangerous" file names

Posted May 13, 2016 18:24 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (36 responses)

> In the example about rm and a file called "-R", the right place to check is the shell.

Why would the shell know that "-R" has special meaning to the "rm" command? Some commands (like "echo") would treat "-R" as a literal string. Also consider that commands exist which take options which do not start with a hyphen; for example, xterm gives special significance to arguments starting with "+", "%", and "#", in addition to "-".

The safest thing to do would be to require all pathnames to start with either "./" or "/", to distinguish them from non-pathnames.

Safename: restricting "dangerous" file names

Posted May 13, 2016 18:59 UTC (Fri) by farnz (subscriber, #17727) [Link] (29 responses)

Ultimately, the problem comes from two design decisions interacting:

- All command parameters are just text - there's no "hint bits" to tell you whether a parameter is "user input" or "tab completed filename", or "glob expansion".
- The shell does splitting, expansion, globbing etc., but has no idea what parameters actually mean. This enforces consistency ("$PROG $FOO bar*" has the same meaning regardless of $PROG), but stops the program from second-guessing you.

Neither of them is wrong, but when you put them together, 'rm' has no idea that the user typed "rm *", and it expanded to "rm -fr foom"; similarly, the shell does not know that '-fr' is a modifier from rm's point of view.

Safename: restricting "dangerous" file names

Posted May 14, 2016 8:50 UTC (Sat) by dlang (guest, #313) [Link] (20 responses)

for the record, tab completed filenames with spaces in them get escaped by the tab completion.

Safename: restricting "dangerous" file names

Posted May 17, 2016 12:01 UTC (Tue) by farnz (subscriber, #17727) [Link] (19 responses)

Spaces get escaped, yes, but (for example), there's nothing that tells rm that '-fr' in argv[1] was from the user typing '-<TAB>', and getting tab completion on the filename -fr, or that '--no-preserve-root' in argv[2] came from a completion function and is meant to be a command argument, not a filename.

Safename: restricting "dangerous" file names

Posted May 17, 2016 12:30 UTC (Tue) by anselm (subscriber, #2796) [Link] (18 responses)

I consider that a feature, not a bug.

Safename: restricting "dangerous" file names

Posted May 17, 2016 12:40 UTC (Tue) by farnz (subscriber, #17727) [Link] (17 responses)

It's both; it means that the shell and rm cannot communicate about how a particular element in argv[] came to be, so there's no way for rm to know how to disambiguate things that are both options and filenames. This, in turn, means that the command touch -- '-fr' leads to user surprise, as it means that rm * is now recursive.

This could have been avoided (back in deep UNIX history) by convention; options begin '-', and filenames begin '.' (with tab-completion of filenames thus producing './file' instead of 'file'). This would, however, have entailed the early shell authors deciding to enforce that convention; it's now long enough that muscle memory

Safename: restricting "dangerous" file names

Posted May 17, 2016 18:47 UTC (Tue) by flussence (guest, #85566) [Link] (16 responses)

> there's no way for rm to know how to disambiguate things that are both options and filenames

You demonstrated this isn't the case only half a sentence later...

Safename: restricting "dangerous" file names

Posted May 17, 2016 19:12 UTC (Tue) by farnz (subscriber, #17727) [Link] (15 responses)

I didn't. -fr is both a legitimate filename and an option to rm. The user can disambiguate them for rm by using a ./ prefix to say definitely a filename, but there is no equivalent for options, and the prefix is not required (nor is it put in there by shell expansions).

Thus the user can choose to be unambiguous, but if they are ambiguous, rm can't tell what they really meant - did I get -fr into argv[1] via glob expansion or tab completion (probably a filename) or by typing a literal -fr (probably an option).

Safename: restricting "dangerous" file names

Posted May 17, 2016 20:44 UTC (Tue) by hummassa (guest, #307) [Link] (14 responses)

You inadvertently came up with a nice and neat (IMHO) solution.

Just adjust your shell so that * and .* globs expand to ./* and ./.*, respectively

Safename: restricting "dangerous" file names

Posted May 17, 2016 20:51 UTC (Tue) by dlang (guest, #313) [Link]

just make sure you don't change foo* to foo./* in the process.

Safename: restricting "dangerous" file names

Posted May 18, 2016 9:59 UTC (Wed) by farnz (subscriber, #17727) [Link] (10 responses)

That's not enough; you need anything that the shell creates as a "filename" to expand to ./result; otherwise rm -* (typo) is also ambiguous. Plus, you need to do this before the current behaviour gets set in historic tradition, so that people don't write scripts containing things like rm -{f,r,v} and expect it to work.

Safename: restricting "dangerous" file names

Posted May 18, 2016 10:39 UTC (Wed) by itvirta (guest, #49997) [Link] (9 responses)

> ...scripts containing things like rm -{f,r,v} and expect it to work.

Uh, that's hideous. Luckily rm -frv usually works, and even rm -f -r -v is probably easier to write than that.
(Exactly the same amount of characters too.) So hopefully there's no need for that.

Safename: restricting "dangerous" file names

Posted May 18, 2016 10:48 UTC (Wed) by farnz (subscriber, #17727) [Link] (8 responses)

Unfortunately, because this has historically worked, it's now expected behaviour; I've worked with people who consider it the "definitive" style for options, and who prefer to do things like rm --{force,recursive,verbose} because they think that's clearer than rm --force --recursive --verbose. If that's deep in a script, my modified shell is going to break.

Hence coming back to my original point; had early UNIX authors foreseen this gotcha, they could have required relative paths to begin ./, and thus avoided all this pain down the line, because rm would never have been passed a filename without the ./. Now, though, it's too late - too much legacy to fix.

Safename: restricting "dangerous" file names

Posted May 18, 2016 13:05 UTC (Wed) by tao (subscriber, #17563) [Link]

Command line options are something that has always been broken, and is unlikely to ever unbreak.

tar, ar, ps, etc. don't even prefix their options with a dash (though you can).
Some commands require -- for long options, others don't.
Some commands enforce (or at least warn about) the ordering of options (find, for instance), others don't.

Simple fix: Only add ./ if bash knows it is a file

Posted May 19, 2016 9:57 UTC (Thu) by gmatht (subscriber, #58961) [Link] (6 responses)

It seems that there are only two cases:
1) Bash knows that the glob result is a file, because it had to scan the file system
2) Bash didn't have to scan the file system, so it is safe to omit the './'

There are other ways adding ./ could break existing scripts though. For example, the following would no longer give you a list of naughty users:

cd /home; find * -name naughty.jpg | sed s,/.*,,g | sort -u

I guess "bash -n" could warn about unsafe use of "*", warn interactive users, and we could scan all packages for unsafe use of *.
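
The breakage in that pipeline can be reproduced without touching /home. A small sketch, assuming a POSIX shell; the user names and paths are invented, and strip_to_top is a hypothetical stand-in for the sed and sort stages of the command above:

```sh
# Keep only the first path component of each line, then de-duplicate.
strip_to_top() {
    sed 's,/.*,,' | sort -u
}

# What "find *" would print today: paths start with the user's directory.
without_prefix=$(printf '%s\n' alice/pics/naughty.jpg bob/naughty.jpg | strip_to_top)

# What it would print if the shell expanded * to ./* instead.
with_prefix=$(printf '%s\n' ./alice/pics/naughty.jpg ./bob/naughty.jpg | strip_to_top)

printf 'without prefix:\n%s\n' "$without_prefix"
printf 'with prefix:\n%s\n' "$with_prefix"
```

Without the prefix the result is the list of user directories; with it, every line collapses to a single ".", because the first slash now comes right after the dot.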

Simple fix: Only add ./ if bash knows it is a file

Posted May 19, 2016 9:59 UTC (Thu) by farnz (subscriber, #17727) [Link] (5 responses)

You missed case 3 - Bash did not have to scan the file system, but the user's intent was to match a file. For example rm log-{user,system}.txt. There's no way for Bash to detect that sanely without adding in a file system scan - but then the file system scan can cause bash to do the wrong thing for someone who does rm --{force,recursive}.

Simple fix: Only add ./ if bash knows it is a file

Posted May 19, 2016 11:18 UTC (Thu) by anselm (subscriber, #2796) [Link] (2 responses)

Of course brace expansion, by definition, has nothing to do with existing files. There are lots of legitimate cases for brace expansion where the expansion results are file names that a file system scan won't uncover, because they don't exist. Consider

$ mkdir -p quarterly-results/201{5,6,7}q{1,2,3,4}

Doing a file system scan here to “validate” the expansion results would be completely pointless if not counterproductive.

Simple fix: Only add ./ if bash knows it is a file

Posted May 19, 2016 11:31 UTC (Thu) by farnz (subscriber, #17727) [Link]

Indeed; hence me saying that this isn't actually fixable now. The chance to insist that paths, other strings, and options were distinct entities (with a - hint as first character of an option, and a . hint as the first character of a path, thus not needing a binary protocol to provide the hints) has long since gone away, especially since there are now programs like ps which use the presence or absence of the - to choose between different option parsers.

Simple fix: Only add ./ if bash knows it is a file

Posted May 19, 2016 12:55 UTC (Thu) by tao (subscriber, #17563) [Link]

As an additional bonus this will yield different results depending on whether you run it on a basic POSIX-compliant shell or using bash.

touch a b
dash$ ls {a,b}
ls: cannot access '{a,b}': No such file or directory
bash$ ls {a,b}
a b

Case 3=2b

Posted May 20, 2016 4:22 UTC (Fri) by gmatht (subscriber, #58961) [Link] (1 responses)

Bash doesn't care whether log-user.txt is a file or an option. That is the application's job to decide. Scanning the file system wouldn't even help, for example: `tar x x`.

We don't know that log-{user,system}.txt is an option. We do know that log-{user,system}.txt is a fixed expansion that doesn't directly depend on any untrusted filenames. So either way we can pass it directly to the application and let it handle this ambiguity without worrying too much about the existence of files with malicious names tricking the application.

Case 3=2b

Posted May 20, 2016 8:15 UTC (Fri) by farnz (subscriber, #17727) [Link]

Exactly; if I meant it to be a filename, bash can't tell the application anything that hints that that was my intent. Equally, if I meant it to be an option, bash can't tell the application anything that hints that that was my intent.

Thus, this is currently an insoluble problem, without going back in time and changing the idioms for filenames and options such that filenames *always* began . or / (which then makes log-{user,system}.txt clearly not a filename, as it starts "l"), and options always began -; then, you reserve all other characters for parameters that are neither filenames nor options (e.g. PIDs and IP addresses). This is about 40 years too late now (I wasn't even born when the decisions were being made), so it's an insoluble problem because any shell trying to enforce this needs to cope with the legacy that's already out there.

Safename: restricting "dangerous" file names

Posted May 25, 2016 17:19 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

> Just adjust your shell so that * and .* globs expand to ./* and ./.*, respectively

And then you do what we did, and blow up your system (well, we didn't exactly, but our backup system went mad...)

We installed a system ported across from Pr1mos. It used * everywhere as part of a filename ...

(* at the start of a name was sort of the equivalent of .exe in Windows. I'm not even sure it was stored in the directory tree on Pr1mos, but because they couldn't do whatever they did on Pr1mos, they put it in the directory tree on nix ...)

Cheers,
Wol

Safename: restricting "dangerous" file names

Posted May 25, 2016 20:55 UTC (Wed) by tao (subscriber, #17563) [Link]

Honestly, that kind of sounds like you did a bad port. The purpose of a port isn't to copy all stupid things from the source platform, but to adapt it to run well on the target system...

In the same manner I wouldn't limit filenames to 8.3 when porting from DOS to Linux, or allow backslash in filenames when porting the other way around...

Unix was a mistake

Posted May 17, 2016 6:00 UTC (Tue) by felix.s (guest, #104710) [Link] (7 responses)

Exactly. It's just one of many, many pitfalls resulting from the "everything is a text file" principle: when it's in force, one program passing data to another (here — a list of files) has to serialise it into text, which the other program has to parse back. Both serialisation and parsing introduce potential for misinterpretations, bugs and unaccounted corner cases (say, failing to escape things properly). The potential is amplified when each program has its own parsing code written from scratch, which is very tempting when you deal with text (I recall a case, not that long ago, of someone using fscanf() to parse /etc/mtab, introducing an escaping bug, and then being told to use getmntent() instead). This cost of serialisation and parsing may be tolerable for one-time uses (when you know all the inputs that your program will ever process), but unacceptable when writing software that lasts.

If I were to design an operating system, I'd have the shell interpret command line arguments and pass them to programs as flags, abstract objects representing file paths (something like O_PATH file descriptors) and opened files (ordinary file descriptors). Pipelines would be based around passing abstract objects, not meaningless blocks of bytes. Incidentally, this would also improve security (because programs no longer need to access arbitrary file system locations) and solve so many other big and little problems (say, <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19165>; the compiler would no longer output text, it'd output objects representing diagnostics).

I think some of the advantages of this approach may be actually attainable on Linux today, by means of passing file descriptors over sockets and memfd. But to actually get to the root of the problem, I think you'd need to redesign Unix from scratch.

Bah, I got a bit carried away. But I had to let off this steam somewhere.

Unix was a mistake

Posted May 17, 2016 6:37 UTC (Tue) by jem (subscriber, #24231) [Link]

Just as a side note: there is nothing in the Linux (or Unix) kernel that mandates that "everything is a text file". You could very well implement something like Windows Powershell. Pipes do not care if they convey binary data or text.

Unix was a mistake

Posted May 17, 2016 6:51 UTC (Tue) by viro (subscriber, #7872) [Link] (5 responses)

Yaaawn... Wake me up when you come up with mechanism for making that thing of yours strong-typed in any meaningful sense. And avoiding both the "rebuild the universe, we'd decided to change the type of that" problem and that of duhvelopers running wild with "extensible" APIs - the library API stability is already laughable due to that and you would apparently encourage that to even worse degree...

Unix was a mistake

Posted May 17, 2016 7:54 UTC (Tue) by dlang (guest, #313) [Link]

not to mention the idea that whatever object marshalling protocol you pick will still be in use in a few years, let alone a few decades.

Unix was a mistake

Posted May 17, 2016 9:18 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Microsoft actually pulled it off with PowerShell. It's a very nice strongly typed shell language as the result.

Unix was a mistake

Posted May 17, 2016 14:56 UTC (Tue) by edgewood (subscriber, #1123) [Link] (1 responses)

And it's so nice that MS decided to create a Linux emulation layer so that developers could run bash.

Unix was a mistake

Posted May 17, 2016 15:07 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

This is to entice Linux developers to use Azure more. Powershell is pretty nice regardless of that.

Unix was a mistake

Posted May 17, 2016 17:46 UTC (Tue) by dlang (guest, #313) [Link]

We'll see how well PowerShell stands the test of time; it's only been out for a couple of releases now, and it's only in the last release that it became more than a toy. With Windows 10/Server 2016 we will have the second release with a full-blown PowerShell, but I am far more concerned about things a release or two down the road from that.

Safename: restricting "dangerous" file names

Posted May 17, 2016 11:56 UTC (Tue) by dgm (subscriber, #49227) [Link] (5 responses)

Because the shell is *the* interface to the command. The shell knows that it is calling rm instead of echo, and in fact the shell is the one responsible for globbing, so it's the shell that should be asking for confirmation for "suspicious" names (like those starting with a dash), or asking for them to be disambiguated by using the full path or a prefix (./ as you suggest).

Safename: restricting "dangerous" file names

Posted May 17, 2016 14:56 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (4 responses)

> Because the shell is *the* interface to the command. The shell knows that it is calling rm instead of echo...

To the shell, any non-builtin command (including "rm") just takes a list of strings. The shell doesn't (and shouldn't) have any knowledge of what these external commands do, or how they interpret their arguments. The convention about "special" arguments starting with a hyphen would be a heuristic at best, one with both false positives and false negatives. Prompts might be useful for an interactive shell, but would not be appropriate for most scripts—which are where the majority of the problem lies. Non-fatal warnings would either be a recurring nuisance when the arguments are known to be safe for a particular command or would come too late, informing the user of the issue after the damage is already done.

There is one safe and accurate solution: On the command side, implement the "--" protocol for marking the end of the "special" options; and when invoking commands through the shell, always use "--" and quote your variables. Use bash arrays and the "${name[@]}" syntax rather than relying on IFS if you need to manipulate lists of files. E.g.:

> # Instead of rm -f $(list-generator-command)
> files=()
> while read -r; do files+=("$REPLY"); done < <(list-generator-command)
> rm -f -- "${files[@]}"

This won't handle filenames containing newlines. I do not know of any way to read NUL-delimited input (as in "find -print0" or "xargs -0") from a shell script, which would be the most obvious way around that restriction. However, it should be proof against anything else that might occur inside a filename.
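
For what it's worth, bash's read builtin accepts a -d option, and giving it an empty delimiter makes it stop at each NUL byte, so the loop above can be fed from find -print0. A minimal sketch, assuming bash (neither read -d nor process substitution is POSIX), a find that supports -print0, and a scratch directory with made-up file names:

```bash
#!/usr/bin/env bash
# Scratch directory with awkward names, including one containing a newline.
dir=$(mktemp -d) || exit 1
touch "$dir/plain.txt" "$dir/two words.txt" "$dir/"$'line\nbreak.txt'

files=()
# IFS= preserves leading and trailing whitespace in each name;
# -d '' makes read treat the NUL byte as the record delimiter.
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(find "$dir" -type f -print0)

echo "${#files[@]} files collected"
rm -f -- "${files[@]}"
```

Each name arrives in the array as exactly one element, newline included, so the final rm sees three arguments after the "--".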

Safename: restricting "dangerous" file names

Posted May 18, 2016 12:22 UTC (Wed) by NAR (subscriber, #1313) [Link] (2 responses)

> The shell doesn't (and shouldn't) have any knowledge of what these external commands do, or how they interpret their arguments.

Actually the shell has some kind of this knowledge, used for completion like at https://github.com/scop/bash-completion.

Safename: restricting "dangerous" file names

Posted May 18, 2016 18:14 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> Actually the shell has some kind of this knowledge, used for completion like at https://github.com/scop/bash-completion.

The shell provides a framework for programmable completion as an aid to interactive users. It doesn't have any of that knowledge built in, the database may be incomplete or inaccurate, and even in an interactive setting the programmable completion feature may not be enabled. (I personally find it more annoying than helpful and prefer the more predictable, command-independent traditional completion, but tastes vary.) It also isn't generally available to scripts, which is where the problem lies.

Safename: restricting "dangerous" file names

Posted May 19, 2016 11:11 UTC (Thu) by anselm (subscriber, #2796) [Link]

The shell knows a few things about replacing bits of text with other bits of text. It doesn't really know (or care) what these bits of text mean, and what little knowledge it does have is only used for constructing command lines, not for making sense of them once they're there, which is what this discussion seems to be about.

Safename: restricting "dangerous" file names

Posted May 18, 2016 18:07 UTC (Wed) by flussence (guest, #85566) [Link]

I'd like it if more programs made use of filehandles and envvars. Having a different API as the boundary between input types instead of parsing magic tokens in argv just seems saner.

Just as one example, imagine `find` having an -execv [envvar_name] option instead of having to deal with its horrible inline syntax...

Safename: restricting "dangerous" file names

Posted May 12, 2016 9:21 UTC (Thu) by dlang (guest, #313) [Link] (11 responses)

This doesn't prevent the badguy from creating a filesystem with the 'bad' name offline and adding it to the system.

What happens if you are trying to get access to the contents of a tar or zip file and some file in it has a bad name?

so many ways this can go wrong..

Safename: restricting "dangerous" file names

Posted May 12, 2016 12:47 UTC (Thu) by matthias (subscriber, #94967) [Link] (10 responses)

It is obvious that this security module will not fix all problems with filenames. It should, however, make it much harder to exploit a common class of bugs. I would guess that it can succeed in that goal. Of course it would be preferable to have bug free scripts and programs, but this is just wishful thinking.

Regarding your examples:
Usually, a badguy should not be able to add filesystems to the system. If the badguy can, you probably have other problems.

If you are accessing the contents of a tar, then you either have to untar the archive, in which case the module would prevent the creation of those files. Or you access the contents by the means of read() or mmap() and some library for decompressing the contents in RAM. In this case, the filenames usually do not end up being added to some commandline due to shell expansion.

Safename: restricting "dangerous" file names

Posted May 12, 2016 17:37 UTC (Thu) by dlang (guest, #313) [Link] (9 responses)

so if you start blocking untarring files because they contain names that aren't valid in the current systemwide locale setting, you set yourself up for all sorts of problems in interoperability.

The world is not all UTF-8, let alone UTF8-en.

Safename: restricting "dangerous" file names

Posted May 12, 2016 17:40 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (8 responses)

And if you permit untarring files containing byte sequences that aren't valid in the current systemwide locale you're going to run into interoperability issues. In the absence of explicit metadata about the expected encoding, the only way to guarantee interoperability is to use either UTF-8 or ASCII (which is a subset of UTF-8 anyway)

Safename: restricting "dangerous" file names

Posted May 12, 2016 19:42 UTC (Thu) by dlang (guest, #313) [Link] (7 responses)

UTF-8 is not universally compatible.

The issue isn't just with what you choose to use, but with what everyone else you deal with chooses to use.

Safename: restricting "dangerous" file names

Posted May 12, 2016 19:53 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (6 responses)

If I'm running a system in UTF-8 and you give me a tarball that contains ISO-8859-1 filenames, we're going to have a bad time - a bunch of tooling is going to refuse to deal with them. There is no meaningful interoperability between different encodings, and choosing to enforce valid UTF-8 filenames on UTF-8 systems doesn't change that.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:01 UTC (Thu) by dlang (guest, #313) [Link] (5 responses)

actually, most of your tooling will work just fine, it's just the display/input of the filename that will be the problem.

I've dealt with "unable to display" filenames with tab completions and wildcards in the past. It's not fun, but it works. And the programs really don't care what the byte-string that is the filename looks like.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:07 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (4 responses)

There's plenty of tooling that will fail due to using library APIs that validate whether a string is conforming UTF-8 when in a UTF-8 locale. The idea that everything will be happy with arbitrary sequences of bytes as filenames is simply a myth. Better to flag the situation at file creation time rather than wait for something to mysteriously fail later on.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:12 UTC (Thu) by dlang (guest, #313) [Link] (3 responses)

I would consider any tool that does this broken.

This includes things like Python3 that decides that all strings must be UTF8

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:15 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (2 responses)

> I would consider any tool that does this broken.

That doesn't alter the fact that they exist, and as such there's no expectation of interoperability between archives with legacy encodings in filenames and modern UTF-8 systems.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:23 UTC (Thu) by dlang (guest, #313) [Link] (1 responses)

> there's no expectation of interoperability between archives with legacy encodings in filenames and modern UTF-8 systems.

except that the users expect things to work. they don't expect that their new update will make it impossible to work with others.

UTF is supposed to make things easier, not cut them off.

the "modern utf-8 systems" are the ones with the problems. Thank goodness there are still options and not everyone ignores backwards compatibility.

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:29 UTC (Thu) by mjg59 (subscriber, #23239) [Link]

Users do not expect it to work because it simply does not work. The best way to handle archives with non UTF-8 filenames is to convert the filenames during the extraction process. But if you're in a constrained environment where you're sure that all your tooling can cope with this case, just don't turn the feature on?

Safename: restricting "dangerous" file names

Posted May 12, 2016 12:08 UTC (Thu) by ssl (guest, #98177) [Link] (4 responses)

> permitted_bytes_initial: The permitted set of characters for the first byte of the file name, the default is 33-44,46-125,128-254, which omits control characters, space, hyphen, tilde, delete (0x7f), and 0xff.
> permitted_bytes_middle: The permitted set for the characters of the file name that are not the first or last (so file names of one or two characters are not subject to these requirements). By default, the value is 32-126,128-254, which leaves out control characters, delete, and 0xff.

Oh yeah, this will break if any Japanese, Egyptian or Georgian (just as examples) users would like to name files using their native scripts.

Safename: restricting "dangerous" file names

Posted May 12, 2016 12:25 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (3 responses)

> Oh yeah, this will break if any Japanese, Egyptian or Georgian (just as examples) users would like to name files using their native scripts.

How? The filtered bytes are never used in UTF-8 representations of any characters other than space, hyphen and tilde.
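
This is easy to spot-check for a couple of sample characters. A rough sketch, assuming a POSIX shell with printf and od; the two characters are arbitrary picks, and the range tested is just the 128-254 portion of the defaults quoted above:

```sh
# UTF-8 bytes of two sample characters, written as octal escapes so the
# script itself stays ASCII: U+044F (Cyrillic ya) and U+3042 (hiragana a).
bytes=$(printf '\321\217\343\201\202' | od -An -tu1)

# Check every byte against the 128-254 part of the default permitted sets.
all_permitted=yes
for b in $bytes; do
    if [ "$b" -lt 128 ] || [ "$b" -gt 254 ]; then
        all_permitted=no
    fi
done

echo "bytes:" $bytes
echo "all within 128-254: $all_permitted"
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, and 0xff never occurs in UTF-8 at all, so non-ASCII characters cannot collide with the filtered values.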

Safename: restricting "dangerous" file names

Posted May 12, 2016 17:14 UTC (Thu) by sfeam (subscriber, #2841) [Link] (2 responses)

It is not clear to me from the article whether this filter is only applied after testing the "utf8" Boolean or if it is an alternative mechanism. If the filter is applied to non-utf8 encodings it will definitely cause problems. For example 0xff is the representation of cyrillic character я in encoding iso-8859-5. I learned this the hard way when trying to pin down the cause of bug reports of premature file termination errors on read. And yes, there apparently are linux users in that part of the world whose native environment is iso-8859-5 rather than UTF-8.

Safename: restricting "dangerous" file names

Posted May 12, 2016 17:32 UTC (Thu) by mjg59 (subscriber, #23239) [Link]

The reasonable set of characters on a system is inherently going to depend on the character sets used on that system. Defaulting to UTF-8 is completely reasonable - distributions that support legacy character sets can change the values as appropriate.

Safename: restricting "dangerous" file names

Posted May 12, 2016 17:39 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

Safename requires that new filenames pass all the tests, and the system admin can configure the tests. If you use iso-8859-5 as your filename encoding, then you should leave the UTF-8 checking off, and possibly change the set of allowed bytes (as noted in the article, you can configure which bytes are allowed at the beginning, middle, and end).

I would encourage you to migrate to UTF-8 for filename encoding; it works everywhere, and desktop environments are increasingly requiring it. I believe the vast majority of users already encode filenames using UTF-8; in those cases, they can turn on the UTF-8 encoding checking, and be sure that the filenames really are valid UTF-8. There are even tools to help you automatically transition to UTF-8.

Safename doesn't require that you use UTF-8, though. Its byte checking is there specifically so it can support checking non-UTF-8 values.

Safename: restricting "dangerous" file names

Posted May 12, 2016 12:38 UTC (Thu) by robbe (guest, #16131) [Link] (3 responses)

I was interested if there were any files on my desktop that would be forbidden.

I found two with CR at the end, two html files from boost’s documentation begin with a tilde (destructors?). Pretty harmless.

File names with a dash as their first character are by far the most numerous. systemd’s usage of -.mount and -.slice will run into trouble under this module.

Safename: restricting "dangerous" file names

Posted May 12, 2016 19:55 UTC (Thu) by error27 (subscriber, #8346) [Link] (2 responses)

I ran `find -name \[~-]\* 2> /dev/null` on my system.

I get 169 files.

The systemd files you mentioned.
Some downloaded javascript from when I saved a webpage in firefox
A bunch of files in format NAME-000-999 where the NAME part is missing.
2 auto-generated files from running trinity
A few files called "-1.jpg" or -1.png
-13-degree-weather-has-brought-chicagos-ohare-airport-to-a-n.jpg
A -.orig file that patch created by accident
13 files that originally came from Windows and start with ~$ like "~$hool Policies Manual.doc"

I'm in favour of this change but it could be slightly annoying to transition. Except for the minus one files, all the bad names look to be auto-generated. Some are generated by linux programs which can be fixed but a lot were auto-generated on someone else's system.

Safename: restricting "dangerous" file names

Posted May 12, 2016 22:38 UTC (Thu) by joey (guest, #328) [Link] (1 responses)

Leading and trailing spaces are also not unusual in file names. It's easy to not notice them when editing the name of a file in a desktop environment.

Safename: restricting "dangerous" file names

Posted May 21, 2016 20:30 UTC (Sat) by Wol (subscriber, #4433) [Link]

Especially when it's deliberate :-)

PI-Open used to create directories (which the OS was not supposed to muck about with the contents of) which contained a whole bunch of files, whose name format was <space><backspace><number>.

It did that because Pr1mos (from which it was ported) had "segmented directories". Basically, a directory with no space for the filename, so files had to be referenced by offset number. For the most part, those directories were meant for programs - each file was mapped to a memory segment when the program was loaded (hence "segmented directory"), but as programmers will, plenty of us found other good uses for them :-)

Cheers,
Wol

Safename: restricting "dangerous" file names

Posted May 12, 2016 20:10 UTC (Thu) by flussence (guest, #85566) [Link] (2 responses)

I'd be all for this if it had the option of consulting a userspace daemon over netlink, or something like that. The kernel is never going to be the right place to handle all the things that can go wrong with a UTF-8 locale like homoglyphs, non-ASCII control characters or bad normalization. Expecting user software to get it right individually isn't a productive use of effort either.

Safename: restricting "dangerous" file names

Posted May 12, 2016 21:32 UTC (Thu) by wahern (subscriber, #37304) [Link] (1 responses)

"Expecting user software to get it right individually isn't a productive use of effort either."

Expecting user software to get it right is absolutely necessary. It's not sufficient by itself, but it's by far the most productive strategy available.

Opinions on mitigations can reasonably vary, especially because not all mitigations are equal. But I simply cannot abide an opinion which discounts the value and imperative of encouraging correctly written software, insofar as correctness is definable. Fortunately, there are absolutely correct ways to securely handle filenames, in large part because the vulnerabilities are so well known and identifiable, at every layer and dimension of the software stack. And there are plenty of best practices to draw upon, albeit some more debatable than others.

Nobody can prevent someone from writing bad software, but experienced engineers can certainly shame and cajole and encourage both the implementation of and informed selection of correct software. Just look back to the quality of software circa 2000 to today. Worlds apart. Still horribly bug ridden, but at least you can cobble together small network services with a much higher degree of confidence. Remote exploits are much less common in low-level software. Good luck finding an RCE in a modern TCP/IP stack. I'm sure they're there, but not as many as there used to be even though the stacks are much more complex. And that's because of a push for correctness, not mitigations. We're incalculably more secure because of that approach.

Safename: restricting "dangerous" file names

Posted May 13, 2016 3:59 UTC (Fri) by david.a.wheeler (subscriber, #72896) [Link]

> Expecting user software to get it right is absolutely necessary. It's not sufficient by itself, but it's by far the most productive strategy available.

Clearly it's not one or the other. It's not possible to mitigate all possible problems. But when errors have a common pattern, a countermeasure can be helpful.

> that's because of a push for correctness, not mitigations. We're incalculably more secure because of that approach.

No, it's because people have worked on both correctness and mitigations. My guess is that more of the improvements have come from platform mitigations, not from efforts focused on correctness. Many programmers still have no idea how to write correct programs. But we have tools that mitigate against common errors. Most programming languages prevent buffer overflows. Many web application frameworks now automatically protect against SQL injection and cross-site scripting. Address space layout randomization and stack canaries also counter many buffer overflows. These changes have little to do with developers knowing how to write correct programs, and everything to do with embedding mitigations into your system. Don't get me wrong, I think it is extremely important to focus on writing correct programs. This is not an either/or situation! But no one is perfect, not even people who know what they're doing, so it's important to also have the underlying system mitigate common problems when it is reasonable to do so.

Safename: restricting "dangerous" file names

Posted May 12, 2016 21:33 UTC (Thu) by micka (subscriber, #38720) [Link] (12 responses)

I'm probably asking the obvious, but the answer doesn't come to me. Why is the tilde excluded as first character?

Safename: restricting "dangerous" file names

Posted May 12, 2016 21:38 UTC (Thu) by sfeam (subscriber, #2841) [Link] (10 responses)

Presumably because the shell would expand ~foo to the login directory of user "foo".

Safename: restricting "dangerous" file names

Posted May 12, 2016 22:33 UTC (Thu) by joey (guest, #328) [Link] (7 responses)

So why is $ not excluded as a first character?

Why not other chars?

Posted May 13, 2016 4:04 UTC (Fri) by david.a.wheeler (subscriber, #72896) [Link]

You could exclude other characters/bytes if you want to. The set of permitted bytes is configurable at run-time.

Safename: restricting "dangerous" file names

Posted May 13, 2016 14:42 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (5 responses)

To prevent expansion of environment variables you would need to block $ everywhere, not just in the first character. So far as I can tell this only affects unquoted paths passed to the shell as part of a command-line; shell metacharacters produced by the expansion of unquoted environment variables or glob patterns are treated as if they were quoted, leaving only word-splitting as a potential source of issues.

$ x='test'
$ a='~user bad$x'
$ echo $a
~user bad$x
$ eval "echo $a"
/home/user badtest

I can see some logic in blocking control codes, leading and trailing whitespace, and non-UTF-8 filenames (for systems with UTF-8 locale), but would stop short of trying to restrict all potential shell metacharacters.

Note that effectively detecting whitespace and control codes in UTF-8 filenames is significantly more complicated than just matching certain bytes; the filter will need to be Unicode-aware.

Safename: restricting "dangerous" file names

Posted May 15, 2016 9:51 UTC (Sun) by neilbrown (subscriber, #359) [Link] (4 responses)

> shell metacharacters produced by the expansion of unquoted environment variables or glob patterns are treated as if they were quoted

Not quite. I agree that leading "~" and "{a,b}" are not expanded after variable substitution, but '*' and '?' and '[...]' are.
% a='*'
% echo $a

At least, that is the case for "bash". For "csh" the rules are different:

% set a="~neilb"
% echo $a
/home/neilb
% echo "$a"
~neilb

Not that anyone would actually write a script using csh would they! Would they??

Safename: restricting "dangerous" file names

Posted May 15, 2016 18:33 UTC (Sun) by nybble41 (subscriber, #55106) [Link] (3 responses)

> Not quite. I agree that leading "~" and "{a,b}" are not expanded after variable substitution, but '*' and '?' and '[...]' are.

Good catch. I really didn't expect to find the shell performing pathname expansion when all the other forms were inhibited, but a more careful reading of the manual page shows that this occurs last, after word-splitting. Tilde expansion occurs in the same pass as parameter expansion, as do arithmetic expansion, command substitution, and process substitution, while brace expansion occurs earlier. Only word-splitting and pathname expansion come afterward.

This can be controlled with "set -f" in bash (disabling pathname expansion), but then the script can't use glob patterns at all. It would be nice to have an explicit pathname expansion syntax that worked independently of "set -f", but this doesn't appear to exist, so scripts have to choose between safe(r) parameter expansion and the ability to use glob patterns. Of course, the safest thing is to only expand parameters inside quoted strings, which prevents both word-splitting and pathname expansion, but this requires extra effort in the normal case rather than the exceptional case.
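A small bash sketch of the three cases discussed above, for anyone who wants to try it. The scratch directory and the file names foo and bar are just for illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the glob results are predictable.
old_dir=$PWD
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch foo bar

a='*'

unquoted=( $a )   # word-splitting, then pathname expansion: matches bar and foo
quoted=( "$a" )   # quoting suppresses both: a single literal '*'

set -f            # disable pathname expansion globally
noglob=( $a )     # still word-split, but '*' stays literal
set +f

printf 'unquoted: %d word(s)\n' "${#unquoted[@]}"
printf 'quoted:   %s\n' "${quoted[0]}"
printf 'set -f:   %s\n' "${noglob[0]}"

cd "$old_dir" && rm -rf -- "$scratch"
```

Quoting is the only one of the three that also suppresses word-splitting, which is why it remains the safest default even though it takes more typing.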

> Not that anyone would actually write a script using csh would they! Would they??

Hopefully no one is writing *new* scripts for csh, but I have seen a few in legacy environments.

Safename: restricting "dangerous" file names

Posted May 16, 2016 14:50 UTC (Mon) by cortana (subscriber, #24596) [Link] (2 responses)

I wonder what the world would look like if there was a special built-in 'glob' for enabling globbing.

So 'rm *' removes one file with the name '*' but 'glob rm *' expands to 'glob rm foo bar baz' and so on.

You could have different builtins for different kinds of expansions (or flags to the glob built-in).

Safename: restricting "dangerous" file names

Posted May 17, 2016 18:37 UTC (Tue) by flussence (guest, #85566) [Link]

It's already doable:

$ set -f; glob() { set +f; $@; set -f; }
$ ls -d /proc/sys/*
ls: cannot access '/proc/sys/*': No such file or directory
$ glob !!
glob ls -d /proc/sys/*
/proc/sys/abi    /proc/sys/dev  /proc/sys/kernel  /proc/sys/vm
/proc/sys/debug  /proc/sys/fs   /proc/sys/net

The question is whether enough people could be convinced to work this way to make a difference.

Safename: restricting "dangerous" file names

Posted May 20, 2016 22:11 UTC (Fri) by Wol (subscriber, #4433) [Link]

> I wonder what the world would look like if there was a special built-in 'glob' for enabling globbing.

This sounds like Pr1mos ... :-)

The Pr1mos shell had expansion and completion, and it also had ways for programs to communicate to/from the shell. It's so long ago, I've forgotten the details, but there was "-verify" and "-confirm" or something like that. So if I typed the command

DELETE @@ -NO_CONFIRM

the @@ would expand to all the files in the directory. DELETE told the shell that its defaults were both verify and confirm, so as the shell expanded the @@, it would ask me to verify the expansion. But because I'd said no_confirm, the shell would then execute the command without asking.

Okay, it relied on the guy who wrote DELETE to get it right, but because you set flags in the executable that the shell picked up, it was pretty flexible. Obviously, something non-dangerous like COPY would default to no_verify no_confirm.

I've always felt that Unix completion is crippled compared to Pr1mos. But then, it is Eunuchs - a castrated Multics. Pr1mos was a Multics-derivative too :-)

Cheers,
Wol

Safename: restricting "dangerous" file names

Posted May 13, 2016 7:20 UTC (Fri) by micka (subscriber, #38720) [Link]

Ah, right!

Safename: restricting "dangerous" file names

Posted May 15, 2016 7:55 UTC (Sun) by robbe (guest, #16131) [Link]

I thought safenames was about programs *other* than the shell?

Because, as others have rightly pointed out, you’d have to block tons of stuff to protect from all shell metacharacters.

Which programs (except the shell) interpret ~user in a special way? I think, compared to the hyphen, this is negligible.

Safename: restricting "dangerous" file names

Posted May 12, 2016 22:40 UTC (Thu) by viro (subscriber, #7872) [Link]

POSIX:
=========
2.6.1 Tilde Expansion

A "tilde-prefix" consists of an unquoted <tilde> character at the beginning of a word, followed by all of the characters preceding the first unquoted <slash> in the word, or all the characters in the word if there is no <slash>. In an assignment (see XBD Variable Assignment), multiple tilde-prefixes can be used: at the beginning of the word (that is, following the <equals-sign> of the assignment), following any unquoted <colon>, or both. A tilde-prefix in an assignment is terminated by the first unquoted <colon> or <slash>. If none of the characters in the tilde-prefix are quoted, the characters in the tilde-prefix following the <tilde> are treated as a possible login name from the user database. A portable login name cannot contain characters outside the set given in the description of the LOGNAME environment variable in XBD Other Environment Variables. If the login name is null (that is, the tilde-prefix contains only the tilde), the tilde-prefix is replaced by the value of the variable HOME. If HOME is unset, the results are unspecified. Otherwise, the tilde-prefix shall be replaced by a pathname of the initial working directory associated with the login name obtained using the getpwnam() function as defined in the System Interfaces volume of POSIX.1-2008. If the system does not recognize the login name, the results are undefined.

The pathname resulting from tilde expansion shall be treated as if quoted to prevent it being altered by field splitting and pathname expansion.
=========

Mind you, there are other shell metacharacters they hadn't excluded, but when it comes to security theatre ours is not to wonder why...
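For reference, the tilde-prefix rules in the quoted text can be sketched in bash as follows. The home directory /home/example is a made-up value, set only so the results are predictable.

```shell
#!/usr/bin/env bash
# /home/example is a hypothetical home directory used only for this demo.
saved_home=${HOME-}
HOME=/home/example

at_start=~/notes        # unquoted tilde at the start of the word: expanded
quoted="~/notes"        # quoted tilde: left alone
mid_word=file~/notes    # tilde not at the start of the word: left alone
path_list=~/bin:~/sbin  # in an assignment, also expanded after an unquoted colon

HOME=$saved_home

printf '%s\n' "$at_start" "$quoted" "$mid_word" "$path_list"
```

Only a tilde at the very start of a word (or right after "=" or ":" in an assignment) is special, which is presumably why the leading position is the one being singled out.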

Safename: breaking compatibility between systems

Posted May 18, 2016 19:40 UTC (Wed) by ballombe (subscriber, #9523) [Link] (15 responses)

This proposal will break basic compatibility between systems, because each of them will have different rules about whether to accept or reject a filename.

Safename: breaking compatibility between systems

Posted May 18, 2016 22:00 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

This is true of many security features - policy will vary between systems.

Safename: breaking compatibility between systems

Posted May 19, 2016 11:09 UTC (Thu) by anselm (subscriber, #2796) [Link] (12 responses)

Basic compatibility between systems is covered by POSIX, and POSIX makes very restrictive assumptions as far as file names go. As long as you adhere to those (and there isn't really a compelling reason not to, even if you're only targeting Linux), you should be safe.

Safename: breaking compatibility between systems

Posted May 19, 2016 17:40 UTC (Thu) by micka (subscriber, #38720) [Link] (1 responses)

Let's take a look.
No accented character.
No, thank you.

Safename: breaking compatibility between systems

Posted May 20, 2016 16:53 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Almost a haiku there.

Let us see here now.
No accented characters?
No, thank you, kind sir.

Safename: breaking compatibility between systems

Posted May 19, 2016 18:17 UTC (Thu) by hummassa (guest, #307) [Link] (9 responses)

> and there isn't really a compelling reason not to, even if you're only targeting Linux

Users want to have spaces in their filenames. And accented chars (especially in non-English languages, that need lots of accented vowels). And commas and parentheses. All of those are forbidden by POSIX.

Not only that: even if a certain user (like me) does not like extraordinarily strange chars in his filenames, he will encounter tarballs/zips with such filenames in them, or use a mainstream web browser that appends space-openparenthesis-number-closeparenthesis to the non-extension part of a filename when a file with the same name already exists in the download directory (if 'file.txt' already exists, it will try to create 'file (1).txt'). Etc, etc.
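To see how quickly everyday names fall outside the POSIX portable set (ASCII alphanumerics, period, underscore, and hyphen, with no leading hyphen), here is a rough bash check. The function name is_portable_name is made up for this sketch, and it looks at a single name component, not a whole path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the name uses nothing but the POSIX
# portable filename characters and does not start with a hyphen.
is_portable_name() {
    local name=$1
    # The character list is spelled out so the result does not depend on
    # locale-specific range or character-class behavior.
    [[ -n $name && $name != -* &&
       $name != *[!ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-]* ]]
}

for name in 'file.txt' 'file (1).txt' '-rf' $'caf\xc3\xa9.txt'; do
    if is_portable_name "$name"; then
        printf 'portable:     %s\n' "$name"
    else
        printf 'not portable: %s\n' "$name"
    fi
done
```

Only 'file.txt' passes; the browser-style 'file (1).txt' fails on the space and the parentheses alone.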

Safename: breaking compatibility between systems

Posted May 20, 2016 7:35 UTC (Fri) by jem (subscriber, #24231) [Link] (8 responses)

>Users want to have spaces in their filenames. And accented chars (especially in non-English languages, that need lots of accented vowels). And commas and parentheses. All of those are forbidden by POSIX.

Hear, hear! The proposal to enforce stricter limits on file names would severely limit, say, how a user can name documents produced with a word processor. And yet it is not the word processor's fault, but mostly because of an unrelated program: the shell. We need to go to the root of the problem: what we need is a new shell. A shell that can handle file names as a collection of names without the risk of the names being split up just because they contain spaces. A shell where the collection of names is just that, and won't be mixed up with command options. A shell where zero is as good a number as any other number, i.e. an empty collection of names does not get special treatment (replaced with "*", for instance.)

Another solution would be to invent some new storage for documents which is off limits for the shell.
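For comparison, today's bash can approximate some of this, though only if the script author opts in every time. A minimal sketch, assuming bash (arrays and nullglob are not POSIX sh) and using made-up file names in a scratch directory:

```shell
#!/usr/bin/env bash
old_dir=$PWD
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch -- 'my report.txt' '-n'   # a name with a space, and one that looks like an option

shopt -s nullglob               # a pattern with no matches expands to nothing
names=( * )                     # each file name becomes exactly one array element
none=( *.no-such-suffix )       # an empty collection, not a literal pattern
shopt -u nullglob

# "--" ends option parsing, and the quoted array keeps every name intact.
listing=$(ls -- "${names[@]}")

printf '%d names, %d for the empty match\n' "${#names[@]}" "${#none[@]}"
printf '%s\n' "$listing"

cd "$old_dir" && rm -rf -- "$scratch"
```

The catch is that all three protections (the array, the quoting, and the "--") are opt-in, so the default behavior still has the problems described above.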

Safename: breaking compatibility between systems

Posted May 20, 2016 8:06 UTC (Fri) by NAR (subscriber, #1313) [Link] (2 responses)

In my experience using accented characters always creates problems down the line. There's the problem of different encodings, so if the file was created using ISO-8859-2 encoding, but the terminal/application uses Unicode, the characters won't show properly. If the file was created using Unicode, it won't show properly when the application uses Code Page 852. If the file was created using Code Page 852, it won't show properly when the terminal is set to ISO-8859-2. Sometimes the language-specific encoding is not even available (or badly configured), so users get "inventive" and use õ or ô instead of ő and of course it won't show properly in ISO-8859-2. And then there's the problem when the given computer does not provide ways to enter accented characters, so one can't even type the name of that file.

Accented characters are useful for e-mail, web pages, formatted text files where the encoding can be specified, but the filenames lack this information.

Safename: breaking compatibility between systems

Posted May 20, 2016 12:34 UTC (Fri) by jezuch (subscriber, #52988) [Link]

> In my experience using accented characters always creates problems down the line. There's the problem of different encodings, so if the file was created using ISO-8859-2

Yeah, yeah, yeah. It used to be like that... in the '90s. I mean, I live in a country where a proper encoding is vital to all our ą's and ź's. It's all in the past now, dead and buried. (Except for an occasional system made by occasional clueless uni-lingual Americans ;) )

Safename: breaking compatibility between systems

Posted May 21, 2016 23:23 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

I just use UTF-8 and say "boo hoo" to things which don't support it. Which means I send a tarball to the phone and extract music files there because MTP is encoding-stupid. And it sends as a single file which is much faster.

Safename: breaking compatibility between systems

Posted May 25, 2016 17:07 UTC (Wed) by nix (subscriber, #2304) [Link] (4 responses)

You would also need to replace make(1) et al (and what with? ant? no thank you very much.)

Safename: breaking compatibility between systems

Posted May 25, 2016 18:12 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (2 responses)

Why not Ninja?

https://ninja-build.org/manual.html

Safename: breaking compatibility between systems

Posted Jun 1, 2016 20:08 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Ninja is explicitly not intended as a general-purpose make replacement. :)

Safename: breaking compatibility between systems

Posted Jun 1, 2016 22:52 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

While I agree, for software building, it fits the bill just fine. For weird things like triple-pass LaTeX builds or batch automation, keep writing Makefiles :) .

Safename: breaking compatibility between systems

Posted May 26, 2016 12:52 UTC (Thu) by madscientist (subscriber, #16861) [Link]

You don't need to replace make. You can just set the make SHELL variable to point to whatever shell replacement you prefer.

Safename: breaking compatibility between systems

Posted May 20, 2016 17:07 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Don't we already have this between different filesystems (e.g., vfat, mtp, etc.)?