
Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 25, 2009 20:50 UTC (Wed) by clugstj (subscriber, #4020)
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

I would have to disagree.

Most of the things he wants to force on everyone are already available by convention. What is the benefit of disallowing other usages? If you want to imagine that all your filenames are UTF-8, go ahead - who's stopping you? The UNIX kernel contains as little policy as possible, which keeps it simpler than it would otherwise be. Yes, this is a double-edged sword, but doing the things he suggests is not an automatic win.


Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 25, 2009 23:34 UTC (Wed) by dwheeler (guest, #1216) [Link]

Sure, almost all files already follow these conventions. Except when they don't. And when they don't, millions of programs subtly stop working. Everyone who writes "find . blah | stuff" is writing bad code, because filenames can contain newlines deep in the directory tree. If we get rid of the nonsense, it becomes easy to write correct programs; today it takes herculean effort, and few people make it. It's a double-edged sword, but users get cut by both sides. I have yet to find a real use case for including control characters in filenames, for example, but plenty of reasons why they should never appear.
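The "find . blah | stuff" failure mode can be shown in a few lines. This is a sketch, assuming bash plus GNU findutils and coreutils:

```shell
# A filename may legally contain a newline; line-oriented pipelines
# then miscount. One file below, but the naive pipeline sees two names.
dir=$(mktemp -d)
: > "$dir/$(printf 'evil\nname')"      # one file, with an embedded newline

# Naive: find emits one line per newline, so this single file counts as 2.
naive=$(( $(find "$dir" -type f | wc -l) ))

# Robust: NUL-delimited names cannot be split by a newline in the name.
robust=$(( $(find "$dir" -type f -print0 | tr -cd '\0' | wc -c) ))

echo "naive=$naive robust=$robust"
rm -rf "$dir"
```

The `-print0`/`-0` convention exists precisely because newline is a legal filename byte; a correct script has to remember it everywhere, which is Wheeler's point.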

Conventions are great! Let's go back to FAT!

Posted Mar 26, 2009 8:18 UTC (Thu) by khim (subscriber, #9252) [Link] (3 responses)

> Most of the things he wants to force on everyone are already available by convention. What is the benefit of disallowing other usages?

What's the benefit of all these ACLs, then? Traditional Unix permissions, capabilities, POSIX ACLs, memory protection, and so on - you could just use conventions for all of that, and fire anyone who violates them.

This is your proposal in a nutshell - and it just does not work.

Conventions are great! Let's go back to FAT!

Posted Mar 26, 2009 13:38 UTC (Thu) by clugstj (subscriber, #4020) [Link] (2 responses)

Wow, it does not work? Apparently UNIX is completely broken. And ACLs are so complicated and such a drain on performance as to be nearly useless - which is why they are not used much.

Shell scripts are where this is the biggest problem. I do shell scripting for a living and don't see this issue as being anywhere near as big a problem as Mr. Wheeler thinks it is.

Also, I'm completely confused by your title. I suggest conventions and then you suggest, perhaps facetiously, FAT (which is not a convention, but enforcement of a very stupidly limited set of possible filenames).

Conventions are great! Let's go back to FAT!

Posted Mar 26, 2009 14:07 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

> Wow, it does not work?

Nope.

> Apparently UNIX is completely broken.

Nope. UNIX is not broken. Your head, on the other hand, is.

> And ACL's are so complicated and a drain on performance as to be nearly useless - which is why they are not used much.

Traditional Unix permissions are used on most systems - and they ARE ACLs too. They are quite limited but often adequate - that's why the other forms are not used much. Still, they are deficient in many situations, and the other forms are being used more and more.

> Shell scripts are where this is the biggest problem. I do shell scripting for a living and don't see this issue as being anywhere near as big a problem as Mr. Wheeler thinks it is.

The number of correct scripts is not the important metric; the number of bad scripts is. And it is MUCH higher than warranted: I've fixed tons of scripts that failed on names with spaces, names with a dash in the first position, and so on. If such names were excluded from the start, life would be much easier.
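The two failures mentioned - spaces and a leading dash - look like this in practice. A minimal bash sketch, assuming GNU coreutils:

```shell
# Two conventionally "weird" but perfectly legal names:
# one with a space, one starting with a dash.
dir=$(mktemp -d) && cd "$dir"
touch -- 'two words' '-n'

f='two words'
# Unquoted: $f word-splits into "two" and "words", neither of which exists,
# so rm fails without touching the real file.
rm $f 2>/dev/null && naive=ok || naive=broken

# Quoted, with -- to stop option parsing: both files go away cleanly.
rm -- "$f" '-n' && robust=ok || robust=broken

echo "naive=$naive robust=$robust"
cd / && rm -rf "$dir"
```

Quoting every expansion and using `--` before operands are exactly the habits the "tons of fixed scripts" were missing.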

> Also, I'm completely confused by your title. I suggest conventions and then you suggest, perhaps facetiously, FAT (which is not a convention, but enforcement of a very stupidly limited set of possible filenames).

I propose FAT as a way to get rid of these pesky ACLs. It's one of the few filesystems today without any form of access control (beyond the read-only flag). We can extend it to allow all forms of filenames - it's not hard. Or we could just run all programs with UID==0 - that would give us the same flexibility. Somehow no one wants to go in this direction, though.

Conventions are great! Let's go back to FAT!

Posted Mar 29, 2009 21:44 UTC (Sun) by clugstj (subscriber, #4020) [Link]

"UNIX is not broken. Your head, on the other hand, is"

Wow, childish personal attacks. How droll.

"Number of correct scripts is not important metric. Number of bad scripts is"

I would think that the percentage of each would (possibly) be a useful metric. But what is the damage from these "bad scripts"? If you are writing shell scripts that MUST be absolutely bullet-proof against bad input, perhaps because they run setuid-root, then you are already making a much worse mistake than any possible bugs in the script.

Still don't understand the FAT reference. Sorry, maybe I'm just slow.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 26, 2009 9:51 UTC (Thu) by epa (subscriber, #39769) [Link] (6 responses)

'By convention' files do not contain control characters. The problem is that you cannot rely on convention when writing robust, secure software. Either you put in endless sanity checks which cruft up your code and are liable to be forgotten, or you end up with subtle bugs that are tickled by the existence of files called '>foo' or '|/bin/sh' or countless other variations.

Such bugs are made more insidious by the fact that 'by convention', they cannot ever be triggered. But for someone trying to make a working exploit, or widen a small security hole into a larger one, convention is no barrier.
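A related variation makes the "subtle bug, no barrier to an attacker" point concrete: a file whose name begins with a dash doesn't make a naive command fail, it silently changes what the command does. A bash sketch, assuming GNU coreutils:

```shell
# A file named "-n" is legal; a glob that expands it bare hands it to
# the command as an option rather than an operand.
dir=$(mktemp -d) && cd "$dir"
echo hello > real.txt
touch -- '-n'

# "-n" sorts first, so this runs "cat -n real.txt": the output silently
# gains line numbers instead of failing loudly.
naive=$(cat *)

# Prefixing ./ keeps the dash from ever reaching option parsing.
safe=$(cat ./*)

echo "naive=[$naive] safe=[$safe]"
cd / && rm -rf "$dir"
```

With a command less benign than `cat`, attacker-chosen options are exactly how a small hole is widened into a large one.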

If you want certainty that your code works correctly, 100% of the time, no ifs and no buts - rather than just waving your hands and hoping that everyone else in the world makes filenames that follow the same conventions as you - then the assumptions you make need to be guaranteed true.

> If you want to imagine that all your filenames are UTF-8, go ahead, who's stopping you!
You could equally well say that disk quotas are not needed; if you want to limit yourself to use 100 megabytes of space, who's stopping you? Indeed what is the point of file permissions - if you want to pretend that all your files are read-only, who's stopping you? And why should the kernel forbid hard links to directories - surely it should be up to the user to decide whether their filesystem is a tree or a general DAG, and the kernel should not enforce this policy.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 27, 2009 19:23 UTC (Fri) by drag (guest, #31333) [Link] (5 responses)

> 'By convention' files do not contain control characters. The problem is that you cannot rely on convention when writing robust, secure software. Either you put in endless sanity checks which cruft up your code and are liable to be forgotten, or you end up with subtle bugs that are tickled by the existence of files called '>foo' or '|/bin/sh' or countless other variations.

Yeah.

All I want is for the system to reject malicious filename characters and things that obviously make little sense - stuff like control characters, newlines, etc.

As for the encoding stuff... meh. Treating filenames as a string of bytes mostly makes sense, except in a few special cases.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 11:45 UTC (Sat) by epa (subscriber, #39769) [Link] (4 responses)

Of course no existing software treats filenames purely as a string of bytes - that is just rhetoric. At the very least, filenames are treated as ASCII-encoded text and displayed to the user as such. Naturally, that breaks down when a filename contains control characters.

If Unix really did treat filenames as merely 'a string of bytes', with no implied character set or encoding, and displayed them to the user as a hex dump or something, then it would be truly encoding-agnostic and would have no difficulties with arbitrary byte values in filenames. Of course, it would also have been a total failure that nobody uses. For a filesystem to be useful, it needs to have some amount of meaning (or 'policy' if you will) attached to the filenames it stores. The question is how much: is the current situation of 'ASCII for characters below 128, and above that you're on your own' the best one?

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 16:53 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (3 responses)

The two major pieces of in-house software I develop both treat filenames purely as a string of bytes. The names chosen happen to be meaningful to the programmers, but they are of no importance to the program or its users.

I'd be surprised if the /majority/ of programs other than shell scripts aren't like this. Even in the majority of GUI software, what's needed isn't a revision of the kernel API (in fact that would barely help) but only a function which takes a zero-terminated byte array representing a filename and returns a string suitable for display. Such a function is nearly inevitable anyway, to deal with dozens of other issues unrelated to Wheeler's thesis. And such functions exist today (I can't say whether they're bug-free, of course).
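A hypothetical sketch of such a display helper, in bash: take an arbitrary filename and return something safe to print. The helper name is invented here; `printf %q` is bash's built-in way to render control characters and shell metacharacters visibly and unambiguously.

```shell
# Hypothetical display helper: render an arbitrary filename as a string
# that is safe to show a user, with control characters made visible.
display_name() {
    printf '%q' "$1"
}

# A name containing a raw tab comes back with a visible \t escape
# instead of an invisible control character.
shown=$(display_name "$(printf 'evil\tname')")
echo "$shown"
```

This handles the safety half of the problem (nothing invisible or terminal-corrupting reaches the screen); as the next comment points out, it does nothing for the encoding half.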

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 14:43 UTC (Sun) by epa (subscriber, #39769) [Link] (2 responses)

> a function which takes a zero-terminated byte array representing a filename and returns a string suitable for display

Currently it is impossible to write such a function reliably, because you don't know whether the byte array is encoded in Latin-1, Shift-JIS, UTF-8 or whatever.

Imagine removing the character encoding headers from the http protocol. There would then be no reliable way to take the content of a page and display it to the user - just a panoply of hacks and rules of thumb that differed from one browser to another. This is the situation we have now with filenames, which are *names* and intended for human consumption just as much as the content of a typical web page. The two choices are (a) add headers to the protocol saying what encoding is in use (or in the case of filenames, an extra parameter in all FS calls), or (b) mandate a single encoding everywhere.
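The ambiguity is easy to demonstrate: the same two bytes decode to different text depending on which encoding the reader assumes. A sketch assuming bash and a glibc-style `iconv`:

```shell
# The bytes 0xC3 0xA9 are one character ("é") in UTF-8, but two
# characters ("Ã©") if read as Latin-1. Without out-of-band metadata,
# a display function cannot know which reading was intended.
bytes=$(printf '\303\251')

as_utf8=$(printf '%s' "$bytes" | iconv -f UTF-8 -t UTF-8)
as_latin1=$(printf '%s' "$bytes" | iconv -f LATIN1 -t UTF-8)

# One character (2 bytes) vs. two characters (4 bytes) after transcoding.
printf 'utf8=%s latin1=%s\n' "$as_utf8" "$as_latin1"
```

This is the filename analogue of a web page served without a charset header: every consumer is left guessing.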

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 21:58 UTC (Sun) by clugstj (subscriber, #4020) [Link] (1 responses)

No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be. This doesn't represent a security problem.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 22:37 UTC (Sun) by epa (subscriber, #39769) [Link]

> No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be.

Well, yeah. If you allow the function to return the wrong answer, then it is easy to write. But it is not possible in all cases to return the correct filename to the user, matching the one the user originally chose. If you pick a single known encoding everywhere (UTF-8 being the obvious choice), the problem goes away.

> This doesn't represent a security problem.

Correct (at least none that I can think of). The security issue is with special characters and control characters in filenames, and is separate from the issue of how to encode characters that don't fit in ASCII.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds