
Wheeler: Fixing Unix/Linux/POSIX Filenames


Posted Mar 26, 2009 9:51 UTC (Thu) by epa (subscriber, #39769)
In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames by clugstj
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

'By convention' files do not contain control characters. The problem is that you cannot rely on convention when writing robust, secure software. Either you put in endless sanity checks which cruft up your code and are liable to be forgotten, or you end up with subtle bugs that are tickled by the existence of files called '>foo' or '|/bin/sh' or countless other variations.
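To make the '>foo' case concrete, here is a minimal Python sketch. It assumes a script that builds a shell command line from a filename it found on disk; the command template and variable names are hypothetical, while `shlex.quote` is the standard-library function for quoting a string for a POSIX shell.

```python
import shlex

# Hypothetical: a filename read from a directory that someone else can write to.
name = ">foo"

# Naive interpolation: a POSIX shell parses ">foo" as "redirect stdout to foo",
# so `cat` would read stdin and its output would overwrite a file called foo.
naive = "cat " + name

# Quoted interpolation: the shell sees one literal argument, ">foo".
quoted = "cat " + shlex.quote(name)

# Splitting the quoted command the way a POSIX shell would recovers the original name.
argv = shlex.split(quoted)
```

Passing an argument list to `subprocess.run` (without `shell=True`) avoids the shell altogether. Either way, every call site needs that extra care, which is exactly the cruft being described.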

Such bugs are made more insidious by the fact that 'by convention', they cannot ever be triggered. But for someone trying to make a working exploit, or widen a small security hole into a larger one, convention is no barrier.

If you want to have certainty that your code works correctly, 100% of the time, no ifs and no buts - rather than just waving your hands and hoping that everyone else in the world makes filenames that follow the same convention as you - then you need a guarantee that the assumptions you make are true.

> If you want to imagine that all your filenames are UTF-8, go ahead, who's stopping you!

You could equally well say that disk quotas are not needed; if you want to limit yourself to using 100 megabytes of space, who's stopping you? Indeed, what is the point of file permissions - if you want to pretend that all your files are read-only, who's stopping you? And why should the kernel forbid hard links to directories? Surely it should be up to the user to decide whether their filesystem is a tree or a general DAG, and the kernel should not enforce this policy.



Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 27, 2009 19:23 UTC (Fri) by drag (guest, #31333) (5 responses)

> 'By convention' files do not contain control characters. The problem is that you cannot rely on convention when writing robust, secure software. Either you put in endless sanity checks which cruft up your code and are liable to be forgotten, or you end up with subtle bugs that are tickled by the existence of files called '>foo' or '|/bin/sh' or countless other variations.

YA.

All I want is for the system to cancel out malicious filename characters and things that obviously make little sense. Stuff like control characters, newlines, etc.
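A minimal Python sketch of that kind of rule, assuming the check runs whenever a name is created. The function name `has_control_bytes` and the exact byte ranges (bytes below 0x20 plus DEL, 0x7F) are illustrative choices, not an existing kernel interface. Bytes 0x80 and above are deliberately left alone, so the rule says nothing about encoding.

```python
def has_control_bytes(raw: bytes) -> bool:
    """Return True if a candidate filename contains an ASCII control byte.

    Hypothetical policy: reject bytes below 0x20 (tab, newline, escape, ...)
    and 0x7F (DEL). Bytes >= 0x80 pass through untouched, whatever the encoding.
    """
    return any(b < 0x20 or b == 0x7F for b in raw)
```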

As for the encoding stuff... meh. Filenames being treated as a string of bytes mostly makes sense, except in a few special cases.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 11:45 UTC (Sat) by epa (subscriber, #39769) (4 responses)

Of course, no existing software treats filenames purely as a string of bytes - that is just rhetoric. At the very least, filenames are treated as ASCII text and displayed to the user as such. Of course, this breaks down when a filename contains control characters.
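A small Python illustration of how that breaks down. It assumes a tool that prints one name per line, which is the usual way `ls` output gets consumed by scripts; the helper names are hypothetical. NUL separation is the approach taken by `find -print0` and `xargs -0`.

```python
from typing import List

def one_per_line(names: List[str]) -> str:
    # Naive display: names separated by newlines.
    return "\n".join(names)

def nul_separated(names: List[str]) -> str:
    # NUL cannot appear in a POSIX filename, so it is an unambiguous separator.
    return "\0".join(names)

single = ["a\nb"]  # one file whose name contains a newline
pair = ["a", "b"]  # two ordinary files

# With newlines as separators the two directories look identical...
same_when_printed = one_per_line(single) == one_per_line(pair)
# ...while NUL separation keeps them apart.
same_with_nul = nul_separated(single) == nul_separated(pair)
```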

If Unix really did treat filenames as merely 'a string of bytes', with no implied character set or encoding, and displayed them to the user as a hex dump or something, then it would be truly encoding-agnostic and would have no difficulties with arbitrary byte values in filenames. Of course, it would also have been a total failure that nobody uses. For a filesystem to be useful, it needs to have some amount of meaning (or 'policy' if you will) attached to the filenames it stores. The question is how much: is the current situation of 'ASCII for characters below 128, and above that you're on your own' the best one?
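The two ends of that spectrum can be sketched in a few lines of Python, assuming raw filename bytes as input. `ascii_view` approximates the "ASCII below 128, you're on your own above that" convention by escaping everything outside printable ASCII; `hex_view` is the fully encoding-agnostic hex dump. Both function names are hypothetical.

```python
def ascii_view(raw: bytes) -> str:
    # Printable ASCII (0x20-0x7E) is shown as-is; every other byte becomes \xNN.
    return "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02x}" for b in raw)

def hex_view(raw: bytes) -> str:
    # No interpretation at all: space-separated hex (bytes.hex(sep) needs Python 3.8+).
    return raw.hex(" ")

name = "café".encode("utf-8")  # the bytes b"caf\xc3\xa9"
```

Neither view shows "café" to the user who typed it. Readable display needs some agreed meaning for the bytes above 127.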

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 28, 2009 16:53 UTC (Sat) by tialaramex (subscriber, #21167) (3 responses)

The two major pieces of in-house software I develop both treat filenames purely as a string of bytes. The names chosen happen to be meaningful to the programmers, but they are of no importance to the program or its users.

I'd be surprised if the /majority/ of programs other than shell scripts aren't like this. Even in the majority of GUI software, what's needed isn't a revision of the kernel API (in fact that will barely help) but only a function which takes a zero-terminated byte array representing a filename and returns a string suitable for display. Such a function is nearly inevitable anyway - to deal with dozens of other issues unrelated to Wheeler's thesis. And such functions exist today (I can't say if they're bug-free, of course).
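A minimal Python sketch of such a display function, assuming names are most likely UTF-8. The name `display_name` is hypothetical. Undecodable bytes come out as \xNN escapes and control characters are escaped too, so the result is always printable, though not necessarily what the file's creator intended.

```python
import unicodedata

def display_name(raw: bytes) -> str:
    """Turn raw filename bytes into a printable string (lossy, for display only)."""
    # Assume UTF-8; bytes that are not valid UTF-8 come out as literal \xNN text.
    text = raw.decode("utf-8", errors="backslashreplace")
    out = []
    for ch in text:
        # Category "Cc" covers C0 controls, DEL and C1 controls (all below U+0100).
        if unicodedata.category(ch) == "Cc":
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)
```

A literal backslash in a name makes the output ambiguous, which is one more way such a function can return a plausible but wrong answer.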

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 14:43 UTC (Sun) by epa (subscriber, #39769) (2 responses)

> a function which takes a zero-terminated byte array representing a filename and returns a string suitable for display
Currently it is impossible to reliably write such a function, because you don't know whether the byte array is encoded in Latin-1, Shift-JIS, UTF-8 or whatever.
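A quick Python illustration of the ambiguity, assuming a directory entry holding the bytes that UTF-8 uses for "café". Nothing in the bytes themselves says which reading is the right one.

```python
raw = b"caf\xc3\xa9"  # the same bytes, read back from the filesystem

# errors="replace" guarantees every reading yields some string;
# only the UTF-8 reading matches what a UTF-8 user typed.
readings = {
    enc: raw.decode(enc, errors="replace")
    for enc in ("utf-8", "latin-1", "shift_jis")
}
```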

Imagine removing the character encoding headers from the http protocol. There would then be no reliable way to take the content of a page and display it to the user - just a panoply of hacks and rules of thumb that differed from one browser to another. This is the situation we have now with filenames, which are *names* and intended for human consumption just as much as the content of a typical web page. The two choices are (a) add headers to the protocol saying what encoding is in use (or in the case of filenames, an extra parameter in all FS calls), or (b) mandate a single encoding everywhere.
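Option (b) can be sketched as a creation-time check, assuming UTF-8 is the mandated encoding. `accept_new_name` is a hypothetical function standing in for wherever the kernel or C library would enforce the rule; Python's strict UTF-8 decoder rejects malformed and truncated sequences.

```python
def accept_new_name(raw: bytes) -> str:
    """Hypothetical creation-time check for option (b): the name must be valid UTF-8."""
    try:
        # Strict decoding raises UnicodeDecodeError on malformed or truncated sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError(f"filename is not valid UTF-8: {raw!r}") from None
```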

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 21:58 UTC (Sun) by clugstj (subscriber, #4020) (1 responses)

No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be. This doesn't represent a security problem.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 22:37 UTC (Sun) by epa (subscriber, #39769)

> No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be.
Well, yeah. If you allow the function to return the wrong answer, then it is easy to write. But it is not possible, in all cases, to show the user the correct filename, the one originally chosen. If you pick a known encoding everywhere (UTF-8 being the obvious choice), then the problem goes away.
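The difference can be shown in a few lines of Python, assuming the user chose the name "café". When writer and reader agree on UTF-8 the name round-trips exactly; when the writer used Latin-1 and the reader assumes UTF-8, the original character cannot be recovered.

```python
original = "café"

# Writer and reader both use UTF-8: the reader gets back exactly what was chosen.
agreed = original.encode("utf-8").decode("utf-8")

# Writer used Latin-1, reader assumes UTF-8: 0xE9 starts an incomplete sequence,
# so the best a lenient reader can do is a replacement character (U+FFFD).
mismatched = original.encode("latin-1").decode("utf-8", errors="replace")
```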
> This doesn't represent a security problem.
Correct (at least none that I can think of). The security issue is with special characters and control characters in filenames, and is separate from the issue of how to encode characters that don't fit in ASCII.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds