Wheeler: Fixing Unix/Linux/POSIX Filenames
Posted Mar 28, 2009 16:33 UTC (Sat) by tialaramex (subscriber, #21167)
In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames by mgross
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames
To actually make this work in the kernel (where performance is critical, and where all of this is unwanted overhead paid by everyone who uses your "improved" system), you absolutely need to do the following, as a matter of "Linus will veto if you don't" policy:
* Validate every filename to check that it conforms. This has to be done either at mount time, or when syscalls interact with the filenames (e.g. directory reading, and opening files). As a network file system client the OS must either screen every filename going over the network, or else punt and rely on promises from the server (if available).
* When you find an invalid filename, you need to deal with it, and it's not clear what the kernel should or even could do. Perhaps the file should simply not exist as far as userspace is concerned, and fsck would unlink it?
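To make the cost of the first step concrete, here is a minimal userspace sketch in Python of what "validate every filename" might look like. The three rules it checks (no control characters, no leading hyphen, valid UTF-8) are an assumption, loosely modelled on the kind of restrictions under discussion; they are not a specification, and `filename_violations` and `scan_directory` are hypothetical names. A kernel would have to run something like this on every lookup or directory read, not just once.

```python
import os


def filename_violations(name: bytes) -> list:
    """Return the assumed rules that a raw filename breaks (empty list if none)."""
    problems = []
    # Rule 1 (assumed): no ASCII control characters, including newline and DEL.
    if any(b < 0x20 or b == 0x7F for b in name):
        problems.append("control character")
    # Rule 2 (assumed): a name must not start with a hyphen.
    if name.startswith(b"-"):
        problems.append("leading hyphen")
    # Rule 3 (assumed): the bytes must decode as UTF-8.
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    return problems


def scan_directory(path: bytes) -> dict:
    """Map each offending entry in one directory to the rules it breaks."""
    report = {}
    # Passing a bytes path makes os.scandir return entry names as raw bytes,
    # so names that are not valid text are still visible to the check.
    with os.scandir(path) as entries:
        for entry in entries:
            problems = filename_violations(entry.name)
            if problems:
                report[entry.name] = problems
    return report
```

Note that the sketch only reports problems. It does nothing to answer the second bullet, which is the hard part: what to do with a name once it fails.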
Meanwhile, application developers get no benefit for many years because of compatibility considerations. It could be a decade before it makes any sense to write a program which assumes one of the restrictions, and that's if EVERY SINGLE OS fixes this tomorrow. Wheeler mistakenly believes this is a POSIX problem, but it isn't: the problem exists everywhere that filenames are treated as opaque, which in fact includes Windows. (I have my doubts about OS X, but its API documentation promises they aren't opaque, so app developers who rely on that promise would be entitled to scream blue murder when someone finds a way to get non-Unicode into an OS X filename...*)
Personally I think the issue to look at is spaces. Spaces are legal. They are undoubtedly going to remain legal. But they are inconvenient. How can we tweak our basic Unix processes (including the shell and many old tools) so that spaces are harmless? Once you've done this, you'll have the right mindset to tackle initial hyphens, control characters and so on from the same angle, rather than screwing the poor kernel into doing your dirty work and making everybody (including those of us for whom opaque filenames are just dandy) pay.
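As a rough illustration of that mindset, the Python sketch below handles awkward names entirely in userspace, using only mechanisms that already exist: passing each filename as its own argv element (so nothing ever re-splits it on whitespace), quoting with `shlex.quote` when a shell command string is unavoidable, and prefixing `./` to names that begin with a hyphen. The helper names `count_arguments` and `safe_operand` are hypothetical, and the child process is just a Python one-liner standing in for any tool.

```python
import shlex
import subprocess
import sys


def safe_operand(name: str) -> str:
    """Prefix './' to a relative name starting with '-' so a tool
    will not mistake it for an option."""
    return "./" + name if name.startswith("-") else name


def count_arguments(names) -> int:
    """Start a child process with one argv element per filename and
    return how many arguments the child actually received."""
    child_code = "import sys; print(len(sys.argv) - 1)"
    # A list argument (with no shell involved) hands each name to the child
    # unchanged, so spaces and newlines inside a name are harmless.
    result = subprocess.run(
        [sys.executable, "-c", child_code, *names],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


def shell_command(tool: str, names) -> str:
    """Build a shell command line in which every name is quoted, for the
    cases where a command string really cannot be avoided."""
    return " ".join([shlex.quote(tool)] + [shlex.quote(safe_operand(n)) for n in names])
```

For example, `count_arguments(["my file.txt", "a\nb"])` should report two arguments, not three or four, and `shell_command("ls", ["-rf", "my file.txt"])` produces a line that a POSIX shell splits back into exactly the original names. None of this needs kernel support, although it does rely on every tool in the chain being written this way.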
* Something that should make you pause: OS X's approach to filenames as Unicode strings makes Unicode composition/decomposition into an OS ABI feature. It had been doing this for years before Unicode actually pledged to stop changing the decomposition rules (i.e., until that happened, new versions of OS X made previously legal filenames illegal and vice versa, with no warning...)
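To see why composition matters once filenames stop being opaque bytes, here is a small Python sketch using the standard `unicodedata` module. It uses plain NFD and NFC as stand-ins for whatever normalization form a filesystem might enforce; that choice, and the helper name `same_after_normalization`, are assumptions for illustration rather than a description of what any particular OS does.

```python
import unicodedata

# The same visible name, "café", written two ways.
composed = "caf\u00e9"                                 # é as one code point (U+00E9)
decomposed = unicodedata.normalize("NFD", composed)    # e followed by U+0301

# To a byte-oriented (opaque) filesystem these are two different names.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def same_after_normalization(a: str, b: str, form: str = "NFD") -> bool:
    """True if two names collide once both are normalized to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

Here `composed_bytes` and `decomposed_bytes` differ (5 bytes versus 6), yet `same_after_normalization(composed, decomposed)` is true. A filesystem that normalizes therefore depends on the exact normalization tables it ships with, which is what makes them part of the ABI.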
