I think that many people would appreciate having at least a hint as to the character encoding in use, although in these days of UTF-8 it is of course less and less relevant.
Control characters in file names
Posted Nov 24, 2010 17:57 UTC (Wed) by ikm (subscriber, #493)
My guess is that Unix systems just take the easiest approach here: treat the filename as a binary blob, and let userspace do the rest :) I have got to admit that in practice there's less hassle with FSes which are Unicode-aware (think Microsoft), unless you actually start trying to figure out just what it is that you are allowed to use there for filenames. Then you'd basically just stick to base64 or percent-encoding, which would be the right thing to do in any case.
Posted Nov 24, 2010 19:15 UTC (Wed) by michaeljt (subscriber, #39183)
There have been a number of complaints on this thread about filesystems that are encoding-aware and the problems that causes. But actually the filesystem could carry encoding hints without being encoding-aware itself. For example, it could tell user space that a file name is UTF-8 but still just treat the name as a binary blob. The hint would just tell applications how best to display the name.
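A rough userspace sketch of how an application might consume such a hint, assuming the hint arrives as an encoding name or as nothing at all. The function name `display_name` and the fallback behaviour are assumptions made for illustration; no existing filesystem interface is implied.

```python
from typing import Optional


def display_name(raw: bytes, hint: Optional[str]) -> str:
    """Turn a raw filename into text for display, using the hint if it works."""
    if hint is not None:
        try:
            return raw.decode(hint)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown hint: fall through to the default below.
            pass
    # No usable hint: assume UTF-8 and show undecodable bytes as \xNN escapes.
    return raw.decode("utf-8", errors="backslashreplace")
```

The name on disk stays an uninterpreted byte string throughout; only the text shown to the user depends on the hint.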
Posted Nov 27, 2010 8:11 UTC (Sat) by cmccabe (guest, #60281)
Well, you could use an extended attribute to represent the encoding of the filename. However, it would be a huge amount of work to change all the applications to check this attribute and act appropriately.
I'm pretty far from being an expert in internationalization, but my understanding is that non-Unicode character encodings are considered deprecated. Based on comments made elsewhere in this thread, MacOS and Windows have already decreed that all filenames should be Unicode. So is it really worth rewriting all software that displays filenames in order to better support this legacy stuff? Especially when no other platforms support it at all? As Linus constantly points out, Linux-specific filesystem interfaces don't get used that much, even when they offer great benefits.
I think I agree with Spudd86's solution: there should be some kind of mount option that puts a ruleset in place for filenames. Probably nearly every Linux distribution would disallow filenames that were not UTF-8. A few people running special-purpose systems might mount their rootfs with more restrictive rulesets. Most system administrators already have an unwritten policy about filenames: they don't create filenames with embedded control characters, crazy stuff like leading dashes, or embedded newlines. Letting system administrators turn their implicit policy into an explicit one would close a lot of security holes.
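To make the idea of an explicit ruleset concrete, here is a minimal userspace sketch of the checks listed above (valid UTF-8, no control characters, no leading dash). The function name `policy_violations` and the exact rules are assumptions for illustration; a real mount option would live in the kernel and might choose different rules.

```python
def policy_violations(name: bytes) -> list:
    """Return the reasons a filename breaks the hypothetical policy (empty if none)."""
    problems = []
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    # Bytes 0x00-0x1F and 0x7F are the ASCII control characters (newline included).
    if any(b < 0x20 or b == 0x7F for b in name):
        problems.append("contains a control character")
    if name.startswith(b"-"):
        problems.append("starts with a dash")
    return problems
```

For example, `policy_violations(b"-rf")` reports the leading dash, while an ordinary name such as `b"report.txt"` yields an empty list.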
I wonder if it would be feasible to use the "escaping" option talked about on Wheeler's page. Basically, under this option, the kernel continues to treat filenames as binary blobs on the disk. But when presenting them to userspace, it escapes certain characters in a predictable way. I'm not sure whether this is really feasible, but it seems like the best choice if it is.
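One way such escaping could look, sketched in userspace Python. The scheme here (percent-escaping control bytes and the percent sign itself) is an assumption chosen for illustration and is not necessarily the scheme described on Wheeler's page; `escape_name` and `unescape_name` are hypothetical names.

```python
def escape_name(raw: bytes) -> bytes:
    """Present a stored name with control bytes and '%' replaced by %XX."""
    out = bytearray()
    for b in raw:
        if b < 0x20 or b == 0x7F or b == ord("%"):
            out += "%{:02X}".format(b).encode("ascii")
        else:
            out.append(b)
    return bytes(out)


def unescape_name(shown: bytes) -> bytes:
    """Map a presented name back to the bytes stored on disk."""
    out = bytearray()
    i = 0
    while i < len(shown):
        if shown[i] == ord("%"):
            # The two characters after '%' are the hex value of the original byte.
            out.append(int(shown[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            out.append(shown[i])
            i += 1
    return bytes(out)
```

Because the escape character is itself escaped, the mapping is reversible: every stored name has exactly one presented form, and unescaping that form recovers the original bytes.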
Posted Nov 30, 2010 1:39 UTC (Tue) by jamesh (guest, #1159)
As well as being a lot of work, using extended attributes introduces ambiguity. Some extra problems with that suggestion are:
Picking one encoding/normalisation is the only sane option, and it would be nice if the kernel would help enforce such a choice.
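A small illustration of why normalisation matters even once everything is UTF-8, using Python's standard `unicodedata` module. The sample name is an arbitrary choice for the example.

```python
import unicodedata

name = "caf\u00e9"  # "café" written with a single precomposed character

# The same visible name in composed (NFC) and decomposed (NFD) form.
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

# To a filesystem that compares names as binary blobs, these are two
# different filenames, although they render identically.
same_on_disk = nfc_bytes == nfd_bytes

# Normalising both to one agreed form removes the ambiguity.
same_after_nfc = (
    unicodedata.normalize("NFC", nfc_bytes.decode("utf-8"))
    == unicodedata.normalize("NFC", nfd_bytes.decode("utf-8"))
)
```

Here `same_on_disk` is false and `same_after_nfc` is true, which is the ambiguity that a single enforced normalisation would avoid.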
Posted Dec 2, 2010 18:22 UTC (Thu) by Wol (guest, #4433)
I've worked on a system where a file was composed of sub-files (Pr1mos). This was emulated on *nix by using a directory with "special" names inside: all the subfiles were named "<space><backspace><number>", because nobody is supposed to touch these subfiles directly.
So if you enforce a policy like that, you could bust a bunch of apps ...
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds