
Simplicity is better than complexity.

Posted Mar 28, 2009 17:03 UTC (Sat) by tialaramex (subscriber, #21167)
In reply to: Simplicity is better than complexity. by epa
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

“Simply checking filenames for bad characters takes about five lines of code in the kernel plus one line for each syscall that accepts a filename from userspace.”

Show me the money. Five lines, plus one per syscall. Not a lot of work to support such a broad and sweeping claim. Write those lines carefully, we wouldn't want you to be hand-waving and have missed 99.9% of the complexity of the problem...



Simplicity is better than complexity.

Posted Mar 29, 2009 15:03 UTC (Sun) by epa (subscriber, #39769)

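To check for control characters:

/* Read bytes as unsigned: with a signed char, bytes >= 128 are negative and would wrongly fail the < 32 test. */
for (const unsigned char *c = (const unsigned char *)filename; *c; c++)
if (*c < 32) return EINVAL;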

Adding a fixed list of 'bad characters' (please excuse lack of indentation, the LWN comment form eats it):

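/* Read bytes as unsigned so that bytes >= 128 are not mistaken for control characters. */
for (const unsigned char *c = (const unsigned char *)filename; *c; c++)
if (*c < 32 || *c == '<' || *c == '>' || *c == '|') return EINVAL;
if (filename[0] == '-') return EINVAL;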

Checking for valid UTF-8 is a little more complex, but not much. You do not need to check that only assigned Unicode characters are used, or worry about combining characters, upper and lower case, etc. See <http://www.cl.cam.ac.uk/~mgk25/unicode.html> for a list of valid byte sequences. The code would be something like:

/* No padding is needed: is_cont(0) is false, so each && chain stops at the terminating NUL. */
/* This checks lead/continuation structure only; overlong forms are not rejected. */
static int is_cont(unsigned char c) { return 128 <= c && c < 192; }
const unsigned char *p = (const unsigned char *)filename;
while (*p) {
if (*p < 128) ++p;
else if (192 <= *p && *p < 224 && is_cont(p[1])) p += 2;
else if (224 <= *p && *p < 240 && is_cont(p[1]) && is_cont(p[2])) p += 3;
else if (240 <= *p && *p < 248 && is_cont(p[1]) && is_cont(p[2])
&& is_cont(p[3])) p += 4;
else if (248 <= *p && *p < 252 && is_cont(p[1]) && is_cont(p[2])
&& is_cont(p[3]) && is_cont(p[4])) p += 5;
else if (252 <= *p && *p < 254 && is_cont(p[1]) && is_cont(p[2])
&& is_cont(p[3]) && is_cont(p[4]) && is_cont(p[5])) p += 6;
else return EINVAL;
}

For a self-contained system, that takes care of it. Put some code like the above into a function and call it at each place a filename is taken from user space. Coping with 'foreign' filesystems (e.g. NFS servers) returning non-UTF-8 filenames is a bit more complex.
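
As a rough illustration of "put some code like the above into a function", here is a user-space sketch that combines the three checks. The name check_filename() is made up for this example and is not an existing kernel or libc interface. It returns positive EINVAL as in the snippets above, whereas real kernel code would conventionally return -EINVAL. Like the snippets, it only checks lead/continuation byte structure and does not reject overlong encodings.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/*
 * check_filename() is a hypothetical helper, named for this sketch only.
 * Returns 0 if the name passes all checks, EINVAL otherwise.
 */
int check_filename(const char *name)
{
    assert(name != NULL);
    const unsigned char *p = (const unsigned char *)name;

    /* A leading '-' could be mistaken for a command-line option. */
    if (*p == '-')
        return EINVAL;

    while (*p) {
        size_t len;

        /* Control characters and the fixed list of 'bad characters'. */
        if (*p < 32 || *p == '<' || *p == '>' || *p == '|')
            return EINVAL;

        /* Sequence length implied by the lead byte (1 to 6 bytes). */
        if (*p < 128)      len = 1;
        else if (*p < 192) return EINVAL;   /* stray continuation byte */
        else if (*p < 224) len = 2;
        else if (*p < 240) len = 3;
        else if (*p < 248) len = 4;
        else if (*p < 252) len = 5;
        else if (*p < 254) len = 6;
        else               return EINVAL;   /* 0xFE and 0xFF never appear */

        /*
         * NUL is not a continuation byte, so this loop returns at the
         * terminator and never reads past the end of the string.
         */
        for (size_t i = 1; i < len; i++)
            if (!is_cont(p[i]))
                return EINVAL;

        p += len;
    }
    return 0;
}
```

Each syscall that accepts a filename would then need one extra line along the lines of "if (check_filename(name)) return EINVAL;", which is where the one-line-per-syscall estimate comes from.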


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds