|
|
Log in / Subscribe / Register

Wheeler: Fixing Unix/Linux/POSIX Filenames

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 14:43 UTC (Sun) by epa (subscriber, #39769)
In reply to: Wheeler: Fixing Unix/Linux/POSIX Filenames by tialaramex
Parent article: Wheeler: Fixing Unix/Linux/POSIX Filenames

a function which takes a zero-terminated byte array representing a filename and returns a string suitable for display
Currently it is impossible to reliably write such a function, because you don't know whether the byte array is encoded in Latin-1, Shift-JIS, UTF-8 or whatever.

Imagine removing the character encoding headers from the http protocol. There would then be no reliable way to take the content of a page and display it to the user - just a panoply of hacks and rules of thumb that differed from one browser to another. This is the situation we have now with filenames, which are *names* and intended for human consumption just as much as the content of a typical web page. The two choices are (a) add headers to the protocol saying what encoding is in use (or in the case of filenames, an extra parameter in all FS calls), or (b) mandate a single encoding everywhere.


to post comments

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 21:58 UTC (Sun) by clugstj (subscriber, #4020) [Link] (1 responses)

No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be. This doesn't represent a security problem.

Wheeler: Fixing Unix/Linux/POSIX Filenames

Posted Mar 29, 2009 22:37 UTC (Sun) by epa (subscriber, #39769) [Link]

No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be.
Well, yeah. If you allow the function to return the wrong answer, then it is easy to write. But it is not possible to in all cases return the correct filename to the user, matching the original one chosen by the user. If you pick a known encoding everywhere (UTF-8 being the obvious choice) then the problem goes away.
This doesn't represent a security problem.
Correct (at least none that I can think of). The security issue is with special characters and control characters in filenames, and is separate to the issue of how to encode characters that don't fit in ASCII.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds