As nix says, the filename encodes a key to what the file contains. The encoding is radix-254 (every byte value except NUL and '/'). This makes full use of the ASCII control characters [\x01-\x1f], and also of byte sequences that UTF-8 disallows, such as subsets of [\xfc-\xff]*. Radix-254 is almost 2 bits per byte denser than the proposed radix-65 (26 upper case, 26 lower case, 10 digits, dot, hyphen, underscore): log2(254) is about 7.99 bits per byte, against about 6.02 for log2(65). The OS imposes an upper bound on the length of a filename, and there are critical points at various shorter lengths where space*time costs jump. Radix-65 discards enough utility relative to radix-254 that customers complain.
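To make the density comparison concrete, here is a rough sketch of the arithmetic. The 255-byte name limit is an assumption (a common value on Linux filesystems, not something stated above), and the helper names are made up for illustration.

```python
import math

def bits_per_byte(radix: int) -> float:
    """Information carried by one filename byte drawn from `radix` symbols."""
    return math.log2(radix)

def key_bits(radix: int, name_len: int) -> int:
    """Whole bits of key that fit in a name of `name_len` bytes.

    There are radix**name_len distinct names of that length, so the
    capacity is floor(log2(radix**name_len)), computed exactly on ints.
    """
    return (radix ** name_len).bit_length() - 1

NAME_MAX = 255  # assumption: typical per-component limit, not from the thread

dense = bits_per_byte(254)     # every byte except NUL and '/'
portable = bits_per_byte(65)   # letters, digits, dot, hyphen, underscore

print(f"radix-254: {dense:.3f} bits/byte")
print(f"radix-65:  {portable:.3f} bits/byte")
print(f"difference: {dense - portable:.3f} bits/byte")
print(f"key bits in {NAME_MAX} bytes: "
      f"{key_bits(254, NAME_MAX)} vs {key_bits(65, NAME_MAX)}")
```

The per-byte gap is a little under 2 bits, so over a full-length name the radix-65 encoding loses roughly a quarter of the key space that radix-254 provides.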
Posted Mar 26, 2009 14:44 UTC (Thu) by dwheeler (guest, #1216)
I never proposed radix-65. Radix-65 (26 upper case, 26 lower case, 10 digits, dot, hyphen, underscore) is what the POSIX standard ALREADY says is all you can depend on; nothing else is portable by that spec.
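For reference, a minimal sketch of a check against that 65-character portable set. The function name is hypothetical, and the check covers only the character set, not any other portability guidance in the spec.

```python
import string

# The 65 characters described above: A-Z, a-z, 0-9, dot, underscore, hyphen.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")

def is_portable_filename(name: str) -> bool:
    """Return True if `name` is non-empty and uses only the portable set."""
    return bool(name) and all(ch in PORTABLE_CHARS for ch in name)

print(len(PORTABLE_CHARS))                  # size of the set
print(is_portable_filename("report-2009.txt"))
print(is_portable_filename("my report.txt"))  # space is outside the set
```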
I want to be able to count on more than what the POSIX spec says: I want to be able to use the entire Unicode character set, minus the control chars and subject to a few additional constraints that prevent lots of problems for the general-purpose user.
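As a rough illustration of the control-character part of that rule only: the additional constraints are not spelled out in this comment, so the sketch below does not attempt them. The function name is hypothetical, and treating Unicode category Cc as "control chars" is an assumption.

```python
import unicodedata

def has_no_control_chars(name: str) -> bool:
    """Return True if `name` is non-empty and contains no control characters.

    Assumption: "control chars" means Unicode general category Cc, which
    covers U+0000-U+001F, U+007F, and U+0080-U+009F.
    """
    return bool(name) and all(unicodedata.category(ch) != "Cc" for ch in name)

print(has_no_control_chars("r\u00e9sum\u00e9.txt"))  # non-ASCII letters are fine
print(has_no_control_chars("bad\nname"))             # newline is a control char
```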