Method for (mostly) kernel-independent Unicode filenames?
Posted Feb 19, 2004 16:28 UTC (Thu) by Max.Hyre (subscriber, #1054)
In reply to: The kernel and character set encodings by danscox
Parent article: The kernel and character set encodings
[Strawman proposal---please point me toward discussions where
it's all been hashed out, shot down, &c. Or, just flame direct.]
How about changing filename semantics (and, of course, every
filesystem known to Linux): make filenames
a three-element struct: a fixed-length specification of the name's
character-set encoding, a fixed-length count of the bytes in
the name, and a variable-length string holding said name:
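typedef unsigned char byte;  /* C has no built-in byte type */

struct filename {
    enum encoding enc;  /* character-set encoding of the name */
    int cb;             /* count of bytes in rgb */
    byte *rgb;          /* the name itself: cb bytes */
};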
Now, the kernel doesn't give a fig what the encoding is, or
what it might mean---it's all bytes, with no chance (hah!) of filename
buffer overflows and their attendant dangers to root. The
libraries just use the struct for calls to fopen(), remove(), rename(),
& friends, with the caller allowed to specify that
- an exact match (on all elements of the struct) is needed for
equality comparisons,
- a bytewise match on the byte *s, regardless of the encoding, is sufficient, or
- its own comparator function (supplied) be run on pairs of the structs.
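A rough sketch of how those three choices might look from the library side. Everything beyond the struct itself is made up for illustration: the encoding tags, the match_mode enum, the filename_eq_fn type, and filename_match() are hypothetical names, and I've used size_t and a const pointer where the struct above has int and a plain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical encoding tags; the proposal does not enumerate any. */
enum encoding { ENC_UNKNOWN, ENC_UTF8, ENC_LATIN1 };

struct filename {
    enum encoding enc;  /* how rgb is to be interpreted (userland's business) */
    size_t cb;          /* count of bytes in rgb */
    const byte *rgb;    /* the name itself, not NUL-terminated */
};

/* The three match policies from the list above (names are assumptions). */
enum match_mode { MATCH_EXACT, MATCH_BYTES, MATCH_CUSTOM };

/* Caller-supplied comparator: returns nonzero if the names count as equal. */
typedef int (*filename_eq_fn)(const struct filename *, const struct filename *);

/* Same length and same bytes, ignoring the encoding tag. */
static int bytes_equal(const struct filename *a, const struct filename *b)
{
    return a->cb == b->cb &&
           (a->cb == 0 || memcmp(a->rgb, b->rgb, a->cb) == 0);
}

/* Returns nonzero if a and b match under the requested policy. */
int filename_match(const struct filename *a, const struct filename *b,
                   enum match_mode mode, filename_eq_fn custom)
{
    assert(a != NULL && b != NULL);
    switch (mode) {
    case MATCH_EXACT:   /* every element of the struct must agree */
        return a->enc == b->enc && bytes_equal(a, b);
    case MATCH_BYTES:   /* bytes only, encoding ignored */
        return bytes_equal(a, b);
    case MATCH_CUSTOM:  /* defer to the caller's comparator, if given */
        return custom != NULL && custom(a, b);
    }
    return 0;
}
```

Note that nothing here ever looks inside the bytes, which is the whole point: only a caller-supplied comparator needs to know what an encoding means.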
The kernel code is encoding-agnostic, and the rest of the work
(emphatically including sorting) is in userland.
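To make the "sorting is in userland" part concrete, here is a minimal sketch that sorts a listing of these structs with the standard qsort(). The comparator is plain byte order, purely as a placeholder; a real one would presumably consult enc and the user's locale. The names filename_cmp_bytes() and sort_listing(), and the encoding tags, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical encoding tags; the proposal does not enumerate any. */
enum encoding { ENC_UNKNOWN, ENC_UTF8, ENC_LATIN1 };

struct filename {
    enum encoding enc;
    size_t cb;          /* count of bytes in rgb */
    const byte *rgb;    /* the name itself, not NUL-terminated */
};

/* qsort comparator: byte order, with the shorter name first when one
 * name is a prefix of the other. The encoding tag is ignored. */
static int filename_cmp_bytes(const void *pa, const void *pb)
{
    const struct filename *a = pa;
    const struct filename *b = pb;
    size_t n = a->cb < b->cb ? a->cb : b->cb;
    int r = n ? memcmp(a->rgb, b->rgb, n) : 0;

    if (r != 0)
        return r;
    return (a->cb > b->cb) - (a->cb < b->cb);
}

/* Sort a directory listing in place, with no help from the kernel. */
void sort_listing(struct filename *names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 1)
        qsort(names, count, sizeof names[0], filename_cmp_bytes);
}
```

Swapping in a different ordering means swapping the comparator; the kernel side never has to change.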