The kernel and character set encodings

Posted Feb 21, 2004 7:49 UTC (Sat) by Cato (subscriber, #7643)
In reply to: The kernel and character set encodings by spitzak
Parent article: The kernel and character set encodings

This problem needs to be addressed somewhere, though not necessarily in the kernel (perhaps in glibc or the GUI layer). Suppose two users create identical-looking filenames using Vietnamese accented characters: a letter plus two accents entered in a different order, three Unicode characters altogether. There are now two filenames that look the same, and you don't know how to type the 'right' one. Even if only one file is involved, without Unicode normalisation you couldn't rely on bash filename completion, since you might type the accents in a different order from that used in the filename, and there would be no visual clue as to your mistake.
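To make the case concrete, here is a minimal Python sketch, not anything glibc or bash actually does. It builds two names from 'e' plus a circumflex and a dot below in opposite orders, shows that they differ as raw strings but agree after NFC, and then matches a typed prefix against filenames by normalising both sides. The helper names nfc and complete are made up for illustration.

```python
import unicodedata

# Same visible letter, combining accents entered in a different order.
name_a = "e\u0302\u0323.txt"  # e + combining circumflex + combining dot below
name_b = "e\u0323\u0302.txt"  # e + combining dot below + combining circumflex


def nfc(s: str) -> str:
    """Return the NFC (canonical composed) form of s."""
    return unicodedata.normalize("NFC", s)


def complete(prefix: str, names: list[str]) -> list[str]:
    """Hypothetical completion helper: compare NFC forms, return original names."""
    wanted = nfc(prefix)
    return [n for n in names if nfc(n).startswith(wanted)]


raw_equal = name_a == name_b            # compared code point by code point
nfc_equal = nfc(name_a) == nfc(name_b)  # compared after normalisation

# The user types the accents in the "other" order from the stored name.
matches = complete("e\u0323\u0302", [name_a])

print(raw_equal, nfc_equal, matches == [name_a])
```

A naive prefix comparison on the raw strings would find nothing here, which is exactly the completion failure described above.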

Given these issues, which affect command line tools as much as GUIs, it may be sensible to put NFC normalisation in glibc or the kernel, despite the complexity. Files created by another system on a Linux NFS filesystem would of course bypass the local glibc, so the alternatives are batch renormalisation (always an option; convmv may do this) or putting NFC in the kernel.
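The batch renormalisation option could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not how convmv works: renormalise_tree is a hypothetical name, filenames are assumed to decode as UTF-8, and a name whose NFC form already exists in the same directory is skipped rather than overwritten.

```python
import os
import unicodedata


def renormalise_tree(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename entries under root to their NFC form; return (old, new) path pairs.

    Hypothetical helper: with dry_run=True nothing is renamed, the pairs are
    only reported.
    """
    renamed = []
    # Walk bottom-up so children are handled before their parent is renamed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            target = unicodedata.normalize("NFC", name)
            if target == name:
                continue  # already in NFC
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, target)
            if os.path.lexists(new):
                # Two distinct names collapse to one; leave this for a human.
                continue
            if not dry_run:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Running it first with dry_run=True gives a list to review. The collision case, two look-alike files in one directory, is precisely the situation that cannot be fixed automatically.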

It's not good enough to say 'case-insensitivity should not be in the kernel' - you need to address these use cases and say how and where you would solve them.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds