There is no problem with UTF-8 filenames. The bytes should be stored
unchanged, and those same unchanged bytes should be used to look up the
file. It does not matter whether the bytes form a legal UTF-8 string, to
say nothing of which normalization form they are in.

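
To make that concrete, here is a minimal sketch of a directory keyed purely by raw bytes. The `ByteDirectory` class and its method names are hypothetical, invented for illustration; this is not how any particular filesystem is implemented. It only shows that storing and matching exact bytes needs no knowledge of encodings.

```python
class ByteDirectory:
    """Hypothetical in-memory directory keyed by raw filename bytes."""

    def __init__(self):
        self._entries = {}

    def create(self, name: bytes, data: bytes) -> None:
        # Store the name exactly as given: no decoding, no normalization,
        # no case folding.
        self._entries[bytes(name)] = data

    def lookup(self, name: bytes) -> bytes:
        # Succeeds only on an exact byte-for-byte match (KeyError otherwise).
        return self._entries[bytes(name)]

    def names(self):
        return list(self._entries)


d = ByteDirectory()
nfc = "\u00e9".encode("utf-8")    # precomposed e-acute: b'\xc3\xa9'
nfd = "e\u0301".encode("utf-8")   # e + combining acute: b'e\xcc\x81'
bad = b"\xff\xfe-not-utf8"        # not valid UTF-8 at all

d.create(nfc, b"composed")
d.create(nfd, b"decomposed")
d.create(bad, b"invalid")
```

The two names that render identically stay distinct entries, and the invalid sequence is stored like any other name, because the directory never interprets the bytes.
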
Unfortunately, there are hordes of people out there who think dumb ideas
like case-insensitivity should be applied at low levels to stuff that
really is binary data. This kind of thinking is what causes complexity,
and complexity causes bugs and security holes.

Any program that takes a string it thinks is UTF-8 and does
<i>ANYTHING</i> other than pass the exact bytes unchanged to another
interface that wants UTF-8 is by definition broken. This simple rule will
completely eliminate all ambiguity about UTF-8.
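
As a rough illustration of why anything other than pass-through breaks lookups, the sketch below compares three ways a program might handle a name before handing it to another interface. The function names and sample filenames are made up for this example; `unicodedata.normalize` and the `errors="replace"` decode mode are standard Python.

```python
import unicodedata


def pass_through(name: bytes) -> bytes:
    # The rule from the comment: hand on the exact bytes received.
    return name


def normalize_nfc(name: bytes) -> bytes:
    # "Helpful" normalization: decode, normalize, re-encode.
    # Raises UnicodeDecodeError if the name is not valid UTF-8.
    text = name.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


def decode_lossy(name: bytes) -> bytes:
    # Lossy decoding: invalid bytes are replaced with U+FFFD.
    return name.decode("utf-8", errors="replace").encode("utf-8")


stored_nfd = "e\u0301.txt".encode("utf-8")  # name as stored, decomposed form
stored_bad = b"\xffreport.txt"              # name as stored, invalid UTF-8

same = pass_through(stored_nfd) == stored_nfd           # bytes still match
changed = normalize_nfc(stored_nfd) != stored_nfd       # bytes now differ
mangled = decode_lossy(stored_bad) != stored_bad        # original byte lost
```

Only the pass-through version returns bytes that will still match the stored name; the other two produce a different byte string, so a later lookup by those bytes would refer to a different file or to none.
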
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.