It'll be fascinating to see what that breaks when someone throws in a
character with the high bit set :) Stuff that relies upon the C locale
rarely makes a distinction between bytes and characters, even where it
should... of course, one would hope that not much such software is left.
Posted May 8, 2009 2:02 UTC (Fri) by spitzak (guest, #4593)
Nothing will break when a byte has a high bit set, since it will just be copied to the output unchanged.
Don't panic about UTF-8. The biggest problem with it is people who do not understand it. Some of them are good enough programmers that they might write very damaging code that actually tries to interpret the UTF-8 encoding.
The only real bug in Unix with UTF-8 is a whole lot of documentation that says "character" where it should say "byte". There is nothing wrong with the current implementations.
Debian switching to EGLIBC
Posted May 8, 2009 13:57 UTC (Fri) by nix (subscriber, #2304)
I covered this 'nothing will care if you feed UTF-8 to a program expecting
a byte stream' canard in my other response. It's trivially wrong.
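
To make the disagreement concrete, here is a minimal sketch of one case where the byte/character distinction matters. It is only an illustration and not necessarily the case covered in the other response. The helper names (copy_bytes, truncate_bytes, is_valid_utf8) and the sample string are hypothetical, invented for this example.

```python
def copy_bytes(data: bytes) -> bytes:
    # Pure pass-through: every byte, high bit set or not, is emitted unchanged.
    return bytes(b for b in data)


def truncate_bytes(data: bytes, width: int) -> bytes:
    # Treats "width" as a byte count, which is what code that equates
    # bytes with characters ends up doing.
    return data[:width]


def is_valid_utf8(data: bytes) -> bool:
    # True if the byte string decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: U+00EF encodes as the two bytes 0xC3 0xAF,
# so this 5-character string is 6 bytes long.
sample = "na\u00efve".encode("utf-8")

copied = copy_bytes(sample)
cut = truncate_bytes(sample, 3)  # ends in the middle of the 2-byte sequence

print(copied == sample)       # the copy is byte-for-byte identical
print(is_valid_utf8(copied))  # and still valid UTF-8
print(is_valid_utf8(cut))     # the truncated output is not
```

Copying bytes through untouched is indeed safe, but as soon as a byte-oriented program counts, truncates, or pads in units it calls "characters", it can cut a multibyte sequence in half.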