"Unicode"
Posted Oct 31, 2009 0:24 UTC (Sat) by
nix (subscriber, #2304)
In reply to:
"Unicode" by spitzak
Parent article:
Proposal: Moratorium on Python language changes
In fact we are regressing to earlier than the 1980's by going ASCII-only.
I don't know who 'we' is, but it doesn't describe any software development
shop I know of. Everyone is more i18n-aware than they used to be, not less.
As for text being relegated to multiple encodings, well, Unicode is
rapidly taking over there as well. Yes, you have to distinguish
between UCS-2 and UTF-8, but you've had to do that for ages, and there are
pretty accurate heuristics now. Needing heuristics to detect encodings is
nothing new, either: we've always needed them for EBCDIC-versus-ASCII,
even before ISO-8859 was heard of.
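To make "heuristics" concrete, here is a minimal sketch in Python of the sort of check I mean. It is not what any real detector does: the function name guess_encoding and the 40% NUL threshold are made up for illustration, and it only tells UTF-8 apart from 16-bit encodings of mostly-Latin text. It will happily mislabel UTF-32 or legacy 8-bit charsets.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Crude guess between UTF-8 and 16-bit Unicode (hypothetical helper)."""
    # A byte-order mark, when present, is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # Mostly-Latin text in a 16-bit encoding has a NUL in every other byte.
    # This must run before the UTF-8 check, since NUL is valid UTF-8 too.
    if len(data) >= 2:
        half = len(data) // 2
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        # The 0.4 threshold is an arbitrary assumption, not a tuned value.
        if odd_nuls > half * 0.4 and even_nuls == 0:
            return "utf-16-le"
        if even_nuls > half * 0.4 and odd_nuls == 0:
            return "utf-16-be"

    # Strict UTF-8 decoding fails on most text that is not really UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

Real detectors add byte-frequency statistics on top of this, but the structure is the same: cheap structural checks first, then a strict decode.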
And this problem of invalid UTF-8 sequences which you claim is so
catastrophic? I've never once seen them outside fuzz tests,
attempted attacks, and debugging sessions on a heuristic charset detector.
They just don't occur in normal use of a system, at all. Catastrophe? No.
The security implications are interesting, but not as significant as the
problems with comparing UTF-8 strings for equality without considering
that they may not be in canonical (normalized) form, a problem you didn't
mention.