Python 3 is doing the EXACT SAME STUPID MISTAKE. It is going to be a disaster and the developers are too blinded to realize it.
There will be the annoying overhead of converting every bit of data on input and output. But far more important will be the fact that errors in the UTF-8 will either be lost or will cause exceptions to be thrown, producing a whole universe of ugly bugs and DOS attacks. This is going to suck bad!
Strings should be UTF-8 and string[n] should return the n'th byte in the string. That is the TRUTH and Microsoft and Python and PHP and Java and everybody else is WRONG.
But how do I get the N'th character???? You are probably sputtering this nonsense question right now, right? You need to ask yourself: where did "N" come from? I can guarantee you it came from an iterative process that looked at every character between some other point and this new point. The proper interface to look at "characters" is ITERATORS. They can move by one in each direction in O(1) time. And different iterators can return composed or decomposed characters, and if the byte is an error they can clearly return that error and also return suggested replacement values.
Unfortunatly Unicode and UTF and perhaps some kind of politically-correct rule that we can only have equality and world peace if some people don't get the "better" shorter encodings, seems to turn quite intelligent programmers into complete morons. Or more like idiot savants: they are dangerously talented enough to write these horrible things and foist them on everybody.