The "Unicode" handling is the reason I am not using Python 3.
Its designers are living in a fantasy world where UTF-8 magically has no errors in it.
In the real world, if those errors are not preserved, data gets corrupted. That makes this "Unicode" USELESS, because we cannot reliably store text in it. So everybody will fall back to byte strings, and since the easiest way to avoid "errors" is to declare that the strings are ISO-8859-1, we will revert to non-Unicode really fast. This is a terrible result, and it is shameful that the people causing it are under the delusion that they are "helping Unicode".
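The corruption problem is easy to reproduce. A minimal sketch, using only the standard codecs (Python 3.1 later added the surrogateescape error handler, which addresses part of this, but strict decoding remains the default):

```python
# Real-world byte data (filenames, logs, old files) often contains
# sequences that are not valid UTF-8; strict decoding simply refuses it.
data = b"caf\xe9 menu"   # 0xE9 is Latin-1 e-acute, NOT valid UTF-8

try:
    data.decode("utf-8")          # strict: raises UnicodeDecodeError
except UnicodeDecodeError:
    pass                          # the offending byte cannot get through

# The path of least resistance: claim the data is ISO-8859-1, which
# "succeeds" for any byte sequence whatsoever -- exactly the reversion
# to non-Unicode described above.
text = data.decode("iso-8859-1")
assert text.encode("iso-8859-1") == data   # round-trips, byte-exact
```

The Latin-1 round trip is lossless precisely because ISO-8859-1 maps every possible byte value to a code point, which is why it becomes the default escape hatch.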
For some reason the ability to do character = string[int] is so drilled into programmers' brains that they turn into complete idiot savants, doing incredible amounts of insanely complex and error-prone work rather than dare to question that initial assumption. The obvious solution appears the moment you think about any other kind of data stored in a stream, such as words: you iterate, you don't index.
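To illustrate the "words" analogy: other variable-width items in a stream are naturally consumed by iteration, and nobody misses random indexing. A trivial sketch:

```python
import io

def iter_words(stream):
    """Yield whitespace-separated words from a text stream in one
    pass -- no random access, no pre-computed index, required."""
    for line in stream:
        for word in line.split():
            yield word

# Words, like UTF-8 characters, have variable width in the underlying
# stream; iterating is the natural access pattern for both.
words = list(iter_words(io.StringIO("never index\nalways iterate")))
```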
Strings should be BYTES, and there should not be two types. If you want "characters" (in the TINY TINY TINY percentage of cases where you do), you use an ITERATOR, i.e. "for x in string", where x is set to a special item that can compare to characters and can also encode errors. To change the codec you make a different object, but the underlying bytes just get their reference count incremented, so there is no copying and changing codecs is trivial and O(1). "Unicode strings" would mean the codec is set to UTF-8, and "bytes" would mean the codec is set to some byte-for-byte version, or possibly that the iterator is disallowed. The parser would also need to translate "\uXXXX" in a string literal to its UTF-8 representation and "\xNN" to a byte with that value.
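The proposed design can be sketched as follows. This is a minimal illustration, not a real implementation: ErrorByte and iter_utf8 are hypothetical names standing in for the "special item" and the iterator described above, and the reference-counted codec switching is omitted.

```python
class ErrorByte:
    """The 'special item': one byte that did not form a valid UTF-8
    sequence, preserved losslessly instead of being rejected."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"ErrorByte(0x{self.value:02x})"

def iter_utf8(data: bytes):
    """Iterate characters over raw bytes: yield a str for each valid
    UTF-8 sequence and an ErrorByte for each byte that is not."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII fast path
            yield chr(b)
            i += 1
            continue
        # Expected sequence length, from the UTF-8 lead-byte ranges.
        if 0xC2 <= b <= 0xDF:
            length = 3 - 1
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0                    # not a valid lead byte
        if length:
            try:
                yield data[i:i + length].decode("utf-8")
                i += length
                continue
            except UnicodeDecodeError:
                pass                      # fall through: bad sequence
        yield ErrorByte(b)                # preserve the byte, advance one
        i += 1
```

Because errors are yielded rather than raised, re-encoding every item reproduces the original bytes exactly, which is the lossless round trip the scheme depends on:

```python
items = list(iter_utf8(b"caf\xe9!"))
# -> ['c', 'a', 'f', ErrorByte(0xe9), '!']
```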