Posted Feb 10, 2011 6:39 UTC (Thu) by ras (subscriber, #33059)
Parent article: Moving to Python 3
I'll be avoiding Python 3 as long as I can. The adoption of UCS2 for strings is a major stuff-up. There was a good way to improve string handling in Python 3. Drop UCS2 strings entirely. Their introduction was a mistake.
I think the mistake arose from a common misconception. It seems popular to equate UCS2 with Unicode support. This is an abuse of terminology. The old strings represented Unicode perfectly well as UTF-8. The new-style strings use UCS2 instead. That may have been a good idea when Java introduced it, because back then Unicode occupied only one code plane, so every code point fit in a single 16-bit unit. Now UCS2 (strictly speaking UTF-16, via surrogate pairs), just like UTF-8, must use multi-unit sequences for some Unicode code points. So the one good point is gone.
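To make the variable-width point concrete, here is a minimal Python 3 sketch. The two sample characters and the helper name encoded_sizes are my own picks for illustration. It only measures encoded byte lengths using the standard codecs; it says nothing about how any particular interpreter build stores strings internally.

```python
# Compare encoded sizes for a code point inside the Basic Multilingual
# Plane (BMP) and one outside it. Sample characters are illustrative only.
bmp_char = "\u00e9"          # U+00E9, inside the BMP
astral_char = "\U0001F600"   # U+1F600, outside the BMP


def encoded_sizes(ch):
    """Return (utf8_byte_count, utf16_byte_count) for a one-code-point string."""
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


bmp_sizes = encoded_sizes(bmp_char)
astral_sizes = encoded_sizes(astral_char)

# A 16-bit encoding is only fixed-width if every code point takes 2 bytes.
utf16_is_fixed_width = bmp_sizes[1] == astral_sizes[1]

print("BMP char (utf-8, utf-16):", bmp_sizes)
print("non-BMP char (utf-8, utf-16):", astral_sizes)
print("16-bit encoding fixed width for these samples:", utf16_is_fixed_width)
```

The non-BMP character needs two 16-bit units (a surrogate pair), so the 16-bit form is no more fixed-width than UTF-8 is.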
The major downside remains, however: UCS2 is almost never found in the real world. So you spend half your time converting between whatever the outside world is using and UCS2, and then back again. The lines of code increase, memory requirements almost double, the execution time increases and, in my experience, the bugs skyrocket.
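A rough Python 3 sketch of the conversion and size overhead described above, assuming the outside world hands you UTF-8 bytes. The sample request line and variable names are hypothetical, and the 16-bit encoding here is a stand-in for a 16-bit internal representation, not a measurement of the interpreter itself.

```python
# Hypothetical input: mostly-ASCII bytes as they might arrive from a socket
# or file. The content is made up for illustration.
wire_bytes = b"GET /index.html HTTP/1.1"

# Step 1: decode on the way in (bytes from the outside world -> str).
text = wire_bytes.decode("utf-8")

# Step 2: work on the string.
upper = text.upper()

# Step 3: encode on the way out (str -> bytes for the outside world).
out_bytes = upper.encode("utf-8")

# Size comparison: the same text held as 16-bit code units versus UTF-8.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")  # "-le" avoids counting a byte-order mark
size_ratio = len(as_utf16) / len(as_utf8)

print("round trip output:", out_bytes)
print("utf-16 / utf-8 size ratio for ASCII text:", size_ratio)
```

For pure ASCII input the 16-bit form is exactly twice the size; text with many non-ASCII characters narrows that gap, which fits the "almost double" wording.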