"Unicode"

Posted Oct 31, 2009 1:29 UTC (Sat) by nix (subscriber, #2304)
In reply to: "Unicode" by foom
Parent article: Proposal: Moratorium on Python language changes

Sorry, I misinterpreted you. Of course it's more complicated to iterate
over strings now, but really not much more, and UTF-8 (unlike the
fixed-width multibyte encodings) is easy to resync to if you start from an
arbitrary byte, so things like binary searches in long strings are still
possible with a tiny bit of extra tweaking.
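
To illustrate the resync point, here is a minimal sketch in Python. It
assumes the buffer holds valid UTF-8 and that the offset is within bounds;
the function name resync is made up for this example and is not from any
library. It relies only on the fact that UTF-8 continuation bytes always
have the bit pattern 10xxxxxx, so backing up past them lands on the first
byte of a character.

```python
def resync(buf: bytes, pos: int) -> int:
    """Return the offset of the first byte of the UTF-8 sequence containing pos.

    Hypothetical helper for illustration; assumes buf is valid UTF-8
    and 0 <= pos < len(buf).
    """
    # Continuation bytes look like 10xxxxxx; step back until we leave them.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


# U+2603 (snowman) encodes to three bytes: e2 98 83.
data = "a\u2603b".encode("utf-8")

# Landing in the middle of the snowman backs up to its lead byte at offset 1.
start = resync(data, 2)
tail = data[start:].decode("utf-8")
```

A binary search would probe the midpoint byte, call something like this to
find the character boundary, and compare from there. At most three steps
back are ever needed in valid UTF-8.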

And, agreed, the ability to treat a string as a fixed-width array is
really quite unimportant: generally people iterate over strings rather
than leaping to position N. (You meant 'position' or 'offset', though,
not 'codepoint', which is entirely different. Codepoint 'access' isn't
even a particularly meaningful concept: what does it mean to 'access'
ASCII codepoint 65? Codepoints just *are*.)


