UTF-16

Posted Mar 24, 2010 21:53 UTC (Wed) by elanthis (guest, #6227)
In reply to: UTF-16 by flewellyn
Parent article: Resetting PHP 6

Don't be facetious. That a string is internally represented as an array of
bytes or codepoints is an entirely different thing than its client API being
purely iterator based. The problem he was talking about was that the client
API to strings in most languages exposes them as an array of characters
even though internally a string _isn't_ an array of characters; it's an array
of bytes or codepoints. The accessors for strings really don't make much sense
as an array, either, because an array is something indexable by offset, which
makes no sense here: what exactly is the offset supposed to represent? Bytes?
Codepoints? Characters? You can provide a firm answer to this question, of
course, but the answer is only going to be the one the user wants some of the
time. A purely iterator based approach would allow the client to ask for a
byte iterator, a codepoint iterator, or even a character iterator, and get
exactly the behaviour they expect/need.
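To make the iterator idea concrete, here is a minimal sketch in Python. The `Text` class and its method names are hypothetical, made up purely for illustration; no existing library is being described. The "character" iterator is only a rough approximation that attaches combining marks to the preceding base codepoint. It is not full Unicode grapheme cluster segmentation.

```python
import unicodedata


class Text:
    """Hypothetical string wrapper: no indexing, only explicit iterators."""

    def __init__(self, value):
        self._value = value

    def iter_bytes(self, encoding="utf-8"):
        # Yields the integer byte values of the chosen encoding.
        return iter(self._value.encode(encoding))

    def iter_codepoints(self):
        # Yields one integer per Unicode codepoint.
        return (ord(ch) for ch in self._value)

    def iter_characters(self):
        # Approximation (an assumption of this sketch): a "character" is a
        # base codepoint plus any combining marks that follow it.
        cluster = ""
        for ch in self._value:
            if cluster and unicodedata.combining(ch) == 0:
                yield cluster
                cluster = ""
            cluster += ch
        if cluster:
            yield cluster


# 'e' followed by COMBINING ACUTE ACCENT, then 'a'
t = Text("e\u0301a")
print(len(list(t.iter_bytes())))             # 4 bytes in UTF-8
print(len(list(t.iter_bytes("utf-16-le"))))  # 6 bytes in UTF-16
print(len(list(t.iter_codepoints())))        # 3 codepoints
print(len(list(t.iter_characters())))        # 2 user-visible characters
```

The same string has four different "lengths" depending on which iterator you ask for, which is exactly why a single integer offset can't give everyone the answer they want.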

UTF-16

Posted Mar 24, 2010 21:56 UTC (Wed) by flewellyn (subscriber, #5047)

I was not being facetious. I actually thought that representing strings using a different data structure was an interesting idea.

But you're right, there's no need for internal and external representations to match, at least on some levels. At some level you do have to be able to get at the array of bytes directly.