UTF-16

Posted Mar 25, 2010 2:27 UTC (Thu) by Nahor (subscriber, #51583)
In reply to: UTF-16 by wahern
Parent article: Resetting PHP 6

> Unicode was never 16-bits

http://en.wikipedia.org/wiki/Unicode#History:
Unicode could be roughly described as "wide-body ASCII" that has been stretched to 16 bits [...]
In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits.
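
For illustration, the surrogate mechanism mentioned in that quote can be sketched in a few lines of Python. This is only a minimal sketch: the function name `to_surrogates` is made up for this example and is not from any library.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair.

    Hypothetical helper, written only to illustrate the mechanism.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary range")
    v = cp - 0x10000               # leaves a 20-bit value
    high = 0xD800 + (v >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range.
pair = to_surrogates(0x1D11E)

# Cross-check against Python's built-in UTF-16 encoder (big-endian, no BOM).
encoded = "\U0001D11E".encode("utf-16-be")
```

Each half of the pair is an ordinary 16-bit value, which is how Unicode 2.0 could grow past 65,536 characters without changing the size of a code unit.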

> nor is it now 32-bits

Indeed it's less (http://en.wikipedia.org/wiki/Unicode#Architecture_and_ter...):
Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF
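
As a quick sanity check of that figure, here is a small Python sketch of the arithmetic. The variable names are my own, chosen only for this example.

```python
MAX_CODE_POINT = 0x10FFFF

# Code points run from 0 to 0x10FFFF inclusive.
codespace = MAX_CODE_POINT + 1

# The codespace divides evenly into planes of 65,536 code points each.
planes = codespace // 0x10000

# Number of bits needed to hold the largest code point.
bits = MAX_CODE_POINT.bit_length()

print(codespace, planes, bits)
```

So the largest code point fits in 21 bits, well short of 32.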

> Nor was UCS-2 fixed-width per se

http://en.wikipedia.org/wiki/Universal_Character_Set:
UCS-2, uses a single code value [...] between 0 and 65,535 for each character, and allows exactly two bytes to represent that value.
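
To make the difference from UTF-16 concrete, here is a minimal Python sketch that counts 16-bit code units per character. The helper `utf16_units` is hypothetical. Python has no UCS-2 codec, so UTF-16 stands in for it here, on the assumption that the two agree for characters in the 0 to 65,535 range.

```python
def utf16_units(ch):
    """Return how many 16-bit code units UTF-16 needs for one character.

    Hypothetical helper for illustration only.
    """
    # Big-endian without a BOM, so the byte count is exactly 2 per unit.
    return len(ch.encode("utf-16-be")) // 2

bmp_units = utf16_units("\u4e2d")               # a CJK character below U+FFFF
supplementary_units = utf16_units("\U0001D11E")  # a character above U+FFFF
```

A character in the 16-bit range takes one unit, which is all UCS-2 can express. A character above that range takes two units in UTF-16 and has no UCS-2 representation at all.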

> [...]

Unicode/UTF-16/UCS-2/... may not be perfect, but it's still better than what we had before. At least now we have a universal way of displaying foreign alphabets.
Representing a string as a byte array may not be ideal, but it's no worse than before. Features like word splitting may not be easy, but they never were. And not all applications need such features. A lot of them just want to be able to display Asian characters on an English OS.