"UCS2 is almost never found in the real world."
Windows 2000/XP/2003/Vista/2008/7 is almost never found in the real world?
Moving to Python 3
Posted Feb 10, 2011 11:52 UTC (Thu) by tialaramex (subscriber, #21167)
So, no, Windows isn't an example of UCS-2, and hasn't been for many years.
Posted Feb 10, 2011 17:10 UTC (Thu) by marcH (subscriber, #57642)
In this sense, UCS-2 is extremely often found in the real world.
Posted Feb 11, 2011 4:01 UTC (Fri) by tialaramex (subscriber, #21167)
What were you imagining they should be using java.lang.String.codePointCount() for? Text is hard, like I said, and a count of Unicode code points is rarely what you need.
Examples of things which are assigned one or more Unicode code points: a harmless, invisible and ignorable marker; an indication that subsequent neutral text is intended to be displayed right-to-left; the cedilla accent on a character; a lowercase x; a vertical tab; an indication that a non-fatal error occurred in some previous processing.
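To make the point above concrete, here is a minimal Python 3 sketch (Python rather than Java, since the article is about Python 3). It assumes that len() on a str counts code points, which holds for Python 3.3 and later. The helper name describe() and the sample strings are made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (code point count, UTF-16 code unit count) for text."""
    code_points = len(text)  # a Python 3 str is a sequence of code points
    utf16_units = len(text.encode("utf-16-le")) // 2  # two bytes per code unit
    return code_points, utf16_units


# 'c' followed by U+0327 COMBINING CEDILLA: one visible character,
# but two code points until it is normalized.
cedilla = "c\u0327"
composed = unicodedata.normalize("NFC", cedilla)

# U+1D11E lies outside the 16-bit range, so it is one code point
# but two UTF-16 code units.
clef = "\U0001D11E"

for label, sample in [("cedilla", cedilla), ("composed", composed), ("clef", clef)]:
    print(label, describe(sample))
```

Neither count tells you how many characters a reader would see, which is why a code point count is rarely the number you actually want.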
Posted Feb 10, 2011 11:57 UTC (Thu) by tialaramex (subscriber, #21167)
As the original poster said (even if their terminology is wrong in a bunch of places), UCS-2 looked like it might be clever in the mid-1990s. Once it became clear that Unicode's hyperspace (the code points beyond the 16-bit range) would be populated, and that UCS-2 wasn't capable of handling that, the choice was no longer between UCS-2 and UTF-8 (where UCS-2 delivers some intuitive-seeming properties, although not as many as sometimes claimed) but between UTF-8 and UTF-16, where UTF-16 is completely horrible.
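As a rough illustration of why UCS-2 could not cope, the sketch below shows how UTF-16 encodes a code point above U+FFFF as a surrogate pair. The function names are invented for this example, and fits_in_ucs2() is a simplification that ignores the reserved surrogate range.

```python
def fits_in_ucs2(code_point):
    """True if the code point fits in a single 16-bit unit (simplified check)."""
    return 0 <= code_point <= 0xFFFF


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000      # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


clef = 0x1D11E  # MUSICAL SYMBOL G CLEF
print(fits_in_ucs2(clef))
print([hex(unit) for unit in to_surrogate_pair(clef)])
```

Code that treats each 16-bit unit as a character sees two meaningless halves here, so UTF-16 keeps none of the fixed-width simplicity that made UCS-2 look attractive.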
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds