8 byte characters?

Posted Aug 13, 2005 2:53 UTC (Sat) by hp (subscriber, #5220)
In reply to: 8 byte characters? by ringerc
Parent article: Our bloat problem

Unicode doesn't fit in 16 bits anymore; most apps using 16-bit encodings would be using UTF-16, which is variable-length just like UTF-8: any code point above U+FFFF takes two 16-bit units (a surrogate pair). If you pretend each-16-bits-is-one-character then either you're using a broken encoding (UCS-2, in effect) that can't handle all of Unicode, or you're using UTF-16 in a buggy way. To have one-array-element-is-one-character you have to use a 32-bit encoding such as UTF-32.
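To make the point about 16-bit units concrete, here is a minimal Python 3 sketch. The sample string and the variable names are my own choices for illustration; U+1D11E (MUSICAL SYMBOL G CLEF) is just one convenient code point outside the 16-bit range.

```python
# Two code points: 'a' (U+0061) and U+1D11E, which lies above U+FFFF.
s = "a\U0001D11E"

utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")   # explicit byte order, so no BOM is prepended
utf32 = s.encode("utf-32-le")

# Split the UTF-16 bytes into 16-bit code units.
utf16_units = [int.from_bytes(utf16[i:i + 2], "little")
               for i in range(0, len(utf16), 2)]

code_points = len(s)             # Python 3 str is indexed by code point
utf32_units = len(utf32) // 4    # one 32-bit unit per code point

print(code_points, len(utf8), len(utf16_units), utf32_units)
print([hex(u) for u in utf16_units])
```

Code that treats each 16-bit unit as a character would count three "characters" here and could split the string between the two halves of the surrogate pair; the 32-bit encoding gives one array element per code point.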

UTF-8 has the huge advantage that ASCII is a subset of it: any ASCII text is already valid UTF-8, byte for byte, which is why nearly everyone uses it on UNIX.
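A small Python 3 sketch of what the ASCII-subset property means in practice. The sample strings are arbitrary and the variable names are only for illustration.

```python
text = "ls -l /tmp"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# ASCII text is byte-for-byte identical when encoded as UTF-8.
same_as_ascii = ascii_bytes == utf8_bytes

# The same text in UTF-16 contains zero bytes, which C string
# functions and many UNIX tools would treat as end-of-string.
utf16_has_nul = b"\x00" in utf16_bytes

# Non-ASCII characters in UTF-8 use only bytes >= 0x80, so they can
# never be mistaken for '/', NUL, newline or any other ASCII byte.
non_ascii_bytes = "\u00e9\u20ac".encode("utf-8")   # e-acute, euro sign
all_high = all(b >= 0x80 for b in non_ascii_bytes)

print(same_as_ascii, utf16_has_nul, all_high)
```

This is why byte-oriented code that only cares about ASCII delimiters tends to keep working unchanged on UTF-8 data.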


Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.