Locales and UTF-8

Posted May 8, 2009 21:48 UTC (Fri) by spitzak (guest, #4593)
In reply to: Locales and UTF-8 by nix
Parent article: Debian switching to EGLIBC

Actually, the Euro is U+20AC. It is 0x80 in the CP1252 encoding used by Microsoft, but in official Unicode U+0080 is a C1 control character, not the Euro. However, I do think the Unicode standard should simply recognize that CP1252 is really common and change the characters 0x80-0x9F to what CP1252 defines.
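A minimal Python sketch (assuming Python 3.9+ and its bundled `cp1252` and `latin-1` codecs) that checks these claims: the Euro sign is code point U+20AC, encodes to the single byte 0x80 in CP1252, and the only bytes where CP1252 departs from Latin-1 (whose bytes map one-to-one onto U+0000 to U+00FF) are 0x80 to 0x9F. The helper name `cp1252_vs_latin1` is hypothetical, and note that Python's codec treats five bytes in that range as unassigned rather than as C1 controls.

```python
import unicodedata

EURO = "\u20ac"  # U+20AC EURO SIGN


def cp1252_vs_latin1() -> list[int]:
    """Return byte values whose CP1252 decoding differs from Latin-1 (hypothetical helper)."""
    differing = []
    for b in range(256):
        raw = bytes([b])
        try:
            as_cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unassigned.
            as_cp1252 = None
        # Latin-1 maps every byte to the code point with the same numeric value.
        if as_cp1252 != raw.decode("latin-1"):
            differing.append(b)
    return differing


if __name__ == "__main__":
    print(hex(ord(EURO)))         # 0x20ac
    print(EURO.encode("cp1252"))  # b'\x80'
    print(EURO.encode("utf-8"))   # b'\xe2\x82\xac'

    # In Unicode (and Latin-1), 0x80 is a C1 control character, category "Cc".
    print(unicodedata.category(b"\x80".decode("latin-1")))  # Cc

    diff = cp1252_vs_latin1()
    print(hex(min(diff)), hex(max(diff)), len(diff))  # 0x80 0x9f 32
```

Everything from 0xA0 upward already matches Latin-1 (and therefore Unicode), so the disagreement is confined to the 32 bytes of the C1 range.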

I do hope a program trying to parse for a period only looks for the ASCII period. As soon as you start saying other Unicode characters are "equivalent", you get a huge mess, because different programs may disagree on what is in the equivalent set, and Unicode could add a new character at any time. We already have quite a mess with newlines; let's not make it worse! The only software that should be looking for Unicode punctuation is actual glyph layout and rendering.
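To illustrate the point, here is a small Python sketch (Python 3.9+; the helpers `split_fields`, `is_ascii_period` and `looks_like_period` are hypothetical, not from any library). The "equivalent set" rule used here, any character whose Unicode name contains "FULL STOP", is just one arbitrary choice that another program might make differently, and its answers depend on the Unicode database shipped with the interpreter (`unicodedata.unidata_version`).

```python
import unicodedata
from typing import Callable


def is_ascii_period(ch: str) -> bool:
    # Only U+002E FULL STOP counts as a separator.
    return ch == "."


def looks_like_period(ch: str) -> bool:
    # Hypothetical "equivalence" rule: any character whose Unicode name contains
    # "FULL STOP" (e.g. U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP).
    return "FULL STOP" in unicodedata.name(ch, "")


def split_fields(text: str, is_sep: Callable[[str], bool]) -> list[str]:
    """Split text wherever is_sep(ch) is true."""
    fields, current = [], []
    for ch in text:
        if is_sep(ch):
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    sample = "a.b\u3002c\uff0ed"
    print(len(split_fields(sample, is_ascii_period)))    # 2
    print(len(split_fields(sample, looks_like_period)))  # 4

    # Newlines already disagree: str.splitlines() also breaks on U+2028
    # LINE SEPARATOR (and U+0085, U+2029, ...), while split("\n") only breaks on LF.
    text = "one\u2028two\nthree"
    print(len(text.split("\n")), len(text.splitlines()))  # 2 3
```

The ASCII-only rule gives the same answer everywhere; the name-based rule can silently change its results whenever a new Unicode version adds another character with "FULL STOP" in its name.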

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds