Locales and UTF-8

Posted May 11, 2009 16:01 UTC (Mon) by endecotp (guest, #36428)
In reply to: Locales and UTF-8 by nix
Parent article: Debian switching to EGLIBC

> not all the punctuation characters are single bytes

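I was referring to the punctuation characters used to delimit CSV, which are all ASCII characters (as are those used in XML). In UTF-8, the bytes of a multi-byte character all have the high bit set, so an ASCII delimiter byte can never turn up inside one.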
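To illustrate the point about ASCII delimiters, here is a minimal sketch in Python that splits a UTF-8 encoded line on a comma at the byte level, without decoding anything first. The function name `split_fields` and the sample data are made up for the example, and it deliberately ignores CSV quoting rules.

```python
def split_fields(line: bytes, delimiter: bytes = b",") -> list:
    """Split a UTF-8 encoded line on an ASCII delimiter, byte by byte.

    Hypothetical helper: it does not handle quoted fields.
    """
    # Every byte of a multi-byte UTF-8 sequence is >= 0x80, so an ASCII
    # delimiter (< 0x80) can only match a real delimiter character.
    return line.split(delimiter)


# Sample row containing Latin-1 range and CJK characters.
row = "na\u00efve,caf\u00e9,\u65e5\u672c\u8a9e".encode("utf-8")
fields = split_fields(row)

# Each field is still valid UTF-8 and decodes cleanly on its own.
decoded = [field.decode("utf-8") for field in fields]
```

The split never lands in the middle of a character, so each field can be decoded (or simply passed along as bytes) independently.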

> The rest of your points stand: if all you want to do is manipulate
> ASCII characters in a UTF-8 stream, you can do that without being
> Unicode-aware

My points were that you can do all of those things (e.g. search and replace) EVEN IF the input is non-ASCII.
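As a rough sketch of what that looks like, the following Python snippet does a search and replace directly on UTF-8 bytes where both the text and the pattern contain non-ASCII characters. The name `replace_utf8` is invented for the example, and it assumes the pattern and the data use the same Unicode normalization form (it does no normalization itself).

```python
def replace_utf8(data: bytes, old: str, new: str) -> bytes:
    """Replace every occurrence of `old` with `new` in UTF-8 encoded data.

    Hypothetical helper working purely on bytes. Because UTF-8 lead bytes
    and continuation bytes use disjoint ranges, a valid encoded pattern
    cannot match starting in the middle of another character.
    """
    return data.replace(old.encode("utf-8"), new.encode("utf-8"))


# "Gruesse aus Koeln" written with umlauts and sharp s.
text = "Gr\u00fc\u00dfe aus K\u00f6ln".encode("utf-8")
result = replace_utf8(text, "K\u00f6ln", "M\u00fcnchen")
```

The result is still well-formed UTF-8, even though nothing in the code knows where the character boundaries are.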

Your example of delete key behaviour is an interesting one that comes under my category of "GUI and similar I/O". It is clearly necessary to delete back as far as the last character-starting byte, i.e. the nearest preceding byte that is not a continuation byte of the form 10xxxxxx. Doing so is not very hard.
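A minimal sketch of that delete-back step in Python, assuming the buffer holds well-formed UTF-8. The function name `delete_last_char` is made up for the example, and it removes one encoded code point, not a full grapheme cluster (a base letter followed by a combining accent would need two presses).

```python
def delete_last_char(buf: bytes) -> bytes:
    """Remove the last UTF-8 encoded code point from buf.

    Hypothetical helper; assumes buf is well-formed UTF-8.
    """
    i = len(buf)
    if i == 0:
        return buf
    i -= 1
    # Continuation bytes look like 10xxxxxx; step back over them until
    # we reach the byte that starts the character.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]


# The euro sign U+20AC is three bytes in UTF-8 (E2 82 AC).
with_euro = "abc\u20ac".encode("utf-8")
after_delete = delete_last_char(with_euro)
```

The same loop works for one-byte ASCII characters, since the last byte is then itself the character-starting byte.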

> I suppose misbehaviour from this change is unlikely *if* you're in
> the US. Anywhere else? Bite your knuckles.

I am not in the U.S., and my code works with UTF-8 without the sort of major headaches that you allude to.



Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds