
Locales and UTF-8

Locales and UTF-8

Posted May 11, 2009 16:06 UTC (Mon) by ajross (subscriber, #4563)
In reply to: Locales and UTF-8 by epa
Parent article: Debian switching to EGLIBC

Everything you say is true of ASCII too. You have to validate untrusted input, regardless of what it is. ASCII doesn't have the high bit set, but any ASCII format is by necessity going to have escaping mechanisms that need equivalent validation. For this specific example, you *are* counting your matching quote characters, right? Everything is an "encoding" at some level.
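To illustrate the point that checking the encoding and checking the format's own escaping are two separate steps, here is a minimal Python sketch. The field format (a double-quoted string in which a backslash may only escape a double quote or another backslash) and the parse_quoted function are hypothetical, invented for this example. Python's strict bytes.decode("utf-8") does the encoding half. The loop does the quote-matching half, which an ASCII-only format would need anyway.

```python
def parse_quoted(raw: bytes) -> str:
    """Validate and unescape one field in a hypothetical quoted format.

    Assumed format (illustrative only): the field starts and ends with a
    double quote, and inside it a backslash may only escape a double
    quote or another backslash.
    """
    # Step 1: validate the encoding. Python's strict UTF-8 decoder rejects
    # overlong sequences, stray continuation bytes and truncated characters.
    # UnicodeDecodeError is a subclass of ValueError.
    text = raw.decode("utf-8")

    # Step 2: validate the format-level escaping, exactly as for ASCII.
    if len(text) < 2 or text[0] != '"':
        raise ValueError("field must start with a double quote")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # An escape must be followed by one of the two escapable characters.
            if i + 1 >= len(text) or text[i + 1] not in '"\\':
                raise ValueError(f"bad escape at offset {i}")
            out.append(text[i + 1])
            i += 2
        elif ch == '"':
            # An unescaped quote is only legal as the very last character.
            if i != len(text) - 1:
                raise ValueError("unescaped quote before end of field")
            return "".join(out)
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted field")
```

The UTF-8 check costs a single call; the quote handling is where the real validation work lives, and it would be identical for pure ASCII input.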

Avoiding UTF-8 in the blind expectation that it somehow makes your code more "secure" is just plain wrong. This kind of mistake is exactly what I'm talking about. People attribute all sorts of complexities to encoding transformation and I18N that aren't actually there in practice.



Locales and UTF-8

Posted May 19, 2009 9:18 UTC (Tue) by epa (subscriber, #39769)

I agree that using some hacky alternative instead of UTF-8 will not improve security. Nothing I wrote should be taken as a reason to avoid UTF-8. (Though it's not true that you *always* have to include escaping mechanisms for ASCII input: some file formats, such as /etc/passwd, can get away with being completely stupid and not supporting escaping or accented characters at all.)
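The /etc/passwd case can be sketched the same way. An entry is seven colon-separated fields (login name, password, UID, GID, GECOS/comment, home directory, shell, in the order given by passwd(5)) with no escape syntax, so "validation" reduces to refusing any value that contains a colon or a newline. The helper names and dictionary keys below are hypothetical, and the sketch deliberately skips the other checks a real tool such as useradd performs.

```python
from typing import Dict

# Field order of an /etc/passwd entry as described in passwd(5).
# The short key names are hypothetical labels chosen for this sketch.
FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")


def format_passwd_entry(entry: Dict[str, object]) -> str:
    """Join one entry into a passwd line.

    The format has no escape syntax, so a ':' (the field separator) or a
    newline (the record separator) inside a value cannot be represented.
    The only safe response is to reject such input outright.
    """
    values = []
    for field in FIELDS:
        value = str(entry[field])
        if ":" in value or "\n" in value:
            raise ValueError(f"{field} contains a character passwd cannot escape")
        values.append(value)
    return ":".join(values)


def parse_passwd_line(line: str) -> Dict[str, str]:
    """Split a passwd line back into its seven fields (all as strings)."""
    parts = line.rstrip("\n").split(":")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))
```

Because there is nothing to unescape, parsing is a plain split; the price is that some values simply cannot be stored.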

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.