Everything you say is true of ASCII too. You have to validate untrusted input, regardless of what it is. ASCII doesn't have the high bit set, but any ASCII format is by necessity going to have escaping mechanisms that need equivalent validation. For this specific example, you *are* counting your matching quote characters, right? Everything is an "encoding" at some level.
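To make the point concrete, here's a toy sketch (the grammar and the function name are mine, purely for illustration): even a pure-ASCII quoted-string format needs a validator that tracks escape sequences and matching quotes, which is exactly the kind of care people claim is unique to multi-byte encodings.

```python
def validate_quoted(s: str) -> bool:
    """Check that a double-quoted ASCII string literal is well formed:
    it must open and close with '"', and any backslash escape inside
    must be one the (toy) grammar recognizes."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return False
    i = 1
    while i < len(s) - 1:
        c = s[i]
        if c == '"':            # unescaped quote before the closing one
            return False
        if c == '\\':           # escape: must be followed by a valid char
            if i + 1 >= len(s) - 1 or s[i + 1] not in '"\\nt':
                return False
            i += 2              # consume the escape pair
        else:
            i += 1
    return True
```

A well-formed literal like `"a\"b"` passes; a stray interior quote or a dangling backslash fails, just as a truncated UTF-8 sequence would fail a byte-level validator.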
Avoiding UTF-8 in the blind expectation that it somehow makes your code more "secure" is just plain wrong. This kind of mistake is exactly what I'm talking about: people attribute all sorts of complexity to encoding transformation and I18N that simply isn't there in practice.
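And on the UTF-8 side, strict validation really is that small in practice. In Python, for example, the stdlib decoder is strict by default and already rejects malformed input (`is_valid_utf8` is my name for this wrapper, not a standard API):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Accept only well-formed UTF-8. The strict stdlib decoder rejects
    truncated sequences, overlong encodings, and surrogate byte ranges."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

That's the whole "extra complexity" of accepting UTF-8 input: one strict decode, analogous to the quote-matching you already have to do for any ASCII format.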