A program that treats bytes with the high bit set as "this may be a piece of a UTF-8 character", and puts all those bytes into a single class such as "may be part of an identifier", can safely handle UTF-8 strings (including invalid ones) as bytes. This is FAR better than trying to detect and handle errors, in particular because it is a hundred times simpler, and thus more reliable and less likely to have bugs.
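As a minimal sketch of this idea (the function names are mine, not from any particular library): a lexer that lumps every byte with the high bit set into the "identifier" class never needs to decode UTF-8 at all, so malformed sequences pass through without any special handling.

```python
def is_ident_byte(b: int) -> bool:
    # ASCII letters, digits, underscore -- plus ANY byte with the
    # high bit set (0x80-0xFF). UTF-8 lead bytes, continuation bytes,
    # and even stray invalid bytes all land in one class.
    return (b == 0x5F
            or 0x30 <= b <= 0x39
            or 0x41 <= b <= 0x5A
            or 0x61 <= b <= 0x7A
            or b >= 0x80)

def scan_ident(buf: bytes, start: int) -> int:
    # Advance past an identifier beginning at buf[start].
    # Never decodes UTF-8, so a truncated or invalid sequence
    # cannot raise -- it just gets swept into the identifier.
    i = start
    while i < len(buf) and is_ident_byte(buf[i]):
        i += 1
    return i
```

For example, `scan_ident(b"caf\xc3\xa9 = 1", 0)` stops at the space (index 5), and a truncated sequence like `b"caf\xc3"` scans just as cleanly -- no error path exists to get wrong.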
Do NOT throw exceptions on bad strings. This turns a possible security error into a guaranteed denial-of-service bug. Working around it (as I have had to do countless times due to stupid string-drawing routines that refuse to draw a string with an error in it) means I have to write my *own* UTF-8 parser, just to remove the errors, before displaying the string or using it. I hope you can see how forcing programmers to write their own code to parse the strings, rather than providing reusable routines, is a bad idea.
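The kind of reusable routine being asked for here is trivial to provide. A sketch, using Python's built-in `"replace"` error handler (the function name is my own):

```python
def to_displayable(raw: bytes) -> str:
    # Substitute U+FFFD (the replacement character) for each
    # malformed byte sequence instead of raising, so a string
    # with one bad byte is still drawable in full.
    return raw.decode("utf-8", errors="replace")
```

With this available, a string-drawing routine has no excuse to refuse the whole string: `to_displayable(b"caf\xe9!")` yields `"caf\ufffd!"` rather than throwing.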
And I don't want exceptions thrown when I compare two strings for equality. That way lies madness. It is unfortunate that too much of this stuff is being designed by people who never use it; otherwise they (and you) would not make such trivial design errors.
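Equality, in particular, needs nothing more than a byte-for-byte comparison -- a sketch (again, my own function name):

```python
def same_string(a: bytes, b: bytes) -> bool:
    # Compare raw bytes without decoding. This cannot throw,
    # and two identical invalid sequences still compare equal,
    # which is exactly what you want.
    return a == b
```

Note that this deliberately does no normalization or validation; it answers "are these the same bytes?", which is the question an equality operator should be able to answer about any string, valid or not.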