I strongly agree with Forth's solution. The PostScript paper describes exactly how easy it was to use UTF-8 once you stop panicking about "characters" and realize that they are just like words: nobody worries that you can't find the ends of words in O(1) time. The count of lines changed should be very instructive. I hope everybody saying I am wrong will read the paper.
Forth's solution appears to have an iterator return an object called an "xchar", which is a Unicode code point. I believe such an object is easily extended to return "UTF-8 encoding error" as a distinct value. You can also make different iterators that return composed or decomposed characters, or that automatically convert UTF-8 errors to their CP1252 equivalents. The latter, though unsafe, removes any need to "identify the character encoding", since it will reliably recognize UTF-8, ISO-8859-1, and CP1252 automatically, even if text in different encodings is pasted together.
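To make the iterator idea concrete, here is a rough sketch in Python rather than Forth. It is not the Forth implementation. The names xchars, cp1252_fallback, and mark_error are made up for this example, and the choice to pass the five bytes that Python's cp1252 codec leaves undefined through as ISO-8859-1 is my assumption. It walks a byte string, yields one code point per valid UTF-8 sequence, and hands any byte that is not part of a valid sequence to a pluggable error policy.

```python
def cp1252_fallback(byte):
    """Error policy: read one stray byte as CP1252 (hypothetical helper)."""
    try:
        return ord(bytes([byte]).decode("cp1252"))
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # undefined; assumption: pass them through as ISO-8859-1 would.
        return byte


def mark_error(byte):
    """Error policy: report the bad byte as a negative value,
    which can never collide with a real code point."""
    return -byte


def xchars(data, on_error=cp1252_fallback):
    """Yield one integer per "xchar" found in the bytes object `data`."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # plain ASCII
            yield lead
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            length = 0               # continuation byte or invalid lead byte
        chunk = data[i:i + length]
        code_point = None
        if length and len(chunk) == length:
            try:
                # The strict decoder rejects overlong forms, surrogates,
                # bad continuation bytes and values above U+10FFFF.
                code_point = ord(chunk.decode("utf-8"))
            except UnicodeDecodeError:
                pass
        if code_point is None:
            yield on_error(lead)     # consume only the offending byte
            i += 1
        else:
            yield code_point
            i += length


# Latin-1 "é", a UTF-8 euro sign and a CP1252 euro sign pasted together.
mixed = b"caf\xe9 " + "\u20ac".encode("utf-8") + b" \x80"
print("".join(map(chr, xchars(mixed))))   # café € €
```

Passing on_error=mark_error instead gives the "encoding error as a different value" behaviour: the stray 0xE9 and 0x80 bytes come back as -233 and -128 while everything else decodes normally. A composing or decomposing iterator could be layered on top of this one in the same way.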