I've been pondering this for a couple of days ... When does the kernel need to log data which is not 7-bit ASCII? Obviously it is related to messages which are non-English, or to other binary data. But seriously, when does this *really* happen? Can someone point me to kernel code where this is an important feature? In which other scenarios would non-7-bit-ASCII values be valuable? When something above 0x7f needs to be logged, it is often binary data - where hex notation might even make more sense.
UTF-8 is also a superset of the 7-bit ASCII range, so displaying purified 7-bit ASCII strings on UTF-8 terminals is not a problem.
From this perspective, it makes most sense to whitelist \t, \n and 0x20-0x7f ... IMO, anything outside this range should be considered potentially harmful, and can and should therefore be escaped.
And *if* this causes an issue later on with UTF-8, it is easier to expand a valid range than to do the reverse. That is kind of the "pain" in this discussion: reducing the number of valid characters is always painful.
Of course user-space should do its sanitization as well. *But* security is about layers. If the kernel can provide sane log data, and user-space can also filter out gibberish which sneaked under the kernel's radar, that is when you have redundant security. Tossing the ball back and forth, claiming user-space or kernel-space should be the one responsible, is just silly - especially the day something breaks.