HTML rendering engines live in a world where a common use for them is to render a string of HTML in which different parts come from different entities that you trust to differing degrees. LWN itself is an example of this: the text you're reading right now is a string entered by me, while the layout and the rest of the window are markup and text made by LWN.
This is the reality of the situation. The same isn't true for (say) an image viewer. It's not, in fact, common to be viewing a JPEG image where the pixels in one part of the picture are created by one entity and the pixels elsewhere by another. If it were, then yes, the situation would be parallel.
Often it doesn't matter. But sometimes it matters very much indeed. If I pay a bill by net-bank, I can enter a string that is displayed (in the net-bank) to the person receiving the payment. If it were possible for me to enter a string that would, for example, change what the recipient sees in the "amount" field, not merely the "comment" field, that'd be a major problem.
Yes, they sanitize the strings (allow only [a-zA-Z 0-9]). This reduces the attack surface, and makes sense. But the basic principle of the attack remains: bugs in the rendering engine may let one part of the string affect other parts that, by the standard, it shouldn't be able to affect. Such bugs are security bugs.
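To make the sanitizing step concrete, here is a rough sketch in Python of what such an allow-list might look like. The function names, the markup, and the example strings are all made up for illustration; I have no idea what any real net-bank actually runs. It only shows the idea: the bank controls the markup, the payer controls nothing but the comment text.

```python
import html
import re

# Hypothetical allow-list matching the one mentioned above:
# ASCII letters, digits and spaces. Everything else is dropped.
DISALLOWED = re.compile(r"[^a-zA-Z0-9 ]")


def sanitize_comment(text: str) -> str:
    """Remove every character that is not in the allow-list."""
    return DISALLOWED.sub("", text)


def render_payment(amount: str, comment: str) -> str:
    """Build the page fragment. The markup is the bank's (trusted);
    the comment comes from the payer (untrusted)."""
    safe_comment = html.escape(sanitize_comment(comment))
    return (
        f'<p class="amount">{html.escape(amount)}</p>'
        f'<p class="comment">{safe_comment}</p>'
    )


# An attempt to smuggle a second "amount" field in through the comment.
attack = '</p><p class="amount">999999.00'
page = render_payment("100.00", attack)
print(page)
```

With the angle brackets and quotes stripped, the comment can no longer open or close tags. That is exactly the reduced attack surface, and nothing more: the output is still handed to a rendering engine, and a bug there is outside the sanitizer's control.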
Yes, that means many bugs in rendering untrusted content are security bugs. A bug in OpenOffice that somehow causes what is displayed on screen (for a specially crafted document) to differ from what is printed is a security bug. (Imagine what would happen if someone read a contract on screen, then printed and signed the paper copy without checking that the paper contract matches the on-screen one.)