Posted May 13, 2013 0:34 UTC (Mon) by rcweir (guest, #48888)
Posted May 13, 2013 9:03 UTC (Mon) by tialaramex (subscriber, #21167)
Posted May 13, 2013 11:50 UTC (Mon) by rcweir (guest, #48888)
The pragmatism extends far beyond considerations of the Apache CMS and our tool set. In my experience there are very many 3rd party tools and services, large and small, that scrape website <title> tags, and many of them have naive encoding logic. You've probably seen your share of scrapes that mess things up, putting raw character entities (for an apostrophe, etc.) into visible text rather than encoding things properly. Since we cannot control such 3rd party tools, it makes great sense to be defensive and aim for the lowest common denominator for the <title> of our home page, the one page most likely to be shared via 3rd party tooling. This, in practice, would recommend limiting it to 7-bit ASCII. So yes, we sacrifice the elegance of the em-dash for the plain hyphen. (Oh, the pity!) But we gain greater assurance that it won't be lost in translation.
Keep in mind Postel's Law of Robustness: "Be conservative in what you send, be liberal in what you accept".
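As a rough illustration of being conservative in what we send, a page title could be reduced to plain ASCII before it is published. This is only a sketch under my own assumptions: the function name `ascii_safe_title` and the replacement table are hypothetical and are not part of the Apache CMS or any existing tool.

```python
import unicodedata

# Hypothetical table of typographic characters and their ASCII stand-ins.
ASCII_FALLBACKS = {
    "\u2014": "-",    # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}


def ascii_safe_title(title: str) -> str:
    """Return a copy of `title` that contains only 7-bit ASCII characters."""
    # Swap known typographic characters for their ASCII stand-ins.
    replaced = "".join(ASCII_FALLBACKS.get(ch, ch) for ch in title)
    # Decompose accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop whatever still falls outside ASCII (combining marks, symbols).
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(ascii_safe_title("Apache OpenOffice \u2014 Home"))
```

The trade-off is the one described above: the output is typographically plainer, but a scraper with naive encoding logic has nothing left to mangle.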
Of course, we'll look into why exactly our toolset got this wrong as well. This isn't an either-or thing. We can fix bugs and be more robust to limitations in 3rd party tools at the same time. In fact I'd argue we should do that.
In any case, thanks again for reporting the error, even if the resolution was not as simple as you thought it would be.