Hyphens, minus, and dashes in Debian man pages
Posted Oct 24, 2023 3:31 UTC (Tue) by rra (subscriber, #99804)
In reply to: Hyphens, minus, and dashes in Debian man pages by branden
Parent article: Hyphens, minus, and dashes in Debian man pages
This is the point where my own struggles with problems like this over the last ten years have given me a lot of respect for the amount of thought that's gone into Unicode. They took a careful and pragmatic decision to provide code points that represent the ambiguous merged character, and then separate code points that more precisely indicate intent. This to some extent means that within a Unicode world, both options are possible and the document author gets to choose how much to care.
If you want very nice typesetting, you can use hyphens, minuses, and en-dashes in the ways they were intended to be used. If you want to be lazy and not think about it, you can use a Unicode hyphen-minus and you get a compromise character that looks "okay" and, importantly, is clearly marked as a semantic compromise. Any typesetting system gets the correct information that the user was either talking about code or decided not to care about the distinctions between dashes, and therefore the typesetting system probably shouldn't try to care more than the user did.
This is similar to what they did with apostrophe and single quotes: the preferred characters in Unicode are U+2018 and U+2019, and U+0027 is defined as a neutral character that is intentionally left ambiguous, for users who don't care enough to draw the distinction.
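For anyone who wants to see the distinction spelled out, here is a minimal sketch that prints the official Unicode names of the characters under discussion. It relies only on Python's standard `unicodedata` module; the `describe` helper and the `CODE_POINTS` list are made up for this illustration, and the list adds U+2010, U+2212, U+2013, and U+2014 as the more precise hyphen, minus, and dash characters alongside the code points mentioned above.

```python
import unicodedata

# Code points discussed in the thread, plus the more specific
# hyphen, minus, and dash characters (list chosen for illustration).
CODE_POINTS = [0x002D, 0x2010, 0x2212, 0x2013, 0x2014, 0x0027, 0x2018, 0x2019]


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' using the character name from the Unicode database."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```

The name of U+002D, HYPHEN-MINUS, records the compromise directly, while U+2010 and U+2212 carry the unambiguous names HYPHEN and MINUS SIGN.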
You can't force users to care. The best you can do is provide them with the tools and make it clear whether they chose to use them or not. (And indeed, despite knowing all of this, I always use neutral single and double quotes and a hyphen-minus, because I don't care enough. Although I have started using real em-dashes, and I will occasionally use a real en-dash, so maybe eventually I'll come around.)
I'm simplifying a bit, and the Unicode world is not quite as shiny as all of that. Typesetting and human languages are messy and there are still sharp edges and ambiguities. But it's a system that a whole lot of people put a whole lot of thought into, and the results embed more practical wisdom than I think people realize.
Posted Oct 24, 2023 3:55 UTC (Tue)
by branden (guest, #7029)
I concur with this. I don't think _anyone_ involved with groff development views Unicode as anything less than a tremendous boon to the sanity of glyph and character repertoires. (Oh, how I wish James Clark had decided to store groff characters internally as ints instead of C++ chars. But we'll get that refactored, knock wood.)
I have seen _one_ person grouse that apostrophes (however rendered) and right single quotation marks should be kept logically separate, and I have some sympathy for that view, because they _are_ logically separate--but it seems no English typesetting tradition ever sees fit to distinguish them in print. If I were to regard occasional man page authors as intransigent with respect to correct glyph choices, I dread to measure the inertia of commercial publishers.
Posted Oct 25, 2023 14:02 UTC (Wed)
by smoogen (subscriber, #97)
Again, thank you for teaching and for making this conversation enjoyable to read.
Posted Oct 24, 2023 7:26 UTC (Tue)
by smurf (subscriber, #17840)
Depends on your locale; don't forget about U+201A. And then there's places where they use U+2039/U+203A … and other places where they use U+203A/U+2039. See https://en.wikipedia.org/wiki/Quotation_mark for even more enlightening examples.
Posted Oct 24, 2023 16:24 UTC (Tue)
by gray_-_wolf (subscriber, #131074)
Maybe in some areas. The whole Han unification thing is in my opinion still a mistake. Having to know what language the text is in in order to be able to render it correctly is... annoying.