
Hyphens, minus, and dashes in Debian man pages


Posted Oct 24, 2023 3:31 UTC (Tue) by rra (subscriber, #99804)
In reply to: Hyphens, minus, and dashes in Debian man pages by branden
Parent article: Hyphens, minus, and dashes in Debian man pages

> As you say, the keyboard is not large enough. At some point we run into not a technological problem, but a human one; it's hard to make people care about typographical distinctions that they don't want to care about, especially if their horizons stretch no farther than a terminal window.

This is the point where my own struggles with problems like this over the last ten years have given me a lot of respect for the amount of thought that's gone into Unicode. They made a careful and pragmatic decision to provide code points that represent the ambiguous merged character, and then separate code points that more precisely indicate intent. To some extent, this means that within a Unicode world both options are possible, and the document author gets to choose how much to care.

If you want very nice typesetting, you can use hyphens, minuses, and en-dashes in the ways they were intended to be used. If you want to be lazy and not think about it, you can use a Unicode hyphen-minus and you get a compromise character that looks "okay" and, importantly, is clearly marked as a semantic compromise. Any typesetting system gets the correct information that the user was either talking about code or decided not to care about the distinctions between dashes, and therefore the typesetting system probably shouldn't try to care more than the user did.
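To make the distinction concrete, here is a small Python sketch (not from the thread) that prints the code point and official Unicode name of the merged hyphen-minus and of the more specific characters described above. The role labels in the dictionary are informal glosses added for illustration; only the code points and names come from the Unicode Character Database, via Python's standard `unicodedata` module.

```python
import unicodedata

# The ambiguous "merged" character that every keyboard produces.
MERGED = "\u002d"

# Code points that record the author's intent precisely.
# The role labels are informal glosses, not Unicode terminology.
PRECISE = {
    "hyphen (joining words)": "\u2010",
    "minus (arithmetic)": "\u2212",
    "en dash (ranges)": "\u2013",
    "em dash (parenthetical breaks)": "\u2014",
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print("compromise:", describe(MERGED))
    for role, ch in PRECISE.items():
        print(f"{role}: {describe(ch)}")
```

Running it shows that the keyboard character is literally named HYPHEN-MINUS, while the precise alternatives are HYPHEN, MINUS SIGN, EN DASH, and EM DASH. Software that sees U+002D therefore knows the author did not commit to any one of them.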

The same approach applies to the apostrophe and single quotes: the preferred characters in Unicode are U+2018 and U+2019, and U+0027 is defined as a neutral character that is intentionally left ambiguous, for users who don't care enough to draw the distinction.
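A naive "smart quotes" pass shows why an explicitly neutral U+0027 is useful: any program that converts it has to guess what the author meant. The sketch below uses a hypothetical helper, `curl_single_quotes`, built only on Python's `re` module. It applies one common heuristic (an assumption, not any particular tool's algorithm): a quote that opens a word becomes U+2018, and every other one becomes U+2019, which Unicode also recommends for the apostrophe.

```python
import re

LEFT = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT = "\u2019"  # RIGHT SINGLE QUOTATION MARK (also the preferred apostrophe)


def curl_single_quotes(text: str) -> str:
    """Hypothetical heuristic: guess intent for each neutral U+0027."""
    # A ' at the start of the text, or after whitespace or an opening
    # bracket, is assumed to open a quotation and becomes U+2018.
    text = re.sub(r"(^|[\s(\[{])'", lambda m: m.group(1) + LEFT, text)
    # Every remaining ' is treated as a closing quote or an apostrophe.
    return text.replace("'", RIGHT)


if __name__ == "__main__":
    print(curl_single_quotes("don't say 'hi'"))
    print(curl_single_quotes("'tis"))
```

For `don't say 'hi'` the guess is right (`don’t say ‘hi’`), but `'tis` comes out as `‘tis`, whereas English typography expects an apostrophe (`’tis`). The heuristic cannot recover intent that the author never recorded, which is the situation U+0027 is designed to signal honestly.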

You can't force users to care. The best you can do is provide them with the tools and make it clear whether they chose to use them or not. (And indeed, despite knowing all of this, I always use neutral single and double quotes and a hyphen-minus, because I don't care enough. Although I have started using real em-dashes, and I will occasionally use a real en-dash, so maybe eventually I'll come around.)

I'm simplifying a bit, and the Unicode world is not quite as shiny as all of that. Typesetting and human languages are messy and there are still sharp edges and ambiguities. But it's a system that a whole lot of people put a whole lot of thought into, and the results embed more practical wisdom than I think people realize.



Hyphens, minus, and dashes in Debian man pages

Posted Oct 24, 2023 3:55 UTC (Tue) by branden (guest, #7029)

> This is the point where my own struggles with problems like this over the last ten years have given me a lot of respect for the amount of thought that's gone into Unicode.

I concur with this. I don't think _anyone_ involved with groff development views Unicode as anything less than a tremendous boon to the sanity of glyph and character repertoires. (Oh, how I wish James Clark had decided to store groff characters internally as ints instead of C++ chars. But we'll get that refactored, knock wood.)

I have seen _one_ person grouse that apostrophes (however rendered) and right single quotation marks should be kept logically separate, and I have some sympathy for that view, because they _are_ logically separate--but it seems no English typesetting tradition ever sees fit to distinguish them in print. If I were to regard occasional man page authors as intransigent with respect to correct glyph choices, I dread to measure the inertia of commercial publishers.

Hyphens, minus, and dashes in Debian man pages

Posted Oct 25, 2023 14:02 UTC (Wed) by smoogen (subscriber, #97)

As a side note, I want to say thank you to both rra and branden. A lot of conversations about fonts and layout can lead to ill-chosen words between participants, because even a slightly off font can cause the brain to think 'lion, get ready to fight'. This conversation instead had a lot of 'we agree' and 'we can agree to disagree', and also pointed to a LOT of documents I have not read. [I need to get a copy of the updated Kernighan troff manual to add to my Kernighan collection!]

Again, thank you for teaching and for making this conversation something enjoyable to read.

Hyphens, minus, and dashes in Debian man pages

Posted Oct 24, 2023 7:26 UTC (Tue) by smurf (subscriber, #17840)

> the preferred characters in Unicode are U+2018 and U+2019

Depends on your locale; don't forget about U+201A. And then there are places where they use U+2039/U+203A … and other places where they use U+203A/U+2039. See https://en.wikipedia.org/wiki/Quotation_mark for even more enlightening examples.
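To illustrate the point that "preferred" single quotes depend on the language, here is a minimal Python sketch of a few single-quotation conventions. The table is illustrative and far from exhaustive (the Wikipedia page linked above lists many more), and the style names are informal labels chosen for this example, not locale identifiers from any real locale database. The hypothetical helper `quote` simply wraps text in the chosen pair.

```python
# Opening and closing single quotation marks for a few conventions.
# Style names are informal labels, not real locale identifiers.
SINGLE_QUOTES = {
    "English": ("\u2018", "\u2019"),                   # U+2018 ... U+2019
    "German": ("\u201a", "\u2018"),                    # U+201A ... U+2018
    "German (guillemet style)": ("\u203a", "\u2039"),  # U+203A ... U+2039
    "Swiss": ("\u2039", "\u203a"),                     # U+2039 ... U+203A
}


def quote(text: str, style: str) -> str:
    """Wrap text in the single quotation marks of the given style."""
    opening, closing = SINGLE_QUOTES[style]
    return f"{opening}{text}{closing}"


if __name__ == "__main__":
    for style in SINGLE_QUOTES:
        print(f"{style}: {quote('Text', style)}")
```

Note that U+2018 closes a quotation in the German convention but opens one in the English convention, and the two guillemet styles use the same pair of code points in opposite order. The character names describe shapes, not roles.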

Hyphens, minus, and dashes in Debian man pages

Posted Oct 24, 2023 16:24 UTC (Tue) by gray_-_wolf (subscriber, #131074)

> This is the point where my own struggles with problems like this over the last ten years have given me a lot of respect for the amount of thought that's gone into Unicode.

Maybe in some areas. The whole Han unification thing is in my opinion still a mistake. Having to know what language the text is in in order to be able to render it correctly is... annoying.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds