
A report from the documentation maintainer

Posted Nov 7, 2016 10:08 UTC (Mon) by farnz (subscriber, #17727)
In reply to: A report from the documentation maintainer by tdz
Parent article: A report from the documentation maintainer

Aha, someone who can actually answer the underlying question for me!

Would you expect a case-insensitive equality operator to have "groß" == "gross" == "GROSS" == "GROß" == "GROẞ" (which the case-insensitive OS I've played with chooses to do in a German locale)?
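For what it's worth, Python's `str.casefold()` applies Unicode full case folding, which happens to produce exactly this equivalence. A minimal sketch, assuming Python 3 and no locale-specific tailoring; the helper name `ci_equal` is made up for illustration:

```python
def ci_equal(a: str, b: str) -> bool:
    """Caseless comparison using Unicode full case folding."""
    return a.casefold() == b.casefold()


words = ["groß", "gross", "GROSS", "GROß", "GROẞ"]

# Under full case folding, both ß (U+00DF) and ẞ (U+1E9E) fold to "ss",
# so every spelling in the list compares equal to the first one.
all_equal = all(ci_equal(words[0], w) for w in words)

# A comparison built on lower() instead keeps ß distinct from "ss".
lower_equal = "groß".lower() == "GROSS".lower()

print(all_equal, lower_equal)
```

The same folding also equates "Maßen" with "Massen", since it has no notion of which word was intended.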

Put differently, would you expect that if you searched for "groß" in a text document, you would not find matches for "GROSS" but would for "GROß"? Equally, if you searched a text document for "GROSS", would you expect to see matches for "groß", or only for "gross" in a case-insensitive search?
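The two search behaviours can be contrasted in a few lines. This is only a sketch, assuming whole-word matching on whitespace-separated text; `find_simple` and `find_folded` are hypothetical names, not any real tool's API:

```python
def find_simple(needle: str, text: str) -> list:
    """Words matching the needle after plain lowercasing (ß stays ß)."""
    return [w for w in text.split() if w.lower() == needle.lower()]


def find_folded(needle: str, text: str) -> list:
    """Words matching the needle after full case folding (ß becomes ss)."""
    return [w for w in text.split() if w.casefold() == needle.casefold()]


text = "groß gross GROSS GROß"

# Lowercasing: "groß" finds "GROß" but not "GROSS",
# and "GROSS" finds "gross" but not "groß".
simple_sz = find_simple("groß", text)
simple_ss = find_simple("GROSS", text)

# Case folding: either needle finds all four spellings.
folded_sz = find_folded("groß", text)

print(simple_sz, simple_ss, folded_sz)
```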



A report from the documentation maintainer

Posted Nov 7, 2016 10:27 UTC (Mon) by johill (subscriber, #25196) [Link] (4 responses)

I'm in the same situation as tdz, being a native German speaker and never having seen the upper-case ẞ before. I'd actually appreciate it if ss ends up being equivalent to ß in all cases, for multiple reasons:
  1. sometimes I don't have German keyboard settings available immediately, making it awkward to enter ß
  2. a document may use old or new orthography, so words like "Fluss" (river; this is the currently correct spelling) may be spelled as "Fluß" (old spelling)
  3. when spelled in headings/etc., "SS" will frequently be used to replace "ß"
So I'd argue that treating things as in your example ("groß" == "gross" == "GROSS" == "GROß" == "GROẞ") is helpful.

A report from the documentation maintainer

Posted Nov 7, 2016 11:20 UTC (Mon) by idrys (subscriber, #4347) [Link] (3 responses)

(native speaker here as well)

While this matching helps with words that can simply be written in two ways, I'd be rather surprised to get a match for a different word (like "in Maßen" vs. "in Massen"). And I think the new orthography is, for the most part, horrible (it emphasizes writing over reading, neglecting that you know what you're writing but your reader doesn't). But adherents of the old orthography will die out over time anyway :/

A report from the documentation maintainer

Posted Nov 7, 2016 13:45 UTC (Mon) by farnz (subscriber, #17727) [Link] (2 responses)

Hmm. Are you saying that, when doing a case-insensitive match, you'd really want the computer to be aware of the intended dictionary word? So that a search for "groß" matches all of "GROSS", "groß" and "gross" (leaving your human intelligence to determine which ones are "good" matches), while a search for "maßen" should match "Maßen" but not "MASSEN" or "Massen", because "Maßen" and "Massen" are different words? Or is there an underlying rule that I'm not seeing (something like "Maßen" should match "MASSEN" as an all-caps Maßen and "maßen" as missing the initial capital, but not "Massen" or "massen" because the casing rules let you see that ss was deliberate, not the result of round-tripping through upper case back to lower case)?
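To make the second possibility concrete, here is one way such a casing-based rule might look. It is purely a sketch of the hypothesis above, assuming the only accepted variants are the query's lower-case, all-caps and initial-capital forms as produced by Python's default (non-locale) case mappings; both function names are invented:

```python
def casing_variants(query: str) -> set:
    """Spellings reachable from the query by changing case only."""
    return {query.lower(), query.upper(), query.capitalize()}


def matches_by_casing(query: str, candidate: str) -> bool:
    """True if the candidate is a pure re-casing of the query."""
    return candidate in casing_variants(query)


# "Maßen" upper-cases to "MASSEN", so the all-caps form is accepted,
# but the mixed-case "Massen" cannot be produced from "Maßen" by casing.
results = {c: matches_by_casing("Maßen", c)
           for c in ["MASSEN", "maßen", "Massen", "massen"]}

print(results)
```

Note that the rule is asymmetric: "MASSEN" is a casing variant of both "Maßen" and "Massen", so an all-caps hit stays ambiguous either way.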

A report from the documentation maintainer

Posted Nov 7, 2016 14:30 UTC (Mon) by idrys (subscriber, #4347) [Link] (1 responses)

I'd prefer not to match eszett vs. double-s at all, generally. I understand and to a degree follow the reasoning, but I think it would cause more confusion than it avoids. (And your example neatly illustrates this; too much side-knowledge required.)

I _could_ imagine an exception for eszett vs. upper-case double-s, but I'd be surprised if 'grep -i maßen' would find MASSEN as well... (And what about 'SZ' as a capitalization for 'ß'? It is now extremely uncommon, but I've seen this in documents up to the mid-20th century.)
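Just to show how many upper-case spellings a matcher would have to consider, here is a small sketch that enumerates them for one word. The function name is hypothetical, and only the three conventions mentioned in this thread (SS, SZ and the capital ẞ) are assumed:

```python
def uppercase_spellings(word: str) -> set:
    """All-caps spellings of a word under three conventions for ß."""
    return {
        word.upper(),                     # ß -> SS (default Unicode uppercasing)
        word.replace("ß", "sz").upper(),  # ß -> SZ (older convention)
        word.replace("ß", "ẞ").upper(),   # ß -> ẞ (U+1E9E, capital sharp s)
    }


spellings = uppercase_spellings("maßen")
print(sorted(spellings))
```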

[As an aside, old documents are sometimes inconsistent about eszett vs. double-s in people's names as well, as they sometimes capitalized names and sometimes not, so this is not a new issue. We are not 100% sure what the family name on my mother's side is for that reason. Oh well...]

A report from the documentation maintainer

Posted Nov 7, 2016 16:52 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

FYI, there also exist ligature codepoints like `ﬁ` (U+FB01) that would need to be split apart when uppercased.
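A quick sketch of what that looks like in Python 3, using the single-codepoint "fi" ligature U+FB01 as the example (the variable names are arbitrary):

```python
import unicodedata

lig = "\ufb01"  # LATIN SMALL LIGATURE FI, one codepoint

# Uppercasing and case folding both expand the ligature to two letters,
# so the string length changes from 1 to 2.
upper = lig.upper()
folded = lig.casefold()

# Compatibility normalization (NFKC) also splits it apart.
nfkc = unicodedata.normalize("NFKC", lig)

print(len(lig), upper, folded, nfkc)
```

So any comparison that assumes case conversion preserves length, or works one codepoint at a time, will trip over these just as it does over ß.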

A report from the documentation maintainer

Posted Nov 10, 2016 14:39 UTC (Thu) by tdz (subscriber, #58733) [Link]

These are all different words, so they should probably not compare equal by default. Having the option of treating ß and ss the same could be useful, though. OTOH I never had this problem in practice.

In English, people sometimes (frequently?) confuse "its" and "it's". Treating them the same in text searches seems a comparable use case.

I thought about your question about ß in capital-letter advertising messages, but I can't remember having seen that anywhere. I could imagine that advertisers avoid using ß and ss in capital letters, because it doesn't look good either way.

