Debian switching to EGLIBC
Posted May 8, 2009 7:08 UTC (Fri) by anselm (subscriber, #2796)
In reply to: Debian switching to EGLIBC by spitzak
Parent article: Debian switching to EGLIBC
With respect, I think that you're oversimplifying things here.
It turns out that »the obvious way« to sort strings doesn't work for many languages other than English, which is precisely one of the reasons the locale concept was invented in the first place. Look up the collating rules for German and Swedish, for example, to see three different ways of collating the »ä« character, none of which corresponds to »the obvious way«. IMHO it does make some sense to put this sort of arcane knowledge into the standard library so that programmers (who are usually not also linguists) do not have to wonder where »ø« goes in the Danish alphabet, and so that a program has half a chance of doing string collation correctly in languages that the original programmers didn't even know existed, let alone catered for in their code. (I'm saying »half a chance« because of the next paragraph.)
Also it turns out that strcmp(3) doesn't, in fact, care about locales at all, so if you use strcmp(3) only in your programs you will not be surprised if the user changes their locale — it's the strcoll(3) function that is supposed to be used for locale-dependent string comparisons. (I do agree with you about the decimal separator issue in printf(3), though.)
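To make the strcmp(3)/strcoll(3) split concrete, here is a minimal sketch of two qsort(3) comparators. The helper names (cmp_bytes, cmp_collate, sort_for_machines, sort_for_humans) are made up for illustration. Note that strcoll(3) only differs from strcmp(3) once the program has called setlocale(3) with something other than the "C" locale.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte order, never affected by the locale. */
static int cmp_bytes(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* qsort comparator: order given by LC_COLLATE of the current locale.
 * In the default "C" locale this behaves like strcmp(). */
static int cmp_collate(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: stable, locale-independent order (file formats, keys). */
static void sort_for_machines(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_bytes);
}

/* Hypothetical helper: order meant to be shown to a person. */
static void sort_for_humans(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_collate);
}
```

A program would call setlocale(LC_COLLATE, "") once at startup if it wants sort_for_humans() to follow the user's environment; sort_for_machines() gives the same result either way.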
I18N is a difficult issue at best, and it isn't helped by people who try cutting corners. Unicode/ISO-10646 and UTF-8 play an important role in making the problem easier to handle, but they're a fairly low-level part of the grand scheme of things. They're like the wheels on a car — indispensable for a smooth ride, but one would generally still like seats and a steering wheel, too.
Posted May 8, 2009 16:45 UTC (Fri) by spitzak (guest, #4593)
You are right that strcmp() does what is wanted. I believe I was remembering some scripting languages where the string comparison changed depending on the locale, which was a nightmare because people rarely test in other locales.
The printf problem is really a pain and means I always have to force the locale to C at startup. I need to use printf, sometimes hidden inside scripting languages where I can't change it, to write data files that are expected to be readable by the same program even if the locale is different.
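As a rough sketch of a narrower variant of that workaround: instead of forcing everything to C, pin only LC_NUMERIC while a number is being formatted for a data file, then restore whatever was set before. The function name format_double_c is hypothetical, and the approach assumes a single-threaded program, since setlocale(3) changes process-wide state.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format `value` with a '.' decimal separator no
 * matter what LC_NUMERIC the rest of the program uses.
 * Returns what snprintf() returns, or -1 if memory runs out.
 * Not thread-safe: setlocale() affects the whole process. */
static int format_double_c(char *buf, size_t size, double value)
{
    char *saved = NULL;

    /* The string returned by setlocale() may be overwritten by the
     * next call, so keep a private copy of the current setting. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, current);
    }

    setlocale(LC_NUMERIC, "C");
    int n = snprintf(buf, size, "%.17g", value);

    /* Put the previous numeric locale back. */
    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);
        free(saved);
    }
    return n;
}
```

The same care is needed on the reading side, because strtod(3) and the scanf(3) family also honour the locale's decimal separator. None of this helps when the printf call is buried inside a scripting language that the program cannot wrap.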
strcoll() is approximately the right idea: make it perfectly clear that this is some human-oriented sorting function. I think the real solution is to make all such functions take the locale as an argument, rather than using a static variable.
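For what it's worth, POSIX.1-2008 specifies functions of roughly this shape: newlocale(3) builds a locale object and strcoll_l(3) takes it as an explicit argument. Below is a minimal sketch, assuming a system that implements those interfaces (glibc does). The wrapper name compare_in_locale and its fallback to strcmp() are my own choices for illustration, not part of any standard.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for newlocale(), strcoll_l() */
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare a and b using the collation rules of the
 * named locale, without touching the process-wide locale.
 * Falls back to byte order if that locale is not available. */
static int compare_in_locale(const char *locale_name,
                             const char *a, const char *b)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return strcmp(a, b);   /* locale not installed or unknown */

    int result = strcoll_l(a, b, loc);
    freelocale(loc);
    return result;
}
```

A real program would create the locale object once and reuse it rather than calling newlocale(3) for every comparison; the point here is only that the locale becomes a visible parameter instead of hidden global state.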