Debian switching to EGLIBC
"I have just uploaded Embedded GLIBC (EGLIBC) into the archive (it is currently waiting in the NEW queue), which will soon replace the GNU C Library (GLIBC)." The EGLIBC project has produced a version of the C library aimed primarily at embedded situations. Evidently the Debian developers feel that it is good enough for wider use, though, and they seem to strongly prefer the way that project is run upstream. (Thanks to Paul Wise).
Posted May 6, 2009 14:21 UTC (Wed)
by kragil (guest, #34373)
Posted May 6, 2009 14:34 UTC (Wed)
by kragil (guest, #34373)
Posted May 6, 2009 18:06 UTC (Wed)
by ejr (subscriber, #51652)
I'm not saying he couldn't communicate his lack of time a tad more tactfully, but he still deserves respect. Also, I suspect the move to git is meant to make project-local forks easier.
Posted May 6, 2009 21:14 UTC (Wed)
by stevenb (guest, #11536)
Posted May 6, 2009 22:30 UTC (Wed)
by alankila (guest, #47141)
Not that I know anything of Mr. Drepper. This has nothing to do with him.
Posted May 7, 2009 0:56 UTC (Thu)
by ofeeley (guest, #36105)
The second instance[2] seems to show people misusing the comment facilities on bugzilla in order to harass the developer.
1. http://sourceware.org/bugzilla/show_bug.cgi?id=4403
2. http://sourceware.org/bugzilla/show_bug.cgi?id=4980
Anyway, that's just an outsider's perspective, taking a random, lazy look at some of the evidence presented by the prosecution.
Posted May 7, 2009 1:07 UTC (Thu)
by jordanb (guest, #45668)
A friend of mine sent in a patch to Emacs Tetris, to fix a bug that allowed you to cheat. RMS replied with "when you cheat at solitaire, who are you cheating?" but applied the patch anyway. Something like that would have been the proper response.
Anyway, it wasn't the strfry incident that led Debian to this point, clearly, but more his refusal to accept patches that fixed bugs on ARM on account of it being "crap."
Posted May 7, 2009 1:23 UTC (Thu)
by ofeeley (guest, #36105)
Posted May 7, 2009 14:25 UTC (Thu)
by Felix.Braun (guest, #3032)
To be fair, if you read the relevant bug report, you'll see that Mr. Drepper fixed the bug in a different way. He even re-fixed his first implementation after that was discovered to be sub-optimal. So, there should be no complaints here.
Posted May 11, 2009 4:31 UTC (Mon)
by dirtyepic (guest, #30178)
Posted May 11, 2009 6:04 UTC (Mon)
by nix (subscriber, #2304)
Posted May 7, 2009 6:52 UTC (Thu)
by nix (subscriber, #2304)
Proof that it's a security improvement:
Posted May 7, 2009 10:02 UTC (Thu)
by viro (subscriber, #7872)
Oddly enough, it *does* affect something. Such as the output of the compiled binary (with stock glibc). Without -std=c99: out: 0x0p-15234. With it:
And a look at strtold(3) shows where the hell that suggestion came from:
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
strtof(), strtold(): _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE; or cc -std=c99
Experiment with -D_ISOC99_SOURCE or -D_XOPEN_SOURCE=600 shows the same output as -std=c99. And that output looks a lot saner (10.5 * 2 * 2) than the crap produced without any of those.
I can't be arsed to dig through the macro hell in glibc headers, but that looks suspiciously like compat with some pre-c99 GNUism. Don't know, don't care and _really_ don't want to debug that shite. However, I'm quite certain that by this point you see the reasons for LART as well as I do. For the benefit of the other folks:
1) failure to RTFM
Nix, you *do* know better. The above is more than enough for you to construct quite a lovely verbal clue-by-four of your own, so I'll happily leave that as an exercise for the reader.
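The man page excerpt quoted above is the key point: glibc only declares strtold() when C99 features are requested. A minimal sketch, assuming the goal is simply to parse a long double safely, is to request those features explicitly before any header is included. One plausible explanation for the garbage value (an assumption, not confirmed in the thread) is that without a prototype a pre-C99 compiler implicitly assumes strtold() returns int. The helper name parse_long_double is hypothetical.

```c
#ifndef _ISOC99_SOURCE
#define _ISOC99_SOURCE 1 /* must come before any #include to take effect */
#endif
#include <stdlib.h>

/* Hypothetical helper: parse `text` as a long double.
   Returns 1 if the whole string was a number, 0 otherwise. */
int parse_long_double(const char *text, long double *out)
{
    char *end;
    *out = strtold(text, &end); /* prototype is visible thanks to _ISOC99_SOURCE */
    return end != text && *end == '\0';
}
```

Building with -Wall (or -Wimplicit-function-declaration) makes gcc warn when a call like this has no prototype in scope, which catches the problem at compile time instead of at run time.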
Posted May 7, 2009 13:50 UTC (Thu)
by viro (subscriber, #7872)
Posted May 7, 2009 14:38 UTC (Thu)
by nix (subscriber, #2304)
(And absence of caffeine excuses all!)
Posted May 7, 2009 13:58 UTC (Thu)
by k8to (guest, #15413)
Posted May 7, 2009 5:16 UTC (Thu)
by PaulWay (subscriber, #45600)
There are three fallacies with this approach that make it wrong. The first one is the idea that there's some magic number that you can derive that gives you the worth of a person and eir contribution to a specific project or goal. There's a whole sociology course of reasons there, but a simpler one might be that it gets down to the 'mythical man-month' problem - you can't just hire another 100 programmers and make the project go 100 times faster, or even as fast.
Secondly, you're biasing our reactions by picking a large number. If Ulrich is worth 1.1 other programmers, what do you do?
However, what it really comes down to for me is that there are a bunch of groups - programmers, Red Hat, open source enthusiasts - who should just say "no" to antisocial behaviour. Calling people idiots or defining their work as a waste of time should never be excusable. If Ulrich was being sexist, defamatory or racist, would that still be OK (even if he was worth 100 other programmers)? I personally say no.
When we implicitly or explicitly excuse this kind of obnoxious behaviour, we make it acceptable. That to me is just not on. We can hardly say "show me the code" if we then scare off everyone who tries to contribute code. There are plenty of ways to deal with the bug reports that the EGLIBC people link to that would have been polite and helpful, yet still firm in his opinion, and would have taken even less time than Ulrich expended on his inflammatory outbursts. We should learn better ways to interact, rather than excuse bad behaviour.
Have fun,
Paul
Posted May 7, 2009 10:06 UTC (Thu)
by NAR (subscriber, #1313)
Posted May 7, 2009 21:55 UTC (Thu)
by man_ls (guest, #15091)
About character flaws I think you are quite right. In some contexts, buffering the guru is an effective way of dealing with an ill-mannered engineer.
Posted May 8, 2009 1:55 UTC (Fri)
by PaulWay (subscriber, #45600)
Have fun,
Paul
Posted May 8, 2009 1:54 UTC (Fri)
by PaulWay (subscriber, #45600)
Yeah, right. Try being sexist, defamatory or racist in the workplace and see how your contract covers you. Try being sexist, defamatory or racist on a public internet forum and see how the forum's or your ISP's conditions of use work. Try getting up in a public place and being sexist, defamatory or racist and see how you go with the police.
People have the right to their own thoughts, which I think is where you're coming from. But public behaviour is what's in question here. And I'll bet you a beer that Ulrich's contract with Red Hat allows them to fire him for being sexist, defamatory or racist in public.
And even if there is some specific condition where being sexist, defamatory or racist is allowed, I still reckon it's bad behaviour. I don't care if someone is a thousand times better at coding than anyone else, being rude and obnoxious to other people is bad for that person, for the projects they work for and for the community they're involved in.
So to me it seems pretty counterproductive to hide behind some technical legal argument to excuse behaviour which, as I said originally, is completely unnecessary and caused Ulrich more hassle rather than less.
Have fun,
Paul
Posted May 8, 2009 18:08 UTC (Fri)
by khim (subscriber, #9252)
If you are working for a US company, it'll be written in the contract. In other countries it's a free right (some European countries are emulating the US, but to a smaller degree). You can be fired if you disrupt the team, but just for being sexist or racist? Puhlease. When our company was bought by a US one we had special sessions to explain exactly which natural things (like sexist jokes or jokes about other minorities) we shouldn't do from now on. Everyone agreed that it's idiotic, but hey, these guys are paying me, so they establish the rules.
Probably you are visiting totally different forums from me, because I'm seeing sexist, defamatory and racist remarks quite often. And not just from trolls.
Again: if we are not talking about the US, where (as Heinlein noted half a century ago) everyone is so proud to point out that they have absolutely nothing against your skin color, face or sexual orientation... For being sexist? This is a joke. For being defamatory or racist... the possibility is there, but you need to spend A LOT OF effort to reach this point.
Posted May 11, 2009 11:01 UTC (Mon)
by incase (guest, #37115)
In Germany, the topmost right our constitution grants is dignity. Freedom of thought, religion and speech is also in there, of course, but it ranks below the guarantee of dignity and some other things, like the prohibition of discrimination because of race, sex/gender, religion and other things, your right not to be injured in physical or mental well-being, etc.
regards,
Posted May 8, 2009 10:46 UTC (Fri)
by alankila (guest, #47141)
Fair enough. It was a sketch of an argument. I picked a large, round number to make it obvious that I was not serious. What do you think 100 times more rude even means? It doesn't really mean anything.
"Secondly, you're biasing our reactions by picking a large number. If Ulrich is worth 1.1 other programmers, what do you do?"
In the argument, he is allowed to be up to 1.1 times more rude to compensate for technical prowess.
"Calling people idiots or defining their work as a waste of time should never be excusable."
I'll pick on the word "never". Let's take it literally. If a person is an idiot, and the work is a waste of time, perhaps it would be a good idea to call it such? I don't see what denying reality gets you, other than blinders that cause you to make mistakes.
Posted May 8, 2009 10:53 UTC (Fri)
by alankila (guest, #47141)
To use this palette skillfully is another matter entirely: it's horribly crude to use the wrong kind of expression, and it doesn't reflect well on the speaker. Communication is, after all, a process where you need to make yourself understood to another. Whether I just transgressed remains to be seen...
Posted May 6, 2009 14:28 UTC (Wed)
by BrucePerens (guest, #2510)
Posted May 6, 2009 16:19 UTC (Wed)
by nix (subscriber, #2304)
Posted May 6, 2009 17:06 UTC (Wed)
by ajross (guest, #4563)
Posted May 6, 2009 19:57 UTC (Wed)
by nix (subscriber, #2304)
It is fated.
:)
Posted May 6, 2009 20:57 UTC (Wed)
by njs (subscriber, #40338)
Posted May 6, 2009 17:07 UTC (Wed)
by herodiade (guest, #52755)
For other users, maybe not (there may be some attraction in avoiding dealing with such an upstream, or just in shaking things up). But for the Debian project, Ulrich Drepper's unwillingness to cooperate on basic embedded needs is a real problem.
So much that the DPL (Debian Project Leader) had to (unsuccessfully) ask the "GNU C Library Steering Committee" for help/mediation on this topic.
http://lists.debian.org/debian-glibc/2007/10/msg00038.html
Posted May 6, 2009 14:34 UTC (Wed)
by dcg (subscriber, #9198)
Posted May 6, 2009 14:35 UTC (Wed)
by alex (subscriber, #1355)
Still, forks live or die on their user base and the people using them. I do have to wonder why, for the love of ${DEITY}, the new fork is hosted in CVS. It is the 21st century, you know!
Posted May 6, 2009 14:38 UTC (Wed)
by mbanck (subscriber, #9035)
No idea where you got that idea from but this page says it is a subversion repository. Likely they will follow suit and switch to git as well after the glibc git switch has been finished. Michael
Posted May 6, 2009 15:31 UTC (Wed)
by alex (subscriber, #1355)
Hopefully when glibc goes git it will make maintenance a lot easier. Decent DVCS semantics make maintaining patch sets against a baseline so much easier.
Posted May 6, 2009 16:02 UTC (Wed)
by dwmw2 (subscriber, #2063)
Posted May 6, 2009 16:31 UTC (Wed)
by mbanck (subscriber, #9035)
Michael
Posted May 6, 2009 16:45 UTC (Wed)
by epa (subscriber, #39769)
Posted May 6, 2009 19:50 UTC (Wed)
by brouhaha (subscriber, #1698)
Having used both extensively, I must strongly disagree. SVN is a huge improvement over CVS. The main potential issue with SVN is that it uses the central repository model. For many projects there's no problem with that; it really depends on how the project is organized. I have no idea whether it makes sense for eglibc.
Posted May 6, 2009 14:41 UTC (Wed)
by mbanck (subscriber, #9035)
Eglibc is not a different implementation: it is basically a branch of glibc which is regularly synced and which you can configure in fine detail. Plus its maintainers say they will be more friendly towards users etc. Michael
Posted May 6, 2009 14:47 UTC (Wed)
by maks (guest, #32426)
guess eglibc will follow asap.
Posted May 6, 2009 16:06 UTC (Wed)
by kjp (guest, #39639)
*glares at selinux infecting everything in fedora*
Posted May 6, 2009 16:23 UTC (Wed)
by nix (subscriber, #2304)
(I resisted it for years, but it makes *so* many things so much easier.)
Posted May 6, 2009 16:46 UTC (Wed)
by ajross (guest, #4563)
Posted May 6, 2009 19:50 UTC (Wed)
by nix (subscriber, #2304)
Most of the space consumption is charmaps (which we can't do without, and
Posted May 8, 2009 1:56 UTC (Fri)
by spitzak (guest, #4593)
Any intelligent person would have just made a different %-command. It really is not very hard for a programmer to choose what to do based on the locale. The C library, if it provides anything at all, should only provide a "this is the locale" call. It should have ZERO effect on the behavior of any functions that do not actually take a locale as an argument.
Posted May 6, 2009 20:47 UTC (Wed)
by rfunk (subscriber, #4054)
Posted May 6, 2009 21:35 UTC (Wed)
by nix (subscriber, #2304)
Posted May 8, 2009 1:54 UTC (Fri)
by spitzak (guest, #4593)
UTF-8 is TRIVIAL if people would just WAKE UP and realize that it *is* trivial. The ONLY people who care about where character boundaries fall are people writing low-level rendering routines that have to look up font glyphs.
But for some reason the fact that a byte array represents a series of characters causes otherwise intelligent programmers to turn into complete morons. It suddenly becomes IMPOSSIBLE to work with the bytes, just because of the type of data in the string!
Here is a thought experiment: why in the world are we capable of making files containing English text when all the *words* are different sizes! Why it must be impossible! Counting words will be so slow and inefficient! How could the programs ever work?
Posted May 8, 2009 13:55 UTC (Fri)
by nix (subscriber, #2304)
Touching individual bytes in a unicode string outside of something like
This is all library stuff, yes, sure... except when it isn't.
Posted May 8, 2009 15:18 UTC (Fri)
by endecotp (guest, #36428)
For example, if I'm parsing a UTF-8 CSV file into rows and columns then I can treat it as a byte stream, since the punctuation characters (e.g. ,"\NL) are all single bytes and those bytes are guaranteed not to occur in multi-byte characters.
As another example, I can search-and-replace one character sequence with another character sequence by treating the text, pattern and replacement as byte sequences - even if there are multibyte characters in the text, pattern, or replacement.
My experience is that the only places where UTF-8 cannot be treated as byte streams are: GUI and similar I/O, sorting and case conversion when the result needs to look right for a human, and interfaces that specify an encoding other than UTF-8.
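To make the byte-stream argument concrete, here is a sketch of both examples, assuming plain NUL-terminated UTF-8 input and ignoring CSV quoting rules. The function names count_fields and count_matches are hypothetical. Neither needs to know where character boundaries are: ASCII bytes never occur inside a multibyte sequence, and a valid UTF-8 pattern can only match a valid UTF-8 text at a character boundary.

```c
#include <stddef.h>
#include <string.h>

/* Count comma-separated fields in one line of UTF-8 text by scanning bytes.
   Safe because ',' (0x2C) never appears inside a multibyte sequence.
   Quoted fields are deliberately not handled in this sketch. */
size_t count_fields(const char *line)
{
    size_t fields = 1;
    for (const char *p = line; *p != '\0'; p++)
        if (*p == ',')
            fields++;
    return fields;
}

/* Count non-overlapping occurrences of a UTF-8 pattern using plain strstr(). */
size_t count_matches(const char *text, const char *pattern)
{
    size_t n = 0, plen = strlen(pattern);
    if (plen == 0)
        return 0;
    for (const char *p = strstr(text, pattern); p != NULL; p = strstr(p + plen, pattern))
        n++;
    return n;
}
```

A search-and-replace works the same way: find the byte offset with strstr() and splice bytes, with no decoding step.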
Posted May 8, 2009 16:56 UTC (Fri)
by spitzak (guest, #4593)
I do not understand why so many otherwise intelligent and experienced software engineers turn into such complete morons when they think about UTF-8.
Even more annoying: programmers do not seem to have this mental block when presented with the older multibyte Asian encodings, or with UTF-16 which is variable length as well. For some reason people only assign these made-up problems to UTF-8.
Posted May 8, 2009 17:28 UTC (Fri)
by nix (subscriber, #2304)
The rest of your points stand: if all you want to do is manipulate ASCII
I suppose misbehaviour from this change is unlikely *if* you're in the US.
Posted May 8, 2009 17:35 UTC (Fri)
by ajross (guest, #4563)
The product I work on for my day job does natural language processing of internet content in arbitrary languages and encodings. I did the encoding transformation and "word breaker" lexical analyzer for it. The whole system works by transforming the data into UTF-8 and operating on it at the byte level. So sorry to pull the "domain expert" card here, but you're basically just wrong. This stuff has its subtleties, but it's absolutely not something that requires special API support. And if we *had* to pick an API, I can guarantee you it wouldn't be ANSI C's locale stuff, which is a complete non-starter for many of the reasons already detailed.
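As a small illustration of "transform into UTF-8, then work on bytes", here is a sketch of the simplest such transformation, ISO-8859-1 to UTF-8, done by hand. The function name latin1_to_utf8 is hypothetical; for other source encodings a real program would more likely use iconv(3).

```c
#include <stddef.h>

/* Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
   `out` must have room for at least 2*len+1 bytes.
   Returns the number of bytes written, excluding the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = (char)b;                   /* ASCII passes through unchanged */
        } else {
            out[o++] = (char)(0xC0 | (b >> 6));   /* lead byte 110000xx */
            out[o++] = (char)(0x80 | (b & 0x3F)); /* continuation byte 10xxxxxx */
        }
    }
    out[o] = '\0';
    return o;
}
```

Each Latin-1 byte is the code point of the same value, which is why bytes 0x80-0xFF always become exactly two UTF-8 bytes.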
Posted May 8, 2009 18:47 UTC (Fri)
by nix (subscriber, #2304)
-- N., just wasted three months auditing and fixing countless places in a
Posted May 8, 2009 21:55 UTC (Fri)
by spitzak (guest, #4593)
1. bytes that are not allowed in UTF-8.
I think first & second bytes should pass the isalpha() test. This will allow UTF-8 letters to be put into identifiers and keywords (of course it also allows UTF-8 punctuation and lots of other stuff but that is about the best that can be done). I also think ctype should not vary depending on locale, this is another thing that causes me nothing but trouble, most programmers revert to doing ">='a' && <='z'" and thus make their software even less portable.
Probably the ctype tables should add some bits to identify these byte types.
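The byte classes proposed here can be sketched as follows, assuming the usual split of non-ASCII bytes into lead bytes, continuation bytes, and bytes that never appear in valid UTF-8 (0xC0 and 0xC1 could only start overlong forms; 0xF5-0xFF would encode beyond U+10FFFF). The names utf8_classify and ident_byte are hypothetical, and the identifier rule follows the commenter's suggestion of accepting lead and continuation bytes, not any existing ctype behaviour. It also avoids locale-dependent ctype calls entirely.

```c
/* Hypothetical byte classes for UTF-8 text. */
enum utf8_byte_kind { UTF8_ASCII, UTF8_CONT, UTF8_LEAD, UTF8_INVALID };

enum utf8_byte_kind utf8_classify(unsigned char b)
{
    if (b < 0x80)
        return UTF8_ASCII;                     /* 0xxxxxxx */
    if (b < 0xC0)
        return UTF8_CONT;                      /* 10xxxxxx */
    if (b == 0xC0 || b == 0xC1 || b >= 0xF5)
        return UTF8_INVALID;                   /* never valid in UTF-8 */
    return UTF8_LEAD;                          /* 0xC2-0xF4 */
}

/* Locale-independent "may appear in an identifier" test: ASCII letters,
   digits and '_', plus any lead or continuation byte. */
int ident_byte(unsigned char b)
{
    switch (utf8_classify(b)) {
    case UTF8_ASCII:
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
               (b >= '0' && b <= '9') || b == '_';
    case UTF8_CONT:
    case UTF8_LEAD:
        return 1;
    default:
        return 0;
    }
}
```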
Posted May 8, 2009 21:48 UTC (Fri)
by spitzak (guest, #4593)
I do hope a program trying to parse for a period only looks for the ASCII period. As soon as you start saying other Unicode characters are "equivalent" then you get a huge mess because different programs may disagree on what is in the equivalent set, and Unicode could add a new character at any time. We already have quite a mess with newlines, let's not make it worse! The only software that should be looking for Unicode punctuation is actual glyph layout and rendering.
Posted May 11, 2009 16:01 UTC (Mon)
by endecotp (guest, #36428)
I was referring to the punctuation characters used to delimit CSV, which are all ASCII characters (as are those used in XML).
> The rest of your points stand: if all you want to do is manipulate
My points were that you can do all of those things (e.g. search and replace) EVEN IF the input is non-ASCII.
Your example of delete key behaviour is an interesting one that comes under my category of "GUI and similar I/O". It is clearly necessary to delete back as far as the last character-starting byte. Doing so is not very hard.
> I suppose misbehaviour from this change is unlikely *if* you're in
I am not in the U.S., and my code works with UTF-8 without the sort of major headaches that you allude to.
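A minimal sketch of that delete-key step, assuming a NUL-terminated UTF-8 buffer: step back over continuation bytes (those of the form 10xxxxxx) until reaching a byte that can start a character. The helper names utf8_prev and utf8_delete_last are hypothetical; capping the walk at three continuation bytes keeps malformed input from deleting more than four bytes at once.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset where the character ending just before `pos` starts. */
size_t utf8_prev(const char *s, size_t pos)
{
    size_t i, steps = 0;
    if (pos == 0)
        return 0;
    i = pos - 1;
    /* Skip continuation bytes; they never begin a character. */
    while (i > 0 && steps < 3 && ((unsigned char)s[i] & 0xC0) == 0x80) {
        i--;
        steps++;
    }
    return i;
}

/* Delete the last character of a NUL-terminated UTF-8 string in place. */
void utf8_delete_last(char *s)
{
    s[utf8_prev(s, strlen(s))] = '\0';
}
```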
Posted May 10, 2009 14:46 UTC (Sun)
by epa (subscriber, #39769)
Posted May 11, 2009 16:06 UTC (Mon)
by ajross (guest, #4563)
Avoiding UTF-8 in the blind expectation that it somehow makes your code more "secure" is just plain wrong. This kind of mistake is exactly what I'm talking about. People attribute to encoding transformation and I18N all sorts of complexities that aren't actually there in practice.
Posted May 19, 2009 9:18 UTC (Tue)
by epa (subscriber, #39769)
Posted May 11, 2009 17:27 UTC (Mon)
by spitzak (guest, #4593)
Errors in UTF-8 should be treated as single byte entities. Four four-byte prefixes in a row are 4 errors, not a single 4-byte error. You can't split an error if it is only one byte long.
This also means that ASCII characters cannot be "inside an error" so that errors have zero effect on programs that are looking for ASCII only.
It also means it is impossible to make a pointer "inside" an error or to split one. It is also vital to treat errors this way (even if converting to other encodings) so that concatenation to a string ending in an error cannot convert a good character at the start of the next string into an error.
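Here is a sketch of a decoder that follows this rule, assuming RFC 3629 validity (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). Whenever a sequence is malformed it reports an error covering exactly one byte and resumes at the next byte, so an ASCII byte is never swallowed by an error. The names utf8_next and utf8_count_errors are hypothetical.

```c
#include <stddef.h>

/* Decode one code point from s[0..len), len >= 1. On success store it in *cp
   and return the bytes consumed (1-4). On malformed input store -1 and
   return 1, so every error covers exactly one byte. */
size_t utf8_next(const unsigned char *s, size_t len, long *cp)
{
    unsigned char b = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the second byte */
    size_t n = 0;
    long v = 0;

    if (b < 0x80) {
        *cp = b;
        return 1;
    } else if (b >= 0xC2 && b <= 0xDF) {
        n = 2; v = b & 0x1F;
    } else if (b >= 0xE0 && b <= 0xEF) {
        n = 3; v = b & 0x0F;
        if (b == 0xE0) lo = 0xA0;       /* reject overlong forms */
        if (b == 0xED) hi = 0x9F;       /* reject surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        n = 4; v = b & 0x07;
        if (b == 0xF0) lo = 0x90;       /* reject overlong forms */
        if (b == 0xF4) hi = 0x8F;       /* reject values above U+10FFFF */
    } else {
        *cp = -1;                       /* 0x80-0xC1 or 0xF5-0xFF */
        return 1;
    }
    if (len < n) {
        *cp = -1;                       /* truncated sequence */
        return 1;
    }
    for (size_t i = 1; i < n; i++) {
        unsigned char c = s[i];
        unsigned char min = (i == 1) ? lo : 0x80;
        unsigned char max = (i == 1) ? hi : 0xBF;
        if (c < min || c > max) {
            *cp = -1;                   /* bad continuation: error is one byte */
            return 1;
        }
        v = (v << 6) | (c & 0x3F);
    }
    *cp = v;
    return n;
}

/* Count decoding errors in a buffer. */
size_t utf8_count_errors(const unsigned char *s, size_t len)
{
    size_t errors = 0, i = 0;
    long cp;
    while (i < len) {
        i += utf8_next(s + i, len - i, &cp);
        if (cp < 0)
            errors++;
    }
    return errors;
}
```

With this scheme four 0xF0 prefixes in a row count as four errors, and concatenating a string that ends in an error with one that starts with a valid character leaves that character intact.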
Posted May 8, 2009 16:49 UTC (Fri)
by spitzak (guest, #4593)
UTF-8 is in fact trivial. You are basically doing exactly what I am complaining about: panicking that there is some magical problem with not looking for the character boundaries. Try comparing it to words: how much of a word processor is able to ignore word boundaries? Almost all of it. But that does not somehow make it impossible for word wrap and word deletion to work.
It's not rocket science. The problem is people who are so convinced it is that they complicate things to no end and are hurting I18N and everybody.
Posted May 8, 2009 16:57 UTC (Fri)
by ajross (guest, #4563)
Really, this stuff is easy once you get used to it.
Posted May 8, 2009 17:33 UTC (Fri)
by nix (subscriber, #2304)
Posted May 6, 2009 23:37 UTC (Wed)
by rleigh (guest, #14622)
http://lists.debian.org/debian-policy/2009/04/msg00018.html
For the various reasons outlined in the text, we are considering
This will give us native UTF-8 end-to-end from source code to
Regards,
Posted May 7, 2009 6:47 UTC (Thu)
by nix (subscriber, #2304)
Posted May 8, 2009 2:02 UTC (Fri)
by spitzak (guest, #4593)
Don't panic about UTF-8. The biggest problem with it is people who do not understand it, some of them are good enough programmers that they might write some code that is very damaging, where they actually try to interpret the UTF-8 encoding.
The only real bug in Unix with UTF-8 is a whole lot of documentation that says "character" where it should say "byte". There is nothing wrong with the current implementations.
Posted May 8, 2009 13:57 UTC (Fri)
by nix (subscriber, #2304)
Posted May 6, 2009 18:43 UTC (Wed)
by jordanb (guest, #45668)
Programs still need to know what language the user(s) of the system prefer to see responses in, if their radix mark is a ',' or a '.', if they count days using the Gregorian or Chinese calendar, etc.
I agree that the world would be a brighter place if it could be said "all text on disk or streamed on the network MUST be in Unicode's UTF-8 encoding" and then locales could just say "en_US" instead of "en_US.UTF-8", but issues of representation and encoding are only half of the localization problem.
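For the radix mark in particular, standard C already exposes the locale's choice through localeconv(). A minimal sketch; the helper name current_radix_mark is hypothetical.

```c
#include <locale.h>

/* Hypothetical helper: the radix character of the current LC_NUMERIC locale. */
char current_radix_mark(void)
{
    const struct lconv *lc = localeconv();
    return lc->decimal_point[0];
}
```

A program that calls setlocale(LC_ALL, "") first sees whatever the user's environment selects, typically ',' under a German locale; before any setlocale() call the process runs in the "C" locale and the result is '.'.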
Posted May 6, 2009 19:18 UTC (Wed)
by jreiser (subscriber, #11027)
Posted May 6, 2009 20:15 UTC (Wed)
by kleptog (subscriber, #1183)
Posted May 6, 2009 22:13 UTC (Wed)
by vmole (guest, #111)
Are you using localepurge, by chance? It removes undesired locales after each apt run.
Posted May 7, 2009 0:48 UTC (Thu)
by ABCD (subscriber, #53650)
Posted May 6, 2009 20:29 UTC (Wed)
by nix (subscriber, #2304)
(Debian has a /usr/sbin/locale-gen script and /etc/locale.gen file for
Posted May 12, 2009 4:59 UTC (Tue)
by dirtyepic (guest, #30178)
Posted May 8, 2009 1:45 UTC (Fri)
by spitzak (guest, #4593)
Locales are an obsolete idea that UTF-8 is intended to solve. (Yeah, I know locales also change the format of some printing and what strcmp does, which has caused nothing but grief; the program writer could do that trivially if they really wanted it, instead of everyone losing predictable results.)
Posted May 8, 2009 7:08 UTC (Fri)
by anselm (subscriber, #2796)
With respect, I think that you're oversimplifying things here.
It turns out that »the obvious way« to sort strings doesn't work for many languages other than English, which is precisely one of the reasons the locale concept was invented in the first place. Look up the collating rules for German and Swedish, for example, to see three different ways of collating the »ä« character, none of which corresponds to »the obvious way«. IMHO it does make some sense to put this sort of arcane knowledge into the standard library so that programmers (who are usually not also linguists) do not have to wonder where »ø« goes in the Danish alphabet, and so that a program has half a chance of doing string collation correctly in languages that the original programmers didn't even know existed, let alone catered for in their code. (I'm saying »half a chance« because of the next paragraph.)
Also it turns out that strcmp(3) doesn't, in fact, care about locales at all, so if you use strcmp(3) only in your programs you will not be surprised if the user changes their locale; it's the strcoll(3) function that is supposed to be used for locale-dependent string comparisons. (I do agree with you about the decimal separator issue in printf(3), though.)
I18N is a difficult issue at best, and it isn't helped by people who try cutting corners. Unicode/ISO-10646 and UTF-8 play an important role in making the problem easier to handle, but they're a fairly low-level part of the grand scheme of things. They're like the wheels on a car: indispensable for a smooth ride, but one would generally still like seats and a steering wheel, too.
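To illustrate the strcmp(3)/strcoll(3) distinction: strcmp() always compares bytes, while the POSIX.1-2008 variant strcoll_l() takes the collation locale as an explicit argument instead of reading the global one. A sketch, assuming a POSIX.1-2008 system; sort_words is a hypothetical helper, and a locale name such as "sv_SE.UTF-8" only works if that locale is installed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* newlocale() and strcoll_l() are POSIX.1-2008 */
#endif
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Locale used by the comparator; qsort() offers no user-data argument. */
static locale_t sort_locale;

static int by_collation(const void *a, const void *b)
{
    return strcoll_l(*(const char *const *)a, *(const char *const *)b, sort_locale);
}

/* Sort words by the collation rules of `locname` without touching the
   process-wide locale. Returns 0 on success, -1 if the locale is unavailable. */
int sort_words(const char **words, size_t n, const char *locname)
{
    sort_locale = newlocale(LC_COLLATE_MASK, locname, (locale_t)0);
    if (sort_locale == (locale_t)0)
        return -1;
    qsort(words, n, sizeof *words, by_collation);
    freelocale(sort_locale);
    sort_locale = (locale_t)0;
    return 0;
}
```

With the "C" locale the result is plain byte order, the same as strcmp(); with a language-specific locale the order follows that locale's collation tables.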
Posted May 8, 2009 16:45 UTC (Fri)
by spitzak (guest, #4593)
You are right that strcmp() does what is wanted. I believe I was remembering some scripting languages where the string comparison changed depending on the locale, which was a nightmare because people rarely test in other locales.
The printf problem is really a pain and forces me to always force the locale to C at startup. I need to use printf, sometimes hidden inside scripting languages where I can't change it, to write data files that are expected to be readable by the same program even if the locale is different.
strcoll() is approximately the right idea. Make it perfectly clear that this is some human-oriented sorting function. I think the real solution is to make all such functions take the locale as an argument, rather than using a static variable.
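An alternative to forcing the whole process into the "C" locale at startup, sketched under the assumption of a POSIX.1-2008 system: switch only the calling thread with newlocale() and uselocale() around the formatting call. format_double_c is a hypothetical helper name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* newlocale()/uselocale() are POSIX.1-2008 */
#endif
#include <locale.h>
#include <stdio.h>

/* Format `value` with a '.' radix mark regardless of the process locale.
   Returns snprintf()'s result, or -1 if the "C" locale object can't be made. */
int format_double_c(char *buf, size_t len, double value)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;
    locale_t old = uselocale(c_loc);  /* affects this thread only */
    if (old == (locale_t)0) {
        freelocale(c_loc);
        return -1;
    }
    int n = snprintf(buf, len, "%g", value);
    uselocale(old);                   /* restore the previous thread locale */
    freelocale(c_loc);
    return n;
}
```

This does not help when the printf call is buried inside a scripting language's runtime, which is the other case the comment mentions.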
Posted May 6, 2009 18:59 UTC (Wed)
by atai (subscriber, #10977)
If you happen to be the maintainer, there will be a fork tomorrow, because you are the best example of why projects fork... you only care about your particular situation.
No locale? What about people in non-English countries?
Posted May 6, 2009 19:14 UTC (Wed)
by dlang (guest, #313)
today that's _very_ hard to do (you may be able to do it on a gentoo system, I'm not sure)
Posted May 6, 2009 19:53 UTC (Wed)
by drag (guest, #31333)
And locales do matter in desktop situations. How else is it supposed to know which dictionaries I want to use, or other things of that nature?
Posted May 7, 2009 5:25 UTC (Thu)
by PaulWay (subscriber, #45600)
So why aren't you using Gentoo, where you get to customise and optimise everything? Why aren't you using text modes and avoiding the bloat of graphics? Do you even know when you're being tacitly saved from confusion and hassle - or, in the case of SELinux, from actual bugs - by things 'just working' because of this "infective" bloat?
Sadly, it would seem you're too busy using pejoratives.
Have fun,
Paul
Posted May 13, 2009 18:20 UTC (Wed)
by kjp (guest, #39639)
Posted May 6, 2009 16:33 UTC (Wed)
by epa (subscriber, #39769)
Posted May 6, 2009 17:23 UTC (Wed)
by nevyn (guest, #33129)
Probably, although they aren't really that much better (if you want a real string API in C, go use one). Having a compatible version of asprintf() will be nice though (assuming he takes that), and I imagine there's a bunch of other minor things on other people's wishlist.
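For systems without asprintf(), a common portable approach is to measure with vsnprintf() and then allocate. A sketch; my_asprintf is a hypothetical name, and setting *strp to NULL on failure is a choice made here (the glibc manual page leaves *strp undefined on error), which is only one possible reading of "compatible".

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical asprintf() stand-in: allocate a buffer exactly large enough
   for the formatted result. Returns its length, or -1 with *strp == NULL. */
int my_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    *strp = NULL;

    va_start(ap, fmt);
    int len = vsnprintf(NULL, 0, fmt, ap);    /* first pass: measure only */
    va_end(ap);
    if (len < 0)
        return -1;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return -1;

    va_start(ap, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, ap); /* second pass: fill the buffer */
    va_end(ap);

    *strp = buf;
    return len;
}
```

The caller owns the returned buffer and must free() it.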
Posted May 6, 2009 17:27 UTC (Wed)
by notting (guest, #28878)
Posted May 6, 2009 17:31 UTC (Wed)
by aurel32 (subscriber, #7059)
Posted May 7, 2009 8:03 UTC (Thu)
by epa (subscriber, #39769)
Posted May 7, 2009 9:25 UTC (Thu)
by nix (subscriber, #2304)
Posted May 7, 2009 12:05 UTC (Thu)
by epa (subscriber, #39769)
Posted May 7, 2009 8:46 UTC (Thu)
by ncm (guest, #165)
Posted May 7, 2009 14:46 UTC (Thu)
by nix (subscriber, #2304)
Posted May 6, 2009 17:30 UTC (Wed)
by ikm (guest, #493)
Posted May 6, 2009 19:27 UTC (Wed)
by rahulsundaram (subscriber, #21946)
Posted May 6, 2009 19:54 UTC (Wed)
by drag (guest, #31333)
The XML thing was just used as a patch to deal with a VERY badly designed POSIX API feature.
Posted May 6, 2009 20:04 UTC (Wed)
by nix (subscriber, #2304)
Posted May 6, 2009 20:09 UTC (Wed)
by ikm (guest, #493)
Posted May 6, 2009 20:37 UTC (Wed)
by nix (subscriber, #2304)
Some also argued apparently seriously that malloc() and dynamic memory
Posted May 6, 2009 20:45 UTC (Wed)
by ikm (guest, #493)
Posted May 7, 2009 4:05 UTC (Thu)
by csamuel (✭ supporter ✭, #2624)
Posted May 6, 2009 22:34 UTC (Wed)
by smoogen (subscriber, #97)
I wish Debian well with their move to eglibc. It will be interesting to see how much code will need to be fixed due to various glibc assumptions people may have made over the years.
Posted May 6, 2009 23:12 UTC (Wed)
by mbanck (subscriber, #9035)
I assume that the day eglibc becomes incompatible with glibc, Debian will reconsider which upstream to follow.
Posted May 7, 2009 6:49 UTC (Thu)
by drag (guest, #31333)
Now that I've had time to think about it some it seems that the move to eglibc isn't as much
Maybe the term should be 'SPORK' instead of 'FORK'. I've noticed that this has happened with a
A multitude of MySQL sporks has formed over time. Stuff like OurDelta:
As a response to the slow and somewhat negative behavior with Sun Microsystems regarding
Another example coming from Sun would be Go-OO.org
In both cases they don't really want to _fork_ their projects. But they have requirements or
Posted May 7, 2009 8:31 UTC (Thu)
by nhippi (subscriber, #34640)
The upstream certainly has all the rights to put up whatever QA and codestyle requirements they want. The maintainer of the package has the duty to fulfill such requirements and correct any issues in the patch noted by upstream. The end result is better for everyone - debian gets a better fix, and all the other users get the fix too when upstream releases new version.
But what do when upstream plainly refuses to accept a patch? Or tells you to "Go away, stop wasting everyones time"?
One option in such cases is to start maintaining an explicit fork (or spork, as mentioned here). It is more honest to end users than maintaining an ever-growing stack of patches hidden in a distribution source package. And if others have the same problem(s) with upstream, the spork allows sharing the maintenance burden.
Other options could be switching the maintainer (it is always the upstream that has co-operation problems...) or dropping the package all together (if there are better alternatives).
Posted May 7, 2009 9:26 UTC (Thu)
by pabs (subscriber, #43278)
Posted May 7, 2009 13:19 UTC (Thu)
by btraynor (guest, #26672)
EGLIBC has been around since November 2006 or so. It is a fork of glibc, given this definition, "a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct piece of software" -- http://en.wikipedia.org/wiki/Fork_(software_development).
Also, #eglibc on freenode is alive.
Posted May 6, 2009 23:29 UTC (Wed)
by sbergman27 (guest, #10767)
[Link] (6 responses)
Unlike with the XFree86 situation, the *BSDs won't continue to use glibc for years afterward, because they've never used glibc in the first place. Ulrich and his project could become irrelevant remarkably quickly if Debian really goes through with this.
Hopefully also unlike the Xfree86 situation, we won't spend years waiting for some big architectural overhaul to be completed before the real progress can begin.
Posted May 6, 2009 23:48 UTC (Wed)
by stevenj (guest, #421)
[Link] (3 responses)
Except for the fact that Drepper is employed by Red Hat, which plays a major role in Fedora governance. If Red Hat wanted to switch glibc maintainers, they would presumably have fired him long ago.
Posted May 7, 2009 0:06 UTC (Thu)
by sbergman27 (guest, #10767)
[Link] (2 responses)
Posted May 7, 2009 0:22 UTC (Thu)
by jordanb (guest, #45668)
[Link] (1 responses)
Posted May 7, 2009 0:36 UTC (Thu)
by sbergman27 (guest, #10767)
[Link]
Posted May 7, 2009 6:07 UTC (Thu)
by pjm (guest, #2080)
[Link]
I can't argue with what could happen, but that doesn't seem likely. eglibc will continue to sync from glibc; thus, glibc will continue to be relevant. Ulrich has considerable experience and (I gather) skill in working on glibc; eglibc adoption simply reduces his interactions with users, which is probably as good for Ulrich as for anyone. So I think Ulrich will continue to head glibc development (and hence be relevant to eglibc and everyone using it) for quite some time to come.
Posted May 7, 2009 6:39 UTC (Thu)
by nix (subscriber, #2304)
[Link]
My biggest hope here is that better docs can be written. Ulrich has a
Posted May 7, 2009 2:22 UTC (Thu)
by ringerc (subscriber, #3071)
[Link] (1 responses)
Even with this fork, he'll still be doing a large amount of the work going into each eglibc release, via merges from glibc upstream. If eglibc takes off and proves to work well, he'll hopefully consider merging things from it, once they've matured, that he might not have been happy to introduce untried into glibc.
That sounds like a win to me. I just hope these don't become antagonistic forks where work is wasted on duplicating already-completed fixes and features, and on pointless flaming.
Posted May 7, 2009 6:55 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted May 7, 2009 10:51 UTC (Thu)
by lmb (subscriber, #39048)
[Link] (3 responses)
These people sometimes bring exceptional technical skills to the community (alas, yours truly is not affected here), which they would have not been able to leverage in a more typical day-to-day office setting with pair-programming, lots of communication, and so on.
That is by no means an excuse for being rude, and sometimes apparently not even trying to overcome the issue (but apparently rejoicing in it), as is often evident on LKML - but I dare say it is advisable to find ways to integrate them, instead of forking away from them.
Personally, having been on some projects affected by such people, I think they make wonderful to exceptional engineers. It becomes difficult when they remain project leaders of a growing community.
Of course, it's not always possible for them to accept switching roles (control issues anyone?), but it should at least be considered by everyone. And if so, proposed in a face saving way for all involved - the comments on the terse (and arguably rude) bugzilla responses were just equally rude and juvenile, and certainly unlikely to yield a positive response.
If even that fails, sure, fork - or build up a trailing repository which pulls from the former upstream frequently as done here. Maybe even do that in parallel, to demonstrate seriousness (and ability). But don't forget the other side.
(And yes, before someone reminds me about what we did with Linux-HA, yes, sometimes, after all this has failed, running for your life is the only way to remain sane.)
Posted May 7, 2009 14:50 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted May 7, 2009 18:08 UTC (Thu)
by drag (guest, #31333)
[Link]
Posted May 7, 2009 22:39 UTC (Thu)
by man_ls (guest, #15091)
[Link]
Posted May 7, 2009 18:49 UTC (Thu)
by mohitsingh (guest, #58480)
[Link]
Then there is the language being used everywhere. The EGLIBC site says "Debian will be switching from GLIBC to EGLIBC". That reads like a marriage announcement, with no room for an engagement that may or may not lead to marriage.
Final decision! Embedded flavor in an open system! So the next version of my desktop distro may well be inspired by the one running my washing machine. The child suddenly seems to be the father of the man. Is the GOD of small things listening?
MS
Debian switching to EGLIBC
The guy fixed a bug in [strfry()] though. Clearly it was important enough to him to do that analysis, and then fix what was wrong. All Drepper had to do was get over himself long enough to apply it.
Debian switching to EGLIBC
(forever) and refusing to explain why. That's happened at least once.
Debian switching to EGLIBC
<http://sourceware.org/bugzilla/show_bug.cgi?id=7065>. A security
improvement, marked as 'never going to happen' without reason. (Repeated
requests by several people to provide *any* kind of rationale for
rejecting a zero-effect-when-compiled-out security improvement went
unanswered. Ulrich would be much more tolerable if he said *why* he did
what he did occasionally, but he seems to assume everyone else is
telepathic.)
<http://sourceware.org/bugzilla/show_bug.cgi?id=7066>. A buffer overrun,
analysis ignored. Who knows why; at least he didn't say 'never going to
happen' about this one.
Debian switching to EGLIBC
out: 0xa.8p+2.
2) dismissing relevant "did you have $FOO in arguments?" with "oh, it can't matter at all"
3) failure to RTFM even after that (if nothing else, to see WTF that question had been about)
4) reporting nasal demons as a security hole, instead of (if I've parsed that bug report correctly) some crap somewhere in the clusterfsck of makefiles around the glibc testsuite, either present in the original or introduced by yourself.
5) referring to the entire sad story as evidence of a security hole found by the proposed patch.
Debian switching to EGLIBC
My apologies - I've misparsed your reply to Petr, so the idiocy in this case is sure as hell mine.
Debian switching to EGLIBC
> was worth, say, 100 of your other programmers, perhaps you should
> tolerate up to 100 times more rudeness in compensation and still count
> yourself coming ahead.
Debian switching to EGLIBC
I still think that PaulWay is right about the first fallacy, but maybe not the way he intended. Sometimes an individual's worth cannot be measured as 100 normal fellows, because sometimes he or she can enable things which were not even feasible before. There is no replacement even with hordes of developers. Think about Science and the contributions of great individuals -- the kind of leap of a Pasteur or a Newton cannot be done by a thousand clever fellows. Engineering is sometimes the same, as is software engineering.
Character flaws
Debian switching to EGLIBC
Don't be so sure
Yeah, right. Try being sexist, defamatory or racist in the
workplace and see how your contract covers you.
Try being sexist, defamatory or racist on a public internet
forum and see how the forum's or your ISP's conditions of use
work.
Try getting up in a public place and being sexist, defamatory
or racist and see how you go with the police.
Right of being defamatory?
And this is how it ought to be in my opinion.
Sven
Debian switching to EGLIBC
This sounds like the X fork away from David Dawes. I can't believe that it's the embedded nature of the project that is attractive.
Debian switching to EGLIBC
third-party contributors. :/
Debian switching to EGLIBC
the same letters.
Perhaps more to the point, it appears to involve some of the same people (eglibc development is organized by CodeSourcery; I suspect Mark Mitchell is familiar with gcc's history), and has the same copyright assignment rules.
Debian switching to EGLIBC
Had to happen
Debian switching to EGLIBC
I do have to wonder why for the love of ${DEITY} the new fork is hosted in CVS. It is the 21st century you know!
Debian switching to EGLIBC
SVN/CVS
Debian switching to EGLIBC
Ew, that's even worse. CVS may be a throwback to the previous century, but SVN never had any justification for existing at all. It was never enough of an improvement over CVS to be worth the difference, and it's verging on insane to use it for new projects these days instead of git.
I do have to wonder why for the love of ${DEITY} the new fork is hosted in CVS. It is the 21st century you know!
No idea where you got that from, but this page says it is a Subversion repository.
Debian switching to EGLIBC
[SVN] was never enough of an improvement over CVS to be worth the difference,
Debian switching to EGLIBC
I had no idea things were bad enough in mainline for Debian to consider switching libc implementations.
Debian switching to EGLIBC
http://sourceware.org/ml/libc-alpha/2009-04/msg00034.html
Debian switching to EGLIBC
without UTF-8?
Debian switching to EGLIBC
wchar stuff is decidedly a minority and doesn't use more than a few tens of
KB.
the largest ones are Far Eastern and Unicode), converters to/from UTF-8
(obviously necessary if we want to handle other encodings at *all*) and
timezones (which we can't do without, although it would be nice if glibc
contained an interface to let us use the historical data in them properly:
Ulrich has explicitly (and bluntly) ruled this out without giving a
rationale. As usual. Maybe eglibc can add an appropriate interface.)
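Converters to and from UTF-8 are normally reached through the POSIX iconv API, which glibc implements. The following is a minimal sketch, assuming glibc's (or another libc's) ISO-8859-1 and UTF-8 converters are available; latin1_to_utf8 is a hypothetical helper written for illustration, not a glibc function:

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 (Latin-1) string
 * to UTF-8 with the POSIX iconv API.  Returns the number of UTF-8 bytes
 * written (excluding the NUL), or (size_t)-1 if the converter cannot be
 * opened or the output buffer is too small. */
static size_t latin1_to_utf8(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");  /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inbuf = (char *)in;        /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;    /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *outbuf = '\0';
    return (size_t)(outbuf - out);
}
```

With this sketch, converting "caf\xe9" returns 5: the single Latin-1 byte 0xE9 becomes the two UTF-8 bytes 0xC3 0xA9.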
Debian switching to EGLIBC
Locales and UTF-8
can be up to four bytes. It's only the old ASCII characters that are still
only one byte in UTF-8; the Latin-1 extensions, for example, are two
bytes. Therefore the old APIs that assume 1 byte = 1 character are not so
useful with UTF-8.
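To make those byte counts concrete, here is a minimal UTF-8 encoder sketch (utf8_encode is a hypothetical name used only for illustration). ASCII stays at one byte, a Latin-1 letter such as é (U+00E9) takes two, the Euro sign (U+20AC) takes three, and code points above U+FFFF take four:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written (1-4), or 0 for values that are not
 * Unicode scalar values (surrogates U+D800..U+DFFF or anything above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```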
Locales and UTF-8
and all the interfaces to deal with that.
Locales and UTF-8
string are people writing routines taking textual input, routines producing
textual output, routines modifying text strings, routines manipulating
text strings in *any* way that depends on anything a human would care
about. I can see how this could be considered rare.
serialization makes as much sense as touching individual bits in it does
(except of course that you have to touch both in order to convert the
UTF-8 into actual Unicode code points and back).
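One small illustration of why individual bytes of the serialization are the wrong unit: reversing a string byte by byte turns "é" (C3 A9) into A9 C3, which is not valid UTF-8. A sketch that reverses by character instead, assuming valid UTF-8 input (utf8_reverse is a hypothetical helper):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reverse a valid UTF-8 string character by character.
 * A character starts at any byte that is not a continuation byte
 * (10xxxxxx).  out must have room for strlen(in) + 1 bytes. */
static void utf8_reverse(const char *in, char *out)
{
    size_t o = 0;
    size_t end = strlen(in);              /* one past the current character */

    while (end > 0) {
        size_t start = end - 1;
        /* Walk back over continuation bytes to the lead byte. */
        while (start > 0 && ((unsigned char)in[start] & 0xC0) == 0x80)
            start--;
        memcpy(out + o, in + start, end - start);  /* copy the whole character */
        o += end - start;
        end = start;
    }
    out[o] = '\0';
}
```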
Locales and UTF-8
punctuation is up above U+2000, for instance, including U+2010 (the
hyphen) and U+2003 (the em space). Helpfully this is somewhat jumbled up
with nonpunctuation stuff like numeric superscripts.
characters in a UTF-8 stream, you can do that without being Unicode-aware
at all. But this will tend to annoy your users when they type in a € and
find that your program can't manipulate it because it's U+20AC. It'll
annoy your users even more to find that they can remove some characters,
but that others take several keystrokes to remove and miraculously
transmogrify into other characters as they do so. (More mess: the Euro
cent sign is U+00A2!)
Anywhere else? Bite your knuckles.
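The "several keystrokes to remove" effect comes from deleting one byte at a time. A minimal sketch of a backspace that removes the whole multibyte sequence, assuming valid UTF-8 (utf8_backspace is a hypothetical helper, not part of any readline-style library):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete the last character of a valid UTF-8 string in
 * place.  Chopping only the final byte of "€" (E2 82 AC) would leave the
 * invalid fragment E2 82.  Returns the number of bytes removed (0 if empty). */
static size_t utf8_backspace(char *s)
{
    size_t len = strlen(s);
    if (len == 0)
        return 0;

    size_t start = len - 1;
    /* Step back over continuation bytes (10xxxxxx) to the lead byte. */
    while (start > 0 && ((unsigned char)s[start] & 0xC0) == 0x80)
        start--;

    s[start] = '\0';
    return len - start;
}
```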
Locales and UTF-8
even *much*, it's pretty nasty). And, as I said, it'll be interesting to
see what breaks. (I suspect not much will: most things that need to be
*are* Unicode-aware, on Debian at least. But it might get hair-raising.)
horrible financial application to allow for UTF-8 awareness (the simplest
example: lots of places in that software cared if something
was 'alphanumeric', for instance, and isalpha() really doesn't work). It
could have been worse: before I came along they were planning to move to
UCS-2, hark at the forward planning and lovely C-compatibility...
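A sketch of the isalpha() problem, assuming byte-oriented code running in the default "C" locale; both helper names are hypothetical. The locale-aware variant uses the standard mbrtowc() and iswalpha(), and only gives the right answer once a UTF-8 locale has been selected with setlocale():

```c
#include <ctype.h>
#include <locale.h>   /* setlocale(), needed by callers of count_alpha_chars() */
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-wise count: in the "C" locale only A-Z and a-z qualify, so both bytes
 * of a UTF-8 "ï" (C3 AF) are rejected. */
static size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))   /* cast avoids UB for bytes >= 0x80 */
            n++;
    return n;
}

/* Character-wise count via the current locale's multibyte decoding.
 * Returns (size_t)-1 on an invalid or truncated sequence. */
static size_t count_alpha_chars(const char *s)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t n = 0, left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t k = mbrtowc(&wc, s, left, &st);
        if (k == (size_t)-1 || k == (size_t)-2)
            return (size_t)-1;
        if (k == 0)                       /* decoded a NUL; cannot happen here */
            break;
        if (iswalpha((wint_t)wc))
            n++;
        s += k;
        left -= k;
    }
    return n;
}
```

For "naïve" (six bytes, five letters) the byte-wise version reports 4.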
Locales and UTF-8
2. "second" bytes
3. "first" bytes
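The numbered list above is cut off, but the byte classes it seems to refer to can be sketched under the assumption that "second" bytes are UTF-8 continuation bytes and "first" bytes are lead bytes. The enum and function names are hypothetical:

```c
/* Hypothetical classification of a single byte in a UTF-8 stream. */
enum utf8_byte_kind { UTF8_ASCII, UTF8_CONTINUATION, UTF8_LEAD, UTF8_INVALID };

static enum utf8_byte_kind utf8_classify(unsigned char b)
{
    if (b < 0x80)
        return UTF8_ASCII;            /* 0xxxxxxx: a complete character */
    if (b < 0xC0)
        return UTF8_CONTINUATION;     /* 10xxxxxx: a "second" byte */
    if (b >= 0xC2 && b <= 0xF4)
        return UTF8_LEAD;             /* 110xxxxx..11110xxx: a "first" byte */
    return UTF8_INVALID;              /* C0, C1 and F5..FF never occur */
}
```

Because the three classes never overlap, a program can resynchronize from any position by skipping continuation bytes.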
Locales and UTF-8
> ASCII characters in a UTF-8 stream, you can do that without being
> Unicode-aware
> the US. Anywhere else? Bite your knuckles.
Locales and UTF-8
For example, if I'm parsing a UTF-8 CSV file into rows and columns, then I can treat it as a byte stream, since the punctuation characters (e.g. comma, double quote, backslash and newline) are all single bytes and those bytes are guaranteed not to occur in multi-byte characters.
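A minimal sketch of that byte-level splitting, assuming valid UTF-8 and ignoring CSV quoting entirely to keep it short; split_commas is a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split one line at ',' by scanning raw bytes.  Safe for
 * valid UTF-8 because every byte of a multibyte character is >= 0x80, so a
 * 0x2C byte is always a real comma.  line is modified in place; up to
 * max_fields pointers are stored and the count is returned. */
static size_t split_commas(char *line, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    while (n < max_fields) {
        fields[n++] = p;
        char *comma = strchr(p, ',');     /* plain byte search */
        if (comma == NULL)
            break;
        *comma = '\0';                    /* terminate the current field */
        p = comma + 1;
    }
    return n;
}
```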
This is true if you know that your input is valid UTF-8. However if it might be malformed, then your program could end up splitting a row in the middle of an (invalid) character sequence and producing different invalid sequences as output. This is often fine: garbage in, garbage out. But there can be interesting security holes where malformed UTF-8 is treated differently by different code. Luckily, checking for valid UTF-8 is a fast operation, so there is no reason not to check every string that comes from the user before doing anything with it - even if the processing you do is just treating it as a byte stream.
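A strict validator really is short. This sketch follows the well-formed byte sequences defined in RFC 3629; the function name utf8_valid is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8, i.e. no stray
 * continuation bytes, no truncated sequences, no overlong encodings, no UTF-16
 * surrogates (U+D800..U+DFFF) and nothing above U+10FFFF. */
static bool utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;                          /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */

        if (b < 0x80) { i++; continue; }
        else if (b >= 0xC2 && b <= 0xDF) need = 1;
        else if (b == 0xE0) { need = 2; lo = 0xA0; }   /* reject overlongs */
        else if (b == 0xED) { need = 2; hi = 0x9F; }   /* reject surrogates */
        else if (b >= 0xE1 && b <= 0xEF) need = 2;
        else if (b == 0xF0) { need = 3; lo = 0x90; }   /* reject overlongs */
        else if (b == 0xF4) { need = 3; hi = 0x8F; }   /* cap at U+10FFFF */
        else if (b >= 0xF1 && b <= 0xF3) need = 3;
        else return false;            /* stray continuation, C0, C1, F5..FF */

        if (len - i <= need)
            return false;             /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
        i += need + 1;
    }
    return true;
}
```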
Locales and UTF-8
streams of UTF-8 chars. It's trivial to decode, and it's just as trivial
to interpose a wrapper so that your strings *appear* to contain single
bytes with arbitrarily large values :) but it does require a bit of extra
work. (I'm just thinking here of how long it took to get zsh's
Unicode-awareness right. Its ZLE wheel-reimplementation of readline was
the trickiest part, which is not surprising.)
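One way to build such a wrapper is a small cursor that hands out one code point at a time. A sketch, assuming the input was validated beforehand (it does not reject overlong forms); struct utf8_iter and utf8_next are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a UTF-8 buffer [p, end). */
struct utf8_iter {
    const unsigned char *p;
    const unsigned char *end;
};

/* Store the next code point in *cp and return 1, or return 0 at the end.
 * A malformed lead byte or continuation yields U+FFFD and skips one byte. */
static int utf8_next(struct utf8_iter *it, uint32_t *cp)
{
    if (it->p >= it->end)
        return 0;

    unsigned char b = *it->p;
    size_t n;
    uint32_t v;

    if (b < 0x80)                   { n = 1; v = b; }
    else if (b >= 0xC0 && b < 0xE0) { n = 2; v = b & 0x1F; }
    else if (b >= 0xE0 && b < 0xF0) { n = 3; v = b & 0x0F; }
    else if (b >= 0xF0 && b < 0xF8) { n = 4; v = b & 0x07; }
    else                            { n = 0; v = 0; }

    if (n == 0 || (size_t)(it->end - it->p) < n) {
        *cp = 0xFFFD;               /* replacement character */
        it->p++;
        return 1;
    }
    for (size_t k = 1; k < n; k++) {
        if ((it->p[k] & 0xC0) != 0x80) {   /* expected a continuation byte */
            *cp = 0xFFFD;
            it->p++;
            return 1;
        }
        v = (v << 6) | (it->p[k] & 0x3F);  /* append 6 payload bits */
    }
    it->p += n;
    *cp = v;
    return 1;
}
```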
Debian switching to EGLIBC
moving the C locale to using UTF-8 rather than US-ASCII as its
locale codeset. This won't be done immediately; we will create
a C.UTF-8 for testing before considering the full switch to default it.
compiled binary to program output and subsequent terminal display.
Roger
Debian switching to EGLIBC
character with the high bit set :) stuff that relies upon the C locale
rarely makes a distinction between bytes and characters, even where it
should... of course, one would hope that not much such software is left.
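The byte/character distinction can be shown with a hypothetical utf8_strlen() that counts code points in valid UTF-8, whereas strlen() counts bytes, which is what C-locale code implicitly treats as characters:

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a valid UTF-8 string by counting
 * every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "café" encoded in UTF-8, strlen() gives 5 while this sketch gives 4.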
Debian switching to EGLIBC
a byte stream' canard in my other response. It's trivially wrong.
Debian switching to EGLIBC
/usr/lib/locale/locale-archive is 79MB, and subsetting is not actively supported. Many developers would be overjoyed to have only LANG=C, or to select just the 5 locales that cover 99.99% of the users for their product.
Debian switching to EGLIBC
every single locale in the world, only for those you use.
exactly this reason.)
Debian switching to EGLIBC
> *glares at selinux infecting everything in fedora*
Debian switching to EGLIBC
b. we don't enable or compile SELinux in the kernel (but the SELinux libs are still pointlessly statically linked into the other RPMs)
c. if we were starting the project today, Gentoo would get consideration; Fedora was only picked for a certain reason that no longer applies.
d. we do rm -rf /usr/share/locale/* at postinstall to save space.
I wonder if the EGLIBC maintainer will be less obstructive about including the sorely needed strlcpy and strlcat functions.
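For readers unfamiliar with the functions under discussion, here is a sketch of the BSD strlcpy()/strlcat() semantics. The my_ prefix marks these as illustrative stand-ins, not glibc or eglibc APIs. Both always NUL-terminate (when size > 0) and return the length of the string they tried to create, so a return value >= size signals truncation:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative strlcpy(): copy at most size - 1 bytes, always terminate,
 * return strlen(src). */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Illustrative strlcat(): append src to dst within a buffer of size bytes,
 * return the length the full concatenation would have had. */
static size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t dstlen = 0;
    while (dstlen < size && dst[dstlen] != '\0')   /* never scan past size */
        dstlen++;
    if (dstlen == size)                  /* dst not terminated within size */
        return size + strlen(src);
    return dstlen + my_strlcpy(dst + dstlen, src, size - dstlen);
}
```

Usage: a caller checks `if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf)` to detect that the copy was cut short.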
Here's hoping strlcat gets in
can no longer reliably replace eglibc with glibc. (Whether this is
problematic is another matter.)
Here's hoping strlcat gets in
Here's hoping they don't. A more misbegotten function would be hard to imagine if strtok
didn't exist already.
Here's hoping strlcat gets in
Debian switching to EGLIBC
just a bunch of printf()s. That, at least, is not worth forking over.
Debian switching to EGLIBC
tended to be BSD people.
allocation in general was unacceptably inefficient. They preferred to
statically size *everything* and recompile when needed. This was
apparently more efficient, in some world...
Ulrich Drepper's personality
about creating a fork of glibc; it's just that it is easier to share the burden of managing patch
sets with other people who are experiencing the same problem.
few different pieces of software when people are trying to figure out how to deal with difficult
situations created by upstream developers who otherwise are valuable.
http://ourdelta.org/
MySQL releases.
http://ourdelta.org/
desires that are simply not being addressed by the upstream folks in a timely manner. So in
both cases they try to push code back upstream, but the spork is a community way of maintaining
their own patches.
Ulrich Drepper's personality
As goes Debian...
Fedora switching seems unlikely
And something about the concept of switching to another libc just seems to have Fedora's name written all over it, too.
Fedora switching seems unlikely
As goes Debian...
Architecturally glibc is very nice indeed (downside: the makefiles are
astonishingly powerful... and astonishingly complex, and utterly
undocumented: I suspect that no other makefile on the face of the Earth
uses as many GNU Make features as glibc's).
habit of writing amazing stuff and never documenting any of it in any way
at all except sometimes in PDFs on his homepage. Manuals? Why should they
be updated? (Updates from other people are summarily ignored, too.)
Debian switching to EGLIBC
*because* he drives everyone else away so almost nobody helps other than
Roland (who is helpful to a fault: if software developers had saints he'd
be one). If you actively drive people off, you can't really decide not to
do the parts of your maintenance position that *everyone* expects you to
do because you're so busy: that you're busy is entirely your doing, and so
should be its consequences.
Debian switching to EGLIBC
Ya.. People have their own plusses and minuses and should be put into positions where they
are going to be of the most benefit. And get their egos out of the way... not everybody should be
the front-man for their projects. I doubt there are many people that love hacking and getting
into code and details and all that that really want to double as a public relations person. And I
doubt that there are many people good at mediating disputes and doing social networking
and whatnot that really want to spend all their time coding some low-level C libraries.
Debian switching to EGLIBC
-------------------------
BTW...
If you're acting like an asshole and you're completely right about something, you're still an asshole.
A lot of people seem to think that if they are right about a subject, argument, talking point,
or whatever, then that gives them an allowance to be jerks. Like winning an argument is a victory
and the reward is a license to be an asshat. Which it isn't... they are perfectly within their
rights to act like a jerk when they are wrong or when they are right. Being wrong or right
doesn't really enter into it, and nobody should be surprised when people react negatively to
their negative behavior.
I don't think that this fork has to be seen as a drama. NAR suggested above that hiring two or three people to act as buffers or as interface to other people is a good way to deal with ill-mannered engineers. Well, think of this eglibc project as such a buffer. They sync from the genius but interface with the world. It could also be a good way of isolating Ulrich from user requests, and git enables such a workflow beautifully.
Debian switching to EGLIBC
Debian marrying EGLIBC