
Debian switching to EGLIBC

Here's a weblog posting with an interesting statement: "I have just uploaded Embedded GLIBC (EGLIBC) into the archive (it is currently waiting in the NEW queue), which will soon replace the GNU C Library (GLIBC)." The EGLIBC project has produced a version of the C library aimed primarily at embedded situations. Evidently the Debian developers feel that it is good enough for wider use, though, and they seem to strongly prefer the way that project is run upstream. (Thanks to Paul Wise).


Debian switching to EGLIBC

Posted May 6, 2009 14:21 UTC (Wed) by kragil (guest, #34373) [Link] (24 responses)

Wow, reading those sourceware bugs, it seems Ulrich has the last name DEPPer for a reason. (The German meaning of "Depp" is not very nice.)

Debian switching to EGLIBC

Posted May 6, 2009 14:34 UTC (Wed) by kragil (guest, #34373) [Link]

Oops, I overlooked an "r" there. It would have been too fitting anyway :)

Debian switching to EGLIBC

Posted May 6, 2009 18:06 UTC (Wed) by ejr (subscriber, #51652) [Link] (22 responses)

Bear in mind that Mr. Drepper not only maintains glibc, he also does a good deal of work on the Single UNIX Specification and on the libc/kernel boundary. When he tells people to pay, that's likely a very poorly worded way of saying: hire people, 'cause I'm beyond tapped. And if you look at how much explanation he gives on many items, well, there are only so many hours in the day... Read the papers on his site. They're very, very detailed and quite lucid.

I'm not saying he couldn't communicate his lack of time a tad more tactfully, but he still deserves respect. Also, I suspect the move to git is meant to make project-local forks easier.

Debian switching to EGLIBC

Posted May 6, 2009 21:14 UTC (Wed) by stevenb (guest, #11536) [Link] (21 responses)

I suppose Mr. Drepper's technical abilities are not in question. But the words he chooses to express his disagreement with others are outright rude. If he were on my payroll, I'd fire him for that, regardless of his technical abilities.

Debian switching to EGLIBC

Posted May 6, 2009 22:30 UTC (Wed) by alankila (guest, #47141) [Link] (20 responses)

I doubt you should. Any sane cost function weighs costs against benefits. If this guy were worth, say, 100 of your other programmers, perhaps you should tolerate up to 100 times more rudeness in compensation and still count yourself as coming out ahead.

Not that I know anything of Mr. Drepper. This has nothing to do with him.

Debian switching to EGLIBC

Posted May 7, 2009 0:56 UTC (Thu) by ofeeley (guest, #36105) [Link] (10 responses)

I'm not so sure that he's as rude as he's made out to be, either. Taking just two of the links from the list that is supposed to show an attitude problem, I don't agree. I initially read the first one[1] as a bit terse ("this function is a joke [...] don't you have better things to do") until I took a look at the function in question: strfry. It does seem to be possibly a joke, or at least a mildly entertaining but not very useful function. If there are other, higher-priority things to patch, it does seem odd to waste developer time on it.
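For readers who have not met it: strfry() is a GNU extension that randomly permutes the characters of a string in place. The usual correct way to do that is a Fisher-Yates shuffle; the naive variant that swaps each position with a random index anywhere in the string does not produce all permutations with equal probability. The following is a minimal sketch of the Fisher-Yates approach using a hypothetical helper, shuffle_string(); it is not glibc's code and does not restate what the patch in the bug report actually changed. It uses rand() and ignores the small modulo bias of rand() % n, and like strfry() it works on bytes, not on multibyte characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not glibc's strfry()): shuffles the bytes of s in
 * place with the Fisher-Yates algorithm. Walking i downward and picking
 * j from [0, i] gives every permutation the same probability, apart from
 * the modulo bias of rand() % n, which this sketch ignores. */
char *shuffle_string(char *s)
{
    size_t len = strlen(s);
    if (len < 2)
        return s;                       /* nothing to permute */
    for (size_t i = len - 1; i > 0; --i) {
        size_t j = (size_t)rand() % (i + 1);
        char tmp = s[i];                /* swap s[i] and s[j] */
        s[i] = s[j];
        s[j] = tmp;
    }
    return s;
}
```

Calling srand() once before shuffling picks the sequence; the result always contains exactly the same bytes as the input, only reordered.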

The second instance[2] seems to show people misusing the comment facilities on bugzilla in order to harass the developer.

1. http://sourceware.org/bugzilla/show_bug.cgi?id=4403

2. http://sourceware.org/bugzilla/show_bug.cgi?id=4980

Anyway, that's just an outsider's perspective from a random, lazy look at some of the evidence presented by the prosecution.

Debian switching to EGLIBC

Posted May 7, 2009 1:07 UTC (Thu) by jordanb (guest, #45668) [Link] (4 responses)

The guy fixed a bug in it though. Clearly it was important enough to him to do that analysis, and then fix what was wrong. All Drepper had to do is get over himself long enough to apply it.

A friend of mine sent in a patch to Emacs Tetris, to fix a bug that allowed you to cheat. RMS replied with "when you cheat at solitaire, who are you cheating?" but applied the patch anyway. Something like that would have been the proper response.

Anyway, it clearly wasn't the strfry incident that led Debian to this point, but rather his refusal to accept patches that fixed bugs on ARM on account of them being "crap."

Debian switching to EGLIBC

Posted May 7, 2009 1:23 UTC (Thu) by ofeeley (guest, #36105) [Link]

Sounds like fair comment.

Debian switching to EGLIBC

Posted May 7, 2009 14:25 UTC (Thu) by Felix.Braun (guest, #3032) [Link] (2 responses)

The guy fixed a bug in [strfry()] though. Clearly it was important enough to him to do that analysis, and then fix what was wrong. All Drepper had to do is get over himself long enough to apply it.

To be fair, if you read the relevant bug report, you'll see that Mr. Drepper fixed the bug in a different way. He even re-fixed his first implementation after that was discovered to be sub-optimal. So, there should be no complaints here.

Debian switching to EGLIBC

Posted May 11, 2009 4:31 UTC (Mon) by dirtyepic (guest, #30178) [Link] (1 responses)

The complaint is that no one deserves to be severely ridiculed for simply trying to fix a bug. Maybe it's the wrong fix, or it truthfully isn't a bug, but if the only thing you can hope to accomplish by filing a bug report is public humiliation, then why bother?

Debian switching to EGLIBC

Posted May 11, 2009 6:04 UTC (Mon) by nix (subscriber, #2304) [Link]

Don't forget rescinding a significant contributor's commit access
(forever) and refusing to explain why. That's happened at least once.

Debian switching to EGLIBC

Posted May 7, 2009 6:52 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

One random personal example here:
<http://sourceware.org/bugzilla/show_bug.cgi?id=7065>. A security
improvement, marked as 'never going to happen' without reason. (Repeated
requests by several people to provide *any* kind of rationale for
rejecting a security improvement that has zero effect when compiled out
went unanswered. Ulrich would be much more tolerable if he occasionally
said *why* he did what he did, but he seems to assume everyone else is
telepathic.)

Proof that it's a security improvement:
<http://sourceware.org/bugzilla/show_bug.cgi?id=7066>. A buffer overrun,
analysis ignored. Who knows why; at least he didn't say 'never going to
happen' about this one.

Debian switching to EGLIBC

Posted May 7, 2009 10:02 UTC (Thu) by viro (subscriber, #7872) [Link] (3 responses)

FWIW, I'm not fond of Ulrich's style for a lot of reasons, and this case demonstrates one of those nicely. That is to say, "sod off" is not a substitute for a proper LART. And you've earned one in that bug report, by dismissing Petr's reference to -std=c99 with "it can't affect anything".

Oddly enough, it *does* affect something: the output of the compiled binary, for one (with stock glibc). Without -std=c99: out: 0x0p-15234. With it: out: 0xa.8p+2.

And a look at strtold(3) shows where the hell that suggestion had come from:

Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

strtof(), strtold(): _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE; or cc -std=c99

Experiment with -D_ISOC99_SOURCE or -D_XOPEN_SOURCE=600 shows the same output as -std=c99. And that output looks a lot saner (10.5 * 2 * 2) than the crap produced without any of those.
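A minimal sketch of the fix the man page points at, assuming the garbage comes from strtold() not being declared. That reading is plausible but not confirmed in this thread: a compiler in pre-C99 mode with no prototype in scope implicitly declares strtold() as returning int, so the long double result is lost. Defining _ISOC99_SOURCE before the first #include makes glibc's <stdlib.h> declare it. parse_long_double() is a hypothetical helper for illustration.

```c
/* Must precede every #include: glibc's <features.h> reads it once. */
#define _ISOC99_SOURCE
#include <stdlib.h>

/* Hypothetical helper: parse text as a long double.
 * Returns 0 and stores the value in *out on success, or -1 if no
 * number could be parsed at the start of text. */
int parse_long_double(const char *text, long double *out)
{
    char *end;
    long double v = strtold(text, &end);   /* prototype now in scope */
    if (end == text)
        return -1;                          /* no digits consumed */
    *out = v;
    return 0;
}
```

Printing the parsed value with printf("%La\n", v) shows the hexadecimal form discussed above. The exact digits depend on the platform's long double format, so no particular output is claimed here.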

I can't be arsed to dig through the macro hell in glibc headers, but that looks suspiciously like compat with some pre-c99 GNUism. Don't know, don't care and _really_ don't want to debug that shite. However, I'm quite certain that by this point you see the reasons for LART as well as I do. For the benefit of the other folks:

1) failure to RTFM
2) dismissing relevant "did you have $FOO in arguments?" with "oh, it can't matter at all"
3) failure to RTFM even after that (if nothing else, to see WTF had that question been about)
4) reporting nasal demons as a security hole, instead of (if I've parsed that bug report correctly) some crap somewhere in the clusterfsck of makefiles around the glibc testsuite, either present in the original or introduced by yourself.
5) referring to the entire sad story as evidence of a security hole found by the proposed patch.

Nix, you *do* know better. The above is more than enough for you to construct quite a lovely verbal clue-by-four of your own, so I'll happily leave that as an exercise for the reader.

Debian switching to EGLIBC

Posted May 7, 2009 13:50 UTC (Thu) by viro (subscriber, #7872) [Link] (1 responses)

Gyah... And I should've known better than replying while low on caffeine.
My apologies - I've misparsed your reply to Petr, so the idiocy in this case is sure as hell mine.

Debian switching to EGLIBC

Posted May 7, 2009 14:38 UTC (Thu) by nix (subscriber, #2304) [Link]

The problem in this case is really that my sentence construction is so baroque that nobody else can understand it, sort of like the glibc makefiles. :)

(And absence of caffeine excuses all!)

Debian switching to EGLIBC

Posted May 7, 2009 13:58 UTC (Thu) by k8to (guest, #15413) [Link]

Uhm, what? None of that is in the bug entry.

Debian switching to EGLIBC

Posted May 7, 2009 5:16 UTC (Thu) by PaulWay (subscriber, #45600) [Link] (8 responses)

> Any sane cost function weighs between costs and benefits. If this guy
> was worth, say, 100 of your other programmers, perhaps you should
> tolerate up to 100 times more rudeness in compensation and still count
> yourself coming ahead.

There are three fallacies with this approach that make it wrong. The first one is the idea that there's some magic number that you can derive that gives you the worth of a person and eir contribution to a specific project or goal. There's a whole sociology course of reasons there, but a simpler one might be that it gets down to the 'mythical man-month' problem - you can't just hire another 100 programmers and make the project go 100 times faster, or even as fast.

Secondly, you're biasing our reactions by picking a large number. If Ulrich is worth 1.1 other programmers, what do you do?

However, what it really comes down to for me is that there are a bunch of groups - programmers, Red Hat, open source enthusiasts - who should just say "no" to antisocial behaviour. Calling people idiots or defining their work as a waste of time should never be excusable. If Ulrich was being sexist, defamatory or racist, would that still be OK (even if he was worth 100 other programmers)? I personally say no.

When we implicitly or explicitly excuse this kind of obnoxious behaviour, we make it acceptable. That to me is just not on. We can hardly say "show me the code" if we then scare everyone off who tries to contribute code. There are plenty of ways to deal with the bug reports that the EGLIBC people link to that would have been polite and helpful but still firm in his opinion and would have taken even less time to do than Ulrich expended on his inflammatory outbursts. We should learn better ways to interact, rather than excuse bad behaviour.

Have fun,

Paul

Debian switching to EGLIBC

Posted May 7, 2009 10:06 UTC (Thu) by NAR (subscriber, #1313) [Link] (5 responses)

I think software development is unique in the sense that there actually are people who are 10 times as productive as some of their peers, even in the same workgroup. So it's probably worth employing a couple of other people to shield the "guru" from the users. Don't forget that people actually have the right to be sexist, defamatory or racist.

Character flaws

Posted May 7, 2009 21:55 UTC (Thu) by man_ls (guest, #15091) [Link] (1 responses)

I still think that PaulWay is right about the first fallacy, but maybe not the way he intended. Sometimes an individual's worth cannot be measured as 100 normal fellows, because sometimes he or she can enable things which were not even feasible before. There is no replacement even with hordes of developers. Think about Science and the contributions of great individuals -- the kind of leap of a Pasteur or a Newton cannot be done by a thousand clever fellows. Engineering is sometimes the same, as is software engineering.

About character flaws, I think you are quite right. In some contexts, buffering the guru is an effective way of dealing with an ill-mannered engineer.

Character flaws

Posted May 8, 2009 1:55 UTC (Fri) by PaulWay (subscriber, #45600) [Link]

You raise a great point! I think that's another good reason why saying "well, he's a good coder, so he's allowed to annoy other people" is just wrong.

Have fun,

Paul

Debian switching to EGLIBC

Posted May 8, 2009 1:54 UTC (Fri) by PaulWay (subscriber, #45600) [Link] (1 responses)

> Don't forget that people actually have the right to be sexist, defamatory or racist.

Yeah, right. Try being sexist, defamatory or racist in the workplace and see how your contract covers you. Try being sexist, defamatory or racist on a public internet forum and see how the forum's or your ISP's conditions of use work. Try getting up in a public place and being sexist, defamatory or racist and see how you go with the police.

People have the right to their own thoughts, which I think is where you're coming from. But public behaviour is what's in question here. And I'll bet you a beer that Ulrich's contract with Red Hat allows them to fire him for being sexist, defamatory or racist in public.

And even if there is some specific condition where being sexist, defamatory or racist is allowed, I still reckon it's bad behaviour. I don't care if someone is a thousand times better at coding than anyone else: being rude and obnoxious to other people is bad for that person, for the projects they work on and for the community they're involved in.

So to me it seems pretty counterproductive to hide behind some technical legal argument to excuse behaviour which, as I said originally, is completely unnecessary and caused Ulrich more hassle rather than less.

Have fun,

Paul

Don't be so sure

Posted May 8, 2009 18:08 UTC (Fri) by khim (subscriber, #9252) [Link]

Yeah, right. Try being sexist, defamatory or racist in the workplace and see how your contract covers you.

If you are working for a US company, it'll be written in your contract. In other countries it's a free right (some European countries are emulating the US, but to a smaller degree). You can be fired if you disrupt the team, but just for being sexist or racist? Puhlease. When our company was bought by a US one, we had special sessions to explain exactly which natural things (like sexist jokes or jokes about other minorities) we shouldn't do from then on. Everyone agreed that it's idiotic, but hey, these guys are paying me, so they set the rules.

Try being sexist, defamatory or racist on a public internet forum and see how the forum's or your ISP's conditions of use work.

You're probably visiting totally different forums from me, because I see sexist, defamatory and racist remarks quite often. And not just from trolls. Again, that's if we are not talking about the US, where (as Heinlein noted half a century ago) everyone is so proud to point out that they have absolutely nothing against your skin color, face or sexual orientation...

Try getting up in a public place and being sexist, defamatory or racist and see how you go with the police.

For being sexist? That's a joke. For being defamatory or racist... the possibility is there, but you need to spend A LOT of effort to reach that point.

Right of being defamatory?

Posted May 11, 2009 11:01 UTC (Mon) by incase (guest, #37115) [Link]

Honestly, your freedom of speech ends where you hurt the dignity of others. You can be an asshole if you want to, but you should not insult others (in fact, insulting rather than criticizing is a legal offense here).

In Germany, the topmost right our constitution grants is dignity. Freedom of thought, religion and speech is also in there, of course, but it ranks below dignity and some other things, like the prohibition of discrimination because of race, sex/gender, religion and other things, and your right not to be injured in your physical or mental well-being, etc.
And this is how it ought to be in my opinion.

regards,
Sven

Debian switching to EGLIBC

Posted May 8, 2009 10:46 UTC (Fri) by alankila (guest, #47141) [Link] (1 responses)

"The first one is the idea that there's some magic number that you can derive that gives you the worth of a person"

Fair enough. It was a sketch of an argument. I picked a large, round number to make it obvious that I was not serious. What do you think 100 times more rude even means? It doesn't really mean anything.

"Secondly, you're biasing our reactions by picking a large number. If Ulrich is worth 1.1 other programmers, what do you do?"

In the argument, he is allowed to be up to 1.1 times more rude to compensate for technical prowess.

"Calling people idiots or defining their work as a waste of time should never be excusable."

I'll pick on the word "never". Let's take it literally. If a person is an idiot, and the work is a waste of time, perhaps it would be a good idea to call it such? I don't see what denying reality gets you, other than blinders that cause you to make mistakes.

Debian switching to EGLIBC

Posted May 8, 2009 10:53 UTC (Fri) by alankila (guest, #47141) [Link]

I already regret the last paragraph. Let's just say that I agree that it's good to be a polite, reasonable person. I just am not so sure that hard-and-fast rules such as "it's never OK to be rude" are reasonable. I see the matter thus: there is a palette of expressions, and every one of these may have its uses.

To use this palette skillfully is another matter entirely: it's horribly crude to use the wrong kind of expression, and it doesn't reflect well on the speaker. Communication is, after all, a process where you need to make yourself understood by another. Whether I just transgressed remains to be seen...

Debian switching to EGLIBC

Posted May 6, 2009 14:28 UTC (Wed) by BrucePerens (guest, #2510) [Link] (5 responses)

This sounds like the X fork away from David Dawes. I can't believe that it's the embedded nature of the project that is attractive.

Debian switching to EGLIBC

Posted May 6, 2009 16:19 UTC (Wed) by nix (subscriber, #2304) [Link]

False comparison. David Dawes was a thousand times friendlier to random
third-party contributors. :/

Debian switching to EGLIBC

Posted May 6, 2009 17:06 UTC (Wed) by ajross (guest, #4563) [Link] (2 responses)

It seems like a better analogy might be the egcs project. The X.org fork was a revolt against the organization that owned the copyrights, and involved a ton of bureaucratic mess in addition to the code changes. This, like egcs, seems more like an attempt to fork away from the existing maintenance structure while still working underneath the FSF umbrella. I'm sure their hope is that, after a few more releases and distro conversions, eventually EGLIBC just gets blessed as the "official" branch and we all forget about the old one.

Debian switching to EGLIBC

Posted May 6, 2009 19:57 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

eglibc even starts with the same letters as egcs, and includes *three* of
the same letters.

It is fated.

:)

Debian switching to EGLIBC

Posted May 6, 2009 20:57 UTC (Wed) by njs (subscriber, #40338) [Link]

Perhaps more to the point, it appears to involve some of the same people (eglibc development is organized by CodeSourcery; I suspect Mark Mitchell is familiar with gcc's history), and has the same copyright assignment rules.

Debian switching to EGLIBC

Posted May 6, 2009 17:07 UTC (Wed) by herodiade (guest, #52755) [Link]

> I can't believe that it's the embedded nature of the project that is attractive.

For other users, maybe not (there may be some attraction in avoiding having to deal with such an upstream, or just in shaking things up). But for the Debian project, Ulrich Drepper's unwillingness to cooperate on basic embedded needs is a real problem.

So much that the DPL (Debian Project Leader) had to (unsuccessfully) ask the "GNU C Library Steering Committee" for help/mediation on this topic.

http://lists.debian.org/debian-glibc/2007/10/msg00038.html

Had to happen

Posted May 6, 2009 14:34 UTC (Wed) by dcg (subscriber, #9198) [Link]

I think a lot of people suspected that at some point something like this had to happen. I mean, it's not as if those bugzilla links are exceptions, and everybody knew it. It's like what happened with XFree86 or with some BSDs: bad maintainers make it impossible to contribute useful code and slow down the progress of the project (ask the people who needed to use Google's tcmalloc for years to get decent performance in many scenarios), so at some point either a fork or something like this had to happen.

Debian switching to EGLIBC

Posted May 6, 2009 14:35 UTC (Wed) by alex (subscriber, #1355) [Link] (8 responses)

I had no idea things were so bad in mainline that Debian would consider switching libc implementations. Some of the bugs linked in the article do seem to have very curt replies (although this could be a biased selection).

Still, forks live or die by their user base and the people using them. I do have to wonder why, for the love of ${DEITY}, the new fork is hosted in CVS. It is the 21st century, you know!

Debian switching to EGLIBC

Posted May 6, 2009 14:38 UTC (Wed) by mbanck (subscriber, #9035) [Link] (5 responses)

I do have to wonder why for the love of ${DEITY} the new fork is hosted in CVS. It is the 21st century you know!

No idea where you got that idea from but this page says it is a subversion repository.

Likely they will follow suit and switch to git as well after the glibc git switch has been finished.

Michael

SVN/CVS

Posted May 6, 2009 15:31 UTC (Wed) by alex (subscriber, #1355) [Link]

Ahh, I was thrown by the viewcvs.cgi in the repository browser URL.

Hopefully when glibc goes git it will make maintenance a lot easier. Decent DVCS semantics make maintaining patch sets against a baseline so much easier.

Debian switching to EGLIBC

Posted May 6, 2009 16:02 UTC (Wed) by dwmw2 (subscriber, #2063) [Link] (3 responses)

I do have to wonder why for the love of ${DEITY} the new fork is hosted in CVS. It is the 21st century you know!
No idea where you got that idea from but this page says it is a subversion repository.
Ew, that's even worse. CVS may be a throwback to the previous century, but SVN never had any justification for existing at all. It was never enough of an improvement over CVS to be worth the difference, and it's verging on insane to use it for new projects these days instead of git.

Debian switching to EGLIBC

Posted May 6, 2009 16:31 UTC (Wed) by mbanck (subscriber, #9035) [Link]

EGLIBC is two years old; it is just Debian that recently decided to switch. Again, glibc is currently (as in, yesterday, today) switching to git, and it doesn't make sense to discuss EGLIBC's SCM right now, before they have or have not decided to follow the switch (I guess they will follow sooner or later, but who knows).

Michael

Debian switching to EGLIBC

Posted May 6, 2009 16:45 UTC (Wed) by epa (subscriber, #39769) [Link]

One argument for svn over cvs is that git-svn exists, but there is no equally straightforward way to do git-cvs...

Debian switching to EGLIBC

Posted May 6, 2009 19:50 UTC (Wed) by brouhaha (subscriber, #1698) [Link]

[SVN] was never enough of an improvement over CVS to be worth the difference,

Having used both extensively, I must strongly disagree. SVN is a huge improvement over CVS. The main potential issue with SVN is that it uses the central repository model. For many projects there's no problem with that; it really depends on how the project is organized. I have no idea whether it makes sense for eglibc.

Debian switching to EGLIBC

Posted May 6, 2009 14:41 UTC (Wed) by mbanck (subscriber, #9035) [Link]

I had no idea things where so bad in mainline for Debian to consider switching libc implementations.

Eglibc is not a different implementation; it is basically a branch of glibc which is regularly synced and which you can configure in fine detail. Plus, its maintainers say they will be friendlier towards users, etc.

Michael

Debian switching to EGLIBC

Posted May 6, 2009 14:47 UTC (Wed) by maks (guest, #32426) [Link]

glibc is still hosted in cvs. 2.10 will switch to git:
http://sourceware.org/ml/libc-alpha/2009-04/msg00034.html

guess eglibc will follow asap.

Debian switching to EGLIBC

Posted May 6, 2009 16:06 UTC (Wed) by kjp (guest, #39639) [Link] (42 responses)

I wish the site had some numbers showing disk space savings when various features are turned off. I'd love to kill off locales and encoding. My hat's always off to people fighting software dependency bloat hell.

*glares at selinux infecting everything in fedora*

Debian switching to EGLIBC

Posted May 6, 2009 16:23 UTC (Wed) by nix (subscriber, #2304) [Link] (36 responses)

Kill off locales? How can you do anything useful in this day and age
without UTF-8?

(I resisted it for years, but it makes *so* many things so much easier.)

Debian switching to EGLIBC

Posted May 6, 2009 16:46 UTC (Wed) by ajross (guest, #4563) [Link] (25 responses)

And what exactly do you need to do with UTF-8 for which you need locale support? I mean, the whole point of UTF-8 is that it works great, unchanged, with code that assumes LANG=C. If we'd picked it from the start, we'd never have had "locale" (well, the wide/multibyte character APIs, which I assume is what you mean) support in the C library to begin with.

Debian switching to EGLIBC

Posted May 6, 2009 19:50 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

There's a lot more to locale support than the wchar stuff. In fact the
wchar stuff is decidedly minority and doesn't use more than a few tens of
Kb.

Most of the space consumption is charmaps (which we can't do without, and
the largest ones are Far Eastern and Unicode), converters to/from UTF-8
(obviously necessary if we want to handle other encodings at *all*) and
timezones (which we can't do without, although it would be nice if glibc
contained an interface to let us use the historical data in them properly:
Ulrich has explicitly (and bluntly) ruled this out without giving a
rationale. As usual. Maybe eglibc can add an appropriate interface.)
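The converters mentioned here are what glibc's iconv(3) interface exposes. The following is a minimal sketch of converting ISO-8859-1 (Latin-1) text to UTF-8 with it. latin1_to_utf8() is a hypothetical helper, and the fixed-size output buffer is a simplification: a real caller would grow the buffer on E2BIG.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to UTF-8.
 * Writes at most outsize-1 bytes plus a terminating NUL. Returns 0 on
 * success, or -1 if the converter is unavailable, the input is invalid,
 * or the output does not fit. */
int latin1_to_utf8(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");   /* to, from */
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in;           /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;     /* reserve room for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                    /* e.g. E2BIG: buffer too small */
    *outp = '\0';
    return 0;
}
```

The Latin-1 byte 0xE9 ("é") becomes the two UTF-8 bytes 0xC3 0xA9, so "café" grows from 4 to 5 bytes.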

Debian switching to EGLIBC

Posted May 8, 2009 1:56 UTC (Fri) by spitzak (guest, #4593) [Link]

I would be extremely happy if I could rely on printf not changing the periods to commas when I don't expect it, and strcmp not changing. I have to force the C locale at startup just so our software can write files that I can be sure can be read.

Any intelligent person would have just made a different %-command. It really is not very hard for a programmer to choose what to do based on the locale. The C library, if it provides anything at all, should only provide a "this is the locale" call. It should have ZERO effect on the behavior of any functions that do not actually take a locale as an argument.
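Besides calling setlocale(LC_NUMERIC, "C") globally, POSIX.1-2008 offers a per-thread override: newlocale() builds a locale object and uselocale() switches the calling thread to it temporarily. The following is a minimal sketch, with format_c_locale() as a hypothetical helper, that always formats with "." as the decimal point, whatever locale the rest of the program selected.

```c
#define _POSIX_C_SOURCE 200809L   /* newlocale()/uselocale() are POSIX.1-2008 */
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format v with "%.2f" under the "C" locale.
 * Returns snprintf()'s result, or -1 if the locale object could not
 * be created. Only the calling thread's locale is changed, and only
 * for the duration of the call. */
int format_c_locale(char *buf, size_t size, double v)
{
    locale_t c = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c == (locale_t)0)
        return -1;
    locale_t old = uselocale(c);      /* switch this thread to "C" */
    int n = snprintf(buf, size, "%.2f", v);
    uselocale(old);                   /* restore the previous locale */
    freelocale(c);
    return n;
}
```

With this approach, the user's locale can still drive messages and collation, while file formats stay locale-independent.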

Locales and UTF-8

Posted May 6, 2009 20:47 UTC (Wed) by rfunk (subscriber, #4054) [Link] (18 responses)

You still need wide/multibyte character APIs with UTF-8. A UTF-8 character
can be up to four bytes. It's only the old ASCII characters that are still
only one byte in UTF-8; the Latin-1 extensions, for example, are two
bytes. Therefore the old APIs that assume 1 byte = 1 character are not so
useful with UTF-8.

Locales and UTF-8

Posted May 6, 2009 21:35 UTC (Wed) by nix (subscriber, #2304) [Link]

What you *don't* need with UTF-8 is the misbegotten horror that is wchar_t
and all the interfaces to deal with that.

Locales and UTF-8

Posted May 8, 2009 1:54 UTC (Fri) by spitzak (guest, #4593) [Link] (16 responses)

Anybody who thinks strlen(utf8) should return anything other than the number of bytes in the string does not know what they are talking about. Sorry.

UTF-8 is TRIVIAL if people would just WAKE UP and realize that it *is* trivial. The ONLY people who care where character boundaries are is people writing low-level rendering routines that have to look up font glyphs.

But for some reason the fact that a byte array represents a series of characters causes otherwise intelligent programmers to turn into complete morons. It suddenly becomes IMPOSSIBLE to work with the bytes, just because of the type of data in the string!

Here is a thought experiment: why in the world are we capable of making files containing English text when all the *words* are different sizes! Why it must be impossible! Counting words will be so slow and inefficient! How could the programs ever work?
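Counting code points instead of bytes needs no decoding at all. Every code point starts with exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so counting the non-continuation bytes is enough. The following is a minimal sketch, using a hypothetical utf8_count_codepoints() helper and assuming well-formed UTF-8 input. Note that this still counts code points, not user-perceived characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of code points in a NUL-terminated,
 * well-formed UTF-8 string. strlen() on the same string still
 * returns the number of bytes. */
size_t utf8_count_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        if ((*p & 0xC0) != 0x80)      /* skip continuation bytes */
            ++n;
    return n;
}
```

For "naïve", strlen() returns 6 because "ï" takes two bytes, while utf8_count_codepoints() returns 5.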

Locales and UTF-8

Posted May 8, 2009 13:55 UTC (Fri) by nix (subscriber, #2304) [Link] (15 responses)

The only people who care where character boundaries are in a UTF-8
string are people writing routines taking textual input, routines producing
textual output, routines modifying text strings, routines manipulating
text strings in *any* way that depends on anything a human would care
about. I can see how this could be considered rare.

Touching individual bytes in a unicode string outside of something like
serialization makes as much sense as touching individual bits in it does
(except of course that you have to touch both in order to convert the
UTF-8 into actual Unicode code points and back).

This is all library stuff, yes, sure... except when it isn't.
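Converting UTF-8 back into code points is the step where the byte-level details matter most. The following is a minimal sketch of a strict decoder, using a hypothetical utf8_decode() helper. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates and values above U+10FFFF; a production decoder might instead substitute U+FFFD and continue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point starting at s[0], where
 * len bytes are available. On success stores it in *cp and returns the
 * sequence length (1-4); returns 0 for malformed input. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;
    unsigned char b = s[0];
    size_t need;
    uint32_t v, min;
    if (b < 0x80) {                   /* ASCII */
        *cp = b;
        return 1;
    }
    if ((b & 0xE0) == 0xC0) {
        need = 2; v = b & 0x1F; min = 0x80;
    } else if ((b & 0xF0) == 0xE0) {
        need = 3; v = b & 0x0F; min = 0x800;
    } else if ((b & 0xF8) == 0xF0) {
        need = 4; v = b & 0x07; min = 0x10000;
    } else {
        return 0;                     /* stray continuation byte or 0xF8..0xFF */
    }
    if (len < need)
        return 0;                     /* truncated sequence */
    for (size_t i = 1; i < need; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                 /* expected a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;                     /* overlong, out of range, or surrogate */
    *cp = v;
    return need;
}
```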

Locales and UTF-8

Posted May 8, 2009 15:18 UTC (Fri) by endecotp (guest, #36428) [Link] (11 responses)

nix, maybe you're overlooking some of the helpful properties of UTF-8 that make many, though of course not all, of those textual operations work if you treat it as a byte stream? In my experience, programmers often go to lengths to process a UTF-8 string character-by-character when doing so byte-by-byte would be just as correct, simpler to code and faster.

For example, if I'm parsing a UTF-8 CSV file into rows and columns, I can treat it as a byte stream, since the punctuation characters (e.g. the comma, double quote, backslash and newline) are all single bytes, and those bytes are guaranteed not to occur in multi-byte characters.

As another example, I can search-and-replace one character sequence with another character sequence by treating the text, pattern and replacement as byte sequences - even if there are multibyte characters in the text, pattern, or replacement.

My experience is that the only places where UTF-8 cannot be treated as byte streams are: GUI and similar I/O, sorting and case conversion when the result needs to look right for a human, and interfaces that specify an encoding other than UTF-8.
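The CSV point can be shown directly. In UTF-8, every byte of a multibyte sequence has its high bit set, so an ASCII comma found by a plain byte scan is always a real comma. The following is a minimal sketch using a hypothetical split_commas() helper. It deliberately ignores CSV quoting, which is a simplification, not part of the argument above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a UTF-8 line on ',' by scanning bytes.
 * Quoted fields are NOT handled. Each ',' is replaced with NUL, up to
 * max field start pointers are stored in fields[], and the total
 * number of fields is returned. */
size_t split_commas(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *start = line;
    for (;;) {
        char *comma = strchr(start, ',');   /* byte search is UTF-8 safe */
        if (n < max)
            fields[n] = start;
        ++n;
        if (comma == NULL)
            break;
        *comma = '\0';
        start = comma + 1;
    }
    return n;
}
```

The same reasoning covers the search-and-replace example: strstr() with a UTF-8 needle cannot match starting in the middle of a character, because a lead byte never looks like a continuation byte.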

Locales and UTF-8

Posted May 8, 2009 16:56 UTC (Fri) by spitzak (guest, #4593) [Link]

Thank you for a breath of fresh air! Somebody who gets it!

I do not understand why so many otherwise intelligent and experienced software engineers turn into such complete morons when they think about UTF-8.

Even more annoying: programmers do not seem to have this mental block when presented with the older multibyte Asian encodings, or with UTF-16 which is variable length as well. For some reason people only assign these made-up problems to UTF-8.

Locales and UTF-8

Posted May 8, 2009 17:28 UTC (Fri) by nix (subscriber, #2304) [Link] (5 responses)

Um, not all the punctuation characters are single bytes. A huge variety of
punctuation is up above U+2000, for instance, including U+2010 (the
hyphen) and U+2003 (the em space). Helpfully this is somewhat jumbled up
with nonpunctuation stuff like numeric superscripts.

The rest of your points stand: if all you want to do is manipulate ASCII
characters in a UTF-8 stream, you can do that without being Unicode-aware
at all. But this will tend to annoy your users when they type in a € and
find that your program can't manipulate it because it's U+20AC. It'll
annoy your users even more to find that they can remove some characters,
but that others take several keystrokes to remove and miraculously
transmogrify into other characters as they do so. (More mess: the Euro
cent sign is U+00A2!)

I suppose misbehaviour from this change is unlikely *if* you're in the US.
Anywhere else? Bite your knuckles.

Locales and UTF-8

Posted May 8, 2009 17:35 UTC (Fri) by ajross (guest, #4563) [Link] (2 responses)

You're simultaneously overstating the complexity of this problem and the ability of the ANSI C locale facility to solve it.

The product I work on for my day job does natural language processing of internet content in arbitrary languages and encodings. I did the encoding transformation and "word breaker" lexical analyzer for it. The whole system works by transforming the data into UTF-8 and operating on it at the byte level. So sorry to pull the "domain expert" card here, but you're basically just wrong. This stuff has its subtleties, but it's absolutely not something that requires special API support. And if we *had* to pick an API, I can guarantee you it wouldn't be ANSI C's locale stuff, which is a complete non-starter for many of the reasons already detailed.

Locales and UTF-8

Posted May 8, 2009 18:47 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)

I certainly don't think the ANSI C locale facility solves everything (or
even *much*, it's pretty nasty). And, as I said, it'll be interesting to
see what breaks. (I suspect not much will: most things that need to be
*are* Unicode-aware, on Debian at least. But it might get hair-raising.)

-- N., just wasted three months auditing and fixing countless places in a
horrible financial application to allow for UTF-8 awareness (the simplest
example: lots of places in that software cared if something
was 'alphanumeric', for instance, and isalpha() really doesn't work). It
could have been worse: before I came along they were planning to move to
UCS-2, hark at the forward planning and lovely C-compatibility...

Locales and UTF-8

Posted May 8, 2009 21:55 UTC (Fri) by spitzak (guest, #4593) [Link]

Yes, isalpha() and ctype are one thing that should be fixed. There are only 3 types of byte with the high bit set:

1. bytes that are not allowed in UTF-8.
2. "second" bytes
3. "first" bytes

I think first & second bytes should pass the isalpha() test. This will allow UTF-8 letters to be put into identifiers and keywords (of course it also allows UTF-8 punctuation and lots of other stuff but that is about the best that can be done). I also think ctype should not vary depending on locale, this is another thing that causes me nothing but trouble, most programmers revert to doing ">='a' && <='z'" and thus make their software even less portable.

Probably the ctype tables should add some bits to identify these byte types.
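The three kinds of high-bit byte can be told apart from the byte value alone. A sketch, assuming the byte ranges of RFC 3629; the enum and function names are hypothetical and are not part of <ctype.h>. The second function illustrates the proposal above (treat every byte of a multi-byte sequence as "alphabetic") with a locale-independent ASCII test:

```c
/* Hypothetical classification of a single byte, following the list above.
 * Ranges per RFC 3629. */
enum utf8_byte_kind {
    UTF8_ASCII,        /* 0x00-0x7F: an ordinary single-byte character */
    UTF8_CONTINUATION, /* 0x80-0xBF: a "second" (trailing) byte */
    UTF8_LEAD,         /* 0xC2-0xF4: a "first" byte of a multi-byte sequence */
    UTF8_INVALID       /* 0xC0, 0xC1, 0xF5-0xFF: never valid in UTF-8 */
};

enum utf8_byte_kind utf8_classify(unsigned char b)
{
    if (b < 0x80)
        return UTF8_ASCII;
    if (b < 0xC0)
        return UTF8_CONTINUATION;
    if (b >= 0xC2 && b <= 0xF4)
        return UTF8_LEAD;
    return UTF8_INVALID;
}

/* Locale-independent "isalpha" for identifiers: ASCII letters, plus any
 * lead or continuation byte (which also admits UTF-8 punctuation). */
int utf8_isalpha_byte(unsigned char b)
{
    enum utf8_byte_kind k = utf8_classify(b);
    if (k == UTF8_ASCII)
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
    return k == UTF8_LEAD || k == UTF8_CONTINUATION;
}
```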

Locales and UTF-8

Posted May 8, 2009 21:48 UTC (Fri) by spitzak (guest, #4593) [Link]

Actually the Euro is U+20AC. It is 0x80 in the CP1252 encoding used by Microsoft, but not in official Unicode. However, I do think the Unicode standard should just realize that CP1252 is really common and change the characters 0x80-0x9F to what it defines.

I do hope a program trying to parse for a period only looks for the ASCII period. As soon as you start saying other Unicode characters are "equivalent" then you get a huge mess because different programs may disagree on what is in the equivalent set, and Unicode could add a new character at any time. We already have quite a mess with newlines, let's not make it worse! The only software that should be looking for Unicode punctuation is actual glyph layout and rendering.

Locales and UTF-8

Posted May 11, 2009 16:01 UTC (Mon) by endecotp (guest, #36428) [Link]

> not all the punctuation characters are single bytes

I was referring to the punctuation characters used to delimit CSV, which are all ASCII characters (as are those used in XML).

> The rest of your points stand: if all you want to do is manipulate
> ASCII characters in a UTF-8 stream, you can do that without being
> Unicode-aware

My points were that you can do all of those things (e.g. search and replace) EVEN IF the input is non-ASCII.

Your example of delete key behaviour is an interesting one that comes under my category of "GUI and similar I/O". It is clearly necessary to delete back as far as the last character-starting byte. Doing so is not very hard.
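Deleting back to the last character-starting byte can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes the buffer holds UTF-8 and does not validate it, it only caps the step-back at three continuation bytes (the most one UTF-8 character can have):

```c
#include <stddef.h>

/* Given buf[0..len) holding UTF-8, return the new length after deleting
 * the last character: step back over continuation bytes (10xxxxxx) until
 * a byte that can start a character is reached. */
size_t utf8_backspace(const char *buf, size_t len)
{
    if (len == 0)
        return 0;
    size_t pos = len - 1;
    int steps = 0;
    while (pos > 0 && steps < 3 && ((unsigned char)buf[pos] & 0xC0) == 0x80) {
        pos--;
        steps++;
    }
    return pos;
}
```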

> I suppose misbehaviour from this change is unlikely *if* you're in
> the US. Anywhere else? Bite your knuckles.

I am not in the U.S., and my code works with UTF-8 without the sort of major headaches that you allude to.

Locales and UTF-8

Posted May 10, 2009 14:46 UTC (Sun) by epa (subscriber, #39769) [Link] (3 responses)

> For example, if I'm parsing a UTF-8 CSV file into rows and columns then I can treat it as a byte stream, since the punctuation characters (e.g. comma, double quote and newline) are all single bytes and those bytes are guaranteed not to occur in multi-byte characters.
This is true if you know that your input is valid UTF-8. However if it might be malformed, then your program could end up splitting a row in the middle of an (invalid) character sequence and producing different invalid sequences as output. This is often fine: garbage in, garbage out. But there can be interesting security holes where malformed UTF-8 is treated differently by different code. Luckily, checking for valid UTF-8 is a fast operation, so there is no reason not to check every string that comes from the user before doing anything with it - even if the processing you do is just treating it as a byte stream.
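Checking for valid UTF-8 is indeed a single linear pass. A sketch of a validator, assuming the well-formedness rules of RFC 3629 (no overlong forms, no UTF-16 surrogates U+D800-U+DFFF, nothing above U+10FFFF); the function name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if s[0..len) is well-formed UTF-8 per RFC 3629. */
bool utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n = 0;                       /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (c < 0x80) {                     /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2;
            if (c == 0xE0) lo = 0xA0;       /* reject overlong forms */
            else if (c == 0xED) hi = 0x9F;  /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3;
            if (c == 0xF0) lo = 0x90;       /* reject overlong forms */
            else if (c == 0xF4) hi = 0x8F;  /* reject > U+10FFFF */
        } else {
            return false;                   /* 0x80-0xC1 or 0xF5-0xFF */
        }

        if (len - i <= n)
            return false;                   /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += n + 1;
    }
    return true;
}
```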

Locales and UTF-8

Posted May 11, 2009 16:06 UTC (Mon) by ajross (guest, #4563) [Link] (1 responses)

Everything you say is true of ASCII too. You have to validate untrusted input, regardless of what it is. ASCII doesn't have the high bit set, but any ASCII format is by necessity going to have escaping mechanisms that need equivalent validation. For this specific example, you *are* counting your matching quote characters, right? Everything is an "encoding" at some level.

Avoiding UTF-8 in the blind expectation that it somehow makes your code more "secure" is just plain wrong. This kind of mistake is exactly what I'm talking about. People attribute to encoding transformation and I18N all sorts of complexities that aren't actually there in practice.

Locales and UTF-8

Posted May 19, 2009 9:18 UTC (Tue) by epa (subscriber, #39769) [Link]

I agree that using some hacky alternative instead of UTF-8 will not improve security. Nothing I wrote should be taken as a reason to avoid UTF-8. (Though it's not true that you *always* have to include escaping mechanisms for ASCII input - some file formats such as /etc/passwd can get away with being completely stupid and not supporting escaping or accented characters at all.)

Locales and UTF-8

Posted May 11, 2009 17:27 UTC (Mon) by spitzak (guest, #4593) [Link]

Invalid UTF-8 is not a problem. In fact one HUGE advantage of working with UTF-8 is that you can defer invalid UTF-8 until display, where it can safely be changed into the matching CP1252 glyph or whatever is needed to provide the user with a readable result so they can figure out what went wrong. Converting earlier can result in security and other errors.

Errors in UTF-8 should be treated as single byte entities. Four four-byte prefixes in a row are 4 errors, not a single 4-byte error. You can't split an error if it is only one byte long.

This also means that ASCII characters cannot be "inside an error" so that errors have zero effect on programs that are looking for ASCII only.

It also means it is impossible to make a pointer "inside" an error or to split one. It is also vital to treat errors this way (even if converting to other encodings) so that concatenation to a string ending in an error cannot convert a good character at the start of the next string into an error.
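A decoder that follows the "each error is one byte" rule might look like the sketch below (hypothetical name, RFC 3629 rules assumed). On any malformed input it consumes exactly one byte, so an ASCII byte following a broken sequence is never swallowed:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character from s[0..len), len > 0. On success, return the
 * code point and set *used to the sequence length. On error, return -1
 * and set *used to 1, so every bad byte is its own error. */
int32_t utf8_next(const unsigned char *s, size_t len, size_t *used)
{
    unsigned char c = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
    size_t n;
    int32_t cp;

    *used = 1;
    if (c < 0x80)
        return c;
    if (c >= 0xC2 && c <= 0xDF) {
        n = 1; cp = c & 0x1F;
    } else if (c >= 0xE0 && c <= 0xEF) {
        n = 2; cp = c & 0x0F;
        if (c == 0xE0) lo = 0xA0; else if (c == 0xED) hi = 0x9F;
    } else if (c >= 0xF0 && c <= 0xF4) {
        n = 3; cp = c & 0x07;
        if (c == 0xF0) lo = 0x90; else if (c == 0xF4) hi = 0x8F;
    } else {
        return -1;                      /* byte can never start a character */
    }

    for (size_t k = 1; k <= n; k++) {
        if (k >= len)
            return -1;                  /* truncated: only 1 byte is bad */
        unsigned char b = s[k];
        if (k == 1 ? (b < lo || b > hi) : ((b & 0xC0) != 0x80))
            return -1;
        cp = (cp << 6) | (b & 0x3F);
    }
    *used = n + 1;
    return cp;
}
```

With this rule, four 0xF0 bytes in a row decode as four separate errors, and the caller can render each one however it likes (e.g. as the matching CP1252 glyph).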

Locales and UTF-8

Posted May 8, 2009 16:49 UTC (Fri) by spitzak (guest, #4593) [Link] (2 responses)

Hmm. I seem to be able to cat a UTF-8 file to my UTF-8 terminal and it works perfectly. Yet cat has no concept whatsoever of UTF-8 and quite likely is splitting the text into blocks right in the middle of UTF-8 characters! How is this possible?

UTF-8 is in fact trivial. You are basically doing exactly what I am complaining about: panicking that there is some magical problem with not looking for the character boundaries. Try comparing it to words: how much of a word processor is able to ignore word boundaries? Almost all of it. But that does not somehow make it impossible for word wrap and word deletion to work.

It's not rocket science. The problem is people who are so convinced it is that they complicate things to no end and are hurting I18N and everybody.

Locales and UTF-8

Posted May 8, 2009 16:57 UTC (Fri) by ajross (guest, #4563) [Link] (1 responses)

Amen. I've found the same thing -- developers who have no trouble with complicated algorithms and who have exhaustive knowledge of their platforms at all levels still turn into quivering voodoo practitioners when it comes to I18N stuff. All I can think is that because the data involved contains indecipherable foreign text, they get fooled into thinking the code for handling it must be equally inscrutable.

Really, this stuff is easy once you get used to it.

Locales and UTF-8

Posted May 8, 2009 17:33 UTC (Fri) by nix (subscriber, #2304) [Link]

Note that at no point did I say that it was horribly hard to deal with
streams of UTF-8 chars. It's trivial to decode, and it's just as trivial
to interpose a wrapper so that your strings *appear* to contain single
bytes with arbitrarily large values :) but it does require a bit of extra
work. (I'm just thinking here of how long it took to get zsh's
Unicode-awareness right. Its ZLE wheel-reimplementation of readline was
the trickiest part, which is not surprising.)

Debian switching to EGLIBC

Posted May 6, 2009 23:37 UTC (Wed) by rleigh (guest, #14622) [Link] (3 responses)

You might find the following thread interesting.

http://lists.debian.org/debian-policy/2009/04/msg00018.html

For the various reasons outlined in the text, we are considering
moving the C locale to using UTF-8 rather than US-ASCII as its
locale codeset. This won't be done immediately; we will create
a C.UTF-8 for testing before considering the full switch to default it.

This will give us native UTF-8 end-to-end from source code to
compiled binary to program output and subsequent terminal display.

Regards,
Roger

Debian switching to EGLIBC

Posted May 7, 2009 6:47 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

It'll be fascinating to see what that breaks when someone throws in a
character with the high bit set :) stuff that relies upon the C locale
rarely makes a distinction between bytes and characters, even where it
should... of course, one would hope that not much such software is left.
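The byte/character distinction in one small sketch (hypothetical function name, valid UTF-8 assumed): strlen() counts bytes, while counting every byte that is not a continuation byte counts code points.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated, valid UTF-8 string by counting
 * every byte that is NOT a continuation byte (10xxxxxx). Compare with
 * strlen(), which counts bytes. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "café" encoded in UTF-8, strlen() gives 5 while utf8_strlen() gives 4.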

Debian switching to EGLIBC

Posted May 8, 2009 2:02 UTC (Fri) by spitzak (guest, #4593) [Link] (1 responses)

Nothing will break when a byte has a high bit set, since it will just be copied to the output unchanged.

Don't panic about UTF-8. The biggest problem with it is people who do not understand it; some of them are good enough programmers that they might write very damaging code that actually tries to interpret the UTF-8 encoding.

The only real bug in Unix with UTF-8 is a whole lot of documentation that says "character" where it should say "byte". There is nothing wrong with the current implementations.

Debian switching to EGLIBC

Posted May 8, 2009 13:57 UTC (Fri) by nix (subscriber, #2304) [Link]

I covered this 'nothing will care if you feed UTF-8 to a program expecting
a byte stream' canard in my other response. It's trivially wrong.

Debian switching to EGLIBC

Posted May 6, 2009 18:43 UTC (Wed) by jordanb (guest, #45668) [Link]

Unicode doesn't magically make localization issues disappear.

Programs still need to know what language the user(s) of the system prefer to see responses in, if their radix mark is a ',' or a '.', if they count days using the Gregorian or Chinese calendar, etc.

I agree that the world would be a brighter place if it could be said "all text on disk or streamed on the network MUST be in Unicode's UTF-8 encoding" and then locales could just say "en_US" instead of "en_US.UTF-8", but issues of representation and encoding are only half of the localization problem.

Debian switching to EGLIBC

Posted May 6, 2009 19:18 UTC (Wed) by jreiser (subscriber, #11027) [Link] (5 responses)

/usr/lib/locale/locale-archive is 79MB, and subsetting is not supported actively. Many developers would be overjoyed to have only LANG=C, or to select just the 5 locales that cover 99.99% of the users for their product.

Debian switching to EGLIBC

Posted May 6, 2009 20:15 UTC (Wed) by kleptog (subscriber, #1183) [Link] (2 responses)

That's weird. I've never done anything special with locales and my locale-archive is only 1.3MB. I just selected the locales I wanted during install, they're listed in /etc/locale.gen and they're the only locales in the archive. How did you get your archive to be so large? (Debian BTW)

Debian switching to EGLIBC

Posted May 6, 2009 22:13 UTC (Wed) by vmole (guest, #111) [Link] (1 responses)

Are you using localepurge, by chance? It removes undesired locales after each apt run.

Debian switching to EGLIBC

Posted May 7, 2009 0:48 UTC (Thu) by ABCD (subscriber, #53650) [Link]

localepurge only touches /usr/share/locale and /usr/share/man; /usr/lib/locale/locale-archive is not modified at all by localepurge, so I'm not sure what would cause it to be so large.

Debian switching to EGLIBC

Posted May 6, 2009 20:29 UTC (Wed) by nix (subscriber, #2304) [Link]

Um, subsetting is trivial and documented. Just don't run localedef for
every single locale in the world, only for those you use.

(Debian has a /usr/sbin/locale-gen script and /etc/locale.gen file for
exactly this reason.)

Debian switching to EGLIBC

Posted May 12, 2009 4:59 UTC (Tue) by dirtyepic (guest, #30178) [Link]

1.5M here, consisting of en_US and en_US.UTF-8. subsetting has been a standard part of Gentoo as long as I've used it (2004-ish). we moved to locale-gen from Debian in 2006.

Debian switching to EGLIBC

Posted May 8, 2009 1:45 UTC (Fri) by spitzak (guest, #4593) [Link] (2 responses)

The purpose of turning off locales is so you *can* use UTF-8 and rely on the string functions not doing strange things, such as sorting strings in a non-obvious way.

Locales are an obsolete idea that UTF-8 is intended to solve. (Yes, I know locales also change how some printing is formatted and how strcmp compares, which has caused nothing but grief; a program writer who really wanted that instead of predictable results could do it trivially.)

Debian switching to EGLIBC

Posted May 8, 2009 7:08 UTC (Fri) by anselm (subscriber, #2796) [Link] (1 responses)

With respect, I think that you're oversimplifying things here.

It turns out that »the obvious way« to sort strings doesn't work for many languages other than English, which is precisely one of the reasons the locale concept was invented in the first place. Look up the collating rules for German and Swedish, for example, to see three different ways of collating the »ä« character, none of which corresponds to »the obvious way«. IMHO it does make some sense to put this sort of arcane knowledge into the standard library so that programmers (who are usually not also linguists) do not have to wonder where »ø« goes in the Danish alphabet, and so that a program has half a chance of doing string collation correctly in languages that the original programmers didn't even know existed, let alone catered for in their code. (I'm saying »half a chance« because of the next paragraph.)

Also it turns out that strcmp(3) doesn't, in fact, care about locales at all, so if you use strcmp(3) only in your programs you will not be surprised if the user changes their locale — it's the strcoll(3) function that is supposed to be used for locale-dependent string comparisons. (I do agree with you about the decimal separator issue in printf(3), though.)
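To make the strcmp(3)/strcoll(3) distinction concrete, here is a small sketch using only ISO C setlocale(), strcmp() and strcoll(). The helper name compare_both_ways is hypothetical, and whether a locale such as "sv_SE.UTF-8" or "de_DE.UTF-8" exists depends on which locales have been generated on the system:

```c
#include <locale.h>
#include <string.h>

static int sign(int x) { return (x > 0) - (x < 0); }

/* Compare a and b two ways. *byte_order receives the sign of strcmp(),
 * which ignores the locale; the return value is the sign of strcoll()
 * under the LC_COLLATE rules of `locale`, or 2 if that locale is not
 * installed. setlocale() is process-wide, so the previous LC_COLLATE
 * setting is restored before returning. */
int compare_both_ways(const char *locale, const char *a, const char *b,
                      int *byte_order)
{
    char saved[256];
    const char *cur = setlocale(LC_COLLATE, NULL);
    strncpy(saved, cur ? cur : "C", sizeof saved - 1);
    saved[sizeof saved - 1] = '\0';

    *byte_order = sign(strcmp(a, b));
    if (setlocale(LC_COLLATE, locale) == NULL)
        return 2;                        /* locale unavailable; unchanged */
    int coll = sign(strcoll(a, b));
    setlocale(LC_COLLATE, saved);
    return coll;
}
```

In the "C" locale the two results agree; under a language locale strcoll() applies that language's collation rules and may disagree with the byte order.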

I18N is a difficult issue at best, and it isn't helped by people who try cutting corners. Unicode/ISO-10646 and UTF-8 play an important role in making the problem easier to handle, but they're a fairly low-level part of the grand scheme of things. They're like the wheels on a car — indispensable for a smooth ride, but one would generally still like seats and a steering wheel, too.

Debian switching to EGLIBC

Posted May 8, 2009 16:45 UTC (Fri) by spitzak (guest, #4593) [Link]

The most common reason to sort strings is so that a set can be implemented and identical strings found. It would not matter if the sorting order had nothing to do with english or any language, what does matter is that every program in the world sort the strings in exactly the same way.

You are right that strcmp() does what is wanted. I believe I was remembering some scripting languages where the string comparison changed depending on the locale, which was a nightmare because people rarely test in other locales.

The printf problem is really a pain and forces me to always force the locale to C at startup. I need to use printf, sometimes hidden inside scripting languages where I can't change it, to write data files that are expected to be readable by the same program even if the locale is different.

strcoll() is approximately the right idea. Make it perfectly clear that this is some human-oriented sorting function. I think the real solution is to make all such functions take the locale as an argument, rather than using a static variable.
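POSIX.1-2008 already offers something close to both wishes: uselocale() switches only the calling thread's locale, and the *_l function variants such as strcoll_l() take the locale as an argument. A sketch, assuming a system with these POSIX.1-2008 interfaces (glibc has them); the helper names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Format a double with "." as the radix character regardless of the
 * user's LC_NUMERIC, by switching only this thread to the C locale
 * instead of calling setlocale(LC_ALL, "C") for the whole process. */
int format_double_c(char *buf, size_t size, double x)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;
    locale_t old = uselocale(c_loc);
    int n = snprintf(buf, size, "%.2f", x);
    uselocale(old);                      /* restore the thread's locale */
    freelocale(c_loc);
    return n;
}

/* Human-oriented comparison with the locale passed explicitly.
 * Returns the sign of strcoll_l(), or 2 if the locale is not installed. */
int collate_in(const char *locale_name, const char *a, const char *b)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return 2;
    int r = strcoll_l(a, b, loc);
    freelocale(loc);
    return (r > 0) - (r < 0);
}
```

Data files written through format_double_c() stay readable whatever locale the reading process runs in, without touching global state that other threads or libraries may depend on.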

Debian switching to EGLIBC

Posted May 6, 2009 18:59 UTC (Wed) by atai (subscriber, #10977) [Link] (2 responses)

No doubt there are forks in free software.

If you happen to be the maintainer, there will be a fork tomorrow, because you are the best example of why projects fork... you only care about your particular situation.

No locale? What about people in non-English countries?

Debian switching to EGLIBC

Posted May 6, 2009 19:14 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

he's not saying that locale should not be available to those who need it, he's saying that for those of us who _don't_ need it, there should be a way to avoid the space and processing overhead.

today that's _very_ hard to do (you may be able to do it on a gentoo system, I'm not sure)

Debian switching to EGLIBC

Posted May 6, 2009 19:53 UTC (Wed) by drag (guest, #31333) [Link]

Debian's localepurge package works for me on the few machines where I care about saving a few hundred meg of disk space (which is pretty much ONE.. my EEPC with 4GB of disk space)

And locales do matter in desktop situations. How else is it supposed to know which dictionaries I want to use, or other things of that nature?

Debian switching to EGLIBC

Posted May 7, 2009 5:25 UTC (Thu) by PaulWay (subscriber, #45600) [Link] (1 responses)

> My hat's always off to people fighting software dependency bloat hell.
> *glares at selinux infecting everything in fedora*

So why aren't you using Gentoo, where you get to customise and optimise everything? Why aren't you using text modes and avoiding the bloat of graphics? Do you even know when you're being tacitly saved from confusion and hassle - or, in the case of SELinux, from actual bugs - by things 'just working' because of this "infective" bloat?

Sadly, it would seem you're too busy using pejoratives.

Have fun,

Paul

Debian switching to EGLIBC

Posted May 13, 2009 18:20 UTC (Wed) by kjp (guest, #39639) [Link]

a. we don't use graphics
b. we don't enable or compile selinux in kernel (but the selinux libs are still statically linked in to the other rpms pointlessly)
c. if starting project today, gentoo would get consideration. fedora only picked for a certain reason not used today.
d. we do rm -rf /usr/share/locale/* at postinstall to save space.

Here's hoping strlcat gets in

Posted May 6, 2009 16:33 UTC (Wed) by epa (subscriber, #39769) [Link] (8 responses)

I wonder if the EGLIBC maintainer will be less obstructive about including the sorely needed strlcpy and strlcat functions.
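For readers unfamiliar with them, the BSD strlcpy() semantics can be sketched in a few lines. The name my_strlcpy is hypothetical, chosen so it cannot clash with a C library that does provide strlcpy(); it copies at most size-1 bytes, always NUL-terminates when size > 0, and returns strlen(src) so truncation can be detected:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy() semantics: truncation happened iff the return
 * value is >= size. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Like strncpy(), it truncates at a byte boundary, so it can cut a multi-byte UTF-8 character in half.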

Here's hoping strlcat gets in

Posted May 6, 2009 17:23 UTC (Wed) by nevyn (guest, #33129) [Link] (1 responses)

Probably, although they aren't really that much better (if you want a real string API in C, go use one). Having a compatible version of asprintf() will be nice though (assuming he takes that), and I imagine there's a bunch of other minor things on other people's wishlist.

Here's hoping strlcat gets in

Posted May 6, 2009 17:27 UTC (Wed) by notting (guest, #28878) [Link]

Given a goal of "[striving] to be source and binary compatible with GLIBC", adding functions, or changing the semantics of existing ones, would seem to be out of scope.

Here's hoping strlcat gets in

Posted May 6, 2009 17:31 UTC (Wed) by aurel32 (subscriber, #7059) [Link] (3 responses)

Doing so will break ABI compatibility, so this is unlikely to be done. But those functions are available in libbsd ( http://libbsd.freedesktop.org/ )

Here's hoping strlcat gets in

Posted May 7, 2009 8:03 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

How would adding a function break compatibility? If that were the case, no new function could ever be added to any library without breaking compatibility with the old version.

Here's hoping strlcat gets in

Posted May 7, 2009 9:25 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

It breaks compatibility in that adding a function to eglibc means that you
can no longer reliably replace eglibc with glibc. (Whether this is
problematic is another matter.)

Here's hoping strlcat gets in

Posted May 7, 2009 12:05 UTC (Thu) by epa (subscriber, #39769) [Link]

Obviously, fixing the ARM bug linked in the article also means that you can no longer reliably replace eglibc with glibc, since this would break (or at least break the build) on previously working ARM systems. I think your definition of compatibility is a bit too strict.

Here's hoping strlcat gets in

Posted May 7, 2009 8:46 UTC (Thu) by ncm (guest, #165) [Link] (1 responses)

Here's hoping they don't. A more misbegotten function would be hard to imagine if strtok didn't exist already.

Here's hoping strlcat gets in

Posted May 7, 2009 14:46 UTC (Thu) by nix (subscriber, #2304) [Link]

You want misbegotten functions? gets(). Why on earth it wasn't thrown out in the early 1980s along with the rest of the pre-stdio I/O library I have no idea.
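The problem with gets() is that it cannot know the size of the buffer it writes into; C11 finally removed it from the standard. A common bounded replacement built on fgets() is sketched below (the helper name read_line is hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size-1 bytes) without overflowing it.
 * fgets() keeps the trailing newline if it fits; strip it to match what
 * gets() returned. Returns NULL on EOF or error. */
char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A line longer than the buffer is not lost: the remainder is returned by the next call.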

Debian switching to EGLIBC

Posted May 6, 2009 17:30 UTC (Wed) by ikm (guest, #493) [Link] (7 responses)

Probably they didn't like that new XML thing in glibc either.

Debian switching to EGLIBC

Posted May 6, 2009 19:27 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]

They will have to if they want to be source and binary compatible as promised.

Debian switching to EGLIBC

Posted May 6, 2009 19:54 UTC (Wed) by drag (guest, #31333) [Link] (5 responses)

That's just silly.

The XML thing was just used as a patch to deal with a VERY badly designed POSIX API feature.

Debian switching to EGLIBC

Posted May 6, 2009 20:04 UTC (Wed) by nix (subscriber, #2304) [Link] (4 responses)

Yes. And the compiled code for it is perhaps 200 bytes. It's literally
just a bunch of printf()s. That, at least, is not worth forking over.

Debian switching to EGLIBC

Posted May 6, 2009 20:09 UTC (Wed) by ikm (guest, #493) [Link] (3 responses)

Doh... That was a joke! I guess it contained too much truth for some :)

Debian switching to EGLIBC

Posted May 6, 2009 20:37 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

I've seen people arguing exactly your joke, seriously, in the past. They
tended to be BSD people.

Some also argued apparently seriously that malloc() and dynamic memory
allocation in general was unacceptably inefficient. They preferred to
statically size *everything* and recompile when needed. This was
apparently more efficient, in some world...

Debian switching to EGLIBC

Posted May 6, 2009 20:45 UTC (Wed) by ikm (guest, #493) [Link]

Actually, yes, I'm with them on this one. Oh no, I should stop I think.

Debian switching to EGLIBC

Posted May 7, 2009 4:05 UTC (Thu) by csamuel (✭ supporter ✭, #2624) [Link]

Probably HPC - we're an odd bunch.. ;-)

Ulrich Drepper's personality

Posted May 6, 2009 22:34 UTC (Wed) by smoogen (subscriber, #97) [Link] (5 responses)

I have never met Ulrich, but have exchanged emails with him over time since oooh 1994 I think. He can be a curt SOB in how he writes, but it is a tendency he shares with Al Viro and some others. He has the ability to laser in on problems in code and figure out how to make something work.. but he also has very little ability to emote via email, which makes people angry.

I wish Debian well with their move to eglibc. It will be interesting to see how much code will need to be fixed due to various glibc assumptions people may have made over the years.

Ulrich Drepper's personality

Posted May 6, 2009 23:12 UTC (Wed) by mbanck (subscriber, #9035) [Link] (4 responses)

eglibc is not really a fork, it is a patch-set on top of glibc which is constantly being synced.

I assume the day eglibc will become incompatible with glibc, Debian will redecide which upstream to follow.

Ulrich Drepper's personality

Posted May 7, 2009 6:49 UTC (Thu) by drag (guest, #31333) [Link] (3 responses)

Now that I've had time to think about it some it seems that the move to eglibc isn't as much
about creating a fork of Glibc.. it's just that it is easier to share the burden of managing patch
sets with other people who are experiencing the same problem.

Maybe the term should be 'SPORK' instead of 'FORK'. I've noticed that this has happened with a
few different pieces of software when people are trying to figure out how to deal with difficult
situations created by upstream developers who otherwise are valuable.

There has formed a multitude of MySQL sporks over the time. Stuff like OurDelta:
http://ourdelta.org/

As a response to the slow and somewhat negative behavior with Sun Microsystems regarding
MySQL releases.

Another example coming from Sun would be Go-OO.org
http://go-oo.org/

In both cases they don't really want to _fork_ their projects. But they have requirements or
desires that are simply not being addressed by the upstream folks in a timely manner. So in
both cases they still try to push code back upstream, but the spork is a community way of
maintaining their own patches.

Ulrich Drepper's personality

Posted May 7, 2009 8:31 UTC (Thu) by nhippi (subscriber, #34640) [Link]

The eglibc switch has its roots in the openssl patch epic fail. Since then, it has been deemed that carrying distribution-only patches against upstream code is generally bad: changes to code should be made in co-operation with upstream. Some developers (such as the glibc maintainer) raised the valid question of what to do when upstream refuses to accept patches. (Too lazy to search for the specific debian-devel discussion.)

The upstream certainly has every right to impose whatever QA and code style requirements they want. The maintainer of the package has the duty to fulfill such requirements and correct any issues in the patch noted by upstream. The end result is better for everyone: Debian gets a better fix, and all the other users get the fix too when upstream releases a new version.

But what to do when upstream plainly refuses to accept a patch? Or tells you to "Go away, stop wasting everyones time"?

One option in such cases is to start maintaining an explicit fork (or spork, as mentioned here). It is more honest for end users than maintaining an ever-growing stack of patches hidden in a distribution source package. And if others have the same problem(s) with upstream, the spork allows sharing the maintenance burden.

Other options could be switching the maintainer (it is always the upstream that has co-operation problems...) or dropping the package altogether (if there are better alternatives).

Ulrich Drepper's personality

Posted May 7, 2009 9:26 UTC (Thu) by pabs (subscriber, #43278) [Link]

The word you are thinking of is a branch.

Ulrich Drepper's personality

Posted May 7, 2009 13:19 UTC (Thu) by btraynor (guest, #26672) [Link]

Just to be clear, your statement "the move to eglibc isn't as much about creating a fork of Glibc" implies that Debian's use of eglibc creates a fork of glibc.

EGLIBC has been around since November 2006 or so. It is a fork of glibc, given this definition, "a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct piece of software" -- http://en.wikipedia.org/wiki/Fork_(software_development).

Also, #eglibc on freenode is alive.

As goes Debian...

Posted May 6, 2009 23:29 UTC (Wed) by sbergman27 (guest, #10767) [Link] (6 responses)

As goes Debian, so will likely go *buntus. And Mint, too. And something about the concept of switching to another libc just seems to have Fedora's name written all over it, too.

Unlike with the XFree86 situation, the *BSDs won't continue to use glibc for years afterward, because they've never used glibc in the first place. Ulrich and his project could become irrelevant remarkably quickly if Debian really goes through with this.

Hopefully, also unlike the XFree86 situation, we won't spend years waiting for some big architectural overhaul to be completed before the real progress can begin.

Fedora switching seems unlikely

Posted May 6, 2009 23:48 UTC (Wed) by stevenj (guest, #421) [Link] (3 responses)

> And something about the concept of switching to another libc just seems to have Fedora's name written all over it, too.

Except for the fact that Drepper is employed by Red Hat, which plays a major role in Fedora governance. If Red Hat wanted to switch glibc maintainers, they would presumably have fired him long ago.

Fedora switching seems unlikely

Posted May 7, 2009 0:06 UTC (Thu) by sbergman27 (guest, #10767) [Link] (2 responses)

The highlighted bug reports remind me a lot of the Jeff Johnson/RPM affair, come to think of it.

Fedora switching seems unlikely

Posted May 7, 2009 0:22 UTC (Thu) by jordanb (guest, #45668) [Link] (1 responses)

I was thinking the same thing, and I'm getting the impression that Red Hat must be an incredible place to work. I imagine the lunchroom seethes with nerd testosterone. Maybe it all comes from being bottled up in North Carolina.

Fedora switching seems unlikely

Posted May 7, 2009 0:36 UTC (Thu) by sbergman27 (guest, #10767) [Link]

Again, the "I don't see your name on my paycheck" comment comes to mind. I think Jeff was OK as long as he was only insulting CentOS users. But Red Hat doesn't like employees insulting actual RHEL customers. ;-)

As goes Debian...

Posted May 7, 2009 6:07 UTC (Thu) by pjm (guest, #2080) [Link]

> Ulrich and his project could become irrelevant remarkably quickly if Debian really goes through with this.

I can't argue with what could happen, but that doesn't seem likely. eglibc will continue to sync from glibc; thus, glibc will continue to have relevance. Ulrich has considerable experience and (I gather) skill in working on glibc; eglibc adoption simply reduces his interactions with users, which is probably good for Ulrich as much as anyone. So I think Ulrich will continue to head glibc development (and hence be relevant to eglibc and everyone using it) for quite some time to come.

As goes Debian...

Posted May 7, 2009 6:39 UTC (Thu) by nix (subscriber, #2304) [Link]

There seems unlikely to be a wait for architectural overhaul.
Architecturally glibc is very nice indeed (downside: the makefiles are
astonishingly powerful... and astonishingly complex, and utterly
undocumented: I suspect that no other makefile on the face of the Earth
uses as many GNU Make features as glibc's).

My biggest hope here is that better docs can be written. Ulrich has a
habit of writing amazing stuff and never documenting any of it in any way
at all except sometimes in PDFs on his homepage. Manuals? Why should they
be updated? (Updates from other people are summarily ignored, too.)

Debian switching to EGLIBC

Posted May 7, 2009 2:22 UTC (Thu) by ringerc (subscriber, #3071) [Link] (1 responses)

Ulrich Drepper writes a lot of VERY good explanatory papers, and some great documentation as well as doing a huge amount of work on critical infrastructure we're ALL using. He's also insanely busy.

Even with this fork, he'll still be doing a large amount of the work going into each eglibc release, via merges with glibc upstream. If eglibc takes off and proves to work well, he'll hopefully consider merging changes from it, once they've matured there, that he might not have been happy to introduce into glibc untried.

That sounds like a win to me. I just hope these don't become antagonistic forks where work is wasted on duplicating already-completed fixes and features, and on pointless flaming.

Debian switching to EGLIBC

Posted May 7, 2009 6:55 UTC (Thu) by nix (subscriber, #2304) [Link]

Yes, but this is a self-fulfilling prophecy. Ulrich is insanely busy
*because* he drives everyone else away, so almost nobody helps other than
Roland (who is helpful to a fault: if software developers had saints he'd
be one). If you actively drive people off, you can't really decide not to
do the parts of your maintenance position that *everyone* expects you to
do on the grounds that you're so busy: that you're busy is entirely your
own doing, and so are its consequences.

Debian switching to EGLIBC

Posted May 7, 2009 10:51 UTC (Thu) by lmb (subscriber, #39048) [Link] (3 responses)

It should be noted that one of the strengths of the Linux community (and others, of course) has been to integrate people with less than perfect social skills or a lack of empathy, sometimes bordering on Asperger's or autism. (Yours truly is affected and thus allowed to comment ;-)

These people sometimes bring exceptional technical skills to the community (alas, yours truly is not affected here), which they would not have been able to leverage in a more typical day-to-day office setting with pair programming, lots of communication, and so on.

That is by no means an excuse for being rude, or for sometimes apparently not even trying to overcome the issue (and apparently rejoicing in it instead), as is often evident on LKML; but I dare say it is advisable to find ways to integrate them instead of forking away from them.

Personally, having been on some projects affected by such people, I think they make wonderful to exceptional engineers. It becomes difficult when they remain project leaders of a growing community.

Of course, it's not always possible for them to accept switching roles (control issues, anyone?), but it should at least be considered by everyone. And if it is, it should be proposed in a face-saving way for all involved. The comments on the terse (and arguably rude) bugzilla responses were just as rude and juvenile, and certainly unlikely to yield a positive response.

If even that fails, sure, fork, or build up a trailing repository that frequently pulls from the former upstream, as was done here. Maybe even do both in parallel, to demonstrate seriousness (and ability). But don't forget the other side.

(And yes, before someone reminds me about what we did with Linux-HA, yes, sometimes, after all this has failed, running for your life is the only way to remain sane.)

Debian switching to EGLIBC

Posted May 7, 2009 14:50 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

Well, yes, the community provides an excellent place for technically gifted asocial geeks :) but such people (me among them) should recognise their limitations and avoid domains they are unsuited for. So being the only hacker on a small project is fine, but being the primary maintenance contact on critical and universally used software *requires* social skills, because much of the role involves communicating with other human beings without pissing them off. If you don't want to or can't do that, find something else to do (there's nothing stopping you from doing most of the hacking work: just fob off the job of saying yea-or-nay onto someone who can do it without annoying everyone.)

Debian switching to EGLIBC

Posted May 7, 2009 18:08 UTC (Thu) by drag (guest, #31333) [Link]

Ya.. People have their own pluses and minuses and should be put into positions where they are going to be of the most benefit. And get their egos out of the way... not everybody should be the front man for their project. I doubt there are many people who love hacking and getting into code and details and all that who really want to double as public relations people. And I doubt there are many people good at mediating disputes and doing social networking and whatnot who really want to spend all their time coding low-level C libraries.

-------------------------

BTW...

If you're acting like an asshole and you're completely right about something... you're still an asshole.

A lot of people seem to think that if they are right about a subject, argument, talking point, or whatever, then that gives them an allowance to be jerks. As if winning an argument is a victory and the reward is a license to be an asshat. Which it isn't... they are perfectly within their rights to act like a jerk whether they are wrong or right. Being wrong or right doesn't really enter into it, and nobody should be surprised when people react negatively to their negative behavior.

Debian switching to EGLIBC

Posted May 7, 2009 22:39 UTC (Thu) by man_ls (guest, #15091) [Link]

I don't think that this fork has to be seen as a drama. NAR suggested above that hiring two or three people to act as buffers or as interface to other people is a good way to deal with ill-mannered engineers. Well, think of this eglibc project as such a buffer. They sync from the genius but interface with the world. It could also be a good way of isolating Ulrich from user requests, and git enables such a workflow beautifully.

Debian marrying EGLIBC

Posted May 7, 2009 18:49 UTC (Thu) by mohitsingh (guest, #58480) [Link]

It looks like a final decision. Whenever this news is discussed, one argument is most commonly heard: "it's not getting into the next release, it's just going into the development branch, and testing is still required before it gets into the main distro".

Then we are reminded of the language being used everywhere. The EGLIBC site says "Debian will be switching from GLIBC to EGLIBC". That sounds like a declaration of marriage, with no scope for an engagement that may or may not lead to marriage.

Final Decision! Embedded Flavor in Open System! So the next version of my desktop distro may well be inspired by the one running my washing machine. The child suddenly seems to be the father of the man. Is the GOD of Small Things listening?

MS


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds