
The OSWatershed.org project

Scott Shawcroft has announced the OSWatershed.org project. "OpenSourceWatershed is a project aimed at understanding the relationship between distributions (downstream) and the individual software components (upstream). It is the basis for a larger study of distributions and their evolution." He concludes that Arch Linux tends to be the least "obsolete," in that only 45% of its packages are behind the leading edge. Debian and openSUSE, instead, are said to be 95% obsolete. The slides from his OSCON talk [PDF] are also available.

The OSWatershed.org project

Posted Jul 20, 2009 21:01 UTC (Mon) by dagb (subscriber, #30984) [Link]

Yay! Cool stuff! Kudos to Scott for this, it must have been quite a bit of work.

But:
The DIY distros typically have 'tracks' for stable and per-platform stable packages. (At least gentoo has the ACCEPT_KEYWORDS="~$arch" flag, I assume arch got something similar.)

Is this taken care of with the 'future'|'experimental' labels in the graphs, or is that something else?

Otherwise, it is nice to see that my nagging feeling that gentoo was falling behind now is backed by numbers.

Dag

The OSWatershed.org project

Posted Jul 21, 2009 17:53 UTC (Tue) by iabervon (subscriber, #722) [Link]

It seems to me that the ~ keywords are "future" for Gentoo and stable is "current". I wouldn't say that this chart shows that Gentoo is falling behind; ~x86 is listed as the most up-to-date distro available, and it's intentional that stable lags it. Also, since people often use ~x86 for particular packages, any given system is going to be somewhere between 30% and 75%. The chart doesn't show trends, and trends are actually important in order to say anything interesting here.

For that matter, the chart doesn't take into account the presence of unfixed regressions in upstream packages. A distro may find that a particular version of the xorg server has major performance regressions for some configurations that are required in order to get certain functionality out of certain hardware (as reported by the upstream project), and decide to keep it out of the "current" distribution for that reason. They may decide that they can't use a particular version of X until KMS is stable on more hardware, and KMS may be sufficient in 2.6.30 but they're still verifying that regressions in 2.6.30 are fixed in 2.6.30.2. So they're behind the leading edge, but the leading edge doesn't actually work.

The OSWatershed.org project

Posted Jul 20, 2009 21:19 UTC (Mon) by bib1963 (guest, #59673) [Link]

Sorry, but this is totally suspect.

Unless he redoes his data by rebasing it when each distro came out, then this "project" should be discarded.

Remember each distro only tends to implement security patches.

If they took the decision to update it each time an updated package came out then there would be little encouragement to upgrade.

The OSWatershed.org project

Posted Jul 28, 2009 3:19 UTC (Tue) by Duncan (guest, #6647) [Link]

Actually, that's called "rolling upgrades", and it's what some of the
"future" versions do, most or all the time. Gentoo ~arch generally tracks
upstream reasonably closely (but not when they're known-broken or are
known to break other things), for instance, as does (from what I've read)
Fedora Rawhide.

On Gentoo, the initial install and recompile to current tree is tougher
than most, but after that, ~arch is "rolling upgrade", as is stable arch
as well, but rather more "obsolete" as it's tested somewhat better, and
where upgrades are tough, as from xorg-server-1.3 to 1.5 (the switch to
RandR for most drivers, and to evdev by default), they make sure there's
an upgrade guide in place before they stabilize a new version. Since that
can take a while, arch-stable lags a significant amount on critical
packages/package-sets like gcc (remember it's from source, so gcc is
major), xorg, kde, gnome, etc, even if it is a rolling upgrade.

But regardless, I originally installed Gentoo/~amd64 from the 2004.1
release stages, and I'm still running that, but rolling upgraded generally
1-3 times a week since then, to the current default/linux/amd64/2008.0/no-
aged installation. There's people still running original installations
from 2002 or earlier, still rolling-upgrading them as they go.

It's really reasonably easy as long as you keep up with things at least
monthly. If you don't do upgrades at least monthly, by 90 days, it's
getting more difficult, by six months it's quite difficult as there's a
LOT of package upgrades to do, and while Gentoo does try to support
upgrades out to a year, honestly, by that time, or even really at six
months, it's getting easier to simply reinstall from fresh stage tarballs,
which are now updated weekly, so even they are "rolling upgrade".

That's really what makes most of the mainline binary distribution upgrades
so difficult as well -- they only come out every six months or so for most
distributions, and there's a LOT of changes in six months. If they're
rolling upgraded a package at a time as they come out (which, to be fair,
is more difficult for the distributions with binary distributions, since
they have to worry more about ABI changes as well as API changes, often
meaning changing a bunch of packages to upgrade a single library), it's a
LOT easier, because you get to get used to one set of changes before the
next comes out. Well, for the most part. Upgrades such as kde3 to kde4
are still a huge pain, but there's no getting around it for them, because
they're really a whole interdependent ecosystem of packages -- it's not
easily possible to do a single package at a time with them. But at least
that way you don't get issues like I'm seeing on the kde lists/bugs, where
people are having key change issues due to the xorg switch to evdev by
default, but they're blaming it on kde, since that's what they use and
they upgraded the entire distribution at once so got both upgrades at
once, and don't realize that the issues with xorg/evdev are not kde issues
at all.

Duncan

Nice project, but...

Posted Jul 20, 2009 21:26 UTC (Mon) by fjpop (guest, #30115) [Link]

I wonder how much of the difference in lag and percentage obsolete for
stable releases can be explained by the difference in the date of the
last release of the distros. And thus how relevant these figures are.
IMO "lag" is by definition not relevant for a *stable* release.

From the slides it is very clear that Debian unstable has completely
different figures than Debian stable.

My first impression is that the website needs a lot more explanation of
how the data was assembled and how the results were calculated. It could
also be argued that Debian testing should be used as the base
for "future" rather than Debian unstable.

BTW, the use of Debian experimental for this kind of research is rather
useless as it is not a full archive.

The OSWatershed.org project

Posted Jul 20, 2009 22:30 UTC (Mon) by lbt (subscriber, #29672) [Link]

'obsolete' ?

Let me rephrase:
Distro : % of distro containing bleeding edge & potentially untested code
Arch: 55%
Fedora: 40%
Slackware: 5.89%
Debian: 5%

Just because an application is not bleeding doesn't mean it is useless.

If the objective is to encourage upstream <-> distro collaboration to allow users to access new features in a more timely manner then it doesn't seem wise to start by calling the more sedate distros "obsolete".

Helping solve this problem may involve new approaches; eg different distros take quality ownership for different apps and working with upstream to package the application for QA asap after a release. This may give other distros more confidence to adopt a new release based on a peer-distro's recommendation.

So Scott - great analysis; lots of graphs, it told us what we already knew and didn't offer any solutions. You're going to make some consultancy firm really happy :)

The OSWatershed.org project

Posted Jul 21, 2009 0:24 UTC (Tue) by kirkengaard (subscriber, #15022) [Link]

I totally agree about real obsolescence. Codebases that rely on libc5 are obsolete.

For some of us, the sort of "obsolescence" he seems to actually be measuring -- that is, stability -- is a desirable trait. On the other hand, as a Slacker, I run -current, which makes his conclusions about 12.2 obsolete. Of course it is; we're practically on top of the 13.0 release.

The OSWatershed.org project

Posted Jul 21, 2009 2:53 UTC (Tue) by jcm (subscriber, #18262) [Link]

Absolutely. Some people think it's great to rush out and shove the software that just came out yesterday into a distribution used by (maybe) hundreds of thousands of people or more. But that is only suitable for development distributions and the brave. There are complex interactions that cannot be foreseen, there is a need for testing, and then there are bugs.

I'm quite sure many people will get all jumpy about this, and I'm quite sure most of them will miss the point. Users don't care whether they get frobulator 9.1 or frobulator 8.1 as long as when they come to plug their projector into their laptop it actually works, or sound plays, or <insert whatever here>. They much prefer things to work. And so a compromise is needed between shoving things out the door and reality.

Jon.

The OSWatershed.org project

Posted Jul 22, 2009 0:30 UTC (Wed) by lseubert (guest, #4168) [Link]

There are some implied criticisms in this thread of Linux Distros that are very up to date - those with especially low "obsolete" percentages, according to this OSWatershed study. One critique mentioned is the notion of 'pushing out software that hasn't been tested.'

So, Linux Kernel 2.6.30.1 hasn't been tested, and any distro that already has it in its current archive is pushing untested software onto the unsuspecting and soon-to-be-regretful masses? And OpenOffice.org 3.1.1 hasn't been tested either, has it?

This doesn't make sense. Upstream does test its software pretty thoroughly in most cases, and includes the latest bug and security fixes with each new release. There is a good reason why upstream constantly pushes out new code - to improve it. And most of the time, if not all of the time, the code is in fact better. (Yeah, I know, counter-example = Xorg regressions.)

Now, perhaps these critics actually mean to criticize bleeding edge distros that put together a compilation of very up to date software, which when put together as a whole, results in an unstable system. This is certainly possible.

However, it is also possible for an up to date system to be quite stable. May I suggest that these critics invest the time to install a Linux Distro that is both up to date and stable, and yes, I am talking about Arch Linux here - to verify for themselves whether or not both stability and currency can co-exist.

This Arch user has found the Arch rolling release model to work exceptionally well. My system is amazingly current, with only a rare, occasional, minor hiccup. A strong Arch support community and a relentless commitment to KISS enable this surprising combination of stability and freshly released, up-to-date software.

Try it, before you criticize it folks. You just might be delighted :-)

The OSWatershed.org project

Posted Jul 21, 2009 10:26 UTC (Tue) by niner (subscriber, #26151) [Link]

The statistics don't inspire much confidence in their value. openSUSE 11.1
is listed in both "Current Distros" and "Future Distros" with different
values for "Avg # New Rels" and "Avg Lag". Also why the 8 months old 11.1
would be listed as "Future" while 11.2 Milestone 3 and the always current
Factory are available is beyond me.

The OSWatershed.org project

Posted Jul 21, 2009 4:22 UTC (Tue) by tannewt (guest, #59683) [Link]

Hi, I'm Scott Shawcroft. Here are my replies:

@Dag: Yeah, it's a lot of work. Yes, the future and experimental labels are generic names for ~x86 and the unstable versions of other distros.

@bib1963: The ranking will change as distributions release new versions. Please elaborate on what you mean by 'rebase'; chances are this metric can be derived from the data.

@fjpop: More information can be found in my thesis and the source code. Debian experimental is lumped with Debian testing but that can be changed.

@lbt: 'obsolete' is one of the most talked about aspects. Any ideas for a better term? Solutions are tough to come up with. Hopefully the website itself will help shorten the migration time and direct further investigation.

@kirkengaard: 'stability' is as controversial as 'obsolete'. Recheck the website for slackware-current.

@jcm: I agree. I'm trying not to rank the 'best' distributions.

The OSWatershed.org project

Posted Jul 21, 2009 8:19 UTC (Tue) by kragil (subscriber, #34373) [Link]

Obsolete is a very bad word for what you are trying to convey. But I thought about it and I can not come up with a good term either.

Obsolete is something that will never get any patches. Maybe don't name it and just explain it a little more. (I thought about "upstream equality" which would certainly be better, but not perfect.)

The OSWatershed.org project

Posted Jul 21, 2009 8:29 UTC (Tue) by jordanb (guest, #45668) [Link]

The reason why it's not possible to come up with a good word for it is that the entire concept is bogus.

Who cares if you have the latest version of a package, unless you *need* a feature of it, or you're doing development on it? If a distro still distributes Python 2.6, does that mean that they have an 'obsolete' package there?

If he wants to track divergence from the upstream, he should do something like compare the set of outstanding distributor patches. Or even more useful, look at the average *lifetime* of a patch. How quickly is it pushed upstream? That'll avoid bias against distributions that are aggressive about closing their own bugs, so long as they're still active in pushing things upstream.
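
A rough sketch of the patch-lifetime metric suggested above, in Python. Everything here is an assumption made for illustration: the `average_patch_lifetime` helper, the patch records, and the dates are hypothetical and are not taken from the OSWatershed code or from any distro's patch tracker.

```python
from datetime import date


def average_patch_lifetime(patches, today):
    """Mean age in days of distro patches.

    Each patch is (created, merged_upstream). A patch that upstream has not
    merged yet (merged_upstream is None) is counted as still open until `today`.
    """
    if not patches:
        return 0.0
    total_days = 0
    for created, merged_upstream in patches:
        end = merged_upstream if merged_upstream is not None else today
        total_days += (end - created).days
    return total_days / len(patches)


# Hypothetical data: (date the distro added the patch, date upstream merged it)
example_patches = [
    (date(2009, 1, 1), date(2009, 1, 31)),  # merged after 30 days
    (date(2009, 3, 1), date(2009, 3, 11)),  # merged after 10 days
    (date(2009, 6, 1), None),               # still outstanding
]

print(average_patch_lifetime(example_patches, today=date(2009, 7, 21)))
```

Counting unmerged patches up to the present is one possible design choice; it keeps a distro from looking good merely by never pushing anything upstream.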

The OSWatershed.org project

Posted Jul 21, 2009 19:03 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

the reason that you care about your distro shipping version 8.1 when version 9.1 is out is that it becomes very hard to get support for version 8.1

Circularly reasoned

Posted Jul 21, 2009 20:32 UTC (Tue) by man_ls (subscriber, #15091) [Link]

That looks to me like circular reasoning. Usually you get support from your distro, not from upstream. If your distro gives you version 8.1 then they are supposed to support it; if not, they should distribute 9.1 or whatever they feel comfortable supporting.

If your distro is not going to support the packages it distributes then support will be equally hard for 8.1 or 9.1 -- in this case you might just download the latest from upstream and be done with it. Or switch distros.

Circularly reasoned

Posted Jul 21, 2009 21:28 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

in theory you are right

but in practice your distro probably isn't going to do a very good job of supporting a version by themselves. the distro gets its support from the upstream developers, and if those developers don't provide it because the version you are using is old it severely limits how good the distro support can be.

things work better the more people you have running a version, and it's even better when the developers run/support that version as well because they will be far more familiar with the code and the problems than any one distro will be.

now, there are particular projects where this falls down

the upstream developers may not actually make releases, so the distro is forced to take a snapshot and support it.

the upstream developers may be doing such a poor job of quality control that the distros have to lag behind to have a working setup

the distros may be customizing the code enough that the upstream developers (and other distros) can't help

but overall I do think that it is much better to use a distro that keeps with a version that is supported upstream

Spiral reasoning rather

Posted Jul 21, 2009 21:41 UTC (Tue) by man_ls (subscriber, #15091) [Link]

In these fine points you are right too, but practice shows that even this level of detail does not cover the whole situation. Arguably the best supported distro is Red Hat, and it is hopelessly outdated by the time it comes out (it doesn't even make the author's list). And the most stable (IMHO the best community-supported) is Debian, which is also quite behind the times by the time it is released.

Without even entering the mandatory stability-vs-currency dichotomy, keeping packages up to date is a very complex task and has a lot of variables. That is why the OSWatershed.org project is not so useful as it would appear at first sight. Still it is an interesting idea.

The OSWatershed.org project

Posted Jul 21, 2009 9:16 UTC (Tue) by k3ninho (subscriber, #50375) [Link]

>Obsolete is a very bad word for what you are trying to convey. But I thought about it and I can not come up with a good term either.

'Superseded' is my suggestion, if you'll be so kind as to ignore the regressions. The older releases aren't obsolete in the sense of the definition at Wiktionary [1] (which compares well with Google's 'Define:obsolete' search [2]). These older editions of software aren't obsolete (as in 'no longer in use; gone into disuse; disused or neglected (often by preference for something newer, which replaces the subject)' -- from Wiktionary). The evidence for that is the statistics shown here: it's still in use and very much not neglected.

I'm going to assume that there's a bias toward latest-and-greatest (possibly due to youth) which is part of this work. It's neat that someone's taken the trouble to assemble this data, but I don't know what you can really say about this: should everyone be using upstream and following it, so as to reduce effort finding and fixing bugs? Should there be a bug-fixed non-developmental Upstream series which is reliable and stable (and not labeled obsolete)?

[1] http://en.wiktionary.org/wiki/obsolete
[2] http://www.google.com/search?q=define%3Aobsolete

The OSWatershed.org project

Posted Jul 21, 2009 9:46 UTC (Tue) by kragil (subscriber, #34373) [Link]

If the greater Linux/FOSS ecosystem were to listen to Theo de Raadt (Dutch for "the rant", I guess) we would all be tracking current.

AsiaBSDCon 2009:The OpenBSD Release Process: A Success Story
http://www.youtube.com/watch?v=i7pkyDUX5uM (Caution Xorg devs)

The OSWatershed.org project

Posted Jul 21, 2009 10:01 UTC (Tue) by nix (subscriber, #2304) [Link]

"Unsupported upstream" might be a better term.

Of course then this founders on things like ffmpeg and (for a long time) glibc, which never even made stable releases (or threw them over the wall and forgot about them), so that the only thing upstream supported was the development trunk: yet they didn't expect average users to be running said trunk.

The OSWatershed.org project

Posted Jul 21, 2009 12:46 UTC (Tue) by nowster (subscriber, #67) [Link]

> "Unsupported upstream" might be a better term.

Not all programs which are unsupported upstream are useless. Unless a better alternative comes forward, that program may be the only one that provides that function. Should a distribution drop a useful but stable program just because it hasn't been updated in the last five years?

The OSWatershed.org project

Posted Jul 21, 2009 17:39 UTC (Tue) by tannewt (guest, #59683) [Link]

Again, this is one of the biggest controversies. How about 'old' or 'outdated'? Or is 'Not New' the best way of putting it?

The OSWatershed.org project

Posted Jul 21, 2009 9:46 UTC (Tue) by errare_est (guest, #14275) [Link]

Lag is not as bad as Obsolete, and you already are using that word :)

The OSWatershed.org project

Posted Jul 21, 2009 13:23 UTC (Tue) by kirkengaard (subscriber, #15022) [Link]

I will agree with pride that my distribution of choice is slightly more than half "stable" in its development branch at this point in time, and I think that's a great indicator of release-time balance. Perhaps "proven" is a less pejorative term, but more accurate than "stable," for the child-codebases that are not on the bleeding edge of your trees, but are far from obsolete.

The OSWatershed.org project

Posted Jul 21, 2009 13:57 UTC (Tue) by spinochet (subscriber, #23939) [Link]

Lumping Debian experimental with Debian testing makes no sense to me. While many users of experimental may use testing (though I bet most use unstable), very few users of testing would use experimental. In trying to figure out where each of stable, testing, and unstable stood, I couldn't make heads or tails of your data. I'd suggest using the same groupings and terminology for releases as the distros do, if you want the data to mean anything to them.

The OSWatershed.org project

Posted Jul 21, 2009 17:45 UTC (Tue) by tannewt (guest, #59683) [Link]

Perhaps the experimental/testing stuff is a result of me not being a Debian user. I don't use the same terminology so that I can compare different distributions. I know this comparison is controversial but I feel like it should be done to spark conversation.

The OSWatershed.org project

Posted Jul 22, 2009 17:22 UTC (Wed) by spinochet (subscriber, #23939) [Link]

There is a succinct description of the meaning of these terms at <http://www.debian.org/releases/>.

Correctly mapping Debian releases

Posted Jul 22, 2009 3:32 UTC (Wed) by fjpop (guest, #30115) [Link]

Lumping Debian experimental with Debian testing still makes absolutely no
sense.

If I look at the definitions in your thesis, the correct mapping for
Debian would be:
- past: Debian oldstable
- lts: N/A (maybe Debian stable)
- current: Debian stable
- future: Debian testing
- experimental: Debian unstable
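
The mapping proposed in the list above can be written down as a small lookup table. This is only a sketch in Python: the `DEBIAN_MAPPING` name and the `generic_label` helper are hypothetical, and the generic labels are taken from this comment rather than from the OSWatershed source.

```python
# Generic OSWatershed-style label -> Debian suite, as proposed in this comment.
DEBIAN_MAPPING = {
    "past": "oldstable",
    "lts": None,  # N/A (maybe Debian stable)
    "current": "stable",
    "future": "testing",
    "experimental": "unstable",
}


def generic_label(debian_suite):
    """Return the generic label for a Debian suite, or None if it is unmapped."""
    for label, suite in DEBIAN_MAPPING.items():
        if suite == debian_suite:
            return label
    return None


print(generic_label("testing"))       # the suite proposed as "future"
print(generic_label("experimental"))  # deliberately left out of the mapping
```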

Debian experimental should IMO not be included with any of them.

BTW, how do you deal with snapshots taken from upstream source
repositories (i.e. versions in a distro that do not correspond to a
released version)?

I still agree with others that the way the data is presented gives a
completely wrong impression to people who don't know what the reasons are
for lag and obsoleteness. As such there is a high risk that your website
will create a negative view of Linux in general and distros listed as
most "obsolete" in particular. In other words: it risks generating FUD.
IMO you should take a very careful look at how you present things so that
is avoided.

As I've said in my initial comment: the terms obsolete and lag are
completely meaningless for ANY distro version after it has been made a
stable release (or even as soon as it is frozen for release
stabilization).
The only term that could be applied at that point is possibly
how "outdated" packages are. And any presentation of that data,
especially when comparing different distros, should take into account
the "age" of the release.

Also, I doubt that using "number of newer releases" as you do in 4.1.3 in
your thesis is statistically correct given the widely varying release
practices of upstream. Some follow a "release early, release often"
practice, while others will have release freezes and extensive testing
and stabilization periods before a new release. I doubt you can just lump
those together.

``Obsolete'' considered harmful

Posted Jul 25, 2009 3:22 UTC (Sat) by Max.Hyre (subscriber, #1054) [Link]

> @lbt: 'obsolete' is one of the most talked about aspect. Any ideas of a better term?
How about ``trailing edge''? It could be defined as any version older than the newest release from the project.
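
A minimal sketch of that definition in Python, counting a package as "trailing edge" when its shipped version is older than the newest upstream release. The package names, version numbers, and both helper functions are hypothetical, and the dotted-number parser is a deliberate simplification; real version schemes (epochs, suffixes, snapshots) need more care.

```python
def parse_version(text):
    """Naive parser: '1.4.10' -> (1, 4, 10). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def trailing_edge_share(packaged, upstream_latest):
    """Percentage of packages whose shipped version is older than upstream's newest."""
    behind = sum(
        1
        for name, version in packaged.items()
        if parse_version(version) < parse_version(upstream_latest[name])
    )
    return 100.0 * behind / len(packaged)


# Hypothetical data, purely for illustration.
packaged = {"frobulator": "8.1", "widget": "2.0", "gadget": "1.4.2", "thing": "0.9"}
upstream = {"frobulator": "9.1", "widget": "2.0", "gadget": "1.4.10", "thing": "0.9"}

print(trailing_edge_share(packaged, upstream))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.4.10" would sort before "1.4.2".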

The OSWatershed.org project

Posted Jul 26, 2009 10:14 UTC (Sun) by lab (subscriber, #51153) [Link]

Regarding 'obsolete', how about 'maturity'? It doesn't have the negative connotation, and indicates why the software is older: because it takes time to mature things. Of course, the converse connotation, 'immature', is not particularly nice either. What about simply 'age'? The connotations of both high and low age are not overly negative, and it corresponds well to what's actually being measured: the age of the software.

Names confusion

Posted Jul 21, 2009 5:55 UTC (Tue) by zdzichu (subscriber, #17118) [Link]

On the page there is a column with “Codename”, which lists sometimes distro version, sometimes codename. E.g. for Ubuntu it shows “jaunty”, “karmic”. But for Fedora it shows “11”, “12”. It would be nice to have both information, so Ubuntu “9.04 Jaunty”, “9.10 Karmic”, Fedora “11 Leonidas”, “12 Constantine” and so on.

Names confusion

Posted Jul 21, 2009 17:51 UTC (Tue) by tannewt (guest, #59683) [Link]

This is a great point. Unfortunately, some of the data gathering depends on this value; otherwise it would be an easy fix. I'll put it on my TODO list.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds