Linux Evolution Reveals Origins of Curious Mathematical Phenomenon (PhysOrg)

PhysOrg.com summarizes a scientific paper describing how investigators used the Debian package history to verify Zipf's Law. "Using the data, they showed that the growth rates of connectivities between packages are proportional to the degree of connectivity between packages. In addition, they showed empirically that the average growth rate of the total number of links to a given package over a time interval is proportional to that time interval. Further, the variability of the total number of links to a given package increases proportionally to the square-root of time, providing a crucial test of the mechanism of stochastic proportional growth of connectivity between packages. Altogether, these characteristics are responsible for the universal distribution pattern of Zipf's law."
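The three properties in the quoted summary can be illustrated with a toy simulation. This is a minimal sketch, not the paper's method: it assumes a simple multiplicative update in which both drift and noise scale with a package's current link count, and every name and parameter here (`simulate_increments`, `k0`, `r`, `sigma`, the step counts) is hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_increments(steps, n_packages=10000, k0=100.0, r=0.002,
                        sigma=0.02, seed=1):
    """Return the change in link count of each simulated package after `steps` steps.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    increments = []
    for _ in range(n_packages):
        k = k0
        for _ in range(steps):
            # Proportional growth: drift and noise both scale with the current k.
            k += k * (r + sigma * rng.gauss(0.0, 1.0))
        increments.append(k - k0)
    return increments


# Compare a short interval with one four times as long.
short_run = simulate_increments(steps=10, seed=1)
long_run = simulate_increments(steps=40, seed=2)

mean_ratio = statistics.mean(long_run) / statistics.mean(short_run)
std_ratio = statistics.stdev(long_run) / statistics.stdev(short_run)

print(f"mean increment ratio (4x interval): {mean_ratio:.2f}")
print(f"std-dev ratio (4x interval):        {std_ratio:.2f}")
```

If the toy model behaves as described, quadrupling the interval should roughly quadruple the mean increment and roughly double its standard deviation, which is the square-root-of-time signature the summary mentions.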

Linux Evolution Reveals Origins of Curious Mathematical Phenomenon (PhysOrg)

Posted Dec 2, 2008 19:03 UTC (Tue) by richo123 (guest, #24309) [Link]

The law seems to hold for most natural languages, so why not Debian? The fundamental mechanism in the natural-language case is a universal information-theoretic evolution (first discussed by Mandelbrot), so there are good grounds for thinking that package development, which resembles natural-language development, should also satisfy the same universal mechanism and so have similar statistics.

Linux Evolution Reveals Origins of Curious Mathematical Phenomenon (PhysOrg)

Posted Dec 2, 2008 21:44 UTC (Tue) by rengolin (guest, #48414) [Link]

I can't quite follow how package relationships relate to natural language. Packages are linked by dependency, while words in a language are linked by association.

Mixing a small subset of words can generate completely different meanings (phrasal verbs are a good example of that), while you can't just link gnugo to mesa and expect it to show a 3D Go board. Of course, if you wanted to display it in 3D you would link to mesa, but the flexibility of natural languages is much greater than that of packages.

I think this is just another curiosity of how we can fit a set of things under known distribution rules, pretty much as Riemann did with the prime numbers... ;)

Endangered packages

Posted Dec 3, 2008 1:35 UTC (Wed) by ncm (guest, #165) [Link] (3 responses)

Comments on the original article were oddly dismissive. Finding slavish adherence to a power law in a new and entirely unrelated (to natural languages) domain, and confirming all the ancillary rate conditions that enforce the law, is an important result. Its importance might be incidental to Free Software -- the original connection was just that the Debian repositories offered a comprehensive event record to mine -- but it might also provide an easily automated warning light for projects that are in danger, or that endanger others downstream.

For example, the Apache project's Xerces/Xalan C++ packages (not to be confused with the Java packages of the same name) have very few downstream packages dependent on them. Some of us know why (they suck rocks, or did when last I checked), but their statistics alone should give pause to anybody considering using them. Maintainers of packages that already do depend on them might think seriously about ripping out that dependency, and switching to other packages that have better statistics.

It's much easier to see how many dependencies up- and downstream a particular package has than to compare this for a lot of related packages, or, even more so, to compare those numbers to corresponding rates. The outliers are the interesting ones.
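As a rough sketch of what such a comparison could look like, the snippet below counts downstream (reverse) dependencies from a dependency map and ranks competing candidates. The package names and the `depends` data are made up for illustration; a real check would read the archive's actual package metadata, and would also need the growth rates, which this does not attempt.

```python
from collections import Counter


def reverse_dependency_counts(depends):
    """Count, for each package, how many other packages depend on it directly."""
    counts = Counter({pkg: 0 for pkg in depends})
    for pkg, deps in depends.items():
        for dep in set(deps):  # ignore duplicate entries in one package's list
            counts[dep] += 1
    return counts


# Hypothetical dependency map: package -> packages it depends on.
depends = {
    "app-a": ["libcommon", "libxmlfast"],
    "app-b": ["libcommon", "libxmlfast"],
    "app-c": ["libcommon", "libxmlslow"],
    "libxmlfast": ["libcommon"],
    "libxmlslow": ["libcommon"],
    "libcommon": [],
}

counts = reverse_dependency_counts(depends)

# Rank two competing libraries by how many packages already rely on them.
candidates = ["libxmlfast", "libxmlslow"]
ranked = sorted(candidates, key=lambda pkg: counts[pkg], reverse=True)

for pkg in ranked:
    print(pkg, counts[pkg])
```

Only direct dependencies are counted here; following transitive dependencies would be a natural extension.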

Endangered packages

Posted Dec 3, 2008 2:47 UTC (Wed) by mgb (guest, #3226) [Link] (2 responses)

Sound practical advice (which, if universally followed, would destroy the power law adherence).

Endangered packages

Posted Dec 3, 2008 6:50 UTC (Wed) by krishna (guest, #24080) [Link]

If developers empirically find that one choice of upstream package is
better than another, one would think they would gravitate towards it. If
the case is (one possibility among many) that the statistics already
reflect the general quality of the upstream packages, then might
developers consulting the statistics end up producing the same result
anyway?

Another train of thought is that developers consulting the statistics
could either converge (onto a power-law formula or another formula) or
diverge as a result of the feedback which that effect produces on the
system.

Endangered packages

Posted Dec 4, 2008 18:27 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

I don't think that people observing that some packages aren't used much, and removing their dependencies on them, really will change the power law, because new packages will continue to be produced, they will look promising at first, and some of them won't pan out. So we'll just see a kind of dynamic equilibrium.


Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds