By Jake Edge
December 7, 2011
There has been quite a bit of press, and some hand-wringing, over reports
that Linux Mint has overtaken Ubuntu as the "most popular" Linux
distribution. The reports are based on the DistroWatch
rankings, which some (though notably not the DistroWatch
folks) seem to think indicate the popularity of various
distributions. While it's a bit hard to imagine that untold legions of Ubuntu
users have switched to Linux Mint en masse, it does have a non-zero
probability of being true. But there aren't, and really can't be, any
numbers to back that up. Is popularity even really the best measure of a
distribution?
The "rankings" that have spawned the uproar are simple page-hit counts.
Each unique IP address that lands on DistroWatch's page for a given
distribution increments the count for the day. It is, at best, a count of
the amount of "buzz" a particular distribution has over the past one,
three, six, and twelve months. It can also be fairly easily manipulated by
someone who has unfettered access to a large number of IP addresses or a
botnet—as well as by over-exuberant distribution fans—though there
is no evidence to suggest that's what's happening here. As
DistroWatch says,
those numbers are:
[A] light-hearted way of measuring the popularity of
Linux distributions and other free operating systems among the visitors of
this website. They correlate neither to usage nor to quality and should
not be used to measure the market share of distributions.
But, for whatever reason, Mint shows up at the top of the list for average
number of hits per day (HPD) for each of the four periods. In fact, Ubuntu has
"slipped" to fourth place over the last month with Fedora and openSUSE
taking second and third place respectively. Mint shows nearly three times
the HPD of any other distribution in the top four. That's
interesting, perhaps, but not meaningful. It is a self-selected "poll"
that could be fairly easily manipulated—likely unintentionally.
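
As a rough illustration of the counting described above, here is a minimal sketch that treats each (day, distribution, IP address) combination as one hit and averages the result over a period. The function name, the record layout, and the sample data are all invented for the example; DistroWatch's actual implementation is not public in this article and may well differ.

```python
from collections import defaultdict
from datetime import date, timedelta

def hits_per_day(hits, end, days):
    """Average daily unique-IP hits per distribution for the `days`-day window ending at `end`.

    `hits` is an iterable of (day, distro, ip) tuples; this layout is an
    assumption made for the sketch, not DistroWatch's real log format.
    """
    start = end - timedelta(days=days - 1)
    seen = set()               # (day, distro, ip) triples already counted
    totals = defaultdict(int)  # distro -> hits counted in the window
    for day, distro, ip in hits:
        key = (day, distro, ip)
        if start <= day <= end and key not in seen:
            seen.add(key)
            totals[distro] += 1
    return {distro: total / days for distro, total in totals.items()}

# Made-up sample data, purely illustrative.
hits = [
    (date(2011, 12, 1), "mint", "10.0.0.1"),
    (date(2011, 12, 1), "mint", "10.0.0.1"),  # same IP, same day: ignored
    (date(2011, 12, 1), "mint", "10.0.0.2"),
    (date(2011, 12, 2), "mint", "10.0.0.1"),  # new day: counted again
    (date(2011, 12, 2), "ubuntu", "10.0.0.3"),
]
hpd = hits_per_day(hits, end=date(2011, 12, 2), days=2)
```

Nothing in a scheme like this knows whether the visitor actually runs the distribution, and anyone arriving from many different addresses is counted once for each of them per day.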
The ranking is also heavily skewed toward desktop distributions, as can be
seen by the numbers for server-oriented distributions like Red Hat (which
ranks below things like GhostBSD, Zorin, and Tiny Core) or SUSE (which
ranks a bit lower). Both of those distributions should have accurate sales
numbers, which may show a tad more popularity than anything read into the
DistroWatch numbers would. In short, even a brief look at the rankings page
should be enough to deter anyone from deriving conclusions that result in
headlines like "Ubuntu
sees massive slide in popularity, Mint sprints ahead ... but why?".
Part of the problem here is that it is somewhere between difficult and
impossible to get accurate figures for distribution usage. In fact, it
goes well beyond just distributions; accurately counting users of any free or
proprietary software is well-nigh impossible. Vendors who sell their
software have some advantage, but even they don't know how many
users there are. Microsoft can undoubtedly report how many copies
of Windows it sold in the last month (quarter, year, ...), but that most
certainly doesn't count the number of Windows users. That number is likely
to be much higher due to unlicensed users, whose ranks probably dwarf the
not completely insubstantial number of pre-installed systems that get wiped
to run other operating systems.
The usual methods to try to track users, like phoning home with some kind
of unique ID, are intrusive. For free software, those mechanisms are
unlikely to be tolerated by some, but even users of proprietary software
may find ways to avoid being counted. Companies selling software count
their users in terms of dollars (euros, ...) so, other than being able to
report inflated piracy numbers as "lost sales", there is no real need for
additional counting. Free software projects and distributions are different.
Those who work on free projects would certainly like to feel that their
work is being used and appreciated. That's not unreasonable at all, but is
popularity really the right measure of that? Even if it can be reliably
measured, popularity just measures ... well ... what's popular—not
what works best, solves the most problems, or anything else. Does it
really matter if Ubuntu has X million users and Linux Mint has X/4 million,
or the reverse? In both cases, the distributions are serving a substantial
number of people and, presumably, solving lots of their problems.
There are some "active counting" efforts by various distributions but, as
would be expected for free software projects, they are "opt-in" services.
Fedora and openSUSE both use smolt to gather semi-anonymized installation
data. Debian and Ubuntu use popcon to generate information on the
popularity of various packages. While users are asked to enable these
counting mechanisms at install time, it's not clear how many actually do
so.
Since directly measuring users is difficult, distributions often use
indirect (and fairly inaccurate) methods to try to get a handle on their
number of users. Both Fedora and openSUSE count unique IP-address
connections to their update servers and have fairly detailed pages that
outline what they are counting (openSUSE, Fedora). Ubuntu has
been notoriously lax in providing any real information on its
methodology—without being shy about producing numbers like 20 million
Ubuntu users—but one would guess it is doing something similar.
That kind of data collection isn't really accurate enough to generate a
"number of users" figure, though it may be fine as an estimate. Assuming the
methodology remains the same, it may also serve as a reasonable indicator
of trends in the number of users. If Fedora 16 has 50% more unique IPs
getting updates, that's a pretty good indicator that F16 has been adopted
more widely. Comparing F16's raw numbers to those of openSUSE 11.3, for
example, is much less useful.
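
To make the distinction concrete, here is a small sketch of counting unique IP addresses per release and comparing two releases of the same distribution. The record format, the release labels, and the addresses are hypothetical; this is not the method either Fedora or openSUSE documents, just the general idea under simplifying assumptions.

```python
def unique_ips(connections):
    """Map each release to the number of distinct IPs seen fetching updates.

    `connections` is an iterable of (release, ip) pairs, a layout assumed
    for this sketch rather than taken from any real update-server log.
    """
    ips = {}
    for release, ip in connections:
        ips.setdefault(release, set()).add(ip)
    return {release: len(addrs) for release, addrs in ips.items()}

def relative_change(counts, old, new):
    """Fractional growth in unique IPs from release `old` to release `new`."""
    return (counts[new] - counts[old]) / counts[old]

# Made-up sample data using documentation-range addresses.
connections = [
    ("f15", "192.0.2.1"), ("f15", "192.0.2.2"), ("f15", "192.0.2.1"),
    ("f16", "192.0.2.1"), ("f16", "192.0.2.3"), ("f16", "192.0.2.4"),
]
counts = unique_ips(connections)
growth = relative_change(counts, "f15", "f16")
```

The ratio is only meaningful when both counts come from the same methodology. Machines behind a shared address collapse into one entry and machines with changing addresses show up several times, so the absolute numbers are estimates at best, and comparing them across distributions that count differently says little.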
But obsessing over estimated numbers—or illusory trends based on web
page hits—seems counterproductive. While it is harder to generate
numbers, the measure of a community distribution really should be how vibrant
its community is. Are new people showing up, filing bugs, participating in
development or design discussions, packaging new software, translating
existing software, taking on new tasks, running for elected positions, and
so on? Those are certainly measures of growth, though they are hard to
quantify.
Focusing on a "zero sum" game for Linux distributions is equally
counterproductive. While the GNOME 3 and Unity decisions made by various
distributions have generated a lot of noise (and likely some distribution
and desktop environment switches), it's pretty hard to justify an "Ubuntu
users are running to Mint because of Unity" stance on anything other than
anecdotal evidence. If the suggested trend is even real, it could be that
Mint is attracting many of the first-time Linux users that Ubuntu once did,
or that it is attracting more than Ubuntu currently is. That could be due
to the "buzz" factor for Mint these days, for example. Not all (or even
most) growth of Linux distributions needs to come at the expense of other
distributions.
Unlike the choice between Windows and OS X (or Linux and either of those),
the choice between Linux distributions is far less susceptible to concerns
about lock-in. Part of what free software enables is relatively easy
migration between distributions, with full data and application
portability, which undoubtedly leads to some "distro hopping". But it's
also true that providing that freedom can attract new users. We've seen it
over the past 20 years, even if the growth on the desktop is not up to what
most had hoped for. Focusing on serving existing users, while attracting
new ones, rather than worrying about pumping up popularity numbers, is a
much more likely road to success.