Visualization is a critical tool for exploring and understanding large
amounts of data. Thanks to the computing power of the 21st century, it has
become possible to visualize ever-expanding amounts of data. Because the
open source development model is massively decentralized and network-centric, it is by its nature the perfect domain for graph-based visualizations. Connections or dependencies between projects, communities, and code commits can be explored and displayed in many ways. These visualizations can give us a unique perspective on open source projects and communities, such as fundamental differences in their approach.
Your author has a longstanding interest in visualizations, especially of
non-numerical information. The classic books about visualizing complex data
are of course Edward Tufte's works, beginning with The Visual Display of
Quantitative Information. Recently, your author has enjoyed reading more
programming-oriented books like Ben Fry's Visualizing Data and
Toby Segaran and Jeff Hammerbacher's Beautiful Data. What's
even better is that a lot of open source software exists to put this theory
into practice. We'll look at a few of the most interesting open source
visualization programs and their application to open source projects and
communities.
Michael Ogawa, a Ph.D. student in the Visualization & Interface Design Innovation group at UC Davis, has conducted some interesting research on software visualization. Its purpose is to help understand the relationship between communication among developers and the evolution of the source code. In 2007, Michael published a paper called "Visualizing Social Interaction in Open Source Software Projects" [PDF]. In 2008, he presented StarGate [PDF], a system that visually groups the developers of a software project into clusters corresponding to the areas of the file repository they work on the most. For both visualization methods, he used Apache and PostgreSQL as case studies. Interested readers should consult the papers for some illuminating insights into these projects.
Michael's most popular visualization method is code_swarm, which shows the history of commits in an open source project as a video. Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they fade away. The design of code_swarm is explained in the paper code_swarm: A Design Study in Organic Software Visualization [PDF], which shows some case studies of Python, Eclipse, Apache, and PostgreSQL. Videos generated by code_swarm for these projects are also available on the web site.
The code for code_swarm, written in Ben Fry's Java-based open source programming environment Processing, is available under the GPL v3. It supports various types of repositories: Subversion, CVS, Git, Mercurial, Perforce, VSS, Starteam, and Darcs. The wikiswarm add-on even allows visualizing Wikipedia page histories and user contributions. By downloading and executing the code, anyone can create their own software visualizations, and there's a mailing list for help. The project's wiki also has some documentation, such as a step-by-step guide on how to generate a video, a FAQ, and a gallery of third-party code_swarm videos.
At the end of 2009, New Zealand software developer Andrew Caudwell presented his software visualization project Gource (a play on Source and Gorse) on his computer graphics blog The Alpha Blenders. Gource takes the logs from a software project's version control system and displays them as an animated tree with the root directory of the project at its center. Directories appear as branches with files as "leaves", represented by spheres that are colored according to their file extension. Developers currently contributing to the project can be seen floating near the files they are modifying. The whole visualization looks organic and is interactive, as the user can rotate the view and move the camera position.
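The tree structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Gource's actual code: the function names build_tree and extension_groups and the sample paths are hypothetical. Directories become nested dictionaries (the branches), files become entries with the value None (the leaves), and files are grouped by extension the way a renderer might group them before picking colors.

```python
from collections import defaultdict


def build_tree(paths):
    """Nest file paths into a tree: dicts are directories, None marks a file."""
    root = {}
    for path in paths:
        parts = path.strip("/").split("/")
        node = root
        # Walk (and create) one branch per directory component.
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        # The last component is the file itself: a leaf.
        node[parts[-1]] = None
    return root


def extension_groups(paths):
    """Group paths by file extension; files without one land under ''."""
    groups = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1] if "." in name else ""
        groups[ext].append(path)
    return dict(groups)


# Hypothetical paths, as they might appear in a version control log.
sample_paths = ["src/main.c", "src/util/log.c", "README"]
tree = build_tree(sample_paths)
groups = extension_groups(sample_paths)
```

A real tool would additionally replay the log in time order and animate the tree as files are added, changed, and removed; the sketch only covers the static structure.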
The code for Gource is available under the GPL v3. It's designed for use
with Git, Mercurial, or Bazaar, but it also has scripts to support CVS and
Subversion. It needs a 3D-accelerated video card and uses OpenGL for
rendering. The wiki has some documentation, such as how to show Gravatar
images for developers or how to change the appearance. The wiki also
explains how to produce a video and shows some example videos and
screenshots. In January, Andrew showed some of his visualizations at
linux.conf.au.
In the last few months, several enthusiasts have been experimenting with Gource. For example, Michael DeHaan used Gource to create a visualization of Red Hat's provisioning server Cobbler, and he explained that it can be really useful for evaluating an open source project:
When evaluating OSS software for use in business, you always need to know if the community is solid and self sustaining. [Gource] allows you to watch a short video and find out. Coupled with looking through the mailing list archives, that's a pretty good check. It can also help identify interesting patterns of large scale refactoring, new development, or stagnation.
Michael's visualization inspired Daniel Berrange to do the same exercise for libvirt:
It is clear from the video just how much development of libvirt has been expanding over the past 4 years, particularly with the expansion to cover VirtualBox and VMWare ESX server as hypervisor targets.
Daniel also produced a visualization of libvirt using code_swarm, which makes it easy to compare the merits of both methods.
A third, more research-oriented and more general graph visualization tool is Gephi, which is Java-based and distributed under the GPL v3. Users of this tool call it "like Photoshop for graphs", or should we say "like GIMP for graphs"? It's a powerful, interactive tool for exploring, manipulating, and visualizing graphs. Users can change the structure, shapes, colors, and locations of the nodes, but they can also find the shortest path between nodes, compute graph metrics, find clusters, and conduct many other advanced graph analyses. Gephi was originally created in 2007 at WebAtlas, a French non-governmental organization involved in mapping the web and data mining, but is now developed by an international consortium of open source developers.
Gephi is not oriented specifically towards visualizing software
projects: the wiki page about data sets that can be
explored and visualized with Gephi gives examples such as the structure of the Internet, the topology of the Western States Power Grid of the United States, airlines, a network of disorders and disease genes linked by known disorder-gene associations, and a couple of social networks.
However, the social networks section shows some interesting data sets of
open source projects, which have all been visualized in Gephi
by Franck Cuny, a Perl hacker working at the French social media agency Linkfluence. He developed the CPAN Explorer web site, an interactive visualization for analyzing relationships between developers and packages on CPAN (the Comprehensive Perl Archive Network). The authors page shows relationships between developers inside CPAN: each developer is represented by a node with a size proportional to the number of modules the developer has released on CPAN. An edge between two developers is created when one developer uses a module from the other developer. The more a developer's modules are used by other developers, the bigger the label.
One can deduce some interesting facts about developers from this graph: for example, Adam Kennedy gets a big node with a small label, because he has released a lot of modules on CPAN, but few of them are used by other CPAN developers. In contrast, Gisle Aas has a small node with a big label, because he has few modules, but some very popular ones like LWP and URI.
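The node and label rules described above can be expressed as a small Python sketch. It is not Franck's code: the function name author_graph, its two inputs, and the developer and module names in the sample data are all made up for illustration. The node size counts released modules, the label size counts uses by other developers, and an edge points from the user of a module to its author.

```python
from collections import Counter


def author_graph(releases, uses):
    """Derive node sizes, label sizes, and edges for a developer graph.

    releases: dict mapping a developer to the modules they released.
    uses: iterable of (developer, module) pairs, one per module use.
    """
    # Map every module back to the developer who released it.
    owner = {module: author
             for author, modules in releases.items()
             for module in modules}
    # Node size: number of modules released by the developer.
    node_size = {author: len(modules) for author, modules in releases.items()}
    label_size = Counter()
    edges = set()
    for user, module in uses:
        target = owner.get(module)
        # Skip unknown modules and developers using their own modules.
        if target is None or target == user:
            continue
        edges.add((user, target))
        # Label size: how often others use this developer's modules.
        label_size[target] += 1
    return node_size, dict(label_size), edges


# Hypothetical data: alice releases a lot, but bob's one module is popular.
releases = {"alice": ["A1", "A2", "A3"], "bob": ["B1"], "carol": []}
uses = [("alice", "B1"), ("carol", "B1"), ("bob", "A1"), ("bob", "B1")]
node_size, label_size, edges = author_graph(releases, uses)
```

In this made-up data set, alice ends up with the largest node but a small label, while bob gets a small node with the biggest label, which mirrors the contrast between prolific and widely used developers described above.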
Last month, Franck introduced a new project, Github Explorer. He explained that this was a very natural choice:
I wanted to do something similar again, but not with the same data. So I took a look at what could be a good subject. One of the things that we saw from the map of the websites is the importance github is gaining inside the Perl community. Github provides a really good API, so I started to play with it.
This time, Franck didn't aim for the Perl community only, but the whole community of users of Github, a popular web-based hosting service for projects that use the Git revision control system. He warns that Github doesn't represent the whole open source community and that he has collected only a selection of all user profiles, but nevertheless it gives us a good picture. Each profile is represented by a node, and a link between two profiles is represented by an edge. The weight of the edge is incremented each time the person forks code from the target profile.
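The edge weighting described above is easy to sketch in Python. The code below is an assumption-laden illustration rather than Franck's implementation: the function names, the (forker, owner) event format, and the profile names are hypothetical, and the Source, Target, and Weight column names are simply one common layout for an edge list in CSV form.

```python
import csv
import io
from collections import Counter


def fork_edges(fork_events):
    """Count forks per (forker, owner) pair; each fork adds 1 to the weight."""
    weights = Counter()
    for forker, owner in fork_events:
        if forker != owner:  # ignore self-references
            weights[(forker, owner)] += 1
    return weights


def edges_to_csv(weights):
    """Serialize weighted edges as CSV text, sorted for stable output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are an assumption, not taken from the article.
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical fork events: ann forked two of bob's projects, cy forked one.
events = [("ann", "bob"), ("ann", "bob"), ("cy", "bob")]
weights = fork_edges(events)
edge_list = edges_to_csv(weights)
```

Keeping the counting separate from the serialization makes it straightforward to swap in another output format for a different graph tool.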
On his blog, Franck shows some Github visualizations he made with Gephi, colored by country and split by programming language. He offers some thought-provoking analyses of several of the languages. For example, the Perl community is clearly split between the 'west' and Japan. In the Python community, there is clearly one main project, the web framework Django. PHP is the only community on Github where the visualization shows clusters of people working together on a specific project. The Ruby graph looks like a big ball of yarn with a couple of isolated countries. In other visualizations, Franck split the graphs by country. He offers the data he gathered for use in Gephi, he has published all the graphs on Flickr, and he will soon offer printed versions for sale as A2 and A1 posters.
Towards a better understanding of open source communities
Many of these visualizations are beautiful, but that's not the point. To
paraphrase Richard Hamming's dictum "The goal of computation is not
numbers, but insight", your author would say "The goal of
visualization is not beauty, but understanding." Visualization
tools can help us understand the internals and the dynamics of open source
projects and communities. While code_swarm and Gource can show users a lot
about patterns and evolutions in the development of a specific project,
including how the developer community works together, CPAN Explorer and
Github Explorer are about visualizing global connections between a lot of
open source projects, which is also an important factor in open source
communities. Now we just have to wait for some creative minds to visualize the SourceForge or Launchpad communities.
Q: O Great Rabbi, Perl has so many precedence laws I feel I shall never
learn them all. Which is the most important of these commandments?
A: As it was in the beginning, is now, and ever shall be, the First
Commandment is the Law of Algebraic Precedence:
#1 MULTIPLICATIVE OPERATORS BIND MORE TIGHTLY THAN ADDITIVE ONES.
The Second Commandment is to think of those who come after you,
most preferably before they do so:
#2 DON'T BE A DAMNED FOOL: OTHERWISE USE PARENTHESES!
Follow these two Commandments and all the days of your life will be
blessed, for your code shall be ever right[eous] and all shall love
you for it.
-- Tom Christiansen
Since Emacs is just an editor, not a god, it cannot do miracles.
Mozilla has released Firefox 3.6.3 to address a critical
vulnerability that could allow remote code execution.
Grease is a Python-based 2D game engine and development framework, focused
on quick development, good performance, and fun. The project's documentation
includes a detailed tutorial on the creation of an asteroids-style game.
"Grease does not attempt to provide one-size-fits-all solutions. Instead it
provides pluggable components and systems that can be configured, adapted
and extended to fit the particular needs at hand."
Version 0.1 of the "Notmuch" email client (recently reviewed on LWN) has
been released. "In trying to get notmuch to grow up a little bit,
I've just added a version number (0.1 initially) and have started doing
" More informative release notes are promised for the
The initial release of StretchPlayer is available. StretchPlayer is an
audio file player with time-stretching and pitch-shifting features. The
intended audience would appear to be musicians who want to slow down a song
to learn how to play with it. More information can be found on the project's home
page.
A group of Subversion developers recently met in New York in an attempt to
come up with a plan for the future development of this source code
management system; a summary of that meeting has now been posted.
"Subversion has no future as a DVCS tool. Let's just get that out there. At
least two very successful such tools exist already, and to squeeze another
horse into that race would be a poor investment of energy and talent.
What's more, huge classes of users remain categorically opposed to the very
tenets on which the DVCS systems are based. They need centralization. They
need control. They need meaningful path-based authorization. They need
simplicity. In short, they desperately need Subversion. It's this class of
user -- the corporate developer -- that stands to benefit hugely from what
Subversion brings to the party." Read the whole thing for details
on how they plan to meet that developer's needs.
UFRaw is a utility for the
processing of raw images from digital cameras. The biggest addition in the
0.17 release would appear to be the incorporation of the lensfun library,
allowing UFRaw to
correct for lens distortion using a database of hundreds of lenses. Also
in this release are a new despeckling algorithm, hot pixel elimination by
default, better zoom support, and more.
Fresh from the X.Org server 1.8 release,
Keith Packard is pondering making
some changes for the next time around. At the top of his list is
shortening the release cycle to something closer to three months as a way
of getting new hardware support to users more quickly. That proposal is
not universally loved, though, so it's not clear if it will be adopted or
not. He is proposing that the 1.9 release happen in late August. "I don't think there are any major changes planned for this release, so
this shorter merge window seems like it should be sufficient. Nor do I
necessarily think that this would also mean that the X.org release date
should be moved in; having the X server ready *before* the X.org release
seems like a good idea to me."
Newsletters and articles
Georges Auberger reports
that the Songbird media player is dropping Linux support. "After
careful consideration, we've come to the painful conclusion that we should
discontinue support for the Linux version of Songbird. Some of you may
wonder how a company with deep roots in Open Source could drop Linux and we
want you to know it isn't without heartache. We have a small engineering
team here at Songbird, and, more than ever, must stay very focused on a
narrow set of priorities. Trying to deliver a raft of new features around
all media types, and across a growing list of devices, we had to make some
" An untested and unsupported version of Songbird for
Linux will still be available for developers.
Page editor: Jonathan Corbet