May 14, 2008
This article was contributed by Diego Pettenò
One of the important rights that Free Software gives you is the ability
to take the source code of any software, modify it, and release it again
under a compatible Free Software license.
It is a very important freedom: it not only allows users to
customize the software they use to better suit their requirements,
but also enables distributions to patch software so that it builds in their
environment. Such environments differ in architecture and in the
versions of system tools and libraries.
As with other important freedoms, this ability can prove to be a huge
problem if not handled properly. There can be problems for
the original author, the person doing the fork, and the users of the
various versions of the software.
The history of Free Software is full of good examples of forks handled correctly,
like the EGCS fork that transformed the GNU C Compiler into the
GNU Compiler Collection (GCC), or more recently the replacement of
Jörg Schilling's cdrtools with the cdrkit package that is now found
in most distributions.
Unfortunately, the list of bad examples is longer.
Historically, forking a project was a difficult task for most single developers: handling version control repositories (especially with CVS)
was not easy. This limited the task of forking to
experienced developers, who usually had enough common sense to know
when forking was not an option.
Nowadays, forking is much easier: Subversion
allows developers to easily fetch the whole history of a project.
Distributed version control systems (DVCS) like git, Mercurial,
Bazaar-NG and others remove the need for a central repository, making
forking and branching two very similar activities.
Recently, the GitHub hosting site
has made this action even more prominent by adding a "fork" button on the
pages for the repositories hosted on its servers, allowing anybody to
create a new branch (or fork) of a project with a single mouse click.
The Downsides of Forking
Forking is not always the best option. It should probably be considered
the last resort. Forking divides efforts
as the two projects often take slightly different turns.
The result of the fork is that the two versions of the code diverge, even
though they share the same interface and most
of the background logic.
This creates a series of problems of a technical nature that reflect
on the non-technical attributes of a program.
A forked project reuses a big part of the code from the original
project. This causes code duplication, with its usual problems, and one
in particular: security risks. A forked project is usually
vulnerable to the problems the original project had, unless that part
of the code has been rewritten or modified with time.
As the forks evolve, authors often miss the security issues fixed by
their ancestor, making it harder for developers to track the issues down.
Another common problem is the division of users' contributions.
Users usually just report issues to one project, the one they use.
So either the developers of the two projects exchange information about
the bugs they fix in the common code, or the problems will likely be
ignored by one of the two projects, making the distance between the
projects increase.
You can find this very problem with software like Ghostscript, the
omnipresent PostScript processor used to generate, view and convert
PostScript files. Its development is currently divided into multiple
forks which do not always give their code back to the originating
project.
You can find one version released under the AFPL (Aladdin Free Public
License), one released under the GPL, a commercial/proprietary one,
and one version that used to be developed by Easy Software
Products, the authors of the CUPS printing system.
The reasons for the forks here were mostly related to licensing issues
and, in the case of ESP, to better support for CUPS.
In the end, the development of different bloodlines for the project
caused, and still causes, problems for distribution maintainers.
Distribution issues include keeping packages aligned, which means
doubling the effort needed to fix the code if it breaks or if it
doesn't follow policy.
Another case where dividing the development effort has caused problems
is in the universe of Logitech mouse control software.
The lmctl project was started as a tool to control some
settings of Logitech devices, like resolution and cordless channels.
The code has to know which devices have which settings available.
To do this, it keeps a table of USB identifiers. As new devices started
appearing on the market and Linux users started using them, the table
became outdated.
Distributions patched this up, but in different ways, creating
inconsistent tables. Some users started releasing their own modified
version of lmctl with an extended table to support different devices.
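To make the inconsistency concrete, here is a minimal Python sketch of how independently patched device tables drift apart. It is not lmctl's actual code or data: the table layout, the capability names, and the product IDs are assumptions made up for illustration (only 0x046d, Logitech's USB vendor ID, is real).

```python
# Hypothetical device tables keyed by (vendor_id, product_id).
# 0x046d is Logitech's USB vendor ID; the product IDs and the
# capability names are invented for this illustration.
LOGITECH = 0x046D

upstream = {
    (LOGITECH, 0x0001): {"resolution"},
}
distro_a = {
    (LOGITECH, 0x0001): {"resolution"},
    (LOGITECH, 0x0002): {"resolution", "cordless_channel"},
}
distro_b = {
    (LOGITECH, 0x0001): {"resolution"},
    (LOGITECH, 0x0002): {"resolution"},
    (LOGITECH, 0x0003): {"cordless_channel"},
}


def compare_tables(tables):
    """Return (merged, conflicts) for a dict of name -> device table.

    merged maps each device to the union of capabilities seen anywhere.
    conflicts maps each device on which the tables disagree to the view
    held by each table (None when a table does not know the device).
    """
    merged = {}
    for table in tables.values():
        for device, caps in table.items():
            merged.setdefault(device, set()).update(caps)

    conflicts = {}
    for device in merged:
        views = {name: table.get(device) for name, table in tables.items()}
        distinct = {
            frozenset(caps) if caps is not None else None
            for caps in views.values()
        }
        if len(distinct) > 1:
            conflicts[device] = views
    return merged, conflicts


merged, conflicts = compare_tables(
    {"upstream": upstream, "distro_a": distro_a, "distro_b": distro_b}
)
```

In this made-up data, only the first device is described identically everywhere; the other two are either missing from some tables or listed with different capabilities, which is the kind of divergence users of different distributions would run into.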
While explicit forks of entire projects have problems, the fact that
they delineate where they took the code from makes it easier to track
down the source of bugs and handle security vulnerabilities. On the
other hand, when a project borrows some code and imports it in its
source distribution, this kind of tracking becomes more difficult.
Free Software licenses explicitly allow, and push for, importing code
between projects; cross-pollination also improves general code quality
over time.
For most distributions, an internal imported copy of a library inside
another project is also a violation of policy. For this reason the
developers will most likely try to make the project use a shared, external
copy of the code.
This works fine when the other library is simply bundled together
untouched, but it becomes a nuisance if there are subtle changes
which might not be apparent at first glance.
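One way a packager might look for such subtle changes is to diff the bundled copy against the pristine upstream release. The following Python sketch uses the standard difflib module; the function name, the file name, and the two source snippets are hypothetical and only stand in for real files.

```python
import difflib


def bundled_drift(upstream_text, bundled_text, name="file"):
    """Return unified-diff lines showing how a bundled copy differs from upstream."""
    return list(
        difflib.unified_diff(
            upstream_text.splitlines(),
            bundled_text.splitlines(),
            fromfile=f"upstream/{name}",
            tofile=f"bundled/{name}",
            lineterm="",
        )
    )


# Hypothetical contents: the bundled copy carries a one-line local change.
upstream_src = "int scale(int x)\n{\n    return x * 2;\n}\n"
bundled_src = "int scale(int x)\n{\n    return x * 2 + 1;\n}\n"

drift = bundled_drift(upstream_src, bundled_src, "scale.c")

# Keep only added/removed lines, skipping the "---" and "+++" headers.
changed = [
    line
    for line in drift
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

An empty result means the bundled copy is pristine and can probably be replaced by the shared library; anything else is a local modification that has to be understood before unbundling.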
If you want to keep an internal copy of a library, treat it as an
untouchable piece of code.
Instead of spending time fixing bugs inside that copy of the code, the
developers should try to fix the bugs in the original sources, so that
everybody (including themselves) can make use of the improvement.
In the real world, one example of this is the FFmpeg source code.
FFmpeg is imported by many different Free Software projects in the area
of multimedia: xine, MPlayer, GStreamer. While it is a very wide common
ground for all these projects, as well as for some others, like VLC, that
do not import a copy of it, some of the imports change the source
code in more or less subtle ways. In the case of xine, the whole build
system is replaced to integrate it with the automake-based build system
used by the rest of the library. Further patching is done to the
sources themselves so that they behave in a slightly different way than
the original. The code rots quickly, and bugs that were already
fixed in the in-development sources of FFmpeg still sprout in xine-lib.
Maintaining such an import is a difficult and boring task, to the point
that the developers, in the past two years, have spent a lot of energy
toward the goal of not using an internal copy of FFmpeg anymore.
The result is that the difference between the original FFmpeg and the
internal copy is now much smaller, mostly limited to the build system.
Instead of advising against using an external copy of FFmpeg, it is
now advised not to use the internal one. For the next minor version of
xine-lib, FFmpeg is being used pristine, entirely unpatched, and it will
probably not even be bundled with the library in the near future.
Successful Forks
Of course it's not all bad. There are successful forks in Free Software,
and many of them are now more famous than their parents. I've already named
the GNU Compiler Collection, which is the GCC that almost all Free Software
users have at hand at the moment. Most people use GCC version 3
and later, which started as a fork of the other GCC (the GNU C Compiler), version 2. The original development of GCC was, like that of many other GNU projects, very closed to the community.
As Eric S. Raymond defined it in his book The Cathedral and the
Bazaar, it was a Cathedral-style development, which often prepares the ground for forks, and this was no exception. Multiple forks of the GCC
code were created. Their goals, while different, often didn't clash, and could have easily been worked on at the same time. Some of the forks were
then merged into the EGCS project, which eventually replaced the original
GCC.
Again citing GNU's Cathedral-style of development, it's difficult not to
talk about GNU Emacs and its brother XEmacs.
Created originally to support one particular product, the XEmacs project
is nowadays a mostly standalone project. XEmacs is kept at arm's length
from GNU Emacs,
mostly because of licensing and copyright assignment issues.
Neither version can be considered a superset of the other because they
both implement features in their own way.
Better is the state of Claws Mail, which started as a different
branch of Sylpheed under the name Sylpheed Claws. Originally the intention was
to develop new features that could one day find their way back to the
original code. Claws Mail has since declared itself independent and
is now a stand-alone project. In this case, the exchange of code between the two projects has basically halted, as the code bases have diverged so
much that they retain very little in common.
In the case of the Ultima Online server emulators, forks became daily
events, and cross-pollination had grown to the point where at least five
projects were linked by family ties.
The UOX3 source code has been
forked, reused, imported and cut down so that it is present in WolfPack,
LoneWolf, NoX-Wizard and Hypnos.
Almost all of the UOX3 forks involved re-writing parts of the code,
as it had stratified to the point of not being maintainable.
The forks continued copying one from the other to make use of the best
features available.
Forking vs. Branching
There are a few good reasons why you might want to detach, temporarily,
from a given development track: development of experimental features, new
interfaces, backend rewrites or resurrection of a project whose original authors are unavailable.
In most of these cases, forking is not the best solution but
branching most likely is. Although the border between these two
actions has started to blur thanks to distributed VCSs, branching
usually doesn't involve setting up a new web page for the project,
changing its name or finding a new goal. And a branch is usually
related, tightly or not, to the original project. Merges between
the two code bases often happen at more or less regular intervals,
and ideas and bug reports are shared.
Branches usually aim to be merged into the main
development track, sooner for small testing branches, later for huge
rewrites. They don't usually require dividing efforts, as
fixes for problems affecting the main branch propagate to the
other branches when they merge the original code back in.
One common problem with developing through branches involved bad support
in the Subversion version control system. In Subversion the branches are
represented as a different path in the repository, with almost no help
for branches in the merge operations.
With a modern distributed VCS, branches are so cheap that
any checkout is, from some points of view, a different branch, and the
merge operations are one of the main focuses.
Projects like the Linux kernel or xine-lib rely heavily on an
above-average number of branches. These are often short-lived and
used for testing purposes.
Looking to the Future
Forks will never end in Free Software as they are supported by one of
the freedoms that make Free Software what we all want it to be.
The future will, of course, bring new forks.
Recently there has been a lot of talk about Funpidgin,
a fork of the widespread Pidgin Instant Messaging client (formerly Gaim).
Again it seems like it was the Cathedral-style development of the original
code that motivated a fork that could give (some of) the users what
they wanted.
And even though GNU Emacs opened its process quite a lot, its forks
haven't stopped sprouting. This is despite the fact that
Richard Stallman, original author and mastermind behind the GNU project,
stepped down as maintainer, putting in place Stefan Monnier and Chong Yidong.
Aquamacs Emacs is still diverging from the original GNU
Emacs in order to support Apple's Mac OS X, while different versions
are being developed to support the multiple user interfaces one can use
on that operating system. Similarly, although the Windows port of Emacs
is already pretty solid, there are extensions being written to make it
easier for users to adapt it to the Microsoft environment.
Forks are usually the effect of a closed-circle development, a Cathedral,
where some of the developers or users can't see their objective being
fulfilled, even with all their energy being poured in. So just look for the
projects that don't seem to be getting much love from a community, and
you might find a fork starting to make its first leaves.
Then there is the Poppler project,
which merged together the modified versions of the Xpdf code imported by
projects like GNOME and KDE for their PDF viewers.
Poppler is soon going to be a nearly omnipresent PDF rendering library on Free
Software desktops and beyond.
This summer's milestone KDE 4.1 release will include
the new Okular document viewer, which will use Poppler for PDF rendering
on the (stable) KDE users' desktops.
Conclusions
I'd suggest that anybody thinking about creating a fork should think
twice. Forking is rarely a good choice. Better choices can be
branching or, if you need just part of the code, working together as the
Poppler developers did to split out and share the common parts.
When you want to make some changes to a software project, propose
branching it, show the results to the original developers and discuss
with them how to improve the code. Most of the time you'll find
authors are open to the changes.
A fork is a grave matter. It might bring innovation to the Free Software
community, but it could also separate developers that could otherwise
work together, maybe in a better way. In this light, GitHub's one-click
forking capability seems like a dangerous feature.
The ever-increasing ease of forking everything, from small projects
to parts of, or even entire, distributions (think about Debian's
repositories and Gentoo's overlays) is increasing the fragmentation of
Free Software projects. Biodiversity in software can be a very good
thing, just like in nature, but people should first try their best to
work together, rather than one against the other.
System Applications
Database Software
The May 11, 2008 edition of the PostgreSQL Weekly News
is online with the latest PostgreSQL DBMS articles and resources.
Device Drivers
Version 0.8.3 of LIRC, an infrared remote control interface, has been
announced. A bug with the Irman remote has been fixed.
Embedded Systems
Stable version 1.10.2 of BusyBox,
a collection of command line tools for embedded systems, has been announced.
"Bugfix-only release for 1.10.x branch. It contains fixes for echo, httpd, pidof, start-stop-daemon, tar, taskset, tab completion in shells, build system."
Web Site Development
Version 1.0 of BencHTTP has been announced.
"BencHTTP is a utility to test the performance of HTTP servers under load. It is highly configurable via simple script files and allows extensive customisation of HTTP requests. The current server performance is continuously displayed during the test."
Version 0.6.31 of nginx, an
HTTP server and mail proxy server, has been announced; this
is a bug fix release. See the CHANGES file for more details.
Version 2.0 of OpenKM, a multi-platform Web 2.0 document
management application, has been announced.
"This new version entails the following improvements: the
previsualization of multimedia elements as images and videos, an
improved an rewritten administration interface, a centralized management
of templates, an exclusive area to allow users to store their private
documentation, a tool for massive import and output data from ZIP files,
searches by date ranks as well as translations to more languages."
Desktop Applications
Audio Applications
Version 1.3.5 beta of Audacity, an audio editor, has been announced.
"Changes include improvements and new features for recording, import/export and the user interface. Because it is a work in progress and does not yet come with complete documentation or translations into foreign languages, it is recommended for more advanced users."
Business Applications
Version 2.35 maintenance pack 4 of Openbravo ERP, a web-based ERP for SMEs,
has been announced.
"This a stabilization release with no additional functionality. It is intended for production usage."
Version 1.3 of YaMA has been announced.
"Yet Another Meeting Assistant (YaMA), will help you with the Agenda,
Meeting Invitations, Minutes of a Meeting as well as Action Items. If
you are the assigned minute taker at any meeting, this tool is for
you."
Desktop Environments
GNOME 2.22.1 is now available for Slackware 12.1, with the release of
SlackBuild GNOME 2.22.1.
"There have been a lot of improvements in
this latest GSB release, including the move to PulseAudio, fewer package
replacements, a GNOME-integrated Compiz-Fusion setup, the latest
NetworkManager, Abiword 2.6, and OpenOffice2.4 built for GNOME, a richer
Mono C# suite, as well as all the great features of GNOME 2.22."
The following new GNOME software has been announced this week:
You can find more new GNOME software releases at
gnomefiles.org.
The following new KDE software has been announced this week:
You can find more new KDE software releases at
kde-apps.org.
The following new Xorg software has been announced this week:
More information can be found on the
X.Org Foundation wiki.
Desktop Publishing
Version 1.5.5 of LyX, a GUI front-end to the TeX typesetter, is out.
"We are pleased to announce the release of LyX 1.5.5. Being the fourth
maintenance release in the 1.5.x cycle, this release further improves
the stability and usability of the application. Besides this, it also
introduces some new features.
Most notably, LyX is now prepared to be compiled with Qt 4.4 that has
just been released: the stability issues that occured in previous
versions of LyX when compiled against Qt 4.4 have been resolved."
Financial Applications
Version 3.2 stable of Buddi has been announced.
"Buddi is a simple budgeting program targeted for users with little or no financial background. It allows users to set up accounts and categories, record transactions, check spending habits, etc.
I am happy to announce the first minor stable release on the 3.x branch. This version fixes a few bugs (including a bug when copying your files from OSX 10.4 Tiger to OSX 10.5 Leopard), and resolves a few feature requests. No major changes here, but it is recommended that all users upgrade."
The long-awaited KMyMoney 0.9 release is out. There's a lot of new stuff
here, including charts, budgets, forecasts ("Sadly, we are unable to
accurately predict the future value of your investments"), a whole new
set of wizards, better transaction
auto-filling, and more. Note that this is a development release, but it's
still an important step forward for a promising project.
Interoperability
Release 1.0-rc1 of Wine has been announced.
"This is the first release candidate for Wine 1.0. Please give it a
good testing to help us make 1.0 as good as possible. In particular
please help us look for apps that used to work, but don't now."
Office Suites
KDE.News covers the release of KOffice 2.0 Alpha 7.
"The KDE Project today announced the release of KOffice version 2.0 Alpha 7, a technology preview of the upcoming version 2.0. This version adds a lot of polish, some new features in Kexi and KPresenter and especially better support for the OpenDocument format. It is clear that the release of KOffice 2.0 with all the new technologies it brings is drawing nearer.
This is mainly a technology preview for those that are interested in the new ideas and technologies of the KOffice 2 series."
Miscellaneous
Version 1.0.0 of Accelerator has been announced.
"Accelerator is a GUI program that shows where keyboard accelerators should
go in menu option texts and dialog labels. The program produces optimal
results on the basis that the best accelerator is the first character, the
second best is the first character of a word, the third best is any
character, the worst is no accelerator at all, and no accelerator should be
used more than once. With this program developers can help improve
usability for users who can't use the mouse and for fast typists who don't
want to use the mouse."
Version 0.70.0 of Task Coach, a todo manager for managing personal tasks
and todo lists, has been announced. Changes include:
"Small feature enhancements, more translations and several bug fixes. Task Coach is now distributed under the GPLv3+."
Languages and Tools
C
The May 12, 2008 edition of the GCC 4.2.4 Status Report
has been published.
"GCC 4.2.4 will follow in about a week in the absence of any
significant problems with 4.2.4-rc1. If you believe a problem should
delay the release, please file a bug in Bugzilla, mark it as a
regression, note there the details of the previous 4.2 releases in
which it worked and CC me on it. Anything that is not a regression
from a previous 4.2 release is unlikely to delay this release."
Caml
The May 13, 2008 edition of the Caml Weekly News
is out with new articles about the Caml language.
Topics include: Core godi package,
GODI Search: Includes now sources and
uint64lib release (again), plus help request.
Perl
The 28 April-3 May, 2008 edition of
This Week on perl5-porters is out with the latest Perl 5 news.
Python
Python versions 2.6a3 and 3.0a5 have been released.
"Please note that these are alpha releases, and as such are not
suitable for production environments. We continue to strive for a
high degree of quality, but there are still some known problems and
the feature sets have not been finalized. These alphas are being
released to solicit feedback and hopefully discover bugs, as well as
allowing you to determine how changes in 2.6 and 3.0 might impact
you."
Version 0.9.7 of Pyrex has been announced; several new capabilities have
been added.
"Pyrex is a language for writing Python extension modules.
It lets you freely mix operations on Python and C data, with
all Python reference counting and error checking handled
automatically."
The May 12, 2008 edition of the Python-URL! is online with
a new collection of Python article links.
Tcl/Tk
The May 8, 2008 edition of the Tcl-URL! is online with new
Tcl/Tk articles and resources.
The May 14, 2008 edition of the Tcl-URL! is online with new
Tcl/Tk articles and resources.
Build Tools
Version 0.7 of Paver has been announced.
"Paver is a "task" oriented
build, distribution and deployment scripting tool. It's similar in idea to
Rake, but is geared toward Python projects and takes advantage of popular
Python tools and libraries.
Paver can be seen as providing an easier and more cohesive way to work
with a variety of proven tools.
With Version 0.7, Paver is now a full stand-in for the traditional
distutils- or setuptools-based setup.py. Need to perform some extra
work before an sdist runs?"
IDEs
Version 1.3.17 of Pydev and Pydev Extensions is out with bug fixes.
"PyDev is a plugin that enables users to use Eclipse for Python and
Jython development -- making Eclipse a first class Python IDE -- It
comes with many goodies such as code completion, syntax highlighting,
syntax analysis, refactor, debug and many others."
Page editor: Forrest Cook