Development

The Freedom of Fork

May 14, 2008

This article was contributed by Diego Pettenò

One of the important rights that Free Software gives you is the ability to take the source code of any software, modify it, and release it again under a compatible Free Software license. It is a very important freedom: it not only allows users to customize the software they use to better suit their requirements, but also enables distributions to patch software so that it builds in their environment. Environmental changes include new architectures and different versions of system tools and libraries. As with other important freedoms, this ability can prove to be a huge problem if not handled properly. There can be problems for the original author, the person doing the fork, and the users of the various versions of the software.

The story of Free Software is full of good examples of forks handled correctly, like the EGCS fork that transformed the GNU C Compiler into the GNU Compiler Collection (GCC), or more recently the replacement of Jörg Schilling's cdrtools with the cdrkit package that is now found in most distributions. Unfortunately, the list of bad examples is longer.

Historically, forking a project was a difficult task for most single developers: handling version control repositories (especially with CVS) was not something done easily. It limited the task of forking to experienced developers, who usually had enough common sense to know when forking was not an option.

Nowadays, forking is much easier. Subversion allows developers to easily fetch the whole history of a project. Distributed version control systems (DVCS) like git, Mercurial, Bazaar-NG and others remove the need for a central repository, making forking and branching two very similar activities. Recently, the GitHub hosting site has made this action even more prominent by adding a "fork" button to the pages of the repositories hosted on its servers, allowing anybody to create a new branch (or fork) of a project with a single mouse click.

The Downsides of Forking

Forking is not always the best option. It should probably be considered the last resort. Forking divides efforts as the two projects often take slightly different turns. The result of the fork is that the two versions of the code diverge, even though they share the same interface and most of the background logic. This creates a series of problems, of a technical nature, that reflects on the non-technical attributes of a program.

A forked project reuses a big part of the code from the original project. This causes code duplication, with its usual problems, and one in particular: security risks. A forked project is usually vulnerable to the problems the original project had, unless that part of the code has been rewritten or modified with time. As the forks evolve, authors often miss the security issues fixed by their ancestor, making it harder for developers to track the issues down.

Another common problem is the division of users' contributions. Users usually just report issues to one project, the one they use. So either the developers of the two projects exchange information about the bugs they fix in the common code, or the problems will likely be ignored by one of the two projects, making the distance between the projects increase.

You can find this very problem with software like Ghostscript, the omnipresent PostScript processor, used to generate, view and convert PostScript files. Its development is currently divided into multiple forks which do not always give their code back to the originating project. You can find one version released under the AFPL (Aladdin Free Public License), one released under the GPL, a commercial/proprietary one, and one version that used to be developed by Easy Software Products, the authors of the CUPS printing system.

The reasons for the forks here were mostly related to licensing issues and, in the case of ESP, to the goal of better supporting CUPS. In the end, the development of different bloodlines for the project caused, and still causes, problems for distribution maintainers. Distribution issues include keeping packages aligned, which means doubling the effort needed to fix the code if it breaks or if it doesn't follow policy.

Another case where dividing the development effort has caused problems is in the universe of Logitech mouse control software. The lmctl project was started as a tool to control some settings of Logitech devices, like resolution and cordless channels. The code has to know which devices have which settings available. To do this, it keeps a table of USB identifiers. As new devices appeared on the market and Linux users started using them, the table became outdated. Distributions patched this up, but in different ways, creating inconsistent tables. Some users started releasing their own modified versions of lmctl with an extended table to support different devices.
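As a rough illustration of why independently patched tables become a problem, here is a minimal sketch of merging several device tables and reporting the entries that disagree. Everything in it is hypothetical: the table layout, the `merge_tables` name, the product IDs and the device names are invented for the example and are not taken from lmctl.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: USB product ID -> (device name, supported settings).
Entry = Tuple[str, FrozenSet[str]]
DeviceTable = Dict[int, Entry]


def merge_tables(tables: List[DeviceTable]) -> Tuple[DeviceTable, Dict[int, List[Entry]]]:
    """Merge device tables; the first entry seen for an ID wins.

    Returns the merged table plus, for every ID on which the tables
    disagree, the list of differing entries that were seen.
    """
    merged: DeviceTable = {}
    conflicts: Dict[int, List[Entry]] = {}
    for table in tables:
        for product_id, entry in table.items():
            if product_id not in merged:
                merged[product_id] = entry
            elif merged[product_id] != entry:
                # Remember the entry we kept, then each differing one.
                conflicts.setdefault(product_id, [merged[product_id]]).append(entry)
    return merged, conflicts


# Made-up tables, as two distributions might have patched them.
distro_a: DeviceTable = {
    0x0001: ("Example Mouse A", frozenset({"resolution"})),
}
distro_b: DeviceTable = {
    0x0001: ("Example Mouse A", frozenset({"resolution", "channel"})),
    0x0002: ("Example Mouse B", frozenset({"channel"})),
}

merged, conflicts = merge_tables([distro_a, distro_b])
for product_id, entries in conflicts.items():
    print(f"conflict for {product_id:#06x}: {len(entries)} differing entries")
```

A single table maintained upstream avoids this reconciliation step entirely, which is the point the lmctl story makes.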

While explicit forks of entire projects have problems, the fact that they delineate where they took the code from makes it easier to track down the source of bugs and handle security vulnerabilities. On the other hand, when a project borrows some code and imports it in its source distribution, this kind of tracking becomes more difficult. Free Software licenses explicitly allow, and push for, importing code between projects; cross-pollination also improves general code quality over time.

For most distributions, an internal imported copy of a library inside another project is also a violation of policy. For this reason the developers will most likely try to make the project use a shared, external copy of the code. This works fine when the other library is simply bundled together untouched, but it becomes a nuisance if there are subtle changes which might not be apparent at first glance. One thing to take into account when you want to have an internal copy of a library is to consider it an untouchable piece of code. Instead of spending time fixing bugs inside that copy of the code, the developers should try to fix the bugs in the original sources, so that everybody (including themselves) can make use of the improvement.
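One way a packager might check whether a bundled copy really is untouched is to compare it, file by file, against the upstream release it claims to come from. The following is only a sketch under that assumption; the function names and the directory layout are hypothetical, and it relies solely on Python's standard `hashlib` and `pathlib` modules.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digests(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(bundled: Path, upstream: Path) -> Dict[str, List[str]]:
    """Report how a bundled source tree differs from the upstream tree."""
    ours = file_digests(bundled)
    theirs = file_digests(upstream)
    return {
        # Present in both trees, but with different contents.
        "modified": sorted(f for f in ours.keys() & theirs.keys() if ours[f] != theirs[f]),
        # Added by the importing project (for example, build system files).
        "only_bundled": sorted(ours.keys() - theirs.keys()),
        # Dropped by the importing project.
        "only_upstream": sorted(theirs.keys() - ours.keys()),
    }
```

If `modified` comes back empty and the other two lists contain only build system files, replacing the bundled copy with a shared external library is likely to be straightforward; a long `modified` list is the "subtle changes" case described above.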

In the real world, one example of this can be the FFmpeg source code. FFmpeg is imported by many different Free Software projects in the area of multimedia: xine, MPlayer, GStreamer. While it is a very wide common ground for all these projects, as well as for some others that aren't importing a copy of it, like VLC, some of the imports change the source code, in more or less subtle ways. In the case of xine, the whole build system is replaced to integrate it with the automake-based build system used by the rest of the library. Further patching is done to the sources themselves so that they behave in a slightly different way than the original. The code rots quickly and bugs that were already fixed in the in-development sources of FFmpeg still sprout in xine-lib.

Maintaining such an import is a difficult and boring task, to the point that the developers, in the past two years, have spent a lot of energy toward the goal of not using an internal copy of FFmpeg anymore. The result is that the difference between the original FFmpeg and the internal copy is now much smaller, mostly limited to the build system. Rather than advising against the use of an external copy of FFmpeg, the project now advises against using the internal one. For the next minor version of xine-lib, FFmpeg is being used pristine, entirely unpatched, and it will probably not even be bundled with the library in the near future.

Successful Forks

Of course it's not all bad. There are successful forks in Free Software, and many of them are now more famous than their parents. I've already named the GNU Compiler Collection, which is the GCC that almost all Free Software users have at hand at the moment. Most people use GCC version 3 and later, which started as a fork of the other GCC (the GNU C Compiler), version 2. The original development of GCC was, like many other GNU projects, very closed to the community.

As Eric S. Raymond defined it in his book The Cathedral and the Bazaar, it was a Cathedral-style development that often prepares the ground for forks, and this was no exception. Multiple forks of the GCC code were created. Their goals, while different, often didn't clash and could easily have been worked on at the same time. Some of the forks were then merged into the EGCS project, which eventually replaced the original GCC.

Again citing GNU's Cathedral-style of development, it's difficult not to talk about GNU Emacs and its brother XEmacs. Created originally to support one particular product, the XEmacs project is nowadays a mostly standalone project. XEmacs is kept at an arm's length from GNU Emacs, mostly because of licensing and copyright assignment issues. Neither version can be considered a superset of the other because they both implement features in their own way.

Better is the state of Claws Mail, started as a different branch of Sylpheed, with the name Sylpheed Claws. Originally the intention was to develop new features that could one day find their way back to the original code. Claws Mail has since declared itself independent and is now a stand-alone project. In this case, the exchange of code between the two projects has basically halted, as the code bases have diverged so much that they retain very little in common.

In the case of the Ultima Online server emulators, forks became daily events, and cross-pollination had grown to the point where at least five projects were linked by family ties. The UOX3 source code has been forked, reused, imported and cut down so that it is present in WolfPack, LoneWolf, NoX-Wizard and Hypnos. Almost all of the UOX3 forks involved re-writing parts of the code, as it had stratified to the point of not being maintainable. The forks continued copying one from the other to make use of the best features available.

Forking vs. Branching

There are a few good reasons why you might want to detach, temporarily, from a given development track: development of experimental features, new interfaces, backend rewrites, or resurrection of a project whose original authors are unavailable. In most of these cases, forking is not the best solution but branching most likely is. Although the border between these two actions has started to blur thanks to distributed VCS, branching usually doesn't involve setting up a new web page for the project, changing its name or finding a new goal. A branch is also usually related, tightly or not, to the original project. Merges between the two code bases often happen at more or less regular intervals, and ideas and bug reports are shared.

Branches usually have the target of being merged in the main development track, sooner for small, testing branches, or later for huge rewrites. They don't usually require dividing of the efforts as the problems affecting the main branch get their fixes propagated to the other branches when they merge back the original code.

One common problem with developing through branches involved bad support in the Subversion version control system. In Subversion the branches are represented as a different path in the repository, with almost no help for branches in the merge operations. With a modern distributed VCS, branches are so cheap that any checkout is, from some points of view, a different branch, and the merge operations are one of the main focuses. Projects like the Linux kernel or xine-lib rely heavily on an above-average number of branches. These are often short-lived and used for testing purposes.

Looking to the Future

Forks will never end in Free Software as they are supported by one of the freedoms that make Free Software what we all want it to be. The future will, of course, bring new forks. Recently there has been a lot of talk about Funpidgin, a fork of the widespread Pidgin Instant Messaging client (formerly Gaim). Again it seems like it was the Cathedral-style development of the original code that motivated a fork that could give (some of) the users what they wanted.

And even though GNU Emacs opened its process quite a lot, its forks haven't stopped sprouting. This is despite the fact that Richard Stallman, original author and mastermind behind the GNU project, stepped down as maintainer, putting in place Stefan Monnier and Chong Yidong. The Aquamacs Emacs is still diverging from the original GNU Emacs for supporting Apple's Mac OS X, while different versions are being developed to support the multiple user interfaces one can use on that operating system. Similarly, although the Windows port of Emacs is already pretty solid, there are extensions being written to make it easier for users to adapt it to the Microsoft environment.

Forks are usually the effect of a closed-circle development, a Cathedral, where some of the developers or users can't see their objective being fulfilled, with all their energy being poured in. So just look for the projects that don't seem to be getting much love from a community, and you might find a fork starting to make its first leaves.

Then there is the Poppler project, which merged together the modified versions of the XPDF code imported by projects like GNOME and KDE for their PDF viewers. Poppler is soon going to be a nearly omnipresent PDF rendering library on Free Software desktops and beyond. This summer's milestone KDE 4.1 release will include the new oKular document viewer, which will use Poppler for PDF rendering on the (stable) KDE users' desktops.

Conclusions

I'd suggest that anybody thinking about creating a fork should think twice. Forking is rarely a good choice. Better choices can be branching or, if you need just part of the code, working together as the Poppler developers did to split out and share the common parts.

When you want to make some changes to a software project, propose branching it, show the results to the original developers and discuss with them how to improve the code. Most of the time you'll find authors are open to the changes.

A fork is a grave matter. It might bring innovation to the Free Software community, but it could also separate developers that could otherwise work together, maybe in a better way. In this light, GitHub's one-click forking capability seems like a dangerous feature.

The ever-increasing ease of forking everything, from small projects to parts of, or even entire, distributions (think about Debian's repositories and Gentoo's overlays), is increasing the fragmentation of Free Software projects. Biodiversity in software can be a very good thing, just like in nature, but people should first try their best to work together, rather than one against the other.

System Applications

Database Software

PostgreSQL Weekly News

The May 11, 2008 edition of the PostgreSQL Weekly News is online with the latest PostgreSQL DBMS articles and resources.

Device Drivers

LIRC 0.8.3 released

Version 0.8.3 of LIRC, an Infrared remote control interface, has been announced. A bug with the Irman remote has been fixed.

Embedded Systems

BusyBox 1.10.2 released

Stable version 1.10.2 of BusyBox, a collection of command line tools for embedded systems, has been announced. "Bugfix-only release for 1.10.x branch. It contains fixes for echo, httpd, pidof, start-stop-daemon, tar, taskset, tab completion in shells, build system."

Web Site Development

BencHTTP: 1.0 released (SourceForge)

Version 1.0 of BencHTTP has been announced. "BencHTTP is a utility to test the performance of HTTP servers under load. It is highly configurable via simple script files and allows extensive customisation of HTTP requests. The current server performance is continuously displayed during the test."

nginx 0.6.31 released

Version 0.6.31 of nginx, an HTTP server and mail proxy server, has been announced; this is a bug fix release. See the CHANGES file for more details.

OpenKM 2.0 released

Version 2.0 of OpenKM, a multi-platform Web 2.0 document management application, has been announced. "This new version entails the following improvements: the previsualization of multimedia elements as images and videos, an improved an rewritten administration interface, a centralized management of templates, an exclusive area to allow users to store their private documentation, a tool for massive import and output data from ZIP files, searches by date ranks as well as translations to more languages."

Desktop Applications

Audio Applications

Audacity 1.3.5 beta released

Version 1.3.5 beta of Audacity, an audio editor, has been announced. "Changes include improvements and new features for recording, import/export and the user interface. Because it is a work in progress and does not yet come with complete documentation or translations into foreign languages, it is recommended for more advanced users."

Business Applications

Openbravo ERP: 2.35 maintenance pack 4 is available (SourceForge)

Version 2.35 maintenance pack 4 of Openbravo ERP, a web-based ERP for SMEs, has been announced. "This a stabilization release with no additional functionality. It is intended for production usage."

Announcing feature update v1.3 of YaMA

Version 1.3 of YaMA has been announced. "Yet Another Meeting Assistant (YaMA), will help you with the Agenda, Meeting Invitations, Minutes of a Meeting as well as Action Items. If you are the assigned minute taker at any meeting, this tool is for you."

Desktop Environments

Announcement for GNOME SlackBuild GNOME 2.22.1 Desktop for Slackware 12.1

GNOME 2.22.1 is now available for Slackware 12.1, with the release of SlackBuild GNOME 2.22.1. "There have been a lot of improvements in this latest GSB release, including the move to PulseAudio, fewer package replacements, a GNOME-integrated Compiz-Fusion setup, the latest NetworkManager, Abiword 2.6, and OpenOffice2.4 built for GNOME, a richer Mono C# suite, as well as all the great features of GNOME 2.22."

GNOME Software Announcements

The following new GNOME software has been announced this week: You can find more new GNOME software releases at gnomefiles.org.

KDE Software Announcements

The following new KDE software has been announced this week: You can find more new KDE software releases at kde-apps.org.

Xorg Software Announcements

The following new Xorg software has been announced this week: More information can be found on the X.Org Foundation wiki.

Desktop Publishing

LyX 1.5.5 is released

Version 1.5.5 of LyX, a GUI front-end to the TeX typesetter, is out. "We are pleased to announce the release of LyX 1.5.5. Being the fourth maintenance release in the 1.5.x cycle, this release further improves the stability and usability of the application. Besides this, it also introduces some new features. Most notably, LyX is now prepared to be compiled with Qt 4.4 that has just been released: the stability issues that occured in previous versions of LyX when compiled against Qt 4.4 have been resolved."

Financial Applications

Buddi: 3.2 stable released (SourceForge)

Version 3.2 stable of Buddi has been announced. "Buddi is a simple budgeting program targeted for users with little or no financial background. It allows users to set up accounts and categories, record transactions, check spending habits, etc. I am happy to announce the first minor stable release on the 3.x branch. This version fixes a few bugs (including a bug when copying your files from OSX 10.4 Tiger to OSX 10.5 Leopard), and resolves a few feature requests. No major changes here, but it is recommended that all users upgrade."

KMyMoney 0.9 released

The long-awaited KMyMoney 0.9 release is out. There's a lot of new stuff here, including charts, budgets, forecasts ("Sadly, we are unable to accurately predict the future value of your investments"), a whole new set of wizards, better transaction auto-filling, and more. Note that this is a development release, but it's still an important step forward for a promising project.

Interoperability

Wine 1.0-rc1 released

Release 1.0-rc1 of Wine has been announced. "This is the first release candidate for Wine 1.0. Please give it a good testing to help us make 1.0 as good as possible. In particular please help us look for apps that used to work, but don't now."

Office Suites

KOffice 2.0 Alpha 7 released (KDE.News)

KDE.News covers the release of KOffice 2.0 Alpha 7. "The KDE Project today announced the release of KOffice version 2.0 Alpha 7, a technology preview of the upcoming version 2.0. This version adds a lot of polish, some new features in Kexi and KPresenter and especially better support for the OpenDocument format. It is clear that the release of KOffice 2.0 with all the new technologies it brings is drawing nearer. This is mainly a technology preview for those that are interested in the new ideas and technologies of the KOffice 2 series."

Miscellaneous

Accelerator 1.0.0 released

Version 1.0.0 of Accelerator has been announced. "Accelerator is a GUI program that shows where keyboard accelerators should go in menu option texts and dialog labels. The program produces optimal results on the basis that the best accelerator is the first character, the second best is the first character of a word, the third best is any character, the worst is no accelerator at all, and no accelerator should be used more than once. With this program developers can help improve usability for users who can't use the mouse and for fast typists who don't want to use the mouse."

Task Coach: Release 0.70.0 available (SourceForge)

Version 0.70.0 of Task Coach, a todo manager for managing personal tasks and todo lists, has been announced. Changes include: "Small feature enhancements, more translations and several bug fixes. Task Coach is now distributed under the GPLv3+."

Languages and Tools

C

GCC 4.2.4 Status Report

The May 12, 2008 edition of the GCC 4.2.4 Status Report has been published. "GCC 4.2.4 will follow in about a week in the absence of any significant problems with 4.2.4-rc1. If you believe a problem should delay the release, please file a bug in Bugzilla, mark it as a regression, note there the details of the previous 4.2 releases in which it worked and CC me on it. Anything that is not a regression from a previous 4.2 release is unlikely to delay this release."

Caml

Caml Weekly News

The May 13, 2008 edition of the Caml Weekly News is out with new articles about the Caml language. Topics include: Core godi package, GODI Search: Includes now sources and uint64lib release (again), plus help request.

Perl

This Week on perl5-porters (use Perl)

The 28 April-3 May, 2008 edition of This Week on perl5-porters is out with the latest Perl 5 news.

Python

Python 2.6a3 and 3.0a5 announced

Python versions 2.6a3 and 3.0a5 have been released. "Please note that these are alpha releases, and as such are not suitable for production environments. We continue to strive for a high degree of quality, but there are still some known problems and the feature sets have not been finalized. These alphas are being released to solicit feedback and hopefully discover bugs, as well as allowing you to determine how changes in 2.6 and 3.0 might impact you."

Pyrex 0.9.7 released

Version 0.9.7 of Pyrex has been announced; several new capabilities have been added. "Pyrex is a language for writing Python extension modules. It lets you freely mix operations on Python and C data, with all Python reference counting and error checking handled automatically."

Python-URL! - weekly Python news and links

The May 12, 2008 edition of the Python-URL! is online with a new collection of Python article links.

Tcl/Tk

Tcl-URL! - weekly Tcl news and links

The May 8, 2008 edition of the Tcl-URL! is online with new Tcl/Tk articles and resources.

Tcl-URL! - weekly Tcl news and links

The May 14, 2008 edition of the Tcl-URL! is online with new Tcl/Tk articles and resources.

Build Tools

Paver 0.7 announced

Version 0.7 of Paver has been announced. "Paver is a "task" oriented build, distribution and deployment scripting tool. It's similar in idea to Rake, but is geared toward Python projects and takes advantage of popular Python tools and libraries. Paver can be seen as providing an easier and more cohesive way to work with a variety of proven tools. With Version 0.7, Paver is now a full stand-in for the traditional distutils- or setuptools-based setup.py. Need to perform some extra work before an sdist runs?"

IDEs

Pydev and Pydev Extensions 1.3.17 announced

Version 1.3.17 of Pydev and Pydev Extensions is out with bug fixes. "PyDev is a plugin that enables users to use Eclipse for Python and Jython development -- making Eclipse a first class Python IDE -- It comes with many goodies such as code completion, syntax highlighting, syntax analysis, refactor, debug and many others."

Page editor: Forrest Cook

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds