LWN.net Weekly Edition for April 7, 2005

The kernel and BitKeeper part ways

Back in early 1999, your editor got a call from Larry McVoy. He was worried that Linus Torvalds was on the verge of burning out as the kernel project grew. The problems in those days were quite evident; Linus, it was being said, did not scale. But, according to Larry, a complete burnout was not inevitable. If Linus could be given the right tools, many of his problems (and the frustrations of other kernel developers) could be solved, and the system would run smoothly again. The right tool, according to Larry, was a thing called BitKeeper; if some sort of agreement could be made on licensing, Larry (along with his company, BitMover) was willing to make BitKeeper available for kernel development. In fact, Larry wanted to see BitKeeper used for all free software development; see this article from the March 25, 1999 LWN Weekly Edition for a view of how things looked at that time.

Three years later, the situation did not look any better. The 2.4 kernel had taken almost a full year to stabilize after 2.4.0 came out. 2.5 had begun, but the process was looking rocky at best. Patches were being dropped, developers were complaining, and Linus was getting tired. After convincing himself that the tool had reached a point where it could do what he needed, Linus decided to give BitKeeper a try. There was no looking back.

Some of the development process issues could have been addressed by adopting any source code management system. But BitKeeper brought more than that; it established a model where there is no central repository. Instead, each developer could maintain one or more fully independent trees. When the time came, patches of interest could be "pulled" from one tree to another while retaining the full revision history. Rather than send patches in countless email messages - often multiple times - developers could simply request a pull from their BitKeeper trees. Meanwhile, the current development trees could be pulled automatically into the -mm kernel, allowing patches to be tested by a wider audience prior to merging into the mainline. BitKeeper enabled a work method and patch flow which naturally supported the kernel's development model.
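The pull model described above can be illustrated with a small toy sketch in Java. It assumes a tree is nothing more than an ordered set of changeset identifiers. The class names and changeset IDs are hypothetical. Real BitKeeper pulls also merge diverged histories and resolve conflicts, and this sketch leaves both out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the "pull" workflow (not BitKeeper's real data model): every
// developer owns a complete, independent tree, and pulling copies over only the
// changesets the receiving tree lacks, in the order the source recorded them.
// Merging diverged histories and resolving conflicts are deliberately left out.
public class PullSketch {

    static final class Tree {
        // Insertion order stands in for revision history; duplicates are ignored.
        private final Set<String> history = new LinkedHashSet<>();

        void commit(String changesetId) {
            history.add(changesetId);
        }

        // Copies every changeset from 'source' that this tree does not have yet
        // and returns how many were transferred.
        int pullFrom(Tree source) {
            int transferred = 0;
            for (String changeset : source.history) {
                if (history.add(changeset)) {
                    transferred++;
                }
            }
            return transferred;
        }

        List<String> log() {
            return new ArrayList<>(history);
        }
    }

    public static void main(String[] args) {
        Tree mainline = new Tree();
        Tree developer = new Tree();
        mainline.commit("cs1");
        developer.pullFrom(mainline);   // the developer starts from the mainline
        developer.commit("cs2");
        developer.commit("cs3");
        mainline.pullFrom(developer);   // a pull request instead of mailed patches
        System.out.println(mainline.log());
    }
}
```

Running main prints [cs1, cs2, cs3]: the mainline receives the developer's two changesets, in order and with their history intact, without anyone re-sending patches by email.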

Once the developers and the tools got up to speed, kernel development took off like never before. The rate at which patches were merged skyrocketed, the developers were happy, and the whole system ran smoothly. The public version of Linus's BitKeeper repository (and the repositories of many other developers) made the development process more transparent than ever. Anybody could look to see the up-to-the-minute state of the kernel and how it got there. Larry was right: with the right tools, Linus really could scale.

The only problem was that BitKeeper is proprietary software. It came (in binary-only form) with a license which allowed free use, but which imposed some significant restrictions. The free version of BitKeeper could only be used with open source projects; users could be required to make their repositories available on demand. The free version posted all changelog information on openlogging.org, and disabling the logging was not allowed. Users were required to upgrade to new versions, which could come with different licenses. And users were not only prohibited from reverse engineering the software, but were also prohibited from working on any sort of source code management system at all.

Larry wanted to have his cake and eat it too. He truly wanted to support the development of free software - as long as that software didn't threaten his own particular business niche. Supporting the kernel development cost real money - and supporting the business which created BitKeeper cost even more. Whenever BitMover felt that its business model was threatened, it responded; often the BitKeeper licensing terms were changed in response to perceived threats - to the point that the BitKeeper license became known in some circles as the "don't piss off Larry license."

Well, somebody pissed off Larry one time too many. The final straw, it seems, was a certain high-profile developer who refused to stop reverse engineering work while simultaneously doing some work for OSDL. BitMover is now withdrawing support for the free version of BitKeeper, and Linus has ceased to use it. BitKeeper is no longer the source code management system for the kernel. Proprietary software can be good stuff, but it always carries this threat: you never really know if it will be there for you tomorrow or not. BitMover has decided that it can no longer afford to make BitKeeper available for the free software community.

BitMover has issued a press release on this change:

BitMover looks forward to implementing our extensive roadmap and delivering advanced SCM technology to a wider market. As part of this focus, BitMover has replaced the free version of BitKeeper with the recently released open source BitKeeper client. Those developers who desire additional functionality may choose to migrate to the more powerful commercial version of BitKeeper.

The open source client, incidentally, enables the extraction of the current version from a repository, but does little else. The PR also states that "Our relationship with the Open Source community has been evolving and many of the key developers have already migrated to the commercial version of BitKeeper." Linus has, however, made it clear that he is not one of those "key developers":

Right now, the only real thing that has happened is that I've decided to not use BK mainly because I need to figure out the alternatives, and rather than continuing "things as normal", I decided to bite the bullet and just see what life without BK looks like. So far it's a gray and bleak world ;)

What happens next is far from clear. The kernel developers will not go back to the previous way of doing things - no source code management system at all. Even the developers who can continue to use BitKeeper are unlikely to continue doing so if Linus is unable to pull their patches. So a replacement will have to be found. It is not clear that any of the free alternatives is up to the task of handling a project as large as the kernel. One of them may end up growing up in a hurry in order to take the load. Thanks partly to the example and motivation provided by BitKeeper, the free alternatives do look far more viable than they did three years ago, when Linus first started using BitKeeper.

Larry has made it clear that he blames the free software community for this turn of events:

I'm far from blameless but the majority of this problem is an open source community problem. They simply don't want to play with non-open source. At least some of them don't and they ruin it for the rest of us. The problem here is one of policing. By ignoring/tolerating the bad apples the community punishes itself.

If BitKeeper users were violating the license under which they received the software, they have indeed done something wrong. Every time we release code under a free license, we do so with the expectation that the terms of that license will be respected. To treat somebody else's license with less respect is hypocritical; if the license terms are not acceptable, do not use the software. That said, one could note a couple of other things. The notion that developers of proprietary software do not engage in reverse engineering - that it's "an open source community problem" - is debatable at best. And how, exactly, might the community be expected to do this sort of "policing"?

The ironic result of all this is likely to be the accelerated development of exactly what Larry claims to most fear: a free source code management system that, while it lacks much of what makes BitKeeper great, is "good enough" for a large portion of the user base. As the BitKeeper developers found out, hosting the kernel project is an effective way to shake out scalability and usability problems. Whichever system ends up hosting the kernel can be expected to go through a period of rapid improvement.

BitMover did, in fact, get a few benefits from hosting the kernel, even if, in the company's view, the benefits do not come close to equaling the associated costs. BitKeeper is a more scalable and robust system as a result of the use it saw in that role. There were also substantial PR benefits; see, for example, this 2004 press release with nice quotes from David Miller and Linus Torvalds. There can be no doubt that working with the kernel has brought a great deal of visibility to BitKeeper, and that must have resulted in some new business. The cynical among us might conclude (and some already have concluded) that BitMover simply decided that it had obtained most of the benefits it was going to get from hosting the kernel and decided to move on.

Whether or not that is the case, it cannot be doubted that Linux, too, has benefited strongly from its association with BitKeeper. We would not have a 2.6 kernel with anything near its level of capability, scalability, and robustness without the role played by BitKeeper. One could easily argue that the free source code management systems would not be as good as they are had BitKeeper not come along. BitKeeper was a gift to the community that was well worth accepting; now that it is gone, the best thing to do is to say "thanks" (with sincerity!) and figure out what comes next.


Ubuntu and UserLinux

April 6, 2005

This article was contributed by Joe 'Zonker' Brockmeier.

Despite what you may have heard on Slashdot, UserLinux and Ubuntu aren't going to be merging anytime soon. A few weeks ago, Ubuntu's Jeff Waugh invited the UserLinux project to "collaborate with Ubuntu to build the finest platform and community for FOSS service providers". The invitation followed a discussion about the difficulty of building UserLinux around Debian while the next stable Debian release is so long in coming. Waugh's invitation generated a fair amount of additional discussion on the UserLinux list, but little comment from UserLinux founder Bruce Perens. It's become clear that Ubuntu and UserLinux will remain separate for the foreseeable future, but we decided to check in with Perens to see what he had to say about the whole thing.

Perens was quick to note that he supports Ubuntu, but doesn't think that Ubuntu's corporate-sponsored model is the way to go for UserLinux.

When Mark [Shuttleworth] started to work on Ubuntu, he called me up and we talked about whether I'd be interested in taking a leadership position in Ubuntu and I decided not to pursue that, because I feel that a non-profit is the correct paradigm for a Linux distribution. A Linux distribution is inherently not a profit-making enterprise and we are seeing [some of] the commercial Linux distributions abuse the open source paradigm because of that fact.

In addition, Perens said that Debian's development process allows anyone to become a developer and to run for Project Leader or hold another Debian office, an opportunity that doesn't exist in other projects. "You can be part of Ubuntu's community or Fedora's community, you don't get the chance to be boss".

Shortly after Waugh's invitation on the UserLinux mailing list, some of the Ubuntu team created experimental UserLinux packages for Ubuntu. The metapackages would allow creating "a sort of Ubuntu-flavored UserLinux". Unfortunately, the packages were also in violation of the UserLinux trademark policy. When we asked about the situation, Perens noted the importance of having a trademark policy, given the abuse of the Debian trademark "in various ways" and that "the UserLinux guys get to say what can be called UserLinux when they do their version of the Debian release." He also said he didn't have a problem with labeling the packages "ul" or something similar to distinguish them from official UserLinux packages. It would appear, after a bit of friction, that the projects are sorting out the trademark issue so Ubuntu can include the metapackages.

But the Ubuntu effort highlights the problem of perceived inertia for UserLinux. Perens announced UserLinux in December of 2003. There was a great deal of interest in the idea at the time, but the wait for a Debian release has certainly had an impact on the momentum of the project.

Perens conceded that there was a perception in the Linux community that UserLinux had stagnated, but said that the perception can be overcome.

A lot of people would have given up now, because the time-to-market is totally blown, but this was never intended to be a start-up business. Having been on board or watching the last five companies that were attempting to commercialize Debian, I have some idea what went wrong and what went right and I think we can make this idea work with businesses.

As far as UserLinux, I think what I would like to do, is once Debian has made a release, have our roster of support companies ready to support it, and to just start giving these things out at Linux-related business events and saying 'here's a system with full support, here's a price sheet, and we're going to give you a lower cost of ownership than Linux. We're going to beat other Linux distributions on TCO and we're going to give you more control because, more than Fedora, more than Ubuntu, you get a chance to determine exactly how the system is built, because it's tracking what the Debian organization does, it's not a Debian variant.

Perens also told LWN that the best way for someone to help with UserLinux is to be involved with Debian.

For people in the community, my main desire is that they work on Debian, okay? We can use some people on the UserLinux project, but the UserLinux policy is when we make software, we do it on Debian teams, and check it into Debian Subversion, don't issue as separate UL packages unless there's a Debian freeze...I think that Debian is a very healthy community, despite the challenges.

To outsiders, however, it may appear at times that the Debian Project is too mired in political disagreements and flame wars to actually get anything done -- which is a significant objection to wanting to be involved with Debian. Perens said that there is a need to convince "a significant portion of 1,000 active developers that your policy is right" when working with Debian, but "that in itself is a quality assurance process".

Perens said he was "heartened" by the recent announcement that Debian will soon be doing a release, and that "when Debian wants to get off the dime, they can". He added that the Debian developers have been "pretty embarrassed by the long delay of the release" and have bitten the bullet to get it out the door. He also predicted that the next Debian release after Sarge will be scheduled, and that it will be kept on schedule.

It will be interesting to see what happens after Debian Sarge is released, and whether the UserLinux project can succeed as a distribution for "businesses of all sizes".


GCJ - past, present, and future

April 6, 2005

This article was contributed by Mark Wielaard.

GCJ (the GNU Compiler for the Java programming language) is part of GCC (the GNU Compiler Collection) and provides a compiler, runtime environment, core libraries and tools for the Java language, an object-oriented, strongly typed, garbage-collected programming framework with a rich core library. GCJ is modeled after, and is a free replacement for, the proprietary Java platform. But just as GNU is Not Unix, GCJ is not Java.

The traditional Java platform is clearly not an ideal system, especially when combined with the traditional GNU system, but it is not too bad; its essential features seem to be good ones. Lots of Free Software is already written in the Java programming language, so a free system compatible with the Java platform would be convenient for many hackers. GCJ is an extension of GCC and facilitates integration with other languages supported by GCC. GCJ 4, part of GCC 4.0, adds more features to easily integrate programs written using the GCJ development environment with the rest of the GNU platform, while being even more compatible with the traditional Java platforms than previous releases. GCC 4.0 is scheduled to be released around April 15.

GCJ design history

Originally GCJ was designed as a “radically traditional” compiler for the Java programming language. It is an AOT (Ahead Of Time) compiler which automatically uses every GCC optimization available at compile time for a given architecture and produces binaries or (shared) libraries for the given platform. These programs run at full native speed without needing any interpreter or JIT (Just In Time) compilation. GCC is available for a large number of architectures and platforms, so compiling directly to native code using the GCC back-ends makes programs written with GCJ much more portable than the traditional (proprietary) Java platform. This radically traditional approach makes all the normal GNU tools, like GDB, available to the programmer writing code in the Java programming language, just as when programming in any other language supported by GCC.

Thereafter, support for generating and interpreting byte code .class and .jar files was added. This made GCJ more compatible with traditional applications written in the Java programming language that are compiled to byte code. GCJ can be used in various modes:

Compile and link .java source files to binaries, .o or .so files.
Compile and link .class or .jar byte code files to binary.
Compile .java source files to .class byte code files (gcj -C).
Interpret .class or .jar byte code files during runtime (gij).

The byte code interpreter is included as part of the standard runtime, libgcj, and can be used by programs to switch between interpreting byte code and executing natively compiled code on demand. So a program does not have to be either completely interpreted or completely compiled ahead of time.

To facilitate integration with code written in other languages, GCJ defines CNI (the Compiled Native Interface). CNI makes it easy to mix and match code and classes written in C, C++ and Java: you can write some methods of a class in C++, and throw and catch exceptions directly across the parts of the program written in different languages. GCJ also supports the more traditional JNI (Java Native Interface) for using code written in C from your programs.
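A minimal sketch of the Java side of a CNI class follows, with the C++ side shown only in a comment. The names Counter, count and next are hypothetical. The sketch assumes the C++ header is generated from the compiled class with gcjh, the header generator shipped with GCJ, and that the C++ file is compiled with g++ and linked together with the GCJ-compiled Java code.

```java
// Counter.java: the Java half of a class that implements one method in C++
// through CNI. The class, field and method names are made up for this example.
public class Counter {
    // CNI exposes Java fields as members of the corresponding C++ class,
    // so the C++ code sketched below can read and update this field directly.
    int count;

    // Declared in Java, defined in C++. Under GCJ the object file compiled from
    // this class must be linked with the compiled C++ definition.
    public native int next();

    // Ordinary Java methods can live in the same class.
    public void reset() {
        count = 0;
    }

    // The matching C++ side might look like this (sketch, not compiled here):
    //
    //   // Counter.cc
    //   #include <gcj/cni.h>
    //   #include "Counter.h"   // generated from Counter.class by gcjh
    //
    //   jint Counter::next() {
    //       return ++count;    // plain C++ access to the Java field
    //   }
}
```

Because the C++ code is compiled as an ordinary member function of the class, no JNI-style lookup of fields or methods by name is needed.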

Anthony Green posted the original design document for GCJ from 1998.

Drawbacks of the GCJ 3.x approach

GCJ 3.x provides a good “better than Java” development environment that allows tight integration with the rest of the GNU platform. But it has disappointed some traditional Java programmers. The ability to mix and match native code with byte code in the compiler and the libgcj runtime makes GCJ very flexible, but falling back to interpreting byte code doesn't take full advantage of the “radically traditional” approach. Programs using advanced class loader tricks based on byte code were especially slow, because they fell back to the interpreter at runtime.

There are GCJ extensions that add support for using natively compiled code all the time, but programs had to be adapted to use them. Instead of using .jar files containing byte code definitions of new classes, programs would have to use a new URL scheme (gcjlib:) with their URLClassLoader. The first “Fast Free Eclipse” port to GCJ was done this way: the source code of the plugin loading mechanism was adapted to search for natively compiled plugins in shared library .so files in addition to ordinary .jar byte code files. There was even a moderately popular project, rhug, that maintained patched versions of many traditional free Java programs adapted to GCJ's view of the world. But these patches were almost never adopted upstream, and maintaining the forks took a lot of time. So the benefits of the GCJ approach were seen only by programs written explicitly for it, not by traditional Java programs.
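The plugin-loading change described above amounts to a search order. The following hypothetical sketch prefers a natively compiled shared library over a byte code .jar. The lib<name>.so naming convention and the locate helper are assumptions for illustration, not the actual Eclipse or rhug code. It takes a directory listing as a set of file names so that it stays free of file system access.

```java
import java.util.Set;

// Hypothetical sketch of a plugin search adapted for GCJ 3.x: look for a
// natively compiled shared library first, and fall back to the byte code .jar.
// The "lib<name>.so" / "<name>.jar" naming convention is an assumption.
public class PluginLocator {

    /** Returns the file to load for the plugin, or null if neither form exists. */
    static String locate(Set<String> dirListing, String pluginName) {
        String nativeLib = "lib" + pluginName + ".so";
        if (dirListing.contains(nativeLib)) {
            return nativeLib;   // precompiled native code: runs without the interpreter
        }
        String jar = pluginName + ".jar";
        if (dirListing.contains(jar)) {
            return jar;         // byte code: GCJ 3.x would interpret these classes
        }
        return null;
    }
}
```

Every program that wanted native plugins needed a change of this kind in its own loading code, which is why the approach never spread beyond patched forks.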

One of the main goals of the GCJ 4 effort was to bring all the advantages of the “radically traditional” approach to any program written in the Java programming language without needing any application-level changes.

GCJ 4 enhancements

Probably the most visible enhancement of GCJ 4 comes from merging the libgcj runtime with the GNU Classpath core class library project. By collaborating with other free runtimes, like the traditional Kaffe environment, and around 20 other projects, GCJ 4 is able to offer a core class library comparable to JDK 1.3 or 1.4. The collaboration of all these projects on a common core library implementation means that most of the libraries needed by applications, except for advanced Swing, CORBA and sound usage, are available out of the box. Kaffe, for example, is being used by the Apache project to track the build of most of the Jakarta projects using their Gump auto-builder.

The other big change is the addition of the -findirect-dispatch switch to the compiler. Using that option causes GCJ to generate native code for classes and methods that follows precisely the binary compatibility rules described in the Java Language Specification. This means that natively compiled code can now be used everywhere, even in the trickiest class loader situations, where previously the program would fall back to interpreted byte code. At the 2004 GCC Summit, Tom Tromey and Andrew Haley described this new binary compatibility ABI for GCJ in more detail.
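Why indirect dispatch matters for binary compatibility can be seen in a toy model. Here a vtable is just a list of method names; a caller compiled the old way stores a fixed slot number, while an indirect caller stores the method's name and looks the slot up against the class it actually runs with. This is a simplified illustration of the idea, not GCJ's real data structures.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (not GCJ's actual ABI structures): a "vtable" is an ordered list
// of method names, and "calling" returns the name of the method selected.
public class DispatchSketch {

    // Old-style call site: the slot number was fixed when the caller was compiled.
    static String callFixedSlot(List<String> vtable, int compiledSlot) {
        return vtable.get(compiledSlot);
    }

    // Indirect call site: the caller keeps the method's name and finds its slot
    // in the vtable it is actually running against. (A real implementation would
    // resolve this once, when the class is linked, and cache the result.)
    static String callIndirect(List<String> vtable, String methodName) {
        int slot = vtable.indexOf(methodName);
        if (slot < 0) {
            // Roughly the situation the JLS reports as NoSuchMethodError.
            throw new IllegalStateException("no such method: " + methodName);
        }
        return vtable.get(slot);
    }

    public static void main(String[] args) {
        List<String> oldLayout = Arrays.asList("toString", "draw");
        int compiledSlot = oldLayout.indexOf("draw");  // 1, baked into the old caller
        // A newer version of the library adds a method before draw.
        List<String> newLayout = Arrays.asList("toString", "resize", "draw");
        System.out.println(callFixedSlot(newLayout, compiledSlot));
        System.out.println(callIndirect(newLayout, "draw"));
    }
}
```

Running main prints resize and then draw: after the library gains a method, the stale slot number selects the wrong method, while the name-based lookup still finds draw. Java's binary compatibility rules are designed so that changes such as adding a method do not break already-compiled callers in this way.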

The new binary compatibility (BC) ABI makes it possible to transparently compile programs to native code using gcj -findirect-dispatch without having to change the application source code or even the build process. To map byte code to GCJ-compiled native code, GCJ 4 introduces gcj-dbtool. The packager uses this tool while deploying an application or library to create a database that, at runtime, maps the byte code of a class to its native code. Programs can select different databases through the gnu.gcj.precompiled.db.path system property. The databases make it possible to create a cache of all natively compiled code that can be shared by different programs installed on the system. The How to BC compile with GCJ GCC wiki page has examples.
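The idea of a database that maps a class's byte code to precompiled native code can be sketched as follows. This is a hypothetical in-memory model: keying entries by an MD5 digest of the class bytes, and the class and method names, are assumptions of this example, and gcj-dbtool's real on-disk format is not modeled. Keying on the bytes themselves means the lookup works no matter which class loader defines the class, while any change to the byte code simply misses, so the runtime falls back to the interpreter.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a precompiled-code database: it maps a digest of a
// class file's bytes to the shared library holding native code for that class.
// MD5 and the in-memory HashMap are assumptions of this sketch.
public class NativeCodeDb {
    private final Map<String, String> entries = new HashMap<>();

    // Hex-encoded MD5 digest of the class file bytes (32 hex characters).
    static String digest(byte[] classBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(classBytes)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Packaging time: record that this exact byte code was compiled into 'sharedLib'.
    void add(byte[] classBytes, String sharedLib) {
        entries.put(digest(classBytes), sharedLib);
    }

    // Run time: a class loader hands over the bytes it is about to define.
    // A hit means native code can be used; null means fall back to the interpreter.
    String lookup(byte[] classBytes) {
        return entries.get(digest(classBytes));
    }
}
```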

This approach is used by the native Eclipse packages in Fedora Core 4. No changes to the Eclipse code base are necessary anymore and, after the project is bootstrapped, all resulting .jar files are BC compiled. To almost completely automate this process, Thomas Fitzsimmons created java-gcj-compat, a collection of wrapper scripts, symlinks and jar files that provide a Java-SDK-like interface to the new GCJ 4 tool set.

Future plans

The -findirect-dispatch switch can currently be used only for byte code and not in combination with CNI (JNI is already supported). This limitation currently prevents parts of the core class libraries from being BC compiled. Lifting this restriction will allow closer integration with GNU Classpath.

With GCJ and GDB, a programmer can step through native C, C++ and Java source code using the same tool. Traditional Java developers are more used to debugging their applications with JDWP (the Java Debug Wire Protocol). Eclipse comes with built-in support for JDWP. Work is in progress to provide JDWP debugging support for the different execution mechanisms. This code will also be shared with the GNU Classpath project.

Benchmarks show that GCJ is comparable (sometimes faster, sometimes slower) to traditional execution mechanisms for Java programs. But GCJ currently doesn't really take advantage of the new GCC 4.0 Tree SSA optimizer framework. For 4.1, the GCJ developers hope to add a couple of GCJ-specific optimizations.

Tom Tromey is currently working on GCJX, a new GCC frontend that will include support for the new 1.5 language additions, such as generics. And the GNU Classpath project has a separate branch for the core class libraries that depend on the new 1.5 language additions.
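For readers who have not met the 1.5 language additions yet, the snippet below shows the kind of code GCJX is meant to accept: a generic method with a bounded type parameter, plus the enhanced for loop that also arrived in 1.5. The class and method names are arbitrary.

```java
import java.util.List;

// Java 1.5 generics: the bounded type parameter lets the compiler check that
// the elements are mutually comparable, so callers need no casts.
public class GenericMax {

    static <T extends Comparable<? super T>> T max(List<? extends T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("empty list");
        }
        T best = items.get(0);
        for (T item : items) {          // enhanced for loop, also new in 1.5
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }
}
```

For example, max(Arrays.asList(3, 9, 4)) returns 9 and max(Arrays.asList("gcj", "gcjx", "gij")) returns "gij", with the result already typed as Integer or String respectively.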

Escaping the Java Trap

GCJ 4 is the result of seven years of work by a large and active community of Free Software hackers. This new version is complete enough to replace most interesting uses of the proprietary Java platform. It adds a whole new set of core libraries and some new features that help integration with the rest of the GNU platform. Upcoming versions of some GNU/Linux distributions will use GCJ 4 to provide much more Java-based Free Software, including Eclipse, JOnAS, OpenOffice.org 2, Tomcat and the Jakarta libraries. The JPackage project also provides a great deal of free software for integration with traditional GNU/Linux distributions. Both Debian and Fedora are working with the JPackage hackers to support more of these packages “out of the box”.

All this doesn't mean that we have escaped the Java trap yet. As pointed out by Richard Stallman in “Free But Shackled - The Java Trap”, we have to actively work together to keep code safe and free. It looks like the main target projects for GCJ 4 (Apache Jakarta, Eclipse and OpenOffice.org 2) have all reacted positively to the feedback and patches provided to support free alternatives to the Java platform. It has helped a lot that the requested changes were about making the projects more portable ("don't use undocumented com.sun internal classes") rather than dramatically changing the code, the (core) libraries used, or the build infrastructure. But the above projects were already free software projects at heart. It remains to be seen whether other, more traditional Java projects will adapt so easily to support GCJ 4 out of the box.


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

Security: Popup blocking; New vulnerabilities in gaim, kernel, sylpheed, wu-ftpd, ...
Kernel: A new semaphore type; The boundaries for stable kernel patches; Dealing with binary firmware.
Distributions: Changes at Mandrakesoft; Trustix Secure Linux 3.0 alpha; FedoraForum.org; 64 Studio
Development: The DSpace Digital Repository System, new versions of Gentle.NET, Hobbit Monitor, pkpgcounter, ACollab, DocBookWiki, mnoGoSearch, Glame, BRL-CAD, PLplot, gEDA Suite, SQL-Ledger, Cyphesis, ClearHealth, Furthur, AbiWord, Perl, PHP.
Press: Free Software in Brazil, Enlightenment DR17, No More Free BitKeeper, Latin America Install Fest, Sun criticizes GPL, de Icaza and Stallman interviews, Live CDs and security.
Announcements: Mandrakesoft acquires Conectiva, anti-Linux study, FUDCon2 at LinuxTag, Lisp conferences, Samba eXPerience.
