LWN.net Weekly Edition for April 7, 2005
The kernel and BitKeeper part ways
Back in early 1999, your editor got a call from Larry McVoy. He was worried that Linus Torvalds was on the verge of burning out as the kernel project grew. The problems in those days were quite evident; Linus, it was being said, did not scale. But, according to Larry, a complete burnout was not inevitable. If Linus could be given the right tools, many of his problems (and the frustrations of other kernel developers) could be solved, and the system would run smoothly again. The right tool, according to Larry, was a thing called BitKeeper; if some sort of agreement could be made on licensing, Larry (along with his company, BitMover) was willing to make BitKeeper available for kernel development. In fact, Larry wanted to see BitKeeper used for all free software development; see this article from the March 25, 1999 LWN Weekly Edition for a view of how things looked at that time.
Three years later, the situation did not look any better. The 2.4 kernel had taken almost a full year to stabilize after 2.4.0 came out. 2.5 had begun, but the process was looking rocky at best. Patches were being dropped, developers were complaining, and Linus was getting tired. After convincing himself that the tool had reached a point where it could do what he needed, Linus decided to give BitKeeper a try. There was no looking back.
Some of the development process issues could have been addressed by adopting any source code management system. But BitKeeper brought more than that; it established a model where there is no central repository. Instead, each developer could maintain one or more fully independent trees. When the time came, patches of interest could be "pulled" from one tree to another while retaining the full revision history. Rather than send patches in countless email messages - often multiple times - developers could simply request a pull from their BitKeeper trees. Meanwhile, the current development trees could be pulled automatically into the -mm kernel, allowing patches to be tested by a wider audience prior to merging into the mainline. BitKeeper enabled a work method and patch flow which naturally supported the kernel's development model.
Once the developers and the tools got up to speed, kernel development took off like never before. The rate at which patches were merged skyrocketed, the developers were happy, and the whole system ran smoothly. The public version of Linus's BitKeeper repository (and the repositories of many other developers) made the development process more transparent than ever. Anybody could look to see the up-to-the-minute state of the kernel and how it got there. Larry was right: with the right tools, Linus really could scale.
The only problem was that BitKeeper is proprietary software. It came (in binary-only form) with a license which allowed free use, but which imposed some significant restrictions. The free version of BitKeeper could only be used with open source projects; users could be required to make their repositories available on demand. The free version posted all changelog information on openlogging.org, and disabling the logging was not allowed. Users were required to upgrade to new versions, which could come with different licenses. And users were not only prohibited from reverse engineering the software, but they were prohibited from working on any sort of source code management system at all.
Larry wanted to have his cake and eat it too. He truly wanted to support the development of free software - as long as that software didn't threaten his own particular business niche. Supporting the kernel development cost real money - and supporting the business which created BitKeeper cost even more. Whenever BitMover felt that its business model was threatened, it responded; often the BitKeeper licensing terms were changed in response to perceived threats - to the point that the BitKeeper license became known in some circles as the "don't piss off Larry license."
Well, somebody pissed off Larry one time too many. The final straw, it seems, was a certain high-profile developer who refused to stop reverse engineering work while simultaneously doing some work for OSDL. BitMover is now withdrawing support for the free version of BitKeeper, and Linus has ceased to use it. BitKeeper is no longer the source code management system for the kernel. Proprietary software can be good stuff, but it always carries this threat: you never really know if it will be there for you tomorrow or not. BitMover has decided that it can no longer afford to make BitKeeper available for the free software community.
BitMover has issued a press release on this change:
The open source client, incidentally, enables the extraction of the current version from a repository, but does little else. The PR also states that "Our relationship with the Open Source community has been evolving and many of the key developers have already migrated to the commercial version of BitKeeper." Linus has, however, made it clear that he is not one of those "key developers":
What happens next is far from clear. The kernel developers will not go back to the previous way of doing things - no source code management system at all. Even the developers who can continue to use BitKeeper are unlikely to continue doing so if Linus is unable to pull their patches. So a replacement will have to be found. It is not clear that any of the free alternatives is up to the task of handling a project as large as the kernel. One of them may end up growing up in a hurry in order to take the load. Thanks partly to the example and motivation provided by BitKeeper, the free alternatives do look far more viable than they did three years ago, when Linus first started using BitKeeper.
Larry has made it clear that he blames the free software community for this turn of events:
If BitKeeper users were violating the license under which they received the software, they have indeed done something wrong. Every time we release code under a free license, we do so with the expectation that the terms of that license will be respected. To treat somebody else's license with less respect is hypocritical; if the license terms are not acceptable, do not use the software. That said, one could note a couple of other things. The notion that developers of proprietary software do not engage in reverse engineering - that it's "an open source community problem" - is debatable at best. And how, exactly, might the community be expected to do this sort of "policing"?
The ironic result of all this is likely to be the accelerated development of exactly what Larry claims to most fear: a free source code management system that, while it lacks much of what makes BitKeeper great, is "good enough" for a large portion of the user base. As the BitKeeper developers found out, hosting the kernel project is an effective way to shake out scalability and usability problems. Whichever system ends up hosting the kernel can be expected to go through a period of rapid improvement.
BitMover did, in fact, get a few benefits from hosting the kernel, even if, in the company's view, the benefits do not come close to equaling the associated costs. BitKeeper is a more scalable and robust system as a result of the use it saw in that role. There were also substantial PR benefits; see, for example, this 2004 press release with nice quotes from David Miller and Linus Torvalds. There can be no doubt that working with the kernel has brought a great deal of visibility to BitKeeper, and that must have resulted in some new business. The cynical among us might conclude (and some already have concluded) that BitMover simply decided that it had obtained most of the benefits it was going to get from hosting the kernel and decided to move on.
Whether or not that is the case, it cannot be doubted that Linux, too, has benefited strongly from its association with BitKeeper. We would not have a 2.6 kernel with anything near its level of capability, scalability, and robustness without the role played by BitKeeper. One could easily argue that the free source code management systems would not be as good as they are had BitKeeper not come along. BitKeeper was a gift to the community that was well worth accepting; now that it is gone, the best thing to do is to say "thanks" (with sincerity!) and figure out what comes next.
Ubuntu and UserLinux
Despite what you may have heard on Slashdot, UserLinux and Ubuntu aren't going to be merging anytime soon. A few weeks ago, Ubuntu's Jeff Waugh invited the UserLinux project to "collaborate with Ubuntu to build the finest platform and community for FOSS service providers". This was after a discussion about the problems of trying to build UserLinux around Debian when it's taking a long time for a new stable release from Debian. Waugh's invitation generated a fair amount of additional discussion on the UserLinux list, but little comment from UserLinux founder Bruce Perens. It's become clear that Ubuntu and UserLinux will remain separate for the foreseeable future, but we decided to check in with Perens to see what he had to say about the whole thing.
Perens was quick to note that he supports Ubuntu, but doesn't think that Ubuntu's corporate-sponsored model is the way to go for UserLinux.
In addition, Perens said that Debian's development process allows anyone to become a developer and run for Project Leader or hold another Debian office, an opportunity that doesn't exist in other projects. "You can be part of Ubuntu's community or Fedora's community, you don't get the chance to be boss".
Shortly after Waugh's invitation on the UserLinux mailing list, some of the Ubuntu team created experimental UserLinux packages for Ubuntu. The metapackages would allow creating "a sort of Ubuntu-flavored UserLinux". Unfortunately, the packages were also in violation of the UserLinux trademark policy. When we asked about the situation, Perens noted the importance of having a trademark policy, given the abuse of the Debian trademark "in various ways" and that "the UserLinux guys get to say what can be called UserLinux when they do their version of the Debian release." He also said he didn't have a problem with labeling the packages "ul" or something similar to distinguish them from official UserLinux packages. It would appear, after a bit of friction, that the projects are sorting out the trademark issue so Ubuntu can include the metapackages.
But the Ubuntu effort highlights the problem of perceived inertia for UserLinux. Perens announced UserLinux in December of 2003. There was a great deal of interest in the idea at the time, but the wait for a Debian release has certainly had an impact on the momentum of the project.
Perens conceded that there was a perception among the Linux community that UserLinux had stagnated, but said that the perception can be overcome.
As far as UserLinux, I think what I would like to do, is once Debian has made a release, have our roster of support companies ready to support it, and to just start giving these things out at Linux-related business events and saying 'here's a system with full support, here's a price sheet, and we're going to give you a lower cost of ownership than Linux. We're going to beat other Linux distributions on TCO and we're going to give you more control because, more than Fedora, more than Ubuntu, you get a chance to determine exactly how the system is built, because it's tracking what the Debian organization does, it's not a Debian variant.
Perens also told LWN that the best way for someone to help with UserLinux is to be involved with Debian.
To outsiders, however, it may appear at times that the Debian Project is too mired in political disagreements and flame wars to actually get anything done -- which is a significant objection to wanting to be involved with Debian. Perens said that there is a need to convince "a significant portion of 1,000 active developers that your policy is right" when working with Debian, but "that in itself is a quality assurance process".
Perens said he was "heartened" by the recent announcement that Debian will soon be doing a release, and that "when Debian wants to get off the dime, they can". He also said that the Debian developers have been "pretty embarrassed by the long delay of the release" and have bitten the bullet to get it out the door. He also predicted that the next Debian release after Sarge will be scheduled, and it will be kept on schedule.
It will be interesting to see what happens after Debian Sarge is released, and whether the UserLinux project can succeed as a distribution for "businesses of all sizes".
GCJ - past, present, and future
GCJ (the GNU Compiler for the Java programming language) is part of GCC (the GNU Compiler Collection) and provides a compiler, runtime environment, core libraries and tools for the Java language - it's an object oriented, strongly typed, garbage collected programming framework with a rich core library. GCJ is modeled after, and is a free replacement for, the proprietary Java platform. But like GNU is Not Unix, GCJ is not Java.
The traditional Java platform is clearly not an ideal system, especially when combined with the traditional GNU system, but it is not too bad. The essential features seem to be good ones. Lots of Free Software is already written in the Java programming language so a free system compatible with the Java platform would be convenient for many hackers. GCJ is an extension of GCC and facilitates integration with other languages supported by GCC. GCJ 4, part of GCC 4.0, adds more features to easily integrate programs written using the GCJ development environment with the rest of the GNU platform while being even more compatible with the traditional Java platforms than previous releases. GCC 4.0 is scheduled to be released around April 15.
GCJ design history
Originally GCJ was designed as a “radically traditional” compiler for the Java programming language. It is an AOT (Ahead Of Time) compiler which automatically uses every GCC optimization available during compile time for a given architecture and produces binaries or (shared) libraries for the given platform. These programs run at full native speed without needing any interpreter or JIT (Just In Time) compilation. GCC is available for a large number of architectures and platforms so compiling directly to native code using the GCC back-ends makes programs written with GCJ much more portable than the traditional (proprietary) Java platform. This radically traditional approach makes all normal GNU tools like GDB available to the programmer writing code in the Java programming language just like when programming in any other language supported by GCC.
Thereafter, support for generating and interpreting byte code .class and .jar files was added. This made GCJ more compatible with traditional applications written in the Java programming language that are compiled to byte code. GCJ can be used in various modes:
- Compile and link .java source files to binaries, .o or .so files.
- Compile and link .class or .jar byte code files to binary.
- Compile .java source files to .class byte code files (gcj -C).
- Interpret .class or .jar byte code files during runtime (gij).
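The modes listed above can be tried on a very small program. The sketch below is only an illustration: the `Hello` class, its `greeting` helper and the output file name `hello` are hypothetical and not taken from the article, and the command lines in the comments are the commonly documented `gcj` and `gij` invocations rather than anything specific to a given release.

```java
// Hello.java: a tiny, hypothetical program for trying the GCJ modes.
//
// Possible invocations (assuming gcj and gij are installed):
//   gcj --main=Hello -o hello Hello.java    # .java source to a native binary
//   gcj -C Hello.java                       # .java source to Hello.class byte code
//   gcj --main=Hello -o hello Hello.class   # byte code to a native binary
//   gij Hello                               # interpret Hello.class at runtime
class Hello {
    // Build the greeting separately so it can be checked without running main().
    static String greeting(String who) {
        return "Hello, " + who + "!";
    }

    public static void main(String[] args) {
        // Greet the first command-line argument, or "GCJ" when none is given.
        String who = args.length > 0 ? args[0] : "GCJ";
        System.out.println(greeting(who));
    }
}
```

The same source file serves every mode; only the tool invocation changes.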
The byte code interpreter is included as part of the standard runtime libgcj and can be used by programs to switch between interpreting byte code and executing natively compiled code on demand. So not all of the program has to be completely interpreted or completely compiled ahead of time at the same time.
To facilitate integration with code written in other languages, GCJ defines the CNI (Compiled Native Interface). CNI makes it easy to mix and match code and classes written in C, C++ and Java by allowing you to write some methods of a class in C++ and to catch and throw exceptions directly to and from parts of the program written in different languages. GCJ also supports the more traditional JNI (Java Native Interface) for using code written in C from your programs.
Anthony Green posted the original design document for GCJ from 1998.
Drawbacks of the GCJ 3.x approach
GCJ 3.x provides a good “better than Java” development environment that allows tight integration with the rest of the GNU platform. But it has disappointed some traditional Java programmers. The possibility to mix and match native code with byte code in the compiler and libgcj runtime makes GCJ very flexible. But falling back to interpreting byte code doesn't really take full advantage of the whole “radically traditional” approach. Programs using advanced byte code based class loader tricks were especially slow, because they fell back to the interpreter at runtime.
There are GCJ extensions to add support for using natively compiled code all the time. But programs had to be adapted to use these extensions. Instead of using .jar files containing byte code definitions of new classes programs would have to use a new URL scheme (gcjlib:) for their URLClassLoader uses. The first “Fast Free Eclipse” port to GCJ was done this way. The source code of the plugin loading mechanism was adapted to search for natively compiled plugins in shared library .so files besides ordinary .jar byte code files. There was even a moderately popular project, rhug, that maintained a lot of patched versions of traditional free Java programs that were adapted to gcj's view of the world. But these patches were almost never adopted upstream and the maintenance of these forks took a lot of time. So the benefits of the GCJ approach were only seen by programs written explicitly for it, but not by traditional Java programs.
One of the main goals of the GCJ 4 effort was to bring all the advantages of the “radically traditional” approach to any program written in the Java programming language without needing any application-level changes.
GCJ 4 enhancements
Probably the most visible enhancement of GCJ 4 comes from merging the libgcj runtime with the GNU Classpath core class library project. By collaborating with other free runtimes like the traditional kaffe environment and around 20 other projects, GCJ 4 is able to offer a core class library comparable to JDK 1.3 or 1.4. The collaboration of all these projects on a common core library implementation means that a lot of the libraries needed by applications, except for advanced Swing, Corba and sound usage, are available out of the box. Kaffe, for example, is being used by the Apache project to track the build of most of the Jakarta projects using their Gump auto-builder.
The other big change is the addition of the -findirect-dispatch switch to the compiler. Using that option causes GCJ to generate native code for classes and methods that follow the precise same binary compatibility rules as described in the Java Language Specification. This means that native compiled code can now be used everywhere, even in the most tricky class loader situations, where previously the program would fall back to interpreted byte code. At the 2004 GCC Summit Tom Tromey and Andrew Haley described this new binary compatibility ABI for GCJ in more detail.
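To see why resolving calls symbolically matters for binary compatibility, consider the toy model below. It is not GCJ's implementation, and it is written for a current JDK rather than for GCJ itself; `DispatchDemo`, `libraryV1`, `libraryV2`, `callBySlot` and `callByName` are all hypothetical names. The sketch contrasts a caller that baked in a fixed method-table slot at compile time with one that keeps the method name and looks it up at runtime, which is the general idea behind indirect dispatch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: a "class" is an ordered method table mapping name to implementation.
class DispatchDemo {
    // Version 1 of a hypothetical library class.
    static Map<String, Supplier<String>> libraryV1() {
        Map<String, Supplier<String>> table = new LinkedHashMap<>();
        table.put("name", () -> "widget");
        table.put("version", () -> "1");
        return table;
    }

    // Version 2 adds a method in front, shifting every existing slot by one.
    static Map<String, Supplier<String>> libraryV2() {
        Map<String, Supplier<String>> table = new LinkedHashMap<>();
        table.put("id", () -> "42");
        table.put("name", () -> "widget");
        table.put("version", () -> "2");
        return table;
    }

    // Direct dispatch: the caller remembers only the slot number it was compiled with.
    static String callBySlot(Map<String, Supplier<String>> table, int slot) {
        List<Supplier<String>> slots = new ArrayList<>(table.values());
        return slots.get(slot).get();
    }

    // Indirect dispatch: the caller keeps the symbolic name and resolves it at runtime.
    static String callByName(Map<String, Supplier<String>> table, String method) {
        Supplier<String> impl = table.get(method);
        if (impl == null) {
            throw new NoSuchMethodError(method);
        }
        return impl.get();
    }

    public static void main(String[] args) {
        int nameSlotInV1 = 0; // slot of "name" when the caller was compiled against V1
        System.out.println(callBySlot(libraryV1(), nameSlotInV1)); // prints "widget"
        System.out.println(callBySlot(libraryV2(), nameSlotInV1)); // prints "42": wrong method
        System.out.println(callByName(libraryV2(), "name"));       // prints "widget"
    }
}
```

In this model the slot-based caller silently calls the wrong method once the library changes, while the name-based caller keeps working without being recompiled.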
The new binary compatibility (BC) ABI makes it possible to transparently compile programs to native code using gcj -findirect-dispatch without having to change the application source code or even the build process. To map byte code to GCJ compiled native code, GCJ 4 introduces gcj-dbtool. This tool is used by the packager during deployment of the application or library to create a database mapping the bytecode of a class to the native code during runtime. Programs can use different databases using the gnu.gcj.precompiled.db.path system property. The databases make it possible to create a cache of all native compiled code that can be shared by different programs installed on the system. The How to BC compile with GCJ GCC wiki page has examples.
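The idea of a database that maps the byte code of a class to its native code can be pictured with a small in-memory model. This is a hedged sketch, not how gcj-dbtool stores its data: the `ClassMapDemo` class, the choice of SHA-256 as the key and the library path are all assumptions made for illustration, and the code targets a current JDK rather than GCJ. Only the system property name comes from the text above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a mapping database: digest of class bytes -> native library path.
class ClassMapDemo {
    private final Map<String, String> entries = new HashMap<>();

    // Hex digest of the class bytes; SHA-256 is an assumption made for this sketch.
    static String digest(byte[] classBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(classBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // What a packager would do at deployment time in this model.
    void register(byte[] classBytes, String nativeLibrary) {
        entries.put(digest(classBytes), nativeLibrary);
    }

    // Returns the library holding native code, or null meaning "fall back to byte code".
    String lookup(byte[] classBytes) {
        return entries.get(digest(classBytes));
    }

    public static void main(String[] args) {
        // Stand-ins for real class file contents.
        byte[] original = "bytes of Foo.class".getBytes(StandardCharsets.UTF_8);
        byte[] rebuilt = "bytes of Foo.class, rebuilt".getBytes(StandardCharsets.UTF_8);

        ClassMapDemo db = new ClassMapDemo();
        db.register(original, "/hypothetical/lib/foo.jar.so");

        System.out.println(db.lookup(original)); // the registered library path
        System.out.println(db.lookup(rebuilt));  // null: different bytes, no native match

        // The property named in the text; it is null here unless set with -D.
        System.out.println(System.getProperty("gnu.gcj.precompiled.db.path"));
    }
}
```

Keying on the class contents rather than the class name means a rebuilt or patched class simply misses the cache instead of running stale native code, which is one plausible reason for a design of this kind.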
This approach is used by the native Eclipse packages in Fedora Core 4. No changes to the Eclipse code base are necessary anymore and, after the project is bootstrapped, all resulting .jar files are BC compiled. To almost completely automate this process, Thomas Fitzsimmons created java-gcj-compat, a collection of wrapper scripts, symlinks and jar files that provide a Java-SDK-like interface to this new GCJ 4 tool set.
Future plans
The -findirect-dispatch switch can currently only be used for byte code and not in combination with CNI (JNI is already supported). This limitation currently prevents parts of the core class libraries from being BC compiled. Lifting this restriction will facilitate more integration with GNU Classpath.
With GCJ and GDB a programmer can step through native C, C++ and Java source code using the same tool. Traditional Java developers are more used to JDWP (Java Debugging Wire Protocol) for debugging their applications. Eclipse comes with built-in support for JDWP. Work is in progress to provide JDWP debugging support for the different execution mechanisms. This code will also be shared with the GNU Classpath project.
Benchmarks show that GCJ is comparable (sometimes faster, sometimes slower) to traditional execution mechanisms for Java programs. But GCJ currently doesn't really take advantage of the new GCC 4.0 Tree SSA optimizer framework. For 4.1 the GCJ developers hope to add a couple of GCJ specific optimizations.
Tom Tromey is currently working on GCJX, a new GCC frontend that will include support for the new 1.5 language additions, such as generics. And the GNU Classpath project has a separate branch for the core class libraries that depend on the new 1.5 language additions.
Escaping the Java Trap
GCJ 4 is the result of seven years of work by a large and active community of Free Software hackers. This new version is complete enough to replace most interesting uses of the proprietary Java platform. It adds a whole new set of core libraries and adds some new features to help integration with the rest of the GNU platform. Upcoming versions of some GNU/Linux distributions will use GCJ 4 to provide much more Java-based Free Software, including Eclipse, Jonas, OpenOffice.org 2, Tomcat and the Jakarta libraries. There is also a great deal of free software to integrate with traditional GNU/Linux distributions provided by the JPackage project. Both Debian and Fedora are working with the jpackage hackers to support more of these packages “out of the box”.
All this doesn't mean that we have escaped the Java trap yet. As pointed out by Richard Stallman in “Free But Shackled - The Java Trap” we have to actively work together to keep code safe and free. It looks like the main target projects for GCJ 4 (Apache Jakarta, Eclipse and OpenOffice.org 2), have all reacted positively to the feedback and patches provided to support free alternatives to the Java platform. The fact that the changes requested were for making the projects more portable ("don't use undocumented com.sun internal classes") rather than requests to dramatically change the code, (core) libraries used or build infrastructure has helped a lot. But the above projects were already free software projects at heart. It remains to be seen if other more traditional Java projects will adapt so easily to support GCJ 4 out of the box.
Page editor: Jonathan Corbet
