April 6, 2005
This article was contributed by Mark Wielaard
GCJ (the GNU Compiler for the Java programming language) is part of GCC
(the GNU Compiler Collection) and provides a compiler, runtime
environment, core libraries and tools for the Java language: an
object-oriented, strongly typed, garbage-collected programming
framework with a rich core library. GCJ is modeled after, and is a
free replacement for, the proprietary Java platform. But like GNU is
Not Unix, GCJ is not Java.
The traditional Java platform is clearly not an ideal system,
especially when combined with the traditional GNU system, but it is
not too bad. The essential features seem to be good ones. Lots of
Free Software is already written in the Java programming language so a
free system compatible with the Java platform would be convenient for
many hackers. GCJ is an extension of GCC and facilitates integration
with other languages supported by GCC. GCJ 4, part of GCC 4.0, adds
more features to easily integrate programs written using the GCJ
development environment with the rest of the GNU platform while being
even more compatible with the traditional Java platforms than previous
releases. GCC 4.0 is scheduled to be released around April 15.
GCJ design history
Originally GCJ was designed as a “radically
traditional” compiler for the Java programming language. It is an
AOT (Ahead Of Time) compiler which automatically uses every GCC
optimization available during compile time for a given architecture
and produces binaries or (shared) libraries for the given platform.
These programs run at full native speed without needing any
interpreter or JIT (Just In Time) compilation. GCC is available for a
large number of architectures and platforms so compiling directly to
native code using the GCC back-ends makes programs written with GCJ
much more portable than the traditional (proprietary) Java platform.
This radically traditional approach makes all normal GNU tools like
GDB available to the programmer writing code in the Java programming
language just like when programming in any other language supported by
GCC.
Support for generating and interpreting byte code (.class and
.jar files) was added later. This made GCJ more compatible with
traditional applications written in the Java programming language
that are compiled to byte code. GCJ can be used in various modes:
- Compile and link .java source files to binaries, .o or .so files.
- Compile and link .class or .jar byte code files to binary.
- Compile .java source files to .class byte code files (gcj -C).
- Interpret .class or .jar byte code files during runtime (gij).
The byte code interpreter is included as part of the standard runtime
libgcj and can be used by programs to switch between
interpreting byte code and executing natively compiled code on demand.
A program therefore does not have to be either completely interpreted
or completely compiled ahead of time; the two can be mixed.
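As a rough illustration of the modes listed above, here is a minimal program together with the kind of command lines one would typically use. The class name Hello is made up for this sketch, and the commands in the comments assume that gcj and gij from a GCC installation are on the PATH; exact options may differ between releases.

```java
// Hello.java: a minimal program for trying the different GCJ modes.
//
// Typical invocations (assumed, check your local gcj documentation):
//   gcj --main=Hello -o hello Hello.java    native binary from source
//   gcj -C Hello.java                       Hello.class byte code only
//   gcj --main=Hello -o hello Hello.class   native binary from byte code
//   gij Hello                               interpret Hello.class
public class Hello {
    // Kept separate from main so the message can be checked on its own.
    static String greeting(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "GCJ";
        System.out.println(greeting(name));
    }
}
```

The source file is the same in every mode; only the way it is turned into something executable changes.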
To facilitate integration with code written in other languages, GCJ
defines the CNI (Compiled Native Interface). CNI makes it easy to mix
and match code and classes written in C, C++ and Java by allowing you
to write some methods of a class in C++ and to catch and throw
exceptions directly to and from parts of the program written in
different languages. GCJ also supports the more traditional JNI (Java
Native Interface) for using code written in C from your programs.
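On the Java side, both CNI and JNI start from the same thing: a method declared native whose body is supplied by code in another language. The sketch below shows only that Java side; NativeCounter is a hypothetical class invented for illustration, and the native implementation (C++ for CNI, C for JNI) is deliberately left out.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// NativeCounter is a hypothetical class used only for illustration.
public class NativeCounter {
    private int count;

    // Declared here, implemented elsewhere (C++ via CNI or C via JNI).
    // Calling it without a linked implementation throws UnsatisfiedLinkError,
    // so this sketch never calls it.
    public native void increment();

    // An ordinary method implemented in Java.
    public int get() {
        return count;
    }

    // Reports whether the named no-argument method is declared native.
    static boolean isNative(String methodName) {
        try {
            Method m = NativeCounter.class.getDeclaredMethod(methodName);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("increment is native: " + isNative("increment"));
        System.out.println("get is native: " + isNative("get"));
    }
}
```

With CNI the missing method would be written as an ordinary C++ member function of the class; with JNI it would be a C function looked up by name at runtime.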
Anthony Green posted the
original design
document for GCJ from 1998.
Drawbacks of the GCJ 3.x approach
GCJ 3.x provides a good “better than Java” development
environment that allows tight integration with the rest of the GNU
platform. But it has disappointed some traditional Java programmers.
The possibility to mix and match native code with byte code in the
compiler and libgcj runtime makes GCJ very flexible. But falling back
to interpreting byte code doesn't really take full advantage of the
whole “radically traditional” approach. In particular, programs
using advanced byte code based class loader tricks ran slowly
because they fell back to using the interpreter during runtime.
There are GCJ extensions to add support for using natively compiled
code all the time. But programs had to be adapted to use these
extensions. Instead of using .jar files containing byte code
definitions of new classes programs would have to use a new URL scheme
(gcjlib:) for their URLClassLoader uses. The
first “Fast Free
Eclipse” port to GCJ was done this way. The source code of
the plugin loading mechanism was adapted to search for natively
compiled plugins in shared library .so files besides ordinary
.jar byte code files. There was even a moderately popular
project, rhug, that
maintained a lot of patched versions of traditional free Java programs
that were adapted to gcj's view of the world. But these
patches were almost never adopted upstream and the maintenance of
these forks took a lot of time. So the benefits of the GCJ approach
were only seen by programs written explicitly for it, but not by
traditional Java programs.
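For readers unfamiliar with the pattern, the kind of loading code that had to be patched looks roughly like the sketch below: a program builds a URLClassLoader over a .jar file and asks it for classes at runtime. PluginLoader and the plugin path are hypothetical names; this is the unmodified, portable form, not the gcjlib: variant, whose exact URL syntax is not shown here.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// PluginLoader and the jar path are hypothetical, for illustration only.
public class PluginLoader {
    // Builds a class loader that reads class definitions from one .jar file.
    // The file does not need to exist until a class is actually requested.
    static URLClassLoader loaderFor(String jarPath) {
        try {
            URL[] urls = { new File(jarPath).toURI().toURL() };
            return new URLClassLoader(urls, PluginLoader.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + jarPath);
        }
    }

    public static void main(String[] args) {
        URLClassLoader loader = loaderFor("plugins/example.jar");
        // A real program would now call loader.loadClass(...) to get a plugin
        // class; under GCJ 3.x such classes ran in the byte code interpreter.
        System.out.println("Loading plugins from " + loader.getURLs()[0]);
    }
}
```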
One of the main goals of the GCJ 4 effort was to bring all the
advantages of the “radically traditional” approach to any
program written in the Java programming language without needing any
application-level changes.
GCJ 4 enhancements
Probably the most visible enhancement of GCJ 4 comes from merging the
libgcj runtime with the GNU Classpath core
class library project. By collaborating with other free runtimes like
the traditional kaffe environment and around 20
other projects, GCJ 4 is able to offer a core class library comparable to
JDK 1.3 or 1.4. The collaboration of all these projects on a common
core library implementation means that a lot of the libraries needed by
applications, except for advanced Swing, CORBA and sound usage, are
available out of the box. Kaffe, for example, is being used by the
Apache project to track the build of most of the Jakarta projects
using their Gump auto-builder.
The other big change is the addition of the
-findirect-dispatch switch to the compiler. With that option, GCJ
generates native code for classes and methods that follows exactly
the same binary compatibility rules as described in the Java
Language Specification. This means that natively compiled code can now
be used everywhere, even in the trickiest class loader situations,
where previously the program would fall back to interpreted byte code.
At the 2004 GCC Summit Tom Tromey and Andrew Haley described this
new binary compatibility ABI for GCJ in more detail.
The new binary compatibility (BC) ABI makes it possible to transparently
compile programs to native code using gcj
-findirect-dispatch without having to change the application
source code or even the build process. To map byte code to GCJ
compiled native code, GCJ 4 introduces gcj-dbtool. This tool
is used by the packager during deployment of the application or
library to create a database mapping the bytecode of a class to the
native code during runtime. Programs can use different databases
using the gnu.gcj.precompiled.db.path system property. The
databases make it possible to create a cache of all native compiled
code that can be shared by different programs installed on the system.
The “How to BC compile with GCJ” page on the GCC wiki has examples.
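To make the database mechanism a little more concrete, here is a small sketch that inspects the gnu.gcj.precompiled.db.path property from inside a program. The class name DbPath and all file names are hypothetical, and the sketch assumes that the property holds a list of database files separated by the system path separator. The commands in the comments are a plausible packaging sequence, not a tested recipe.

```java
import java.io.File;
import java.util.regex.Pattern;

// Assumed packaging steps (file names are hypothetical):
//   gcj -shared -fPIC -findirect-dispatch -o app.jar.so app.jar
//   gcj-dbtool -n app.db                      create an empty database
//   gcj-dbtool -a app.db app.jar app.jar.so   map app.jar to its native code
//   gij -Dgnu.gcj.precompiled.db.path=app.db -jar app.jar
public class DbPath {
    static final String PROPERTY = "gnu.gcj.precompiled.db.path";

    // Splits a database path into file names; empty when the path is unset.
    // Assumes entries are separated by the system path separator.
    static String[] databases(String path) {
        if (path == null || path.length() == 0) {
            return new String[0];
        }
        return path.split(Pattern.quote(File.pathSeparator));
    }

    public static void main(String[] args) {
        String[] dbs = databases(System.getProperty(PROPERTY));
        if (dbs.length == 0) {
            System.out.println(PROPERTY + " is not set");
        }
        for (int i = 0; i < dbs.length; i++) {
            System.out.println("database: " + dbs[i]);
        }
    }
}
```

On a runtime other than libgcj the property is simply unset, so the program reports that and exits.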
This approach is used by the native Eclipse packages in Fedora Core 4.
No changes to the Eclipse code base are necessary anymore and, after
the project is bootstrapped, all resulting .jar files are BC
compiled. To almost completely automate this process, Thomas Fitzsimmons
created java-gcj-compat, a collection of wrapper scripts, symlinks and
jar files that provides a Java-SDK-like interface to this new GCJ 4
tool set.
Future plans
The -findirect-dispatch switch can currently only be used for byte code
and not in combination with CNI (JNI is already supported). This limitation
currently prevents parts of the core class libraries from being BC
compiled. Lifting this restriction will facilitate more integration
with GNU Classpath.
With GCJ and GDB a programmer can step through native C, C++ and Java
source code using the same tool. Traditional Java developers are more
used to JDWP (Java Debugging Wire Protocol) for debugging their
applications. Eclipse comes with built-in support for JDWP. Work is
in progress to provide JDWP debugging support for the different
execution mechanisms. This code will also be shared with the GNU
Classpath project.
Benchmarks show that GCJ's performance
is comparable (sometimes faster, sometimes slower) to that of traditional
execution mechanisms for Java programs. But GCJ currently doesn't
really take advantage of the new GCC 4.0 Tree SSA optimizer
framework. For 4.1 the GCJ developers hope to add a couple of
GCJ-specific optimizations.
Tom Tromey is currently working on GCJX, a new GCC frontend that will
include support for the new 1.5 language additions, such as generics. And
the GNU Classpath project has a separate branch for the core class
libraries that depend on the new 1.5 language additions.
Escaping the Java Trap
GCJ 4 is the result of seven years of work by a large and active community of
Free Software hackers. This new version is complete enough to
replace most interesting uses of the proprietary Java platform. It
adds a whole new set of core libraries and adds some new features to
help integration with the rest of the GNU platform. Upcoming versions
of some GNU/Linux distributions will use GCJ
4 to provide much more Java-based Free Software, including Eclipse, Jonas,
OpenOffice.org 2, Tomcat and the Jakarta libraries.
The JPackage project also provides a great deal of free software
packaged for integration with traditional GNU/Linux distributions.
Both Debian and Fedora are working with the JPackage hackers to
support more of these packages “out of the box”.
All this doesn't mean that we have escaped the Java trap yet. As
pointed out by Richard Stallman in “Free But Shackled
- The Java Trap” we have to actively work together to keep
code safe and free. It looks like the main target projects for GCJ 4
(Apache Jakarta, Eclipse and OpenOffice.org 2) have all reacted
positively to the feedback and patches provided to support free
alternatives to the Java platform.
The fact that the
changes requested were for making the projects more portable
("don't use undocumented com.sun internal classes")
rather than requests to dramatically change the code, (core) libraries
used or build infrastructure has helped a lot. But the above
projects were already free software projects at heart. It remains to
be seen if other, more traditional Java projects will adapt so easily
to support GCJ 4 out of the box.