August 9, 2006
This article was contributed by Stacey Quandt
Google
used the recent O'Reilly
Open Source Convention (OSCON) to announce that it is launching a
project hosting service. The two
primary features of this service are Subversion hosting
and a brand new take on managing bug reports.
Google has seven Subversion
developers on staff who are building a new storage back-end for
Subversion to store data in a "Bigtable." A Bigtable is a system for storing
and managing very large amounts of structured data. The system is designed
to manage several petabytes of data distributed across thousands of
machines, with very high update and read request rates coming from
thousands of simultaneous clients. This architecture allows Google to scale
Subversion up to meet the demands of storage and concurrency it
believes will be needed to serve its members. According to Google's Greg Stein,
“The existing two back-ends for Subversion (Berkeley DB and flat files) just do
not have the capability to scale to our needs. The Bigtable system also
gives us things like failover, monitoring, and performance tuning
capabilities that are not present in the standard Subversion
back-ends.” More information on Google's version of Subversion
can be found in the FAQ.
When asked if Google intends to contribute its Bigtable code back to
Subversion, Greg Stein responds: “We're certainly not opposed to the
concept, but the devil is in the details.” The issue is that the code
that interacts directly with Bigtable cannot be contributed back to the
Subversion project since Google has no plans to publish the source code to
Bigtable at this time. Stein explains, “We have made a number of
changes in the functional tests, and a couple higher level libraries that
we are going to contribute back.” However, source code changes that
are highly specific to Google's environment will not be contributed back to
the Subversion project because as Stein says, “It would not make
sense...[since]... those changes would needlessly pollute the code base
with no measurable benefit for others.” In essence, Stein isn't
opposed to contributing source code back to the community and stresses that
“We've got to figure out what the best line is that helps the public
code base.”
One potential solution is to publish a non-working copy of the back-end
database simply to see if there is some interest in the open source
community for reviewing Google's model. Stein says: “The lessons
learned and control/data flow patterns might be helpful for
other, future back-ends.” Since starting work on a version
of Subversion that could be integrated with Google's technology, Stein says, “We
have been heads-down getting the service built and delivered to the
public.” He further states, “We have much more
work that we want to do, but it may be time for a breather to review what
we've done and figure out the best options to get some pieces
published.”
Google's ability to contribute the source code for its issue tracker back
to the open source community falls under constraints similar to those it faces
with Subversion. Stein explains, “When you subtract the Bigtable
code, the search technology, and a few of the other
proprietary pieces, then there is actually very little left.”
Stein asserts that Google has talked about this right from the start. In the
event that someone wants to replicate Google's issue tracker, Stein
says, “We'd happily consult with that community about what we've
done. There may be a couple pieces we can provide (under the Apache license).”
As for the architecture of the issue tracker, Google disregarded the idea
of a heavily structured database and replaced it with a free-form system
based on Google's search technology. Issues can be arbitrarily labeled to
note version information, operating system, milestones, priority or other
project specific information. Users can query across all of the
descriptions, comments, and labels to find the relevant issues. Advanced
search allows a user to search just the labels or just the status of an
issue. On top of this new model for storing and querying issues, Google
built an Ajax-based
interface to make it very easy for users to interact with. Issues are
listed in a standard list format but users can perform basic changes to the
user interface including adjusting the columns and sorting.
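The label-and-search model described above can be illustrated with a minimal sketch. This is not Google's implementation, which is not public; the `Issue` class, the function names, and label values such as `Priority-High` are hypothetical, and a simple substring match stands in for Google's search technology.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """One tracked issue: free text plus arbitrary labels (hypothetical model)."""
    summary: str
    description: str
    status: str = "New"
    labels: set = field(default_factory=set)
    comments: list = field(default_factory=list)


def search(issues, text):
    """Query across summaries, descriptions, comments, and labels."""
    needle = text.lower()
    hits = []
    for issue in issues:
        haystack = [issue.summary, issue.description, *issue.comments, *issue.labels]
        if any(needle in part.lower() for part in haystack):
            hits.append(issue)
    return hits


def search_labels(issues, label):
    """Advanced search restricted to labels (case-insensitive exact match)."""
    wanted = label.lower()
    return [i for i in issues if any(wanted == lab.lower() for lab in i.labels)]


def search_status(issues, status):
    """Advanced search restricted to the status of an issue."""
    return [i for i in issues if i.status.lower() == status.lower()]


# Example data; the label names are made up for illustration.
issues = [
    Issue("Crash on startup", "Server dies when the config file is missing",
          labels={"Priority-High", "OpSys-Linux"}),
    Issue("Typo in help text", "The usage message misspells 'server'",
          status="Fixed", labels={"Priority-Low", "Milestone-1.0"}),
]
```

Because labels are plain strings, a project can start tracking operating systems or milestones without any change to a schema; the cost is that nothing enforces consistent naming.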
Google has also made it simpler to submit a bug report. Stein says,
“Today a user is typically faced with a crazy set of drop-downs and
fields covering everything from priority, to software components, to
the target milestones.” Stein asks the logical question: “How
is the user supposed to know any of this? They just wanted to use that
screaming mp3 server, and have no idea whether the affected component is
Foo or Bar.” Google addresses this potential problem by only
requiring the user to specify a summary and description. The user can also
optionally attach files and indicate that they want updates as
developers work on the bug report. Project developers can add,
remove, or alter labels, assign owners, and change the status of an existing
bug report; when they create a new issue in
Google's issue tracker, they can add labels as part of creating the
bug report.
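The split between what a reporter must supply and what developers fill in later might look like the following sketch. All names here (`BugReport`, `file_report`, `triage`, the status value `Accepted`, the example labels and address) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BugReport:
    # Supplied by the reporter: only the first two fields are required.
    summary: str
    description: str
    attachments: list = field(default_factory=list)
    wants_updates: bool = False
    # Filled in later by project developers.
    labels: set = field(default_factory=set)
    owner: Optional[str] = None
    status: str = "New"


def file_report(summary, description, attachments=None, wants_updates=False):
    """Create a report from the two required fields plus optional extras."""
    if not summary.strip() or not description.strip():
        raise ValueError("summary and description are required")
    return BugReport(summary.strip(), description.strip(),
                     list(attachments or []), wants_updates)


def triage(report, add=(), remove=(), owner=None, status=None):
    """Developer-side changes: adjust labels, assign an owner, set the status."""
    report.labels |= set(add)
    report.labels -= set(remove)
    if owner is not None:
        report.owner = owner
    if status is not None:
        report.status = status
    return report


# A user reports a problem without knowing which component is affected...
report = file_report("mp3 server stops streaming", "Playback halts after one song")
# ...and a developer classifies it afterwards.
triage(report, add={"Component-Foo", "Priority-Medium"},
       owner="dev@example.org", status="Accepted")
```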
Stein claims, “Most open source groups don't require the heavy
structure or workflow that is present in today's issue trackers.”
Still, Stein concedes that some large groups do need these features, but they
are typically in the minority. By focusing on the majority's needs, Google's
take on bug reports could turn out to be beneficial for the open source
community.
Google's Project Hosting enters a crowded space with alternative
services from not only Sourceforge.net but also Savannah and Debian's Alioth, among others. This raises the
question of how easy it is to import a project, or to export it and move it
somewhere else in the future. According to Stein, the answer is “Not
very easy.” This is because at present there is no way to upload or
download a Subversion dump file. Google engineers are working on both of
these efforts. Stein says, “For upload, we'll maybe do something in
combination with a file upload/download feature or rely on the revision of
Subversion 1.4's sync/replay feature when it is released and after we
upgrade the servers.”
Download is a different story. Google plans to make the dump file
available to project owners so they can always access their complete
information. Stein states, “We know how important it is to open
source groups to know that they are not locked into a hosting service.”
Google does not support the data export capability today but it does plan
on allowing for the export of all information. The import and export
functionality is not defined yet and Google plans to investigate using some
simple APIs for this. Stein voices some concern about this approach and
says: “I have a natural wariness with APIs. If you get them wrong then you
can paint yourself into a corner.”
A question on some people's minds is: will Google project hosting offer the
same services as Sourceforge? Google project hosting is similar to
Sourceforge in its goal to encourage open source projects and foster
productive open source communities. Aside from architectural considerations,
another difference between the two services is that the new Google service will not include Web site
hosting and will initially target smaller projects.
Since Google has no plans to make it easy to
move a project from other hosting sites, it appears that Sourceforge.net
does not have to worry about losing its share of current users.
Stein stresses: “Sourceforge is one of the major cornerstones of the
open source community, and we have zero interest in damaging that
foundation.” Stein recognizes that people may
develop tools on their own, especially once the Google project hosting
system has a better import system, but he says, “We have no
plans to be an instigator for that.” If you try to create a project
at Google Code using the name of a Sourceforge project, Google will stop
the process and note the conflict. An email will be sent to the owner of
the Sourceforge project
asking them to approve or deny the project creation. Google wants
to prevent malicious impersonation or accidental name conflicts and worked
with Sourceforge to get a list of all hosted projects and email addresses
of the owners. Google is also working with other hosting sites such as tigris.org, java.net and Codehaus to avoid naming conflicts.
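The name-conflict check can be sketched roughly as follows. How Google actually stores or matches the list is not described, so the dictionary of reserved names, the case-insensitive comparison, and the function names are all assumptions made for this example.

```python
def check_project_name(requested, reserved_owners):
    """Decide what to do with a requested project name.

    reserved_owners maps lower-cased names of projects hosted elsewhere
    to the email address of their owner (hypothetical data structure).
    Matching case-insensitively is an assumption of this sketch.
    """
    key = requested.strip().lower()
    owner_email = reserved_owners.get(key)
    if owner_email is None:
        return ("create", None)
    # Conflict: stop the process and email the existing project's owner.
    return ("hold_for_approval", owner_email)


def resolve_hold(owner_approved):
    """The existing owner either approves or denies the new project."""
    return "create" if owner_approved else "reject"


# Made-up entry standing in for the list obtained from another hosting site.
reserved_owners = {"exampleproject": "owner@example.org"}
```

A request for `ExampleProject` would be held and the address `owner@example.org` notified, while a name that appears on no list goes straight through.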
Google has set initial storage limits at 100 MB for Subversion, and 50
MB for issue attachments. Stein says, “These limits will be more than
enough for open source projects, but we can individually adjust them
for valid projects.” The limits are designed to prevent spam or
abusive projects from inappropriately using Google's services to host
content which is unrelated to free software projects or not freely
redistributable.
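As a rough illustration of how such limits, with per-project adjustments, could be enforced, consider the sketch below. The figures of 100 MB and 50 MB come from the article; treating a megabyte as 1024 × 1024 bytes, the override mechanism, and every name in the code are assumptions.

```python
MB = 1024 * 1024  # assumption: the article does not say how "MB" is counted

# Initial limits as reported: 100 MB for Subversion, 50 MB for attachments.
DEFAULT_LIMITS = {"subversion": 100 * MB, "attachments": 50 * MB}


def fits_quota(kind, used_bytes, incoming_bytes, overrides=None):
    """Return True if storing incoming_bytes keeps the project within its limit.

    overrides is a hypothetical per-project dict for individually
    adjusted limits, e.g. {"attachments": 200 * MB}.
    """
    limit = (overrides or {}).get(kind, DEFAULT_LIMITS[kind])
    return used_bytes + incoming_bytes <= limit
```

For example, `fits_quota("attachments", 49 * MB, 2 * MB)` is false under the default limit but true for a project whose attachment limit has been raised to 200 MB.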
The first step in getting started is creating a Gmail
account, which is required for project owners and members. Owners
have the ability to reconfigure projects, add/remove other
owners and members, and to manage basic metadata about the project. Members
can commit to the repository, and can change metadata on bug
reports. To file a bug report or issue a comment on one, a user only needs
a Google account with a verified email address.
A Google account can be associated with any email address; a
Gmail account is not required for this purpose. A valid email address is
required so that the project members can get in touch with the person
filing the bug report or in the event that further clarification is
required.
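The account rules above amount to a small permission table, sketched here. The action names are invented for this example, and the article does not say whether owners also hold member rights, so the sketch grants each role only the actions attributed to it, plus the filing and commenting rights open to any verified account.

```python
# Actions open to anyone with a Google account and a verified email address.
VERIFIED_USER_ACTIONS = {"file_issue", "comment_on_issue"}

# Actions the article attributes to each project role (names are hypothetical).
ROLE_ACTIONS = {
    "owner": {"reconfigure_project", "manage_people", "edit_project_metadata"},
    "member": {"commit", "edit_issue_metadata"},
}


def can(action, role=None, verified_email=False, has_gmail=False):
    """Return True if the described rules would allow the action."""
    if role in ROLE_ACTIONS:
        # Owners and members must hold a Gmail account.
        if not has_gmail:
            return False
        return action in ROLE_ACTIONS[role] or action in VERIFIED_USER_ACTIONS
    # Everyone else only needs a verified email address on a Google account.
    return verified_email and action in VERIFIED_USER_ACTIONS
```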
Google requires a Gmail account for project owners and members in an attempt
to gain greater certainty that they are not bots that could use the project
space for spam or other malicious purposes. Having all owners
and members on Gmail accounts may also help Google in future
integration efforts.
It is clear that Google wants to participate in the free software
development process and provide a viable
alternative to other open source project repositories.
Less clear is whether Google hosting
is merely a goodwill exercise with the open source community or whether its
goal is to be a profit-making venture, either via advertising revenue or by
encouraging more Gmail usage. Regardless, Google's new offering will no
doubt be a useful service to open source developers and a challenge for
other hosting sites to improve the services offered to their users. As we
all know, competition is a good thing.