Google's project hosting service
Google used the recent O'Reilly Open Source Convention (OSCON) to announce that it is launching a project hosting service. The two primary features of this service are Subversion hosting and a brand-new take on managing bug reports.
Google has seven Subversion developers on staff who are building a new storage back-end for Subversion to store data in a "Bigtable." A Bigtable is a system for storing and managing very large amounts of structured data. The system is designed to manage several petabytes of data distributed across thousands of machines, with very high update and read request rates coming from thousands of simultaneous clients. This architecture allows Google to scale Subversion up to meet the demands of storage and concurrency it believes will be needed to serve its members. According to Google's Greg Stein, “The existing two back-ends for Subversion (Berkeley DB and flat files) just do not have the capability to scale to our needs. The Bigtable system also gives us things like failover, monitoring, and performance tuning capabilities that are not present in the standard Subversion back-ends.” More information on Google's version of Subversion can be found in the FAQ.
When asked if Google intends to contribute its Bigtable code back to Subversion, Greg Stein responds: “We're certainly not opposed to the concept, but the devil is in the details.” The issue is that the code that interacts directly with Bigtable cannot be contributed back to the Subversion project, since Google has no plans to publish the source code to Bigtable at this time. Stein explains, “We have made a number of changes in the functional tests, and a couple higher level libraries that we are going to contribute back.” However, source code changes that are highly specific to Google's environment will not be contributed back to the Subversion project because, as Stein says, “It would not make sense...[since]... those changes would needlessly pollute the code base with no measurable benefit for others.” In essence, Stein isn't opposed to contributing source code back to the community and stresses, “We've got to figure out what the best line is that helps the public code base.”
One potential solution is to publish a non-working copy of the back-end database simply to see if there is some interest in the open source community for reviewing Google's model. Stein says: “The lessons learned and control/data flow patterns might be helpful for other, future back-ends.” Since Google started work on a version of Subversion that could be integrated with its own technology, Stein says, “We have been heads-down getting the service built and delivered to the public.” He adds, “We have much more work that we want to do, but it may be time for a breather to review what we've done and figure out the best options to get some pieces published.”
Google's ability to contribute the source code for its issue tracker back to the open source community falls under constraints similar to those it faces with Subversion. Stein explains, “When you subtract the Bigtable code, the search technology, and a few of the other proprietary pieces, then there is actually very little left.” Stein asserts Google has talked about this right from the start. In the event that someone should want to replicate Google's issue tracker, Stein says, “We'd happily consult with that community about what we've done. There may be a couple pieces we can provide (under the Apache license).”
As for the architecture of the issue tracker, Google discarded the idea of a heavily structured database and replaced it with a free-form system based on Google's search technology. Issues can be arbitrarily labeled to note version information, operating system, milestones, priority, or other project-specific information. Users can query across all of the descriptions, comments, and labels to find the relevant issues. Advanced search allows a user to search just the labels or just the status of an issue. On top of this new model for storing and querying issues, Google built an Ajax-based interface to make the tracker easy to use. Issues are shown in a standard list format, but users can make basic changes to the interface, including adjusting the columns and the sort order.
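The free-form model described above can be illustrated with a minimal Python sketch. This is not Google's implementation, which relies on its proprietary search technology; the `Issue` class, the function names, the substring matching, and the label names such as `Priority-High` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """A free-form issue: text plus arbitrary labels, no fixed schema (hypothetical)."""
    summary: str
    description: str
    status: str = "New"
    labels: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Summary, description, comments, and labels are searched together.
        parts = [self.summary, self.description, *self.comments, *self.labels]
        return " ".join(parts).lower()


def search(issues, query):
    """Return issues whose combined text contains every term in the query."""
    terms = query.lower().split()
    return [i for i in issues if all(t in i.searchable_text() for t in terms)]


def search_labels(issues, label):
    """Restricted search: match against labels only (case-insensitive)."""
    wanted = label.lower()
    return [i for i in issues if wanted in {l.lower() for l in i.labels}]


def search_status(issues, status):
    """Restricted search: match against the issue status only."""
    return [i for i in issues if i.status.lower() == status.lower()]


# Illustrative data; the label names are invented, not Google's.
issues = [
    Issue("Crash on startup", "Segfault when launching",
          labels={"OpSys-Linux", "Priority-High"}),
    Issue("Typo in docs", "Readme misspells the project name",
          labels={"Priority-Low"}, comments=["Also seen on Linux"]),
]
```

Here `search(issues, "linux")` matches both issues, one through a label and one through a comment, while `search_labels(issues, "priority-high")` matches only the first. Because a label is just a string, a project can invent its own conventions without any schema change.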
Google has also made it simpler to submit a bug report. Stein says, “Today a user is typically faced with a crazy set of drop-downs and fields covering everything from priority, to software components, to the target milestones.” Stein asks the logical question: “How is the user supposed to know any of this? They just wanted to use that screaming mp3 server, and have no idea whether the affected component is Foo or Bar.” Google addresses this problem by requiring the user to specify only a summary and a description. The user can also attach files and ask to receive updates as developers work on the bug report. Project developers can add, remove, or alter labels, assign owners, and change the status of an existing bug report; when they create a new issue themselves, they can add labels as part of the report.
Stein claims, “Most open source groups don't require the heavy structure or workflow that is present in today's issue trackers.” Still, Stein concedes that some large groups do need these features, though they are in the minority. By focusing on the majority's needs, Google's take on bug reports could turn out to be beneficial for the open source community.
Google's Project Hosting enters a crowded space, with alternative services from not only Sourceforge.net but also Savannah and Debian's Alioth, among others. This raises the question of how easy it is to import a project, or to export it and move it somewhere else in the future. According to Stein, the answer is “Not very easy.” This is because at present there is no way to upload or download a Subversion dump file. Google engineers are working on both of these efforts. Stein says, “For upload, we'll maybe do something in combination with a file upload/download feature or rely on the revision of Subversion 1.4's sync/replay feature when it is released and after we upgrade the servers.”
Download is a different story. Google plans to make the dump file available to project owners so they can always access their complete information. Stein states, “We know how important it is to open source groups to know that they are not locked into a hosting service.” Google does not support data export today, but it does plan to allow the export of all information. The import and export functionality is not defined yet, and Google plans to investigate using some simple APIs for this. Stein voices some concern about this approach and says: “I have a natural wariness with APIs. If you get them wrong then you can paint yourself into a corner.”
A question on some people's minds is: will Google project hosting offer the same services as Sourceforge? Google project hosting is similar to Sourceforge in its goal to encourage open source projects and foster productive open source communities. Aside from architectural considerations, another difference between the two services is that the new Google service will not include Web site hosting and will initially target smaller projects. Since Google has no plans to make it easy to move a project from other hosting sites, it appears that Sourceforge.net does not have to worry about losing its share of current users.
Stein stresses: “Sourceforge is one of the major cornerstones of the open source community, and we have zero interest in damaging that foundation.” While Stein recognizes that people may develop tools on their own, especially once the Google project hosting system has a better import system, he says, “We have no plans to be an instigator for that.” If you try to create a project at Google Code using the name of a Sourceforge project, Google will stop the process and note the conflict. An email will be sent to the owner of the Sourceforge project, who can approve or deny the project creation. Google wants to prevent malicious impersonation and accidental name conflicts, and it worked with Sourceforge to get a list of all hosted projects and the email addresses of their owners. Google is also working with other hosting sites such as tigris.org, java.net, and Codehaus to avoid naming conflicts.
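The name-conflict check can be sketched roughly as follows. This is only an illustration of the flow as described in this article: the function names, the shape of the registry, the case-insensitive comparison, and the example project and address are all assumptions, not details confirmed by Google.

```python
def normalize(name: str) -> str:
    # Assumption: names are compared case-insensitively, ignoring outer whitespace.
    return name.strip().lower()


def build_registry(pairs):
    """Build a lookup from (project name, owner email) pairs supplied by other hosts."""
    return {normalize(name): email for name, email in pairs}


def check_new_project(name: str, registry: dict) -> dict:
    """Decide whether a new project can be created right away.

    If the name is already used at another hosting site, creation is put on
    hold and that project's owner is the one to contact for approval.
    """
    owner_email = registry.get(normalize(name))
    if owner_email is None:
        return {"status": "created", "notify": None}
    return {"status": "pending_approval", "notify": owner_email}


# Hypothetical data: one existing project and its owner's address.
registry = build_registry([("FooServer", "owner@example.org")])
```

With this registry, `check_new_project("fooserver", registry)` is held for approval and names `owner@example.org` as the person to notify, while an unused name is created immediately.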
Google has set initial storage limits at 100 MB for Subversion and 50 MB for issue attachments. Stein says, “These limits will be more than enough for open source projects, but we can individually adjust them for valid projects.” The limits are designed to prevent spam or abusive projects from inappropriately using Google's services to host content that is unrelated to free software projects or is not freely redistributable.
The first step in getting started is creating a Gmail account, which is required for project owners and members. Owners can reconfigure projects, add or remove other owners and members, and manage basic metadata about the project. Members can commit to the repository and can change metadata on bug reports. To file a bug report or comment on one, a user needs only a Google account with a verified email address. A Google account can be associated with any email address; a Gmail account is not required for this purpose. A valid email address is required so that the project members can get in touch with the person filing the bug report in the event that further clarification is required.
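The three levels of access described above can be summarized in a small Python sketch. The role and permission names are invented for illustration, and the article does not say whether owners can also do everything members can; the sketch assumes they can.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    MEMBER = "member"
    USER = "user"  # any Google account with a verified email address


# Hypothetical permission names derived from the description in this article.
USER_PERMS = {"file_bug_report", "comment_on_bug_report"}
MEMBER_PERMS = USER_PERMS | {"commit", "change_bug_metadata"}
# Assumption: owners inherit member rights; the article does not state this.
OWNER_PERMS = MEMBER_PERMS | {
    "reconfigure_project",
    "manage_owners_and_members",
    "manage_project_metadata",
}

PERMISSIONS = {
    Role.USER: USER_PERMS,
    Role.MEMBER: MEMBER_PERMS,
    Role.OWNER: OWNER_PERMS,
}


def can(role: Role, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]


def requires_gmail(role: Role) -> bool:
    """Owners and members need a Gmail account; bug reporters do not."""
    return role in (Role.OWNER, Role.MEMBER)
```

For example, `can(Role.MEMBER, "commit")` is true while `can(Role.USER, "commit")` is false, and `requires_gmail(Role.USER)` is false because any verified email address is enough to file a bug report.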
Google requires a Gmail account for project owners and members to gain greater certainty that they are not bots that could use the project space for spam or other malicious purposes. The fact that all owners and members use a Gmail account may also help Google in future integration efforts.
It is clear that Google wants to participate in the free software development process and provide a viable alternative to other open source project repositories. Less clear is whether Google hosting is merely a goodwill exercise with the open source community or whether its goal is to be a profit-making venture, either via advertising revenue or by encouraging more Gmail usage. Regardless, Google's new offering will no doubt be a useful service to open source developers and a challenge for other hosting sites to improve the services offered to their users. As we all know, competition is a good thing.
| Index entries for this article | |
|---|---|
| GuestArticles | Quandt, Stacey |
