Development

Looking in on Apache OpenOffice.org

By Jake Edge
June 29, 2011

In the few weeks since Apache accepted OpenOffice.org (OOo) as an incubator project, the project has gotten off to a fast start, at least from a planning perspective. It is an interesting case study in adapting an existing project into the infrastructure and governance of an entirely different organization. That leads to some technical and organizational challenges that are being addressed, but it remains to be seen how well OOo will fit under the Apache umbrella. Apache is an excellent organization, and likely a good home for OpenOffice.org developers who are not concerned about copyleft, but it hasn't ever really absorbed a codebase and community of this size before.

The biggest hurdle may be in the area of tools, in the form of adopting (and adapting to) the Subversion (svn) version control system. There is nothing wrong with svn per se, but it may not be the right tool for this particular job. The existing OOo repositories are stored in Mercurial (hg), which is a distributed version control system (DVCS), so the existing developers clearly saw a benefit to DVCS-style development. LibreOffice (LO), on the other hand, has adopted Git, which also has the distributed features many projects are finding useful. But Apache projects must use Apache infrastructure, which requires svn, at least for now.

The problems of converting the existing repositories from hg to svn are certainly solvable, but the bigger problems may arise in trying to coordinate development with a widely distributed contributor base. There are, undoubtedly, ways to make that work (and various Apache projects already do), but it seems a bit odd to choose a VCS based on the required infrastructure, rather than the other way around. In the end, a decision made based on the licensing that Oracle and IBM were comfortable with trickles down into the development style of the project.

None of that is to say that svn will necessarily be a barrier, or that the Apache infrastructure is inadequate to the task. One would guess that any problems there will be worked around one way or another. But community-led projects typically make their decisions rather differently.

There are lots of discussions going on in the new incubator ooo-dev mailing list that clearly indicate a community that is bootstrapping itself. That list has well over 1000 posts in just the two weeks or so that it has existed. It's an interesting transition to watch, at least partly because we haven't seen anything like it in the free software world. There is a huge existing infrastructure (web sites, build farms, code repositories, etc.) to support OOo that currently exists on Oracle's servers, so moving that to its new home is a major part of the transition.

IBM's Rob Weir has been coordinating much of the work that is being done on the Apache project. In addition, he has put out a pre-proposal for an ODF Toolkit project, which would include several, mostly Java-based, tools for manipulating Open Document Format (ODF) files. Whether that ends up as a top-level Apache project or gets added into an existing Apache project (perhaps POI or OOo itself) is an open question, but it's clear that there is more than just OOo that Oracle and IBM would like to put under the Apache banner.

Governance issues are also being discussed as a corporate-controlled (and dominated) project transitions to the community-oriented, meritocratic style that is expected of Apache projects. There is a project management committee (PMC) to set up; in this case it's a podling PMC (PPMC) as OOo is still in the incubator (and thus a "podling"). The list of project committers also needs to be formalized out of the list of initial committers that was gathered during the incubation proposal. These are the people who will have commit access to the repositories, and additional committers will be added to the project as their work merits it.

There are also format questions, in terms of both documentation for OOo itself and for project planning purposes. For a document suite, creating documentation using its formats makes a great deal of sense, but it isn't necessarily the format that developers are used to. There is a bit of a clash between the text-only and "rich text" (i.e. ODF) worlds. Like many projects, Apache typically uses diffs to review patches, but that is somewhere between difficult and impossible to do with ODF. It would seem that a consensus is arising that planning documents will be text-oriented and put into the wiki, while product documentation will continue to use ODF and the ODFAuthors infrastructure.
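
As a rough illustration of why diffs and ODF do not mix well: an ODF file is a ZIP package whose text lives in XML members such as content.xml, so a line-based diff has nothing useful to work with until the package is unpacked. The Python sketch below is a toy under stated assumptions, not anything taken from the ooo-dev discussion; the sample planning notes and the `text_diff`, `make_package`, and `package_text` helpers are hypothetical, and the one-member archive is not a valid ODF document.

```python
import difflib
import io
import zipfile


def text_diff(old, new):
    """Line-based unified diff, the form reviewers usually read."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), "old", "new", lineterm=""))


def make_package(text):
    """Hypothetical stand-in for an ODF file: a ZIP holding one content.xml member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", "<text>\n" + text + "\n</text>")
    return buf.getvalue()


def package_text(package):
    """Unpack content.xml so that a text diff becomes possible again."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read("content.xml").decode("utf-8")


# Made-up planning notes, differing in one line.
old_plan = "Release 3.4 from the Oracle servers\nMove the build farm"
new_plan = "Release 3.4 from the Oracle servers\nMove the build farm to Apache"

# Plain text: the change shows up as one removed and one added line.
for line in text_diff(old_plan, new_plan):
    print(line)

# Packaged: the bytes are compressed, so they must be unpacked before diffing.
old_pkg, new_pkg = make_package(old_plan), make_package(new_plan)
for line in text_diff(package_text(old_pkg), package_text(new_pkg)):
    print(line)
```

Even after unpacking, real ODF XML carries styling and markup that make such diffs noisy, which may be part of why text-oriented wiki pages are attractive for planning documents.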

There is also the minor task of starting to do builds of the code, and trying to ferret out any missing pieces, along with replacing dependencies that are not available under the Apache license. There are still some lingering questions about whether the grant from Oracle actually contains all of the necessary files—though it is believed that is only a procedural hurdle. OOo 3.4 had a beta release recently, and there are thoughts that the release should be made using the Oracle servers, before making the big switch over to Apache. And so on.

Many more of these kinds of discussions are taking place, most of them very amicable, and problems are being worked out as they come up. Undoubtedly LibreOffice struggled with many of the same kinds of things as it came up to speed over the last eight months or so. In some sense, LO had two parent projects to digest, the Go-OO project (which was mostly a set of patches against OOo) as well as the OOo upstream code from Sun/Oracle. But LO did not have any existing structure like Apache that it needed to mesh with. Watching both projects develop over the next few years should prove interesting.

Brief items

Quotes of the week

It's amazing to me how people think of documentation as easy or an afterthought, but there's a huge difference between documentation written by someone coming up the learning curve and documentation written by someone who really knows it. I'd say well designed and engineered documentation is more important than well designed and engineered source code.
-- Brian Behlendorf

Sure, we could have labeled it 4.0.1 or some other fractional value that would have made some slashdot and ars readers happy, but that would be a lie to add-on and web developers because it's not a minor non-breaking change for them. With this versioning system, we are communicating honestly to the only people who will have a reason to care about versions -- developers, that we are making, or at least asserting that we may be making, breaking changes that they should care about.

Again, Firefox version numbers are not for consumers. Nowhere in our announcements of Firefox was it called anything but the latest version of Firefox. There will be no past versions of Firefox available to consumers so it's just plain "Firefox" and it gets better at regular 6 week intervals.

-- Asa Dotzler

1.4.0 has actually seen testing in the form of loading the module, enjoying a view of a non-crashing X server (-retro too, I'm soo 80s today...) and thus deducting that the driver is bug-free. Which is more testing than previous releases have seen. Nonetheless, you may not want to control your nuclear power plants with this driver.
-- Peter Hutterer

The RFC forgot to send an army with you, so it cannot expect to be obeyed. In the GNU Project, we do not obey standards -- we consider them, then DTRT. Often TRT is to do what the standard says. Sometimes TRT is something else.
-- Richard Stallman

Announcing Echoprint

The Echoprint project has announced its existence and initial release of code and data. "The Echo Nest has been focusing on a crucial component of the oncoming music cloud for some time: we spend a lot of time and engineering resources on music resolving. This extends from mapping a query for a band name to its ID, to uncovering mentions of songs on blogs, to identifying the song in an audio stream without any metadata - otherwise known as fingerprinting. The Echo Nest's existing fingerprint technology, 'The Echo Nest Musical Fingerprint' aka ENMFP, has been in wide use privately and via our API for 18 months. Today we are unveiling a new fingerprint technology called 'Echoprint,' whose main feature is its complete openness - everything from the program to analyze the audio to the server and data to make the match are available for anyone to use, under a permissive open source license, for free."

EuroPython Language Summit report

Mark Dickinson has put together a brief report from the EuroPython Language Summit, held in Florence on June 19. Topics covered include Python 3 adoption, various open PEPs, the Linux kernel version number change, and more.

Removing the PyPy global lock

The PyPy Status Blog has an article describing a plan to remove the global interpreter lock and switch to a transactional memory scheme. "During a transaction, we don't actually change the global memory at all. Instead, we use the thread-local transaction object. We store in it which objects we read from, which objects we write to, and what values we write. It is only when the transaction reaches its end that we attempt to 'commit' it. Committing might fail if other commits have occurred in between, creating inconsistencies; in that case, the transaction aborts and must restart from the beginning."
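
The scheme described in that quote can be sketched in a few lines of Python. This is only a toy model of optimistic transactions, not PyPy's design or code: the `Cell`, `Transaction`, and `atomically` names are invented for illustration, and the per-cell version counter used to detect conflicting commits is an assumption of this sketch. It detects conflicts only on cells a transaction has read.

```python
import threading


class Cell:
    """A shared memory location; version is bumped on every commit to it."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Thread-local log of reads and buffered writes; shared state is untouched until commit."""

    _commit_lock = threading.Lock()  # serializes only the short commit step

    def __init__(self):
        self.reads = {}   # Cell -> version seen at first read
        self.writes = {}  # Cell -> value to store on commit

    def read(self, cell):
        if cell in self.writes:  # see our own pending write
            return self.writes[cell]
        self.reads.setdefault(cell, cell.version)  # record version before the value
        return cell.value

    def write(self, cell, value):
        self.writes[cell] = value  # buffered, not yet visible to other threads

    def commit(self):
        with Transaction._commit_lock:
            # Fail if another commit touched anything we read in the meantime.
            if any(cell.version != seen for cell, seen in self.reads.items()):
                return False
            for cell, value in self.writes.items():
                cell.value = value
                cell.version += 1
            return True


def atomically(func):
    """Run func(tx) until its transaction commits, restarting from the beginning on conflict."""
    while True:
        tx = Transaction()
        result = func(tx)
        if tx.commit():
            return result


counter = Cell(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker():
    for _ in range(1000):
        atomically(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

Each worker retries its increment whenever another thread commits first, so no update is lost even though no thread holds a lock while running the body of `increment`.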

Rockbox 3.9

Version 3.9 of the Rockbox audio player firmware system has been released. Changes include a playback engine rework, a number of hardware-related improvements, support for antialiased fonts, and more; see the release notes for details.

Newsletters and articles

Development newsletters from the last week

Applying Graph Analysis and Manipulation to Data Stores

Roberto V. Zicari talks with Marko Rodriguez and Peter Neubauer about the TinkerPop project. "TinkerPop is an open-source graph software group. Currently, we provide a stack of technologies (called the TinkerPop stack) and members contribute to those aspects of the stack that align with their expertise. The stack starts just above the database layer (just above the graph persistence layer) and connects to various graph database vendors - e.g. Neo4j, OrientDB, DEX, RDF Sail triple/quad stores, etc."

Zeuthen: Writing a C library, part 1

David Zeuthen has posted a set of guidelines for low-level library implementers. "Unless it's self-evident, all functions should have documentation explaining how parameters are managed. It is often a good idea to try to force some kind of consistency on the API. For example, in the GLib stack the general rule is that the caller owns parameters passed to a function (so the function need to take a reference or make a copy if the parameter is used after the function returns) and that the callee owns the returned parameters (so the caller needs to make a copy or increase the reference count) unless the function can be called from multiple threads (in which case the caller needs to free the returned object)."

Zeuthen: Writing a C library, part 2

David Zeuthen has posted the second installment in his series on best practices for low-level library development. "It is important for users of a library to know if calling a function involves doing synchronous I/O (also called blocking I/O). For example, an application with an user interface need to be responsive to user input and may even need to update the user interface every frame for smooth animations (e.g. 60 times a second). To avoid unresponsive applications and jerky animations, its UI thread must never call any functions that does any synchronous I/O."
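
The point about keeping synchronous I/O off the UI thread can be illustrated with a short Python sketch (Zeuthen's posts are about C libraries; Python is used here only for brevity). Everything named below is an assumption made for illustration: `load_document` is a hypothetical stand-in for a library function that blocks, `example.odt` is a made-up path, and the loop merely counts frames where a real toolkit would redraw and handle input.

```python
import time
from concurrent.futures import ThreadPoolExecutor

FRAME_SECONDS = 1 / 60  # frame budget for the 60-updates-per-second case in the quote


def load_document(path):
    """Hypothetical library call that does synchronous (blocking) I/O."""
    time.sleep(0.2)  # simulates a slow disk or network read
    return f"contents of {path}"


def run_ui_loop(path):
    """Keep producing frames while the blocking call runs on a worker thread."""
    frames_drawn = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_document, path)  # returns immediately
        while not pending.done():
            frames_drawn += 1          # a real UI would redraw and handle input here
            time.sleep(FRAME_SECONDS)  # wait for the next frame
        return pending.result(), frames_drawn


result, frames = run_ui_loop("example.odt")
print(result, frames)
```

Calling `load_document` directly from the loop would stall it for the whole read; handing the call to a worker lets the loop keep ticking, which is why a library needs to document which of its functions block.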

Page editor: Jonathan Corbet

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds