By Jake Edge
June 29, 2011
In the few weeks since Apache accepted
OpenOffice.org (OOo) as an incubator project, the project has gotten
off to a
fast start, at least from a planning perspective. It is an interesting
case study in adapting an existing project into the infrastructure and
governance of an entirely different organization. That leads to some
technical and organizational challenges that are being addressed, but it
remains to be seen how well OOo will fit under the Apache umbrella. Apache is an excellent organization, and
likely a good home for OpenOffice.org developers who are not
concerned about copyleft, but it hasn't ever really absorbed a codebase and
community of this size before.
The biggest hurdle may be in the area of tools, in the form of adopting
(and adapting
to) the Subversion (svn) version control system. There is nothing wrong
with svn per se, but it may not be the right tool for this
particular job. The existing OOo repositories are stored in Mercurial
(hg), which is a distributed version control system (DVCS), so the existing
developers clearly saw a benefit to DVCS-style development. LibreOffice (LO),
on the other hand, has adopted Git, which also has the distributed features
many projects are finding useful. But Apache projects must use Apache
infrastructure, which requires svn, at least for now.
The problems of converting the existing repositories from hg to svn are
certainly solvable, but the bigger problems may arise in trying to
coordinate development with a widely distributed contributor base. There
are, undoubtedly, ways to make that work (and various Apache projects
already do), but it seems a bit odd to choose a VCS based on the required
infrastructure, rather than the other way around. In the end, a decision made
based on the licensing that Oracle and IBM were comfortable with trickles
down into the development style of the project.
None of that is to say that svn will necessarily be a barrier, or that the
Apache infrastructure is inadequate to the task. One would guess that any
problems there will be worked around one way or another. But community-led
projects typically make their decisions rather differently.
There are lots of discussions going on in the new incubator ooo-dev
mailing list that clearly indicate a community that is bootstrapping
itself. That list has well over 1000 posts in just the two weeks or so that
it has existed. It's an interesting transition to watch, at least partly
because we haven't seen anything like it in the free software world. There
is a huge existing infrastructure (web sites, build farms, code
repositories, etc.) to support OOo that currently exists on
Oracle's servers, so moving that to its new home is a major part of the
transition.
IBM's Rob Weir has been coordinating much of the work that is being done on
the Apache project. In addition, he has put out a pre-proposal for an ODF Toolkit project, which
would include several, mostly Java-based, tools for manipulating Open
Document Format (ODF) files. Whether that ends up as a top-level Apache
project or gets added into an existing Apache project (perhaps POI or OOo itself) is an open question,
but it's clear that there is more than just OOo that Oracle and IBM would
like to put under the Apache banner.
Governance issues are also being discussed as a corporate-controlled (and
dominated) project transitions to the community-oriented, meritocratic
style that is expected of Apache projects. There is a project management
committee (PMC) to set up; in this case it's a podling PMC (PPMC) as
OOo is still in the incubator (and thus a "podling"). The list of project
committers also needs to be formalized from the list of initial committers that was
gathered during the incubation proposal. These are the people who will
have commit access to the repositories, and additional committers will be
added to the project as their work merits it.
There are also format questions, in terms of both documentation for OOo
itself and for project planning purposes. For a document suite, creating
documentation using its formats makes a great deal of sense, but it isn't
necessarily the format that developers are used to. There is a bit of a
clash between the text-only and "rich text" (i.e. ODF) worlds. Like many
projects, Apache typically uses diffs to review patches, but that is
somewhere between difficult and impossible to do with ODF. It would seem
that a consensus is arising that planning documents will be text-oriented
and put into the wiki,
while product documentation will continue to use ODF and the ODFAuthors infrastructure.
There is also the minor task of starting to do builds of the code,
and trying to ferret out any missing pieces, along with replacing
dependencies that are not available under the Apache license. There are
still some lingering questions about whether the grant from Oracle actually
contains all of the necessary files—though it is believed that is
only a procedural hurdle. OOo 3.4 had a beta release recently, and there
are thoughts that the release should be
made using the Oracle servers, before making the big switch over to Apache.
And so on.
Many more of these kinds of discussions are taking place, most of them very
amicable, and problems are being worked out as they come up. Undoubtedly
LibreOffice struggled with many of the same kinds of things as it came up
to speed over the last eight months or so. In some sense, LO had two
parent projects to digest, the Go-OO project (which was mostly a set of
patches against OOo) as well as the OOo upstream
code from Sun/Oracle. But LO did not have any existing structure like
Apache that it needed to mesh with. Watching both projects develop over
the next
few years should prove interesting.
Brief items
It's amazing to me how people think of documentation as easy or an
afterthought, but there's a huge difference between documentation
written by someone coming up the learning curve and documentation
written by someone who really knows it. I'd say well designed and
engineered documentation is more important than well designed and
engineered source code.
--
Brian Behlendorf
Sure, we could have labeled it 4.0.1 or some other fractional value
that would have made some slashdot and ars readers happy, but that
would be a lie to add-on and web developers because it's not a
minor non-breaking change for them. With this versioning system, we
are communicating honestly to the only people who will have a
reason to care about versions -- developers, that we are making, or
at least asserting that we may be making, breaking changes that
they should care about.
Again, Firefox version numbers are not for consumers. Nowhere in
our announcements of Firefox was it called anything but the latest
version of Firefox. There will be no past versions of Firefox
available to consumers so it's just plain "Firefox" and it gets
better at regular 6 week intervals.
--
Asa Dotzler
1.4.0 has actually seen testing in the form of loading the module,
enjoying a view of a non-crashing X server (-retro too, I'm soo 80s
today...) and thus deducting that the driver is bug-free. Which is
more testing than previous releases have seen. Nonetheless, you
may not want to control your nuclear power plants with this driver.
--
Peter Hutterer
The RFC forgot to send an army with you, so it cannot expect to be
obeyed. In the GNU Project, we do not obey standards -- we
consider them, then DTRT. Often TRT is to do what the standard
says. Sometimes TRT is something else.
--
Richard Stallman
The Echoprint project has announced its existence and initial release of
code and data. "The Echo
Nest has been focusing on a crucial component of the oncoming music cloud
for some time: we spend a lot of time and engineering resources on music
resolving. This extends from mapping a query for a band name to its ID, to
uncovering mentions of songs on blogs, to identifying the song in an audio
stream without any metadata - otherwise known as fingerprinting. The Echo
Nest's existing fingerprint technology, 'The Echo Nest Musical Fingerprint'
aka ENMFP, has been in wide use privately and via our API for 18
months. Today we are unveiling a new fingerprint technology called
'Echoprint,' whose main feature is its complete openness - everything from
the program to analyze the audio to the server and data to make the match
are available for anyone to use, under a permissive open source license,
for free."
Mark Dickinson has put together a brief report from the EuroPython Language
summit, held in Florence on June 19. Topics covered include
Python 3 adoption, various open PEPs, the Linux kernel version number
change, and more.
The PyPy Status Blog has an article describing a plan to remove the global
interpreter lock and switch to a transactional memory scheme. "During a
transaction, we
don't actually change the global memory at all. Instead, we use the
thread-local transaction object. We store in it which objects we read from,
which objects we write to, and what values we write. It is only when the
transaction reaches its end that we attempt to 'commit' it. Committing
might fail if other commits have occurred in between, creating
inconsistencies; in that case, the transaction aborts and must restart from
the beginning."
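The commit-or-restart cycle described in that quote can be sketched in a few lines of Python. This is only an illustration of optimistic transactions in general, not PyPy's design; the names `Store`, `Transaction`, `Conflict`, and `run_atomically`, along with the per-key version counter, are all assumptions made for the example.

```python
import threading


class Conflict(Exception):
    """Raised when another commit invalidated something we read."""


class Store:
    """Hypothetical 'global memory': maps a key to (version, value)."""

    def __init__(self):
        self._data = {}
        self._commit_lock = threading.Lock()

    def read(self, key):
        return self._data.get(key, (0, None))


class Transaction:
    """Records reads and writes locally; global state changes only on commit."""

    def __init__(self, store):
        self.store = store
        self.reads = {}   # key -> version observed on first read
        self.writes = {}  # key -> value to be written at commit time

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.read(key)
        self.reads.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        with self.store._commit_lock:
            # Abort if any key we read has been committed to since.
            for key, seen in self.reads.items():
                if self.store.read(key)[0] != seen:
                    raise Conflict(key)
            for key, value in self.writes.items():
                version = self.store.read(key)[0]
                self.store._data[key] = (version + 1, value)


def run_atomically(store, func):
    """Run func(tx) in a fresh transaction, restarting on conflict."""
    while True:
        tx = Transaction(store)
        result = func(tx)
        try:
            tx.commit()
        except Conflict:
            continue  # restart from the beginning
        return result
```

A real implementation would have to track reads and writes at the object level inside the interpreter; the sketch only shows why a failed commit forces the whole transaction to be re-run.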
Version 3.9 of the Rockbox audio player firmware system has been released.
Changes include a playback engine rework, a number of hardware-related
improvements, support for antialiased fonts, and more; see the release
notes for details.
Newsletters and articles
Roberto V. Zicari talks with Marko Rodriguez and Peter Neubauer about the
TinkerPop project. "TinkerPop is an open-source graph software group. Currently, we provide a stack of technologies (called the TinkerPop stack) and members contribute to those aspects of the stack that align with their expertise. The stack starts just above the database layer (just above the graph persistence layer) and connects to various graph database vendors - e.g. Neo4j, OrientDB, DEX, RDF Sail triple/quad stores, etc."
David Zeuthen has posted a set of guidelines for low-level library
implementers. "Unless
it's self-evident, all functions should have documentation explaining how
parameters are managed. It is often a good idea to try to force some kind
of consistency on the API. For example, in the GLib stack the general rule
is that the caller owns parameters passed to a function (so the function
need to take a reference or make a copy if the parameter is used after the
function returns) and that the callee owns the returned parameters (so the
caller needs to make a copy or increase the reference count) unless the
function can be called from multiple threads (in which case the caller
needs to free the returned object)."
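As a loose illustration of that ownership rule, here is a small Python sketch. Python has no manual reference counting, so copying stands in for "take a reference or make a copy"; the `Registry` class and its methods are hypothetical and have nothing to do with GLib's actual API.

```python
class Registry:
    """Hypothetical library object that documents who owns what."""

    def __init__(self):
        self._names = []

    def set_names(self, names):
        # The caller owns `names`. We keep the data after returning,
        # so we copy it rather than hold on to the caller's list.
        self._names = list(names)

    def peek_names(self):
        # The callee (this object) still owns the returned list.
        # Callers must copy it before keeping or modifying it.
        return self._names


registry = Registry()
names = ["alpha"]
registry.set_names(names)
names.append("beta")              # caller changes its own list afterward

mine = list(registry.peek_names())  # caller copies before modifying
mine.append("gamma")
```

Because both sides follow the documented convention, neither the caller's later `append` nor the change to `mine` affects the registry's internal state.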
David Zeuthen has posted the second installment in his series on best
practices for low-level library development. "It is important for users of
a library to know
if calling a function involves doing synchronous I/O (also called blocking
I/O). For example, an application with an user interface need to be
responsive to user input and may even need to update the user interface
every frame for smooth animations (e.g. 60 times a second). To avoid
unresponsive applications and jerky animations, its UI thread must never
call any functions that does any synchronous I/O."
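One common way to follow that advice is to hand blocking calls to a worker thread while the UI thread keeps drawing. The Python sketch below shows the idea using the standard `concurrent.futures` module; `slow_load` and `run_ui` are made-up names, and `time.sleep()` stands in for both the blocking I/O and the per-frame work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_load(delay):
    """Stand-in for a function that does synchronous (blocking) I/O."""
    time.sleep(delay)
    return "data"


def run_ui(delay=0.2, frame_time=1 / 60):
    """Keep 'drawing' frames while the blocking call runs elsewhere.

    Returns the number of frames drawn and the loaded result.
    """
    frames = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_load, delay)  # blocking work leaves the UI thread
        while not future.done():
            frames += 1              # placeholder for drawing one frame
            time.sleep(frame_time)   # wait for the next frame slot
        return frames, future.result()
```

A real toolkit would deliver the result through its own main-loop callback rather than polling `future.done()`, but the division of labor is the same: the UI thread never waits on the I/O itself.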
Page editor: Jonathan Corbet