Officeshots: making ODF truly interoperable
Complex file formats, such as those used for office documents, inevitably lead to differences in interpretation by application developers. If a user sends a document to someone else who views it in a different application or version, chances are that the output shows some subtle differences or, with bad luck, that the formatting is completely munged. For people who give presentations regularly, this is a constant nightmare: they have to hope that the office application on the conference laptop can show the presentation without mangling the slides. These problems are not limited to proprietary file formats: open standards such as ODF (Open Document Format) also have interoperability issues.
![[Upload screen]](https://static.lwn.net/images/officeshots_upload_sm.png)
A web service, Officeshots, was recently launched to remedy this problem. The project is in public beta, and users can register for free to upload their ODF documents. The web site then renders the document with various office applications, which lets the user check for interoperability issues. The public beta was launched during the second ODF plugfest in Orvieto, Italy, on November 2nd and 3rd. Many vendors and developers that use ODF in their software gathered in Orvieto, including IBM, Google, OpenOffice.org, Novell, KOffice, AbiWord, and Microsoft.
Officeshots is a project by NOiV (Netherlands in Open Connection), a Dutch government program to promote the use of open standards and open source, in collaboration with the OpenDoc Society and NLnet Foundation, a Dutch non-profit organization that financially supports contributors to an open information society. LWN talked to Sander Marechal, who developed the bulk of the Officeshots code and is the project leader. He owns Lone Wolves, a small non-profit open source development company based in the Netherlands.
In June 2008, Sander was invited by Michiel Leenaars (of OpenDoc Society and NLnet) to give a talk at Sun Microsystems in Hamburg about another Lone Wolves project, ODF-XSLT. Sander drove to Hamburg with Michiel and the two talked about their mutual interests. That car drive started the ball rolling:
As director of strategy for NLnet and a member of the OpenDoc Society, Michiel Leenaars had many contacts with office software vendors, both open source and proprietary, including Sun, Novell, and Google. He got them interested in the Officeshots project and talked with other developers. During the recent plugfest, the project even received some Microsoft Office licenses as a gift.
Document factories
The Officeshots web site has a very simple user interface: the user submits a document, and the site delivers a PDF export, a screenshot, or a round-trip ODF file produced by each application the user selects. A round-trip ODF file is the result of an application opening the ODF document and then saving it again, so if the user chooses round-trip ODF as the output format, he gets an ODF document back. What's the point of this? Sander explains the importance:
Currently supported applications are different versions of AbiWord, Gnumeric, EuroOffice, Go-oo, Corel WordPerfect, KOffice, OpenOffice.org, StarOffice, TextMaker, and PlanMaker, on Linux/BSD as well as on Windows. Supported document formats are OpenDocument text documents, spreadsheets, and presentations. The user can also create a public gallery to show conversion errors to others. A simple test with some of the ODF example files that ship with Ubuntu readily reveals interoperability issues.
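Because an ODF document is a ZIP package whose body lives in `content.xml`, one crude way to see what a round trip did is to count element types before and after. The sketch below illustrates that idea using only the Python standard library; it is an assumption-based example, not how Officeshots presents its results, and it ignores styles, metadata, and changes to the text itself.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def element_counts(package) -> Counter:
    """Count how often each element tag occurs in an ODF package's content.xml.

    `package` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as pkg:
        root = ET.fromstring(pkg.read("content.xml"))
    return Counter(el.tag for el in root.iter())


def roundtrip_report(original, roundtripped) -> dict:
    """Return {tag: change in count} for every tag whose count differs."""
    before = element_counts(original)
    after = element_counts(roundtripped)
    # Counter returns 0 for missing keys, so tags that vanished or appeared are included.
    return {
        tag: after[tag] - before[tag]
        for tag in sorted(before.keys() | after.keys())
        if after[tag] != before[tag]
    }
```

For example, `roundtrip_report("talk.odp", "talk-roundtrip.odp")` would list the tags (in ElementTree's `{namespace-uri}name` form) whose counts changed; a negative number means elements of that type disappeared during the round trip.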
Under the hood, the user's uploaded file is distributed to rendering servers hosted by vendors and the community. The Officeshots project calls each server that produces output a factory. Most of the factories are run by the Officeshots project, which has a couple of virtual machines running on the Xen hypervisor to guarantee that the service is always able to produce some output.
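The article does not describe how requests reach the factories. Purely as an illustration, the hypothetical sketch below assumes a pull model in which each factory asks a central queue for the oldest job matching an (application, output format) pair it supports; the `Request`, `Factory`, and `Dispatcher` names are invented for this example and are not Officeshots' actual protocol.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    document: str
    application: str  # e.g. "AbiWord"
    output: str       # "pdf", "png" (screenshot) or "odf" (round trip)


@dataclass
class Factory:
    name: str
    capabilities: set  # {(application, output), ...} this factory can render


class Dispatcher:
    """Hypothetical central queue that factories poll for work."""

    def __init__(self):
        self.queue = deque()

    def submit(self, request: Request) -> None:
        self.queue.append(request)

    def next_job(self, factory: Factory):
        """Remove and return the oldest request this factory can handle, or None."""
        for i, request in enumerate(self.queue):
            if (request.application, request.output) in factory.capabilities:
                del self.queue[i]  # safe: we return immediately after mutating
                return request
        return None
```

A pull model of this kind might suit volunteer-run factories that sit behind firewalls and come and go, since an idle or offline factory simply stops asking for work; a request for several applications would become several `Request` objects.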
Other factories are run by people from AbiWord, Gnumeric, and other projects, and a couple are run by volunteers. Sander highlights the first two projects:
The Officeshots project has a list of factories (currently 14) and a list of active factories (five at the time of writing). At the moment, the project is waiting for a new server that will host virtual machines with various Linux distributions, as well as Windows with Microsoft Office.
Contribute to Officeshots
The Officeshots project not only provides the free online web service, but also provides the code for the underlying framework (licensed under the Affero GPLv3). While Sander admits that there haven't been many external code contributions yet, he points out that there are many other ways to contribute to the project: people can run a factory, translate Officeshots into their language, or donate hardware or software licenses.
People who want to run their own factory should contact Officeshots and consult the manual. The code can be downloaded from the Officeshots Subversion repository. The manual also explains how to implement a backend for a not-yet-supported application. The simplest case is an application that offers command-line conversion. This has already led at least one team to add that feature to their office application, Sander remarks:
But one doesn't have to go that far to help the project's mission: if a user detects interoperability issues thanks to Officeshots and reports the problem to the developers of the relevant office applications, then the project has succeeded.
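The command-line route mentioned above is the easiest kind of backend to picture. As a hedged sketch, assume an application that converts files the way LibreOffice's `soffice --headless --convert-to FORMAT --outdir DIR FILE` does; a backend then only needs to build that command, run it with a timeout, and pick up the result. The function names are hypothetical, and Officeshots' real backend interface is the one described in its manual.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_command(document: Path, outdir: Path, fmt: str = "pdf") -> list[str]:
    """Assemble a headless conversion command (LibreOffice-style flags assumed)."""
    return [
        "soffice", "--headless",
        "--convert-to", fmt,      # e.g. "pdf", or "odt" for a round trip
        "--outdir", str(outdir),
        str(document),
    ]


def convert(document, outdir, fmt: str = "pdf", timeout: int = 120) -> Path:
    """Run the converter and return the path of the file it produced."""
    document, outdir = Path(document), Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired so a hung application cannot stall the factory.
    subprocess.run(build_command(document, outdir, fmt),
                   check=True, timeout=timeout, capture_output=True)
    result = outdir / f"{document.stem}.{fmt}"
    if not result.exists():
        raise RuntimeError(f"converter produced no {fmt} output for {document}")
    return result
```

Writing to a separate output directory matters for round trips: converting `report.odt` to `odt` in its own directory would overwrite the uploaded original.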
New functionality
The Officeshots developers have a couple of ideas to implement in the future. Of course they will add new backends. For example, Sander has already written a backend for an older version of Microsoft Word using the Sun ODF plugin, so when the Windows virtual machines are ready, a new Microsoft Office backend will be one of the possibilities. They will also add backends for the office viewer of Symbian S60 smartphones.
Beyond new backends, the project has some other features in the pipeline. One notable feature is an ODF diff tool. "We are looking at a commercial tool by DeltaXML.com, which is very useful because normal XML diffs generate too much noise," Sander explains. "Using it shows clearly that Microsoft Office replaces formulas and charts when saving." Another feature in the pipeline is a service that runs the ODF Validator against an uploaded document. "But we are also looking into ODF validators that can generate messages a normal human being can understand, instead of throwing cryptic XML exceptions like most XML validators do."
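DeltaXML's tool is commercial and its internals are not described here, but the noise problem itself is easy to illustrate: two applications can serialize the same XML differently (namespace prefixes, attribute order, indentation), and a line-based diff reports all of it. The sketch below is a generic normalization approach, not DeltaXML's: it flattens each tree to one line per element, with namespace URIs instead of prefixes, sorted attributes, and whitespace-trimmed text, and only then runs Python's `difflib`. The function names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text) -> list:
    """Flatten an XML document to one line per element, ignoring serialization noise."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(el, depth):
        # ElementTree reports tags and attribute names as {namespace-uri}local,
        # so different prefixes for the same namespace compare equal.
        parts = [el.tag]
        parts += [f"{k}={v!r}" for k, v in sorted(el.attrib.items())]
        parts.append(f"text={(el.text or '').strip()!r}")
        parts.append(f"tail={(el.tail or '').strip()!r}")
        lines.append("  " * depth + " ".join(parts))
        for child in el:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


def odf_xml_diff(old_xml, new_xml) -> list:
    """Unified diff of the normalized forms of two XML documents."""
    return list(difflib.unified_diff(
        normalized_lines(old_xml), normalized_lines(new_xml),
        fromfile="original", tofile="roundtrip", lineterm=""))
```

Trimming whitespace is a trade-off: it hides reformatting noise but also hides genuine changes in leading or trailing spaces, and a real ODF diff would also need to understand ODF constructs such as `text:s`, which encodes runs of spaces.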
Another plan is to integrate the complete ODF 1.0 test suite into Officeshots. A factory could then periodically be offered a set of hundreds of documents, automating parts of the test suite.
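A factory working through hundreds of test documents would benefit from the kind of plain-language messages Sander describes. As an assumption-laden sketch, and not the ODF Validator, the code below checks a few packaging conventions from the ODF specification (an uncompressed `mimetype` entry stored first in the ZIP file, a `META-INF/manifest.xml`, well-formed `content.xml` and `styles.xml`) and reports each problem as a sentence. It only recognizes the three document types Officeshots currently accepts; `check_package` and `check_batch` are hypothetical names.

```python
import zipfile
import xml.etree.ElementTree as ET

# Only the document types Officeshots currently accepts.
KNOWN_TYPES = {
    "application/vnd.oasis.opendocument.text",
    "application/vnd.oasis.opendocument.spreadsheet",
    "application/vnd.oasis.opendocument.presentation",
}


def check_package(source) -> list:
    """Return a list of plain-English problems found in an ODF package."""
    try:
        pkg = zipfile.ZipFile(source)
    except zipfile.BadZipFile:
        return ["This file is not a ZIP archive, so it cannot be an ODF document."]
    problems = []
    with pkg:
        names = pkg.namelist()
        if "mimetype" not in names:
            problems.append("The package has no 'mimetype' entry naming the document type.")
        else:
            if names[0] != "mimetype":
                problems.append("The 'mimetype' entry should be the first file in the package.")
            if pkg.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                problems.append("The 'mimetype' entry should be stored without compression.")
            mime = pkg.read("mimetype").decode("ascii", "replace").strip()
            if mime not in KNOWN_TYPES:
                problems.append(f"The document type {mime!r} is not one this check knows about.")
        if "META-INF/manifest.xml" not in names:
            problems.append("The package has no manifest (META-INF/manifest.xml).")
        for part in ("content.xml", "styles.xml"):
            if part in names:
                try:
                    ET.fromstring(pkg.read(part))
                except ET.ParseError as err:
                    line, column = err.position
                    problems.append(f"{part} is broken XML near line {line}, column {column}.")
    return problems


def check_batch(sources) -> dict:
    """Check many documents, e.g. one batch of a test suite."""
    return {str(source): check_package(source) for source in sources}
```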
Privacy
The project is also looking for ways to protect users' privacy. If users upload documents with sensitive information, they should know that Officeshots and the factories can read this information. At the moment, the project asks its users to trust the Officeshots project and the third-party factories. Sander adds:
Because the anonymizer will run on the Officeshots server, the factories will only receive the modified text, so users don't have to trust the third-party factories. Users do still have to trust the people running the Officeshots server, since that is where the anonymizing code runs. Concerned people can install itools locally (it is packaged in a couple of Linux distributions) and use the iodf-greek.py script (added in itools 0.60.3) to anonymize their documents before uploading them. For very sensitive documents, it is possible to run a local copy of the Officeshots web service and backends, but that takes time to install and configure.
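The general idea behind "greeking" a document can be sketched in a few lines: keep the markup and layout, but replace the words. The example below is an assumption-based illustration, not the iodf-greek.py implementation: it replaces letters with `x`/`X` and digits with `0` (keeping length, case, and punctuation so paragraphs wrap roughly as before) and only touches `content.xml`, leaving `meta.xml`, `styles.xml` (headers and footers), and embedded images untouched. The function names are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET


def greek(text):
    """Replace letters with x/X and digits with 0, keeping length and punctuation."""
    if not text:
        return text
    text = re.sub(r"[^\W\d_]", lambda m: "X" if m.group().isupper() else "x", text)
    return re.sub(r"\d", "0", text)


def anonymize_xml(data: bytes) -> bytes:
    # Reuse the document's own namespace prefixes so prefixed attribute
    # values still resolve after re-serialization.
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)
    root = ET.fromstring(data)
    for el in root.iter():
        el.text = greek(el.text)
        el.tail = greek(el.tail)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def anonymize_package(src, dst):
    """Copy an ODF package, greeking the text in content.xml only."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # infolist() preserves entry order and each entry's compression,
        # so an uncompressed 'mimetype' entry stays first.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "content.xml":
                data = anonymize_xml(data)
            zout.writestr(info, data)
```

Attribute values such as cached spreadsheet cell values are not rewritten here, which is one reason a purpose-built tool is the safer choice for real documents.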
Conclusion
The Officeshots web site is a handy service for users who are evaluating which office application to migrate to. Thanks to the project, they don't have to install each application locally to check for interoperability issues; with the web service, they can easily check whether each application lives up to its claims. Template designers and people creating documents for public release benefit too: with Officeshots, they can easily check whether their documents work everywhere. Last but not least, it is a helpful tool for office software vendors, who can use it to spot errors in their ODF support. In these ways, the Officeshots project should accelerate interoperability in the office software market.
Index entries for this article | |
---|---|
GuestArticles | Vervloesem, Koen |