
Comparing ODF and OOXML

Sam Hiser has put up a detailed comparison of the OpenDocument and Microsoft OOXML document formats. "ODF is the only format unencumbered by intellectual property rights (IPR) restrictions on its use in other software, as certified by the Software Freedom Law Center. Conversely, many elements designed into the OOXML formats but left undefined in the OOXML specification require behaviors upon document files that only Microsoft Office applications can provide. This makes data inaccessible and breaks work group productivity whenever alternative software is used."

ODF complete?

Posted Jun 11, 2007 15:54 UTC (Mon) by forthy (guest, #1525)

I must say I'm a bit biased, because I think the only viable long-lasting and sufficiently well documented format for text documents is LaTeX. The comparison claims that ODF is completely documented. However, ODF doesn't describe how text is actually laid out in anywhere near the detail that LaTeX does. The ODF specification is basically a structural specification: it says what the tags tell the layout engine, but not how the layout engine actually works. You can roughly guess what it's supposed to mean, but it's not documented.

The good part of this is that we don't have to stick to OpenOffice.org's current, not exactly good, algorithm for formatting a document. But when people complain that OOXML doesn't tell you how the WP6 spacing algorithm works: ODF doesn't tell you how its spacing algorithm works either (and that's the only one to choose from, the default one). It doesn't specify a lot of other formatting/rendering details, either.

Well, in the land of the blind, the one-eyed man is king.

ODF complete?

Posted Jun 11, 2007 16:03 UTC (Mon) by nix (subscriber, #2304)

It depends on whether you want a document to always look the same *and* be editable. If you do, the only viable format is one based on TeX (or one that goes through TeX for rendering, e.g. LyX).

I'd prefer that for my own documents, too, but many people only require that their documents always be editable and be renderable into *some* acceptable-looking output format, in which case ODF will do; or that a given output document always look the same (for which a myriad of formats will do: PS, PDF, DVI, but not HTML or XML).

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 16:22 UTC (Mon) by khim (subscriber, #9252)

This is a joke, right? LaTeX documentation is worse than that of both ODF and OOXML. A lot of details are not even mentioned in the documentation; the only way to know what goes on is to dig into the implementation. Of course there is only one implementation of LaTeX and it's compatible with itself, but that's true of any proprietary format: .doc or .xy ...

So if you want to compare LaTeX with anything, you should compare it with the .doc format, with MS Word as the single implementation. That combination works quite well indeed, and is easier to use in many cases. Of course the big difference is that changes in LaTeX are smaller in scale and that TeX and LaTeX are free, but from a technical viewpoint it's an adequate comparison. To compare ODF or OOXML with LaTeX is to compare apples and oranges...

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 16:58 UTC (Mon) by ajross (subscriber, #4563)

My memory is that LaTeX is very well documented from the perspective of authoring. Writing output/presentation-level stuff with TeX is, however, a black art. If the built-in templates work for you, then you're fine. But it's true that TeX for the professional publisher is definitely on the challenging side.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 17:15 UTC (Mon) by khim (subscriber, #9252)

Everything you've said applies just as well to .doc (with MS Word as the single implementation): as long as you are happy with what it does, everything is fine. If you try to understand what goes on or, God forbid, try to write an independent reader, everything goes straight to hell.

BTW, as a document format it's horrible: if you copy a fragment from one document to another (with a sufficiently different set of styles applied), you'll probably mangle it beyond recognition, if the resulting document is valid at all.

LaTeX is perfect for what it does, but it's a horrible, closed document format: the only reliable way to read LaTeX is to use TeX (just like the only reliable way to read OOXML is to use Microsoft's tools).

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 18:09 UTC (Mon) by tzafrir (subscriber, #11501)

Considering that TeX has so many implementations, and that I have three different latex2html packages on my Debian system (latex2html, in Perl with some help from TeX; hevea, in OCaml; and hyperlatex: not sure), I'd hardly call the format closed.

Also: if you copy part of a document, the text and semantics are supposed to be preserved. Formatting is not. This is a design decision, and a sane one.

Well, LaTeX is indeed technically a collection of TeX macros. But this does not mean that the format is closed. It is well-documented and implemented in various inter-operating implementations.

No company considered it important enough to be standardized as an official ANSI, ISO or whatever standard. But this is not a basic requirement for an open standard.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 20:06 UTC (Mon) by khim (subscriber, #9252)

Considering that TeX has so many implementations,

There are only two kinds of implementations: correct ones (based on Knuth's version) and kind-of-working ones (the rest). And I don't even remember when I last saw the second kind: they are all dead because "they are incompatible".

and that I have three different latex2html packages on my Debian system (latex2html, in Perl with some help from TeX; hevea, in OCaml; and hyperlatex: not sure),

And how well do all these tools preserve layout? More or less acceptably if you are lucky, right? Welcome: this is *exactly* the OOXML situation. With both LaTeX and OOXML you have a "central program" (TeX for LaTeX, MS Office for OOXML) with full format support, and you have "second-rate citizens" (the rest).

Also: if you copy part of a document, the text and semantics are supposed to be preserved. Formatting is not. This is a design decision, and a sane one.

Have you ever tried to combine texts with \usepackage[latin1]{inputenc} and \usepackage[koi8-r]{inputenc} (for example)? World != U.S., you know. You can kind of do it with iconv and \usepackage[utf8]{inputenc}, but then you'll have a lot of problems with BibTeX. Yes, it's possible, but it's not easy! For *a lot of* people the ability to copy and paste text ranks much higher than the ability to typeset math precisely...

Well, LaTeX is indeed technically a collection of TeX macros. But this does not mean that the format is closed. It is well-documented and implemented in various inter-operating implementations.

Just like OOXML. It's documented and you can kind of process it, but to produce *real* output you can use one program and one program only. And OOXML has more precise documentation.

No company considered it important enough to be standardized as an official ANSI, ISO or whatever standard. But this is not a basic requirement for an open standard.

Of course not. There are different definitions of "open standard", but the most important measure of openness is the number of independent implementations. Both OOXML and LaTeX fail this test spectacularly.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 23:29 UTC (Mon) by anselm (subscriber, #2796)

LaTeX is implemented in terms of TeX, thus runs wherever TeX runs. Even though most current installations of TeX are based on Donald E. Knuth's WEB implementation (by way of the web2c translator and lots of patches), there are indeed correct (in the sense of passing the TRIP test suite) implementations of TeX that do not derive directly from Knuth's code. On the other hand, Knuth's code has been ported to pretty much any worthwhile computer, so there is nothing wrong with using it -- especially since by now it is, to all intents and purposes, bug-free, which makes it unique among software systems of comparable size.

As far as the input-encoding problem is concerned, a »\usepackage[...] {inputenc}« command should be viewed like a »Content-Type: text/plain; charset=...« header in a MIME message -- that is, it declares the meaning of the non-ASCII bytes within a file. It would not be a big problem to come up with a »\begin{inputenc}{...} ... \end{inputenc}« LaTeX environment that would let you mix various input encodings in the same file, but from a practical point of view most text editors will not handle this satisfactorily. Nor do most WYSIWYG word processors! In the long run, using Unicode may indeed be the best bet. BibTeX could be fixed (there is an »8-bit BibTeX« around and has been for a while -- it's not exactly rocket science), and there are other programs/macro packages for doing bibliographies that tend to work better than BibTeX, anyway.
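To make the MIME-charset analogy concrete: the declared encoding does not change the bytes, only how they are read. The short Python sketch below is purely illustrative (it uses Python's codecs, not TeX). It decodes one byte under the two encodings khim mentions, latin1 and koi8-r, and then shows that converting both readings to UTF-8 gives one consistent byte stream, which is what a utf8 input encoding relies on.

```python
# One raw byte, read under two different declared encodings.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # 'é' (U+00E9): the reading latin1 implies
as_koi8r = raw.decode("koi8_r")    # 'И' (U+0418): the reading koi8-r implies

# Concatenating the two source files byte-for-byte would be ambiguous;
# converting both readings to UTF-8 first (what iconv does) removes the
# ambiguity, because every character now has exactly one encoding.
merged = (as_latin1 + as_koi8r).encode("utf-8")
assert merged.decode("utf-8") == as_latin1 + as_koi8r
```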

There can be no doubt that LaTeX has its shortcomings, but the ones that you quote are mostly straw men. However, there can also be no doubt that TeX and LaTeX have proved their worth in various important places for the last 25 years or so, and most if not all documents from back then can still be successfully processed using a modern system (which is more than can be said for Microsoft Word). From a practical point of view this is the type of continuity that ODT promises us with great vehemence, but we shall not know for another 20 years whether it will actually deliver on this promise.

Anselm

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 18:10 UTC (Mon) by JoeBuck (subscriber, #2330)

The situation with LaTeX is not quite as bad as you describe, in that the directives are documented and it is a text format.

Still, LaTeX is only usable by those who don't particularly care about the details of the layout and presentation of their document, and are satisfied with whatever decisions LaTeX makes for them. For technical papers, it's fine. But it is an inflexible system that Knows Better Than You.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 23:33 UTC (Mon) by anselm (subscriber, #2796)

Actually there have been great advances in methods to adapt LaTeX output to what you (as the document author) want. No longer do you have to limit yourself to Computer Modern fonts and LaTeX's ideas of what a chapter heading ought to look like. There are various packages around that will let you conveniently restyle all sorts of presentational aspects that somebody who hasn't looked at LaTeX for 5 to 10 years would consider set in stone.

Anselm

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 21:42 UTC (Mon) by bfields (subscriber, #19510)

LaTeX is perfect for what it does, but it's a horrible, closed document format:

I think you have an odd definition of "closed".

the only reliable way to read LaTeX is to use TeX (just like the only reliable way to read OOXML is to use Microsoft's tools).

I don't think it's fair to completely ignore the fact that TeX and LaTeX are free software.

You can get "TeX: The Program" from Amazon--600-some obsessively-commented pages of source code. I don't know if it's the whole thing, but I doubt that'll bring it up to the 6000 pages this article claims for the OOXML spec.

I know which I'd rather read.

TeX is most thoroughly DOCUMENTED.

Posted Jun 12, 2007 0:27 UTC (Tue) by xoddam (subscriber, #2322)

> 600-some obsessively-commented pages of source code. I don't
> know if it's the whole thing,

It's the whole thing.

What you're running on your modern Linux box won't be *identical*,
because TeX has not been at a complete standstill since the book
was published, but additions have been few and far between.

As far as I'm concerned, no software in history has been better
documented than TeX.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 17:52 UTC (Mon) by tzafrir (subscriber, #11501)

There are multiple implementations of TeX (the "engine"), and they are *very* compatible with one another.

http://www.tug.org/interest.html#free

This is much more than you can say about the "Microsoft Word format" (different versions of MS Word render the same document differently) or even ODF (AbiWord, KWord and oowriter don't really agree on what an ODF document I have should look like).

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 18:09 UTC (Mon) by arcticwolf (guest, #8341)

Those are distributions, not implementations. They still all use the same TeX (and LaTeX) at their core.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 19:34 UTC (Mon) by mebrown (subscriber, #7960)

ODF is a document format specification, not an output/rendering specification. It is meant to ensure you can edit documents with any set of programs, not that they should all look identical when output.

If you want pixel-level accuracy when rendering a document, use PDF.

LaTeX is DOCUMENTED ?

Posted Jun 11, 2007 19:19 UTC (Mon) by lenov (subscriber, #15428)

First of all, you are confusing TeX and LaTeX. LaTeX is a high-level language that encodes the semantics of the document, not the physical details. Then you have TeX the language, which is processed by TeX the program to generate the document. As a former member of the French translation team, I can say that LaTeX is EXTREMELY well documented.

Regarding the implementations of TeX, there are many: TeX itself, pdfTeX, Omega, NTS, etc.

LaTeX as an Open Document Format

Posted Jun 11, 2007 18:41 UTC (Mon) by filker0 (guest, #31278)

You bring up LaTeX as a good document format. I agree that LaTeX is good for writing documents; it's a mark-up language.

LaTeX is a structured text document format that does not directly dictate presentation. You tell it how you want the document structured, and indicate the style you want it presented with, and when you render it, you get what you asked for based on the device it is rendered for. If rendering for two devices is a bit different, you might end up with different pagination, hyphenation, etc., and may have to tweak it. For the WYSIWYG set, this is unacceptable (mostly because they've never really dealt with the structure of their documents, just what they look like). I've written documents with LaTeX and then, with no change to the source files, printed them out in several formats, just by applying different document styles to the same set of source texts.

ODF is somewhat informed by LaTeX/TeX, if for no other reason than that SGML is influenced by LaTeX/TeX. It's a structured mark-up language, designed to be application-neutral (it can be argued that LaTeX is not application-neutral).

Almost all (if not all) TeX implementations share the same core web sources at the root. LaTeX is based on a set of TeX macros written by, I believe, Leslie Lamport. This is all in the public domain, so far as I know.

Although you might say that LaTeX is an open format, it really has only one application that reads the document files. This happens to be a free, cross-platform application, but it's still only one application. It's pretty straightforward (given the right DTDs) to translate LaTeX documents into SGML, and I presume ODF; but I suspect that you'd have to call TeX (and LaTeX) largely a legacy application. Still, archived LaTeX documents are simply text files with mark-up, so there's no application lock-in for the information itself (there is with the MS apps and formats, including OOXML).

ODF goes beyond what LaTeX attempts to do. It allows the embedding of several different types of information within a single document format, and is far less static. TeX is not appropriate for embedding non-text objects (graphics, spreadsheets, presentations, audio, hyperlinks, etc.). It has some DVI escapes for including material from external sources, but it's primarily designed for formatting/publishing static documents.

So, while I like TeX and LaTeX, and have been using both since the early to mid 1980s, they lack a lot of what is needed in a unified document format that can be used by a large number of different applications.

LaTeX as an Open Document Format

Posted Jun 11, 2007 19:30 UTC (Mon) by lenov (subscriber, #15428)

> Almost all (if not all) TeX implementations share the same core web sources at the root.

This is completely untrue.

> LaTeX is based on a set of TeX macros written by, I believe, Leslie Lamport. This is all in the public domain, so far as I know.

"believe", "so far as I know". This is the whole problem of this discussion. I am STUNNED by the depth of its inaccuracy.

FYI: yes, Leslie Lamport wrote the first set of macros (up to LaTeX 2.09). Then the community took over. No, it is not in the public domain (TeX is). It is protected by a licence called the LaTeX Project Public License (LPPL):
http://www.latex-project.org/lppl/

LaTeX / TeX

Posted Jun 11, 2007 21:44 UTC (Mon) by filker0 (guest, #31278)

Lenov: I'm terribly sorry that I've so offended you.

TeX is in the public domain, as you agree. LaTeX uses the LPPL -- I'm not sure the original version that I got from Leslie back in the early 1980s did, though. I don't think the LPPL existed at the time. I know the GPL didn't. In any case, LaTeX is still a set of TeX macros.

I don't have time to research everything that I say. I was not making a legal representation, so "I believe" and "as far as I know" are perfectly acceptable. I was not stating as fact anything that I didn't know for sure, and since I was not entirely correct in what I said, that was a good thing. I've been a user of TeX and LaTeX for a long time. I first used LaTeX on a VAX/VMS system at Digital Equipment Corp., where I worked in the early to mid 1980s. I got my LaTeX macros back then directly from Mr. Lamport.

I have 6 different TeX distributions on hand. As far as I can tell from the documentation, all of them are reworked web2c implementations, where someone went through and replaced all of the trivial access routines with direct references to a global data structure, a context data structure passed on the stack, or to discrete global variables. I believe they are all based, however, on Knuth's web code. That's not to say that there are no existing TeX implementations that were done without using the web sources, (nothing to do with HTML, btw.), just that I'm not directly familiar with them.

I have no idea how anything I said was completely untrue.

I hope you've recovered enough from your trauma. All I said was that LaTeX, although a wonderful text markup system, does not do what a modern enterprise document handling system (not just text, but spreadsheets and multi-media presentations) requires of it. Frankly, I used to be a TeX junkie. I did a lot of TeX coding back in the 80s and early 90s. I did a port of TeX to the Atari ST (starting with the web sources) in 1987; I had to modify the linker and RTL that I was using to support .text segment overlays so that it would fit. LaTeX simply ran on top of that.

So, once you're no longer STUNNED, I hope you can find a little bit of perspective before attacking someone for an opinion. I also suggest that you read what's actually written, not just what you want to see so you can get so worked up.

TeX is NOT in the public domain

Posted Jun 11, 2007 23:45 UTC (Mon) by anselm (subscriber, #2796)

TeX is in the public domain, ...

Please check before making misleading claims like these. The source code for TeX starts like this:

% This program is copyright (C) 1982 by D. E. Knuth; all rights are
% reserved. Copying of this file is authorized only if (1) you are
% D. E. Knuth, or if (2) you make absolutely no changes to your copy.

So TeX is most emphatically not in the public domain -- it is copyrighted by Donald E. Knuth.

Anselm

TeX is NOT in the public domain

Posted Jun 12, 2007 1:37 UTC (Tue) by filker0 (guest, #31278)

I stand corrected. It is, however, still freely available. I believe that I even exchanged some e-mail with Dr. Knuth about my port of it back around the time I was trying to get it to run on the Atari. The web source, being the source to both the TeX program and (at least early versions of) the TeX book, is certainly still the property of Knuth, I would think.

I never intended to mislead anyone. What's the point? You should say "Please check before making incorrect statements", which is silly; it should really be "please check your facts before knee-jerk responses to an over-the-top flame while you're already under stress", and even then, if you look back at my original reply on which this thread is expanding, I said that it was my recollection (or something to that effect). Since the flamer agreed that TeX was PD, I didn't look further. I did, however, look at the various TeX packages that I have access to, and they were all descendants of Knuth's web sources via web2c. They all say so in their README files.

I no longer have the web sources around. I do have a version of weave and web2c, but the original TeX sources were on a 9-track magtape that I've lost.

According to the docs that I have at home, I did originally have an extremely early version of LaTeX, obtained directly from Dr. Lamport around the time he came to DEC from SRI. I've got a printout of an old e-mail from him in reply to a question I sent him.

I still stand by my assertion that TeX and LaTeX, although very good in their own right, do not meet all of the needs for a common multi-application file format. Also, it's harder to parse than XML, though dead easy compared to some other mark-up languages that were around at the time. That does not mean, by any means, that I don't believe that TeX and LaTeX are very good; they are. Further, I've seen TeX to/from SGML translators from the 1990s, and looking at the SGML generated (and the DTD), I see that it is a pretty clean translation. All of the concepts within LaTeX should be (I don't know that they are) expressible in ODF. Whether the apps are up to the level of expressiveness of TeX/LaTeX, I seriously doubt. WYSIWYG doesn't do that well with some of the things you can do with TeX. (Anyone remember the spiral text paragraphs examples?)

The discussion at hand, however, is XML "office" document formats. Whatever LaTeX is, it isn't one of those. Neither is Interleaf, DSR (DEC Standard Runoff), roff/nroff/troff/ditroff, rn, Eva, Unilogic Scribe, Author, or CDF. All of these are text with mark-up, just like LaTeX. All of them have some number of documents written in them. CDF (Combined Document Format) even allows the embedding of dynamic content, but it's still not XML, and cannot be processed by an XML parser.

Disclaimer: I don't like WYSIWYG editors. I use emacs and LaTeX, not OO.o or Word, when I'm writing my own documents. My opinions are my own, and are obviously not shared by anyone with half a brain.

TeX is NOT in the public domain

Posted Jun 12, 2007 17:45 UTC (Tue) by lenov (subscriber, #15428)

"So TeX is most emphatically not in the public domain -- it is copyrighted by Donald E. Knuth."

I think you may be confusing copyright and licence.

The source code you mention is taken out of context. This is not the licence of TeX but an excerpt. The complete passage is:

% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved.
% Copying of this file is authorized only if (1) you are D. E. Knuth, or if
% (2) you make absolutely no changes to your copy. (The WEB system provides
% for alterations via an auxiliary file; the master file should stay intact.)
% See Appendix H of the WEB manual for hints on how to install this program.
% And see Appendix A of the TRIP manual for details about how to validate it.

It is not meant to restrict the use of TeX's algorithms or software, or the development of other implementations, but to preserve the integrity of the system.

Knuth himself has repeatedly said that TeX is in the public domain.
see http://www.tug.org/TUGboat/Articles/tb07-2/tb15knut.pdf

TeX is NOT in the public domain

Posted Jun 12, 2007 18:33 UTC (Tue) by anselm (subscriber, #2796)

If TeX were in the public domain, which it isn't, it wouldn't need a licence for people to change it, etc. With respect, if anybody is confusing copyrights and licences here it is you. Donald Knuth has graciously made the source code to his TeX implementation available for us to use, but his copyright and licence forbid any changes to that source code (as expressed in the tex.web file, from which these lines are quoted). However, he has seen fit to also make available a mechanism by which controlled changes to a TeX implementation may be made by anybody without having to change Knuth's actual work, by preparing sets of patches to be applied to the tex.web file before compiling it, with no changes to tex.web itself. This mechanism has been used to great effect for all sorts of interesting changes and improvements during the last 25 years or so.
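The change-file mechanism can be sketched in a few lines. In WEB, a change file holds blocks of the form `@x` (lines copied from the master file) `@y` (replacement lines) `@z`, and the WEB tools merge it with tex.web as they read it, so tex.web itself is never edited. The Python below is a simplified, hypothetical model of that merge: exact line matching, changes applied in master-file order, and invented function names (`parse_change_file`, `apply_changes`). It is not the code of tangle or weave.

```python
from typing import List, Tuple


def parse_change_file(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split a change file into (old_lines, new_lines) pairs.

    Text outside @x ... @y ... @z blocks is treated as commentary and ignored.
    """
    changes = []
    state = None  # None (outside a change), "old", or "new"
    old: List[str] = []
    new: List[str] = []
    for line in text.splitlines():
        tag = line[:2]
        if tag == "@x" and state is None:
            state, old, new = "old", [], []
        elif tag == "@y" and state == "old":
            state = "new"
        elif tag == "@z" and state == "new":
            changes.append((old, new))
            state = None
        elif state == "old":
            old.append(line)
        elif state == "new":
            new.append(line)
    return changes


def apply_changes(master: str, change_text: str) -> str:
    """Return a merged copy; the master text itself is never modified."""
    lines = master.splitlines()
    out: List[str] = []
    pos = 0
    for old, new in parse_change_file(change_text):
        # Changes must follow master-file order, so only search forward.
        for i in range(pos, len(lines) - len(old) + 1):
            if lines[i:i + len(old)] == old:
                out.extend(lines[pos:i])  # copy the untouched lines
                out.extend(new)           # substitute the replacement
                pos = i + len(old)
                break
        else:
            raise ValueError(f"change does not match master file: {old[:1]}")
    out.extend(lines[pos:])
    return "\n".join(out)
```

For example, applying a change file containing `@x` / `line two` / `@y` / `line 2 (changed)` / `@z` to a three-line master yields a merged copy with only the middle line replaced, while the master string stays exactly as it was.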

To reiterate, a work (such as a book, piece of music, or program) that is by its nature eligible for copyright may be either copyrighted by somebody or »in the public domain« (meaning nobody holds a copyright). The usual way for a work to enter the public domain is for the original copyright to lapse (in Germany, where I live, this is in fact the only way, and happens 70 years after the death of the work's author), but in some jurisdictions, such as the US, it is possible for an author to explicitly disclaim their copyright and donate a work to the public domain. With TeX, this has not happened, for the exact reason that you cite.

When Knuth talks about the public domain in connection with TeX, he usually refers to the algorithms invented by him and used in the program. As far as copyright is concerned, this is a bit of a red herring, since algorithms as such cannot be copyrighted to begin with. (What can be copyrighted are implementations of algorithms, like the actual TeX program.) However, in some jurisdictions, such as the US, patent protection is available for algorithms, and to say that the algorithms of TeX are in the public domain means that they are not encumbered by patent claims. Hence they are free to use for other pieces of software (Adobe InDesign comes to mind, AFAIK). In fact, it is a pity that TeX's algorithms, such as those for line breaking, do not see more uptake in the world of WYSIWYG word processors, most of which do an abominable job on problems that, thanks to Don Knuth's efforts, were basically nailed for good more than two decades ago.
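The line-breaking point is easiest to see on a toy paragraph. TeX's actual algorithm (Knuth and Plass) works on boxes, glue and penalties and minimizes total demerits over the whole paragraph; the Python below swaps in a much cruder cost, the squared leftover space on each line with the last line free, over monospaced words. It is only a sketch of the whole-paragraph idea set against a greedy, line-at-a-time strategy, and both function names are hypothetical.

```python
from typing import List


def greedy_breaks(words: List[str], width: int) -> List[str]:
    """First fit: put as many words on each line as will fit."""
    lines: List[str] = []
    current = ""
    for w in words:
        candidate = w if not current else current + " " + w
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = w
    if current:
        lines.append(current)
    return lines


def total_fit_breaks(words: List[str], width: int) -> List[str]:
    """Choose all breaks at once, minimizing the summed squared slack.

    best[i] is the minimum cost of setting words[i:]; the final line of the
    paragraph costs nothing, however short it is.
    """
    n = len(words)
    best = [0.0] * (n + 1)
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1  # length of " ".join(words[i:j+1])
            if length > width and j > i:
                break
            if j == n - 1:
                cost = 0.0
            else:
                cost = (width - length) ** 2 + best[j + 1]
            if cost < best[i]:
                best[i], nxt[i] = cost, j + 1
    lines: List[str] = []
    i = 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

With `words = ["aaa", "bb", "cc", "ddddd"]` and `width = 6`, the greedy version packs the first line full and leaves "cc" alone on the second (`["aaa bb", "cc", "ddddd"]`), while the total-fit version spreads the slack (`["aaa", "bb cc", "ddddd"]`), at a lower total cost under this metric.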

ODF more complete than LaTeX

Posted Jun 11, 2007 19:58 UTC (Mon) by AJWM (guest, #15888)

LaTeX might be the bee's knees for text, but ODF also supports spreadsheets and presentations (and a few other odds and ends). LaTeX doesn't do me much good if I want to exchange a spreadsheet (as opposed to a static representation of a spreadsheet) with somebody.

ODF more complete than LaTeX

Posted Jun 11, 2007 20:25 UTC (Mon) by khim (subscriber, #9252)

First of all, you are confusing TeX and LaTeX.

Not at all. I know what TeX is and what LaTeX is. Very well indeed: I've spent enough nights in my life trying to make a book look the way it should... TeX is the program Knuth wrote, and LaTeX is a set of macros, nothing more, nothing less...

LaTeX is a high-level language that encodes the semantics of the document, not the physical details.

You can pretend that it's true, but the illusion only holds as long as you don't stumble upon some "gray area", and then you need to dig into the LaTeX implementation and the TeXbook...

Then you have TeX the language, which is processed by TeX the program to generate the document.

You don't have a TeX language, you have the TeX program; the TeXbook is just a translation from WEB to English...

As a former member of the French translation team, I can say that LaTeX is EXTREMELY well documented.

I'd say that the quality of documentation for OOXML and LaTeX is more or less the same: both are quite precisely documented, and both have some "gray areas" where the implementation is the rule and no real documentation exists. LaTeX is much simpler (thank God), but that's the only difference... And of course LaTeX does not support presentations and spreadsheets at all without additional packages...

Regarding the implementations of TeX, there are many: TeX itself, pdfTeX, Omega, NTS, etc.

How many of them are independent? I'd say none. Or is there some project which does not start from Knuth's codebase? NTS is written in Java, true, but it still starts from a reimplementation of the same old TeX and then adds new features...

ODF complete?

Posted Jun 12, 2007 8:56 UTC (Tue) by ekj (subscriber, #1524)

It depends on what you want to achieve, which usually isn't very well-defined.

There are basically three possible goals, and no solution I'm aware of manages all three.

First, you want to be able to open the document at any time, on any platform. This is priority one. There is a real risk that many of the multitude of MS .doc variants, for example, won't be readable by *anything* in the near future; they certainly won't be readable by *everyone*. Gisle Hannemyr has a Word document, created on NT 3.51 on an Alpha if I remember correctly, that he has been unable to open in any current program. ODF and OOXML both manage this. If you store a document in either format, it is very likely that whoever wants to read it a hundred years from now can in fact do so without much hassle.

Second, you may want the document to stay editable. That is, you may want its semantics to stay understandable, so that in a hundred years you can not only still open the document, but also still work with it sensibly, say create a v2 of it, adding a chapter, fixing some problems, *without* needing to essentially just grab the data and do the markup anew. An HTML document, for example, almost certainly fulfills this requirement.

Third, you may want to be able to print or view the document in the future and have it look identical to the way it looks today. Many view-only formats fulfill this easily, but ODF and OOXML do not.

Fulfilling the third has problems. It means that either things cannot progress, or an ever-growing set of new tags needs to be added to specify each new behaviour that didn't use to exist.

To take a trivial example, if you have a trivial document:

[document][paragraph]Hello World![/paragraph][/document]

Then everything that is default now, must be default FOREVER.

  • If antialiasing becomes common (is today, wasn't some years ago) you'll need to explicitly set a property for that on every newly created document.
  • If the default font changes, or improves, you'll need to explicitly state font (or worse yet, font-version) in every document generated thereafter.
  • If the kerning-algorithm improves, again, you'll need to explicitly state "use-new-kerning" or somesuch.
  • Indeed, it is not clear that "look identical" is possible at all, given improvements/changes in display and printing-technology. If whites in the future are "whiter" and blacks in the future are blacker than they are today, should we deliberately display old documents with "old-black" on "old-white" and specify in *all* newly generated documents "use-new-white" and "use-new-black" ?

I personally think it's perfectly fine that ODF doesn't specify details like *precisely* how the default font looks or *precisely* how antialiasing is done (or if it's done at all) or *precisely* how black *black* is.

True, this means that a document printed 100 years from now may look slightly different (hopefully improved) relative to today. But the advantage is that the document format doesn't need to bloat up for every new tiny improvement: it's okay to improve the antialiasing code without needing to change the document spec.

ODF complete?

Posted Jun 12, 2007 9:59 UTC (Tue) by forthy (guest, #1525)

Then everything that is default now, must be default FOREVER.

Indeed, and that's how it works in LaTeX. The default font is cmr, and it's set in stone. The default paragraph-breaking algorithm is set in stone. If you want a new one (e.g. pdfTeX's, which also stretches/compresses letters, not only spaces), you have to select it with a \usepackage command. If you want another font, you have to select it with a \usepackage command as well. The fonts commonly selected are the standard PostScript fonts, which are cast in stone too, for obvious reasons (a PostScript document has to render identically, too). If someone comes up with a new kerning algorithm, you'll have to select that as well.

That's how standards are supposed to work. If you want something new, you don't break compatibility, you add the feature in a way that doesn't impact the past usage. LaTeX is extremely well-defined in this respect: Packages and options define the feature-set of the document.
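A minimal Python sketch of the opt-in model described here, assuming a hypothetical format in which every document names the format version it was written against plus any extra packages. The tables `FROZEN_DEFAULTS` and `PACKAGES`, and all their values, are illustrative only; they are not taken from LaTeX or ODF.

```python
from typing import Dict, Iterable

# Hypothetical: defaults frozen per format version. Entries may be added
# for new versions, but an existing version's entry is never edited.
FROZEN_DEFAULTS: Dict[str, Dict[str, str]] = {
    "1.0": {"font": "cmr", "line_breaking": "knuth-plass", "kerning": "v1"},
}

# Hypothetical opt-in packages, each overriding specific settings
# (roughly the role \usepackage plays in LaTeX).
PACKAGES: Dict[str, Dict[str, str]] = {
    "times": {"font": "times"},
    "newkern": {"kerning": "v2"},
}


def resolve_settings(version: str, packages: Iterable[str]) -> Dict[str, str]:
    """Return the effective rendering settings for a document.

    Starts from the frozen defaults for `version`, then applies each
    declared package in order, so later packages win on conflicts.
    An unknown version or package raises KeyError instead of being
    silently guessed.
    """
    settings = dict(FROZEN_DEFAULTS[version])  # copy; never mutate the table
    for name in packages:
        settings.update(PACKAGES[name])
    return settings
```

With this design, shipping a new `"newkern"` package changes nothing for documents that don't ask for it, which is the compatibility property being argued for; the cost is that the defaults table can only ever be appended to.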

Antialiasing, however, doesn't need a new switch. The document rendering describes where the letters go; the display resolution, or ways to improve apparent resolution, are not an issue. Nobody complained when the TeX previewer xdvi added antialiasing and subpixel positioning of letters; it was an improvement over previous previewers.

I indeed want three things from a document format: Long term stability, editability, and to-the-point accurate re-rendering on the same output page size (resolution device-dependent, position device-independent). Editability means that re-rendering with other output page sizes, fonts, and so on must provide a pleasant result, as well, but since that's a change to the document, it doesn't have to be identical.
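"Resolution device-dependent, position device-independent" can be sketched like this, assuming positions are stored in PostScript points (72 per inch). The function `to_device` is a hypothetical name; the point is that only the last step depends on the output device.

```python
from typing import List, Tuple

POINTS_PER_INCH = 72.0  # PostScript point


def to_device(positions_pt: List[Tuple[float, float]], dpi: float) -> List[Tuple[float, float]]:
    """Map (x, y) glyph positions in points to device pixels at `dpi`.

    The result is left fractional: a renderer that antialiases or does
    subpixel positioning can use the fraction, while a plain one rounds.
    The layout itself (the point positions) is never changed.
    """
    return [(x * dpi / POINTS_PER_INCH, y * dpi / POINTS_PER_INCH)
            for x, y in positions_pt]
```

The same layout rendered at 96 dpi and at 300 dpi lands in proportionally identical places, which is why an antialiasing improvement in the renderer needs no change to the document.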

So LaTeX isn't perfect, but in terms of documentation, ODF is much worse. The ODF standard basically describes the syntax of ODF, not the semantics.

I can create texts and presentations in LaTeX today, though the feature set of beamer (in terms of silly animations and such) is different from OpenOffice's or PowerPoint's. I can't create spreadsheets or access databases. LaTeX parsing isn't as easy as XML parsing. Frontends like LyX aren't WYSIWYG (though TeXmacs tries much harder), and they don't use LaTeX as their native format (though they can export to it and, to some extent, import from it).

Comparing Apples to Fungus

Posted Jun 11, 2007 18:21 UTC (Mon) by filker0 (guest, #31278) [Link]

ODF is a structured document format that was designed, from the start, to represent a document in an implementation-independent manner and leave the presentation to the application. Two applications may render the same ODF document differently, but no information should be permanently lost in the exchange. Elements that are not understood by one application should by default be preserved, even if that application cannot present them, rather than dropped from the document file when other elements are edited. As far as I know, it was not designed to mimic the internal data structures of StarOffice or OpenOffice.org.
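The preservation behaviour described above can be sketched with Python's standard `xml.etree.ElementTree`. This assumes a simplified fragment (a real ODF `content.xml` wraps `office:text` inside `office:document-content` and `office:body`), and the `urn:example:ext` namespace is hypothetical, standing in for markup written by some other application. `replace_in_paragraphs` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Simplified fragment; <ext:widget> is something this application
# does not understand.
SAMPLE = f"""<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}"
  xmlns:ext="urn:example:ext">
  <text:p>Hello World!</text:p>
  <ext:widget color="blue">opaque to us</ext:widget>
</office:text>"""


def replace_in_paragraphs(xml_src: str, old: str, new: str) -> str:
    """Rewrite text inside text:p elements and leave everything else alone."""
    root = ET.fromstring(xml_src)
    for p in root.iter(f"{{{TEXT_NS}}}p"):
        if p.text:
            p.text = p.text.replace(old, new)
    # Unknown elements were never removed from the tree, so they survive
    # serialization. ElementTree may rename prefixes (ns0, ns1, ...), but
    # the namespace URIs, which are what identify the elements, are kept.
    return ET.tostring(root, encoding="unicode")
```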

OOXML is a representation of Microsoft Office binary document file formats in XML. The MS binary document formats are historically based on the internal data structures used within the application, and therefore have a lot of implementation- and/or design-specific structures. It was not designed from the ground up to be a structured XML document representation. Presentation details are embedded in the MS documents, including printer settings and other details that are not always compatible even across Microsoft platforms. This general mess is carried over into OOXML.

Even if you ignore the IP and vendor lock-ins that OOXML implies, OOXML fails on its technical merits -- I doubt that even MS will be able to fully support it. It locks them into their current internal application design. The way you convert OOXML to ODF is the same way you convert a binary .DOC file to ODF -- you render it with "ODF" as the output device. In both cases, because of poor design decisions in the MS products, information will be lost, or only encodable in a binary form that is opaque to non-MS apps.

Biased comparison

Posted Jun 11, 2007 19:24 UTC (Mon) by pjm (subscriber, #2080) [Link]

The comparison does appear somewhat biased.

It says "OOXML Is Not Fully Implemented in Any Application" whereas "multiple implementations of ODF exist today", but without saying how incomplete this ODF support is. http://opendocumentfellowship.org/applications shows no software as having "perfect support" for ODF. KOffice and OpenOffice.org are each given four stars for overall support, but it is noted that "OpenOffice and KOffice have some problems opening each other's OpenDocument files." Of course, this could be because some parts of ODF are in fact unspecified: and according to http://en.wikipedia.org/wiki/OpenDocument_technical_speci..., the OpenDocument committee have even said that anything (such as the meaning of spreadsheet formulae) not expressed in XML is outside of the scope of the specification!
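To illustrate the point about spreadsheet formulae: in ODF 1.1 a formula is stored as a plain string in the `table:formula` attribute, so XML tools can locate it but cannot say what it means. The sample cell uses an `oooc:` prefix as commonly seen in OpenOffice.org 2.x files; treat the exact sample as an assumption. `formula_parts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Assumed sample cell; the formula is just an attribute value.
CELL = (f'<table:table-cell xmlns:table="{TABLE_NS}" '
        'table:formula="oooc:=SUM([.A1:.A3])"/>')


def formula_parts(cell_xml: str) -> Tuple[str, str]:
    """Split a cell's table:formula into (namespace prefix, expression).

    Returns ("", formula) when there is no prefix, and ("", "") when the
    cell has no formula at all.
    """
    cell = ET.fromstring(cell_xml)
    formula = cell.get(f"{{{TABLE_NS}}}formula", "")
    prefix, sep, expr = formula.partition(":")
    return (prefix, expr) if sep else ("", formula)
```

Everything after the prefix (function names, the `[.A1:.A3]` reference syntax) is opaque to the XML schema and left for each application to interpret, which is one plausible source of the cross-application problems mentioned above.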

The comparison trumpets ODF's "reuse of existing standards", and highlights OOXML's ignoring of existing standards like SVG and MathML. One might think, then, that ODF's drawing format would start with SVG (or some other existing standard) and then specify only the behaviour relevant to editing; whereas in fact ODF has its own representation of drawings that differs rather gratuitously from SVG in some places.

It's unfortunate that this comparison of openness does not appear open to corrections.

Biased comparison

Posted Jun 12, 2007 2:41 UTC (Tue) by pjm (subscriber, #2080) [Link]

I should say that I still agree with the general conclusions of the comparison, but I think the aims of this document would be better served by a more even-handed approach and input from others including OOXML supporters.

I'm just a bit disappointed that we have yet another "standard" for text and graphics when I believe there are no complete implementations of existing standards like SVG and PDF. (Adobe's own PDF specification acknowledges a number of differences between the specification and Adobe's products, unless that's changed since.) If it's interoperability we want, then we should subset existing standards, not create new, more complex ones.

SVG? PDF?

Posted Jun 14, 2007 16:05 UTC (Thu) by dwheeler (guest, #1216) [Link]

Huh? SVG only handles vector graphics, not text. PDF handles text and graphics, but is essentially uneditable (all the higher-level information you need for reasonable editing is lost in PDF). Both SVG and PDF are useful, but NOT as a format for editable office documents. There's still a need for OpenDocument.

Reuse of existing standards

Posted Jun 14, 2007 18:44 UTC (Thu) by pjm (subscriber, #2080) [Link]

Sorry, I was unclear: I meant SVG and PDF in the context of the graphics. My claim is that OpenDocument would be more widely implemented sooner if the graphics parts of OpenDocument (mainly §9.2 of the OpenDocument v1.1 spec) had been written starting from an existing spec like SVG or PDF, adding whatever editing hints are necessary (hints that can be ignored by software that only needs to render the document, and optionally ignored by software that converts from OpenDocument to another format).

As an example of editing/semantic information, draw:connector provides useful semantic information and it's good to include that information in OpenDocument files, but all renderers and most converters can't use that information, they just want to know how the connector should be rendered: if connectors in OpenDocument were implemented as drawing commands plus an ignorable "this is a connector" annotation (implemented as an attribute of a group, say) then there's less to implement.
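The connector idea might look like this in SVG, built with Python's `xml.etree.ElementTree`. Only the `<g>` and `<path>` elements are standard SVG; the `urn:example:connector-hints` namespace and its `kind`/`start`/`end` attributes are hypothetical, as are the helper names.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
HINT_NS = "urn:example:connector-hints"  # hypothetical, not part of any spec


def make_connector(x1: float, y1: float, x2: float, y2: float,
                   start_id: str, end_id: str) -> ET.Element:
    """Build an SVG <g> holding a drawable path plus ignorable hints."""
    group = ET.Element(f"{{{SVG_NS}}}g", {
        f"{{{HINT_NS}}}kind": "connector",
        f"{{{HINT_NS}}}start": start_id,  # id of the shape it leaves from
        f"{{{HINT_NS}}}end": end_id,      # id of the shape it points to
    })
    # Everything a renderer needs: a plain straight path.
    ET.SubElement(group, f"{{{SVG_NS}}}path", {
        "d": f"M {x1} {y1} L {x2} {y2}",
        "stroke": "black",
        "fill": "none",
    })
    return group


def strip_hints(elem: ET.Element) -> ET.Element:
    """Return a copy with every HINT_NS attribute removed, i.e. what a
    hint-unaware renderer or converter effectively works with."""
    clean = copy.deepcopy(elem)
    for node in clean.iter():
        # Collect names first; deleting while iterating attrib is unsafe.
        for name in [a for a in node.attrib if a.startswith(f"{{{HINT_NS}}}")]:
            del node.attrib[name]
    return clean
```

An editor that understands the hints can re-route the line when `box1` moves; a renderer only ever needs the `<path>`, so there is less to implement.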

Most of §9.2 has ready counterparts in SVG or PDF. The biggest functional difference from SVG that springs to mind is how things should be scaled when the drawing is resized (due to the different places where units can be specified), which is a fairly minor difference (given that mixing proportional and non-proportional units in one document is rarely a good idea, and that OpenDocument doesn't really provide the adaptation/hinting one would want in cases where it would be useful). If this is a deliberate and valuable difference from SVG, then it too could have been specified as attributes that can be ignored when not doing (non-uniform) resizing.

Incidentally, for other readers: SVG does handle text, though it is not as well suited to editable text as OpenDocument is, which is really what dwheeler means in his first statement.

I'm more familiar with the graphics parts of OpenDocument than other parts, but I believe one could make similar points about the text parts of OpenDocument compared to existing standards. If OpenDocument documents containing only "text" parts were actually valid (insert choice of existing standard here, say XHTML) plus ignorable hints for anything not part of that standard, then such formatted-text-only OpenDocument documents would be immediately parsable by hundreds of programs.
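A sketch of the text case under the same assumption: body text stored as XHTML, with editing hints in a hypothetical `urn:example:editing-hints` namespace. A consumer that knows only XHTML can still pull out the content; `paragraphs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document: plain XHTML plus namespaced editing hints.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:hint="urn:example:editing-hints">
  <body>
    <h1 hint:outline-level="1">Report</h1>
    <p hint:style="Body Text">Hello World!</p>
  </body>
</html>"""


def paragraphs(xhtml: str) -> List[str]:
    """Return the text of headings and paragraphs in document order.

    Attributes in the hint namespace are simply never looked at, which
    is all "ignorable" has to mean for a generic XHTML consumer.
    """
    root = ET.fromstring(xhtml)
    wanted = {f"{{{XHTML_NS}}}{tag}" for tag in ("h1", "h2", "p")}
    return ["".join(el.itertext()).strip() for el in root.iter() if el.tag in wanted]
```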

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds