
Format Comparison Between ODF and MS XML (Groklaw)

Groklaw is running a technical comparison of ODF and MS XML. "Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship have provided this article for us, comparing OpenDocument Format and Microsoft's new MS XML format technically, not legally. Groklaw will be doing that separately, but this article addresses interoperability. That is the point of XML, after all, is it not?"

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 1:42 UTC (Mon) by gjheydon (guest, #4209) [Link]

It really looks like MS XML is just a copy of the binary .doc format with XML tags, instead of any real thought put into making things accessible.

They shouldn't really worry about trying to make it an ISO standard, as no one in their right mind would consider using it.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 3:58 UTC (Mon) by XERC (guest, #14626) [Link]

as no one in their right mind would consider using it.

I'd say, "They shouldn't really worry about trying to make it an ISO standard, as most of the people in the world are using it already." I personally also prefer ODT and LaTeX, but the question is, what is more of a standard: a thing that is in wide use (like it or not) or something that some bunch of white-collar clerks have given their ceremonial blessing?

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 4:39 UTC (Mon) by allesfresser (subscriber, #216) [Link]

Is it actually in wide use yet? I was under the impression that the standard they're talking about submitting to ISO/whoever is the Office 12 format, which hasn't been released yet...and people are making noises about not switching because of the cost, incompatibility, etc., although people said that the last x times around too...

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 7:28 UTC (Mon) by dark (subscriber, #8483) [Link]

"MS XML" sometimes refers to the Office 12 format and sometimes to the Office 2003 format (which is different). I think Microsoft is deliberately obscuring the difference, so that they can claim that their XML format is already in use.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 9:30 UTC (Mon) by OldRabbit (guest, #30886) [Link]

This article compares ODF to the Microsoft Office Open XML format (.docx), sometimes abbreviated to MOOX. This is the format to be used in the forthcoming Office 12. The Office 2003 format has the extension .xml.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 13:29 UTC (Mon) by freemars (subscriber, #4235) [Link]

The Office 2003 format has the extension .xml.

Nobody can ever accuse Microsoft of missing a marketing opportunity. If the extension is .xml it must be XML.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 29, 2005 13:14 UTC (Tue) by cpm (subscriber, #3554) [Link]

I'm sorry, I really don't mean to behave like a troll,
But if history has shown us anything, it would be that
Microsoft does whatever it damned well pleases.

If they want to call a .doc a .xml, they can. And no
one can make them do any differently.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 13:09 UTC (Mon) by danielc (guest, #34150) [Link]

By that argument, OpenDocument is a lot more standard than MS XML. Right now OpenDocument is used by a lot of people (users of OpenOffice.org, and other applications). There are over 20 applications that either support it, or plan to support it soon:

http://opendocumentfellowship.org/Applications/HomePage

By contrast, MS Office 12 XML is not even out. There is no current application that supports it. Its market share is 0%. Office 12 won't come out for a few months. In contrast, OpenDocument has a market share around 10% or so. Microsoft will have a hard time making people switch to Office 12, so uptake of their format is likely to be very slow.

Cheers,
Daniel.

Never underestimate the power of the dark side

Posted Nov 28, 2005 19:35 UTC (Mon) by proski (subscriber, #104) [Link]

By contrast, MS Office 12 XML is not even out. There is no current application that supports it. Its market share is 0%.
Microsoft has extensive experience in going from 0% to 95% of the market share in no time. Think of word processors and web browsers.

Never underestimate the power of the dark side

Posted Nov 29, 2005 21:14 UTC (Tue) by khim (subscriber, #9252) [Link]

Microsoft has extensive experience in going from 0% to 95% of the market share in no time. Think of word processors and web browsers.

It took a few years in both cases. And in both cases the competitors were other proprietary programs, so the freedom argument was never raised (Netscape opened its browser code when the war was already lost, and unlike OpenOffice 2.0 that code was unusable). Yes, Microsoft is not beaten yet, but this is its toughest challenge to date.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Dec 4, 2005 16:24 UTC (Sun) by dps (subscriber, #5725) [Link]

MS XML does not look much like the binary .doc format: where are the completely separate pointers into the text that mark which bits are bold, italic, etc.? It actually looks a lot easier to generate from a template than anything else you can edit in M$ Office, modulo HTML.

My target format of choice is LaTeX with portions of plain TeX that go way beyond what Dr. Lamport's gnats and gnus are likely to have imagined. The current version of the editing angle is to apply one of the (commercial) PDF to M$ Word conversion programs out there.

Human readability is important if you want to generate a document from a template and neither RTF nor open office scores well by this metric---I could not simply expand a template in either format such that the result did not crash openoffice. (I suspect exploit potential if you tickle whatever my results tickled in the right manner).

As an ISO standard M$ will have to drop some of its stipulations, like not using their documentation to design your own word processor format, not that anyone sane would want to do that anyway. Actually using the format as an in-memory format makes even less sense but is probably what M$ Word does...

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 10:03 UTC (Mon) by dw (subscriber, #12017) [Link]

This is only a small point, but it is indicative of the quality of the rest of the article: XHTML does not have a 'B' element, which was last seen in HTML 4.0 or the transitional DTD for XHTML 1.0. It could be argued that 'B' is an XHTML element, but it is discontinued in favour of the semantic 'STRONG'.

Aside from that, the argument against non-mixed content seems to be based only on how visually appealing the resulting XML is to a human reader. No thought or mention is given to the design of a parser for either format, which could benefit as easily from the Microsoft approach as the ODF approach. This is more important to me, as I sure as hell hope I never have to edit XML files by hand in a few years' time.
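
To make the parser side of this concrete, here is a rough sketch in Python. The element names (`p`, `span`, `run`, `props`, `t`) are simplified stand-ins invented for illustration, not the real ODF or Microsoft vocabularies. It only shows how plain text would be pulled out of a mixed-content paragraph versus a run-based one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the actual ODF or Microsoft schemas.
MIXED = '<p>This is <span style="bold">bold</span> text.</p>'
RUNS = (
    '<p>'
    '<run><t>This is </t></run>'
    '<run><props bold="true"/><t>bold</t></run>'
    '<run><t> text.</t></run>'
    '</p>'
)


def text_from_mixed(xml):
    """Mixed content: text sits between and inside nested elements."""
    return "".join(ET.fromstring(xml).itertext())


def text_from_runs(xml):
    """Run-based content: text sits only inside <t> leaf elements."""
    return "".join(t.text or "" for t in ET.fromstring(xml).iter("t"))


if __name__ == "__main__":
    print(text_from_mixed(MIXED))
    print(text_from_runs(RUNS))
```

With an off-the-shelf XML library neither shape is hard to read; the difference shows up more in how formatting has to be tracked (nesting in one case, per-run properties in the other).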

The brash approach taken by the article does nothing to enlighten the general public as to whether 'open' or proprietary formats are better, since it offers exactly zero objective criticisms of the way things currently stand. Perhaps it will appeal to a few web designers who hope to convert their existing vim/HTML skills into cash once the format becomes popular.

I am strongly in favour of ODF, but articles like this serve as nothing but bait for those among us who are more concerned with taking sides than understanding the true problem.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 10:12 UTC (Mon) by dmh (guest, #14528) [Link]

XHTML 1.0 most definitely *does* have a 'b' element. XHTML 1.1 has it too, within the Presentation Module. (It is absent from XHTML Basic and the XHTML 2 Working Drafts, however.)

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 29, 2005 2:18 UTC (Tue) by mikov (subscriber, #33179) [Link]

My sentiments exactly. I support OpenOffice and ODF, but the article was terrible - incoherent, biased and ultimately not informative.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 11:47 UTC (Mon) by ballombe (subscriber, #9523) [Link]

The article pretends to address interoperability but the criteria used are totally beside the point.

Interoperability means that the standard is precise enough so that two compliant implementations produce results identical for all purposes stated in the standard.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 12:31 UTC (Mon) by diegor (guest, #1967) [Link]

I disagree. Interoperability means that you don't need too much effort to produce a program that can read the format. SGML was a standard meant to introduce interoperability between different programs, but it failed even though the standard was very precise.

The problem was that the standard was so complex that it was very hard to find a truly compliant implementation. Having a precise standard is required, but not sufficient, to have interoperability.

A simple, human-understandable format means an implementation that is easier to write and to debug, and eventually more compliant implementations.

(Sorry for my bad language, I hope it was understandable enough)

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 13:04 UTC (Mon) by danielc (guest, #34150) [Link]

BBBzzzzz wrong.

Have you heard of Open Document Architecture? It was a standard designed years ago that tried to do what OASIS OpenDocument is doing today. ODA failed miserably because it was just too complicated.

Simplicity has a significant effect on interoperability because complex formats are just too hard to implement, and even harder to implement *correctly*. You forget that at a fundamental level, the whole reason why people want to move to XML is simplicity. XML guarantees a well-structured program, and a well-designed XML format can be *easily* manipulated.

A while ago I wrote a PmWiki plugin to produce OpenDocument files from wiki pages. I had a prototype working in an afternoon, and a fairly decent product after a few days (working afternoons). With MS XML I just wouldn't have done it at all. It's just too complex.
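
For a sense of how little is needed, here is a minimal sketch in Python. It is not the PmWiki plugin (whose code is not shown in this thread); `build_odt` is a hypothetical helper name. It assumes a bare-bones package of `mimetype`, `META-INF/manifest.xml` and `content.xml`, with no styles or metadata, and it has not been checked against any particular office suite or validator.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.text"

MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
</manifest:manifest>
"""

CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  office:version="1.0">
 <office:body>
  <office:text>
{paragraphs}
  </office:text>
 </office:body>
</office:document-content>
"""


def build_odt(paragraphs):
    """Return the bytes of a bare-bones .odt holding plain paragraphs."""
    body = "\n".join(
        "   <text:p>%s</text:p>" % escape(p) for p in paragraphs
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry goes first and is stored uncompressed.
        z.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/manifest.xml", MANIFEST)
        z.writestr("content.xml", CONTENT.format(paragraphs=body))
    return buf.getvalue()


if __name__ == "__main__":
    with open("hello.odt", "wb") as f:
        f.write(build_odt(["Hello from a wiki page.", "Second paragraph."]))
```

A real converter would also need styles, headings, lists and so on, but the skeleton stays this shape: a zip archive with a handful of XML files.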

You also ignored the other half of the article. Simplicity was only half the point of the article. The other half was the reuse of established standards that are well tested. This has a significant effect on interoperability.

Best,
Daniel.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 14:46 UTC (Mon) by drag (subscriber, #31333) [Link]

That's why most programs' 'working formats' are not the same as the file formats.

At least when it comes to graphics. Especially when it comes to graphics.

Take a look at photoshop or gimp. Both programs have their own special 'working' format.. with Photoshop it's the psd format, with Gimp it's XCF.

They do not just save the image files, they save much of the session information. How layers are arranged, how they filter thru each other.. masks, paths, etc etc. These files are huge compared to the final format.

But nobody in their right mind would expect these formats to be portable from one program to another or even portable between program versions beyond simple import-to-latest-version type things.

All image programs are like this.. Blender (3D app), Inkscape has its own 'native' SVG format variant. Cinelerra has its own "native" QuickTime format, etc...

Then when you're finished with the project you export the image in a format that you'd actually expect other programs to deal with in a rational manner, whether it's OpenEXR, JPEG, TIFF, PNG, Cal3D, MPEG-4, or whatever makes sense.

In each of these cases the end format is a much simpler, much smaller, and much more portable format based around some sort of well-established standard that many other programs can use.

When things like Word were created originally the end result was always going to be Paper.. so there wasn't any sort of 'presentation' format.

Eventually we got a standardized 'presentation' format in the form of PDF.. which originates from the PostScript format intended for printer processing.

However, the tradition for 'WYSIWYG'-style word processors has continued. You're sending the working formats from one person to another and storing the working formats as archival stuff.. which is stupid. It's hard to deal with, it makes it difficult to future-proof stuff, and it makes it difficult to keep things backward compatible with older programs while still being able to utilize newer features of newer programs.

It's very silly. But that's why Microsoft's XML format is so very complicated.

They have the same problem with their HTML generator in MS Office.. they shove so much gibberish into the HTML code that it's pretty much unusable once it's generated. It's nearly impossible to edit, yet they do this so that when you open the file back up in Office it retains a lot of the information that a Word document would save.

That's why they say they can't use ODF: because ODF can't provide all the functionality that is necessary for their program's new features..

What is needed for office suites like KOffice, OpenOffice.org, MS Office, etc. is to have separate working 'native' formats and then easily portable formats for archival (because they would be much smaller and retain the important/relevant data) and presentation purposes (because they would work with the widest range of programs and be future-proof).

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 16:00 UTC (Mon) by HenrikH (guest, #31152) [Link]

But still, this wouldn't prevent MS Word from exporting and importing to ODF.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 16:40 UTC (Mon) by drag (subscriber, #31333) [Link]

Exactly.

But they still refuse to do this. I'd bet there is even BSD-licensed code they could use directly in MS Office (which they have no aversion to using in other parts of their OS).

With export/import in MS Office, the ideal would be to use the Office format as a 'working format' and export to PDF or ODF for sending it to other people.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 16:25 UTC (Mon) by MathFox (subscriber, #6104) [Link]

Computer typography (Nowadays document processors are actually typesetting programs) is well studied and well understood. One essentially has "text" and typesetting instructions (font use, paragraph/page breaks, reserve space for illustrations, etc.). A computer program then lays out the text on the pages according to the typographic instructions. There haven't been fundamental changes since the early 1980s.
To portably transfer documents between different computers you only need to come to an agreement on how to represent the typesetting instructions and how to add them to the text. SGML and XML offer nice frameworks for creating a vendor independent interchange format. Archivists love to have well documented formats for the documents they archive.
It apparently even is possible to find a group of software vendors that agree that the interchange format "OpenDocument" is good enough to use as _native_ document format. It is a sure sign that office document processing has become mature.

The biggest question I have is why Microsoft refuses to support the OpenDocument format, even with just import and export filters.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 22:36 UTC (Mon) by mikec (guest, #30884) [Link]

> The biggest question I have is why Microsoft refuses to support the OpenDocument format, even with just import and export filters.

Very simple, MS is quite aware of the begrudging upgrade cycle they have repeatedly introduced...

That is, you stick with the version that you have and works until you start having trouble opening and exporting to other people (or it stops working on the OS you get without any choice on the next batch of desktops you buy).

They know they sell crap. Gates has said it himself (paraphrasing: "we did not set out to build the _best_ operating system, we set out to create a cost-effective solution...").

So, when you sell crap, you have to provide some other incentive to purchase... aggravation appears to be the weapon of choice...

They know that there are an awful lot of people who hate their products but use them anyway because they have to... If they interoperate too well, even with prior versions of their own software, no one uses theirs or upgrades.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 29, 2005 2:39 UTC (Tue) by Baylink (subscriber, #755) [Link]

> (Nowadays document processors are actually typesetting programs)

Good.

I can still make a living.

:-)

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 17:57 UTC (Mon) by iabervon (subscriber, #722) [Link]

I don't generally think of layers and such as session information; I tend to think of them as parts of a richer image format, sort of like how recent PDF versions allow for a table of contents that appears outside of the displayed pages when shown on a computer. There's no reason this stuff couldn't be part of a standard image format. (For that matter, SVG supports that kind of stuff, so Gimp could use an SVG file containing bitmaps and no other drawing operations, which would be a bit silly, but would be portable).

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 28, 2005 18:36 UTC (Mon) by Blaisorblade (guest, #25465) [Link]

> However, the tradition for 'WYSIWYG'-style word processors has continued. You're sending the working formats from one person to another and storing the working formats as archival stuff.. which is stupid.

It isn't so stupid if you remember that one big reason people use a computer rather than a typewriter is that you can change what you wrote some time ago.

Sure, for archives it can make sense to have a "presentation format". But only when "archive" means "external archive", i.e. an archive of published papers from different people. I would never save anything in PDF, I'd only "export" in that format. Better yet, I would almost never read the content in PDF format (except maybe if acroread starts faster than *Office*, but that's a different thing).

While I would never open an image in .xcf rather than .jpeg.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 29, 2005 2:39 UTC (Tue) by Baylink (subscriber, #755) [Link]

> What is needed for office suites like KOffice, OpenOffice.org, MS Office, etc. is to have separate working 'native' formats and then easily portable formats for archival (because they would be much smaller and retain the important/relevant data) and presentation purposes (because they would work with the widest range of programs and be future-proof).

Ah... that's exactly what They *want* you to think (much less say). :-)

Look up "second-class citizen".

Your *native* file format needs to be sufficiently capable to store everything you need. And it shouldn't be that hard. Rough consensus, and working code, folks.

Format Comparison Between ODF and MS XML (Groklaw)

Posted Nov 29, 2005 2:36 UTC (Tue) by Baylink (subscriber, #755) [Link]

semi-offtopic: how hard was that? I need to go from MediaWikiText to (preferably) DocBook.

I laugh at your funny joke!

Posted Nov 29, 2005 2:46 UTC (Tue) by xoddam (subscriber, #2322) [Link]

> XML guarantees a well-structured program

Hahahaha!

Good article but pity it only talks about word processing

Posted Nov 29, 2005 11:28 UTC (Tue) by koriordan (subscriber, #3490) [Link]

A good article and it's nice to see that the OpenDocument text format is superior to the competition.

It would be nice if they did another article for spreadsheets though. After hearing how the Gnumeric team aren't at all impressed with ODS (and I'd trust them to know what they're talking about), I'd be very interested to hear how it compares with other XML spreadsheet formats.

Good article but pity it only talks about word processing

Posted Nov 29, 2005 21:24 UTC (Tue) by khim (subscriber, #9252) [Link]

It would be nice if they did another article for spreadsheets though. After hearing how the Gnumeric team aren't at all impressed with ODS (and I'd trust them to know what they're talking about), I'd be very interested to hear how it compares with other XML spreadsheet formats.

Have you actually seen the reasons why the Gnumeric team was not impressed? ODS was not at fault: the lack of information needed to keep formulas compatible was the problem. And this was deliberately not included in ODF: it would require a separate standard of roughly the same size as ODF! I'm pretty sure Microsoft's format does not include this information either...

Good article but pity it only talks about word processing

Posted Dec 10, 2005 15:44 UTC (Sat) by koriordan (subscriber, #3490) [Link]

I've seen the formula stuff alright and I know there's OpenFormula in the works which should sort that out. They also complained that ODS was bloated though and they feared vendors would end up only supporting a subset of ODS. They said the Microsoft format was actually better in that regard.

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds