On text documents
Posted May 10, 2012 9:23 UTC (Thu) by philipstorry
Parent article: Who owns your data?
Proprietary vendors come and go; their formats right along with them. Trying to read a Microsoft Word document from 20 years ago is likely to be an exercise in frustration, but trying to read a Windows 3.0 WordStar document will be far worse.
Assuming that we can't find a specification for the WordStar format in question, reading the file will be an inconvenience. It would probably be easier to find the application, install it on a VM, and re-save the file to an acceptable intermediate format than to do any kind of reverse engineering.
But if file fidelity is important, then the original software may be the only option anyway. I've tried to open old Word 2.x/6.x for Windows documents with recent versions of Word - and if there's any complex formatting, it's pretty much a waste of time.
There's a naive assumption here that software with the same brand name (if it survives the years) is always going to be backwards compatible. Not only is that not borne out by my own experience today, but I suspect it will only get worse.
Ultimately, if you want to still be able to access it in the future with decent fidelity, I see only three options.
- Plain Text (as plain as you can get, but if you need Unicode that'll probably work too)
- PDF/A (it seems like a reasonable bet)
- The exact format you're using now, and a VM image you update/migrate yearly
Yes, having to test (and upgrade to later versions if necessary) a VM image every year will be a pain. But it's probably the only reliable way.
If text documents are this much of a hassle, despite being the largest type of file by count, imagine how painful the other formats are going to be!
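For the plain-text option above, it's at least easy to script a check of how "plain" a file really is. A rough Python sketch follows; the function name classify_text and its three labels are my own invention for illustration, and it only tests the encoding, not whether the content is sensible text:

```python
def classify_text(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'not-utf-8' for a file's raw bytes.

    Hypothetical helper: it checks decodability only, so a file full of
    control characters would still count as 'ascii'.
    """
    try:
        data.decode("ascii")
        return "ascii"          # as plain as you can get
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"          # Unicode, still a reasonable bet
    except UnicodeDecodeError:
        return "not-utf-8"      # some other encoding or a binary format


if __name__ == "__main__":
    import sys

    # Usage: python check_plain.py file1.txt file2.doc ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, classify_text(fh.read()))
```

Anything that comes back as "not-utf-8" is a candidate for conversion, or for the VM route.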