Who owns your data?
The Economist is concerned that our "digital heritage" may be lost because the formats (or media) may be unreadable in, say, 20 years' time. The problem is complicated by digital rights management (DRM), of course, and the magazine is spot on in suggesting that circumventing those restrictions is needed to protect that heritage. But in its calls for more regulation (not a usual Economist stance), the magazine misses one of the most important ways that digital formats can be future-proofed: free and open data standards.
DRM is certainly a problem, but a bigger problem may well be the formats that much of our digital data is stored in. The vast majority of that data is not stored in DRM-encumbered formats; it is, instead, stored in "secret" data formats. Proprietary software vendors are rather fond of creating their own formats, updating them with some frequency, and allowing older versions to (surprise!) become unsupported. If users of those formats are not paying attention, documents and other data from just a few years ago can sometimes become unreadable.
There are few advantages to users from closed formats, but there are several for the vendors involved, of course. Lock-in and the income stream from what become "forced" upgrades are two of the biggest reasons that vendors continue with their "secret sauce" formats. But it is rather surprising that users (businesses and governments in particular) haven't rebelled. How did we get to a point where we will pay for the "privilege" of having a vendor take our data and lock it up such that we have to pay them, again and again, to access it?
There is a cost associated with documenting a data format, so the proprietary vendors would undoubtedly cite that as leading to higher purchase prices. But that's largely disingenuous. In many cases, there are existing formats (e.g. ODF, PNG, SVG, HTML, EPUB, ...) that could be used, or new ones that could be developed. The easiest way to "document" a format is to release code—not binaries—that can read it, but that defeats much of the purpose of using the proprietary formats in the first place, so it's not something that most vendors are willing to do.
Obviously, free software fits the bill nicely here. Not only is code available to read the format, but the code that writes the format is there as well. While documentation that specifies all of the different values, flags, corner cases, and so on, would be welcome, being able to look at the code that actually does the work will ensure that data saved in that format can be read for years (centuries?) to come. As long as the bits that make up the data can be retrieved from the storage medium, and quantum computers running Ubuntu 37.04 ("Magnificent Mastodon") can still be programmed, the data will still be accessible. There may even be a few C/C++ programmers still around who can be lured out of retirement to help—if they aren't all busy solving the 2038 problem, anyway.
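To see how far an open, published format gets you, consider PNG, one of the formats mentioned above. The sketch below (Python, standard library only, and a minimal illustration rather than a full decoder) walks a PNG file's chunks using nothing but the structure laid out in the public specification:

    import struct
    import zlib

    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed eight-byte signature from the PNG spec

    def walk_png_chunks(path):
        """Yield (chunk type, data) for every chunk in a PNG file.

        Per the public specification, a PNG is the signature followed by
        chunks, each stored as a 4-byte big-endian length, a 4-byte type,
        the chunk data, and a CRC-32 over the type and data.
        """
        with open(path, "rb") as f:
            if f.read(8) != PNG_SIGNATURE:
                raise ValueError("not a PNG file")
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                length, ctype = struct.unpack(">I4s", header)
                data = f.read(length)
                (crc,) = struct.unpack(">I", f.read(4))
                if zlib.crc32(ctype + data) != crc:
                    raise ValueError("CRC mismatch in %r chunk" % ctype)
                yield ctype, data
                if ctype == b"IEND":
                    break

    if __name__ == "__main__":
        import sys
        for ctype, data in walk_png_chunks(sys.argv[1]):
            if ctype == b"IHDR":
                width, height = struct.unpack(">II", data[:8])
                print("image is %dx%d pixels" % (width, height))
            print("%s chunk, %d bytes" % (ctype.decode("ascii"), len(data)))

A couple of dozen lines written against a freely available document; the equivalent exercise for an undocumented binary format starts with a hex editor and a great deal of guesswork.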
More seriously, though, maintaining access to digital data will require some attention. Storage device technology continues to evolve, and there are limits on the lifetime of the media itself. CDs, DVDs, hard drives, tapes, flash, and so on will all need refreshing from time to time. Moving archives from one medium to another is costly enough; why add potentially lossy format conversions and the cost of upgrading software to read the data—if said software is even still available?
Proprietary vendors come and go; their formats right along with them. Trying to read a Microsoft Word document from 20 years ago is likely to be an exercise in frustration, but trying to read a Windows 3.0 WordStar document will be far worse. There are ways to do so, of course, but they are painful—if one can even track down a 3.5" floppy drive (not to mention 5.25"). If the original software is still available somewhere (e.g. Ebay, backup floppies, ...) then it may be possible to use emulators to run the original program, but that still may not help with getting the data into a supported format.
Amusingly, free software often supports older formats far longer than the vendors do. While the results are often imperfect, reverse engineering proprietary data formats is a time-honored tradition in our communities. Once that's been done, there's little reason not to keep supporting the old format. That's not to say that older formats don't fall off the list at times, but the code is still out there for those who need it.
As internet services come and go, there will also be issues with preserving data from those sources. Much of it is stored in free software databases, though that may make little difference if there is no access to the raw data. In addition, the database schema and how it relates articles, comments, status updates, wall postings, and so on, is probably not available either. If some day Facebook, Google+, Twitter, Picasa, or any of the other proprietary services goes away—perhaps with little or no warning—that data may well be lost to the ages too. Some might argue that the majority of it should be lost, but some of it certainly qualifies as part of our digital heritage.
Beyond the social networks and their ilk, there are a huge number of news and information sites with relevant data locked away on their servers. Data from things like the New York Times (or Wall Street Journal), Boing Boing and other blogs, the article from The Economist linked above, the articles and comments here at LWN, and thousands (perhaps millions) more, are all things that one might like to preserve. The Internet Archive can only do so much.
Solutions for data from internet sites are tricky, since the data is closely held by the services and there are serious privacy considerations for some of it. But some way to archive some of that data is needed. By the time the service or site itself is on the ropes, it may well be too late.
Users should think long and hard before they lock up their long-term data in closed formats. While yesterday's email may not be all that important (maybe), that unfinished novel, last will and testament, or financial records from the 80s may well be. Beyond that, shareholders and taxpayers should be pressuring businesses and governments to store their documents in open formats. In the best case scenario, it will just cost more money to deal with old, closed-format data; in the worst case, after enough time passes, there may be no economically plausible way to retrieve it. That is something worth avoiding.
Posted May 10, 2012 2:36 UTC (Thu)
by Comet (subscriber, #11646)
[Link] (1 responses)
They have archived copies of the Microsoft document format specifications; much as we might dislike it, the content they need to preserve is the content created by most of the populace. But folks trying to establish parity for other formats should probably reach out to the preservation officers of the BL to get their specifications archived too.
Posted May 10, 2012 13:55 UTC (Thu)
by pboddie (guest, #50784)
[Link]
Although welcome, this raises additional issues. Given this apparent safety net, people are now likely to say "Great, we're covered!" And then they will carry on churning out proprietary format content. But we are not covered.
Firstly, we don't even know if the specifications are complete or accurate. This is Microsoft we're talking about, so although it is possible that these published specifications have had some auditing as part of a regulatory action in the European Union, we can't be sure that they are usable until someone produces a separate implementation.
Secondly, people will happily start producing content in later versions of those formats which aren't covered by publicly available specifications. Again, we're talking about Microsoft, so any remedy for trouble they have managed to get themselves into will only last as long as the company is under scrutiny. Then, it's back to business as usual. Meanwhile, nobody in wider society will have been educated about the pitfalls of such proprietary formats and systems.
Thirdly, the cost of preservation under such initiatives may well be borne by the people whose data is now imprisoned in such formats, instead of the people responsible for devising the format in the first place. In various environments, there are actually standards for archiving, although I can well imagine that those responsible for enforcing such standards have been transfixed by the sparkle of new gadgetry, the soothing tones of the sales pitch, and the quick hand-over of an awkward problem to a reassuring vendor. Public institutions and the public in general should not have to make up the shortfall in the vendors' lack of investment.
Finally, standards compliance is awkward enough even when standards are open and documented. One can argue that a Free Software reference implementation might encourage overdependence on a particular technology and its peculiarities, potentially undermining any underdocumented standard, but this can really only be fixed when you have a functioning community and multiple Free Software implementations: then, ambiguities and inconsistencies are brought to the surface and publicly dealt with. Sustainable computing and knowledge management requires a degree of redundancy.
Mentions of the celebrated case of the BBC Domesday Project often omit the fact that efforts were made to properly document the technologies involved - it is usually assumed that nobody had bothered, which is not the case - but had that project been able to take advantage of widely supported, genuinely open standards, misplacing documentation would have had a substantially smaller impact on preservation activities. Indeed, with open formats and appropriate licensing of the content, the output of the project might have been continuously preserved, meaning that the content and the means of deploying it would have adapted incrementally as technology progressed. That's a much more attractive outcome than sealing some notes in a box and hoping that future archaeologists can figure them out.
Posted May 10, 2012 3:40 UTC (Thu)
by rgmoore (✭ supporter ✭, #75)
[Link] (3 responses)
Multimedia seems to be at least a minor exception to this. Most of the important formats were created by industry consortia, so they're fairly well documented and widely available. That also limits the extent to which companies can tamper with the formats in an attempt at user control, since they have to retain compatibility with well-established standards. The biggest exception I can think of is the huge range of proprietary raw image formats created by digital camera companies, and even there we have projects like dcraw that have effectively documented the formats in the form of functioning decoding code.
Posted May 10, 2012 11:15 UTC (Thu)
by robbe (guest, #16131)
[Link] (2 responses)
May I remind you of realaudio, indeo, cinepak, etc.? Videos of this time (1990s) were generally too crappy to remember, but a lot of actually useful audio recordings are still locked up in RA format.
Posted May 11, 2012 21:27 UTC (Fri)
by rgmoore (✭ supporter ✭, #75)
[Link]
Didn't Real Audio eventually release an Open Source version of their player?
Posted May 12, 2012 15:51 UTC (Sat)
by jengelh (guest, #33263)
[Link]
Posted May 10, 2012 4:19 UTC (Thu)
by djfoobarmatt (guest, #6446)
[Link]
Posted May 10, 2012 5:42 UTC (Thu)
by eru (subscriber, #2753)
[Link] (1 responses)
Trying to read a Microsoft Word document from 20 years ago is likely to be an exercise in frustration,
At work I have sometimes had reasons to do precisely that, and found that OpenOffice manages it better than modern MS Office. Only the layout may be a bit off. Actually, the old Word format is simple enough that even strings(1) can be used to recover the plain text parts (a rough equivalent is sketched below).
if one can even track down a 3.5" floppy drive (not to mention 5.25").
That's one reason I'm keeping a couple of those in my basement (along with a couple of old computers). Actually, 5.25" disks formatted as 360k (DSDD) are surprisingly durable. I once did a transfer job for an author who wanted to access some old manuscripts (can you call them that?) written on an MS-DOS machine with WordStar and kept in boxes of 5.25" disks for 10 years, and found just two or three files that failed to be read.
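For the curious, here is a rough Python equivalent of that strings(1) salvage trick. It simply scans a binary file for runs of printable 8-bit and UTF-16LE text (old word-processor formats tend to store their prose in one of those two ways), so expect plenty of noise around the recovered paragraphs; the file name is just an example.

    import re
    import sys

    def salvage_text(path, min_len=6):
        """Crude strings(1)-style salvage of readable text from a binary file."""
        with open(path, "rb") as f:
            blob = f.read()

        # Runs of printable ASCII characters.
        for run in re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob):
            print(run.decode("ascii"))

        # Runs of UTF-16LE text: a printable ASCII byte followed by a NUL byte.
        for run in re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, blob):
            print(run.decode("utf-16-le"))

    if __name__ == "__main__":
        salvage_text(sys.argv[1] if len(sys.argv) > 1 else "old-report.doc")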
Posted May 10, 2012 14:43 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
The floppies with the Prince of Persia source code for the Apple II were also successfully read after 20 years! I don't recall 1.44 MB floppies being particularly durable though.
Posted May 10, 2012 9:23 UTC (Thu)
by philipstorry (subscriber, #45926)
[Link] (25 responses)
Assuming that we can't find a specification for the WordStar file in question, it will be an inconvenience. It would probably be easier to find the application, install it on a VM, and re-save the file to an acceptable intermediate format than to do any kind of reverse engineering. But if file fidelity is important, then the original software may be the only option anyway.
I've tried to open old Word 2.x/6.x for Windows documents with recent versions of Word - and if there's any complex formatting, it's pretty much a waste of time. There's a naive assumption here that software with the same brand name (if it survives the years) is always going to be backwards compatible. Not only is that not borne out by my own experience today, but I suspect it will only get worse.
Ultimately, if you want to still be able to access it in the future with decent fidelity, I see only three options.
Yes, having to test (and upgrade to later versions if necessary) a VM image every year will be a pain. But it's probably the only reliable way. If text documents are this much of a hassle, despite being the largest type of file by count, imagine how painful the other formats are going to be!
Posted May 10, 2012 17:36 UTC (Thu)
by iabervon (subscriber, #722)
[Link] (10 responses)
Posted May 10, 2012 23:42 UTC (Thu)
by mrons (subscriber, #1751)
[Link] (9 responses)
Of course if you had used TeX 20 years ago, the document would look exactly the same today, even down to the line-breaks, and be in an easily editable form.
Posted May 11, 2012 2:34 UTC (Fri)
by iabervon (subscriber, #722)
[Link] (5 responses)
(Not to mention that building TeX requires implementations of at least two language dialects (WEB and \ph) which aren't used for anything else on any modern system; it's easier to make an emulator for the computers that Wordperfect ran on than to make a compiler able to build TeX, although people have done both.)
Posted May 11, 2012 4:33 UTC (Fri)
by eru (subscriber, #2753)
[Link] (2 responses)
Huh? Most Linux distributions provide a TeX package. I believe it is built using a portable C implementation of WEB (web2c), which is a source-to-source translator, so just C is required for that part. Browsing the READMEs of a recent TeX for Linux implementation (http://www.tug.org/svn/texlive/trunk/Build/), there certainly are also other dependencies for building and auxiliary programs, but that is stuff that typical Linux installations already provide. Of course bootstrapping TeX for a very different computer and OS from scratch would be a lot of work, but at least it is possible, thanks to the good documentation of TeX and its source.
Posted May 11, 2012 16:03 UTC (Fri)
by iabervon (subscriber, #722)
[Link] (1 responses)
Posted May 11, 2012 17:07 UTC (Fri)
by eru (subscriber, #2753)
[Link]
Still have to disagree here. I'm pretty sure I could port web2c to a new platform in an evening or two, provided it has a decent ANSI C compiler (which is now a very common piece of infrastructure and can legitimately be assumed). Porting DOSBOX would be a much larger task, unless the new target is very similar to some of the existing ones. Yes, there is more documentation about x86 and DOS, because a lot more is needed to describe the complicated and ugly interface, and it is still incomplete...
I have found bugs in DOSBOX, which I currently use to support some legacy cross-compilation tools at my workplace. Also used DOSEMU+FreeDOS for the same task, and found it has some different bugs... I could work around the problems for the limited set of programs that were needed. But the fact is the only thing that is completely MS-DOS compatible for all programs still is the original MS-DOS.
Posted May 22, 2012 21:15 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (1 responses)
To the best of my knowledge, WordPerfect files are both backwards AND forwards compatible between v6.0 (released in 1994 as I said) and the latest version.
So incompatibility like this is a deliberate or accidental vendor choice, not something that is inevitable ...
Cheers,
Wol
Posted May 26, 2012 19:06 UTC (Sat)
by mirabilos (subscriber, #84359)
[Link]
Posted May 11, 2012 4:49 UTC (Fri)
by eru (subscriber, #2753)
[Link]
I mostly agree from personal experience. I have some large LaTeX documents that were started that long ago, and which I still maintain now and then. Not quite pure LaTeX, because they contain diagrams that were done with xfig (but that also is still available, and quite good for simple diagrams). Some changes in LaTeX (mainly the transition to 3.x) required minor changes to the source, but these were limited just to the macro settings at the beginning of the document. Also I started to use some PostScript-related font packages for much improved PDF output, which slightly changed final layout. But the bulk of the text has not needed any changes attributable only to the formatting tool evolution. Supposing I had not been maintaining the documents for 20 years, suddenly getting them formatted with current versions of the tools might be slightly more work, but not much.
Posted May 11, 2012 5:07 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted May 12, 2012 15:56 UTC (Sat)
by jengelh (guest, #33263)
[Link]
Posted May 12, 2012 19:20 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (12 responses)
...
Yes, having to test (and upgrade to later versions if necessary) a VM image every year will be a pain. But it's probably the only reliable way.
Do you mean every year forever, even long after you're dead, or just every year while you're creating documents?
Posted May 13, 2012 18:59 UTC (Sun)
by philipstorry (subscriber, #45926)
[Link] (11 responses)
But I mean every year that you want to be able to retrieve the documents, you should make sure your VM works, migrate it to new storage if necessary, and (if it's needed) upgrade it to work with the version of VM software you're using.
Otherwise, in a decade's time, you'll probably end up firing your VM up, only to find that the image is no longer a supported version and doesn't run anymore.
Posted May 13, 2012 19:57 UTC (Sun)
by giraffedata (guest, #1954)
[Link] (10 responses)
OK, well I think that misses the point of the article, which talks about "heritage." Keeping your own active data usable is one thing, but a more complex concern is storing data for many generations and having it be usable by society at large at a point when it's considered history.
For that, something that requires a significant amount of effort to keep the data vital would probably be more costly than just discarding the data, so people are looking for ways just to stick something in a corner for 50 years, largely forget about it, and still have a decent chance of being able to use it.
Updating all your document reading tools each year to be compatible with this year's environment is an example of something so costly we assume it won't be done. In fact, I think updating the documents regularly would be more practical.
Posted May 13, 2012 20:00 UTC (Sun)
by philipstorry (subscriber, #45926)
[Link] (9 responses)
You only need one VM for all your data, because the access method would be to present some storage with the files you want to the VM.
(Granted, there may be a point where the lack of USB or CD support on hardware may mean that you have to present it with a disk image, but it's still fairly trivial.)
I don't think updating all my documents each year would be practical. The idea that it would be practical for a heritage model seems ridiculous.
The VM is probably the best method we will have to ensure fidelity. It's the least amount of work for the best return.
Posted May 13, 2012 20:53 UTC (Sun)
by giraffedata (guest, #1954)
[Link] (8 responses)
The problem is that some day, your old VM won't run on the new VM host, so you have to update the VM operating system, and your old Word won't run on the new VM operating system.
You acknowledged the concern that the new VM host might not be able to read a CD, but your solution of a disk image (a VM host file that the VM sees as a disk drive, I presume) has the same problem. Not only do you have to store the disk image file on some medium from which the new VM host can read bits, but the new VM host has to be able to interpret those bits as virtual disk content.
I don't think updating all my documents each year would be practical. The idea that it would be practical for a heritage model seems ridiculous.
Agreed.
The VM is probably the best method we will have to ensure fidelity.
It still looks to me less likely to succeed than updating the documents.
If we solve the problem (i.e. improve our data heritage), I think it will be like the Economist proposes: with agreements among ourselves to maintain archive formats.
Posted May 13, 2012 22:04 UTC (Sun)
by apoelstra (subscriber, #75205)
[Link] (4 responses)
When your OS no longer supports the VM you want, you should just run an old version in a VM, so your document will be on a VM-within-a-VM. Then eventually you'll need a VM-within-a-VM-within-a-VM, and so on...
Posted May 13, 2012 22:29 UTC (Sun)
by giraffedata (guest, #1954)
[Link] (3 responses)
I can't tell what you're describing. Can you phrase this without the word "support" so it's more precise?
I'm also unclear on what "the VM you want" is and whether when you say OS, you're talking about a particular instance or a class such as "Fedora".
Posted May 13, 2012 22:54 UTC (Sun)
by apoelstra (subscriber, #75205)
[Link] (2 responses)
Suppose your documents live in WordStar for Windows 3.1. So you keep Windows 3.1/MS-DOS 5 on a VM for a while. But one day you wake up to Windows 7, and it's 64-bit only, and won't run your old DOS-supporting VM software anymore.
(I don't know if this is actually a problem. It's just an example.)
So you go ahead and install XP in a VM under Windows 7. On XP, you run a VM containing DOS, on which you run Wordstar.
Some years later, XP won't run on a VM since it's 2045 and nobody has heard of BIOS anymore. So you have a VM running Win7, which runs a VM running XP, which runs a VM running DOS, which finally runs Wordstar.
Then 25 years later, your VM software doesn't work, so you add another layer...
Posted May 14, 2012 13:00 UTC (Mon)
by philipstorry (subscriber, #45926)
[Link] (1 responses)
I think that at some point - probably host-architecture bound - we have to switch from VM-as-supervisor to straight emulation.
I've mentioned this in another reply, so apologies if you've read it already - but basically, when your VM solution finally stops supporting your version of the client OS then it's time to look at a switch to emulating the entire machine, QEMU style.
The advantage of that is that the emulation is much more likely to last longer, albeit be somewhat slower to run.
The Intel/AMD 64-bit chips are (I believe) incapable of running 16-bit code when in 64-bit mode. They can run 32-bit, just not 16-bit. So we're already at the point where VM systems are unable to run some old OSes or apps without resorting to emulation behind the scenes.
Rather than rely on that assumed emulation, I think we should build in a stage where we simply say "all 16-bit code is emulated", and prepare for the idea that 128-bit processors in a decade or two might mean we have to add 32-bit code to the "emulated by default" pile.
That stops us from having to do VMs within VMs, as you describe. (And if the chip won't run the code, and there's no emulator, I'm not sure VMs within VMs will work anyway.)
Posted May 14, 2012 13:19 UTC (Mon)
by paulj (subscriber, #341)
[Link]
Further, you are assuming that for every such system that becomes obsolete there will be a VM on the newer system to run the older system. This is far from guaranteed. If that assumption ever fails, access to that document is lost after that time. For that assumption to always be true, every system needs to be sufficiently well documented by its maker that someone in the future will be able to emulate it. I.e. it assumes your chain of VMs will never become dependent on a monolithically proprietary system.
So, given that your VM system also relies on open specifications, wouldn't it be much better & simpler to just work towards ensuring documents are stored in openly specified formats? That seems far more future-proof to me.
Posted May 14, 2012 12:36 UTC (Mon)
by philipstorry (subscriber, #45926)
[Link] (2 responses)
For example, if you're using VirtualBox, then at some point the version of Windows may no longer be supported as a client OS. That's the time to shift the image to something like QEMU.
I should point out I wasn't envisaging the idea of a VM disk image as long-term storage, more as a transport medium. If there's genuinely no other way to get the data into the VM to be used, then simply giving it a fake hard disk is the ideal method - use a more modern VM to save the data to the disk, shut that down and then present it to the VM that has the software you need.
I envisage the data itself being separate from the VMs themselves in all of this - the VMs should be small "access points". The disk image idea is just a way to get data into them temporarily.
So, to be clear, we have two parts to the solution - your storage, which you can do what you want with. Keep multiple copies, keep checking the medium is good (via md5sum or similar - see the sketch below), and so forth. And the access system, which is a VM you check once a year. And if it needs to be updated/transitioned to emulation, at least you know and can deal with that.
On a very large scale, this divides the work between two teams - a storage team that maintains the actual archives, and an apps team that maintains the access.
Of course, this is only if we want full fidelity. If we're OK with bad reformatting by a later version of the program, then we don't need the second team at all. :-)
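As a minimal sketch of the medium-checking half of that scheme (standard-library Python, SHA-256 rather than md5sum, and hypothetical paths), the storage team could build a manifest of digests once and re-verify it on whatever schedule the archive warrants:

    import hashlib
    import json
    import os
    import sys

    def hash_file(path, chunk_size=1 << 20):
        """Return the SHA-256 hex digest of a file, read in chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(archive_dir, manifest_path):
        """Record a digest for every file under archive_dir."""
        manifest = {}
        for root, _, files in os.walk(archive_dir):
            for name in files:
                path = os.path.join(root, name)
                manifest[os.path.relpath(path, archive_dir)] = hash_file(path)
        with open(manifest_path, "w") as f:
            json.dump(manifest, f, indent=2, sort_keys=True)

    def verify_manifest(archive_dir, manifest_path):
        """Report files that are missing or whose contents have changed."""
        with open(manifest_path) as f:
            manifest = json.load(f)
        ok = True
        for relpath, digest in sorted(manifest.items()):
            path = os.path.join(archive_dir, relpath)
            if not os.path.exists(path):
                print("MISSING:", relpath)
                ok = False
            elif hash_file(path) != digest:
                print("CORRUPT:", relpath)
                ok = False
        return ok

    if __name__ == "__main__":
        # Usage: archive_check.py build|verify <archive_dir> <manifest.json>
        cmd, archive_dir, manifest_path = sys.argv[1:4]
        if cmd == "build":
            build_manifest(archive_dir, manifest_path)
        else:
            sys.exit(0 if verify_manifest(archive_dir, manifest_path) else 1)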
Posted May 14, 2012 14:25 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Why? There's no innate reason for that. Emulators can be rewritten and/or forward ported. Besides, x86 is highly documented and known. I wouldn't be surprised if it would still be used in 1000 years.
Posted May 14, 2012 16:08 UTC (Mon)
by giraffedata (guest, #1954)
[Link]
Remember the parameters of the problem. We're not talking in this thread about what society could do here; we're talking about a strategy one person could use to make his data live forever. (If we branch out into the larger question, then we can consider things like making laws that people have to make emulators available to other people).
The fear is that people won't care enough about old documents to make the substantial investment in that forward porting. We see backward compatibility broken all the time, so it's a valid concern.
Given that, a QEMU platform is surely a better guess at something the next Windows will run on than a VirtualBox platform. (If VirtualBox VMs become far more common hosts of Windows than x86 hardware, the opposite will be true).
A system based on a chain of virtualization, which relies on there always being N-1 compatibility (the world will never switch to a new platform that can't run the previous one as a guest) also could work, but I think there's a good chance that compatibility chain will be broken in the natural course of things.
Posted May 14, 2012 19:41 UTC (Mon)
by rgmoore (✭ supporter ✭, #75)
[Link]
I disagree. As I see it, there are only two ways of preserving a file: in an editable form that's intended to be updated further or in an archival form that's intended to preserve the file as close to its existing form as possible. If you intend to edit the file further, you can't guarantee that you'll be able to preserve its existing formatting anyway, so you might as well migrate it to a modern, well documented format like ODF while trying to preserve the existing formatting as well as possible. If you're trying to preserve it as a finished, archival document, you're best off translating it into a format like Postscript or PDF that is properly designed to preserve formatting at the expense of being editable.
What you really don't want to do is to rely on a brittle solution like running old software in a VM. It may be able to preserve fidelity a little bit better than the alternatives, but that only works as long as you have a working VM. Going from perfect fidelity to nothing is not a graceful failure! Rather than worrying about maintaining a working VM indefinitely, you'd be much better off spending your effort on a virtual printer for your existing VM that would let you export all your documents to PDF.
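Where a modern free suite can still open the old files at all, that migration can be scripted. The sketch below takes a different route than a virtual printer: it assumes LibreOffice (with its soffice binary on the PATH) is installed and that the legacy documents sit in a hypothetical legacy-docs directory, and produces both an editable ODF copy and a fixed-layout PDF copy of each one.

    import pathlib
    import subprocess

    # Hypothetical locations; adjust to taste.
    LEGACY_DIR = pathlib.Path("legacy-docs")
    ARCHIVE_DIR = pathlib.Path("archive")

    def convert_all(target_format):
        """Convert every legacy .doc file to target_format ("odt" or "pdf")
        using LibreOffice's headless converter."""
        outdir = ARCHIVE_DIR / target_format
        outdir.mkdir(parents=True, exist_ok=True)
        for doc in sorted(LEGACY_DIR.glob("*.doc")):
            subprocess.run(
                ["soffice", "--headless", "--convert-to", target_format,
                 "--outdir", str(outdir), str(doc)],
                check=True)

    if __name__ == "__main__":
        convert_all("odt")   # editable copies in an open format
        convert_all("pdf")   # fixed-layout archival copies

How faithful the result is depends entirely on how well the importer handles the old format, which is rather the point of the whole discussion.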
Posted May 10, 2012 10:48 UTC (Thu)
by stevan (guest, #4342)
[Link]
I'm not complaining - I think both free software and free and open formats are by no means lacking, and intellectually they are the way to go, but when it comes to applying them it's quite difficult to think practically into the longer term.
And when I say "we," above, I mean "I," as many of these nuances are lost on users of the Archive. It is necessary to have a story to tell to explain why docx gets a swift blow from the digital lead piping when it reaches the Archive.
S
Posted May 10, 2012 11:45 UTC (Thu)
by robbe (guest, #16131)
[Link]
That's where DRM comes in. Since it is basically an arms race, there is always motivation to crank out new schemes and formats.
The most problematic restrictions management requires an online server to open a document. When this server inevitably goes away, the only hope of future historians is that they can easily crack our puny crypto on their (quantum?) computers.
Posted May 10, 2012 14:13 UTC (Thu)
by nsheed (subscriber, #5151)
[Link]
As an example, TIFF of all things has proved to be an ongoing source of pain due to the joys of a) odd JPEG usage in older files, b) explicitly allowing for vendor extensions in the specification (annotations & highlighting that appear/disappear depending on the viewing app).
So far most of the issues of this type are work-aroundable; the issue is that every time we hit a new scenario it takes time to investigate, find workarounds (if possible), or go back to the source (again if possible - discovery of an issue may be months or years after file creation).
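One way to catch those surprises early is to look at the tags a TIFF actually carries before it goes into the archive. The sketch below assumes the Pillow library is installed and uses a hypothetical file name; tag IDs of 32768 and above are reserved for private (vendor) use by the TIFF 6.0 specification, so anything in that range deserves a closer look.

    import sys

    from PIL import Image, TiffTags

    PRIVATE_TAG_START = 32768  # TIFF 6.0 reserves tag IDs >= 32768 for private use

    def inspect_tiff(path):
        """List the tags on the first page of a TIFF and flag the private
        (vendor-specific) ones that a generic viewer may silently ignore."""
        with Image.open(path) as im:
            for tag_id, value in sorted(im.tag_v2.items()):
                name = TiffTags.lookup(tag_id).name
                kind = "PRIVATE" if tag_id >= PRIVATE_TAG_START else "standard"
                summary = repr(value)
                if len(summary) > 60:
                    summary = summary[:57] + "..."
                print("%5d  %-8s  %-25s  %s" % (tag_id, kind, name, summary))

    if __name__ == "__main__":
        inspect_tiff(sys.argv[1] if len(sys.argv) > 1 else "scanned-page.tif")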
Posted May 10, 2012 20:47 UTC (Thu)
by roblatham (guest, #1579)
[Link]
In the end, I installed Debian sarge in a chroot (thanks archive.debian.org!) so I could run GnuCash 1.8 without building the 20 little dependencies.
Posted May 11, 2012 4:03 UTC (Fri)
by ringerc (subscriber, #3071)
[Link] (1 responses)
We have a vast library of material in QuarkXPress 3.3 and QuarkXPress 4.0 format. Quark has never been what you'd call an "open" company; this is, after all, the company whose CEO has said that "all customers are liars, thieves and bastards".
Quark upgrades are expensive. Old versions of Quark don't work on newer OSes, and Quark doesn't fix even simple bugs in old versions. New versions of Quark don't import documents from old versions all that reliably, especially where things like font format changes are involved. More importantly, if you move to a non-Quark product, you lose access to all your historical work, because you have to keep on paying Quark upgrade fees to retain access to it on updated systems.
We landed up keeping an old Mac around to open Quark docs, and another slightly-less-old machine that has an importer plugin for InDesign that lets us open old Quark docs, convert them to a slightly less old InDesign format, save that, and open it in our current versions of InDesign.
Of course, InDesign has exactly the same problems as Quark; it's a locked down format under Adobe's total control. The problem continues to grow as we produce more work.
While everything is in PDF format too, that's not much good if we need to edit it - and there simply are no good open standard desktop publishing formats. OpenDocument is very poorly suited to DTP's layout-oriented approach, detailed typography, etc. Scribus's format isn't specified formally, is painful to work with, evolves continuously, and may as well be closed because nothing else supports it. There isn't anything else out there.
My point: Sometimes we'd like to avoid closed formats, but there aren't any alternatives to choose. The newspaper's technical debt keeps on growing, and there's not much I can do about it, as we're way too small to have the resources to create a competing format and support for it.
Posted May 11, 2012 10:37 UTC (Fri)
by ebirdie (guest, #512)
[Link]
However, Quark files are an exception. The end product of publishing still finds its way to paper many times; on paper it gets distributed and can't be held in such a stranglehold of file formats, DRM, cloud services, etc. as digital information can. As the famous phrase goes, "information wants to be free", but it is still too easy to forget that the freedom has independence coded into it. And here everyone knows what free code is, but it seems like it hasn't yet produced information as free as paper still does - although there are arguably more restrictions printed on paper nowadays.
Seeing the current challenges and future threats in maintaining free information, I'm glad that paper as a medium was invented first. At least paper offers some reference point. It seems to be good business to reinvent everything digitally, so digital information is doomed. It is much cheaper and less effort for me to give space to my books and carry them while moving (except I'll do everything in my power not to move anymore) than to do the work required to keep digital information usable and accessible.
Posted May 12, 2012 15:59 UTC (Sat)
by jengelh (guest, #33263)
[Link]
Sometimes we would be glad if Facebook, Google, et al lost all their user profiling data about us in an instant because of that.
Posted May 18, 2012 1:57 UTC (Fri)
by steffen780 (guest, #68142)
[Link] (3 responses)
Alternatively, could we get an official ok for using a script/tool to (slowly) run through the archives and download everything? Feels naughty to just download it all.
Posted May 18, 2012 2:26 UTC (Fri)
by apoelstra (subscriber, #75205)
[Link]
Even more bonus points if it's under a free or CC license (I say "or" because for this I'd consider NC-ND perfectly acceptable, though I'm not a huge fan of that one).
Posted May 18, 2012 13:43 UTC (Fri)
by corbet (editor, #1)
[Link] (1 responses)
Please don't play "download the whole LWN site." We have enough people doing that as it is for no real reason that I can figure out.
We have various schemes for improving access to the archives. A lot of things are on hold at the moment, unfortunately, but stay tuned, we'll get there.
Posted May 18, 2012 14:52 UTC (Fri)
by jackb (guest, #41909)
[Link]
We have enough people doing that as it is for no real reason that I can figure out.
It may be related to brand management businesses.