Accessible? Yeah, right.
Posted Mar 25, 2009 10:24 UTC (Wed) by khim (subscriber, #9252)
In reply to: Huh? by pboddie
Parent article: Stallman: the JavaScript trap
Frequently, document formats can be converted to purely textual formats and remain accessible.
Have you actually tried to convert a document with a lot of rich data (tables, graphs, etc.) to a textual format? It's about as legible as text on websites with JavaScript turned off: you can decipher the content... sometimes... if you are lucky... But in general it's often illegible.
I haven't studied PDF in depth, but it would appear to be a lot more like a genuine document format, despite various programmatic extensions for things like form filling, than PostScript.
That was true some time ago. Recent versions include an ECMAScript interpreter, and you can do a lot with it... actually, some tools already use this capability. And a lot of texts are only available in PostScript form.
Sure, the text in a PostScript document is "in there somewhere", but you don't really want to be given the job of writing a program to get at it.
It's not even necessarily true. Have you tried to work with a PDF created from TeX not via pdftex, but via the "good old" tex->dvi->ps->pdf way? It's a mess. The DVI to PS conversion turns "Hello world!" into something like !"##$%&$'#() and there is no easy way to get legible text back (it's done not out of malice but because it was easier: to reduce the size of the PS file, dvips creates special fonts with glyphs: the first letter in the document is put at " ", the next one at "!" and so on, and when all 95 positions in the first font are filled, a second one is started). Thus the resulting PS (and then PDF) can be viewed and printed, but that's it: you cannot easily pull the text back out... It's easy for any decent cryptanalyst, but not for a normal person...
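To make the remapping concrete, here is a minimal Python sketch of the scheme as described above. It is not dvips's actual code: the function name `reencode` and its `first_slot` parameter are made up for illustration. The starting slot is a parameter because the prose says numbering starts at " ", while the sample string only comes out as shown if numbering starts at "!".

```python
LAST_PRINTABLE = 0x7E  # "~", the last printable ASCII character


def reencode(text, first_slot=0x20):
    """Give each distinct character the next free slot, in order of first use.

    Hypothetical illustration only, not how dvips is implemented.
    Returns (encoded, table): encoded is a list of (font_number, slot_char)
    pairs, and table maps each original character to its pair.
    """
    # 95 slots per font when numbering starts at " " (0x20..0x7E).
    slots_per_font = LAST_PRINTABLE - first_slot + 1
    table = {}
    encoded = []
    for ch in text:
        if ch not in table:
            # When one font is full, continue in the next one.
            font, offset = divmod(len(table), slots_per_font)
            table[ch] = (font, chr(first_slot + offset))
        encoded.append(table[ch])
    return encoded, table


encoded, table = reencode("Hello world!", first_slot=0x21)
print("".join(slot for _font, slot in encoded))  # prints: !"##$%&$'#()
```

The slot a character gets depends only on where it first appears in the document, so the same letter maps to different codes in different files. Without the table, what is left is effectively a substitution cipher: repeated letters still repeat, which is why it is tractable for a cryptanalyst but useless for copy and paste.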
In contrast, HTML documents should generally preserve the accessibility of their content.
Yeah, that was the idea behind HTML. But, like PDF, HTML evolves, and this idea is in the past. Today HTML is treated like a "new PostScript": you have the original version of the content somewhere, but what the site actually serves is not an easily parseable document but more like an opaque program for the web browser...
Try saving the page source in a JavaScript-intense application - you won't get anything meaningful, even though getting the content being shown is a legitimate thing to do. That's why the ability to control and modify the code has become an important and desirable thing to do.
You lost me at the last step. Why is this ability not important and desirable for PostScript and PDF, but suddenly important and desirable for HTML? If HTML is "a new PostScript" then it should be treated as such: demand content in easy to use and understand formats (like ODS or even "simple HTML" with just a few markup tags), don't try to turn the sausage back into a cow...
