By Jake Edge
May 9, 2012
The Economist is concerned that our
"digital heritage" may be lost because the formats (or media) may be
unreadable in, say, 20 years' time. The problem is complicated by digital
rights management (DRM), of course, and the magazine is spot on with
suggestions that circumventing those restrictions is needed to protect that
heritage. But in calls for more regulation (not a usual Economist
stance) the magazine misses one of the most important ways that digital
formats can be future-proofed: free and open data standards.
DRM is certainly a problem, but a bigger problem may well be the formats
in which much digital data is stored. The vast majority of that data
is not stored in DRM-encumbered formats; it is, instead, stored in "secret" data
formats.
Proprietary software vendors are
rather fond of creating their own formats, updating them with some
frequency, and allowing older versions to (surprise!) become unsupported.
If users of those formats are not paying attention, documents and other
data from just a few years ago can sometimes become unreadable.
There are few advantages to users from closed formats, but there are
several for the vendors involved, of course. Lock-in and the income stream
from what become "forced" upgrades are two of the biggest reasons that
vendors continue with their "secret sauce" formats. But it is rather
surprising that users, businesses and governments in particular, haven't
rebelled. How did we get to a point where we will pay for the "privilege"
of having a vendor take our data and lock it up such that we have to pay
them, again and again, to access it?
There is a cost associated with documenting a data format, so the
proprietary vendors would undoubtedly cite that as leading to higher
purchase prices. But that's largely disingenuous. In many cases, there
are existing formats (e.g. ODF, PNG, SVG, HTML, EPUB, ...) that could be
used, or new ones that
could be developed. The easiest way to "document" a format is to release
code—not binaries—that can read it, but that defeats
much of the purpose for using the
proprietary formats in the first place so it's not something that most
vendors are willing to do.
Obviously, free software fits the bill nicely here. Not only is code
available to read the format, but the code that writes the format is there
as well. While documentation that specifies all of the different values,
flags, corner cases, and so on, would be welcome, being able to look at the
code that actually does the work will ensure that data saved in that format
can be read for years (centuries?) to come. As long as the bits that make
up the data can be retrieved from the storage medium and quantum
computers running Ubuntu 37.04 ("Magnificent Mastodon") can still be
programmed, the data will still be accessible. There may even be a few
C/C++ programmers still around who can be lured out of retirement to help—if they aren't all busy solving the 2038 problem, anyway.
More seriously, though, maintaining access to digital data will require
some attention. Storage device technology continues to evolve, and there
are limits on the lifetime of the media itself. CDs, DVDs, hard drives,
tapes, flash, and so on all will need refreshing from time to time. Moving
archives from one medium to another is costly enough; why add potentially
lossy format conversions and the cost of upgrading software to read the data,
if said software is even still available?
Proprietary vendors come and go; their formats right along with them.
Trying to read a Microsoft Word document from 20 years ago is likely to be
an exercise in frustration, but trying to read a Windows 3.0 WordStar
document will be far worse. There are ways to do so, of course, but they
are painful—if one can even track down a 3.5" floppy drive (not to
mention 5.25"). If the original software is still available somewhere
(e.g. eBay, backup floppies, ...) then it may be possible to use emulators
to run the original program, but that still may not help with getting the
data into a supported format.
Amusingly, free software often supports older formats far longer than the
vendors do. While the results are often imperfect, reverse engineering
proprietary data formats is a time-honored tradition in our communities.
Once that's been done, there's little reason not to keep supporting the old
format. That's not to say that older formats don't fall off the list at
times, but the code is still out there for those who need it.
As internet services come and go, there will also be issues with preserving
data from those sources. Much of it is stored in free software
databases, though that may make little difference if there is no access to
the raw data. In addition, the database schema and how it relates articles,
comments, status updates, wall postings, and so on, is probably not
available either.
If some day Facebook, Google+, Twitter, Picasa, or any of the other
proprietary services goes away—perhaps with little or no
warning—that data may well be lost to the ages too. Some might argue
that the majority of it should be lost, but some of it certainly
qualifies
as part of our digital heritage.
Beyond the social networks and their ilk, there are a huge number of news
and information sites with relevant data locked away on their servers.
Data from things like the New York Times (or Wall Street Journal),
Boing Boing and other blogs, the article from The Economist linked above, the
articles and comments here at LWN, and thousands (perhaps millions) more,
are all things that one might like to preserve. The Internet Archive can only do so much.
Solutions for data from internet sites are tricky, since the data is
closely
held by the services and there are serious privacy considerations for some
of it. But some way to archive some of that data is needed. By the time
the service or site itself is on the ropes, it may well be too late.
Users should think long and hard before they lock up their long-term data
in closed formats. While yesterday's email may not be all that important
(maybe), that unfinished novel, last will and testament, or financial
records from the 80s may well be. Beyond that, shareholders and taxpayers
should be pressuring businesses and governments to store their documents in
open formats. In the best-case scenario, it will just cost more money to
deal with old, closed-format data; in the worst case, after enough time
passes, there may be no economically plausible way to retrieve it. That is
something worth avoiding.
By Nathan Willis
May 9, 2012
The Tizen Project has considerable technical history on its side, as
it is the successor to the well-known Moblin, MeeGo, and
LiMo projects. Yet in a way that pedigree also works against it, as the project makes
its pitch to third-party application developers who have seen the aforementioned
predecessors come and go — sometimes first-hand.
At the first Tizen Developer Conference in San Francisco, the project
worked hard to establish its "developer story" — in particular
highlighting the broader support from industry players and the
stability of HTML5 and related open web specifications as a
development platform.
The industry
In Tuesday's keynote sessions, Intel's Imad Sousou and Samsung's
J.D. Choi took a quick tour through the platform as exposed to
application developers (a detailed examination was reserved for the
break-out sessions); the project defines a Web API that
uses the World Wide Web Consortium (W3C)'s packaging and configuration format, and "custom" APIs
for accessing contact data, NFC, Bluetooth, and other subsystems. They then
went deeper into three specific areas of the stack: security,
animation, and connection management.
The security framework is based
on Smack, which Sousou described as being preferable to other Linux
alternatives that required "setting up 8,000 policy
files." The platform also provides integrity protection by
checking application signatures at install time, and isolates each
application in its own process (although he did not go into specifics,
Sousou described the setup as less complicated than the
"draconian" measures taken by other platforms).
The animation framework is based on OpenGL ES and the Emotion scene
graph library provided by the Enlightenment Foundation Libraries (EFL),
LiMo's underlying application framework. Connection management is
handled by ConnMan, which Sousou announced had finally been declared
1.0. The project has worked on reducing ConnMan's overhead over the
past three years, specifically for mobile devices, where the typical
2-3 second DHCP configuration time is a deal-breaker for users. The
enhanced ConnMan now performs DHCP setup in milliseconds.
Several points in Sousou and Choi's talk about the architecture drew
contrasts with other mobile platforms — primarily Android and
the latest BlackBerry offering. The point they made was that Tizen is
open to input on the design from anyone willing to join the project
and contribute — which is hardly the case, they suggested, for Android.
They also used their time to discuss the distinction
between the Tizen Project and the Tizen Association. The project is
the actual open source software project, which is led by a technical
steering group (headed by Sousou and Choi), and at this stage largely
developed by full-time employees from the two companies, plus smaller
partners. In contrast, the Tizen Association is the marketing group
that works to sell Tizen as a solution to OEM device makers, carriers,
third-party application vendors, and any other industry customers. In addition to marketing the project to industry players, though, the Association also attempts to gather their requirements for an OS platform.
The next keynote was presented by Kiyohito Nagata, chairman of the
Tizen Association. Nagata is also senior vice-president of NTT
Docomo, Japan's largest wireless carrier. He talked about Docomo's
research into what users demand of smartphone devices, making the case that
Tizen offers carriers the flexibility to implement their own
application stores and custom services — across a range of
devices. Again, this aspect of Tizen was placed in contrast to
the competition.
Nagata ended his talk by discussing the board membership of the Tizen
Association, which includes other large mobile phone carriers —
notably Orange, Telefónica, SK Telecom, and Sprint. Tizen is
marketing itself as a cross-device platform, serving in-vehicle
systems (IVI), set-top boxes, tablets, and smartphones. That list is
identical to MeeGo's target platforms, of course, but, as with MeeGo, the
vast majority of the talk centered on handsets, including
the keynotes and the current work of the Tizen Association.
The web
Buy-in from mobile carriers is a plus, but third-party applications
are what those carriers are interested in attracting in order to make
their plans appealing. Tizen's case as a development platform comes
down to its HTML5-based API, which was the subject of numerous
breakout sessions at the conference: from the overall API to specific
components (e.g., graphics, I/O, NFC, and Bluetooth).
Intel's Sakari Poussa and Samsung's Taehee Lee led a breakout session that covered the overall Web API suite. As we covered
when we looked at the SDK in January, a significant chunk of the Web API is drawn
from existing work spearheaded by the W3C. But there are other APIs,
some exploring ways to expose mobile device functionality to web
applications (for example, the ability to lock the screen rotation
into landscape mode, which is reportedly of interest to game
developers), others defining new general-purpose functionality like mapping-and-routing. The Tizen APIs also cover system-maintenance tasks, such as application installation, update and removal, and creating and
managing user accounts for online services.
The bigger news, however, was Sousou's announcement that the Tizen
project is working with the W3C to develop these "missing piece" APIs
into general standards. The project wants them to be standard APIs,
not "Tizen APIs," he said. In particular, Tizen is part
of the W3C's new Core Mobile Web Platform Group, and
Tizen is committed to adhering to the standard, whatever decisions the
working group makes.
Of course, standards are just words, and many developers have
heard the "write once, run anywhere" song multiple times. The "Advanced HTML5 Features" session dealt with that
question specifically, arguing that the web has always been a
fragmented platform, but that web development has evolved to cope with
varying implementation details on desktop browsers, and has done so
better than most other development platforms.
If that seems like a mild assurance, Facebook's head of mobile
developer relations James Pearce was on hand to offer a more concrete
testing tool, the company's new compliance tester RingMark. RingMark defines three levels (or,
to be more precise, "rings") of compatibility. Ring 0 covers the
status quo of existing W3C device APIs. Ring 1 covers
"aspirational" extensions to Ring 0, including
audio/video and other high-performance tasks that are currently the
domain of native APIs on most platforms. Ring 2 covers the
still-in-development suite of web APIs for the future, such as WebGL.
Attendees in several of the sessions I sat in on expressed interest in
Tizen's compliance program. Although Tizen so far has no formal
compliance plan, it was made clear that compliance will be assessed
based on a product's adherence to the API. That makes for a stark
contrast against MeeGo, which demanded specific versions of specific
libraries and Linux system components — a requirements set that
ultimately proved too arduous for even MeeGo co-founder Nokia to pass
with its N9 phone.
The future
The project, then, is making its case as an HTML5-based development
platform; the next question is how it will be received by the developer community. One independent developer I talked to (who requested anonymity) expressed his doubts that HTML5 scales up to industrial devices and serious applications; he cited medical tablets among other possible upscale device classes. Most of the speakers addressed JavaScript performance and latency as points needing work in HTML5 applications, although as you might expect, most also said they were pleased with Tizen's performance.
There were a handful of companies present who are already developing applications on Tizen. Cell phone carrier Orange was among them, and presented a session on its experiences. The team from Orange has deployed HTML5 applications for news, movie ticket offers, and streaming TV, and has built enhanced user-information tools, integrating items like data and SMS counters into the phone UI.
Tizen's community manager Dawn Foster dealt with the outreach question in her state-of-the-community talk on Tuesday. In brief, the Tizen community at the moment is small; considerably smaller than the MeeGo community was, with fewer volunteer contributors joining the paid developers from Intel and Samsung. But that is to be expected, she said, primarily because it is hard to build excitement about a platform before consumer devices are available. On that front, she added, Tizen is
trying to take a different approach, by underplaying the hype of the platform and "letting the code lead." Likewise, while MeeGo established a complicated working group structure at the outset, well before any code was delivered, Tizen's project structure is intentionally loose at this stage.
Perhaps that "release-first" strategy will also help deal with the other hurdle facing Tizen: developer burnout among veterans of the earlier projects in Tizen's lineage. Fundamentally, burnout with platform-switching may be one of the reasons Tizen is pressing so hard on the HTML5 front at the moment. Whatever else developers may think of HTML5, it is at least a platform-neutral approach to application development. The keynotes talked of more options still to come
in the Tizen 2.0 release currently scheduled for the end of 2012 — for example, the Emotion animation framework mentioned by Choi. But at least for now, HTML5 and the web APIs remain the sole story for application developers.
Intel and Samsung are both ramping up their outreach to those
developers. Intel is running an application developer contest, while
Samsung distributed mobile developer devices to registered attendees.
Foster also highlighted two tools to develop HTML5 applications that are
designed to be lighter-weight than the full Tizen SDK: the Rapid Interface
Builder (RIB) and Web
Simulator. The contest runs until August — which is plenty of
time for developers to explore the code base. As of May 9, however, there
had still not been any consumer device announcements.
It is understandable that independent developers might be wary of Tizen given how recently they were being told about MeeGo. Ultimately no trick can undo that wariness; the only remedy will be to see the project grow in its own right and earn its own place. There are some key differences already: fairly or not, MeeGo was always perceived largely as a Nokia-only party without much connection to the all-important phone carrier industry, while Tizen has a longer list of mobile partners on board. MeeGo also presented potential contributors with a top-heavy compliance process and byzantine project structure, all well before there was any code to examine. With Tizen, however a developer feels about the commercial parties behind the scenes, there is code to see, and an API that exists outside the project itself, both of which are in the "plus" column.
[ The author would like to thank the Tizen
project and the Linux Foundation for support to attend the conference. ]
By Jonathan Corbet
May 8, 2012
Attentive long-time readers of LWN may remember that this business is based
entirely on free software with one distressing exception: our business
accounting is still done using the proprietary "QuickBooks Pro" package.
QuickBooks does not lack for aggravations, but the task of replacing it has
never quite attained a high enough priority for something to actually
happen. Good replacements in the free
software community are hard to come by, accounting is boring, our
accountant deals easily (and cheaply) with QuickBooks files, and the
existing solution, for the most part, simply works. Or, at least, it
used
to simply work.
The monthly accounting ritual involves importing a lot of data from the web
site into the accounting application; in particular, subscription sales
need to be properly fed in so that we can minimize our taxes on the income
in the proper American tradition. This process normally works just fine,
but, recently, it failed, saying: "Cannot import, not enough disk space or
too many records exist." Naturally, in QuickBooks style, it failed partway
through the import process, leaving a corrupted accounting file behind.
But QuickBooks users usually learn to make backups frequently and can take such
things in stride.
The inability to feed data into the system is a little harder to take in
stride, though, especially once some investigation proved that disk space
is not in short supply and the failure is elsewhere. It didn't take much
time searching to turn up an interesting, unadvertised QuickBooks
antifeature: there is a software-imposed limit of 14,500 "list items,"
which include products offered by the company, vendors, customers, and
more. Once that
limit is hit, QuickBooks will not allow any more items to be entered; the
only supported way out is to upgrade to the "enterprise" version, which can
currently be done for a special offer price of only $2400.
In other words: Intuit sells a program that is intended to become an
integral part of a business's core processes, perhaps even functioning as a
point-of-sale system. This program will, without warning, simply cease to
function once the business accumulates an arbitrary number of entries. The
only way for that business to get a working accounting system back is to
"upgrade" to a new version that costs ten times as much. One can only
conclude that this proprietary software package has not been written with
its users' needs as the top priority. Instead, it contains a hidden
trap to force them into more expensive offerings at a time when they may
have little alternative. Who would have ever thought proprietary programs
could be that way?
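The failure described above, an import that dies partway through once a hard cap is reached, is the kind of thing a pre-flight check can catch before any data is written. What follows is only a minimal sketch: the 14,500 figure is the limit cited above, while the function names and the idea of counting items before importing are illustrative assumptions, not anything provided by QuickBooks or used in LWN's own import scripts.

```python
LIST_ITEM_LIMIT = 14_500  # limit cited in the article; not read from any Intuit API


def can_import(existing_items: int, new_items: int,
               limit: int = LIST_ITEM_LIMIT) -> bool:
    """Return True if the whole batch fits under the list-item limit."""
    if existing_items < 0 or new_items < 0:
        raise ValueError("item counts must be non-negative")
    return existing_items + new_items <= limit


def headroom(existing_items: int, limit: int = LIST_ITEM_LIMIT) -> int:
    """Number of list items that can still be added before the limit is hit."""
    return max(limit - existing_items, 0)
```

Refusing the whole batch up front, rather than stopping in the middle, would at least leave the accounting file in a consistent state.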
Here at LWN,
we had no particularly urgent need to get things working again; other
businesses may well not have the luxury of enough time to find an
acceptable way out of this situation. It is, thus, unsurprising that there
are entire businesses being built
around this little surprise from Intuit.
Needless to say, there is little enthusiasm in the LWN head office for the
purchase of an expensive and proprietary "enterprise" accounting system.
In the short term, a workaround has been found: sacrifice most of our
accounting history to bring the record count to a level where the program
will consent to function as advertised. That has other interesting side
effects, like mysteriously changing the balances of reconciled accounts
from previous years, but it does take the immediate pressure off. For now,
we can continue to do our books.
But a clear message has been delivered here: it is about time that we at
LWN read some pages from our own publication and realize that a dependence
on proprietary software poses a real risk to our business. A company that
is willing to put one such hostile surprise into an important application
will put in others and, without the source, there is no way anybody can look
for them or remove them if they are found. QuickBooks is too risky to
continue to use.
It is, in other words, time to make the move to a free accounting program.
When we have looked at the available tools in the past, the results have
always been a little disappointing. There is no shortage of software that
can maintain a chart of accounts and a set of double-entry books. But
there has been, in the past, a relative scarcity of useful accounting tools
for small businesses. Instead, what's out there is:
- Various personal finance utilities, including GnuCash, KMyMoney,
and others. For basic accounting they work well, but they fall short
of a business's needs.
- Massive enterprise-oriented toolkits that can be used to build
systems implementing accounting, inventory-tracking, point-of-sale, customer
relationship management, supply-chain management, human resources,
and invoicing, with add-on modules for bill collection, weather
prediction, automated trading, and bread baking. These systems have
names like ADempiere, Compiere, OpenERP, LedgerSMB, and Apache OFBiz. The target users
for these projects appear to be consultants and businesses with
full-time people dedicated to keeping the system running. To a
business like LWN, they tend to look like a box with hundreds of
nearly identical parts and a little note saying "some assembly
required."
What is missing in the middle is a package for a business with no special
accounting needs, but which needs to be able to automate data entry,
generate tax forms at the end of the year, and interface with an
accountant so it can get its taxes done. Given how incredibly exciting
small-business accounting is, it's surprising that so few developers have
felt a burning need to scratch that particular itch. There is no
accounting for taste, it seems.
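To make "automate data entry" a little more concrete, here is a rough sketch of turning one subscription sale into a balanced pair of postings. Everything in it is an assumption made for illustration: the account names, the `Posting` type, and the sign convention (positive for debits, negative for credits) are hypothetical and are not taken from any of the packages named above.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Posting:
    account: str
    amount: Decimal  # positive = debit, negative = credit (convention assumed here)


def subscription_sale(amount: str,
                      cash_account: str = "Assets:Bank",
                      income_account: str = "Income:Subscriptions") -> List[Posting]:
    """Build the two postings for a single sale: debit cash, credit income."""
    value = Decimal(amount)
    if value <= 0:
        raise ValueError("sale amount must be positive")
    return [Posting(cash_account, value), Posting(income_account, -value)]


def is_balanced(postings: List[Posting]) -> bool:
    """A journal entry is balanced when its debits and credits sum to zero."""
    return sum((p.amount for p in postings), Decimal("0")) == 0
```

Amounts are kept as `Decimal` rather than floating point so that cents add up exactly; a real import would also need to handle taxes, refunds, and deferred income, none of which is attempted here.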
That said, it has been a few years since we last made a serious effort to
learn about free software accounting alternatives; clearly the time has
come for another pass. So we'll be doing it, with an eye toward,
hopefully, making the transition at the end of the calendar year. That
gives us several months to forget about the problem while still allowing a
few months of panic at the end, so the schedule should be plausible.
Stay tuned for updates; it should be an interesting ride. But we are
pretty well determined not to find out what other surprises our
proprietary accounting system may have in store for us. In 2012, it should
be possible to run a small, simple business on free software and never have
to wonder when the accounting system will stop functioning and demand more
money. We intend to prove it.
Page editor: Jonathan Corbet