Weekly Edition for May 10, 2012

Who owns your data?

By Jake Edge
May 9, 2012

The Economist is concerned that our "digital heritage" may be lost because the formats (or media) may be unreadable in, say, 20 years' time. The problem is complicated by digital rights management (DRM), of course, and the magazine is spot on with suggestions that circumventing those restrictions is needed to protect that heritage. But in calls for more regulation (not a usual Economist stance) the magazine misses one of the most important ways that digital formats can be future-proofed: free and open data standards.

DRM is certainly a problem, but a bigger problem may well be the formats that much of digital data is stored in. The vast majority of that data is not stored in DRM-encumbered formats; it is, instead, stored in "secret" data formats. Proprietary software vendors are rather fond of creating their own formats, updating them with some frequency, and allowing older versions to (surprise!) become unsupported. If users of those formats are not paying attention, documents and other data from just a few years ago can sometimes become unreadable.

There are few advantages to users from closed formats, but there are several for the vendors involved, of course. Lock-in and the income stream from what become "forced" upgrades are two of the biggest reasons that vendors continue with their "secret sauce" formats. But it is rather surprising that users, particularly businesses and governments, haven't rebelled. How did we get to a point where we will pay for the "privilege" of having a vendor take our data and lock it up such that we have to pay them, again and again, to access it?

There is a cost associated with documenting a data format, so the proprietary vendors would undoubtedly cite that as leading to higher purchase prices. But that's largely disingenuous. In many cases, there are existing formats (e.g. ODF, PNG, SVG, HTML, EPUB, ...) that could be used, or new ones that could be developed. The easiest way to "document" a format is to release code—not binaries—that can read it, but that defeats much of the purpose for using the proprietary formats in the first place so it's not something that most vendors are willing to do.

Obviously, free software fits the bill nicely here. Not only is code available to read the format, but the code that writes the format is there as well. While documentation that specifies all of the different values, flags, corner cases, and so on, would be welcome, being able to look at the code that actually does the work will ensure that data saved in that format can be read for years (centuries?) to come. As long as the bits that make up the data can be retrieved from the storage medium and that quantum computers running Ubuntu 37.04 ("Magnificent Mastodon") can still be programmed, the data will still be accessible. There may even be a few C/C++ programmers still around who can be lured out of retirement to help—if they aren't all busy solving the 2038 problem, anyway.
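To give a sense of how little code an openly documented format demands, here is a minimal Python sketch that walks the chunk structure of a PNG file, one of the open formats mentioned above. It follows the published PNG layout (an 8-byte signature, then chunks made of a length, a type, the data, and a CRC), but it only lists chunks and checks their CRCs; it does not decode image data and assumes a well-formed file. The `build_minimal_png()` helper is illustrative only and exists to produce test input.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_chunks(data: bytes) -> list:
    """Return (type, length, crc_ok) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = []
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the chunk type and data, not the length field.
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == crc
        chunks.append((ctype.decode("ascii"), length, crc_ok))
        pos += 12 + length
        if ctype == b"IEND":
            break
    return chunks


def _chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one chunk with its length prefix and CRC."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def build_minimal_png() -> bytes:
    """A 1x1, 8-bit grayscale image, used here only as test input."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, depth, color, ...
    idat = zlib.compress(b"\x00\x00")  # filter byte 0, then one pixel
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


if __name__ == "__main__":
    for name, length, ok in list_png_chunks(build_minimal_png()):
        print(name, length, "CRC ok" if ok else "CRC mismatch")
```

Because the layout is public, a reader like this can be rewritten from the specification decades from now, with or without the original software.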

More seriously, though, maintaining access to digital data will require some attention. Storage device technology continues to evolve, and there are limits on the lifetime of the media itself. CDs, DVDs, hard drives, tapes, flash, and so on will all need refreshing from time to time. Moving archives from one medium to another is costly enough; why add potentially lossy format conversions and the cost of upgrading software to read the data (if said software is even still available)?
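One common safeguard when refreshing media is a checksum manifest: record a hash of every file before the move, then verify the copy against it afterward. The Python sketch below assumes the archive is a plain directory tree and uses SHA-256 from the standard library; the function names are illustrative and are not taken from any particular preservation tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, bufsize: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], new_root: Path) -> list[str]:
    """Return the manifest entries that are missing or altered under new_root."""
    return [
        rel
        for rel, digest in manifest.items()
        if not (new_root / rel).is_file() or sha256_of(new_root / rel) != digest
    ]
```

An empty list from `verify_copy()` means every file arrived intact; keeping the manifest with the archive lets the same check be repeated at each later migration.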

Proprietary vendors come and go; their formats right along with them. Trying to read a Microsoft Word document from 20 years ago is likely to be an exercise in frustration, but trying to read a Windows 3.0 WordStar document will be far worse. There are ways to do so, of course, but they are painful, assuming one can even track down a 3.5" floppy drive (not to mention 5.25"). If the original software is still available somewhere (e.g. eBay, backup floppies, ...), then it may be possible to use emulators to run the original program, but that still may not help with getting the data into a supported format.

Amusingly, free software often supports older formats far longer than the vendors do. While the results are often imperfect, reverse engineering proprietary data formats is a time-honored tradition in our communities. Once that's been done, there's little reason not to keep supporting the old format. That's not to say that older formats don't fall off the list at times, but the code is still out there for those who need it.

As internet services come and go, there will also be issues with preserving data from those sources. Much of it is stored in free software databases, though that may make little difference if there is no access to the raw data. In addition, the database schema and how it relates articles, comments, status updates, wall postings, and so on, is probably not available either. If some day Facebook, Google+, Twitter, Picasa, or any of the other proprietary services goes away—perhaps with little or no warning—that data may well be lost to the ages too. Some might argue that the majority of it should be lost, but some of it certainly qualifies as part of our digital heritage.

Beyond the social networks and their ilk, there are a huge number of news and information sites with relevant data locked away on their servers. Data from things like the New York Times (or Wall Street Journal), Boing Boing and other blogs, the article from The Economist linked above, the articles and comments here at LWN, and thousands (perhaps millions) more, are all things that one might like to preserve. The Internet Archive can only do so much.

Solutions for data from internet sites are tricky, since the data is closely held by the services and there are serious privacy considerations for some of it. But some way to archive some of that data is needed. By the time the service or site itself is on the ropes, it may well be too late.

Users should think long and hard before they lock up their long-term data in closed formats. While yesterday's email may not be all that important (maybe), that unfinished novel, last will and testament, or financial records from the 80s may well be. Beyond that, shareholders and taxpayers should be pressuring businesses and governments to store their documents in open formats. In the best case scenario, it will just cost more money to deal with old, closed-format data; in the worst case, after enough time passes, there may be no economically plausible way to retrieve it. That is something worth avoiding.


TizenConf: Pitching HTML5 as a development framework

By Nathan Willis
May 9, 2012

The Tizen Project has considerable technical history on its side, as it is the successor to the well-known Moblin, MeeGo, and LiMo projects. Yet in a way that pedigree also works against it, as the project makes its pitch to third-party application developers who have seen the aforementioned predecessors come and go — sometimes first-hand. At the first Tizen Developer Conference in San Francisco, the project worked hard to establish its "developer story" — in particular highlighting the broader support from industry players and the stability of HTML5 and related open web specifications as a development platform.

The industry

In Tuesday's keynote sessions, Intel's Imad Sousou and Samsung's J.D. Choi took a quick tour through the platform as exposed to application developers (a detailed examination was reserved for the break-out sessions); the project defines a Web API that uses the World Wide Web Consortium's (W3C's) packaging and configuration format, and "custom" APIs for accessing contact data, NFC, Bluetooth, and other subsystems. They then went deeper into three specific areas of the stack: security, animation, and connection management.


The security framework is based on Smack, which Sousou described as being preferable to other Linux alternatives that required "setting up 8,000 policy files." The platform also provides integrity protection by checking application signatures at install time, and isolates each application in its own process (although he did not go into specifics, Sousou described the setup as less complicated than the "draconian" measures taken by other platforms).

The animation framework is based on OpenGL ES and the Emotion scene graph library provided by the Enlightenment Foundation Libraries (EFL), LiMo's underlying application framework. Connection management is handled by ConnMan, which Sousou announced had finally been declared 1.0. Over the past three years, the project has worked on reducing ConnMan's overhead, specifically for mobile devices, where the typical two-to-three-second DHCP configuration time is a deal-breaker for users. The enhanced ConnMan now performs DHCP setup in milliseconds.

Several points in Sousou and Choi's talk about the architecture drew contrasts with other mobile platforms, primarily Android and the latest BlackBerry offering. The point they made was that Tizen is open to input on the design from anyone willing to join the project and contribute, which is hardly the case, they suggested, for Android.

They also used their time to discuss the distinction between the Tizen Project and the Tizen Association. The project is the actual open source software project, which is led by a technical steering group (headed by Sousou and Choi), and at this stage largely developed by full-time employees from the two companies, plus smaller partners. In contrast, the Tizen Association is the marketing group that works to sell Tizen as a solution to OEM device makers, carriers, third-party application vendors, and any other industry customers. In addition to marketing the project to industry players, though, the Association also attempts to gather their requirements for an OS platform.

The next keynote was presented by Kiyohito Nagata, chairman of the Tizen Association. Nagata is also senior vice-president of NTT Docomo, Japan's largest wireless carrier. He talked about Docomo's research into what users demand from smartphones, making the case that Tizen offers carriers the flexibility to implement their own application stores and custom services across a range of devices. Again, this aspect of Tizen was placed in contrast to the competition.

Nagata ended his talk by discussing the board membership of the Tizen Association, which includes other large mobile phone carriers, notably Orange, Telefónica, SK Telecom, and Sprint. Tizen is marketing itself as a cross-device platform, serving in-vehicle infotainment (IVI) systems, set-top boxes, tablets, and smartphones. That list is identical to MeeGo's target platforms, of course, but, as with MeeGo, the vast majority of the talk centered on handsets, including the keynotes and the current work of the Tizen Association.

The web

Buy-in from mobile carriers is a plus, but third-party applications are what those carriers are interested in attracting in order to make their plans appealing. Tizen's case as a development platform comes down to its HTML5-based API, which was the subject of numerous breakout sessions at the conference: from the overall API to specific components (e.g., graphics, I/O, NFC, and Bluetooth).

Intel's Sakari Poussa and Samsung's Taehee Lee led a breakout session that covered the overall Web API suite. As we covered when we looked at the SDK in January, a significant chunk of the Web API is drawn from existing work spearheaded by the W3C. But there are other APIs, some exploring ways to expose mobile device functionality to web applications (for example, the ability to lock the screen rotation into landscape mode, which is reportedly of interest to game developers), others defining new general-purpose functionality like mapping-and-routing. The Tizen APIs also cover system-maintenance tasks, such as application installation, update and removal, and creating and managing user accounts for online services.

The bigger news, however, was Sousou's announcement that the Tizen project is working with the W3C to develop these "missing piece" APIs into general standards. The project wants them to be standard APIs, not "Tizen APIs," he said. In particular, Tizen is part of the W3C's new Core Mobile Web Platform Group, and Tizen is committed to adhering to the standard, whatever decisions the working group makes.

Of course, standards are just words, and many developers have heard the "write once, run anywhere" song multiple times. The "Advanced HTML5 Features" session dealt with that question specifically, arguing that the web has always been a fragmented platform, but that web development has evolved to cope with varying implementation details on desktop browsers, and has done so better than most other development platforms.

If that seems like a mild assurance, Facebook's head of mobile developer relations, James Pearce, was on hand to offer a more concrete testing tool: the company's new compliance tester, RingMark. RingMark defines three levels (or, to be more precise, "rings") of compatibility. Ring 0 covers the status quo of existing W3C device APIs; Ring 1 covers "aspirational" extensions to Ring 0, including audio/video and other high-performance tasks that are currently the domain of native APIs on most platforms; and Ring 2 covers the still-in-development suite of web APIs for the future, such as WebGL.

Attendees in several of the sessions I sat in on expressed interest in Tizen's compliance program. Although Tizen so far has no formal compliance plan, it was made clear that compliance will be assessed based on a product's adherence to the API. That makes for a stark contrast with MeeGo, which demanded specific versions of specific libraries and Linux system components: a requirements set that ultimately proved too arduous for even MeeGo co-founder Nokia to meet with its N9 phone.

The future

The project, then, is making its case as an HTML5-based development platform; the next question is how it will be received by the developer community. One independent developer I talked to (who requested anonymity) expressed his doubts that HTML5 scales up to industrial devices and serious applications; he cited medical tablets among other possible upscale device classes. Most of the speakers addressed JavaScript performance and latency as points needing work in HTML5 applications, although as you might expect, most also said they were pleased with Tizen's performance.

There were a handful of companies present who are already developing applications on Tizen. Cell phone carrier Orange was among them, and presented a session on its experiences. The team from Orange has deployed HTML5 applications for news, movie ticket offers, and streaming TV, and has built enhanced user-information tools, integrating items like data and SMS counters into the phone UI.

Tizen's community manager Dawn Foster dealt with the outreach question in her state-of-the-community talk on Tuesday. In brief, the Tizen community at the moment is small, considerably smaller than the MeeGo community was, with fewer volunteer contributors joining the paid developers from Intel and Samsung. But that is to be expected, she said, primarily because it is hard to build excitement about a platform before consumer devices are available. On that front, she added, Tizen is trying to take a different approach, by underplaying the hype of the platform and "letting the code lead." Likewise, while MeeGo established a complicated working group structure at the outset, well before any code was delivered, Tizen's project structure is intentionally loose at this stage.

Perhaps that "release-first" strategy will also help with the other hurdle facing Tizen: developer burnout among veterans of the earlier projects in Tizen's lineage. Fundamentally, burnout with platform-switching may be one of the reasons Tizen is pressing so hard on the HTML5 front at the moment. Whatever else developers may think of HTML5, it is at least a platform-neutral approach to application development. The keynotes talked of more options still to come in the Tizen 2.0 release, currently scheduled for the end of 2012 (for example, the Emotion animation framework mentioned by Choi). But at least for now, HTML5 and the web APIs remain the sole story for application developers.

Intel and Samsung are both ramping up their outreach to those developers. Intel is running an application developer contest, while Samsung distributed mobile developer devices to registered attendees. Foster also highlighted two tools to develop HTML5 applications that are designed to be lighter-weight than the full Tizen SDK: the Rapid Interface Builder (RIB) and Web Simulator. The contest runs until August — which is plenty of time for developers to explore the code base. As of May 9, however, there had still not been any consumer device announcements.

It is understandable that independent developers might be wary of Tizen given how recently they were being told about MeeGo. Ultimately no trick can undo that wariness; the only remedy will be to see the project grow in its own right and earn its own place. There are some key differences already — fairly or not, MeeGo was always perceived largely as a Nokia-only party without much connection to the all-important phone carrier industry, while Tizen has a longer list of mobile partners on board. MeeGo also presented potential contributors with a top-heavy compliance process and byzantine project structure, all well before there was any code to examine. With Tizen, however a developer feels about the commercial parties behind the scenes, there is code to see, and an API that exists outside the project itself; both of which are in the "plus" column.

[ The author would like to thank the Tizen project and the Linux Foundation for support to attend the conference. ]


Accounting systems: a rant and a quest

By Jonathan Corbet
May 8, 2012
Attentive long-time readers of LWN may remember that this business is based entirely on free software with one distressing exception: our business accounting is still done using the proprietary "QuickBooks Pro" package. QuickBooks does not lack for aggravations, but the task of replacing it has never quite attained a high enough priority for something to actually happen. Good replacements in the free software community are hard to come by, accounting is boring, our accountant deals easily (and cheaply) with QuickBooks files, and the existing solution, for the most part, simply works. Or, at least, it used to simply work.

The monthly accounting ritual involves importing a lot of data from the web site into the accounting application; in particular, subscription sales need to be properly fed in so that we can minimize our taxes on the income in the proper American tradition. This process normally works just fine, but, recently, it failed, saying: "Cannot import, not enough disk space or too many records exist." Naturally, in QuickBooks style, it failed partway through the import process, leaving a corrupted accounting file behind. But QuickBooks users usually learn to make backups frequently and can take such things in stride.

The inability to feed data into the system is a little harder to take in stride, though, especially once some investigation proved that disk space is not in short supply and the failure is elsewhere. It didn't take much time searching to turn up an interesting, unadvertised QuickBooks antifeature: there is a software-imposed limit of 14,500 "list items," which include products offered by the company, vendors, customers, and more. Once that limit is hit, QuickBooks will not allow any more items to be entered; the only supported way out is to upgrade to the "enterprise" version, which can currently be done for a special offer price of only $2400.
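QuickBooks internals are not public, so the following is only a hypothetical sketch of the import behavior one would hope for: check a record cap before writing anything, and run the whole import as a single transaction so that a failure leaves the file untouched rather than half-imported. It uses Python's built-in `sqlite3` module; the `list_items` table, `open_books()`, and `ImportRejected` are invented for illustration, and the 14,500 figure is simply the limit described above.

```python
from __future__ import annotations

import sqlite3

LIST_ITEM_LIMIT = 14_500  # the cap described in the article, used here as a plain constant


class ImportRejected(Exception):
    """Raised before any row is written if the import would exceed the limit."""


def open_books(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical item database with a uniqueness constraint."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS list_items "
                 "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    return conn


def import_items(conn: sqlite3.Connection, names: list[str],
                 limit: int = LIST_ITEM_LIMIT) -> int:
    """Insert every name or none of them; return the resulting item count."""
    # Used as a context manager, a sqlite3 connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        (current,) = conn.execute("SELECT COUNT(*) FROM list_items").fetchone()
        if current + len(names) > limit:
            raise ImportRejected(
                f"{current} existing + {len(names)} new items exceeds limit of {limit}")
        conn.executemany("INSERT INTO list_items (name) VALUES (?)",
                         [(n,) for n in names])
        (total,) = conn.execute("SELECT COUNT(*) FROM list_items").fetchone()
    return total
```

Both failure modes (hitting the cap, or a bad record partway through) leave the item count exactly where it was before the import started, and the error names the real cause instead of blaming disk space.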

In other words: Intuit sells a program that is intended to become an integral part of a business's core processes, perhaps even functioning as a point-of-sale system. This program will, without warning, simply cease to function once the business accumulates an arbitrary number of entries. The only way for that business to get a working accounting system back is to "upgrade" to a new version that costs ten times as much. One can only conclude that this proprietary software package has not been written with its users' needs as the top priority. Instead, it contains a hidden trap to force them into more expensive offerings at a time when they may have little alternative. Who would have ever thought proprietary programs could be that way?

Here at LWN, we had no particularly urgent need to get things working again; other businesses may well not have the luxury of enough time to find an acceptable way out of this situation. It is, thus, unsurprising that there are entire businesses being built around this little surprise from Intuit. Needless to say, there is little enthusiasm in the LWN head office for the purchase of an expensive and proprietary "enterprise" accounting system. In the short term, a workaround has been found: sacrifice most of our accounting history to bring the record count to a level where the program will consent to function as advertised. That has other interesting side effects, like mysteriously changing the balances of reconciled accounts from previous years, but it does take the immediate pressure off. For now, we can continue to do our books.

But a clear message has been delivered here: it is about time that we at LWN read some pages from our own publication and realize that a dependence on proprietary software poses a real risk to our business. A company that is willing to put one such hostile surprise into an important application will put in others and, without the source, there is no way anybody can look for them or remove them if they are found. QuickBooks is too risky to continue to use.

It is, in other words, time to make the move to a free accounting program.

When we have looked at the available tools in the past, the results have always been a little disappointing. There is no shortage of software that can maintain a chart of accounts and a set of double-entry books. But there has been, in the past, a relative scarcity of useful accounting tools for small businesses. Instead, what's out there is:

  • Various personal finance utilities, including GnuCash, KMyMoney, and others. For basic accounting they work well, but they fall short of a business's needs.

  • Massive enterprise-oriented toolkits that can be used to build systems implementing accounting, inventory-tracking, point-of-sale, customer relationship management, supply-chain management, human resources, and invoicing, with add-on modules for bill collection, weather prediction, automated trading, and bread baking. These systems have names like ADempiere, Compiere, OpenERP, LedgerSMB, and Apache OFBiz. The target users for these projects appear to be consultants and businesses with full-time people dedicated to keeping the system running. To a business like LWN, they tend to look like a box with hundreds of nearly identical parts and a little note saying "some assembly required."

What is missing in the middle is a package for a business with no special accounting needs, but which needs to be able to automate data entry, generate tax forms at the end of the year, and interface with an accountant so it can get its taxes done. Given how incredibly exciting small-business accounting is, it's surprising that so few developers have felt a burning need to scratch that particular itch. There is no accounting for taste, it seems.
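The core rule of double-entry bookkeeping is that every transaction posts equal debits and credits, so the books always balance. The Python sketch below applies that rule to the kind of automated data entry described above, turning an online subscription sale into a balanced journal entry. The account names, the payment-processing fee, and the choice to book the whole sale as revenue immediately are simplifying assumptions, not a description of how LWN or any particular accounting package records sales.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal

ZERO = Decimal("0")


@dataclass(frozen=True)
class Posting:
    account: str
    debit: Decimal = ZERO
    credit: Decimal = ZERO


def journal_entry(postings: list[Posting]) -> list[Posting]:
    """Accept an entry only if total debits equal total credits."""
    debits = sum((p.debit for p in postings), ZERO)
    credits = sum((p.credit for p in postings), ZERO)
    if debits != credits:
        raise ValueError(f"unbalanced entry: debits {debits} != credits {credits}")
    return postings


def subscription_sale(amount: str, fee: str) -> list[Posting]:
    """Hypothetical mapping of one sale, net of a payment-processor fee."""
    gross, cost = Decimal(amount), Decimal(fee)
    return journal_entry([
        Posting("Bank", debit=gross - cost),
        Posting("Payment processing fees", debit=cost),
        Posting("Subscription revenue", credit=gross),
    ])


def balances(entries: list[list[Posting]]) -> dict[str, Decimal]:
    """Net balance (debits minus credits) per account across all entries."""
    totals: dict[str, Decimal] = {}
    for entry in entries:
        for p in entry:
            totals[p.account] = totals.get(p.account, ZERO) + p.debit - p.credit
    return totals
```

Under this debits-minus-credits convention, revenue shows up as a negative (credit) balance, and the balances across all accounts always sum to zero; `Decimal` is used instead of floats so that amounts in cents stay exact.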

That said, it has been a few years since we last made a serious effort to learn about free software accounting alternatives; clearly the time has come for another pass. So we'll be doing it, with an eye toward, hopefully, making the transition at the end of the calendar year. That gives us several months to forget about the problem while still allowing a few months of panic at the end, so the schedule should be plausible.

Stay tuned for updates, it should be an interesting ride. But we are pretty well determined not to find out what other surprises our proprietary accounting system may have in store for us. In 2012, it should be possible to run a small, simple business on free software and never have to wonder when the accounting system will stop functioning and demand more money. We intend to prove it.


Page editor: Jonathan Corbet

Inside this week's Weekly Edition

  • Security: Internet censorship and OONI; New vulnerabilities in argyllcms, kernel, php, python3, ...
  • Kernel: The CoDel queue management algorithm; Statistics from the 3.4 development cycle; Supporting multi-platform ARM kernels.
  • Distributions: Who should maintain Python for Debian?; Fedora, Mandriva, ...
  • Development: LGM: Inkscape quietly evolves into a development platform; Apache OpenOffice, GIMP, nPth, sigrok, ...
  • Announcements: GNOME's outreach program for women, TDF Certification program, Oracle v. Google, SAS v. WPL, ...

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds