
Leading items

ABS: Android beyond the phone

By Jake Edge
April 20, 2011

The first Android Builders Summit (ABS) was held April 13-14 as part of the Linux Foundation's "two weeks of conferences" in San Francisco. ABS overlapped the last day of the Embedded Linux Conference (ELC), so many of the ELC attendees sat in on ABS talks, at least for the first day. On the second day, Marko Gargenta of Marakana gave a keynote that looked at uses of Android beyond its traditional consumer mobile phone (and now tablet) niche. He also outlined some of the reasons that device makers are turning to Android.

Advantages of Android

Gargenta is the author of Learning Android and has worked with various companies to help them with their Android plans. There are three main advantages that Android brings to the table which are important to device makers, he said. First, it is an open platform, and it is relatively easy to get the source code and customize it. Second, it has "apps", and lots of developers are embracing the platform. And third, it is a "complete stack" that provides nearly all of the services required to create a product.

While Android is an open platform, as evidenced by Andy Rubin's famous tweet—though the tweet is missing two important steps (envsetup.sh and lunch) as Gargenta and audience members pointed out—it isn't much like other open source projects. There is no Git tree that contains "whatever was checked in the night before"; instead there are somewhat infrequent code drops. With Honeycomb (Android 3.0), even those have stopped, which is something that concerns some companies who are basing their products on Android. It may eventually cause them to reconsider using it.

Applications for Android are important, but it's not really the existing applications that are interesting, rather it is the "model for developing applications" that attracts the device makers. Many existing applications may run on a modified Android, but that is just a "bonus", he said. When you look at Android as a whole, it has all of the pieces from the hardware up, including the Linux kernel, libraries for many of the things the device vendors need to be able to do, Java, and the application development model, which is what makes it a complete stack. This stands in contrast to standard embedded Linux or Java ME, which aren't a complete stack.

Case studies

Gargenta then launched into several case studies of devices using Android. The projects were ones that he had worked on, though some were not public (or not public yet), so he didn't reveal the company behind them.

The first was a multi-function printer/scanner/copier device where the user interface would be built with Android. The application development framework was one of the most attractive parts of Android, not because they would be putting Market applications on the device, but because they could have independent developers work on the interface. There are lots of developers out there who can write to the Android APIs.

The complete stack approach of Android was also appealing because the system already has support for graphics, touchscreen interaction, and networking. There were some missing pieces, of course, including drivers and C libraries to talk to the proprietary printer/scanner hardware, Java interfaces for that hardware, and a new "home" application. Instead of the usual Android user interface, a custom application was written that didn't include things like an application drawer or status bar. In fact, users of the device may never know that they are using Android, he said.

Two different device types that Gargenta described had similar requirements and had many of the same reasons for going with Android. The first was a "public safety solution" for handling communications during catastrophes that was developed by a major OEM, while the other was a device for the US Department of Defense (DoD). In both cases, the availability of "off the shelf" hardware that runs Android was attractive. For the public safety application, it's important that multiple kinds of hardware can be used, as various different agencies need to be able to coordinate their efforts.

Once again, the application framework provided by Android is appealing because it allows multiple developers to work on various parts of the problem, more or less simultaneously. The large developer base is attractive as well. Both projects were concerned with stopping installation of "unapproved" applications, either from the Market, or by restricting which repositories the devices could access.

As might be guessed, the DoD project had further security concerns. It is important to ensure that the device is being used by an authorized person, so attaching a USB device as part of the authentication process is required. The existing Android code did not support application access to the USB ports, so that was added. In addition, device management was added so that devices could be tracked or remotely wiped, and so that password policies could be enforced.

Both projects had an interest in the priority of Android services. In general, radio communications should not be interrupted by text messages or a game, so the assumptions needed to be tweaked from those of a consumer device. Determining which services are critical can be difficult, Gargenta said. For example, "are media services that critical in a life or death situation?", he asked. They may or may not be, depending on the media in question.

The Cisco Cius was another example that Gargenta presented. It is meant to be an "iPad for business" that looks something like a desktop video phone, but the video screen part can be removed to become a mobile tablet device. The "open and portable" nature of Android was one of its selling points, but the company is rethinking Android because of the Honeycomb availability issue. Google is also not helping adoption in the enterprise market because it is not telling anyone what its plans are for things like device management and security, he said.

The Cius has its own Market where applications are much more carefully vetted and generally have higher quality. The Cius also adds multi-user support, which is not something that Android does, but is, of course, available in the underlying Linux kernel. The device also provides video conferencing and Voice over IP telephony support; the latter was added before Google released Gingerbread with SIP support, because there is no Android roadmap.

Android set-top boxes were another use that Gargenta described. Google TV is not available as an API, so it can't be used for television applications. Android is attractive for the usual reasons, but has some drawbacks as well. Appeasing content providers with DRM solutions is one area that needed to be addressed in the two projects he worked on. The Android user interface is also not usable for TVs, but it is relatively straightforward to create one, partly because Android was designed to support multiple devices.

The last case study from Gargenta's talk was for "networked cars". Visteon has created a prototype of an Android-based dashboard for cars. One of the more interesting characteristics of such a device is that it requires multi-screen support, which is not something that comes out of the box with Android. But it does make a good platform for doing user interface development quickly, he said.

He listed a number of other Android-based products that he knew of, including home security systems, scientific calculators, microwaves, and washing machines. One thing that Gargenta didn't mention was whether any of the changes being made by these device makers were being pushed back to Google for possible inclusion into Android. One gets the sense that, in keeping with the secrecy that often shrouds the embedded world, those changes may well be held back. It's also not clear if the custom Linux drivers for various hardware devices are being released in source form, as Gargenta didn't really address the kernel in his response to an audience question about licensing.

It certainly was interesting to hear where Android is being used, especially in devices that stray far from its roots. In many ways it is just an extension of the enormous penetration that Linux has made into the embedded world. Whether other "full stack" solutions, like MeeGo or WebOS, can make inroads into devices over the next few years will be interesting to watch.

The conference

While ABS definitely had some interesting talks, some of which I hope to write up in coming weeks, it was rather different from what one might have expected. The first two keynotes were essentially extended advertisements for the speakers' companies (Motorola and Qualcomm), which is not at all the norm at technical conferences. In addition, it was rather surprising to see a complete lack of Google speakers—and sponsorship. Some noted that the Google I/O conference was scheduled a few weeks after ABS, but that doesn't seem like reason enough for that level of non-participation. If the LF plans to reprise the conference next year, fixing the keynotes and working with Google would likely result in an even better conference.


Drupal Government Days: Drupal and the semantic web

April 20, 2011

This article was contributed by Koen Vervloesem

Drupal 7 is the first mainstream content management system with out-of-the-box support for users and developers to share their data in a machine-readable and interoperable way on the semantic web. At the Drupal Government Days in Brussels, there were a few talks about the features in Drupal — both in its core and in extra modules — to present and interlink data on the semantic web.

In his talk "Riding the semantic web with Drupal", Matthias Vandermaesen, senior developer for the Belgian Drupal web development company Krimson, gave both an introduction to the semantic web and an explanation of the Drupal features in this domain. The problem with the "old" web is that it is just a collection of interlinked web pages, according to Vandermaesen: "HTML only describes the structure of documents, and it interlinks documents, not data. The data described by HTML documents is human-understandable but not really machine-readable."

The semantic web, on the other hand, is all about interlinking data in a machine-readable way, and Linked Data, a subtopic of the semantic web, is a way to expose, share, and connect pieces of data using URIs (Uniform Resource Identifiers) and RDF (Resource Description Framework). This guarantees an open framework with a low barrier to entry, where browsers and search engines can connect related information from different sources. All entities in a Linked Data dataset and their relationships are described by RDF statements. RDF provides a generic, graph-based data model to structure and link data. Each RDF statement comes in the form of a triple: subject - predicate - object. Each subject and predicate is identified by a URI, while an object can be represented by a URI or be a literal value such as a string or a number.
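
The triple model described above is simple enough to sketch in a few lines of code. Here is a minimal, self-contained Python illustration (the URIs and the match() helper are invented for this example; a real application would use an RDF library such as rdflib):

```python
# Each RDF statement is a (subject, predicate, object) triple.
# Subjects and predicates are URIs; objects may be URIs or literals.
triples = [
    ("http://example.org/film/blade_runner",           # hypothetical URIs
     "http://purl.org/dc/elements/1.1/title",
     "Blade Runner"),
    ("http://example.org/film/blade_runner",
     "http://example.org/terms/director",
     "http://example.org/person/ridley_scott"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Every statement about the film, looked up by subject:
for subj, pred, obj in match(triples, s="http://example.org/film/blade_runner"):
    print(pred, "->", obj)
```

Pattern matching over triples like this is, in miniature, what a triplestore's query engine does at scale.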

The semantic web is not some vague future vision, it's already here, Vandermaesen emphasized. He talked about some "cool stuff" that the semantic web makes possible. For instance, search engines like Google already enrich their search results with relevant information that is expressed in RDFa or microformats markup: if you search for a movie, Google shows you some extra information under the reference to the IMDb page of the movie, such as the rating, the number of people who have given a rating, the director, and the main actors. Google shows these so-called "rich snippets" in its result page for a lot of other types of structured data, such as recipes. Moreover, many social networking web sites like LinkedIn, Twitter, and Facebook (with its Open Graph Protocol) already mark up their profiles with RDFa.

But how do we "get on" the semantic web? This is actually quite simple, according to Vandermaesen: just use the right technologies to work with machine-understandable data, like RDF and RDFa, OWL (Web Ontology Language), XML, and SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language). There are two common ways to publish RDF. The first one is to use a triplestore, which is a database much like a relational database, but with data following the RDF model. A triplestore is optimized for the storage and retrieval of RDF triples. Well-known triplestores are Jena, Redland, Soprano, and Virtuoso.

The other way to publish RDF is to embed it in XHTML, in the form of RDFa. This W3C recommendation specifies a set of attributes that can be used to carry metadata in an XHTML document. In essence, RDFa maps RDF triples to XHTML attributes. For instance, a predicate of a triple is expressed as the contents of the property attribute in an element, and the object of the same triple is expressed as the contents of the element itself. For example, using the Dublin Core vocabulary:

    <div xmlns:dc="http://purl.org/dc/elements/1.1/">
        <h2 property="dc:title">The trouble with Bob</h2>
    </div>
One of the benefits of RDFa is that publishers don't have to implement two ways to offer the same content (HTML for humans, RDF for computers), but can publish the same content simultaneously in a human-readable and machine-understandable way by adding the right HTML attributes.
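
The mapping from attributes back to triples can be seen by mechanically extracting the property and element contents from the snippet above. The following is a deliberately minimal sketch using Python's standard html.parser module; a real consumer would use a full RDFa processor, which also resolves namespace prefixes, subjects, and nesting rules:

```python
from html.parser import HTMLParser

class RDFaExtractor(HTMLParser):
    """Collect (property, content) pairs from elements carrying a
    'property' attribute -- only a tiny slice of RDFa processing."""
    def __init__(self):
        super().__init__()
        self._prop = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            self._prop = attrs["property"]

    def handle_data(self, data):
        if self._prop is not None:
            self.pairs.append((self._prop, data.strip()))
            self._prop = None

doc = """<div xmlns:dc="http://purl.org/dc/elements/1.1/">
    <h2 property="dc:title">The trouble with Bob</h2>
</div>"""

extractor = RDFaExtractor()
extractor.feed(doc)
print(extractor.pairs)   # [('dc:title', 'The trouble with Bob')]
```

Expanding the dc: prefix against the xmlns:dc declaration yields the predicate URI, and the enclosing page's URL supplies the subject, completing the triple.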

Thanks to this machine-readable data, it's quite easy to connect various data sources. Vandermaesen gave some examples: you could add IMDb ratings to the movies in the schedule of your local movie theatre, and you could link public transport timetables to Google Maps. This shows one of the key features of the semantic web: data is not contained in a single place, but you can mix and match data from different sources. "With the semantic web, the web becomes a federated graph, or (as Tim Berners-Lee calls it) a Giant Global Graph", he said.

RDFa in Drupal

"Drupal 7 makes it really easy to automatically publish your data in RDFa," Vandermaesen said, "and search engines such as Google will automatically pick up this machine-readable data to enrich your search results." Indeed, any Drupal 7 site automatically exposes some basic information about pages and articles with RDFa. For instance, the author of a Drupal article or page will be marked up by default with the property sioc:has_creator (SIOC is the Semantically-Interlinked Online Communities vocabulary). Other vocabularies that are supported by default are FOAF (Friend of a Friend), Dublin Core, and SKOS (Simple Knowledge Organization System). Drupal developers can also customize their RDFa output: if they create a new content type, they can define a custom RDF mapping in their code. A recent article on IBM developerWorks by Lin Clark walks the reader through the necessary steps for this.

But apart from RDFa support in the core, there are a couple of extra modules that let Drupal developers really tap into the potential of the semantic web. One of them is the (still experimental) SPARQL Views module, created by Lin Clark and sponsored by Google Summer of Code and the European Commission. With this module, developers can query RDF data with SPARQL (SPARQL is to RDF documents what SQL is to a relational database) and bring the data into Drupal views. This way, you can import knowledge coming from different sources and display it in your Drupal site in tabular form, with almost no code to write. "Thanks to SPARQL Views, any Drupal web site can integrate Wikipedia info by using the right SPARQL queries to DBpedia," Vandermaesen explained. At his company Krimson, he used (and contributed to) SPARQL Views in a research project sponsored by the Flemish government, with the goal of creating a common platform to facilitate the exchange of data in an open and transparent fashion between large repositories that contain digitized audiovisual heritage.

Linked Open Data

In his presentation "Linked Open Data funding in the EU", Stefano Bertolo, a scientific project officer working at the European Commission, gave an overview of the projects the European Union is currently funding to support linked data technologies. He also maintained that governments are likely to become the first beneficiaries of advances in this domain, thanks to Drupal:

Linked Open Data, which is Linked Data open for anyone to use, is really taking off and Drupal is ready for it. There's a massive amount of information you can re-use in your Drupal installation, and this re-usability is the most important aspect of the semantic web. Just like a typical software developer re-uses a lot of software libraries for generic tasks, the semantic web allows you to re-use a lot of generic data. That's why the European Commission has been investing in Linked Open Data technology. Drupal and Linked Data have much to offer to each other, especially in the domain of publishing government data.

Bertolo mentioned three Linked Open Data projects funded by the European Commission. One is OKKAM, a project that ran from January 2008 to June 2010. Its name refers to the philosophical principle Occam's razor, "Entities should not be multiplied beyond necessity", to which the OKKAM project wants to be a 21st-century equivalent: "Entity identifiers should not be multiplied beyond necessity." What this means is that OKKAM offers an open service on the web to give a single and globally unique identifier for any entity which is named on the (semantic) web. This Entity Name System currently has about 7.5 million entities, such as Barack Obama, the European Union, or Linus Torvalds. When you have found the entity you need in the OKKAM search engine, you can re-use its ID in all your RDF triples to refer unambiguously to the entity.

Another deliverable of the OKKAM project is sig.ma, a data aggregator for the semantic web. When you search for a keyword, sig.ma combines all the information it can find in the "web of data" and presents it in tabular form. A spin-off company was recently started based on the results of the research project.

The second European-funded project Bertolo talked about was LOD2, a large-scale project with many deliverables. The project aims to contribute high-quality interlinked versions of public semantic web data sets, and it will develop new technologies to raise the performance of RDF triplestores to be on par with relational databases. This is a huge challenge, because a graph-based data model like RDF allows many degrees of freedom, which makes it difficult to optimize since there is no strict database schema. The LOD2 project will also develop new algorithms and tools for data cleaning, linking, and merging. For instance, these tools could make it possible to diagnose and repair semantic inconsistencies. Bertolo gave an example: "Let's say that a database lists that a person has had a car insurance since 1967 while the same database lists the person's age as 18 years. Syntactically, there are no errors in the database, but semantically we should be able to diagnose the inconsistency here."
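
A toy version of the consistency check Bertolo describes might look like the following Python sketch. Each record is syntactically valid on its own; the rule inspects facts jointly (all field names and the minimum-driving-age assumption are invented for illustration):

```python
def check_consistency(record):
    """Return a list of semantic problems found in the record."""
    problems = []
    # Someone insured since year Y must have been at least the minimum
    # driving age in year Y, which bounds their current age from below.
    insured_years = record["current_year"] - record["insurance_since"]
    implied_min_age = insured_years + record["min_driving_age"]
    if record["age"] < implied_min_age:
        problems.append(
            f"age {record['age']} is inconsistent with holding car "
            f"insurance since {record['insurance_since']}")
    return problems

record = {
    "age": 18,
    "insurance_since": 1967,
    "current_year": 2011,
    "min_driving_age": 16,
}
print(check_consistency(record))
```

Real semantic-web tooling expresses such constraints declaratively over RDF data (for instance via ontology axioms) rather than as hand-written rules, but the diagnosis is the same in spirit.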

A third project by the European Commission is Linked Open Data Around the Clock. Bertolo explained its goal: "The value of a network highly depends on the number of links, and currently the links across Linked Open Data datasets are not enough. The mission of the Linked Open Data Around the Clock project is to interlink these much more, to give people more bang for their RDF buck. Our objective is to have 500 million links in two years." As a testbed, the project started with publishing datasets produced by the European Commission, the European Parliament, and other European institutions as Linked Data on the Web and interlinking them with other governmental data.

Drupal paving the way

At the moment, the semantic web is still struggling with a chicken-and-egg problem: many semantic web tools are still experimental and not easy to use for end users, and publishers still have trouble finding a good business model to publish their data as RDF when their competitors don't do so. However, with out-of-the-box RDFa support in Drupal 7, the open source CMS could pave the way for a more widespread adoption of semantic web technologies: Drupal founder Dries Buytaert claims that his CMS is already powering more than 1 percent of all websites in the world. If Drupal keeps growing its market share, the CMS could help to bring Linked Open Data to the masses, and we could soon have millions of web sites with RDFa data on the web.


A report from the (not only) MySQL conference 2011

April 20, 2011

This article was contributed by Josh Berkus

The MySQL Conference and Expo in Santa Clara, the largest open source conference in the San Francisco Bay area since the decline of LinuxWorld, was unusual this year in several ways. First was the inclusion of many non-MySQL databases in the conference. The second was the mostly-friendly rivalry among the several MySQL forks, which presented co-equal keynotes. The dominant discussion at the conference, however, was the stormy relationship between the MySQL community, the conference, and Oracle.

PostgreSQL and more

The biggest official change this year is that conference organizers O'Reilly and Associates decided to bring in non-MySQL open source databases in order to expand the scope of the conference. Most notable among these was the long-time MySQL "rival" project, PostgreSQL. The PostgreSQL support company EnterpriseDB was the top sponsor of the conference, and dominated the expo floor with a huge presentation area. There were two PostgreSQL tutorials and seven talks in the conference program.

In addition to PostgreSQL, several other open source database projects were invited to give talks at the conference, including MongoDB, CouchDB, Cassandra, Redis and HBase. MongoDB was particularly active on the trade show floor, and their ubiquitous coffee mugs turned up all over the conference.

The PostgreSQL presence at MySQLCon included a keynote in which EnterpriseDB staff went over features of the recently released version 9.0 and the upcoming version 9.1. The 9.1 features will be covered in a future LWN article.

Hot-and-cold Oracle

Oracle's relationship with the conference bordered on the schizophrenic. According to a member of the conference committee, Oracle sponsored the conference but refused to allow their name to be listed as a sponsor. For several months Oracle refused to allow their staff to submit talks to the conference or attend, relenting at the last minute and sending several speakers and a small marketing crew, according to a member of the Oracle staff.

Oracle did sponsor a party on Wednesday night of the conference. However, Oracle also sponsored a new MySQL track at IOUG's Collaborate conference at the same time, in Florida, 3000 miles away. While by all reports attendance of the MySQL track at Collaborate was poor (some witnesses reporting as few as 50 people at the keynotes), MySQL luminaries employed at Oracle, as well as MySQL-using Oracle partners, were obligated to go to Florida and not to California.

This cannot have helped attendance at the MySQL Conference. Indeed, attendance was down to around 1100 compared with over 1500 last year according to conference organizers.

MySQL

On Tuesday, Tomas Ulin, Oracle's MySQL engineering manager, presented the recent improvements which Oracle has introduced in version 5.5, what is planned for 5.6, and some of the changes that have been made to MySQL in its first year of stewardship of the project.

One of the biggest changes to MySQL 5.5 is that InnoDB, the primary and most mature transactional database engine for MySQL, is now the default. This is welcome news since one of the chief causes of inexperienced MySQL users losing data is use of the older, non-crash-safe, MyISAM engine. The MySQL and InnoDB teams at Oracle have also been merged, "as they always should have been" according to Ulin.

Other 5.5 features include substantially improved performance on Windows, enhanced partitioning, and the performance_schema, which is a tool to collect runtime performance data about MySQL queries.

Ulin also announced a "development release" of MySQL 5.6. Oracle's goal is to make development releases very stable, so that they can move from the development releases to final release quickly. Features for 5.6 include more improvements to partitioning and additional views in performance_schema. MySQL is also adding additional query optimizations to improve performance on large databases, such as multi-range reads, sort optimizations, and pushdown of predicates into subqueries.

MySQL 5.6 may also include enhanced integration with Memcached. This includes the ability to use the Memcached protocol in order to access the InnoDB storage engine directly, effectively making InnoDB a NoSQL and SQL database at the same time.

Ulin also went over Oracle's plans for MySQL Cluster, otherwise known as NDB. MySQL Cluster is a specialized database engine aimed at telecommunications companies, and has been commercially successful in that market. "If you make a phone call today, MySQL cluster is probably involved somewhere", said Ulin.

Version 7.2 will include support for some kinds of JOINs across clustered tables, which NDB has not previously had. It will also include increases in the number of columns per table it can support, and replication of user privileges for faster access. It is also expected to support dynamic switching between SQL queries and direct object-oriented access to the database engine.

MariaDB

The two successive acquisitions of MySQL, by Sun and then by Oracle, have spawned a number of forks, each of which is pursuing its own course of development and community. On Wednesday, Monty Widenius, founder of MySQL, presented MariaDB, his MySQL successor database — or fork — which his company in Finland has been developing for the last two years. Several of the original MySQL developers are now working on MariaDB.

MariaDB is primarily meant to be a drop-in replacement for MySQL 5.1. Its main advantage is being pure open source and not owned by Oracle. MariaDB has also released an LGPL C-language driver for MySQL, which according to Widenius resolves some of the licensing issues with Oracle MySQL drivers.

Beyond that, MariaDB 5.2 was released recently with a number of useful features not present in mainstream MySQL. Widenius announced that, for the first time, Sphinx full text search is available directly as a storage engine called SphinxDB. MariaDB contains multiple improvements to MyISAM storage and supports pluggable authentication. It has also added "virtual columns", which hold automatically updated calculated values based on the other columns in the table.
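
A virtual column persists a definition such as total AS (price * quantity) and computes the value whenever the row is read, so queries never see a stale total. As a rough analogy using only the Python standard library (table and column names are invented here; sqlite3 is used purely as a convenient standard-SQL sandbox, recomputing the expression per query rather than storing a column definition the way MariaDB does):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO line_items VALUES (?, ?)",
                 [(9.99, 3), (4.50, 2)])

# Without virtual columns, the derived value must be recomputed in
# every query; a MariaDB virtual column would make "total" part of
# the table definition instead.
rows = conn.execute(
    "SELECT price, quantity, price * quantity AS total FROM line_items"
).fetchall()
for price, quantity, total in rows:
    print(price, quantity, total)
```

The practical benefit is that the derived value cannot drift out of sync with its inputs, and the expression lives in one place (the schema) rather than being repeated across application queries.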

MariaDB 5.3 will include an extensive rewrite of the query optimizer which is supposed to improve response time on more complex queries by orders of magnitude. It will also support a new, faster form of group commit for faster database writes.

Widenius also announced that SkySQL, a new MySQL support company in Finland created by a group of former MySQL AB staff and MySQL co-founder David Axmark, would be offering support for MariaDB.

Drizzle

Probably the most conspicuous MySQL fork at MySQL Conference and Expo, due to the number of speakers, was Drizzle. Drizzle is optimized for usage on cloud hosting, as well as being extensively rewritten to clean up the code. The project's team is made up of both former MySQL developers and new contributors with Rackspace as their primary commercial sponsor.

On Wednesday, Brian Aker took the stage for Drizzle. His big (if rather belated) announcement was the general-availability release of Drizzle, which happened around a month ago and marks the project as ready for its first production use. The second announcement was that MySQL support and services company Percona had announced commercial support for Drizzle.

The Drizzle team has spent the last three years redeveloping MySQL around a "micro-kernel" architecture. This means that they've taken many things in the MySQL core code, simplified them, and converted them to "plugins", allowing individual users to reconfigure how Drizzle works. Their refactor of the code has also eliminated many longstanding MySQL "gotchas" regarding Unicode support, timestamps, constraints, Cartesian joins, and more.

Since Drizzle is built "for the Cloud", a strong part of its focus is on replication. Drizzle supports row-based replication using protobufs, an open format created by Google. The new open replication format supports integration with a variety of other tools such as ApacheMQ, Memcached, and Hadoop. It also supports multiple masters, partial replication, and sharding. In development are virtualized database containers using database "catalogs".

Aker also announced libDrizzle, a client library which works with MySQL and SQLite as well as Drizzle. Since this driver is BSD-licensed, he believes it will be attractive to users who have legal concerns about the MySQL driver licensing.

The future of open source databases and MySQL

The conference ended on Thursday with talks from Baron Schwartz of Percona and Mike Olson of Cloudera and BerkeleyDB on the future of databases. The two talks were remarkably similar in their predictions:

  • databases will replicate and cluster seamlessly
  • data will grow to petabytes and more, even for smaller organizations
  • databases will integrate better with the rest of the stack
  • databases will support a variety of different data formats
  • people are using a variety of special-purpose databases now, but future databases will be more all-purpose
  • data caching and databases will stop being completely separate layers and will be fused

The main difference between the two presentations was on the place of relational vs. non-relational databases in the future of databases.

The future of MySQL was rather more of a source of anxiety for the attendees. As a PostgreSQL geek, I got asked repeatedly where PostgreSQL would be in five years by MySQL users who were clearly wondering the same thing about MySQL. All of the forks and the love/hate relationship with Oracle have undermined confidence in MySQL, and sent users looking for alternatives, or for reassurance.

As for the future of the MySQL Conference and Expo, the rumor at the conference was that O'Reilly plans to move it away from Santa Clara. An attendee named Olaf even ran a vote-by-Twitter poll for a new location.

Videos from the conference are available on blip.tv.


Page editor: Jonathan Corbet


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds