
LWN.net Weekly Edition for April 5, 2012

Why bother supporting ARM?

By Jonathan Corbet
April 4, 2012
Two weeks ago, LWN covered the debate within the Fedora project over whether its ARM port should be designated one of that distribution's "primary" architectures. That discussion has progressed a little further, so an update may be warranted. But it may also be worthwhile to address a related question: why is there resistance to the concept of supporting ARM as a primary architecture in the first place? And why might it make sense to promote the ARM architecture anyway?

One of the things that came out in the original discussion is that the Fedora project did not have any idea of how to do that. Over its entire history, the project has never before seriously considered moving one of its secondary architectures to primary status. So there are no procedures in place and no criteria by which a decision to promote an architecture can be made. So, unsurprisingly, the project decided that it needs to come up with a set of reasonable criteria. On April 2, Matthew Garrett posted a draft showing what those criteria might look like.

The rules would appear to make sense. The Fedora infrastructure and release engineering teams need to have people who are able to represent any new primary architecture. The project must be able to build packages on its own servers. Anaconda, the Fedora installer, must work on the targeted hardware. Maintainers of important packages must have access to the supported hardware so they can fix problems. No binary blobs. And so on. Also required is approval from various Fedora teams, each of which can impose additional criteria if it sees the need. These rules are in an early form and can be expected to evolve over time, but the early responses on the mailing list suggest that most people are happy enough with what has been set down.

That said, there are clearly some people who do not see the point of supporting ARM as a primary architecture, and they have a number of reasons for their reluctance. The ARM architecture is messy, for example. The x86 architecture does not have a single design authority, but processors made by multiple vendors still resemble each other closely enough to create a fairly tightly-knit processor family. ARM does have a central design authority, but that authority leaves a lot of significant details up to individual manufacturers, of which there are many. So ARM is not a tightly-knit family; it is more like an extended group of hostile ex-spouses and in-laws who have moved to different continents to get away from each other.

The looseness of the ARM "platform" has led to a lot of innovation in the design space; there is no end of interesting ARM system-on-chip designs with all kinds of impressive integrated peripheral devices. But this diversity, along with a distressing lack of hardware discoverability, makes it impossible to create a single kernel that works on all (or even a significant subset of) ARM processors. Distributors hate having to maintain multiple kernels, and they hate having to put target-specific hacks into installers. ARM currently forces both, despite the ongoing work to consolidate kernel code and move hardware knowledge into the bootloader-supplied device tree structure.

ARM is also, for many developers, a relatively obscure architecture lacking the familiarity of x86. The fact that there are vastly more ARM systems running Linux than x86 systems does not really change that perception; most of us lack ARM-based development systems on our desks. Additionally, ARM processors are relatively slow. That is a problem for developers, who typically need to keep an x86 system and a cross-compiling toolchain around to be able to get through more than one edit-compile-test cycle in any given day. That slowness is also an issue for distributors; it can delay security updates and distribution releases, even for other architectures. And, while the hardware is slow, product cycles are fast; by the time developers have gotten a target working nicely, it may be obsolete and off the market.

Given all of these challenges, it is not surprising that some people would rather not be bothered by an architecture like ARM. The x86 world provides plenty of open, high-performing systems with wide support; why get distracted with that messy architecture where, even if the distribution can be made to work, the hardware is probably closed and won't allow it to be installed?

The answer, of course, is that said messy architecture is already performing much of our computing, and it will likely be doing more of it in the future. Traditional PC-style systems are no longer the center of attention; one assumes they will not go away entirely, but a lot of the action is elsewhere. There is a whole new crowd of makers looking to do interesting things with ARM-based designs; we are just beginning to see what can be done with interesting mobile devices, and the bulk of those devices are not, at this point, using x86 processors. Meanwhile, ARM has its eyes on data center applications where, some think, its compactness and power efficiency will make up for its lack of speed. The x86 architecture will be with us for a long time - even Intel has proved unable to kill it off in the past - but it is far from the only show in town.

It is also worth remembering that, for all its success, Linux is still a minority player on x86 systems. But Linux is the dominant system on ARM-based systems. The "year of the Linux desktop" may be an old and sad joke, but the year of the Linux gadget looks to be happening for real - again.

Given that ARM is where much of the action is, it would make sense that a Linux distribution - especially one that is supposed to be leading-edge and forward-looking - would want to support ARM as well as possible. Solid support for the architecture seems like a necessary precondition for any sort of presence in the interesting computing devices of the near future. Distributors like Ubuntu appear to have come to that conclusion; they have built on Debian's longstanding ARM support to create a distribution that, they hope, will be found in future devices. Without well-established ARM support, Fedora - along with the distributions derived from it - has little chance of competing in that area.

So one might well say that the questions being asked in the Fedora community are wrong. Rather than asking "why should we support ARM when it presents all of these difficulties?", it might make more sense to ask "how can we address these difficulties to provide the best ARM-based distribution possible?". The cynical among us might be tempted to say that Red Hat, Fedora's sponsor and main contributor, faces a classic disruptive technology problem. ARM is unlikely to displace x86 in the places where Red Hat currently sells support, and revenues from any future ARM-based "enterprise" distribution seem likely to be rather lower than those obtained from x86-based distributions. So it would be understandable if Red Hat were to show a lack of enthusiasm for the ARM architecture.

The cynical view is, at best, only partially right, though. Red Hat does not advertise the resources it is putting into ARM distribution development, but it clearly has a number of engineers on the task. Even as a "secondary" architecture, Fedora's ARM distribution has been solid enough to serve as the base for the ARM-based OLPC XO 1.75 laptop. Without Red Hat's support, there wouldn't be a Fedora ARM distribution even with secondary status. So it seems unlikely that Red Hat is the sticking point here, even if its contributions to the kernel's ARM subtree (29 patches total from 3.0 to the present) show little enthusiasm. More likely we're just seeing the usual noise as the wider community comes to terms with what will be required to support this architecture properly.

In the end, the world would not be well served by a single processor architecture; there is value in diversity. Similarly, an industry where ARM-based systems are dominated by Android variants may not be the best possible world. A lot of interesting things are happening in computing, and many of them involve the ARM architecture; there is a lot of value in having strong community-based distribution support for that architecture. That is why Fedora will, in the end, almost certainly bother to support ARM as a primary architecture despite the challenges it presents.


An update on Oracle v. Google

April 4, 2012

This article was contributed by Adam Saunders

After over a year and a half of legal proceedings, Oracle and Google will go to trial on April 16 in front of the United States District Court for the Northern District of California, to determine whether or not Google's Android software infringes Oracle's copyrights on Java, as well as some of its patents. If the parties don't settle, this trial is expected to take eight weeks.

A lot has happened since the litigation started. In August 2010, several months after acquiring Sun Microsystems, which developed Java and held the copyrights, Oracle launched a lawsuit against Google, claiming that Android's use of Java infringed seven of Oracle's patents, as well as the Java copyrights Oracle holds. The complaint demands an injunction barring Google from continuing with its allegedly infringing activity, that "all copies made or used in violation of Oracle America's copyrights [...] be impounded and destroyed or otherwise reasonably disposed of", and that Oracle receive damages. Essentially, Oracle is formally seeking to stop Google's use of Java in Android, and wants compensation for that use.

The FSF has argued that if Google had used an available GPL-licensed version of Java, such as IcedTea, as part of Android, it would have avoided this litigation. This may be true; Sun (now Oracle) distributes Java under GPLv2 with a linking exception, which is what IcedTea is based off of. The GPLv2 implicit patent language, contained in sections 6 and 7 of the license, effectively gives users of Sun/Oracle's distribution of Java a royalty-free patent license that covers standard free software practices: the right to use, modify, and redistribute the software, including modified versions. With the linking exception, permissively licensed software and proprietary software that links to IcedTea could be developed without being licensed under GPLv2; thus, the app repository Google Play (formerly known as Android Market), with its proprietary apps as well as free software apps, would have still been possible.

However, this argument ignores the fact that the Android project started before Sun licensed Java under the GPL. Android, Inc. was founded in 2003 and acquired by Google in 2005; the relicensing of Java happened in November, 2006. When Android started, if a non-Sun programmer or development project wanted to make an open-source version of Java while minimizing the threat of copyright infringement, the only practical way to do this was to rely on clean room reverse engineering; this is what Google claims to have done. But clean room reverse engineering is not a helpful defense in patent litigation, which is why Google's way of implementing Java in Android - including basing Dalvik off of the Apache Harmony project, and not Sun/Oracle's GPL'd Java - exposes it to patent lawsuits from Oracle, assuming Oracle has any valid patents that read on Google's Java implementation.

So if you're Google, and you get sued by Oracle, one of the best things you can do to defend against the patent infringement claims is to get the patents reexamined and hope that they get rejected. This has been a very successful tactic; Google's requests for USPTO patent reexaminations have, over time, left Oracle with only two patents to litigate against Google. The reexamined claims in the '205 and '702 patents were rejected due to prior art, as were the reexamined claims in the '720 patent, the '447 patent, and the '476 patent. The '447 patent covered the concept of restricting access to objects based on where a specific program came from. The '720 patent claimed the novel concept of loading classes into a parent process before calling fork() so they would already be present for child processes. The '702 patent claimed the concept of coalescing duplicated objects (constants, for example) in a class file. The '476 patent is about determining access permissions depending on the calling sequence that led to a specific class method. Finally, the '205 patent claims the concept of a just-in-time compiler.

The only remaining patents are the '520 and the '104 patents:

  • The '104 patent, reissued in 2003, claims a "method and apparatus for resolving data references in generated code"; the method describes generating and interpreting executable code, and changing symbolic references in the code to numerical references when the code is interpreted. Cameron McKenzie of TheServerSide.com aptly characterized this as claiming the very basic idea that "if you rid your code of symbolic references, and replace them with direct references, things are more efficient".

  • The '520 patent claims a "method and system for performing static initialization". Essentially, the virtual machine replaces a bunch of instructions initializing an array with a copy of the resulting array, speeding the initialization process.
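
The idea that McKenzie describes for the '104 patent can be illustrated with a toy interpreter. The sketch below is purely illustrative and makes no claim to reflect the patent's actual claims or the behavior of any real virtual machine; the instruction format, the run() function, and the variable names are all hypothetical. It rewrites a symbolic operand into a numeric slot index the first time an instruction is executed, so that later executions skip the name lookup.

```python
# Toy illustration only: the instruction format and all names are hypothetical.

def run(program, symbols, memory):
    """Execute load instructions, resolving symbolic operands in place.

    program: list of [opcode, operand] pairs (mutable lists)
    symbols: maps a variable name to its slot index in memory
    memory:  list of values
    Returns the list of loaded values.
    """
    loaded = []
    for instr in program:
        opcode, operand = instr
        if opcode == "LOAD_SYM":
            # First execution: look the name up once, then rewrite the
            # instruction so it carries the numeric slot directly.
            slot = symbols[operand]
            instr[0], instr[1] = "LOAD_SLOT", slot
        else:
            slot = operand
        loaded.append(memory[slot])
    return loaded


symbols = {"x": 0, "y": 1}
memory = [10, 20]
program = [["LOAD_SYM", "y"], ["LOAD_SYM", "x"]]

first = run(program, symbols, memory)   # resolves names while running
second = run(program, symbols, memory)  # uses the rewritten instructions
```

After the first pass, the program holds only numeric references, which is the efficiency gain the quoted characterization refers to.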

With only these two patents left to litigate, Oracle is left hoping that it can claim a relatively low sum of damages from Google for alleged patent infringement.

What exactly has Oracle alleged in its copyright infringement claim? Oracle claimed [PDF] infringement of "(a) 37 Java API design specifications and implementations and (b) 11 Java software code files". Google's defense here looks strong, and there are indications that the court agrees. For example, a recent court order [PDF] asked Oracle to explain how Baker v. Selden applies to its copyright claims. In that case, the Supreme Court clearly established that one cannot use copyright to stop people from using the ideas contained in an expressive work; one can only use copyright to restrict use of the particular expressive work itself. Even though the same court order asked Google to address Sun's limitations on permitted uses of Apache Harmony APIs, it appears that the judge might view Baker as implying that one cannot use copyright to restrict API reimplementations in the way that Oracle is claiming in this case.

With regard to the allegedly infringing Java code files, Oracle specified [PDF] them as:

the entire code for AclEntryImpl.java, AclImpl.java, GroupImpl.java, OwnerImpl.java, PermissionImpl.java, PrincipalImpl.java, PolicyNodeImpl.java, and AclEnumerator.java, obtained by decompiling object code [...] [,] code from Arrays.java [...] [and] comments from CodeSource.java [...] [and] from CollectionCertStoreParameters.java

This claim is weak; although these files had previously been in Android, they are no longer part of Android, and had never been distributed as part of an Android device.

At the end of March, Google made a settlement offer that Oracle rejected. The settlement involved donating a fraction of a percentage of total Android revenues until April 2018, but only if Oracle could demonstrate that the '520 and '104 patents had been infringed. Some might interpret this settlement offer as indicating that Google feels Oracle has a decent case, but Google might simply want this litigation to not drag on any longer; litigation is expensive and time-consuming.

How should the free software and open source community react to this litigation? As the proceedings have shown, Oracle has become far less threatening than it may have appeared in the summer of 2010. Most of Oracle's patents have been rejected. As Groklaw has noted, Oracle's copyright claims on its APIs look weak, with Google's defense that the complaint refers to functional, and therefore non-copyrightable, subject matter looking strong. As well, Sun's praise of Android doesn't really help Oracle's case. Although it is far too early to tell how the case will turn out, what ruling Judge Alsup will give, and whether or not Android will face the need to change its relationship with Java, it is clear that Oracle's case is much, much weaker than it initially seemed in the summer of 2010. It is entirely possible that Android's current implementation of Java will be in excellent legal shape following this case.

It is important to remember that this lawsuit is only one of several instances of legal pressure being applied against Android. Apple has launched many patent lawsuits against several Android device manufacturers, with many of them retaliating against Apple with patent lawsuits of their own. Another example is Microsoft's pressuring of Android device makers into patent licensing agreements. Last year, the non-practicing entity Lodsys sued, among others, Android app developers. So, regardless of how Oracle v. Google is resolved, Android, and free software in general, will remain under significant threat from software patents.


Runtime filesystem consistency checking

By Jake Edge
April 3, 2012
This year's edition of the Linux Storage, Filesystem, and Memory Management Summit took place in San Francisco April 1-2, just prior to the Linux Foundation Collaboration Summit. Ashvin Goel of the University of Toronto was invited to the summit to discuss the work that he and others at the university had done on consistency checking as filesystems are updated, rather than doing offline checking using tools like fsck. One of the students who had worked on the project, Daniel Fryer, was also present to offer his perspective from the audience. Goel said that the work is not ready for production use, and Fryer echoed that, noting that the code is not 100% solid by any means. They are researchers, Goel said, so the community should give them some leeway, but any input to make their work more relevant to Linux would be appreciated.

Filesystems have bugs, Goel said, producing a list of bugs that caused filesystem corruption over the last few years. Existing solutions can't deal with these problems because they start with the assumption that the filesystem is correct. Journals, RAID, and checksums on data are nice features but they depend on offline filesystem checking to fix up any filesystem damage that may occur. Those solutions protect against problems below the filesystem layer and not against bugs in the filesystem implementation itself.

But, he said, offline checking is slow and getting slower as disks get larger. In addition, the data is not available while the fsck is being done. Because of that, checking is usually only done after things have obviously gone wrong, which makes the repair that much more difficult. The example given was a file and directory inode that both point to the same data block; how can the checker know which is correct at that point?

James Bottomley asked if there were particular tools that were used to cause various kinds of filesystem corruption, and if those tools were available for kernel hackers and others to use. Goel said that they have tools for both ext3 and btrfs, while audience members chimed in with other tools to cause filesystem corruptions. Those included fsfuzz, mentioned by Ted Ts'o, which will do random corruptions of a filesystem. It is often used to test whether malformed filesystems on USB sticks can be used to crash or subvert the kernel. There were others, like fswreck for the OCFS2 filesystem, as well as similar tools for XFS noted by Christoph Hellwig and another that Chris Mason said he had written for btrfs. Bottomley's suggestion that the block I/O scheduler could be used to pick blocks to corrupt was met with a response from another in the audience joking that the block layer didn't really need any help corrupting data—widespread laughter ensued.

Returning to the topic at hand, Goel stated that doing consistency checking at runtime is faced with the problem that consistency properties are global in nature and are therefore expensive to check. To find two pointers to the same data block, one must scan the entire filesystem, for example. In an effort to get around this difficulty, the researchers hypothesized that global consistency properties could be transformed into local consistency invariants. If only local invariants need to be checked, runtime consistency checking becomes a more tractable problem.
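
The difference between a global property and a local invariant can be made concrete with a small model. The sketch below is only an illustration under simplifying assumptions: the filesystem is reduced to a dictionary mapping inode numbers to lists of block pointers, and the function names are hypothetical rather than anything taken from Recon. The global check has to walk every pointer, while the local one consults only a record of blocks already in use.

```python
# Hypothetical model: each inode number maps to the list of data blocks it
# points to. None of these names come from Recon itself.

def find_shared_blocks(inodes):
    """Global check: scan every pointer in the filesystem (fsck-style)."""
    owner = {}
    shared = set()
    for ino, blocks in inodes.items():
        for block in blocks:
            if block in owner and owner[block] != ino:
                shared.add(block)
            owner[block] = ino
    return shared


def pointer_is_safe(allocated, block):
    """Local check: consult only the state touched by this one update."""
    return block not in allocated


inodes = {12: [500], 13: [502]}
allocated = {500, 502}          # shadow record of blocks already in use

ok = pointer_is_safe(allocated, 501)       # fresh block: allowed
clash = pointer_is_safe(allocated, 502)    # already owned by inode 13

inodes[12].append(502)                     # what a buggy update would do
damage = find_shared_blocks(inodes)        # only a full scan sees it later
```

The local check costs one set lookup per update, whereas the global scan grows with the size of the filesystem.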

They started with the assumption that the initial filesystem is consistent, and that something below the filesystem layer, like checksums, ensures that correct data reaches the disk. At runtime, then, it is only necessary to check that the local invariants are maintained by whatever data is being changed in any metadata writes. This checking happens before those changes become "durable", so they reason by induction that the filesystem resulting from those is also consistent. By keeping any inconsistent state changes from reaching the disk, the "Recon" system makes filesystem repair unnecessary.

As an example, ext3 maintains a bitmap of the allocated blocks, so to ensure consistency when a block is allocated, Recon needs to test that the proper bit in the bitmap flips from zero to one and that the pointer used is the correct one (i.e. it corresponds to the bit flipped). That is the "consistency invariant" for determining that the block has been allocated correctly. A bit in the bitmap can't be set without a corresponding block pointer being set and vice versa. Additional checks are done to make sure that the block had not already been allocated, for example. That requires that Recon maintain its own block bitmap.

These invariants (they came up with 33 of them for ext3) are checked at the transaction commit point. The design of Recon is based on a fundamental mistrust of the filesystem code and data structures, so it sits between the filesystem and the block layer. When the filesystem does a metadata write, Recon records that operation. Similarly, it caches the data from metadata reads, so that the invariants can be validated without excessive disk reads. When the commit of a metadata update is done, the read cache is updated if the invariants are upheld in the update.
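
The interposition described above might be modeled roughly as follows. This is a minimal sketch under stated assumptions, not Recon's design: the disk is a plain dictionary, an invariant is any function taking the old and new contents of a block, and the CheckingLayer class and its method names are hypothetical. Writes are only recorded until commit; at commit, either every invariant holds and the changes reach the disk, or further writes are refused.

```python
# A toy model of a checker sitting between a filesystem and its disk.
# The class, its methods, and the invariant signature are hypothetical.

class InvariantViolation(Exception):
    pass


class CheckingLayer:
    def __init__(self, disk, invariants):
        self.disk = disk              # block number -> metadata (a dict here)
        self.invariants = invariants  # callables: (old, new) -> bool
        self.read_cache = {}          # last known-good metadata blocks
        self.pending = {}             # writes recorded since the last commit
        self.stopped = False

    def read(self, blockno):
        if blockno in self.pending:
            return self.pending[blockno]
        # Cache metadata reads so later checks need no extra disk I/O.
        if blockno not in self.read_cache:
            self.read_cache[blockno] = self.disk[blockno]
        return self.read_cache[blockno]

    def write(self, blockno, data):
        if self.stopped:
            raise InvariantViolation("filesystem stopped")
        self.pending[blockno] = data  # record it; do not touch the disk yet

    def commit(self):
        for blockno, new in self.pending.items():
            old = self.read_cache.get(blockno, self.disk.get(blockno))
            if not all(check(old, new) for check in self.invariants):
                self.stopped = True   # refuse all further writes
                self.pending.clear()
                raise InvariantViolation(f"block {blockno}")
        # All invariants held: make the changes durable, refresh the cache.
        self.disk.update(self.pending)
        self.read_cache.update(self.pending)
        self.pending.clear()


def free_count_never_negative(old, new):
    return new["free_blocks"] >= 0


disk = {0: {"free_blocks": 1}}
layer = CheckingLayer(disk, [free_count_never_negative])

layer.write(0, {"free_blocks": 0})
layer.commit()                       # invariant holds, write reaches disk

layer.write(0, {"free_blocks": -1})
try:
    layer.commit()
    rejected = False
except InvariantViolation:
    rejected = True                  # bad update never reached the disk
```

The single invariant used here is a stand-in chosen for brevity; the 33 ext3 invariants mentioned in the talk are not reproduced.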

When filesystem metadata is updated, Recon needs to determine what logical change is being performed. It does that by examining the metadata block to determine what type of block it is, and then does a "logical diff" of the changes. The result is a "logical change record" that records five separate fields for each change: block type, ID, the field that changed, the old value, and the new value. As an example, Goel listed the change records that might result from appending a block to inode 12:

    Type     ID    Field         Old    New
    inode    12    blockptr[1]   0      501
    inode    12    i_size        4096   8192
    inode    12    i_blocks      8      16
    bitmap   501   --            0      1
    bgd      0     free_blocks   1500   1499

Using those records, the invariants can be checked to ensure that the block pointer referenced in the inode is the same as the one that has its bit set in the bitmap, for example.
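
As a rough illustration of such a check, the change records listed above can be written down as tuples and the allocation invariant tested against them. The five field names follow the article; the allocation_is_consistent() function is a hypothetical reconstruction and should not be read as Recon's code. It requires that the set of newly set block pointers equal the set of bitmap bits that flipped from zero to one.

```python
from collections import namedtuple

# Field names follow the change records above; the checking function is a
# hypothetical reconstruction, not Recon's code.
Change = namedtuple("Change", "type id field old new")

records = [
    Change("inode", 12, "blockptr[1]", 0, 501),
    Change("inode", 12, "i_size", 4096, 8192),
    Change("inode", 12, "i_blocks", 8, 16),
    Change("bitmap", 501, None, 0, 1),
    Change("bgd", 0, "free_blocks", 1500, 1499),
]


def allocation_is_consistent(records):
    """Every newly set block pointer must match a bitmap bit going 0 -> 1,
    and every such bit must match a newly set pointer."""
    pointed = {r.new for r in records
               if r.type == "inode" and r.field.startswith("blockptr")
               and r.old == 0 and r.new != 0}
    flipped = {r.id for r in records
               if r.type == "bitmap" and (r.old, r.new) == (0, 1)}
    return pointed == flipped


good = allocation_is_consistent(records)

# Same transaction, but the bitmap update was lost by a bug.
bad = allocation_is_consistent([r for r in records if r.type != "bitmap"])
```

Comparing the two sets for equality covers both directions of the invariant: a pointer without a bit and a bit without a pointer are each rejected.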

Currently, when any invariant is violated, the filesystem is stopped. Eventually there may be ways to try to fix the problems before writing to disk, but for now, the safe option is to stop any further writes.

Recon was evaluated by measuring how many consistency errors were detected by it vs. those caught by fsck. Recon caught quite a few errors that were not detected by fsck, while it only missed two that fsck caught. In both cases, the filesystem checker was looking at fields that are not currently used by ext3. Many of the inconsistencies that Recon found and fsck didn't were changes to unallocated data, which are not important from a consistency standpoint, but still should not be changed in a correctly operating filesystem.

There are some things that neither fsck nor Recon can detect, like changes to filenames in directories or time field changes in inodes. In both cases, there isn't any redundant information to do a consistency check against.

The performance impact of Recon is fairly modest, at least in terms of I/O operations. With a cache size of 128MB, Recon could handle a web server workload with a reduction of only about 2% in I/O operations per second, based on a graph that was shown. The cache size was tweaked to find a balance based on the working set size of the workload so that the cache would not be flushed prematurely, which would otherwise cause expensive reads of the metadata information. The tests were run on a filesystem on a 1TB partition with 15-20GB of random files according to Fryer, and used small files to try to stress the metadata cache.

No data was presented on the CPU impact of Recon, other than to say that there was "significant" CPU overhead. Their focus was on the I/O cost, so more investigation of the CPU cost is warranted. Based on comments from the audience, though, some would be more than willing to spend some CPU in the name of filesystem consistency so that the far more expensive offline checking could be avoided in most cases.

The most important thing to take away from the talk, Goel said, is that as long as the integrity of written block data is assured, all of the ext3 properties that can be checked by fsck can instead be checked at runtime. As Ric Wheeler and others in the audience pointed out, that doesn't eliminate the need for an offline checker, but it may help reduce how often it's needed. Goel agreed with that, and noted that in 4% of their tests with corrupted filesystems, fsck would complete successfully, but that a second run would find more things to fix. Ts'o was very interested to hear that and asked that they file bugs for those cases.

There is ongoing work on additional consistency invariants as well as things like reducing the memory overhead and increasing the number of filesystems that are covered. Dave Chinner noted that invariants for some filesystems may be hard to come up with, especially for filesystems like XFS that don't necessarily do metadata updates through the page cache.

The reaction to Recon was favorable overall. It is an interesting project, and some were surprised that it was possible to do runtime consistency checking at all. As always, there is more to do, and the team has limited resources, but most attendees seemed favorably impressed with the work.

[Many thanks are due to Mel Gorman for sharing his notes from this session.]


Page editor: Jonathan Corbet


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds