By Jonathan Corbet
April 4, 2012
Two weeks ago, LWN
covered the debate within
the Fedora project over whether its ARM port should be designated one
of that distribution's "primary" architectures. That discussion has
progressed a little further, so an update may be warranted. But it may
also be worthwhile to address a related question: why is there resistance
to the concept of supporting ARM as a primary architecture in the first
place? And why might it make sense to promote the ARM architecture anyway?
One of the things that came out in the original discussion
is that the Fedora project did not have any idea of how to do that. Over
its entire history, the project has never before seriously considered
moving one of its secondary architectures to primary status. So there are
no procedures in place and no criteria by which a decision to promote an
architecture can be made. So, unsurprisingly, the project decided that it
needs to come up with a set of reasonable criteria. On April 2,
Matthew Garrett posted a
draft showing what those criteria might look like.
The rules would appear to make sense. The Fedora infrastructure and
release engineering teams need to have people who are able to represent any
new primary architecture. The project must be able to build packages on its
own servers. Anaconda, the Fedora installer, must work on the targeted
hardware. Maintainers of important packages must have access to the
supported hardware so they can fix problems. No binary blobs. And so on.
Also required is approval from various Fedora teams, each of which can
impose additional criteria if it sees the need. These rules are in an
early form and can be expected to evolve over time, but the early
responses on the mailing list suggest that most people are happy enough
with what has been set down.
That said, there are clearly some people who do not see the point of
supporting ARM as a primary architecture, and they have a number of reasons
for their reluctance. The ARM architecture is messy, for example. The x86
architecture does not have a single design authority, but processors made
by multiple vendors still resemble each other closely enough to create a
fairly tightly-knit processor family. ARM does have a central
design authority, but that authority leaves a lot of significant details up
to individual manufacturers, of which there are many. So ARM is not
a tightly-knit family; it is more like an extended group of hostile
ex-spouses and in-laws who have moved to different continents to get away
from each other.
The looseness of the ARM "platform" has led to a lot of innovation in the
design space; there is no end of interesting ARM system-on-chip designs
with all kinds of impressive integrated peripheral devices. But this
diversity, along with a distressing lack of hardware discoverability, makes
it impossible to create a single kernel that works on all (or even a
significant subset of) ARM processors. Distributors hate having to
maintain multiple kernels, and they hate having to put target-specific
hacks into installers. ARM currently forces both, despite the ongoing work
to consolidate kernel code and move hardware knowledge into the
bootloader-supplied device tree structure.
ARM is also, for many developers, a relatively obscure architecture lacking
the familiarity of x86. The fact that there are vastly more ARM systems
running Linux than x86 systems does not really change that perception; most
of us lack ARM-based development systems on our desks. Additionally, ARM
processors are
relatively slow. That is a problem for developers, who typically need to
keep an x86 system and a cross-compiling toolchain around to be able to get
through more than one edit-compile-test cycle in any given day. That
slowness is also an issue for distributors; it can delay security updates
and distribution releases, even for other architectures. And, while the
hardware is slow, product cycles are fast; by the time developers have
gotten a target working nicely, it may be obsolete and off the market.
Given all of these challenges, it is not surprising that some people would
rather not be bothered by an architecture like ARM. The x86 world provides
plenty of open, high-performing systems with wide support; why get
distracted with that messy architecture where, even if the distribution can
be made to work, the hardware is probably closed and won't allow it to be
installed?
The answer, of course, is that said messy architecture is already
performing much of our computing, and it will likely be doing more of it in
the future. Traditional PC-style systems are no longer the center of
attention; one assumes they will not go away entirely, but a lot of the
action is elsewhere. There is a whole new crowd of makers looking to do
interesting things with ARM-based designs; we are just beginning to see
what can be done with interesting mobile devices, and the bulk of those
devices are not, at this point, using x86 processors. Meanwhile, ARM has
its eyes on data center applications where, some think, its compactness and
power efficiency will make up for its lack of speed. The x86 architecture
will be with us for a long time - even Intel has proved unable to kill it
off in the past - but it is far from the only show in town.
It is also worth remembering that, for all its success, Linux is still a
minority player on x86 systems. But Linux is the dominant system on
ARM-based systems. The "year of the Linux desktop" may be an old and sad
joke, but the year of the Linux gadget looks to be happening for real -
again.
Given that ARM is where much of the action is, it would make sense that a
Linux distribution - especially one that is supposed to be leading-edge and
forward-looking - would want to support ARM as well as possible. Solid
support for the architecture seems like a necessary precondition for any
sort of presence in the interesting computing devices of the near future.
Distributors like Ubuntu appear to have come to that conclusion; they have
built on Debian's longstanding ARM support to create a distribution that,
they hope, will be found in future devices. Without well-established ARM
support, Fedora - along with the distributions derived from it - has little
chance of competing in that area.
So one might well say that the questions being asked in the Fedora
community are wrong. Rather than asking "why should we support ARM when it
presents all of these difficulties?", it might make more sense to ask "how
can we address these difficulties to provide the best ARM-based
distribution possible?". The cynical among us might be tempted to say that
Red Hat, Fedora's sponsor and main contributor, faces a classic disruptive
technology problem. ARM is unlikely to displace x86 in the places where
Red Hat currently sells support, and revenues from any future ARM-based
"enterprise" distribution seem likely to be rather lower than those
obtained from x86-based distributions. So it would be understandable if
Red Hat were to show a lack of enthusiasm for the ARM architecture.
The cynical view is, at best, only partially right, though. Red Hat does
not advertise the resources it is putting into ARM distribution
development, but it clearly has a number of engineers on the task. Even as
a "secondary" architecture, Fedora's ARM distribution has been solid enough
to serve as the base for the ARM-based OLPC XO 1.75 laptop.
Without Red Hat's support, there wouldn't be a Fedora ARM distribution even
with secondary status. So it seems unlikely that Red Hat is the sticking
point here, even if its contributions to the kernel's ARM subtree (29
patches total from 3.0 to the present) show little enthusiasm. More likely
we're just seeing the usual noise as the wider community comes to terms
with what will be required to support this architecture properly.
In the end, the world would not be well served by a single processor
architecture; there is value in diversity. Similarly, an industry where
ARM-based systems are dominated by Android variants may not be the best
possible world. A lot of interesting things are happening in computing,
and many of them involve the ARM architecture; there is a lot of value in
having strong community-based distribution support for that architecture.
That is why Fedora will, in the end, almost certainly bother to support ARM
as a primary architecture despite the challenges it presents.
April 4, 2012
This article was contributed by Adam Saunders
After over a year and a half of legal proceedings, Oracle
and Google will go to trial on April 16 in front of the United States
District Court for the Northern District of California, to determine
whether or not Google's Android software infringes Oracle's copyrights on
Java, as well as some of its patents. If the parties don't settle, this
trial is expected to take eight weeks.
A lot has happened since the litigation started. In August 2010,
several months after acquiring Sun Microsystems, which developed
Java and held the copyrights, Oracle launched a
lawsuit against Google, claiming that Android's use of Java
infringed seven of Oracle's patents, as well as the Java copyrights
Oracle holds. The complaint demands an injunction barring Google
from continuing with its allegedly infringing activity, that
"all copies made or used in violation of Oracle America's
copyrights [...] be impounded and destroyed or otherwise reasonably
disposed of", and that Oracle receive damages. Essentially,
Oracle is formally seeking to stop Google's use of Java in Android,
and wants compensation for that use.
The FSF has argued
that if Google had used an available GPL-licensed version of Java, such as
IcedTea, as part of Android, it would have avoided this
litigation. This may be true; Sun (now Oracle) distributes Java
under GPLv2 with a
linking exception, which is what IcedTea is based on.
The GPLv2 implicit patent language, contained in sections 6 and 7 of
the license, effectively gives users of Sun/Oracle's distribution of Java a
royalty-free patent license that covers standard free software practices:
the right to use, modify, and redistribute the software, including modified
versions. With the linking exception, permissively licensed software and
proprietary software that links to IcedTea could be developed without being
licensed under GPLv2; thus, the app repository Google Play (formerly known
as Android Market), with its proprietary apps as well as free software
apps, would have still been possible.
However, this argument ignores the fact that the Android project started
before Sun licensed Java under the GPL. Android, Inc. was founded in 2003
and acquired by Google in 2005; the relicensing of Java happened in
November, 2006. When Android started, if a non-Sun
programmer or development project wanted to make an open-source version of
Java while minimizing the threat of copyright infringement, the only
practical way to do this was to rely on clean room reverse engineering;
this is what Google claims to have done. But clean room reverse engineering
is not a helpful defense in patent litigation, which is why Google's way of
implementing Java in Android - including basing Dalvik on the Apache
Harmony project, and not Sun/Oracle's GPL'd Java - exposes it to patent
lawsuits from Oracle, assuming Oracle has any valid patents that read on
Google's Java implementation.
So if you're Google, and you get sued by Oracle, one of the best things you
can do to defend against the patent infringement claims is to
get the patents reexamined and hope that they get rejected. This has
been a very successful tactic; Google's
requests for USPTO patent reexaminations have, over time, left Oracle
with only two patents to assert against Google. The
reexamined claims in the '205 and '702 patents were rejected due to prior
art, as were the reexamined claims in the '720 patent,
the
'447 patent, and the
'476 patent. The '447
patent covered the concept of restricting access to objects based on
where a specific program came from. The '720
patent claimed the novel concept of loading classes into a parent
process before calling fork() so they would already be present for
child processes. The '702
patent claimed the concept of coalescing duplicated objects
(constants, for example) in a class file.
The '476 patent
is about determining access permissions depending on the calling sequence
that led to a specific class method. Finally, the '205
patent claims the concept of a just-in-time compiler.
The only remaining patents are the '520
and the '104 patents:
- The '104
patent, reissued in 2003, claims a "method and apparatus for resolving
data references in generated code"; the method describes generating and
interpreting executable code, and changing symbolic references in the code
to numerical references when the code is interpreted. Cameron McKenzie of
TheServerSide.com aptly characterized this as claiming the very basic idea
that "if
you rid your code of symbolic references, and replace them with direct
references, things are more efficient".
- The '520
patent claims a "method and system for performing static
initialization". Essentially, the virtual machine replaces a bunch of
instructions initializing an array with a copy of the resulting array,
speeding the initialization process.
With only these two patents left to
litigate, Oracle is left hoping that it can claim a relatively low sum of
damages from Google for alleged patent infringement.
What exactly has Oracle alleged in its copyright infringement claim? Oracle
claimed [PDF]
infringement of "(a) 37 Java API design specifications and
implementations and (b) 11 Java software code files". Google's
defense here looks strong, and there are indications that the court
agrees. For example, a recent court order [PDF]
asked Oracle to explain how Baker v. Selden applies to its copyright
claims. In that case, the Supreme Court clearly established that one cannot
use copyright to stop people from using the ideas contained in an
expressive work; one can only use copyright to restrict use of the
particular expressive work itself. Even though the same court order asked
Google to address Sun's limitations on permitted uses of Apache Harmony
APIs, it appears that the judge might view Baker as implying that one
cannot use copyright to restrict API reimplementations in the way that
Oracle is claiming in this case.
With regards to the allegedly infringing Java code files, Oracle specified [PDF]
them as:
the entire code for AclEntryImpl.java, AclImpl.java,
GroupImpl.java, OwnerImpl.java, PermissionImpl.java, PrincipalImpl.java,
PolicyNodeImpl.java, and AclEnumerator.java, obtained by decompiling object
code [...] [,] code from Arrays.java [...] [and] comments from
CodeSource.java [...] [and] from CollectionCertStoreParameters.java
This claim is weak; although these files had previously been
in Android, they are
no longer part of Android, and had never
been distributed as part of an Android device.
At the end of March, Google made a settlement offer that Oracle rejected.
The settlement involved donating
a fraction of a percentage of total Android revenues until April 2018, but
only if Oracle can demonstrate that the '520 and '104 patents had been
infringed. Some might interpret this settlement offer as indicating that
Google feels Oracle has a decent case, but Google might simply want this
litigation to not drag on any longer; litigation is expensive and
time-consuming.
How should the free software and open source community react to this
litigation? As the proceedings have shown, Oracle has become far less
threatening than it may have appeared in the summer of 2010. Most of
Oracle's patents have been rejected. As Groklaw has noted,
Oracle's copyright claims on its APIs look weak, with Google's defense that
the complaint refers to functional, and therefore non-copyrightable,
subject matter looking strong. As well, Sun's praise
of Android doesn't really help Oracle's case. Although it is far too
early to tell how the case will turn out, what ruling Judge Alsup will
give, and whether or not Android will face the need to change its
relationship with Java, it is clear that Oracle's case is much, much weaker
than it initially seemed in the summer of 2010. It is entirely possible
that Android's current implementation of Java will be in
excellent legal shape following this case.
It is important to remember that this lawsuit is only one of
several examples of legal pressure being applied against Android. Apple has
launched
many patent lawsuits against several Android device manufacturers, with
many of them retaliating against Apple with patent lawsuits of their
own. Another example is Microsoft's
pressuring of Android device makers into patent licensing
agreements. Last year, the non-practicing entity Lodsys sued,
among others, Android app developers. So, regardless of how Oracle
v. Google is resolved, Android, and free software in general, will
remain under significant threat from software patents.
By Jake Edge
April 3, 2012
This year's edition of the Linux Storage, Filesystem, and Memory Management
Summit took place in San Francisco April 1-2, just prior to the Linux
Foundation Collaboration Summit.
Ashvin Goel of the University of Toronto was invited to the summit to
discuss the work that he and others at the university had done on
consistency checking as filesystems are updated, rather than doing offline
checking using tools like
fsck. One of the students who had
worked on the project, Daniel Fryer, was also present to offer his
perspective from the audience. Goel said that the work is not ready for
production use, and Fryer echoed that, noting that the code is not 100%
solid by any means. They are researchers, Goel said, so the community
should give them some leeway, but any input to make their work more
relevant to Linux would be appreciated.
Filesystems have bugs, Goel said, producing a list of bugs that
caused filesystem corruption over the last few years. Existing solutions
can't deal with these problems because they start with the assumption that
the filesystem is correct. Journals, RAID, and checksums on data are nice
features but they depend on offline filesystem checking to fix up any
filesystem damage that may occur. Those solutions protect against problems
below the
filesystem layer and not against bugs in the filesystem implementation itself.
But, he said, offline checking is slow and getting slower as disks get
larger. In
addition, the data is not available while the fsck is being done.
Because of that, checking is usually only done after things have obviously gone
wrong, which makes the repair that much more difficult. The example given
was a file and directory inode that both point to the same data block; how
can the checker know which is correct at that point?
James Bottomley asked if there were particular tools that were used to
cause various kinds of filesystem corruption, and if those tools were
available for kernel hackers and others to use. Goel said that they have
tools for both ext3 and btrfs, while audience members chimed in with other
tools to cause filesystem corruptions. Those included fsfuzz, mentioned by
Ted Ts'o, which will do random corruptions of a filesystem. It is often
used to test whether malformed filesystems on USB sticks can be used to
crash or subvert the kernel. There were others, like fswreck for the OCFS2
filesystem, as well as similar tools for XFS noted by Christoph Hellwig and
another
that Chris Mason said he had written for btrfs. Bottomley's suggestion
that the block I/O scheduler could be used to pick blocks to corrupt was
met with a response from another in the audience joking that the block
layer didn't really need any help corrupting data—widespread laughter
ensued.
Returning to the topic at hand, Goel stated that
doing consistency checking at runtime is faced with the problem that
consistency properties are global in nature and are therefore expensive to
check. To find two pointers to the same data block, one must scan the
entire filesystem, for example. In an effort to get around this
difficulty, the researchers
hypothesized that global consistency properties could be transformed into
local consistency invariants. If only local invariants need to be
checked, runtime consistency checking becomes a more tractable problem.
They started with the assumption that the initial filesystem is consistent,
and that something below the filesystem layer, like checksums, ensures that
correct data reaches the disk. At runtime, then, it is only necessary to
check that the local invariants are maintained by whatever data is being changed
in any metadata writes. This checking happens before those changes become
"durable", so they reason by induction that the filesystem resulting from
those changes is
also consistent. By keeping any inconsistent state changes from reaching
the disk, the "Recon" system makes filesystem repair unnecessary.
As an example, ext3 maintains a bitmap of the allocated blocks, so to
ensure consistency when a block is allocated, Recon needs to test that the
proper bit in the bitmap flips from zero to one and that the pointer used is the
correct one (i.e. it corresponds to the bit flipped). That is the
"consistency invariant" for determining that the block has been allocated
correctly. A bit in the bitmap can't be set without a corresponding block
pointer being set and vice versa. Additional checks are done to make sure
that the block had not already been allocated, for example. That requires
that Recon maintain its own block bitmap.
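The allocation invariant described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Recon's code: the function name, the use of sets for the checker's private ("shadow") bitmap, and the wording of the messages are all assumptions made for the example.

```python
def check_allocation(shadow_allocated, bitmap_flips, new_pointers):
    """Check one transaction's block allocations; return a list of violations.

    shadow_allocated: set of block numbers the checker believes are in use
                      (a stand-in for the checker's own block bitmap)
    bitmap_flips:     dict of block number -> (old_bit, new_bit)
    new_pointers:     set of block numbers newly referenced by block pointers
    """
    violations = []
    # Blocks whose bitmap bit went from zero to one in this transaction.
    newly_set = {blk for blk, (old, new) in bitmap_flips.items()
                 if (old, new) == (0, 1)}

    # A bit may not be set without a matching pointer, and vice versa.
    for blk in sorted(newly_set - new_pointers):
        violations.append(f"block {blk}: bit set but no pointer refers to it")
    for blk in sorted(new_pointers - newly_set):
        violations.append(f"block {blk}: pointer set but bit did not go 0 -> 1")
    # The shadow bitmap catches a block that was already in use.
    for blk in sorted(new_pointers & shadow_allocated):
        violations.append(f"block {blk}: already allocated")
    return violations


shadow = {100, 101}
print(check_allocation(shadow, {501: (0, 1)}, {501}))   # consistent allocation
print(check_allocation(shadow, {501: (0, 1)}, {502}))   # pointer/bit mismatch
print(check_allocation(shadow, {101: (0, 1)}, {101}))   # double allocation
```

The last call shows why a private copy of the bitmap matters: the filesystem's own bitmap claimed the block was free, so only the checker's independent record reveals the double allocation.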
These invariants (they came up with 33 of them for ext3) are checked at the
transaction commit point. The design of Recon is based on a fundamental
mistrust of the filesystem code and data structures, so it sits between the
filesystem and the
block layer. When the filesystem does a metadata write, Recon records
that operation. Similarly, it caches the data from metadata reads, so that
the invariants can be validated without excessive disk reads. When the
commit of a metadata update is done, the read cache is updated if the
invariants are upheld in the update.
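A rough sketch of that flow, under heavy assumptions, might look like the following. The class and method names are invented for the example, metadata blocks are modeled as plain dictionaries, the single invariant shown is made up rather than taken from Recon's list, and raising an exception is just a placeholder for whatever the real system does when a check fails.

```python
class InvariantViolation(Exception):
    """Raised when a transaction fails a check (placeholder response)."""


class CommitChecker:
    """Hypothetical layer sitting between a filesystem and the block layer."""

    def __init__(self, invariants):
        self.invariants = invariants  # callables: (read_cache, pending) -> bool
        self.read_cache = {}          # block number -> last trusted metadata
        self.pending = {}             # block number -> metadata in this transaction
        self.stopped = False

    def on_metadata_read(self, block, data):
        # Cache what came from disk so checks need no extra reads.
        self.read_cache[block] = data

    def on_metadata_write(self, block, data):
        if self.stopped:
            raise InvariantViolation("filesystem stopped")
        # Record the write; nothing is trusted until commit time.
        self.pending[block] = data

    def commit(self):
        for invariant in self.invariants:
            if not invariant(self.read_cache, self.pending):
                self.stopped = True
                self.pending.clear()
                raise InvariantViolation(invariant.__name__)
        # All invariants held: the update becomes the new trusted state.
        self.read_cache.update(self.pending)
        committed = dict(self.pending)
        self.pending.clear()
        return committed


def free_blocks_not_negative(read_cache, pending):
    # Made-up invariant for the demonstration only.
    return all(md.get("free_blocks", 0) >= 0 for md in pending.values())


checker = CommitChecker([free_blocks_not_negative])
checker.on_metadata_read(0, {"free_blocks": 1500})
checker.on_metadata_write(0, {"free_blocks": 1499})
checker.commit()                      # passes; the cache now holds 1499
checker.on_metadata_write(0, {"free_blocks": -1})
try:
    checker.commit()
except InvariantViolation as exc:
    print("rejected:", exc)
```

The point of the structure is that the cache is only updated after the checks pass, so a rejected transaction never contaminates the state that later checks rely on.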
When filesystem metadata is updated, Recon needs to determine what
logical change is being performed. It does that by examining the metadata
block to determine what type of block it is, and then does a "logical diff"
of the changes. The result is a "logical change record" that records
five separate fields for each change: block type, ID, the field that
changed, the old value, and the new value. As an example, Goel listed the
change records that might result from appending a block to inode 12:
| Type | ID | Field | Old | New |
| inode | 12 | blockptr[1] | 0 | 501 |
| inode | 12 | i_size | 4096 | 8192 |
| inode | 12 | i_blocks | 8 | 16 |
| bitmap | 501 | -- | 0 | 1 |
| bgd | 0 | free_blocks | 1500 | 1499 |
Using those records, the invariants can be checked to ensure that the
block pointer referenced in the inode is the same as the one that has its bit
set in the bitmap, for example.
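As a minimal sketch of how such records might be produced and used, the Python below compares two versions of a metadata object field by field and then runs the pointer-versus-bitmap check on the result. The dictionary representation, the function names, and the value of blockptr[0] are assumptions for illustration; only the five resulting records come from the example in the talk.

```python
from collections import namedtuple

ChangeRecord = namedtuple("ChangeRecord", "type id field old new")


def logical_diff(block_type, ident, old, new):
    """Compare two versions of one metadata object, field by field.

    Fields missing from the old version are treated as zero; fields that
    disappear entirely are not reported in this simplified version.
    """
    return [ChangeRecord(block_type, ident, field, old.get(field, 0), value)
            for field, value in new.items()
            if old.get(field, 0) != value]


def pointers_match_bitmap(records):
    """Blocks newly pointed to by inodes must be the blocks newly marked in use."""
    pointed = {r.new for r in records
               if r.type == "inode" and r.field.startswith("blockptr")
               and r.old == 0}
    marked = {r.id for r in records
              if r.type == "bitmap" and (r.old, r.new) == (0, 1)}
    return pointed == marked


# Appending a block to inode 12; blockptr[0] = 500 is an invented value.
old_inode = {"blockptr[0]": 500, "blockptr[1]": 0, "i_size": 4096, "i_blocks": 8}
new_inode = {"blockptr[0]": 500, "blockptr[1]": 501, "i_size": 8192, "i_blocks": 16}

records = logical_diff("inode", 12, old_inode, new_inode)
records += logical_diff("bitmap", 501, {"--": 0}, {"--": 1})
records += logical_diff("bgd", 0, {"free_blocks": 1500}, {"free_blocks": 1499})

for record in records:
    print(record)
print(pointers_match_bitmap(records))
```

Working from change records rather than raw blocks keeps each check local: it needs to look only at what the transaction touched, not at the whole filesystem.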
Currently, when any invariant is violated, the filesystem is stopped.
Eventually there may be ways to try to fix the problems before writing to
disk, but for now, the safe option is to stop any further writes.
Recon was evaluated by measuring how many consistency errors were detected
by it vs. those caught by fsck. Recon caught quite a few errors
that were not detected by fsck, while it only missed two that
fsck caught. In both cases, the filesystem checker was looking at
fields that are not currently used by ext3. Many of the inconsistencies
that Recon found and fsck didn't were changes to unallocated data,
which are not important from a consistency standpoint, but still should not
be changed in a correctly operating filesystem.
There are some things that neither fsck nor Recon can detect, like
changes to filenames in directories or time field changes in inodes. In
both cases, there isn't any redundant information to do a consistency check
against.
The performance impact of Recon is fairly modest, at least in terms of I/O
operations. With a cache size of 128MB, Recon could handle a web server
workload with a reduction of only about 2% in I/O operations per second,
based on a graph that was shown. The
cache size was tweaked to find a balance based on the working set size of
the workload so that the cache would not be flushed prematurely, which
would otherwise cause expensive reads of the metadata information. The
tests were
run on a filesystem on a 1TB partition with 15-20GB of random files
according to Fryer,
and used small files to try to stress the metadata cache.
No data was presented on the CPU impact of Recon, other than to say that
there was "significant" CPU overhead. Their focus was on the I/O cost, so
more investigation of the CPU cost is warranted. Based on comments from
the audience, though, some would be more than willing to spend some CPU in
the name of filesystem consistency so that the far more expensive offline
checking could be avoided in most cases.
The most important thing to take away from the talk, Goel said, is that
as long as the integrity of written block data is assured, all
of the ext3 properties that can be checked by fsck can instead be
checked at runtime. As Ric Wheeler and others in the audience pointed out,
that doesn't eliminate the need for an offline checker, but it may help
reduce how often it's needed. Goel agreed with that, and noted that in 4%
of their tests with
corrupted filesystems, fsck would complete successfully, but that
a second run would find more things to fix. Ts'o was very interested to
hear that and asked that they file bugs for those cases.
There is ongoing work on additional consistency invariants as well as
things like reducing the memory overhead and increasing the number of
filesystems that are covered. Dave Chinner noted that invariants for some
filesystems may be hard to come up with, especially for filesystems like
XFS that don't necessarily do metadata updates through the page cache.
The reaction to Recon was favorable overall. It is an interesting project
and some were surprised that it was possible to do runtime consistency checking
at all. As always, there is more to do, and the team has limited resources,
but most attendees seemed favorably impressed with the work.
[Many thanks are due to Mel Gorman for sharing his notes from this session.]
Page editor: Jonathan Corbet