By Nathan Willis
July 3, 2013
Glibc, the GNU C library, is a foundational free software project,
and one that has served its important role for well over two
decades. Recently, though, it has undergone a shift in its development
model, with an increase in contributions from volunteers outside of
the core team of committers. That sort of change can bring about
strife, but it can also inspire important new features. The latter is
certainly true—and the former might also be true—in the
case of lock elision, which is set to be included for the first time
in the just-frozen 2.18 branch. 2.18 will enable developers to test
the glibc implementation of lock elision for some lock types but not
others—primarily since there is not full consensus within the
project on how to enable such experimental new features.
Lock elision, in general, is a technique for speeding up
multi-threaded programs that may contend for the same lock. As more
and more cores become available, synchronization between threads
becomes increasingly expensive. One way to avoid the
synchronization overhead is to use transactional memory instead. A
memory transaction buffers all of the memory changes made by an
operation; in the common case where nothing interferes with the operation,
the transaction is committed, but if something does interfere, the
transaction can be rolled back atomically.
For locks, this means that multiple threads can acquire locks on
the same object, and if they are modifying different parts of the
object, both transactions ought to succeed. For example, one thread
can acquire a lock on an object (say, tablefoo), update the
row it is interested in (tablefoo(N)), then release the
lock. Meanwhile, a different thread can also acquire a lock on
tablefoo for the purpose of updating tablefoo(M).
Using memory transactions, both threads can afford to be conservative
about locking the whole object, but cavalier about updating the part
of the object they need—most of the time, both transactions will
go through; only when both threads try to update tablefoo(N)
is there a collision, at which point one transaction must be rolled back.
But the real trick is that, in this common case where the two
threads do not collide, the locks themselves are unnecessary. If the
program is smart enough to recognize this, acquiring and releasing the
locks can simply be skipped. This is lock elision. Unfortunately, up
until recently, real-world implementations of transactional memory
(almost always in software) have been too slow to offer a real
advantage over manipulating the locks in the traditional manner.
That changed, however, with the
debut of Intel's Transactional Synchronization Extensions (TSX), a
hardware implementation of transactional memory for the Haswell
generation of CPUs. Consequently, building TSX support into
the lock implementations of common libraries would allow existing
programs to take advantage of lock elision speed-ups without even
recompiling.
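To make the technique concrete, here is a minimal sketch (not glibc's actual code) of how elision can be layered over a toy spinlock using Intel's RTM intrinsics from <immintrin.h> (_xbegin(), _xend(), _xtest(), and _xabort(), compiled with gcc -mrtm). It assumes an RTM-capable CPU; the real glibc implementation checks CPUID first, retries adaptively, and handles many more corner cases:
#include <immintrin.h>     /* RTM intrinsics; build with gcc -mrtm */
#include <stdatomic.h>

/* A toy spinlock: 0 = free, 1 = held.  Illustrative only. */
typedef struct { atomic_int locked; } toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Transactional path: read the lock word so that a real owner
         * taking the lock conflicts with us and aborts the transaction.
         * If the lock is free, proceed without writing to it at all:
         * the lock has been elided. */
        if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0)
            return;
        _xabort(0xff);         /* someone really holds the lock */
    }
    /* Fallback path: the transaction aborted, so take the real lock. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;                      /* spin */
}

static void toy_unlock(toy_lock_t *l)
{
    if (_xtest())              /* still inside a transaction? */
        _xend();               /* commit the elided critical section */
    else
        atomic_store_explicit(&l->locked, 0, memory_order_release);
}
When two threads use this pair around disjoint data, neither ever writes to the lock word, so both critical sections run concurrently; a genuine conflict aborts the transaction and the aborting thread falls back to the real lock.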
Intel's Andi Kleen has been working on a TSX-based lock elision implementation for glibc, which he wrote
about back in January 2013. The 14-part patch
set has been through many iterations, but in late June the
deadline was fast approaching for the glibc 2.18 freeze, and the
status of the patches was still a matter of debate.
The patch set adds elision capabilities to both POSIX thread
(pthread) mutexes and read/write locks (rwlocks), and uses an adaptive
algorithm to decide when to elide locks in a given code path.
Essentially, the algorithm keeps track of whether each mutex succeeds
at eliding a lock, and if it fails, elision is suspended for a period
of time. Not all lock variants are supported; in particular, locks
are never elided for recursive mutexes. Elision is automatically
attempted when the CPU supports transactional memory, but developers
can also explicitly enable or disable it in their code. Kleen's patch
set also offers two environment variables,
GLIBC_PTHREAD_MUTEX and GLIBC_PTHREAD_RWLOCK, which
users can use to explicitly enable or disable each flavor of elision
when starting a program.
Lock down?
The general consensus on the libc-alpha mailing list was that the
mutex patches (patches 1 through 5, plus 7) were ready for inclusion.
However, there was less agreement on the other patches, for three
reasons. First, the glibc team was wary of the rwlock patches due to
a disagreement over how to interpret the POSIX standard. According to
the standard, a "normal" (i.e., non-recursive, non-errorcheck,
non-timed) mutex is required to deadlock if the owner of the lock
attempts to re-lock it while already holding the lock. However, if
the lock in question is elided, this required deadlock does not occur.
It is certainly debatable whether or not avoiding a deadlock is really
a bad thing (after all, deadlocks are bugs), but the glibc project
decided to follow the standard to the letter, and elide only
non-"normal" mutexes.
But what is not clear from the specification is whether or not
the same behavior is required for rwlocks. Carlos O'Donell has
contacted the Austin Group to ask for clarification; if the official
answer is that rwlocks are required to deadlock on re-locks, then the
rwlock patches for glibc will not be merged as-is.
Second, the definition of "normal" for mutexes is not a simple
affair. The POSIX standard in fact requires specific behavior of
"normal" mutexes (such as the deadlock-on-re-lock behavior mentioned
above). But the standard also allows for a different type of mutex
termed "default" mutexes, in which the implementation is allowed more
freedom of behavior. In previous versions of glibc,
PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_DEFAULT were
defined to be identical. Kleen's patch set splits them, in order to
allow "default" mutexes to be elided. Technically, not deadlocking on
re-lock would violate the standard, even though not deadlocking
because the lock had been elided would often be seen as a preferable
outcome. But splitting PTHREAD_MUTEX_NORMAL and
PTHREAD_MUTEX_DEFAULT could be seen as an ABI change, and,
even though it would not affect old binaries, several in the project
(such as Roland McGrath) felt that
more consideration is needed before making the split, since it
would be difficult to reverse after the fact.
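At the level of application code the split would look innocuous, since both type constants already exist in POSIX; glibc currently defines them to the same value, so the two mutexes in this sketch behave identically today. After a split, only the second would be a candidate for elision:
#include <pthread.h>

pthread_mutex_t strict_mutex;    /* must keep deadlock-on-re-lock semantics */
pthread_mutex_t relaxed_mutex;   /* could be elided after a NORMAL/DEFAULT split */

static void init_mutexes(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);

    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    pthread_mutex_init(&strict_mutex, &attr);

    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_DEFAULT);
    pthread_mutex_init(&relaxed_mutex, &attr);

    pthread_mutexattr_destroy(&attr);
}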
Finally, there was also a lack of consensus about whether or not
environment variables are ultimately the most appropriate mechanism
with which to tune optional runtime features like lock elision. In
addition to the coarse-grained enable-or-disable-elision functionality
of the new environment variables, Kleen's patch set also adds several
parameters that can be used to tune the behavior of the adaptive
algorithm. Some would prefer adding a "tunables" API, while others see
no problem with adding new environment variables under a well-known
namespace (namely, GLIBC_) as long as there is sufficient
discussion. Plus, since the elision algorithm is brand-spanking-new,
with little or no testing outside of the confines of the glibc
project, it is still possible that the algorithm itself could undergo
a major overhaul before it is ready for real-world use. Offering
tunable parameters is one way for real-world tests to help refine the
algorithm, but if users become dependent on the specifics exposed,
swapping in a different algorithm later becomes trickier.
Testing is a related matter, albeit one not currently holding up
inclusion of the lock elision patch set. IBM's Dominik Vogt has been
testing the patch set on the company's System z
platform, which
is the only other widely available processor architecture to offer its
own implementation of hardware transactional memory. So far, his
tests have produced as many questions
as answers (in part because he is still working his way through the
internal processes required to publicly release the test suite as free
software). But in the long term, providing a test suite is a vital
step for the project—doubtless in coming years more processor
architectures will add their own implementations of transactional memory.
Friends of the library
O'Donell declared glibc frozen for
2.18 on July 2, after Kleen checked in the approved mutex
patches. As of press time, the project was still waiting to hear back
from the Austin Group for clarification on the rwlock
deadlock-on-re-lock question, and it appears that decisions on the
normal/default split, tunables, and other patches may get deferred
until later. O'Donell has assembled the contributor input on the
tunables question in a page
on the project's wiki. That discussion is expected to
take longer than anyone is willing to keep glibc 2.18 in its freeze.
Apart from the technical details of adding lock elision, Kleen's
work to add the feature to glibc can be seen as a case study of the
project's new, consensus-driven development style. For many years,
development was overseen by Ulrich Drepper, who earned a reputation as
a prickly gatekeeper past whom few outside contributions ever made it
into the codebase. Other longtime project members (including
McGrath) formed a "steering committee" to try to work around the
practical problems that arose from Drepper's management style. But
Drepper left the project in 2010, and in March 2012, McGrath announced that the steering committee had
voluntarily dissolved, to make way for a more open, community-driven
model.
Kleen's lock elision patch set, coming as it does largely from
outside, is proof that the hard-to-persuade gatekeeper model is indeed
gone. In practice, the glibc community arrived at rough consensus on
the patch set in a series of epic-length list threads, and it could
certainly be argued that the eventual consensus was incomplete. In
fact, O'Donell was even a bit apologetic toward Kleen about how difficult the
process had been, encouraging him to
stick around and continue to contribute. Or, as he put it in a separate message, "We haven't merged something of this complexity in a long time.
Please bear with us as we get better with the process."
But, ultimately, at least part of the mutex lock elision
implementation has been merged, which might not have happened
at all had one project manager made the call in isolation.
The consensus model still defers greatly to the experienced
contributors (like McGrath), but that is certainly appropriate. In
the end, patches were merged, with more or less full agreement, and
the remaining issues are largely debates about the long-term impact of
the implementation—not vehement opposition.
Comments (32 posted)
July 3, 2013
This article was contributed by Martin Michlmayr
EuroPython 2013
In a EuroPython keynote (slides
[PDF]), Alex Martelli, a founder of
the Italian Python association and author
of several Python books, shared his thoughts on software development and
more generally on the path toward perfection. He observed that there's a
cultural assumption that we should always be striving for perfection.
Martelli argued instead that "good enough" is often good enough and
that this approach will, in fact, lead to better results in the long run.
Worse is Better
Martelli opened his talk by recounting a
debate that was started
in 1989 by Richard
Gabriel. Gabriel
contrasted two approaches to software design and implementation: the New
Jersey style, also known as "Worse is Better", and the MIT/Stanford
approach, known as "the Right Thing". These approaches can be contrasted
according to four core values: simplicity, correctness, consistency, and
completeness. Martelli observed that "it's hard to argue against any of
these values", but that the two styles weigh the importance of the four
values in different ways, which is important when there's a conflict
between them.
The "Worse is Better" approach puts strong emphasis on simplicity.
Simplicity pertains to both the implementation and the interface, and is
a crucial consideration in the design of a system. Martelli gave Unix as
an example where this approach can be observed. The question to ask,
according to Martelli, is "can I think of a simple implementation of
this design concept?" Correctness is obviously important, but it's more
important to be simple than to be correct. In terms of consistency, the
expectation is not to be overly inconsistent. Finally, completeness can
be sacrificed in favor of any of the other values and it must be
sacrificed if simplicity is threatened. This can be seen in the Unix
philosophy "just do one thing really well", said Martelli, explaining
that "well means simple". In the MIT/Stanford approach, or "the Right
Thing", correctness is a top priority, as is consistency. The focus of
simplicity is on the interface. The back end can be complex as long as
the interface is simple. Completeness is roughly as important as
simplicity.
What this means in practice is that "the Right Thing" philosophy is
dominated by experts — experts who have to make the system perfect
before users can access it. On the other hand, the "Worse is Better"
approach makes use of incremental development. Martelli paraphrased G.
K. Chesterton's quote "if a thing is worth doing, it is worth doing
badly", explaining that by doing it "badly", you get there earlier — and you can work on improving it.
Martelli went on to compare Gabriel's model with Eric Raymond's The
Cathedral and the
Bazaar.
The former covers the software design process, while the latter focuses
on the development process, but there are many parallels. Martelli observed
that the "Cathedral" development style is close to "the Right Thing"
approach — a defining characteristic of both models is that
experts are in charge. There are also many similarities between the
"Bazaar" and the "Worse is Better" model: it's a chaotic, iterative
process in which a crowd is in charge. Raymond's mantra "given enough
eyeballs, all bugs are shallow" emphasizes that bugs are found and fixed
much faster in a crowd-sourced system.
Perfect as a verb
Martelli explained the problem with "perfection", which is that
releasing a "perfect" system implies BDUF — Big Design
Up Front. Everything
must proceed top-down: you need perfect identification of requirements,
a perfect architecture, perfect design, and perfect implementation. The
problem is that this approach takes forever. Martelli observed that
there's always something to improve. The real world also interferes with
this approach, as users or customers don't want to wait forever for a
new release.
In the real world, requirements change all the time, architecture varies
with design choices, design varies with implementation technologies, and
the implementation always has some bugs. In fact, most bugs are only
discovered in real world deployment. Martelli argued that iterative
development is therefore the only viable approach: deploy something, fix
bugs, and improve the system based on feedback.
Summing up his thoughts on achieving perfection, Martelli suggested that
"perfect" should be understood as a verb rather than an adjective.
Perfecting your software is a laudable goal, but it's a process rather
than a state as you never reach perfection — the goalposts keep
shifting all the time.
Bugs are a normal part of this perfection process. While many
programmers believe that they write perfect code, that's not how it
actually works. In 1974, Martelli, then a university student of
hardware design, and two colleagues had to write a Fortran program
together. They used punch cards and the program had to be perfect as
they only had one chance to run it. "You know about pair programming",
Martelli remarked, "this was pair punching". As it turns
out, the program ran perfectly the first time. Unfortunately, this was
the only perfect program he wrote in his 40-year career, so "don't count
on it as your mode of development".
The main question to consider with bugs is whether they cause
irrecoverable losses. As long as your software only causes problems one
can recover from, you're okay, especially if the software is clearly
tagged as beta. If your bug could kill someone, for example because you
work on medical device control software, "a bug could easily cause
irrecoverable losses" and a different approach may be required.
However, this is not the case in most situations.
The other aspect to
consider is whether your reputation can recover from the damage your
bugs cause. Martelli explained that the key is how you respond to bug
reports — a courteous, speedy response to issues is vital, even
when you're not paid by the user. Users spend their time evaluating
your software and reporting issues to you, so they should be respected. "The
person who points out the bug is not my enemy but my best friend",
Martelli noted.
In a weird way, bugs may even be seen as a feature. Martelli mentioned
the service recovery
paradox —
there is some evidence that the customers with the highest level of
satisfaction are not those who never had any problem at all, but those
who have had a problem successfully resolved. While this should not
encourage programmers to introduce bugs on purpose, it shows that
bugs — if properly dealt with — are not the end of the world.
What not to skimp on
While Martelli encouraged a "good enough" approach to software
development, he noted that there are some things you cannot skimp on.
You absolutely need a lightweight, agile process. Martelli doesn't
care which, but the process has to include revision control (which
one doesn't matter as they are all "good enough"), code reviews, and
testing. Proper release engineering practices are also crucial, so you
know what was released as which version. Additionally, you must promote
good coding style, clarity, and elegance. Finally, documentation
cannot be skipped: "if you're not documenting what you're releasing,
you're essentially asking your users to reverse engineer what you've
done". Summarizing these requirements, Martelli explained that "there's
no condition under which cowboy coding is acceptable".
Martelli added that security must be a concern from the start, as it's
very difficult or impossible to add later. He means security
in a general sense, including aspects such as privacy and auditability.
On the other hand, some features that would be nice to have from the
beginning can usually be added by refactoring the code later.
Such features include modularity and a plug-in architecture, an API, and
scalability. "You can incur some technical debt", Martelli suggested,
as long as you do it with care.
Conclusion
Toward the end of the talk, Martelli gave examples from other areas of
life and explained that his "good enough" philosophy is not restricted
to software design and implementation. For example, he asked whether it
makes sense to hire the "perfect employee". It's quite likely that
such a person would not be available for hire anyway, that they would
exceed the budget, or that they simply don't exist. Instead, he
suggested finding a good (not perfect) candidate who is a good match, in
terms of personality and company culture, and to provide training for
missing skills.
Finally, Martelli clarified that his aim is not to lower expectations.
You should dream big, but the best way to achieve those dreams remains
the "release early, release often" paradigm and to learn from real
users' interactions. The abstract of Martelli's talk noted that "this
talk is probably not perfect, but I do think it's good enough" —
in my opinion, and judging from the reaction of the audience, it was
certainly "good enough" to provoke a lot of interesting thoughts and
discussions.
Comments (9 posted)
July 3, 2013
This article was contributed by Neil Brown
Once upon a time, a new programming language could be interesting because
of some new mechanism for structured flow control. An if statement
that could guard a collection of statements would be so much easier
than one which just guarded a goto. Or a for statement which
took control of the loop variable could simplify matrix multiplication
significantly. An illuminating insight into this earlier age can be
found in Knuth's "Structured
Programming with go to statements" [PDF].
Many of the issues that seemed important in 1974 seem very dated
today, but some are still fresh and relevant.
The work of these early pioneers has left us with five basic forms
that appear to be common to most if not all procedural languages: two
conditional constructs, if and switch/case; two looping
constructs, while and for; and one encapsulation construct: the
function or procedure.
While interesting new control flow is unlikely to be a headline item
for a newly developed language these days, each language must embody
concrete choices concerning these structures and it is quite clear that,
while there is similarity, we are far from uniformity. Exploring how
a language handles control flow can provide interesting insights
into the philosophy behind the language. In this article, we will
continue our explorations of Go and Rust by looking at various
control-flow structures, but particularly focusing on the "for" loop.
The background of for loops
The for loop first appeared in programming languages as an easy way to
step through a fixed list of values. We can see this in Fortran, which
used the word do rather than for (here 10 is the label of the
statement that terminates the loop):
do 10 i = 1, 100, 2
and in Algol58:
for i := 1(2)100 do
Algol60 added some syntactic sugar:
for i := 1 step 2 until 100 do
while Pascal dropped the step clause so you would need:
for j := 0 to 49 do
and then set i := j * 2 + 1 inside the loop.
The Algol60 for loop was actually quite rich, as can be seen by the
examples here. It is a richness that probably seems
excessive by today's standards.
In C, which came a decade later, several of the ideas in Algol were
generalized and simplified to encapsulate all the interesting
possibilities in just three expressions: initializer, test, and step,
thus:
for (i = 1; i < 100; i += 2)
As the three expressions can be almost arbitrarily complex, very rich
looping constructs can be created from this simple form. The effect
is that the head of the for forms a coroutine that is executed in
concert with the body of the for loop. Control alternates between
one and the other, so that together they achieve the desired result.
The coroutine nature of the for loop's head is made particularly
obvious by the many (over 150) for_each macros that appear in the
Linux kernel. With these the code for one routine is physically quite
separate from the other, emphasizing the separate roles of the two
pieces of code. An example of such a for_each macro, from
include/linux/radix-tree.h is
#define radix_tree_for_each_slot(slot, root, iter, start)              \
        for (slot = radix_tree_iter_init(iter, start) ;                 \
             slot || (slot = radix_tree_next_chunk(root, iter, 0)) ;    \
             slot = radix_tree_next_slot(slot, iter, 0))
This example is interesting for a couple of reasons.
First, the middle expression — the loop-continue condition — is not
simply a condition, but contains an assignment and is sometimes used to
find the next value. This makes it clear that they aren't simply
expressions with fixed purposes, but rather three separate entry
points into a coroutine.
Secondly, it contains two variables that change throughout the loop:
slot and iter. The slot variable is
the regular loop variable that any for
loop would have, while iter contains extra state for tracking the
path through the list and is largely of internal interest.
While it is primarily internal, it needs to be visible externally, and,
in fact, needs to be declared externally. The for
statement has some properties of a coroutine, but cannot
define local variables for use throughout the loop.
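A usage sketch (hypothetical code, but following the kernel's radix-tree API of the time) makes the point concrete: both loop variables have to be declared by the caller, outside the for statement, even though iter is purely internal bookkeeping for the walk:
#include <linux/radix-tree.h>

static void walk_all_slots(struct radix_tree_root *root)
{
    void **slot;                  /* the "real" loop variable */
    struct radix_tree_iter iter;  /* internal state, yet declared here */

    radix_tree_for_each_slot(slot, root, &iter, 0) {
        void *item = radix_tree_deref_slot(slot);

        if (!item)
            continue;             /* slot cleared while we walked */
        /* ... do something with item ... */
    }
}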
So we see in the C for loop, particularly when combined with other
features of C such as the rich expressions and the macro preprocessor,
a very powerful, though not completely satisfactory, for loop
mechanism. One that will serve as a basis for examining others.
Go for — broke or beautiful?
The for loop in Go comes in three different forms — not quite the
range of Algol60, but seemingly more than C. One form is superficially
very similar to C's: the only differences are that the parentheses are not
required and that the loop body must be a "block" rather than a simple
statement, neither of which affects expressiveness. The earlier
iterative example looks much the same in Go as in C:
for i := 1; i < 100; i += 2 { ... }
The parallel ends there, however. Simple for loops will look much the
same, but complex for loops will have to look quite different. This is
partly because Go has no macro preprocessor and partly because Go
expressions are not as rich as C expressions. While the C for loop
simply contains three expressions, the Go for loop contains a "simple
statement", an "expression", and another "simple statement", where
"simple statement" specifically includes assignments and
increments/decrements.
Were we to try a literal translation of the radix tree for_each loop
into Go, we would have mixed success. Go allows the declaration of local
variables inside a for loop head, so there would be no need to declare
slot and iter separately. However, as the condition in a Go for
statement cannot contain assignments, we find a complete literal translation
is impossible. Of course measuring a language by how literal translations from another
language fare is far from reasonable — we may not be using the best
tool for the job and, as already noted, there are other forms of the
for loop in Go.
The second form is really a reduced version of the first, with the two
simple statements missing and, thus, their semicolons discarded:
for i < 100 { ... }
That form is essentially what many other languages would call a while loop.
This leaves the final form — the for/range loop.
for x := range expression { ... }
will iterate through members of the result of the expression in various
ways depending on the type of the result. This makes explicit a
difference from the for loops in the earlier languages. For
Fortran, ALGOL, and Pascal, the for loop dealt with sequences of
numbers, or possibly "enumerated constants" which are very number-like.
As we have seen, C can work with arbitrary values, and the Go range
clause makes it clear that this loop is for much more than just numbers.
The value can be an array, a slice (part of an array), a string (of
Unicode characters), a map (also known as a "hash",
"associative array", or "dictionary" in other languages), or a
"channel" (used for IPC). In the first four cases the for loop steps through the
components of the value in a fairly obvious way. Channels are
a bit different and will be examined shortly. As range does
not work with user-defined types at all, we cannot
translate our "radix_tree" loop directly into for/range and so must
look elsewhere.
A reasonable place to look might be some existing body of Go code to
see how such things are done. Though the Go compiler is not written
in Go, the Go language source distribution includes many tests,
libraries, examples, and tools written in Go, with a total of 2418 .go
source files, all of which were presumably written by people quite familiar
with the
language. Altogether, there are over 7000 for loops to consider.
Of these, 1200 are of the while loop form, nearly 2800 are for/range
loops, and the remaining 3000 are in the three-part form, the vast majority
of which have a numeric loop variable (demonstrating that the numeric
loops of yesteryear are very much alive and well). So there are not a
lot of examples of iterating user-defined data structures — a fact which
itself might be significant.
One example of interest is in
src/pkg/container/list/list_test.go:
for e := l.Front(); e != nil; e = e.Next() {
    le := e.Value.(int)
    ....
This example is not vastly unlike the for_each macros we saw written
in C. The syntax is clearly different, but the idea of having a very
simple "head" on the for loop, with the actual code for the
coroutine being off in a different file, is represented quite clearly.
The for loop fragment given could easily be for almost any data
structure. If there was a desire to keep the value (le above) more
distinct from the iterator (e above), a construct like:
for slot, iter, ok := l.Front(); ok; slot, ok = iter.Next() {
could return a sequence of slots using an iterator much like the
radix_tree_for_each_slot loop we saw earlier. This construct is
really quite elegant and extremely general.
Another interesting example occurs in various files in
src/pkg/net,
such as src/pkg/net/hosts.go and takes the form:
for line, ok := file.readLine(); ok; line, ok = file.readLine() {
This is very similar to the Front/Next example, except that Front
and Next are identical. This could be considered to violate the DRY
principle: Don't Repeat Yourself.
In C, this sort of loop is regularly written as:
while ((line = fgets(buf, sizeof(buf), file)) != NULL) {
but that cannot be used in Go, as expressions do not include assignments.
This issue of expressions excluding assignments has clearly not gone
unnoticed by the Go designers. If we look at the if and
switch statements, we see that, while they can be given a
simple expression, they can also be given a simple statement as well,
such as:
if i := strings.Index(s, ":"); i >= 0 {
which includes both an assignment and a test. This would work quite
nicely for the readLine loop:
while line, ok := file.readLine(); ok {
except that Go does not provide a while loop — only a for loop. Though the for loop does include two simple statements, neither is
executed at a convenient place to make this loop work as expected. So
if we are to remove the repetition of the readLine call, we must look
elsewhere.
One possibility is to exploit the fact that, while expressions do not
include assignments, they do include function calls, and functions can
include assignments. Go supports function literals. This means that the body of a function
can be given anywhere the name of a function can be used. The body of
a function may be assigned to a variable, or it may be called in
place. Further, the function so defined can access any variables that
are in the same scope as the function. So:
for line := "";
func() (ok bool) {
line, ok = file.readLine()
}(); {
is a for loop in the three-part form which behaves much the same as
the example above from hosts.go but without repetition.
The "initialize" part of the for loop (line := "")
declares a new variable, line, which is initialized to the empty
string (it syntactically needs to be initialized to something, though
the value won't be used).
The "condition" part of the loop is an immediate call to a function
literal which calls file.readLine(), returns the ok part of the
result and has a side effect of assigning the line part of the
result to the line variable.
The = form of assignment is needed in the function, rather than the
:= form, so that it does not declare a new line variable, which is
local to the function, but instead uses the one local to the for
loop.
The "next" part of the loop is empty, and appears between the second
; and the {.
While this does remove the unfortunate repetition of the readLine
call, the cure turns out to be much worse than the disease, as the loop
is close to unreadable. While function literals certainly have their
place, this is not that place.
This leaves one more possibility to explore — it is time to examine that
"range channel" construct hinted at earlier.
Channels
Concurrency and multiple threads (known as goroutines) are deeply
embedded in Go, and the preferred mechanism for communicating between
goroutines is the "channel". A channel is somewhat like a Unix
pipe. It conceptually has two ends, and data written to one end can be
read from the other. While a pipe can only pass characters or strings
of characters, a channel can pass any type known to Go, including
other channels.
for i := range my_channel {
will repeatedly assign to i each value received from my_channel
and then run the body of the for loop. This is a lot like our
readLine example — if only we could make lines appear on a channel.
And, of course, we can.
func lines (file *file) (<- chan string) {
    ch := make(chan string)
    go func () {
        for {
            line, ok := file.readLine()
            if !ok { break }
            ch <- line
        }
        close(ch)
    }()
    return ch
}
This lines function creates a channel (the make function) and
starts a goroutine (the function literal after the go keyword) that
sends lines back over the channel. This could be called as:
for line := range lines(file) {
which will very cleanly iterate over all the lines in the file with
no violation of the DRY principle.
However, further examination shows that this isn't really ideal. It
certainly works in the simple case, but problems arise when you
break or return out of the for loop. When you do that, the
channel is not destroyed and the goroutine remains in existence trying
to write to it, though no one will ever read it again.
Go has built-in garbage collection that will reclaim unreferenced
memory, but not unreferenced goroutines.
In order to clean up properly here, we would need to close the channel
after breaking out of the for loop. Strangely only the write end of a
channel can be closed and, since the return value of our lines function is
currently the read end (<- chan string), we need to change it to
return the double-ended channel. We also need to declare a variable
to hold the channel:
func lines (file *file) (chan string) {
    ch := make(chan string)
    go func () {
        for {
            line, ok := file.readLine()
            if !ok { break }
            ch <- line
        }
        close(ch)
    }()
    return ch
}
...
c := lines(file)
defer close(c)
for line := range c { ... }
Now we have a for loop that iterates over lines in a file, but that we
can break out of without leaking channels or goroutines. However, it
isn't really elegant any more. Needing to return both ends of the channel,
needing to declare a separate variable to hold that channel, and the
explicit defer close are all warts that tarnish the elegant:
for line := range lines(file)
The conclusion is that despite the repetition, the form used in the
net package of:
for line, ok := file.readLine(); ok; line, ok = file.readLine() {
does seem to be the best way to implement the task. All of the
alternatives fall short.
From loops to philosophy
It is in that last observation that part of the philosophy of Go seems
to show itself. While Go offers a lot of functionality, it often
seems quite restrictive in how this functionality is accessed. This is reminiscent of the 13th aphorism from the Zen of Python:
There should be one — and preferably only one — obvious way to do it.
We see this restrictiveness in for loops where the range syntax is
only available for built-in types, and where the first/next structure is
really the only way to do other for loops, even if it involves
repeating yourself.
We can see a similar pattern with inter-goroutine communication, where
channels have a privileged status. There are several language
facilities that only work with raw channels, much like for/range only
works with internal data types. Send (ch <- v), receive (v = <-ch), and the
select statement (which is a bit like switch but chooses
which of several blocking operations is ready to run) are completely
unavailable to user-defined types.
Where Python provides a default implementation for "maps", but allows
a class to provide an independent implementation using the same
syntax, Go provides a built-in "map" data type and permits no
substitutes. The Go FAQ makes it clear that this is a conscious
decision and not an oversight:
We believe that Go's implementation of maps is strong enough that
it will serve for the vast majority of uses.
This is probably why we found so few examples of iterating user-defined
data structures in the Go code — maps are used instead.
Finally, even the syntax has an element of restrictiveness. We saw
this briefly in a previous article where the handling
of semicolons imposes certain style choices on the programmer. We can
see it also in the go fmt command, which will reformat the code
in a .go file to follow a particular standard. While this is not
imposed on programmers, the language designers recommend the
use of go fmt to ensure that code follows the one true layout.
This philosophy certainly has a lot to recommend it. By removing
options from the programmer, the language removes the need to make
choices and so frees the programmer to focus on the actual
functionality that they need. It is a philosophy that also imposes heavy requirements on the language and
support environment. If there is only one way to do something, then
that one way had better work extremely well. Given the vibrant
community that has been built up around Go, and the strong emphasis on
performance shown in the recent release of Go 1.1, it seems likely
that Go does live up to this requirement.
Rusty loops
Turning to Rust we see a very different style of for loop.
The example loop we started with, which iterates over odd values from 1
to 99, would look like:
for uint::range_step(1, 100, 2) |i| { ... }
Here the:
|i| { ... }
piece is a function literal, similar to those we saw when exploring Go,
though
with a very different syntax and a different name. Rust, like many other
languages, calls it a lambda expression. It consists of a list of formal
parameters between vertical bars, and a statement block.
The
uint::range_step(1, 100, 2)
is a reference to a function called range_step in the
uint module. The uint::range_step()
function actually takes 4 arguments: start, stop, step, and function. The behavior of range_step() is to call
function, repeatedly passing values from start up to
stop, incrementing by step each time. Consequently, our for loop
could be realized simply by:
uint::range_step(1, 100, 2, |i| {
...
})
There are two problems with this. A minor point is that the syntax is
arguably less pleasing than the first version. More importantly,
constructs like break and continue don't have any meaning inside a
function literal, so they could not affect the flow of this second loop.
The for statement addresses both of these. It provides syntax for
writing the function literal outside the normal list of function
parameters and it gives meaning to break, loop (the Rust
equivalent of continue), and return.
By convention, the function in the head of for should stop looping when
the function argument that it calls returns false. The for statement
uses this by effectively translating break to return false and
loop to return true. If any return statement appears in the body of the for loop, it is also
translated to something that will "do the right thing".
This seems like a fairly complex set of transformations, but the end
result is extremely flexible. It allows a very clear separation of the
two coroutines that make up a for loop, with the head routine having
the full power of a regular function that is able to declare local variables
and to communicate in arbitrary ways with the body routine.
Both the "iterate over all the lines in a file" loop which we struggled
with in Go, and the radix tree loop from the Linux kernel, would be
trivial to implement as an iterator routine in Rust. The first of
these would look like:
pub fn every_line(f: @io::Reader, it: &fn(&str) -> bool) {
    while !f.eof() {
        let line = f.read_line();
        if !it(line) { break }
    }
}
and could be called as:
let f = io::file_reader(&Path("/etc/motd")).get();
for every_line(f) |line| {
    io::println(fmt!("Line is %s", line));
}
This power to write elegant iterators is not without its
cost. While Rust allows an arbitrary function to provide the head of
the for loop, it also requires the head of the for
loop to be some function. The simple initialize, test, increment
form of C and Go cannot be used.
If we go back and look at the nearly 3000 for loops in the Go source
code that use a numeric loop variable, we find that the vast majority
of them could be implemented using uint::range_step() or even the simple
uint::range(). But not all. Some examples include:
for ; i > 0; i /= 10 {
for (mid = (bot+top)/2; mid < top; mid = (bot+top)/2) {
for n := 1; n <= 256; n *= 2 {
for rate := 0.05; rate < 10; rate *= 2 {
for parent := ".."; ; parent = "../" + parent {
(the last one does not have a numeric variable of course, but is still
a useful example).
Several of these could be supported by adding a very small number of
extra iterators to the standard library; the rest could just
as easily be implemented with a while loop. So this limitation
doesn't really limit Rust significantly.
A Rusty philosophy?
We see, in the for loops of Rust, a very different philosophy to that
of Go. While Go forces you into a particular mold, Rust lets you
build your own mold with enormous freedom. You could even modify the
exact behavior of break inside your for loops if that seems like a
useful thing to do.
This freedom and flexibility extends to other parts of Rust too. In
last month's article, we
saw that Rust does not draw a distinction between
expressions and statements, so it allows if and match constructs (the
latter being similar to switch) deeply inside expressions, whereas Go
does not permit such things.
Rust goes even further with a rich macro language that can
declare which syntactic elements (e.g. identifier, expression, type)
may replace each macro parameter, and can repeat the body of the macro
if the parameter is a list. This leaning towards extreme flexibility seems to pervade Rust and is
reminiscent of the Perl programming motto: There is more than one way to
do it.
Summary
There will always be a tension in language design between allowing
the programmer freedom of expression and guiding the programmer
toward clarity of expression. In a previous article, we saw
how the type system of Rust prefers clarity over freedom. Go is not
such a stickler, and is satisfied with run-time type checks in places
where Rust would insist on compile-time checks. Here, when we look at the structuring of statements and expressions, we
find Rust prefers freedom while Go seems more focused on clarity by
eliminating unnecessary flexibility.
Which of these is to be preferred is almost certainly a very personal
choice. Some people rebel against a constraining environment,
others relish the focus it allows them. Both provide room for
creativity and productivity. Go and Rust provide very different
points in the spectrum of possibilities and it is good to have that
choice ... except that it does mean that you have to choose.
Comments (29 posted)
For those who might be interested in putting a talk proposal in for an
upcoming conference, we have added a new feature to the weekly
Announcements page. The LWN events
calendar has long been a feature of the site, but the CFP deadlines calendar was
added more recently. Now the information from that calendar will
also be posted to the Announcements page in
tabular form. Hopefully that will help
everyone keep track of those deadlines and lead to more submissions of
interesting talks to the numerous conferences in our communities.
Comments (2 posted)
Page editor: Jonathan Corbet
Security
By Jake Edge
July 3, 2013
The reporting of 1200 bugs, some of which may have security
implications, is
sure to overwhelm any distribution's bug handling abilities. So it was
rather helpful that Alexandre Rebert started out by posting to the debian-devel mailing list
rather than just flooding the bug tracker.
Beyond just the sheer number of bugs, though, there is a question of
dealing with so many potential security issues, which are generally handled
differently than regular bugs.
Rebert
and other security researchers at Carnegie Mellon University (CMU) found the bugs
in binaries from the Debian repositories using an automated bug finder
called Mayhem [PDF].
Mayhem is a closed-source research project at CMU CyLab that
uses symbolic execution on binary programs to find exploitable bugs in the
code.
It does its job by looking for load and store instructions that can be
influenced by the inputs to the program. It examines the paths
through the program using a "hybrid symbolic execution" mechanism that
combines normal execution of the program with symbolic execution of an
intermediate language representation that is created whenever a tainted
(i.e. dependent on
user input) branch condition is detected. The symbolic execution looks for
ways to exploit the tainted code and builds an exploit if it can. The
Mayhem paper goes into a lot more detail, perhaps enough for others to
reproduce the technique.
The bugs are "exploitable" in the sense that each crash can execute arbitrary
code. While code execution bugs are serious, the programs in question are
typically run by regular users from the shell, so being able to get a shell
(which is the usual proof of concept used by demonstration exploits as
well as by Mayhem) is not a huge accomplishment. But being able to get a
shell means that an exploit could do anything the user could do, including
exposing or deleting files, participating in a botnet, sending spam, and so
on. The exploits require specially crafted arguments and/or input files to
trigger the bugs, so users would have to be tricked into running the
programs that way.
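For a sense of what such a report describes, here is an artificial example (not taken from any Mayhem report) of the class of bug involved: a stack buffer overflow reachable directly from a command-line argument, where a crafted argv[1] can overwrite the return address and hijack control flow:
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char name[32] = "";

    if (argc > 1)
        strcpy(name, argv[1]);   /* no length check: a long argv[1] overruns
                                    the buffer and smashes the stack */
    printf("hello, %s\n", name);
    return 0;
}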
Of course, any setuid programs or those accessible via the web or other
internet services are a much larger concern. That's not to downplay what
the Mayhem team has done in any way, but fuzzing has shown us that
arbitrary inputs to programs often lead to crashes—the trick is finding a
way to get users to provide crafted inputs that lead to an interesting (to
the attacker) result. Regardless, the bugs do need to be fixed, and
the Mayhem team has provided a wealth of information to do just that.
Each bug report comes with a tar file (an example for
gcov was provided with Rebert's message) that contains a script to
reproduce the problem, files containing the arguments and input that cause
the crash, the core dump, and more. Reports for each of the bugs were sent
to the appropriate Debian package maintainers, though some of those
addresses were
actually mailing lists, as Paul Wise pointed
out. That allows us to see some of the reports, including
one
for the nfsidmap binary in the nfs-common package. Rebert's
message also linked to a text file that lists
all of the affected packages and their maintainers.
There are almost certainly more bugs out there for Mayhem to find as the
team limited the search space of the tool, allowing just five minutes of
run time per binary. They also limited the bugs reported to one per binary
and five per package. There are likely to be plenty of duplicate bugs on
the list as well; bugs in libraries may well appear for multiple binaries.
And, of course, the bugs aren't limited to Debian, as many of the packages
will be in the repositories of lots of different distributions; few, if
any, of the bugs will be Debian-specific.
Unfortunately, there is no automated way to extract addresses for the
upstream developers or mailing lists from the Debian packages. The bug
reports may ultimately need to make their way upstream, but the Mayhem team
couldn't find a way to do that, so they started with the Debian
maintainers. As Andreas
Tille noted, some
packages may have implemented the machine-readable debian/copyright
file, which might provide an upstream contact and email address. But,
for security reports, even that may not be the right place to send the
message.
But, in fact, Rebert has recognized that the
security tag on most of the proposed bug reports was probably not accurate. "It looks like a majority of the crashes have
little security implications", he said, so that tag will be removed
before the actual bug reports get submitted. It isn't clear that a
security contact would be needed in the majority of cases but, since Mayhem
sets out to find exploitable bugs, "responsible disclosure" might still
indicate that a security list or email should be used to report the problems.
The problem is, in some ways, similar to the question of where bugs should be filed that we
reported on last week. Which bug tracker (distribution or upstream) to use
is contentious enough when looking at single bugs reported by users; 1200
bugs increases the scale of the problem significantly. The clear
indication is that Mayhem could find lots more if it were given free rein,
though the duplicates need to be eliminated or substantially reduced or the
team risks overwhelming distributions and upstreams.
The "huge pile of bugs" problem is a consequence of the closed-source
nature of Mayhem. If the tool were available to be used by various
projects' developers as part of their testing, the bugs could be
found and fixed in the normal course of development. Rebert mentioned the
possibility of creating some kind of Mayhem web service, but it would be
far more useful if the tool was free software (even "free as in beer" would
be better than the existing situation). Since public funds were used to
develop the tool, one might hope the public would get a bit more out of
that spending. The Mayhem paper mentions that the
US Defense Advanced Research Projects
Agency (DARPA) helped fund some of the work, but, alas, that funding doesn't
seem to come with a mandate to publish the source.
It's clear that running Mayhem on the 23,000 or so binaries found in the
Debian "Wheezy" repository has found real bugs, some of which are
"exploitable" in limited scenarios. Some are probably worse than that,
however, and as the tool gets improved, it may be able to zero in on more
dangerous bugs. One might guess that CMU and the Mayhem developers plan
to commercialize Mayhem. That is, of course, their prerogative, but it is
unfortunate that tools like Mayhem and the Coverity static analyzer
(which came out of Stanford University)
are not free software tools. One suspects they would see much more
use—and, possibly,
improvement—if they were.
Comments (9 posted)
Brief items
If I could, I would repeal the Internet. It is the technological marvel of
the age, but it is not — as most people imagine — a symbol of
progress. Just the opposite. We would be better off without it. I grant its
astonishing capabilities: the instant access to vast amounts of
information, the pleasures of YouTube and iTunes, the convenience of GPS
and much more. But the Internet's benefits are relatively modest compared
with previous transformative technologies, and it brings with it a
terrifying danger: cyberwar.
— Robert J. Samuelson throws the baby out with the bath water
I find it hilarious that Redhat cripples their cryptographic security
software. In the sense that it makes me wonder about the rest of their
security processes and software. What the...
— Jacob Appelbaum
The ancients, given a chance to observe today's intelligence and spying
brouhaha, would likely assert that the gods are laughing at us, finding
hilarious our public attempts at indignation not only over what is being
done, but our laughable efforts to pretend that we didn't know about it all
along.
— Lauren Weinstein
The biological world is also open source in the sense that threats are
always present, largely unpredictable, and always changing. Because of
this, defensive measures that are perfectly designed for a particular
threat leave you vulnerable to other ones. Imagine if our immune system
were designed to deal only with a single strain of flu. In fact, our immune
system works because it looks for the full spectrum of invaders — low-level
viral infections, bacterial parasites, or virulent strains of a pandemic
disease. Too often, we create security measures — such as the Department of
Homeland Security's
BioWatch program — that spend too many resources to deal specifically with a very narrow range of threats on the risk spectrum.
— Rafe Sagarin
Comments (7 posted)
Bluebox Security
claims
to have found a way to modify code contained within an Android application
package without breaking the associated cryptographic signature.
"
All Android applications contain cryptographic signatures, which
Android uses to determine if the app is legitimate and to verify that the
app hasn’t been tampered with or modified. This vulnerability makes it
possible to change an application’s code without affecting the
cryptographic signature of the application – essentially allowing a
malicious author to trick Android into believing the app is unchanged even
if it has been." The problem was evidently disclosed to Google in
February; details are promised at the
Black Hat USA
conference starting July 27.
Comments (2 posted)
New vulnerabilities
ffmpeg: multiple vulnerabilities
Package(s): ffmpeg
CVE #(s): CVE-2013-3671, CVE-2013-3672, CVE-2013-3673, CVE-2013-3674
Created: June 27, 2013
Updated: July 3, 2013
Description:
From the Mageia advisory:
* CVE-2013-3671:
The format_line function in log.c in libavutil uses inapplicable offset
data during a certain category calculation, which allows remote attackers
to cause a denial of service (invalid pointer dereference and application
crash) via crafted data that triggers a log message.
* CVE-2013-3672:
The mm_decode_inter function in mmvideo.c in libavcodec does not validate
the relationship between a horizontal coordinate and a width value, which
allows remote attackers to cause a denial of service (out-of-bounds array
access and application crash) via crafted American Laser Games (ALG) MM
Video data.
* CVE-2013-3673:
The gif_decode_frame function in gifdec.c in libavcodec does not properly
manage the disposal methods of frames, which allows remote attackers to
cause a denial of service (out-of-bounds array access and application crash)
via crafted GIF data.
* CVE-2013-3674:
The cdg_decode_frame function in cdgraphics.c in libavcodec does not validate
the presence of non-header data in a buffer, which allows remote attackers to
cause a denial of service (out-of-bounds array access and application crash)
via crafted CD Graphics Video data.
Comments (none posted)
Foreman: multiple vulnerabilities
Package(s): Foreman
CVE #(s): CVE-2013-2113, CVE-2013-2121
Created: June 28, 2013
Updated: July 3, 2013
Description:
From the Red Hat advisory:
A flaw was found in the create method of the Foreman Bookmarks controller.
A user with privileges to create a bookmark could use this flaw to execute
arbitrary code with the privileges of the user running Foreman, giving them
control of the system running Foreman (such as installing new packages) and
all systems managed by Foreman. (CVE-2013-2121)
A flaw was found in the way the Foreman UsersController controller handled
user creation. A non-admin user with privileges to create non-admin
accounts could use this flaw to create admin accounts, giving them control
of the system running Foreman (such as installing new packages) and all
systems managed by Foreman. (CVE-2013-2113)
Comments (none posted)
openstack-keystone: authentication bypass
Package(s): openstack-keystone
CVE #(s): CVE-2013-2157
Created: June 28, 2013
Updated: August 12, 2013
Description:
From the openSUSE bug report:
Jose Castro Leon from CERN reported a vulnerability in the way the
Keystone LDAP backend authenticates users. When provided with an empty
password, the backend would perform an anonymous LDAP bind that would
result in successfully authenticating the user. An attacker could
therefore easily impersonate and get valid tokens for any user. Only
Keystone setups using LDAP authentication backend are affected.
Comments (none posted)
php-radius: buffer overflow
Package(s): php-radius
CVE #(s): CVE-2013-2220
Created: July 3, 2013
Updated: July 26, 2013
Description:
From the Mandriva advisory:
Fix a security issue in radius_get_vendor_attr() by enforcing checks
of the VSA length field against the buffer size.
Comments (none posted)
python-keystoneclient: password disclosure
Package(s): python-keystoneclient
CVE #(s): CVE-2013-2013
Created: June 28, 2013
Updated: September 18, 2013
Description:
From the openSUSE bug report:
OpenStack keystone places a username and password on the command line,
which allows local users to obtain credentials by listing the process.
Comments (none posted)
python-keystoneclient: multiple vulnerabilities
Package(s): python-keystoneclient
CVE #(s): CVE-2013-2166, CVE-2013-2167
Created: June 28, 2013
Updated: July 3, 2013
Description:
From the Red Hat advisory:
A flaw was found in the way python-keystoneclient handled encrypted data
from memcached. Even when the memcache_security_strategy setting in
"/etc/swift/proxy-server.conf" was set to ENCRYPT to help prevent
tampering, an attacker on the local network, or possibly an unprivileged
user in a virtual machine hosted on OpenStack, could use this flaw to
bypass intended restrictions and modify data in memcached that will later
be used by services utilizing python-keystoneclient (such as Nova, Cinder,
Swift, Glance, and so on). (CVE-2013-2166)
A flaw was found in the way python-keystoneclient verified data from
memcached. Even when the memcache_security_strategy setting in
"/etc/swift/proxy-server.conf" was set to MAC to perform signature
checking, an attacker on the local network, or possibly an unprivileged
user in a virtual machine hosted on OpenStack, could use this flaw to
modify data in memcached that will later pass signature checking in
python-keystoneclient. (CVE-2013-2167)
Comments (none posted)
ruby: SSL server spoofing
Package(s): ruby
CVE #(s): CVE-2013-4073
Created: June 28, 2013
Updated: August 6, 2013
Description:
From the Ruby advisory:
When a CA trusted by an SSL client can be made to issue a server certificate
containing a null byte in its subjectAltName, remote attackers can obtain a
certificate for ‘www.ruby-lang.org\0.example.com’ from that CA to spoof
‘www.ruby-lang.org’ and mount man-in-the-middle attacks between Ruby’s SSL
client and SSL servers.
Comments (none posted)
wireshark: two dissector vulnerabilities
Package(s): wireshark
CVE #(s): CVE-2013-4079, CVE-2013-4080
Created: June 27, 2013
Updated: July 3, 2013
Description:
From the Mageia advisory:
The GSM CBCH dissector could crash (CVE-2013-4079).
The Assa Abloy R3 dissector could consume excessive memory and CPU
(CVE-2013-4080).
Comments (none posted)
wordpress: multiple vulnerabilities
Package(s): wordpress
CVE #(s): CVE-2013-2173, CVE-2013-2199, CVE-2013-2200, CVE-2013-2201,
CVE-2013-2202, CVE-2013-2203, CVE-2013-2204, CVE-2013-2205
Created: July 2, 2013
Updated: July 3, 2013
Description:
From the Mageia advisory:
A denial of service flaw was found in the way Wordpress, a blog tool and
publishing platform, performed hash computation when checking passwords for
password-protected blog posts. A remote attacker could provide specially
crafted input that, when processed by the password-checking mechanism of
Wordpress, would lead to excessive CPU consumption (CVE-2013-2173).
Inadequate SSRF protection for HTTP requests where the user can provide a
URL can allow for attacks against the intranet and other sites. This is a
continuation of work related to CVE-2013-0235, which was specific to SSRF
in pingback requests and was fixed in 3.5.1 (CVE-2013-2199).
Inadequate checking of a user's capabilities could allow them to publish
posts when their user role should not allow for it; and to assign posts to
other authors (CVE-2013-2200).
Inadequate escaping allowed an administrator to trigger a cross-site
scripting vulnerability through the uploading of media files and plugins
(CVE-2013-2201).
The processing of an oEmbed response is vulnerable to an XXE
(CVE-2013-2202).
If the uploads directory is not writable, error message data returned via
XHR will include a full path to the directory (CVE-2013-2203).
Content Spoofing in the MoxieCode (TinyMCE) MoxiePlayer project
(CVE-2013-2204).
Cross-domain XSS in SWFUpload (CVE-2013-2205).
Comments (none posted)
xdm: denial of service
Package(s): xdm
CVE #(s): CVE-2013-2179
Created: July 2, 2013
Updated: July 3, 2013
Description:
From the openSUSE advisory:
xdm was updated to address crypt() NULL pointer crashes:
* Starting with glibc 2.17 (eglibc 2.17), crypt() fails
with EINVAL (w/ NULL return) if the salt violates
specifications. Additionally, on FIPS-140 enabled Linux
systems, DES/MD5-encrypted passwords passed to crypt()
fail with EPERM (w/ NULL return). If using glibc's
crypt(), check return value to avoid a possible NULL
pointer dereference.
Comments (none posted)
xen: multiple vulnerabilities
Package(s): xen
CVE #(s): CVE-2013-2211, CVE-2013-1432
Created: July 2, 2013
Updated: July 19, 2013
Description:
From the Mageia advisory:
CVE-2013-2211: libxl allows guest write access to sensitive console related xenstore keys
CVE-2013-1432: Page reference counting error due to XSA-45/CVE-2013-1918 fixes
Comments (none posted)
xml-security-c: code execution
Package(s): xml-security-c
CVE #(s): CVE-2013-2210
Created: June 28, 2013
Updated: July 3, 2013
Description:
From the Debian advisory:
Jon Erickson of iSIGHT Partners Labs discovered a heap overflow in
xml-security-c, an implementation of the XML Digital Security
specification. The fix to address CVE-2013-2154 introduced the
possibility of a heap overflow in the processing of malformed XPointer
expressions in the XML Signature Reference processing code, possibly
leading to arbitrary code execution.
Comments (none posted)
Page editor: Jake Edge
Kernel development
Brief items
The 3.10 kernel was released on June 30. Linus's
announcement said: "
In the bigger
picture (ie since 3.9) this release has been pretty typical and not
particularly prone to problems, despite my waffling about the exact release
date. As usual, the bulk patch-wise is all drivers (pretty much exactly two
thirds), while the rest is evenly split between arch updates and 'misc'. No
major new subsystems this time around, although there are individual new
features." Some of those new features include a number of
Ftrace enhancements, the
memory pressure notification mechanism,
tickless operation, ARM
multi-cluster power management support (part
of the big.LITTLE solution), the
bcache
block caching layer, and much more. See the
KernelNewbies 3.10 page for
lots of details.
The 3.11 merge window is open as of this writing; see the separate article
below for a summary of what has been merged so far.
Stable updates:
3.9.8,
3.4.51, and 3.0.84 were released on June 27,
3.2.48 came out on June 30, and
3.9.9, 3.4.52,
and 3.0.85 were released on
July 3.
Comments (none posted)
Hmm, I bet lockdep and the branch tracer probably don't play well
together. They both are bullies, and want to beat up the same
kid. The problem is, they want sole access to beat up that kid, and
don't want help.
—
Steven Rostedt
In my defence, it didn't actually say the patch did this. Just
that we "can".
—
Rusty Russell
At this point in the process, I want testers who choose to test.
Hapless victim testers come later. Well, other than randconfig
testers, but I consider them to be voluntary hapless victims.
—
Paul McKenney
Comments (none posted)
Tim Bird has announced the availability of
an
extensive guide to tuning Linux for flash-based storage devices [PDF].
"
This is the culmination of several months of effort, to determine
the results of using different tuning options in the Linux kernel, with
different filesystems running on flash-based block devices. The document
was prepared by Cogent Embedded, and funded by the CE Workgroup of the
Linux Foundation. In addition to describing different tuning options
available, the document also gives methodologies for measuring performance
on the filesystems and has extensive graphs showing the results of the
different tuning options."
Full Story (comments: 12)
The planning process for the 2013 Kernel Summit (October 23-25, Edinburgh)
has begun; as in previous years, the program committee is looking for
proposals for interesting topics in need of discussion. "
The best topics for the kernel summit tend to focus on topics which
are not appropriate for any of the subsystem-specific workshops or
minisummits, and which can not be easily resolved using the normal
e-mail and IRC channels. These include issues about our overall
development process, and topics which span multiple subsystems."
The deadline for proposals is July 19.
Full Story (comments: none)
By Jonathan Corbet
July 3, 2013
Reference counting is used by the kernel to know when a data structure is
unused and can be disposed of. Most of the time, reference counts are
represented by an
atomic_t variable, perhaps wrapped by a
structure like a
kref. If references are added and removed
frequently over an object's lifetime, though, that
atomic_t
variable can become a performance bottleneck. The 3.11 kernel will include
a new per-CPU reference count mechanism designed to improve scalability in
such situations.
This mechanism, created by Kent Overstreet, is defined in
<linux/percpu-refcount.h>.
Typical usage will involve embedding a percpu_ref structure within
the data structure being tracked. The counter must be initialized with:
int percpu_ref_init(struct percpu_ref *ref, percpu_ref_release *release);
Where release() is the function to be called when the reference
count drops to zero:
typedef void (percpu_ref_release)(struct percpu_ref *);
The call to percpu_ref_init() will initialize the reference count
to one. References are added and removed with:
void percpu_ref_get(struct percpu_ref *ref);
void percpu_ref_put(struct percpu_ref *ref);
These functions operate on a per-CPU array of reference counters, so they
will not cause cache-line bouncing across the system. There is one
potential problem, though: percpu_ref_put() must determine whether
the reference count has dropped to zero and call the release()
function if so. Summing an array of per-CPU counters would be expensive,
to the point that it would defeat the whole purpose. This problem is
avoided with a simple observation: as long as the initial reference is
held, the count cannot be zero, so percpu_ref_put() does not
bother to check.
The implication is that the thread which calls percpu_ref_init()
must indicate when it is dropping its reference; that is done with a call
to:
void percpu_ref_kill(struct percpu_ref *ref);
After this call, the reference count degrades to the usual model with a
single shared atomic_t counter; that counter will be decremented
and checked whenever a reference is released.
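To make the sequence concrete, here is a minimal sketch of how the pieces
might fit together; the structure and function names (my_object,
my_object_release()) are invented for this example, and error handling is
abbreviated:
    struct my_object {
        struct percpu_ref ref;
        /* ... other fields ... */
    };

    static void my_object_release(struct percpu_ref *ref)
    {
        struct my_object *obj = container_of(ref, struct my_object, ref);

        kfree(obj);
    }

    /* At creation time, percpu_ref_init() takes the initial reference: */
    if (percpu_ref_init(&obj->ref, my_object_release))
        return -ENOMEM;    /* allocating the per-CPU counters failed */

    /* Fast path, usable from any context that can already see the object: */
    percpu_ref_get(&obj->ref);
    /* ... use obj ... */
    percpu_ref_put(&obj->ref);

    /* At teardown time, drop the initial reference; my_object_release()
       will run once the last outstanding reference is put: */
    percpu_ref_kill(&obj->ref);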
The performance benefits of a per-CPU reference count will clearly only be
realized if most of the references to an object are added or removed while
the initial reference is held. In practice that is often the case. This
mechanism has found an initial use in the control group code; the comments
in the header file claim that it is used by the asynchronous I/O code as
well, but that is not the case in the current mainline.
Comments (1 posted)
Kernel development news
By Jonathan Corbet
July 3, 2013
Once upon a time, Linus tried to limit merge window activity to roughly
1,000 commits in any given day. On July 2, the day he began pulling
changes for 3.11, over 3,000 commits made their way into the mainline.
Clearly, a lethargic 1,000 commits/day pace won't cut it in the 3.x era.
Expect this to be another busy development cycle.
That said, the number of new features merged for 3.11 so far is relatively
small. Much of the work pulled to date consists of code cleanups (in the
staging tree, for example), reworking of ARM architecture code to use
common abstractions, and the removal of board-file support for some ARM
subarchitectures.
The user-visible changes that have been pulled so far include:
- The f2fs filesystem now supports security labels, enabling it to be
used with security modules.
- The Lustre
distributed filesystem has been merged into the staging tree. It is
disabled in the build system, though, since it has build problems on a
number of architectures.
- The ARM architecture (both 32- and 64-bit) has gained better huge page
support, in the form
of both the hugetlbfs filesystem and transparent huge pages.
- The ARM64 architecture now supports virtualization with both KVM and
Xen.
- The new O_TMPFILE option to the open() and
openat() system calls allows filesystems to optimize the
creation of temporary files — files which need not be visible in the
filesystem. When O_TMPFILE is present, the provided pathname
is only used to locate the containing directory (and thus the
filesystem where the temporary file should be). So, among other
things, programs using O_TMPFILE should have fewer concerns
about vulnerabilities resulting from symbolic link attacks. (A brief
usage sketch appears after this list.)
- New hardware support includes:
- Systems and processors:
Freescale i.MX6 SoloLite processors,
Freescale Vybrid VF610 processors,
Samsung EXYNOS5420 processors,
Rockchip RK2928 and RK3xxx processors,
TI Nspire processors, and
STMicroelectronics STiH41x and STiH416 processors.
- Miscellaneous:
Marvell EBU device bus controllers,
Marvell EBU PCIe controllers,
ARM cache-coherent interconnect controllers,
Microchip Technology MCP3204/08 analog-to-digital converters,
Analog Devices AD7303 digital-to-analog converters,
STMicroelectronics LPS331AP pressure sensors, and
Samsung S3C24XX SoC pin controllers.
- Networking:
MTK USB Bluetooth interfaces.
- USB:
Faraday FUSBH200 host controllers and
Cavium Networks Octeon host controllers.
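As promised above, here is a brief user-space sketch of the new
O_TMPFILE flag. It assumes a 3.11 kernel and a C library that already
exposes the flag; it is an illustration, not code from the kernel tree:
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Only the directory is named; the file itself never appears in
           the filesystem, so a symbolic-link attack has nothing to target. */
        int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);

        if (fd < 0) {
            perror("open(O_TMPFILE)");
            return 1;
        }
        if (write(fd, "scratch data\n", 13) < 0)
            perror("write");
        close(fd);      /* the file's storage is released automatically */
        return 0;
    }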
Changes visible to kernel developers include:
- There is a new struct file_operations method:
int (*iterate) (struct file *, struct dir_context *);
Its job is to iterate through the contents of a directory. This
method is meant to serve as a replacement for the readdir() method that
eliminates persistent race conditions associated with updating the
current read position. All internal users have been converted, and
the readdir() method has been removed.
- There are a couple of new functions for working with atomic types:
int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode);
void wake_up_atomic_t(atomic_t *p);
A call to wait_on_atomic_t() will block the calling thread
until the given val goes to zero. Simply decrementing an
atomic_t variable will not be sufficient to wake anybody
waiting, though; an explicit call to wake_up_atomic_t() is
required to do that. (A brief usage sketch appears after this list.)
- The CONFIG_HOTPLUG configuration option has been removed; all
kernels are hotplug enabled these days.
- The wait/wound mutex locking primitive
has been merged.
- As part of the read-copy-update
simplification effort, the "tiny-preempt" version of RCU has been
removed from the kernel. From the
commit message: "People currently using TINY_PREEMPT_RCU can
get much better memory footprint with TINY_RCU, or, if they really
need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
minor degradation in memory footprint."
- The kernel now has the concept of power-efficient workqueues; these
are simply marked as "unbound," so that jobs queued to them can run on
any CPU in the system. Per-CPU workqueues may perform better in some
situations, but they can also cause sleeping CPUs to wake up; that
wakeup can be avoided if work items can be run on CPUs that are not
sleeping. If the CONFIG_WQ_POWER_EFFICIENT_DEFAULT
configuration option is set, a number of workqueues observed to impact
power performance will be switched to the unbound mode.
Kernel code can explicitly request power-efficient behavior by
creating workqueues with the WQ_POWER_EFFICIENT flag or by
using a couple of new systemwide workqueues:
system_power_efficient_wq or
system_freezable_power_efficient_wq.
- The d_hash() and d_compare() methods in struct
dentry_operations have lost their inode argument.
- A new per-CPU reference count mechanism has been added; see this article for details.
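As a rough illustration of the wait_on_atomic_t() pattern mentioned above
(the counter and helper names are invented; this is not code from the
kernel tree):
    /* The action callback decides how the waiter sleeps: */
    static int my_wait_action(atomic_t *p)
    {
        schedule();
        return 0;       /* a non-zero return would abort the wait */
    }

    /* Waiter: block until my_count reaches zero. */
    wait_on_atomic_t(&obj->my_count, my_wait_action, TASK_UNINTERRUPTIBLE);

    /* Elsewhere: whoever drops the count to zero must wake waiters
       explicitly. */
    if (atomic_dec_and_test(&obj->my_count))
        wake_up_atomic_t(&obj->my_count);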
A normal two-week merge window could be expected to close on July 16,
but Linus has occasionally shortened the merge window in recent development
cycles. If the development cycle as a whole lasts for the usual
70 days, then the 3.11 kernel can be expected around
September 10.
Comments (3 posted)
July 3, 2013
This article was contributed by Christoffer Dall and Jason Nieh
One of the new features in the 3.9 kernel is KVM/ARM: KVM support for the ARM
architecture. While KVM is already supported on i386 and x86/64, PowerPC, and
s390, ARM support required more than just reimplementing the features and
styles of the other architectures. The reason is that the ARM virtualization
extensions are quite different from those of other architectures.
Historically, the ARM architecture has not been virtualizable, because a
number of sensitive instructions do not trap when they are executed in
an unprivileged mode. However, the most recent 32-bit ARM processors, like
the Cortex-A15, include hardware support for virtualization as
an ARMv7 architectural extension. A number of research
projects have attempted to support virtualization on ARM processors without
hardware virtualization support, but they require various levels of paravirtualization
and have not been stabilized. KVM/ARM is designed specifically to
work on ARM processors with the virtualization extensions enabled to run
unmodified guest operating systems.
The ARM hardware extensions differ quite a bit from their x86 counterparts. A
simplified view of the ARM CPU modes is that the kernel runs in SVC mode and
user space runs in USR mode. ARM introduced a new CPU mode for running
hypervisors called HYP mode, which is a more privileged mode than SVC mode.
An important characteristic of HYP mode, which is central to the design of
KVM/ARM, is that HYP mode is not an extension of SVC mode, but a distinct mode
with a separate feature set and a separate virtual memory translation mechanism.
For example, if a page fault is taken in HYP mode, the faulting virtual address
is stored in a different register in HYP mode than in SVC mode. As another
example, for the SVC and USR modes, the hardware has two separate page table base
registers, which are used to provide the familiar address space split between
user space and kernel. HYP mode only uses a single page table base register and
therefore does not allow the address space split between user mode and kernel.
The design of HYP mode is a good fit with a classic bare-metal hypervisor
design because such a hypervisor does not reuse
any existing kernel code written to work in SVC mode.
KVM, however, was designed specifically to reuse existing kernel components and
integrate these with the hypervisor. In comparison, the x86 hardware support
for virtualization does not provide a new CPU mode, but provides an orthogonal
concept known as "root" and "non-root". When running as non-root on x86,
the feature set is completely equivalent to a CPU without virtualization
support. When running as root on x86, the feature set is extended to add
additional features for controlling virtual machines (VMs), but all
existing kernel code can run
unmodified as both root and non-root. On x86, when a VM traps to the
hypervisor, the CPU changes from non-root to root. On ARM, when a VM traps
to the hypervisor, the CPU traps to HYP mode.
HYP mode controls virtualization features by configuring sensitive
operations to trap to HYP mode when executed in SVC and USR mode; it also allows
hypervisors to configure a number of shadow register values used to hide
information about the physical hardware from VMs. HYP mode also controls
Stage-2 translation, a feature similar to Intel's "extended page table"
used to control VM memory
access. Normally when an ARM processor issues a
load/store instruction, the memory address used in the instruction is translated
by the memory management unit (MMU) from a virtual address to a physical address using regular page
tables, like this:
- Virtual Address (VA) -> Physical Address (PA)
The virtualization extensions add an extra stage of translation known
as Stage-2 translation which can be enabled and disabled only from HYP mode.
When Stage-2 translation is enabled, the MMU translates addresses in the following
way:
- Stage-1: Virtual Address (VA) -> Intermediate Physical Address (IPA)
- Stage-2: Intermediate Physical Address (IPA) -> Physical Address (PA)
The guest operating system controls the Stage-1 translation independently of the
hypervisor and can change mappings and page tables without trapping to the
hypervisor. The Stage-2 translation is controlled by the hypervisor, and a
separate Stage-2 page table base register is accessible only from HYP mode. The use
of Stage-2 translations allows software running in HYP mode to control access
to physical memory in a manner completely transparent to a VM running in SVC or
USR mode, because the VM can only access pages that the hypervisor has mapped
from an IPA to the page's PA in the Stage-2 page tables.
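As a concrete (and entirely invented) example of the two stages at work, a
guest access to a virtual address might be resolved like this:
- Stage-1 (guest's page tables): VA 0xc0001000 -> IPA 0x80001000
- Stage-2 (hypervisor's page tables): IPA 0x80001000 -> PA 0xb4201000
If no Stage-2 mapping exists for the IPA, the access traps to HYP mode as a
Stage-2 page fault; as described below, this is how KVM/ARM intercepts MMIO
accesses to emulated devices.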
KVM/ARM design
KVM/ARM is tightly integrated with the kernel and effectively turns the
kernel into a first class ARM hypervisor. For KVM/ARM to use the hardware
features, the kernel must somehow be able to run code in HYP mode because
HYP mode is used to configure the hardware for running a VM, and traps from the
VM to the host (KVM/ARM) are taken to HYP mode.
Rewriting the entire kernel to run only in HYP mode is not an option, because
it would break compatibility with hardware that doesn't have the virtualization
extensions. A HYP-mode-only kernel also would not work when run inside a
VM, because the HYP
mode would not be available. Support for running both in HYP mode and
SVC mode would be much too invasive to the source code, and would potentially
slow down critical paths. Additionally, the hardware requirements for the page
table layout in HYP mode are different from those in SVC mode in that they
mandate the use of LPAE (ARM's Large Physical Address Extension) and require
specific bits to be set on the page table entries, which are otherwise clear on
the kernel page tables used in SVC mode. So KVM/ARM must manage a separate
set of HYP mode page tables and explicitly map in code and data accessed from
HYP mode.
We therefore came up with the idea to split execution across multiple CPU modes
and run as little code as possible in HYP mode. The code run in HYP mode is
limited to a few hundred instructions and isolated to two assembly files:
arch/arm/kvm/interrupts.S
and arch/arm/kvm/interrupts_head.S.
For readers not familiar with the general KVM architecture, KVM on all
architectures works by exposing a simple interface to
user space to provide virtualization of core components such as the CPU and
memory. Device emulation, along with setup and configuration of VMs, is handled by a
user space process, typically QEMU. When such a process decides it
is time to run the VM, it will call the KVM_VCPU_RUN ioctl(),
which executes VM code natively on the CPU. On ARM, the ioctl() handler in
arch/arm/kvm/arm.c switches to HYP mode by issuing an HVC (hypercall)
instruction, which changes the CPU mode to HYP mode, context switches all hardware
state between the host and the guest, and finally jumps to the VM SVC or
USR mode to natively execute guest code.
When KVM/ARM runs guest code, it enables Stage-2 memory translation, which completely
isolates the address space of VMs from the host and other VMs. The CPU will be
executing guest code until the hardware traps to HYP mode, because of a hardware
interrupt, a stage-2 page fault, or a sensitive operation. When such a trap
occurs, KVM/ARM switches back to the host hardware state and returns to normal
KVM/ARM host SVC code with the full kernel mappings available.
When returning from a VM, KVM/ARM examines the reason for the trap, and performs
the necessary emulation or
resource allocation to allow the VM to resume. For example, if the guest
performs a memory-mapped I/O (MMIO) operation to an emulated device, that will generate a Stage-2
page fault, because only physical RAM dedicated to the guest will be mapped in
the Stage-2 page tables. KVM/ARM will
read special system registers, available only in HYP mode, which contain the
address causing the fault and report the address to QEMU through a shared
memory-mapped structure between QEMU and the kernel. QEMU knows the memory map
of the emulated system and can forward the operation to the appropriate device
emulation code. As another example, if a hardware interrupt occurs while the VM
is executing, this will trap to HYP mode, and KVM/ARM will switch back to the host
state and re-enable interrupts, which will cause the hardware interrupt handlers to
execute once again, but this time without trapping to HYP mode. While every hardware
interrupt ends up interrupting the CPU twice, the actual trap cost on ARM
hardware is negligible compared to the world-switch from the VM to the host.
HYP mode
Providing access to HYP mode from KVM/ARM was a non-trivial challenge, since HYP
mode is a more privileged mode than the standard ARM kernel modes and there is
no architecturally defined ABI for entering HYP mode from less privileged modes.
One option would be to expect bootloaders to either install secure monitor
handlers or hypercall handlers that would allow the kernel to trap back
into HYP mode, but this method is brittle and error-prone, and prior experience
with establishing TrustZone APIs has shown that it is hard to create a standard
across different implementations of the ARM architecture.
Instead,
Will Deacon, Catalin Marinas, and Ian Jackson proposed
that we rely on
the kernel being booted in HYP mode if the kernel is going to support KVM/ARM. In
version 3.6, a patch series developed by Dave Martin and Marc Zyngier was
merged that detects if the kernel is booted in HYP mode and, if so, installs a
small stub handler that allows other subsystems like KVM/ARM to take
control of HYP mode later on. As it turns out, it is reasonable to recommend that
bootloaders always boot the kernel in HYP mode if it is available because even
legacy kernels always make an explicit switch to SVC mode at boot time, even
though they expect to boot into SVC mode already. Changing bootloaders to
simply boot all kernels in HYP mode is therefore backward-compatible with
legacy kernels.
Installing the hypervisor stub when the kernel is booted in HYP mode was
an interesting implementation challenge. First, ARM kernels are often loaded as
a compressed image, with a small uncompressed pre-boot environment known as the
"decompressor" which decompresses the kernel image into memory. If the
decompressor detects that it is
booted in HYP mode, then a temporary stub must be installed at this stage
allowing the CPU to fall back to SVC mode to run the decompressor code. The
reason is that the decompressor must turn on the MMU to enable caches, but
doing so in HYP mode requires support for the LPAE page table format used by HYP
mode, which is an unwanted piece of complexity in the decompressor code.
Therefore, the decompressor installs the temporary HYP stub, falls back to SVC
mode, decompresses the kernel image, and finally, immediately before calling the
uncompressed initialization code, switches back to HYP mode again. Then, the uncompressed
initialization code will again detect that the CPU is in HYP mode and will
install the main HYP stub to be used by kernel modules later in the boot process
or after the kernel has finally booted. The HYP stub can be found in
arch/arm/kernel/hyp-stub.S.
Note that the uncompressed initialization code
doesn't care whether the uncompressed code is started directly in HYP mode from a
bootloader or from the decompressor.
Because HYP mode is a more privileged mode than SVC mode, the transition from
SVC mode to HYP mode occurs only through a hardware trap. Such a trap can be
generated by executing the hypercall (HVC) instruction, which will trap into HYP
mode and cause the CPU to execute code from a jump entry in the HYP exception
vectors. This allows a subsystem to use the hypervisor stub to fully take
over control of HYP mode, because the hypervisor stub allows subsystems to
change the location of the exception vectors. The HYP stub is called through
the __hyp_set_vectors() function, which takes the physical address of
the HYP exception vector as its only parameter, and replaces the HYP Vector Base Address Register
(HVBAR) with that address. When KVM/ARM is initialized during normal kernel boot
(after all main kernel initialization functions have run), it creates an identity mapping
(a one-to-one mapping of virtual addresses to physical addresses) of the HYP mode
initialization code, which includes an exception vector, and sets the HVBAR to the
physical address of that vector using the __hyp_set_vectors() function. Further, the KVM/ARM initialization code
calls the HVC instruction to run the identity-mapped initialization code, which can safely
enable the MMU, because the code is identity mapped.
Finally, KVM/ARM initialization sets up the HVBAR to point to the main KVM/ARM HYP exception
handling code, now using the virtual addresses for HYP mode. Since HYP mode
has its own address space, KVM/ARM must choose an appropriate virtual address
for any code or data that is mapped into HYP mode. For convenience
and clarity, the kernel virtual addresses are reused for pages mapped into HYP
mode, making it possible to dereference structure members directly as long as
all relevant data structures are mapped into HYP mode.
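The initialization sequence just described can be summarized in a short,
hedged sketch; hyp_init_vectors is a placeholder name, and the real code
(in arch/arm/kvm/ and its assembly files) differs in its details:
    /* Point the HYP stub's HVBAR at the identity-mapped init code: */
    __hyp_set_vectors(virt_to_phys(hyp_init_vectors));

    /* An HVC instruction now traps into that init code, which can safely
       enable the HYP MMU (the code is identity mapped) and then point
       HVBAR at KVM/ARM's real HYP exception vectors, this time using
       HYP-mode virtual addresses. */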
Both traps from sensitive operations in VMs and hypercalls from the host kernel
enter HYP mode through an exception on the CPU.
Instead of changing the HYP exception vector on every switch between
the host and the guest, a single HYP exception
vector is used to handle both HVC calls from the host kernel and to handle traps
from the VM. The HYP vector handling code checks the VMID field on the Stage-2
page table base register, and VMID 0 is reserved for the host. This field is
only accessible from HYP mode and guests are therefore prevented from
escalating privilege. We introduced the kvm_call_hyp()
function, which can be
used to execute code in HYP mode from KVM/ARM.
For example, KVM/ARM code running in SVC mode can make the
following call to invalidate TLB entries, which must be done from HYP mode:
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
Virtual GIC and timers
ARMv7 architectures with hardware virtualization support also include
virtualization support for timers and the interrupt controller. Marc Zyngier
implemented support for these features, which are called "generic timers"
(a.k.a. architected timers) and the Virtual Generic Interrupt Controller (VGIC).
Traditionally, timer operations on ARM systems have been MMIO
operations to dedicated timer devices. Such MMIO
operations performed by VMs would trap to QEMU, which would involve a
world-switch from the VM to host kernel, and a switch from the host kernel to user space for
every read of the time counter or every time a timer needed to be programmed.
Of course, the timer functionality could be emulated inside the kernel, but this
would require a trap from the VM to the host kernel, and would therefore add
substantial overhead to VMs compared to running on native hardware.
Reading the
counter is a very frequent operation in Linux. For example, every time a task
is enqueued or dequeued in the scheduler, the runqueue clock is updated,
and in particular multi-process workloads like Apache benchmarks clearly show
the overhead of trapping on each counter read.
ARMv7 allows for an optional extension to the architecture, the generic timers,
which makes counter and timer operations part of the core architecture. Now,
reading a counter or programming a timer is done using coprocessor register
accesses on the core itself, and the generic timers provide two sets of timers
and counters: the physical and the virtual. The virtual counter and timer are
always available, but access to the physical counter and timer can be limited
through control registers accessible only in HYP mode. If the kernel is booted
in HYP mode, it is configured to use the physical timers; otherwise
the kernel uses the virtual timers. This both allows an unmodified kernel to
program timers when running inside a VM without trapping to the host, and provides
the necessary isolation of the host from VMs.
If a VM programs a virtual
timer, but is preempted before the virtual timer fires, KVM/ARM reads the timer
settings to figure out the remaining time on the timer, and programs a
corresponding soft timer in the kernel. When the soft timer expires, the timer
handler routine injects the timer interrupt back into the VM. If the VM is
scheduled before the soft timer expires, the virtual timer hardware is
re-programmed to fire when the VM is running.
The role of an interrupt controller is to receive interrupts from devices and
forward them to one or more CPUs. ARM's Generic Interrupt Controller
(GIC) provides a "distributor", which is the core logic of the GIC, along with
several CPU interfaces. The GIC allows CPUs to mask
certain interrupts, assign priorities, or set the affinity of certain interrupts to
certain CPUs. Finally, CPUs also use the GIC to send inter-processor
interrupts (IPIs) from one core to another; the GIC is the underlying mechanism
for SMP cross calls on ARM.
Typically, when the GIC raises an interrupt to a
CPU, the CPU will acknowledge the interrupt to the GIC, interact with the interrupting
device, signal end-of-interrupt (EOI) to the GIC, and resume normal operation.
Both acknowledging and EOI-signaling interrupts are privileged operations that will trap
when executed from within a VM, adding performance overhead to common
operations. The hardware support for virtualization in the VGIC comes in the
form of a virtual CPU interface that CPUs can query to acknowledge and EOI virtual
interrupts without trapping to the host. The hardware support further provides a
virtual control interface to the VGIC, which is accessed only by KVM/ARM, and is
used to program virtual interrupts generated from virtual devices (typically
emulated by QEMU) to the VGIC.
Since access to the distributor is not a common
operation, the hardware does not provide a
virtual distributor, so KVM/ARM provides in-kernel GIC distributor emulation
code as part of the support for VGIC. The result is that VMs can acknowledge and EOI
virtual interrupts directly without trapping to the host. Actual hardware interrupts
received during VM execution
always trap to HYP mode, and KVM/ARM lets the kernel's standard ISRs
handle the interrupt as usual, so the host remains in complete control
of the physical hardware.
There is no mechanism in the VGIC or generic timers to
let the hardware directly inject physical interrupts from the virtual timers as
virtual interrupts to the VMs. Therefore, VM timer interrupts will trap as any
other hardware interrupt, and KVM/ARM registers a handler for the virtual timer
interrupt and injects a corresponding virtual timer interrupt using software
when the handler function is called from the ISR.
Results
During the development of KVM/ARM, we continuously measured the virtualization overhead
and ran long-running workloads to test stability and measure performance. We
have used various kernel configurations and user space environments (both ARM
and Thumb-2) for both the host and the guest, and validated our workloads with
SMP and UP guests. Some workloads have run for several weeks at a time without
crashing, and the system behaves as expected when exposed to extreme memory
pressure or CPU over-subscription. We therefore feel that the
implementation is
stable and encourage users to try and use the system.
Our measurements using both micro and macro benchmarks show that the overhead
of KVM/ARM is within 10% of native performance on multicore platforms for
balanced workloads. Purely CPU-bound workloads perform almost at native speed.
The relative overhead of KVM/ARM is comparable to KVM on x86. For some macro
workloads, like Apache and MySQL, KVM/ARM even has less overhead than on x86
using the same configuration. A significant source of this improved
performance is the optimized path for IPIs, and thereby for process
rescheduling, provided by the VGIC and generic-timer hardware
support.
Status and future work
KVM/ARM started as a research project at Columbia University and was later
supported by Virtual Open
Systems. After the 3.9 merge, KVM/ARM continues to be
maintained by the original author of the code, Christoffer Dall, and the ARMv8
(64-bit) port is maintained by Marc Zyngier. System emulation support for KVM/ARM on ARMv7 has
been merged upstream in QEMU, and kvmtool also supports KVM/ARM on both
ARMv7 and ARMv8. ARMv8 support is scheduled to be merged for the 3.11 kernel
release.
Linaro is supporting a number of efforts to make KVM/ARM itself feature
complete, which involves debugging and full migration features including
migration of the in-kernel support for the VGIC and the generic timers.
Additionally, virtio has so far relied on a PCI backend in QEMU and the kernel,
but a significant amount of work has already been merged upstream to refactor
the QEMU source code concerning virtio to allow better support for MMIO-based
virtio devices to accelerate virtual network and block devices. The remaining
work is currently a priority for Linaro, as is support for the mach-virt
ARM machine definition, which is a simple machine model designed to be used for
virtual machines and is based only on virtio devices. Finally, Linaro is also
working on ARMv8 support in QEMU, which will also take advantage of mach-virt and virtio
support.
Conclusion
KVM/ARM is already used heavily in production by the SUSE Open Build Service on
Arndale boards, and we can only speculate about its future uses: in green
data centers, as the hypervisor of choice for ARM-based networking
equipment, or even on ARM-based laptops and desktops.
For more information, help on how to run KVM/ARM on your specific board or SoC,
or to participate in KVM/ARM development, the kvmarm
mailing list is a good place to start.
Comments (10 posted)
By Jonathan Corbet
July 2, 2013
Maintaining user-space ABI compatibility is one of the key guiding
principles of Linux kernel development; changes that break user space are
likely to be reverted quickly, often after an incendiary message from
Linus. But what is to be done in cases where an ABI is deemed to be
unworkable and unmaintainable? Control group maintainer Tejun Heo is
trying to solve that problem, but, in the process, he is running into
opposition from one of Linux's highest-profile users.
Control groups ("cgroups") allow an administrator to divide the processes
in a system into a hierarchy of groups; this hierarchy need not match the
process tree. The grouping function alone is
useful; systemd uses it to keep track of all of the processes involved with
a given service, for example. But the real purpose of control groups is to allow
resource control policies to be applied to the processes within each group;
to that end, the kernel contains a range of "controllers" that enforce
policies on CPU time, block I/O bandwidth, memory usage, and more. Control
groups are managed with a virtual filesystem exported by the kernel; see Documentation/cgroups/cgroups.txt for a
thorough (if slightly dated) description of how this subsystem works.
The trouble with control groups
There is no doubt that the functionality provided by control groups is both
extensive and flexible. Indeed, part of the problem is that it is
too flexible. Consider, for example, the support for multiple
hierarchies in the control group subsystem. Cgroups allow the creation of
a hierarchy of processes to be used in dividing up a limited resource, such
as available CPU time. But they allow the creation of an entirely
different hierarchy for the control of a different resource. Thus, for
example, CPU time could be placed under a policy that favors certain users
over others, while memory use could, instead, be regulated depending on
what program a process is running. Processes can be grouped in entirely
different ways in each hierarchy.
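To make that concrete, here is a rough user-space sketch of placing one
process differently in two hierarchies through the cgroup virtual
filesystem; the paths assume the cpu and blkio controllers are mounted on
separate hierarchies under /sys/fs/cgroup, and the group names ("batch",
"best-effort") are invented for the example:
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Write a PID to a cgroup's "tasks" file, moving the process into
       that group within one hierarchy. */
    static int join_cgroup(const char *tasks_path, pid_t pid)
    {
        FILE *f = fopen(tasks_path, "w");

        if (!f)
            return -1;
        fprintf(f, "%d\n", (int) pid);
        return fclose(f);
    }

    int main(void)
    {
        pid_t pid = getpid();

        /* The same process, placed independently in each hierarchy: */
        join_cgroup("/sys/fs/cgroup/cpu/batch/tasks", pid);
        join_cgroup("/sys/fs/cgroup/blkio/best-effort/tasks", pid);
        return 0;
    }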
The problem here is that, while the design allowing each controller to have
its own hierarchy seems nice and orthogonal, the implementation cannot be
that way. The controllers for memory usage, I/O bandwidth, and writeback
throttling all look independent on the surface, but those problems are all
intertwined in the memory management system in the kernel. All three of
those controllers will need to associate pages of memory with specific
control groups; if a given process is in one cgroup from the memory
controller's point of view, but a different cgroup for the I/O bandwidth
controller, that tracking quickly becomes difficult or impossible. It is
easy to set up policies that conflict or that simply cannot be properly
implemented within the kernel.
Another perceived problem is that the virtual filesystem interface is too
low-level, exposing too many details of how control groups are implemented
in the kernel. As the number of users of control groups grows, it will
become increasingly hard to make changes without breaking existing
applications. It's not clear what the correct cgroup interface should be,
but those who spend enough time looking at the current implementation tend
to come away convinced that changes are needed.
This problem is aggravated by an increasing tendency to use file
permissions to hand subtrees of a cgroup hierarchy over to unprivileged
processes. There are legitimate reasons to want to delegate authority in
that way; complex applications may want to use cgroups to implement their own
internal policies, for example. There are also use cases associated with
virtualization
and containers. But that delegation greatly increases the number of
programs with an intimate understanding of how cgroups work, complicating
any future changes. There are
also any number of security issues that come with unprivileged access to a
cgroup hierarchy; it is trivially easy to run denial-of-service attacks against a
system if one has write access to a cgroup hierarchy. In short, the interface
was just never meant to be used in this way.
For these reasons and more, there is a strong desire to rework the cgroup
interface into something that is more maintainable, more secure, and easier to
use. Getting there, though, is likely to be a long and painful process, as
can be seen by the early discussions around the subject.
The solution and its discontents
The plan for control groups can be described in relatively few words; the
resulting discussion, instead, is rather more verbose. Multiple
hierarchies are seen to be misconceived and unmaintainable on their face;
the plan is to phase out that functionality so that, in the end, all
controllers are attached to a single, unified hierarchy of processes.
Unprivileged access to the cgroup hierarchy will be strongly discouraged;
the hope is to have a single, privileged process handling all of the cgroup
management tasks. That process will, in turn, provide some sort of
higher-level interface to the rest of the system.
Tim Hockin is charged with making Google's massive cluster of machines work
properly for a wide variety of internal users. Google uses cgroups
extensively for internal resource management; more to the point, the
company also makes extensive use of multiple hierarchies. So, needless to
say, Tim is not at all pleased with the prospect of that functionality
going away. As he put it:
So yeah, I'm in a bit of a panic. You're making a huge amount of
work for us. You're breaking binary compatibility of the
(probably) largest single installation of Linux in the world. And
you're being kind of flip about the reality of it...
Part of the reason for Tim's panic is that he was under the impression that
the existing functionality would be removed within a year or two. That is
decidedly not the case; the kernel's ABI rules have not been suspended for
control groups. The plan is to add a new control interface, and any new
features will probably only work with that new interface, but the existing
interface, including multiple hierarchies, will continue to be supported
until it's clear that it is no longer being used.
Tim described, in general terms, how Google
uses multiple hierarchies. Essentially, every job in the system has two
attributes: whether it's a production or "batch" job, and whether it gets
I/O bandwidth guarantees. The result is a 2x2 matrix describing resource
allocation policies (though one of the entries, batch processes with I/O
guarantees, makes little sense and is not used). Using two independent
cgroup hierarchies
makes this set of policies relatively easy to express; Tim asserts that a
unified hierarchy would not be usable in the same way.
Tejun was unimpressed, responding that this
case could be managed by setting up three cgroups at the same level of the
hierarchy, each of which would implement one of the three useful policy
combinations. The problem with this solution, according to Tim, is that
the processes without I/O bandwidth guarantees would be split into two
groups, whereas in the current solution they are in one group. If one of
those two groups has far more members than the other, the members of that
larger group will get far less of the available bandwidth than the members
of the small group. Tejun still thinks that the problem should be
solvable, perhaps with the use of a user-space management daemon that would
adjust the relative bandwidth allocations depending on the workload. Tim
has answered that the situation is actually
a lot more complicated, but he has not yet shared the details of how, so it
is hard to understand what the real difficulties with a single hierarchy
are.
A single management process?
Tim also dislikes the plan to have a single process managing the control
group hierarchy. That process could be made to provide the functionality
that Google (along with others) needs, though there are performance
concerns associated with adding a process in the middle. But Tim was not
alone in being concerned by this message from
Lennart Poettering on the nature of that single process:
This hierarchy becomes private property of systemd. systemd will
set it up. Systemd will maintain it. Systemd will rearrange
it. Other software that wants to make use of cgroups can do so only
through systemd's APIs.
Google does not currently run systemd and is not thrilled by the prospect
of having to switch to be able to make use of cgroup functionality. So Tim
responded that "If systemd is the
only upstream implementation of this single-agent idea, we will have to
invent our own, and continue to diverge rather than converge." There
is no particular judgment against systemd implied by that position; it is
simply that making that switch would affect a whole lot of things beyond
cgroups, and that is more than Google feels like it would want to take on
at the moment. But, in general, it would not be surprising if, in the long
term, some users remain opposed to the idea of systemd as the only
interface to cgroups. That suggests that we will be seeing competing
implementations of the cgroup management daemon concept.
One of those alternatives may be about to come into view; Serge Hallyn confessed that he is working on a cgroup
management daemon of his own. In some situations, a separate daemon might
meet a lot of needs, but Lennart was clear
that he would never have systemd defer to such a daemon. His position —
not an entirely unreasonable one — is that the init process, as the creator
of all other processes in the system, should not be dependent on any other
process for its normal operation. He also seems to feel that it would
not be possible to put the cgroup management code into a library that could
be used in multiple places. So we are likely to see multiple
implementations of this functionality in use before this story is done.
That, in turn, could create headaches for developers of applications that
need to interface with the cgroup subsystem.
The discussion, thus far, seems to have changed few minds. But Tejun has
made it clear that he doesn't intend to
just ignore complaints from users:
While the bar to overcome is pretty high, I do want to learn about
the problems you guys are foreseeing, so that I can at least
evaluate the graveness properly and hopefully compromises which can
mitigate the most sore ones can be made wherever necessary.
He also acknowledged the biggest problem
faced by the development community: despite having accumulated some
experience on wrong ways to solve the
problem, nobody really knows what the right solution is. More mistakes are
almost certain, so it's too soon to try to settle on final solutions.
In the early years of Linux, most of the ABIs implemented by the kernel
were specified by groups like POSIX or by prior implementation in other
kernels. That made the ABI design problem mostly go away; it was just a
matter of doing what had already been done before. For current problems,
though, there are rather fewer places to look for guidance, so we are
having to figure out the best designs as we go. Mistakes are certain to
happen in such a setting. So we are going to have to get better at
learning from those mistakes, coming up with better designs, and moving to
them without causing misery for our users. The control group transition is
likely to set a lot of precedents regarding how these changes should (or
should not) be handled in the future.
Comments (35 posted)
Patches and updates
Kernel trees
- Sebastian Andrzej Siewior: 3.8.13-rt13.
(June 30, 2013)
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Memory management
Architecture-specific
Virtualization and containers
Miscellaneous
- Lucas De Marchi: kmod 14.
(July 3, 2013)
Page editor: Jonathan Corbet
Distributions
By Jonathan Corbet
July 3, 2013
The
Fedora 19 release brings a lot of
goodies for Fedora users, but there is one class of users that may be a bit
less happy: those who want to run Fedora on an Apple Mac system in a
dual-boot configuration with OS X. A late bug in the Anaconda
installer makes the creation of such systems nearly impossible. One might
wonder why Fedora 19 shipped with this kind of problem; a look at the
reasons gives a few insights into how the Fedora release process works.
The decision to proceed with the Fedora 19 release was announced on June 27. Unfortunately, bug #979205
had been filed shortly before. The installer fails to create the needed
partitions for a dual-boot system on an OS X machine, causing the
installation to fail. As Matthew Garrett put it when calling attention to the problem: "This
is rather frustrating, since Fedora's the only distribution with any
significant support for running on Apple hardware." A glance at any
Linux-related conference will show that Apple systems are popular among
developers; it seems a bit strange that a distribution that has put
significant effort into working on that hardware would ship with a known
problem of this nature. The explanation for what happened involves a
number of separate issues.
The first is that the bug was introduced very late in the development
cycle; according to Adam Williamson, it went
into Anaconda 19.30.10, which first saw wide testing in the RC1 release on
June 25. Naturally, the patch that caused the problem was a response
to another
bug; even so, the patch was the subject of some discussion before being merged
into the otherwise-frozen Anaconda source. In the end, the patch was
deemed to be sufficiently low-risk to be accepted — a judgment which, like many,
is easy to criticize after the fact. At the time, though, it looked like a
way to fix a known problem in the release.
The new code took several days to find its way into a build that would see
wider testing; it was committed on a Thursday, and the build did not happen
until after the following weekend. That left a period of about two days
between the bug's general availability and the Fedora 19 go/no-go
decision — not very long for an installation-time issue to surface. Some
participants have suggested that, in the future, the time between an RC
release and the go/no-go decision should be lengthened to increase the
chances of catching a last-minute problem. But that probably would not
have helped in this case.
The fact that the Fedora quality-assurance team only appears to have a
single Mac system, and that they don't test it for dual-boot installations,
also did not help. There was a clear hole in the QA net that this problem
slipped through. One might argue that this does not necessarily indicate a
problem: as Chris Murphy pointed out, Macs
are not officially supported by the distribution. So it is not surprising
that the testing resources available are unable to catch every problem. It
also means that, even if the problem had been found before the go/no-go
decision, it would not have been entitled to "blocker" status and, thus,
might not have affected that decision.
While not saying that the release should have been delayed to fix this
problem, Matthew did question one interesting bit of Fedora policy: once
the go/no-go decision has been made in the "go" direction, the process
becomes unstoppable. That means that, even if this bug were deemed to have
a "blocker" level of severity, it still would not have blocked the
release. Kevin Fenzi defended this policy,
describing the long series of events that starts to unfold once the
decision to make the release has been made. The explanation was not
satisfying to everybody, but the policy exists and doesn't appear to be
subject to change.
So Fedora 19 simply will not install properly in a dual-boot OS X
configuration without a lot of extra work. And things are likely to stay
that way; an installer problem cannot be fixed through the normal Fedora
update process. There was some talk of a 19.1 release, but, as Kevin put it, "We are currently pretty unsetup
for any kind of point releases." So this problem is likely to
remain in the official Fedora distribution until Fedora 20. Not an
ideal outcome by any means, but one that may have been hard to avoid.
Comments (19 posted)
Brief items
Maintainers shouldn't have to do the work to support any configuration
they're not comfortable testing/etc, but if somebody else comes along
to do it for them, the solution is cooperation, not revert wars.
--
Rich Freeman
Comments (none posted)
Version 2.0 of the DoudouLinux educational distribution is out. "
But
DoudouLinux is not just a CD/DVD of educative stuffs for children.
DoudouLinux is now a vast project on its own. We have published with
version 2.0 a manifesto
that defines the philosophy and the ethics of
our project: we want our children be able to fully master the digital world
they are going to live in, instead of undergoing it. As a result we now
feel very concerned about user privacy, especially when it comes to
children." LWN
looked at
DoudouLinux in 2011.
Full Story (comments: none)
The Fedora 19 release is now available. As usual, this release offers a
lot of new features; see the announcement or
the
release notes for details.
Update: the Fedora 19 for ARM
release is also available. The Fedora ARM team is clearly getting up
to speed and is now able to offer releases on the same day as the primary
architectures.
Full Story (comments: 72)
The Fedora Secondary Arch Team for Power has announced the release of
Fedora 19 for Power architecture.
Full Story (comments: none)
GNU Linux-libre 3.10-gnu is out. "
No big deblobbing news for this
one: a handful of new drivers that requested blobs had to be deblobbed, a
few others had to be updated because of new blob requests."
Full Story (comments: none)
Kubuntu, Lubuntu, Ubuntu GNOME, and UbuntuKylin have released a first alpha
version of 13.10 "Saucy Salamander".
Full Story (comments: none)
Distribution News
Fedora
The Cooperative Bug Isolation Project (CBI) is now available for Fedora
19. CBI is a research effort designed to find out what went wrong when
software crashes. Download CBI packages and help squash bugs.
Full Story (comments: none)
Fedora 17 will reach its end-of-life on July 30, 2013. No further updates
will be available after that time.
Full Story (comments: 1)
Newsletters and articles of interest
Comments (none posted)
Oracle recently changed the Berkeley DB license to AGPLv3, prompting a
discussion on the Debian lists about possible conflicts between GPLv2
licensed software in Debian and the new AGPLv3 BDB. Bradley Kuhn sent an
email to the Debian-legal mailing list with his point of view. "
I
know that some have complained that compliance with AGPLv3 may require more
work by Debian redistributors. That is a reasonable concern, but I think
the issue can be mitigated. The argument is roughly analogous to this one:
complying with GPLv2 is more difficult than complying with the Apache
license. But, unless Debian wants to take a wholesale position opposed to
copyleft, I don't think this issue is or should be considered
insurmountable."
Full Story (comments: 29)
Jonathan Riddell has
announced
that the Kubuntu distribution will not be following Ubuntu in its switch to
the Mir display server. "
Here at Kubuntu we still want to work as
part of the community development, taking the fine software from KDE and
other upstreams and putting it on computers worldwide. So when Ubuntu
Desktop gets switched to Mir we won't be following. We'll be staying with X
on the images for our 13.10 release now in development and the 14.04LTS
release next year. After that we hope to switch to Wayland which is what
KDE and every other Linux distro hopes to do."
Comments (153 posted)
Stefano Zacchiroli has
announced the
sources.debian.net (sources.d.n) web site, which hosts the source code for Debian packages. "
Via sources.d.n you can therefore browse the content of Debian source packages with usual code viewing features like syntax highlighting. More interestingly, you can search through the source code (of unstable only, though) via integration with http://codesearch.debian.net. You can also use sources.d.n programmatically to query available versions or link to specific lines, with the possibility of adding contextual pop-up messages (example)."
Comments (21 posted)
Page editor: Rebecca Sobol
Development
By Nathan Willis
July 3, 2013
A new release of the FOSSology source-code analysis
tool is out. Although there have been minor updates, this is the
first release in 2013 to bring additional functionality. The 2.0 release in 2012 marked
a major shift for the project, debuting a new, more modular design
and paving the way for faster releases. The newest update, version
2.2.0, includes a new permissions scheme and some usability
improvements, but in the long run, the most notable feature in this
release may be the improved compatibility with the Software Package Data Exchange (SPDX)
standard for tracking software components, licenses, and copyrights.
FOSSology is designed to be a flexible platform for analyzing
source code, but it is best known for its ability to scan large
collections of files and pick out licenses and copyright statements.
The resulting license and copyright information is then used to help
an organization stay in compliance with the licensing requirements it
inherits from upstream open source projects. However, there are other
use cases—for instance, at LinuxCon Japan, Armijn Hemel mentioned using FOSSology to help
automate the process for finding license violations in the source code
of software shipped in embedded Linux devices. It is not hard to imagine
the tool being adapted for other source-scanning tasks, such as
assembling a list of the contributors who would need to sign off on a license change.
Users can upload source packages to FOSSology, then queue scanning
jobs that analyze the packages for various types of information
handled by scanning "agents." As new code is added, components
are updated, and trees are rearranged, these scans can be run
periodically, to help check for problematic license combinations or
missing information. The basic agents available include a license
recognizer, a copyright recognizer,
a MIME-type analyzer, and a package header parser (which looks for the
packaging information defined for RPM or .deb files). However, users
can write their own agents to scan for arbitrary information.
All of the agents work by matching text patterns, which is a tricky
business, considering all of the ways a licensing statement could be
phrased, and the wide assortment of licenses that may be encountered.
FOSSology defines 600 or so licenses at
the moment. Recognizing copyright statements, although they are sometimes
less critical from a legal-compliance standpoint, is also a pattern-matching game:
FOSSology looks for text blocks that resemble copyright statements, as
well as for email addresses and URLs.
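To give a rough sense of what that pattern matching involves, here is a
minimal Python sketch that flags candidate copyright statements, email
addresses, and URLs in a file. It is purely illustrative; the regular
expressions are far cruder than the heuristics FOSSology's agents actually use.

    import re
    import sys

    # Crude stand-ins for the kinds of text patterns described above; the
    # real agents use much more elaborate heuristics.
    COPYRIGHT_RE = re.compile(r'copyright\s+(?:\(c\)\s*)?\d{4}', re.IGNORECASE)
    EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')
    URL_RE = re.compile(r'https?://\S+')

    def scan_file(path):
        """Return candidate copyright lines, email addresses, and URLs."""
        findings = {'copyrights': [], 'emails': [], 'urls': []}
        with open(path, errors='replace') as f:
            for line in f:
                if COPYRIGHT_RE.search(line):
                    findings['copyrights'].append(line.strip())
                findings['emails'].extend(EMAIL_RE.findall(line))
                findings['urls'].extend(URL_RE.findall(line))
        return findings

    if __name__ == '__main__':
        for path in sys.argv[1:]:
            print(path, scan_file(path))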
Historically, FOSSology has been deployed on a web server backed by
a PostgreSQL database, with multiple users uploading source code
bundles and performing scans through the web UI. In October 2012,
version 2.1.0 added a pair of command-line utilities,
fo_nomos_license_list and fo_copyright_list, with
which users could query the FOSSology database for license or copyright
information. The command-line utilities free users from the web
UI, make the FOSSology repository more accessible to
scripting, and are reported to run faster. Execution speed can
be a major issue with large repositories, where a scan run in the web
UI could time out if it took too long. But in the 2.1.0 release the tools
were pretty limited in scope, since both required scanning an entire
upload (that is, one package or source archive). The 2.2.0 release
updates the utilities to accept a sub-tree as the starting node from
which to perform a scan.
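As a loose illustration of what limiting a scan to a sub-tree means in
practice (and nothing more: the sketch below is not one of FOSSology's
utilities and invents its own trivial license heuristic), a recursive walk
can simply start at the chosen node instead of at the top of the upload:

    import os
    import re

    # Hypothetical, simplistic license hint; real license detection is far
    # more involved.
    LICENSE_HINT = re.compile(r'GNU General Public License|MIT License|BSD',
                              re.IGNORECASE)

    def scan_subtree(start_node):
        """Yield (path, hint) for files under the chosen starting node only."""
        for dirpath, _dirs, files in os.walk(start_node):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors='replace') as f:
                        match = LICENSE_HINT.search(f.read())
                except OSError:
                    continue
                if match:
                    yield path, match.group(0)

    # Scanning "project/src/lib" rather than "project" restricts the report
    # to that sub-tree, analogous to the new starting-node option.
    for path, hint in scan_subtree("project/src/lib"):
        print(path, hint)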
Version 2.2.0 also introduces a new permissions
scheme that allows administrators to limit access to specific
files on a per-file and per-user basis. The system implements its own
set of internal user groups (i.e., separate from the Unix groups that may be
associated with accounts); each user in a group can be granted read
permission, write permission, and user/group-administration
permission. The ability to upload source packages to the application
is governed by a separate permission table, perm_upload,
which grants upload permission for each folder to specific groups;
each user gets his or her own group by default, which enables per-user
upload restrictions. It is a fairly straightforward system, but it
replaces the permission system used in previous releases (which bound
permissions to each individual application plugin), so administrators
may have to do some work migrating existing installations.
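A toy model may make the scheme easier to picture. The sketch below is only
a conceptual rendering of the description above, not FOSSology's actual
schema: groups carry read/write/admin bits, a perm_upload-style mapping ties
folders to the groups allowed to upload into them, and each new user gets a
private group.

    from dataclasses import dataclass, field

    @dataclass
    class Group:
        name: str
        read: bool = False
        write: bool = False
        admin: bool = False          # user/group administration

    @dataclass
    class PermissionModel:
        groups: dict = field(default_factory=dict)       # group name -> Group
        members: dict = field(default_factory=dict)      # user -> group names
        perm_upload: dict = field(default_factory=dict)  # folder -> group names

        def add_user(self, user):
            # Each user gets a private group by default, which is what makes
            # per-user upload restrictions possible.
            self.groups[user] = Group(user, read=True, write=True)
            self.members.setdefault(user, set()).add(user)

        def can_upload(self, user, folder):
            return bool(self.members.get(user, set()) &
                        self.perm_upload.get(folder, set()))

    model = PermissionModel()
    model.add_user("alice")
    model.perm_upload["incoming"] = {"alice"}
    print(model.can_upload("alice", "incoming"))   # True
    print(model.can_upload("bob", "incoming"))     # False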
Licenses galore
There are, naturally, the usual collection of bugfixes and
stability improvements in this release, plus the noteworthy addition
of the ability to pull up the full text of a software license from
within FOSSology itself (useful for those rare users who do not have the
differences between GFDL v1.1 and GFDL v1.2 memorized, no doubt).
But the bigger news item on the license-presentation front is the
fact that FOSSology has migrated its list of license names to be
compatible with the canonical list supported by SPDX. The SPDX
project is a relatively new effort (dating back to 2010); it defines a metadata format for describing
the "bill of materials" of a software package, including everything
from its creator and definitive name to its URL of origin and file
checksums. Among the mandatory items, as one might guess, is the
"concluded license" that governs the package as a whole. SPDX is
meant to be both human-readable and machine-parsable (RDF is the
preferred file format), so the specification includes a list of open
source licenses.
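To make the "bill of materials" idea concrete, here is a rough sketch of the
kinds of fields involved, expressed as a Python dictionary. The key names are
illustrative only and do not follow the actual SPDX tag or RDF property names.

    # Illustrative only: field names do not match the real SPDX vocabulary.
    package_record = {
        "creator": "Example Corp. compliance team",
        "name": "examplepkg",                      # definitive package name
        "download_location": "https://example.org/examplepkg-1.0.tar.gz",
        "files": {
            # per-file checksums
            "src/main.c": {"sha1": "0000000000000000000000000000000000000000"},
        },
        # the mandatory "concluded license" covering the package as a whole
        "concluded_license": "GPL-2.0+",
    }

    print(package_record["concluded_license"])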
SPDX is also in use by a few other source analysis tools, such as
the Ninka scanner and
the commercial tools used by Black Duck Software. The specification
is written by a Linux Foundation workgroup,
which is currently drafting a new
revision.
What SPDX support brings with it is the ability to use FOSSology
data in conjunction with other tools based on sharing a common file
format. The license-compliance problem is no longer one that
organizations can ignore. Last week, Harald Welte won a GPL infringement case in Germany in
which the court held that the violator had to ascertain on its own
that it was in compliance with the licensing requirements it inherited
from upstream suppliers. In other words, even if a device maker
contracts out the software to a third party, it is still required to
verify that the source code it offers in compliance with the GPL
actually corresponds to the software on the device. For a device
maker that does not do development itself, that could be a tricky
undertaking. But with independent tools able to report licensing
information in a compatible format, the problem becomes easier
(although still not trivial) to solve.
For its part, FOSSology has adopted SPDX's names for the licenses
already on its list of recognized licenses, and the 2.2.0 release
notes comment that the application also added support for a few
SPDX licenses not previously recognized by its license agent.
FOSSology is most certainly a specialist's tool at this stage, but the
refactoring that went into the 2.x series may make it useful for a
wider variety of applications, if developers write scanning modules of
their own to look for interesting nuggets buried in the source code.
There was a one-year wait between version 1.4 and 2.0, but in the year
since, the project has picked up the pace and delivered two stable
releases with functional additions. Hopefully, that signals a
platform that more developers will wish to contribute to. After all,
the free software community is (justifiably) nitpicky where licenses
and copyrights are concerned, but there are far more potentially
useful bits of information to glean from a corpus of source code,
given the proper tool to find them.
Comments (1 posted)
Brief items
A friend asked yesterday if I knew of a tool to print a web page as a single-page PDF, i.e., making the PDF page as tall as necessary to keep everything on one page.
As a result, I know that Obnam's bug list is 4915 mm tall.
—
Lars Wirzenius
We want to thank all our loyal fans.
—
Google, after shutting down Google Reader.
Comments (none posted)
Version 5.1 of the Qt toolkit has been
announced.
"
We have added many new modules that largely extend the functionality
offered in 5.0. The new Qt Quick Controls and Qt Quick Layouts modules
finally offer ‘widgets’ for Qt Quick. They contain a set of fully
functional controls and layout items that greatly simplify the creation of
Qt Quick based user interfaces."
Comments (1 posted)
Version 1.9 of the Upstart init-replacement has been released. This version adds support for AppArmor through two new stanzas, adds a stateful re-exec, and allows inherited environment variables to be un-set for Session inits. In addition, a new D-Bus signal bridge has been added, as has a client library (libupstart) through which applications can communicate with Upstart.
Full Story (comments: none)
A new version of the systemd init-replacement has been released. Version 205 includes "a number of major new concepts, such as
transient units, scopes and slices, which turn systemd into something
that is far more dynamic than it ever was," along with a new systemd-run tool; it is also the first release in which systemd assumes management of control groups (cgroups).
Full Story (comments: none)
Version 1.7 of the GNUstep Objective-C runtime has been released. Changes include the move to a CMake-based build system, a CTest-based test suite, and significant improvements in property introspection. The test suite itself has also been improved, as has integration with libdispatch and with foreign exceptions (e.g., exceptions from C++). Finally, MIPS64 is now supported in the assembly routines.
Full Story (comments: none)
Version
0.7 of the Rust language is out. "
This release had a markedly
different focus from previous releases, with fewer language changes and
many improvements to the standard library. The highlights this time include
a rewrite of the borrow checker that makes working with borrowed pointers
significantly easier and a comprehensive new iterator module
(std::iterator) that will eventually replace the previous closure-based
iterators." See
the
release notes for details.
Comments (none posted)
Newsletters and articles
Comments (none posted)
KDE.News has an
interview with Jolla engineer Vesa-Matti Hartikainen who will be giving a keynote at KDE's
Akademy conference in mid-July. The interview covers various topics, from the history of Jolla (and how it came out of MeeGo and the N9 Nokia phone efforts) to the use of Qt in Jolla's Sailfish OS. "
For Jolla, Qt is a first class citizen. For developing apps using QML, we put a lot of effort into making them as good as possible. We have an awesome team working on the Sailfish Silica component set. It includes many of the original core developers of the QML language and runtime. And we have really experienced app developers from N9 and other Nokia projects. On the middleware level, a lot of the lower level APIs now have quite good Qt bindings for C++ developers."
Comments (none posted)
Linux.com
introduces the
Swift parallel scripting language. It may offer some assistance in solving the parallel programming problems noted by Andreas Olofsson in his
keynote at this year's Linux Foundation Collaboration Summit.
"
Swift plays a simple but 'pervasively parallel' coordination role to create the upper level logic of more complex applications, [Argonne National Laboratory and the University of Chicago's Michael] Wilde said. 'It makes it very easy to parallelize what we often call the "outer loops".'
Highly parallel applications can thus be composed by gluing together serial algorithms because Swift creates the parallelism automatically at runtime, without explicit direction from the programmer. It does this by first encapsulating the applications that are called within a script as 'functions' with uniform interfaces, and then applying automatic data flow, he said."
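Swift code itself is not shown here; as a very loose analogy in Python, the
"outer loop" style of parallelism the quote describes (wrapping serial
programs as functions and running the independent invocations concurrently)
looks something like the following sketch, which swaps in a plain thread pool
for Swift's automatic data-flow scheduling.

    # Not Swift: a loose Python analogy for parallelizing an "outer loop"
    # over independent invocations of a serial algorithm.
    from concurrent.futures import ThreadPoolExecutor

    def serial_algorithm(n):
        # Stand-in for an existing serial application wrapped as a function.
        return sum(i * i for i in range(n))

    inputs = [10_000, 20_000, 30_000, 40_000]

    with ThreadPoolExecutor() as pool:
        # The outer loop over independent inputs is what runs in parallel.
        results = list(pool.map(serial_algorithm, inputs))

    print(results)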
Comments (4 posted)
At his blog, Adam Nemeth has harsh words to share about Google's recent decision to move away from the XMPP instant messaging protocol. Specifically, he criticizes XMPP itself: "Jabber failed to provide good enough spam protection, failed to provide a scalable protocol, failed to provide easy transfer of accounts between providers (if I change e-mail address, I don't have to re-add all my friends, it's enough to set a simple forward or inbox pulling - that's not true for Jabber IDs!)." The result, he argues, was that client application developers never found the protocol all that compelling.
Comments (94 posted)
On his blog, FirefoxOS developer Christian Heilmann
reflects on why he is excited about the phone operating system in light of the
announcement of the first FirefoxOS smartphones. One of five things he highlights: "
FirefoxOS does not assume a fast, stable and always available connection. When traveling I start hating my Android phone which I love to bits otherwise. Having dozens of megabyte updates over roaming is out of the question and neither is using flaky and slow wireless connections. Firefox OS has no native apps – all of them, including the system apps are written in HTML, CSS and JavaScript. Thus they are much smaller and can have atomic updates instead of having to be replaced as a unit every single time."
Comments (46 posted)
Page editor: Nathan Willis
Announcements
Brief items
The Document Foundation has announced that AMD has joined its advisory
board and will be working to support the acceleration of LibreOffice on its
processors using the
HSA
architecture. "
HSA is an innovative computing architecture that
enables CPU, GPU and other processors to work together in harmony on a
single piece of silicon by seamlessly moving the right tasks to the best
suited processing element. This makes it possible for larger, more complex
applications to take advantage of the power that has traditionally been
reserved for more focused tasks. While the biggest impact will be for AMD
APU users, supporting benefits of the work will improve the LibreOffice
core data structures enabling larger spreadsheets to calculate faster for
all users."
Full Story (comments: 10)
The Electronic Frontier Foundation has announced that security expert Bruce
Schneier has joined its Board of Directors. "
Schneier is widely
acclaimed for his criticism and
commentary on everything from network security to national
security. His insight is particularly important as we
learn more and more about the unconstitutional surveillance
programs from the National Security Agency and the depth
and breadth of data the NSA is collecting on the public."
Full Story (comments: none)
GigaOM is
reporting
that Doug Engelbart, famous as the inventor of the mouse, has passed away.
Another pioneer is gone.
Comments (1 posted)
Articles of interest
The June edition of the Free Software Foundation newsletter covers a
statement on PRISM revelations, FSF-certified device from ThinkPenguin,
MediaGoblin 0.4.0, LibreWRT, and several other topics.
Full Story (comments: none)
Wired looks at delays for open-source-oriented groups in getting their applications for non-profit status accepted—or denied—by the US Internal Revenue Service (IRS). Open source is on a list of organization types that require extra scrutiny from the IRS—"Tea Party" groups making that list have been in the news over the last month or so. "
That has provided the documentary evidence for a phenomenon that many open source project leaders know all too well: For the past four years, it's been close to impossible to get an open source project approved for 501(c)(3) classification — a nonprofit status that allows supporters to make tax-exempt donations to the organization.
Take the Open Source Geospatial Foundation, which builds open-source mapping software called OSGeo. It first applied for 501(c)(3) status more than five years ago, according to Tyler Mitchell, the former executive director of the foundation. 'It's not resolved today,' he says. 'You'll just keep thinking that it will be resolved in a couple of weeks. It never will be. '"
Comments (29 posted)
The Register
reports
that the schooner Nina, carrying Evi Nemeth and others, has been lost at sea. "
One of the shining lights of the world of Unix, retired CU professor Evi Nemeth, is among a group of sailors missing at sea near New Zealand.
The author of system administration tomes covering both Unix and Linux – and, incidentally a mathematician of sufficient quality to identify problems with Diffie-Helman encryption – has spent much of her retirement sailing."
Comments (15 posted)
The Free Software Foundation Europe has published its Free Software
related election questions for this fall's elections to the German
parliament. "
First, something pleasant: SPD, the Greens, the Pirate
party, the Linke and the Free Voters want software where development was
funded by the public administration to be published under a free
licence."
Full Story (comments: none)
Calls for Presentations
CFP Deadlines: July 4, 2013 to September 2, 2013
The following listing of CFP deadlines is taken from the
LWN.net CFP Calendar.
| Deadline | Event Dates | Event | Location |
| July 6 | September 23–September 27 | Tcl/Tk Conference | New Orleans, LA, USA |
| July 8 | October 21–October 23 | Open Source Developers Conference | Auckland, New Zealand |
| July 15 | August 16–August 18 | PyTexas 2013 | College Station, TX, USA |
| July 15 | October 22–October 24 | Hack.lu 2013 | Luxembourg, Luxembourg |
| July 19 | October 23–October 25 | Linux Kernel Summit 2013 | Edinburgh, UK |
| July 20 | January 6–January 10 | linux.conf.au | Perth, Australia |
| July 21 | October 21–October 23 | KVM Forum | Edinburgh, UK |
| July 21 | October 21–October 23 | LinuxCon Europe 2013 | Edinburgh, UK |
| July 21 | October 19 | Central PA Open Source Conference | Lancaster, PA, USA |
| July 22 | September 19–September 20 | Open Source Software for Business | Prato, Italy |
| July 25 | October 22–October 23 | GStreamer Conference | Edinburgh, UK |
| July 28 | October 17–October 20 | PyCon PL | Szczyrk, Poland |
| July 29 | October 28–October 31 | 15th Real Time Linux Workshop | Lugano, Switzerland |
| July 29 | October 29–November 1 | PostgreSQL Conference Europe 2013 | Dublin, Ireland |
| July 31 | November 5–November 8 | OpenStack Summit | Hong Kong, Hong Kong |
| July 31 | October 24–October 25 | Automotive Linux Summit Fall 2013 | Edinburgh, UK |
| August 7 | September 12–September 14 | SmartDevCon | Katowice, Poland |
| August 15 | August 22–August 25 | GNU Hackers Meeting 2013 | Paris, France |
| August 18 | October 19 | Hong Kong Open Source Conference 2013 | Hong Kong, China |
| August 19 | September 20–September 22 | PyCon UK 2013 | Coventry, UK |
| August 21 | October 23 | TracingSummit2013 | Edinburgh, UK |
| August 22 | September 25–September 27 | LibreOffice Conference 2013 | Milan, Italy |
| August 30 | October 24–October 25 | Xen Project Developer Summit | Edinburgh, UK |
| August 31 | October 26–October 27 | T-DOSE Conference 2013 | Eindhoven, Netherlands |
| August 31 | September 24–September 25 | Kernel Recipes 2013 | Paris, France |
| September 1 | November 18–November 21 | 2013 Linux Symposium | Ottawa, Canada |
If the CFP deadline for your event does not appear here, please
tell us about it.
Upcoming Events
The openSUSE Conference team has announced the sponsors for the
openSUSE Conference 2013 which takes place July 18-22 in Thessaloniki,
Greece. Sponsors include SUSE Linux GmbH, ARM, DevHdR, Oracle, and more.
Full Story (comments: none)
The Linux Professional Institute will host an exam lab at the openSUSE
Conference in Thessaloniki, Greece on July 20.
Full Story (comments: none)
Events: July 4, 2013 to September 2, 2013
The following event listing is taken from the
LWN.net Calendar.
| Date(s) | Event | Location |
| July 1–July 5 | Workshop on Dynamic Languages and Applications | Montpellier, France |
| July 1–July 7 | EuroPython 2013 | Florence, Italy |
| July 2–July 4 | OSSConf 2013 | Žilina, Slovakia |
| July 3–July 6 | FISL 14 | Porto Alegre, Brazil |
| July 5–July 7 | PyCon Australia 2013 | Hobart, Tasmania |
| July 6–July 11 | Libre Software Meeting | Brussels, Belgium |
| July 8–July 12 | Linaro Connect Europe 2013 | Dublin, Ireland |
| July 12 | PGDay UK 2013 | near Milton Keynes, England, UK |
| July 12–July 14 | 5th Encuentro Centroamerica de Software Libre | San Ignacio, Cayo, Belize |
| July 12–July 14 | GNU Tools Cauldron 2013 | Mountain View, CA, USA |
| July 13–July 19 | Akademy 2013 | Bilbao, Spain |
| July 15–July 16 | QtCS 2013 | Bilbao, Spain |
| July 18–July 22 | openSUSE Conference 2013 | Thessaloniki, Greece |
| July 22–July 26 | OSCON 2013 | Portland, OR, USA |
| July 27 | OpenShift Origin Community Day | Mountain View, CA, USA |
| July 27–July 28 | PyOhio 2013 | Columbus, OH, USA |
| July 31–August 4 | OHM2013: Observe Hack Make | Geestmerambacht, the Netherlands |
| August 1–August 8 | GUADEC 2013 | Brno, Czech Republic |
| August 3–August 4 | COSCUP 2013 | Taipei, Taiwan |
| August 6–August 8 | Military Open Source Summit | Charleston, SC, USA |
| August 7–August 11 | Wikimania | Hong Kong, China |
| August 9–August 11 | XDA:DevCon 2013 | Miami, FL, USA |
| August 9–August 12 | Flock - Fedora Contributor Conference | Charleston, SC, USA |
| August 9–August 13 | PyCon Canada | Toronto, Canada |
| August 11–August 18 | DebConf13 | Vaumarcus, Switzerland |
| August 12–August 14 | YAPC::Europe 2013 “Future Perl” | Kiev, Ukraine |
| August 16–August 18 | PyTexas 2013 | College Station, TX, USA |
| August 22–August 25 | GNU Hackers Meeting 2013 | Paris, France |
| August 23–August 24 | Barcamp GR | Grand Rapids, MI, USA |
| August 24–August 25 | Free and Open Source Software Conference | St.Augustin, Germany |
| August 30–September 1 | Pycon India 2013 | Bangalore, India |
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol