
LWN.net Weekly Edition for July 4, 2013

Merging lock elision into glibc

By Nathan Willis
July 3, 2013

Glibc, the GNU C library, is a foundational free software project, and one that has served its important role for well over two decades. Recently, though, it has undergone a shift in its development model, with an increase in contributions from volunteers outside of the core team of committers. That sort of change can bring about strife, but it can also inspire important new features. The latter is certainly true—and the former might also be true—in the case of lock elision, which is set to be included for the first time in the just-frozen 2.18 branch. 2.18 will enable developers to test the glibc implementation of lock elision for some lock types but not others—primarily because there is not yet full consensus within the project on how to enable such experimental new features.

Lock elision, in general, is a technique for speeding up multi-threaded programs that may contend for the same lock. As more and more cores become available, synchronization between threads becomes increasingly expensive. One way to avoid the synchronization overhead is to use transactional memory instead. A memory transaction buffers the effects of an operation; in the common case where nothing interferes with the operation, the transaction is committed, but if something does interfere, the transaction can be rolled back atomically.

For locks, this means that multiple threads can acquire locks on the same object, and if they are modifying different parts of the object, both transactions ought to succeed. For example, one thread can acquire a lock on an object (say, tablefoo), update the row it is interested in (tablefoo(N)), then release the lock. Meanwhile, a different thread can also acquire a lock on tablefoo for the purpose of updating tablefoo(M). Using memory transactions, both threads can afford to be conservative about locking the whole object, but cavalier about updating the part of the object they need—most of the time, both transactions will go through; only when both threads try to update tablefoo(N) is there a collision, at which point one transaction must be rolled back.

But the real trick is that, in this common case where the two threads do not collide, the locks themselves are unnecessary. If the program is smart enough to recognize this, acquiring and releasing the locks can simply be skipped. This is lock elision. Unfortunately, until recently, real-world implementations of transactional memory (almost always in software) have been too slow to offer a real advantage over manipulating the locks in the traditional manner. That changed, however, with the debut of Intel's Transactional Synchronization Extensions (TSX), a hardware implementation of transactional memory for the Haswell generation of CPUs. Consequently, building TSX support into the lock implementations of common libraries would allow existing programs to take advantage of lock elision speed-ups without even recompiling.
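
To make the idea concrete, here is a minimal, hand-rolled sketch of eliding a simple spinlock with the RTM intrinsics that GCC exposes for TSX (_xbegin(), _xend(), and _xabort()). It illustrates the pattern, not glibc's implementation; the spinlock type and its helpers are invented for the example, and production code would first check CPUID for RTM support before executing any of these instructions.

    #include <immintrin.h>    /* _xbegin(), _xend(), _xabort(); build with gcc -mrtm */
    #include <stdatomic.h>

    /* A toy spinlock, invented for this illustration. */
    struct spinlock { atomic_int held; };

    static int lock_is_held(struct spinlock *l)  { return atomic_load(&l->held); }
    static void lock_release(struct spinlock *l) { atomic_store(&l->held, 0); }
    static void lock_acquire(struct spinlock *l)
    {
        while (atomic_exchange(&l->held, 1))
            ;                          /* spin until the lock becomes free */
    }

    /* Try to elide the lock: run the critical section as a transaction and
     * fall back to really taking the lock only if the transaction fails. */
    void elided_lock(struct spinlock *l)
    {
        if (_xbegin() == _XBEGIN_STARTED) {
            if (!lock_is_held(l))
                return;                /* lock word is now in the read set; proceed */
            _xabort(0xff);             /* lock genuinely held: abort the transaction */
        }
        lock_acquire(l);               /* transaction aborted: take the lock for real */
    }

    void elided_unlock(struct spinlock *l)
    {
        if (!lock_is_held(l))
            _xend();                   /* lock was elided: commit the transaction */
        else
            lock_release(l);           /* lock was really taken: release it */
    }

If another thread writes to the same data while the transaction is open, the hardware aborts the transaction and the thread falls back to taking the real lock, which is precisely the conflict behavior that the adaptive algorithm described below reacts to.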

Intel's Andi Kleen has been working on a TSX-based lock elision implementation for glibc, which he wrote about back in January 2013. The 14-part patch set has been through many iterations, but in late June the deadline was fast approaching for the glibc 2.18 freeze, and the status of the patches was still a matter of debate.

The patch set adds elision capabilities to both POSIX thread (pthread) mutexes and read/write locks (rwlocks), and uses an adaptive algorithm to decide when to elide locks in a given code path. Essentially, the algorithm keeps track of whether each mutex succeeds at eliding a lock, and if it fails, elision is suspended for a period of time. Not all lock variants are supported; in particular, locks are never elided for recursive mutexes. Elision is automatically attempted when the CPU supports transactional memory, but developers can also explicitly enable or disable it in their code. Kleen's patch set also offers two environment variables, GLIBC_PTHREAD_MUTEX and GLIBC_PTHREAD_RWLOCK, which users can use to explicitly enable or disable each flavor of elision when starting a program.
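
Because elision hooks into the existing mutex paths, applications need no source changes to participate. The sketch below is a hypothetical, plain pthread program (not taken from the patch set) of the kind that could benefit: all threads share one mutex, but they mostly update disjoint rows of a table, so most critical sections could commit as transactions instead of serializing on the lock. With an elision-enabled glibc on a TSX-capable CPU, the lock and unlock calls would attempt elision transparently, and the environment variables mentioned above would control that behavior at startup.

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NROWS    64

    /* One coarse lock protects the whole table (compare "tablefoo" above),
     * but each thread only ever touches its own row. */
    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
    static long table[NTHREADS][NROWS];

    static void *worker(void *arg)
    {
        long id = (intptr_t)arg;

        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&table_lock);     /* candidate for elision */
            table[id][i % NROWS]++;              /* rows are disjoint between threads */
            pthread_mutex_unlock(&table_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);

        printf("table[0][0] = %ld\n", table[0][0]);
        return 0;
    }

Built with a plain gcc -pthread, the same binary could be run unmodified against an elision-enabled glibc to compare behavior.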

Lock down?

The general consensus on the libc-alpha mailing list was that the mutex patches (patches 1 through 5, plus 7) were ready for inclusion. However, there was less agreement on the other patches, for three reasons. First, the glibc team was wary of the rwlock patches due to a disagreement over how to interpret the POSIX standard. According to the standard, a "normal" (i.e., non-recursive, non-errorcheck, non-timed) mutex is required to deadlock if the owner of the lock attempts to re-lock it while already holding the lock. However, if the lock in question is elided, this required deadlock does not occur. It is certainly debatable whether or not avoiding a deadlock is really a bad thing (after all, deadlocks are bugs), but the glibc project decided to follow the standard to the letter, and elide only non-"normal" mutexes.

But what is not clear from the specification is whether or not the same behavior is required for rwlocks. Carlos O'Donell has contacted the Austin Group to ask for clarification; if the official answer is that rwlocks are required to deadlock on re-locks, then the rwlock patches for glibc will not be merged as-is.

Second, the definition of "normal" for mutexes is not a simple affair. The POSIX standard in fact requires specific behavior of "normal" mutexes (such as the deadlock-on-re-lock behavior mentioned above). But the standard also allows for a different type of mutex, termed "default" mutexes, in which the implementation is allowed more freedom of behavior. In previous versions of glibc, PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_DEFAULT were defined to be identical. Kleen's patch set splits them in order to allow "default" mutexes to be elided. Technically, not deadlocking on re-lock would violate the standard for "normal" mutexes, even though not deadlocking because the lock had been elided would often be seen as a preferable outcome. But splitting PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_DEFAULT could be seen as an ABI change, and, even though it would not affect old binaries, several in the project (such as Roland McGrath) felt that more consideration was needed before making the split, since it would be difficult to reverse after the fact.
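
The distinction is easiest to see with the standard mutex-attribute calls. In the sketch below (illustrative only), the two mutexes behave identically as long as PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_DEFAULT share a value; with the proposed split, only the "default" mutex would remain a candidate for elision, because POSIX requires the "normal" one to deadlock on the re-lock shown at the end.

    #include <pthread.h>

    static pthread_mutex_t normal_mutex;
    static pthread_mutex_t default_mutex;

    static void init_mutexes(void)
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);

        /* POSIX requires deadlock if the owner re-locks this one. */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
        pthread_mutex_init(&normal_mutex, &attr);

        /* Re-locking this one is undefined behavior, so eliding it is allowed. */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_DEFAULT);
        pthread_mutex_init(&default_mutex, &attr);

        pthread_mutexattr_destroy(&attr);
    }

    static void relock(void)
    {
        pthread_mutex_lock(&normal_mutex);
        pthread_mutex_lock(&normal_mutex);   /* per POSIX, must deadlock here */
    }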

Finally, there was also a lack of consensus about whether or not environment variables are ultimately the most appropriate mechanism with which to tune optional runtime features like lock elision. In addition to the coarse-grained enable-or-disable-elision functionality of the new environment variables, Kleen's patch set also adds several parameters that can be used to tune the behavior of the adaptive algorithm. Some would prefer adding a "tunables" API, while others see no problem with adding new environment variables under a well-known namespace (namely, GLIBC_) as long as there is sufficient discussion. Plus, since the elision algorithm is brand-spanking-new, with little or no testing outside of the confines of the glibc project, it is still possible that the algorithm itself could undergo a major overhaul before it is ready for real-world use. Offering tunable parameters is one way for real-world tests to help refine the algorithm, but if users become dependent on the specifics exposed, swapping in a different algorithm later becomes trickier.

Testing is a related matter, albeit one not currently holding up inclusion of the lock elision patch set. IBM's Dominik Vogt has been testing the patch set on the company's System z platform, which is the only other widely available processor architecture to offer its own implementation of hardware transactional memory. So far, his tests have produced as many questions as answers (in part because he is still working his way through the internal processes required to publicly release the test suite as free software). But in the long term, providing a test suite is a vital step for the project—doubtless, in coming years, more processor architectures will add their own implementations of transactional memory.

Friends of the library

O'Donell declared glibc frozen for 2.18 on July 2, after Kleen checked in the approved mutex patches. As of press time, the project was still waiting to hear back from the Austin Group for clarification on the rwlock deadlock-on-re-lock question, and it appears that decisions on the normal/default split, tunables, and other patches may get deferred until later. O'Donell has assembled the contributor input on the tunables question in a page on the project's wiki. That discussion is expected to take longer than anyone is willing to keep glibc 2.18 in its freeze.

Apart from the technical details of adding lock elision, Kleen's work to add the feature to glibc can be seen as a case study of the project's new, consensus-driven development style. For many years, development was overseen by Ulrich Drepper, who earned a reputation as a prickly gatekeeper past whom few outside contributions ever made it into the codebase. Other longtime project members (including McGrath) formed a "steering committee" to try to work around the practical problems that arose from Drepper's management style. But Drepper left the project in 2010, and in March 2012, McGrath announced that the steering committee had voluntarily dissolved, to make way for a more open, community-driven model.

Kleen's lock elision patch set, coming as it does largely from outside, is proof that the hard-to-persuade gatekeeper model is indeed gone. In practice, the glibc community arrived at rough consensus on the patch set in a series of epic-length list threads, and it could certainly be argued that the eventual consensus was incomplete. In fact, O'Donell was even a bit apologetic toward Kleen about how difficult the process had been, encouraging him to stick around and continue to contribute. Or, as he put it in a separate message, "We haven't merged something of this complexity in a long time. Please bear with us as we get better with the process."

But, ultimately, at least part of the mutex lock elision implementation has been merged, which might not have happened at all had one project manager made the call in isolation. The consensus model still defers greatly to the experienced contributors (like McGrath), but that is certainly appropriate. In the end, patches were merged with more or less full agreement, and the remaining issues are largely debates about the long-term impact of the implementation—not vehement opposition.

Comments (32 posted)

"Good enough" is good enough

July 3, 2013

This article was contributed by Martin Michlmayr


EuroPython 2013

In a EuroPython keynote (slides [PDF]), Alex Martelli, a founder of the Italian Python association and author of several Python books, shared his thoughts on software development and, more generally, on the path toward perfection. He observed that there is a cultural assumption that we should be striving for perfection at all times. Martelli argued instead that "good enough" is often good enough and that this approach will, in fact, lead to better results in the long run.

Worse is Better

[Alex Martelli]

Martelli opened his talk by recounting a debate that was started in 1989 by Richard Gabriel. Gabriel contrasted two approaches to software design and implementation: the New Jersey style, also known as "Worse is Better", and the MIT/Stanford approach, known as "the Right Thing". These approaches can be contrasted according to four core values: simplicity, correctness, consistency, and completeness. Martelli observed that "it's hard to argue against any of these values", but that the two styles weigh the importance of the four values in different ways, which is important when there's a conflict between them.

The "Worse is Better" approach puts strong emphasis on simplicity. Simplicity pertains to both the implementation and the interface, and is a crucial consideration in the design of a system. Martelli gave Unix as an example where this approach can be observed. The question to ask, according to Martelli, is "can I think of a simple implementation of this design concept?" Correctness is obviously important, but it's more important to be simple than to be correct. In terms of consistency, the expectation is not to be overly inconsistent. Finally, completeness can be sacrificed in favor of any of the other values and it must be sacrificed if simplicity is threatened. This can be seen in the Unix philosophy "just do one thing really well", said Martelli, explaining that "well means simple". In the MIT/Stanford approach, or "the Right Thing", correctness is a top priority, as is consistency. The focus of simplicity is on the interface. The back end can be complex as long as the interface is simple. Completeness is roughly as important as simplicity.

What this means in practice is that "the Right Thing" philosophy is dominated by experts — experts who have to make the system perfect before users can access it. On the other hand, the "Worse is Better" approach makes use of incremental development. Martelli paraphrased G. K. Chesterton's quote "if a thing is worth doing, it is worth doing badly", explaining that by doing it "badly", you get there earlier — and you can work on improving it.

Martelli went on to compare Gabriel's model with Eric Raymond's The Cathedral and the Bazaar. While the former covers the software design process, the latter focuses on the development process, but there are many parallels. Martelli observed that the "Cathedral" development style is close to "the Right Thing" approach — a defining characteristic of both models is that experts are in charge. There are also many similarities between the "Bazaar" and the "Worse is Better" model: it's a chaotic, iterative process in which a crowd is in charge. Raymond's mantra "given enough eyeballs, all bugs are shallow" emphasizes that bugs are found and fixed much faster in a crowd-sourced system.

Perfect as a verb

Martelli explained the problem with "perfection", which is that releasing a "perfect" system implies BDUF — Big Design Up Front. Everything must proceed top-down: you need perfect identification of requirements, a perfect architecture, perfect design, and perfect implementation. The problem is that this approach takes forever. Martelli observed that there's always something to improve. The real world also interferes with this approach, as users or customers don't want to wait forever for a new release.

In the real world, requirements change all the time, architecture varies with design choices, design varies with implementation technologies, and the implementation always has some bugs. In fact, most bugs are only discovered in real world deployment. Martelli argued that iterative development is therefore the only viable approach: deploy something, fix bugs, and improve the system based on feedback.

Summing up his thoughts on achieving perfection, Martelli suggested that "perfect" should be understood as a verb rather than an adjective. Perfecting your software is a laudable goal, but it's a process rather than a state as you never reach perfection — the goalposts keep shifting all the time.

Bugs are a normal part of this perfection process. While many programmers believe that they write perfect code, that's not how it actually works. In 1974, Martelli, then a university student of hardware design, and two colleagues had to write a Fortran program together. They used punch cards and the program had to be perfect as they only had one chance to run it. "You know about pair programming", Martelli remarked, "this was pair punching". As it turns out, the program ran perfectly the first time. Unfortunately, this was the only perfect program he wrote in his 40-year career, so "don't count on it as your mode of development".

The main question to consider with bugs is whether they cause irrecoverable losses. As long as your software only causes problems one can recover from, you're okay, especially if the software is clearly tagged as beta. If your bug could kill someone, for example because you work on medical device control software, "a bug could easily cause irrecoverable losses" and a different approach may be required. However, this is not the case in most situations.

The other aspect to consider is whether your reputation can recover from the damage your bugs cause. Martelli explained that the key is how you respond to bug reports — a courteous, speedy response to issues is vital, even when you're not paid by the user. Users spend their time evaluating your software and reporting issues to you, so they should be respected. "The person who points out the bug is not my enemy but my best friend", Martelli noted.

In a weird way, bugs may even be seen as a feature. Martelli mentioned the service recovery paradox — there is some evidence that the customers with the highest level of satisfaction are not those who never had any problem at all, but those who have had a problem successfully resolved. While this should not encourage programmers to introduce bugs on purpose, it shows that bugs — if properly dealt with — are not the end of the world.

What not to skimp on

While Martelli encouraged a "good enough" approach to software development, he noted that there are some things you cannot skimp on. You absolutely need a lightweight, agile process. Martelli doesn't care which, but the process has to include revision control (which one doesn't matter as they are all "good enough"), code reviews, and testing. Proper release engineering practices are also crucial, so you know what was released as which version. Additionally, you must promote good coding style, clarity, and elegance. Finally, documentation cannot be skipped: "if you're not documenting what you're releasing, you're essentially asking your users to reverse engineer what you've done". Summarizing these requirements, Martelli explained that "there's no condition under which cowboy coding is acceptable".

Martelli added that security must be a concern from the start as it's very difficult or impossible to add this later. He means security in a general sense, including aspects such as privacy and auditability. On the other hand, some features that would be nice to have from the beginning can usually be added by refactoring the code later. Such features include modularity and a plug-in architecture, an API, and scalability. "You can incur some technical debt", Martelli suggested, as long as you do it with care.

Conclusion

Toward the end of the talk, Martelli gave examples from other areas of life and explained that his "good enough" philosophy is not restricted to software design and implementation. For example, he asked whether it makes sense to hire the "perfect employee". It's quite likely that such a person would not be available for hire anyway, that they would exceed the budget, or that they simply don't exist. Instead, he suggested finding a good (not perfect) candidate who is a good match in terms of personality and company culture, and providing training for any missing skills.

Finally, Martelli clarified that his aim is not to lower expectations. You should dream big, but the best way to achieve those dreams remains the "release early, release often" paradigm and to learn from real users' interactions. The abstract of Martelli's talk noted that "this talk is probably not perfect, but I do think it's good enough" — in my opinion, and judging from the reaction of the audience, it was certainly "good enough" to provoke a lot of interesting thoughts and discussions.

Comments (9 posted)

Philosophy and "for" loops — more from Go and Rust

July 3, 2013

This article was contributed by Neil Brown

Once upon a time, a new programming language could be interesting because of some new mechanism for structured flow control. An if statement that could guard a collection of statements would be so much easier than one which just guarded a goto. Or a for statement which took control of the loop variable could simplify matrix multiplication significantly. An illuminating insight into this earlier age can be found in Knuth's "Structured Programming with go to statements" [PDF]. Many of the issues that seemed important in 1974 seem very dated today, but some are still fresh and relevant.

The work of these early pioneers has left us with five basic forms that appear to be common to most if not all procedural languages: two conditional constructs, if and switch/case; two looping constructs, while and for; and one encapsulation construct: the function or procedure.

While interesting new control flow is unlikely to be a headline item for a newly developed language these days, each language must embody concrete choices concerning these structures, and it is quite clear that, while there is similarity, we are far from uniformity. Exploring how a language handles control flow can provide interesting insights into the philosophy behind the language. In this article, we will continue our explorations of Go and Rust by looking at various control-flow structures, but particularly focusing on the "for" loop.

The background of for loops

The for loop first appeared in programming languages as an easy way to step through a fixed list of values. We can see this in Fortran, which used the word do rather than for (here 10 is the label of the last statement of the loop):

    do 10 i = 1, 100, 2

and in Algol58:

    for i := 1(2)100 do

Algol60 added some syntactic sugar:

    for i := 1 step 2 until 100 do

while Pascal dropped the step clause so you would need:

    for j := 0 to 49 do

and then set i := j * 2 + 1 inside the loop.

The Algol60 for loop was actually quite rich, as can be seen by the examples here. It is a richness that probably seems excessive by today's standards. In C, which came a decade later, several of the ideas in Algol were generalized and simplified to encapsulate all the interesting possibilities in just three expressions: initializer, test, and step, thus:

    for (i = 1; i < 100; i += 2)

As the three expressions can be almost arbitrarily complex, very rich looping constructs can be created from this simple form. The effect is that the head of the for forms a coroutine that is executed in concert with the body of the for loop. Control alternates between one and the other, so that together they achieve the desired result.

The coroutine nature of the for loop's head is made particularly obvious by the many (over 150) for_each macros that appear in the Linux kernel. With these, the code for one routine is physically quite separate from the other, emphasizing the separate roles of the two pieces of code. An example of such a for_each macro, from include/linux/radix-tree.h, is:

    #define radix_tree_for_each_slot(slot, root, iter, start)               \
            for (slot = radix_tree_iter_init(iter, start) ;                 \
                 slot || (slot = radix_tree_next_chunk(root, iter, 0)) ;    \
                 slot = radix_tree_next_slot(slot, iter, 0))

This example is interesting for a couple of reasons.

First, the middle expression — the loop-continue condition — is not simply a condition, but contains an assignment and is sometimes used to find the next value. This makes it clear that the three parts aren't simply expressions with fixed purposes, but rather three separate entry points into a coroutine.

Secondly, it contains two variables that change throughout the loop: slot and iter. The slot variable is the regular loop variable that any for loop would have, while iter contains extra state for tracking the path through the list and is largely of internal interest.

While it is primarily internal, it needs to be visible externally, and, in fact, needs to be declared externally. The for statement has some properties of a coroutine, but cannot define local variables for use throughout the loop.
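
As a rough illustration (not an actual kernel call site), a typical use of the macro looks like the following, with both slot and iter declared by the caller ahead of the loop:

    #include <linux/radix-tree.h>

    static unsigned long count_items(struct radix_tree_root *root)
    {
        struct radix_tree_iter iter;   /* iterator state, declared outside the loop */
        void **slot;                   /* the ordinary loop variable */
        unsigned long count = 0;

        radix_tree_for_each_slot(slot, root, &iter, 0) {
            /* *slot is the entry stored at index iter.index */
            count++;
        }
        return count;
    }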

So we see in the C for loop, particularly when combined with other features of C such as rich expressions and the macro preprocessor, a very powerful, though not completely satisfactory, for loop mechanism, one that will serve as a basis for examining others.

Go for — broke or beautiful?

The for loop in Go comes in three different forms — not quite the range of Algol60, but seemingly more than C. One form is superficially very similar to that in C: the parentheses are dropped, and the loop body must be a "block" rather than a simple statement. But these are syntactic differences which don't affect expressiveness. The earlier iterative example looks much the same in Go as in C:

    for i := 1; i < 100; i += 2 { ... }

The parallel ends there, however. Simple for loops will look much the same, but complex for loops will have to look quite different. This is partly because Go has no macro preprocessor and partly because Go expressions are not as rich as C expressions. While the C for loop simply contains three expressions, the Go for loop contains a "simple statement", an "expression", and another "simple statement", where "simple statement" specifically includes assignments and increments/decrements.

Were we to try a literal translation of the radix tree for_each loop into Go, we would have mixed success. Go allows the declaration of local variables inside a for loop head, so there would be no need to declare slot and iter separately. However, as the condition in a Go for statement cannot contain assignments, we find a complete literal translation is impossible. Of course measuring a language by how literal translations from another language fare is far from reasonable — we may not be using the best tool for the job and, as already noted, there are other forms of the for loop in Go.

The second form is really a reduced version of the first, with the two simple statements missing and, thus, their semicolons discarded:

    for i < 100 { ... }

That form is essentially what many other languages would call a while loop.

This leaves the final form — the for/range loop.

    for x := range expression { ... }

will iterate through members of the result of the expression in various ways depending on the type of the result. This makes explicit a difference from the for loops in the earlier languages. For Fortran, Algol, and Pascal, the for loop dealt with sequences of numbers, or possibly "enumerated constants", which are very number-like. As we have seen, C can work with arbitrary values, and the Go range clause makes it clear that this loop is for much more than just numbers.

The value can be an array, a slice (part of an array), a string (of Unicode characters), a map (also known as a "hash", "associative array", or "dictionary" in other languages), or a "channel" (used for IPC). In the first four cases the for loop steps through the components of the value in a fairly obvious way. Channels are a bit different and will be examined shortly. As range does not work with user-defined types at all, we cannot translate our "radix_tree" loop directly into for/range and so must look elsewhere.

A reasonable place to look might be some existing body of Go code to see how such things are done. Though the Go compiler is not written in Go, the Go language source distribution includes many tests, libraries, examples, and tools written in Go, with a total of 2418 .go source files, all of which were presumably written by people quite familiar with the language. Altogether, there are over 7000 for loops to consider.

Of these, 1200 are of the while loop form, nearly 2800 are for/range loops, and the remaining 3000 are in the three-part form, the vast majority of which have a numeric loop variable (demonstrating that the numeric loops of yesteryear are very much alive and well). So there are not a lot of examples of iterating user-defined data structures — a fact which itself might be significant.

One example of interest is in src/pkg/container/list/list_test.go:

    for e := l.Front(); e != nil; e = e.Next() {
            le := e.Value.(int)
            ....

This example is not vastly unlike the for_each macros we saw written in C. The syntax is clearly different, but the idea of having a very simple "head" on the for loop, with the actual code for the coroutine being off in a different file, is represented quite clearly. The for loop fragment given could easily be for almost any data structure. If there was a desire to keep the value (le above) more distinct from the iterator (e above), a construct like:

    for slot, iter, ok := l.Front(); ok; slot, ok = iter.Next() {

could return a sequence of slots using an iterator much like the radix_tree_for_each_slot loop we saw earlier. This construct is really quite elegant and extremely general.

Another interesting example occurs in various files in src/pkg/net, such as src/pkg/net/hosts.go and takes the form:

    for line, ok := file.readLine(); ok; line, ok = file.readLine() {

This is very similar to the Front/Next example, except that Front and Next are identical. This could be considered to violate the DRY principle: Don't Repeat Yourself.

In C, this sort of loop is regularly written as:

    while ((line = fgets(buf, sizeof(buf), file)) != NULL) {

but that cannot be used in Go, as expressions do not include assignments.

This issue of expressions excluding assignments has clearly not gone unnoticed by the Go designers. If we look at the if and switch statements, we see that, while they can be given a simple expression, they can also be given a simple statement, such as:

    if i := strings.Index(s, ":"); i >= 0 {

which includes both an assignment and a test. This would work quite nicely for the readLine loop:

    while line, ok := file.readLine(); ok {

except that Go does not provide a while loop — only a for loop. Though the for loop does include two simple statements, neither are executed at a convenient place to make this loop work as expected. So if we are to remove the repetition of the readLine call, we must look elsewhere.

One possibility is to exploit the fact that, while expressions do not include assignments, they do include function calls, and functions can include assignments. Go supports function literals. This means that the body of a function can be given anywhere the name of a function can be used. The body of a function may be assigned to a variable, or it may be called in place. Further, the function so defined can access any variables that are in the same scope as the function. So:

    for line := "";
        func() (ok bool) {
            line, ok = file.readLine()
            return
        }(); {

is a for loop in the three-part form that behaves much the same as the example above from hosts.go, but without repetition.

The "initialize" part of the for loop (line := "") declares a new variable, line, which is initialized to the empty string (it syntactically needs to be initialized to something, though the value won't be used).

The "condition" part of the loop is an immediate call to a function literal which calls file.readLine(), returns the ok part of the result, and has the side effect of assigning the line part of the result to the line variable. The = form of assignment is needed in the function, rather than the := form, so that it does not declare a new line variable local to the function, but instead uses the one local to the for loop.

The "next" part of the loop is empty, and appears between the second ; and the {.

While this does remove the unfortunate repetition of the readLine call, the cure turns out to be much worse than the disease, as the loop is close to unreadable. While function literals certainly have their place, this is not that place.

This leaves one more possibility to explore — it is time to examine that "range channel" construct hinted at earlier.

Channels

Concurrency and multiple threads (known as goroutines) are deeply embedded in Go, and the preferred mechanism for communicating between goroutines is the "channel". A channel is somewhat like a Unix pipe. It conceptually has two ends, and data written to one end can be read from the other. While a pipe can only pass characters or strings of characters, a channel can pass any type known to Go, including other channels.

    for i := range my_channel {

will repeatedly assign to i each value received from my_channel and then run the body of the for loop. This is a lot like our readLine example — if only we could make lines appear on a channel. And, of course, we can.


    func lines (file *file) (<- chan string) {
        ch := make(chan string)
        go func () {
            for {
                line, ok := file.readLine()
                if !ok { break }
                ch <- line
            }
            close(ch)
        }()
        return ch
    }

This lines function creates a channel (the make function) and starts a goroutine (the function literal after the go keyword) that sends lines back over the channel. This could be called as:

    for line := range lines(file) {

which will very cleanly iterate over all the lines in the file with no violation of the DRY principle.

However, further examination shows that this isn't really ideal. It certainly works in the simple case, but problems arise when you break or return out of the for loop. When you do that, the channel is not destroyed and the goroutine remains in existence trying to write to it, though no one will ever read from it again. Go has built-in garbage collection that will reclaim unreferenced memory, but not unreferenced goroutines.

In order to clean up properly here, we would need to close the channel after breaking out of the for loop. Strangely, only the write end of a channel can be closed and, since the return value of our lines function is currently the read end (<- chan string), we need to change it to return the double-ended channel. We also need to declare a variable to hold the channel:


    func lines (file *file) (chan string) {
        ch := make(chan string)
        go func () {
            for {
                line, ok := file.readLine()
                if !ok { break }
                ch <- line
            }
            close(ch)
        }()
        return ch
    }
    ...
    c := lines(file)
    defer close(c)
    for line := range c { ... }

Now we have a for loop that iterates over lines in a file, but that we can break out of without leaking channels or goroutines. However, it isn't really elegant any more. Needing to return both ends of the channel, needing to declare a separate variable to hold that channel, and the explicit defer close are all warts that tarnish the elegant:

    for line := range lines(file)

The conclusion is that despite the repetition, the form used in the net package of:

    for line, ok := file.readLine(); ok; line, ok = file.readLine() {

does seem to be the best way to implement the task. All of the alternatives fall short.

From loops to philosophy

It is in that last observation that part of the philosophy of Go seems to show itself. While Go offers a lot of functionality, it often seems quite restrictive in how this functionality is accessed. This is reminiscent of the 13th aphorism from the Zen of Python:

There should be one — and preferably only one — obvious way to do it.

We see this restrictiveness in for loops where the range syntax is only available for built-in types, and where the first/next structure is really the only way to do other for loops, even if it involves repeating yourself.

We can see a similar pattern with inter-goroutine communication, where channels have a privileged status. There are several language facilities that only work with raw channels, much like for/range only works with internal data types. Send (ch <- v), receive (v = <-ch), and the select statement (which is a bit like switch but chooses which of several blocking operations is ready to run) are completely unavailable to user-defined types.

Where Python provides a default implementation for "maps", but allows a class to provide an independent implementation using the same syntax, Go provides a built-in "map" data type and permits no substitutes. The Go FAQ makes it clear that this is a conscious decision and not an oversight:

We believe that Go's implementation of maps is strong enough that it will serve for the vast majority of uses.

This is probably why we found so few examples of iterating user-defined data structures in the Go code — maps are used instead.

Finally, even the syntax has an element of restrictiveness. We saw this briefly in a previous article, where the handling of semicolons imposes certain style choices on the programmer. We can see it also in the go fmt command, which will reformat the code in a .go file to follow a particular standard. While this is not imposed on programmers, the language designers recommend the use of go fmt to ensure that code follows the one true layout.

This philosophy certainly has a lot to recommend it. By removing options from the programmer, the language removes the need to make choices and so frees the programmer to focus on the actual functionality that they need. It is a philosophy that also imposes heavy requirements on the language and support environment. If there is only one way to do something, then that one way had better work extremely well. Given the vibrant community that has been built up around Go, and the strong emphasis on performance shown in the recent release of Go 1.1, it seems likely that Go does live up to this requirement.

Rusty loops

Turning to Rust, we see a very different style of for loop. The example loop we started with, which iterates over odd values from 1 to 99, would look like:

    for uint::range_step(1, 100, 2) |i| { ... }

Here the:

    |i| { ... }

piece is a function literal, similar to those we saw when exploring Go, though with a very different syntax and a different name. Rust, like many other languages, calls it a lambda expression. It consists of a list of formal parameters between vertical bars, and a statement block.

The

    uint::range_step(1, 100, 2)

is a reference to a function called range_step in the uint module. The uint::range_step() function actually takes four arguments: start, stop, step, and function. The behavior of range_step() is to call function repeatedly, passing values from start up to stop, incrementing by step each time. Consequently, our for loop could be realized simply by:

    uint::range_step(1, 100, 2, |i| {
            ...
    })

There are two problems with this. A minor point is that the syntax is arguably less pleasing than the first version. More importantly, constructs like break and continue don't have any meaning inside a function literal, so they could not affect the flow of this second loop.

The for statement addresses both of these. It provides syntax for writing the function literal outside the normal list of function parameters and it gives meaning to break, loop (the Rust equivalent of continue), and return.

By convention, the function in the head of for should stop looping when the function argument that it calls returns false. The for statement uses this by effectively translating break to return false and loop to return true. If any return statement appears in the body of the for loop, it is also translated to something that will "do the right thing".

This seems like a fairly complex set of transformations, but the end result is extremely flexible. It allows a very clear separation of the two coroutines that make up a for loop, with the head routine having the full power of a regular function that is able to declare local variables and to communicate in arbitrary ways with the body routine.

Both the "iterate over all the lines in a file" loop which we struggled with in Go, and the radix tree loop from the Linux kernel, would be trivial to implement as an iterator routine in Rust. The first of these would look like:

    pub fn every_line(f: @io::Reader, it: &fn(&str) -> bool) {
        while !f.eof() {
            let line = f.read_line();
            if !it(line) { break }
        }
    }

and could be called as:

    let f = io::file_reader(&Path("/etc/motd")).get();
    for every_line(f) |line| {
        io::println(fmt!("Line is %s", line));
    }

This power to write elegant iterators is not without its cost. While Rust allows an arbitrary function to provide the head of the for loop, it also requires the head of the for loop to be some function. The simple initialize, test, increment form of C and Go cannot be used.

If we go back and look at the nearly 3000 for loops in the Go source code that use a numeric loop variable, we find that the vast majority of them could be implemented using uint::range_step() or even the simple uint::range(). But not all. Some examples include:

    for ; i > 0; i /= 10 {

    for (mid = (bot+top)/2; mid < top; mid = (bot+top)/2) {

    for n := 1; n <= 256; n *= 2 {

    for rate := 0.05; rate < 10; rate *= 2 {

    for parent := ".."; ; parent = "../" + parent {

(the last one does not have a numeric variable of course, but is still a useful example).

Several of these could be supported by adding a very small number of extra iterators to the standard library; the rest could just as easily be implemented with a while loop. So this limitation doesn't really restrict Rust to any significant degree.

A Rusty philosophy?

We see, in the for loops of Rust, a very different philosophy to that of Go. While Go forces you into a particular mold, Rust lets you build your own mold with enormous freedom. You could even modify the exact behavior of break inside your for loops if that seems like a useful thing to do.

This freedom and flexibility extends to other parts of Rust too. In last month's article, we saw that Rust does not draw a distinction between expressions and statements, so it allows if and match constructs (the latter being similar to switch) deeply inside expressions, whereas Go does not permit such things.

Rust goes even further with a rich macro language that can declare which syntactic elements (e.g. identifier, expression, type) may replace each macro parameter, and can repeat the body of the macro if the parameter is a list. This leaning towards extreme flexibility seems to pervade Rust and is reminiscent of the Perl programming motto: There is more than one way to do it.

Summary

There will always be a tension in language design between allowing the programmer freedom of expression and guiding the programmer toward clarity of expression. In a previous article, we saw how the type system of Rust prefers clarity over freedom. Go is not such a stickler, and is satisfied with run-time type checks in places where Rust would insist on compile-time checks. Here, when we look at the structuring of statements and expressions, we find Rust prefers freedom while Go seems more focused on clarity by eliminating unnecessary flexibility.

Which of these is to be preferred is almost certainly a very personal choice. Some people rebel against a constraining environment, others relish the focus it allows them. Both provide room for creativity and productivity. Go and Rust provide very different points in the spectrum of possibilities and it is good to have that choice ... except that it does mean that you have to choose.

Comments (29 posted)

The new CFP deadline calendar

For those who might be interested in putting a talk proposal in for an upcoming conference, we have added a new feature to the weekly Announcements page. The LWN events calendar has long been a feature of the site, but the CFP deadlines calendar was added more recently. Now the information from that calendar will also be posted to the Announcements page in tabular form. Hopefully that will help everyone keep track of those deadlines and lead to more submissions of interesting talks to the numerous conferences in our communities.

Comments (2 posted)

Page editor: Jonathan Corbet

Security

Mayhem finds 1200 bugs

By Jake Edge
July 3, 2013

The reporting of 1200 bugs, some of which may have security implications, is sure to overwhelm any distribution's bug-handling abilities. So it was rather helpful that Alexandre Rebert started out by posting to the debian-devel mailing list rather than just flooding the bug tracker. Beyond just the sheer number of bugs, though, there is the question of dealing with so many potential security issues, which are generally handled differently than regular bugs. Rebert and other security researchers at Carnegie Mellon University (CMU) found the bugs in binaries from the Debian repositories using an automated bug finder called Mayhem [PDF].

Mayhem is a closed-source research project at CMU CyLab that uses symbolic execution on binary programs to find exploitable bugs in the code. It does its job by looking for load and store instructions that can be influenced by the inputs to the program. It examines the paths through the program using a "hybrid symbolic execution" mechanism that combines normal execution of the program with symbolic execution of an intermediate language representation that is created whenever a tainted (i.e. dependent on user input) branch condition is detected. The symbolic execution looks for ways to exploit the tainted code and builds an exploit if it can. The Mayhem paper goes into a lot more detail, perhaps enough for others to reproduce the technique.

The bugs are "exploitable" in the sense that each crash can be leveraged to execute arbitrary code. While code execution bugs are serious, the programs in question are typically run by regular users from the shell, so being able to get a shell (which is the usual proof of concept used by demonstration exploits as well as by Mayhem) is not a huge accomplishment. But being able to get a shell means that an exploit could do anything the user could do, including exposing or deleting files, participating in a botnet, sending spam, and so on. The exploits require specially crafted arguments and/or input files to trigger the bugs, so users would have to be tricked into running the programs that way.

Of course, any setuid programs or those accessible via the web or other internet services are a much larger concern. That's not to downplay what the Mayhem team has done in any way, but fuzzing has shown us that arbitrary inputs to programs often lead to crashes—the trick is finding a way to get users to provide crafted inputs that lead to an interesting (to the attacker) result. Regardless, the bugs do need to be fixed, and the Mayhem team has provided a wealth of information to do just that.

Each bug report comes with a tar file (an example for gcov was provided with Rebert's message) that contains a script to reproduce the problem, files containing the arguments and input that cause the crash, the core dump, and more. Reports for each of the bugs were sent to the appropriate Debian package maintainers, though some of those addresses were actually mailing lists, as Paul Wise pointed out. That allows us to see some of the reports, including one for the nfsidmap binary in the nfs-common package. Rebert's message also linked to a text file that lists all of the affected packages and their maintainers.

There are almost certainly more bugs out there for Mayhem to find, as the team limited the search space of the tool, allowing just five minutes of run time per binary. They also limited the bugs reported to one per binary and five per package. There are likely to be plenty of duplicate bugs on the list as well; bugs in libraries may well appear for multiple binaries. And, of course, the bugs aren't limited to Debian, as many of the packages will be in the repositories of lots of different distributions; few, if any, of the bugs will be Debian-specific.

Unfortunately, there is no automated way to extract addresses for the upstream developers or mailing lists from the Debian packages. The bug reports may ultimately need to make their way upstream, but the Mayhem team couldn't find a way to do that, so they started with the Debian maintainers. As Andreas Tille noted, some packages may have implemented the machine-readable debian/copyright file, which might provide an upstream contact and email address. But, for security reports, even that may not be the right place to send the message.

But, in fact, Rebert has recognized that the security tag on most of the proposed bug reports was probably not accurate. "It looks like a majority of the crashes have little security implications", he said, so that tag will be removed before the actual bug reports get submitted. It isn't clear that a security contact would be needed in the majority of cases but, since Mayhem sets out to find exploitable bugs, "responsible disclosure" might still indicate that a security list or email should be used to report the problems.

The problem is, in some ways, similar to the question of where bugs should be filed that we reported on last week. Which bug tracker (distribution or upstream) to use is contentious enough when looking at single bugs reported by users; 1200 bugs increases the scale of the problem significantly. The clear indication is that Mayhem could find lots more if it were given free rein, though the duplicates need to be eliminated or substantially reduced or the team risks overwhelming distributions and upstreams.

The "huge pile of bugs" problem is a consequence of the closed-source nature of Mayhem. If the tool were available to be used by various projects' developers as part of their testing, the bugs could be found and fixed in the normal course of development. Rebert mentioned the possibility of creating some kind of Mayhem web service, but it would be far more useful if the tool was free software (even "free as in beer" would be better than the existing situation). Since public funds were used to develop the tool, one might hope the public would get a bit more out of that spending. The Mayhem paper mentions that the US Defense Advanced Research Projects Agency (DARPA) helped fund some of the work, but, alas, that funding doesn't seem to come with a mandate to publish the source.

It's clear that running Mayhem on the 23,000 or so binaries found in the Debian "Wheezy" repository has found real bugs, some of which are "exploitable" in limited scenarios. Some are probably worse than that, however, and as the tool gets improved, it may be able to home in on more dangerous bugs. One might guess that CMU and the Mayhem developers plan to commercialize Mayhem. That is, of course, their prerogative, but it is unfortunate that tools like Mayhem and the Coverity static analyzer (which came out of Stanford University) are not free software tools. One suspects they would see much more use—and, possibly, improvement—if they were.

Comments (9 posted)

Brief items

Security quotes of the week

If I could, I would repeal the Internet. It is the technological marvel of the age, but it is not — as most people imagine — a symbol of progress. Just the opposite. We would be better off without it. I grant its astonishing capabilities: the instant access to vast amounts of information, the pleasures of YouTube and iTunes, the convenience of GPS and much more. But the Internet's benefits are relatively modest compared with previous transformative technologies, and it brings with it a terrifying danger: cyberwar.
Robert J. Samuelson throws the baby out with the bath water

I find it hilarious that Redhat cripples their cryptographic security software. In the sense that it makes me wonder about the rest of their security processes and software. What the...
Jacob Appelbaum

The ancients, given a chance to observe today's intelligence and spying brouhaha, would likely assert that the gods are laughing at us, finding hilarious our public attempts at indignation not only over what is being done, but our laughable efforts to pretend that we didn't know about it all along.
Lauren Weinstein

The biological world is also open source in the sense that threats are always present, largely unpredictable, and always changing. Because of this, defensive measures that are perfectly designed for a particular threat leave you vulnerable to other ones. Imagine if our immune system were designed to deal only with a single strain of flu. In fact, our immune system works because it looks for the full spectrum of invaders — low-level viral infections, bacterial parasites, or virulent strains of a pandemic disease. Too often, we create security measures — such as the Department of Homeland Security's BioWatch program — that spend too many resources to deal specifically with a very narrow range of threats on the risk spectrum.
Rafe Sagarin

Comments (7 posted)

An interesting Android package verification vulnerability

Bluebox Security claims to have found a way to modify code contained within an Android application package without breaking the associated cryptographic signature. "All Android applications contain cryptographic signatures, which Android uses to determine if the app is legitimate and to verify that the app hasn’t been tampered with or modified. This vulnerability makes it possible to change an application’s code without affecting the cryptographic signature of the application – essentially allowing a malicious author to trick Android into believing the app is unchanged even if it has been." The problem was evidently disclosed to Google in February; details are promised at the Black Hat USA conference starting July 27.

Comments (2 posted)

New vulnerabilities

ffmpeg: multiple vulnerabilities

Package(s): ffmpeg   CVE #(s): CVE-2013-3671 CVE-2013-3672 CVE-2013-3673 CVE-2013-3674
Created: June 27, 2013   Updated: July 3, 2013
Description:

From the Mageia advisory:

* CVE-2013-3671: The format_line function in log.c in libavutil uses inapplicable offset data during a certain category calculation, which allows remote attackers to cause a denial of service (invalid pointer dereference and application crash) via crafted data that triggers a log message.

* CVE-2013-3672: The mm_decode_inter function in mmvideo.c in libavcodec does not validate the relationship between a horizontal coordinate and a width value, which allows remote attackers to cause a denial of service (out-of-bounds array access and application crash) via crafted American Laser Games (ALG) MM Video data.

* CVE-2013-3673: The gif_decode_frame function in gifdec.c in libavcodec does not properly manage the disposal methods of frames, which allows remote attackers to cause a denial of service (out-of-bounds array access and application crash) via crafted GIF data.

* CVE-2013-3674: The cdg_decode_frame function in cdgraphics.c in libavcodec does not validate the presence of non-header data in a buffer, which allows remote attackers to cause a denial of service (out-of-bounds array access and application crash) via crafted CD Graphics Video data.

Alerts:
Gentoo 201310-12 ffmpeg 2013-10-25
Mageia MGASA-2013-0182 ffmpeg 2013-06-26

Comments (none posted)

Foreman: multiple vulnerabilities

Package(s): Foreman   CVE #(s): CVE-2013-2113 CVE-2013-2121
Created: June 28, 2013   Updated: July 3, 2013
Description:

From the Red Hat advisory:

A flaw was found in the create method of the Foreman Bookmarks controller. A user with privileges to create a bookmark could use this flaw to execute arbitrary code with the privileges of the user running Foreman, giving them control of the system running Foreman (such as installing new packages) and all systems managed by Foreman. (CVE-2013-2121)

A flaw was found in the way the Foreman UsersController controller handled user creation. A non-admin user with privileges to create non-admin accounts could use this flaw to create admin accounts, giving them control of the system running Foreman (such as installing new packages) and all systems managed by Foreman. (CVE-2013-2113)

Alerts:
Red Hat RHSA-2013:0995-01 Foreman 2013-06-27

Comments (none posted)

openstack-keystone: authentication bypass

Package(s): openstack-keystone   CVE #(s): CVE-2013-2157
Created: June 28, 2013   Updated: August 12, 2013
Description:

From the openSUSE bug report:

Jose Castro Leon from CERN reported a vulnerability in the way the Keystone LDAP backend authenticates users. When provided with an empty password, the backend would perform an anonymous LDAP bind that would result in successfully authenticating the user. An attacker could therefore easily impersonate and get valid tokens for any user. Only Keystone setups using LDAP authentication backend are affected.

Alerts:
Fedora FEDORA-2013-10713 openstack-keystone 2013-08-09
Fedora FEDORA-2013-10467 openstack-keystone 2013-07-20
Red Hat RHSA-2013:1083-01 openstack-keystone 2013-07-16
Red Hat RHSA-2013:0994-01 openstack-keystone 2013-06-27
openSUSE openSUSE-SU-2013:1089-1 openstack-keystone 2013-06-27

Comments (none posted)

php-radius: buffer overflow

Package(s): php-radius  CVE #(s): CVE-2013-2220
Created: July 3, 2013  Updated: July 26, 2013
Description: From the Mandriva advisory:

Fix a security issue in radius_get_vendor_attr() by enforcing checks of the VSA length field against the buffer size.

Alerts:
Debian DSA-2726-1 php-radius 2013-07-25
Mageia MGASA-2013-0206 php-radius 2013-07-09
Fedora FEDORA-2013-11992 php-pecl-radius 2013-07-09
Fedora FEDORA-2013-11998 php-pecl-radius 2013-07-09
Fedora FEDORA-2013-11911 php-pecl-radius 2013-07-09
Mandriva MDVSA-2013:192 php-radius 2013-07-02

Comments (none posted)

python-keystoneclient: password disclosure

Package(s): python-keystoneclient  CVE #(s): CVE-2013-2013
Created: June 28, 2013  Updated: September 18, 2013
Description:

From the openSUSE bug report:

OpenStack keystone places a username and password on the command line, which allows local users to obtain credentials by listing the process.

Alerts:
Fedora FEDORA-2013-13900 python-keystoneclient 2013-08-21
Fedora FEDORA-2013-14302 python-keystoneclient 2013-08-15
Slackware SSA:2013-260-01 glibc 2013-09-17
openSUSE openSUSE-SU-2013:1090-1 python-keystoneclient 2013-06-27

Comments (none posted)

python-keystoneclient: multiple vulnerabilities

Package(s): python-keystoneclient  CVE #(s): CVE-2013-2166 CVE-2013-2167
Created: June 28, 2013  Updated: July 3, 2013
Description:

From the Red Hat advisory:

A flaw was found in the way python-keystoneclient handled encrypted data from memcached. Even when the memcache_security_strategy setting in "/etc/swift/proxy-server.conf" was set to ENCRYPT to help prevent tampering, an attacker on the local network, or possibly an unprivileged user in a virtual machine hosted on OpenStack, could use this flaw to bypass intended restrictions and modify data in memcached that will later be used by services utilizing python-keystoneclient (such as Nova, Cinder, Swift, Glance, and so on). (CVE-2013-2166)

A flaw was found in the way python-keystoneclient verified data from memcached. Even when the memcache_security_strategy setting in "/etc/swift/proxy-server.conf" was set to MAC to perform signature checking, an attacker on the local network, or possibly an unprivileged user in a virtual machine hosted on OpenStack, could use this flaw to modify data in memcached that will later pass signature checking in python-keystoneclient. (CVE-2013-2167)

Alerts:
Fedora FEDORA-2013-14302 python-keystoneclient 2013-08-15
Red Hat RHSA-2013:0992-01 python-keystoneclient 2013-06-27

Comments (none posted)

ruby: SSL server spoofing

Package(s): ruby  CVE #(s): CVE-2013-4073
Created: June 28, 2013  Updated: August 6, 2013
Description:

From the Ruby advisory:

When a CA a SSL client trusts allows to issue the server certificate that has null byte in subjectAltName, remote attackers can obtain the certificate for ‘www.ruby-lang.org\0.example.com’ from the CA to spoof ‘www.ruby-lang.org’ and do man-in-the-middle between Ruby’s SSL client and SSL servers.

Alerts:
Debian DSA-2809-1 ruby1.8 2013-12-04
Debian DSA-2738-1 ruby1.9.1 2013-08-18
Red Hat RHSA-2013:1137-01 ruby193-ruby 2013-08-05
Mandriva MDVSA-2013:201 ruby 2013-07-26
Red Hat RHSA-2013:1103-01 ruby193-ruby 2013-07-23
CentOS CESA-2013:1090 ruby 2013-07-17
CentOS CESA-2013:1090 ruby 2013-07-17
Scientific Linux SL-ruby-20130717 ruby 2013-07-17
Oracle ELSA-2013-1090 ruby 2013-07-17
Oracle ELSA-2013-1090 ruby 2013-07-17
Red Hat RHSA-2013:1090-01 ruby 2013-07-17
openSUSE openSUSE-SU-2013:1181-1 ruby19 2013-07-11
openSUSE openSUSE-SU-2013:1179-1 ruby19 2013-07-11
Fedora FEDORA-2013-12062 ruby 2013-07-11
Fedora FEDORA-2013-12123 ruby 2013-07-11
Fedora FEDORA-2013-12663 ruby 2013-07-16
openSUSE openSUSE-SU-2013:1186-1 ruby19 2013-07-12
Ubuntu USN-1902-1 ruby1.8, ruby1.9.1 2013-07-09
Slackware SSA:2013-178-01 ruby 2013-06-27

Comments (none posted)

wireshark: two dissector vulnerabilities

Package(s): wireshark  CVE #(s): CVE-2013-4079 CVE-2013-4080
Created: June 27, 2013  Updated: September 30, 2013
Description:

From the Mageia advisory:

The GSM CBCH dissector could crash (CVE-2013-4079).

The Assa Abloy R3 dissector could consume excessive memory and CPU (CVE-2013-4080).

Alerts:
Fedora FEDORA-2013-17635 wireshark 2013-12-19
Fedora FEDORA-2013-17661 wireshark 2013-09-28
Gentoo GLSA 201308-05:02 wireshark 2013-08-30
Gentoo 201308-05 wireshark 2013-08-28
Mageia MGASA-2013-0181 wireshark 2013-06-26
Debian-LTS DLA-497-1 wireshark 2016-05-31

Comments (none posted)

wordpress: multiple vulnerabilities

Package(s): wordpress  CVE #(s): CVE-2013-2173 CVE-2013-2199 CVE-2013-2200 CVE-2013-2201 CVE-2013-2202 CVE-2013-2203 CVE-2013-2204 CVE-2013-2205
Created: July 2, 2013  Updated: July 3, 2013
Description: From the Mageia advisory:

A denial of service flaw was found in the way Wordpress, a blog tool and publishing platform, performed hash computation when checking password for password protected blog posts. A remote attacker could provide a specially-crafted input that, when processed by the password checking mechanism of Wordpress would lead to excessive CPU consumption (CVE-2013-2173).

Inadequate SSRF protection for HTTP requests where the user can provide a URL can allow for attacks against the intranet and other sites. This is a continuation of work related to CVE-2013-0235, which was specific to SSRF in pingback requests and was fixed in 3.5.1 (CVE-2013-2199).

Inadequate checking of a user's capabilities could allow them to publish posts when their user role should not allow for it; and to assign posts to other authors (CVE-2013-2200).

Inadequate escaping allowed an administrator to trigger a cross-site scripting vulnerability through the uploading of media files and plugins (CVE-2013-2201).

The processing of an oEmbed response is vulnerable to an XXE (CVE-2013-2202).

If the uploads directory is not writable, error message data returned via XHR will include a full path to the directory (CVE-2013-2203).

Content Spoofing in the MoxieCode (TinyMCE) MoxiePlayer project (CVE-2013-2204).

Cross-domain XSS in SWFUpload (CVE-2013-2205).

Alerts:
Fedora FEDORA-2013-11649 wordpress 2013-07-03
Fedora FEDORA-2013-11590 wordpress 2013-07-03
Fedora FEDORA-2013-11630 wordpress 2013-07-03
Debian DSA-2718-1 wordpress 2013-07-02
Mandriva MDVSA-2013:189 wordpress 2013-07-02
Mageia MGASA-2013-0198 wordpress 2013-07-01

Comments (none posted)

xdm: denial of service

Package(s): xdm  CVE #(s): CVE-2013-2179
Created: July 2, 2013  Updated: July 3, 2013
Description: From the openSUSE advisory:

xdm was updated to address crypt() NULL pointer crashes:
* Starting with glibc 2.17 (eglibc 2.17), crypt() fails with EINVAL (w/ NULL return) if the salt violates specifications. Additionally, on FIPS-140 enabled Linux systems, DES/MD5-encrypted passwords passed to crypt() fail with EPERM (w/ NULL return). If using glibc's crypt(), check return value to avoid a possible NULL pointer dereference.
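
The practical upshot for application code is small but easy to get wrong. Below is a minimal sketch of the check the advisory recommends; the function name and the use of the stored hash as the salt are illustrative assumptions, not code taken from xdm:

    #include <crypt.h>      /* crypt(); link with -lcrypt */
    #include <string.h>

    /* Returns nonzero only if the password matches the stored hash. */
    static int password_matches(const char *password, const char *stored_hash)
    {
        /* Starting with glibc 2.17, crypt() may return NULL (errno set to
           EINVAL for a malformed salt, or EPERM for DES/MD5 hashes on a
           FIPS-140 system) instead of a hash string. */
        char *hash = crypt(password, stored_hash);

        if (hash == NULL)
            return 0;       /* treat failure as "no match"; never dereference */
        return strcmp(hash, stored_hash) == 0;
    }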

Alerts:
openSUSE openSUSE-SU-2013:1117-1 xdm 2013-07-02

Comments (none posted)

xen: multiple vulnerabilities

Package(s): xen  CVE #(s): CVE-2013-2211 CVE-2013-1432
Created: July 2, 2013  Updated: July 19, 2013
Description: From the Mageia advisory:

CVE-2013-2211: libxl allows guest write access to sensitive console related xenstore keys

CVE-2013-1432: Page reference counting error due to XSA-45/CVE-2013-1918 fixes

Alerts:
Debian DSA-3006-1 xen 2014-08-18
SUSE SUSE-SU-2014:0446-1 Xen 2014-03-25
Gentoo 201309-24 xen 2013-09-27
SUSE SUSE-SU-2013:1314-1 Xen 2013-08-09
CentOS 2013:X003 kernel 2013-07-18
openSUSE openSUSE-SU-2013:1404-1 xen 2013-09-04
Fedora FEDORA-2013-11871 xen 2013-07-07
Fedora FEDORA-2013-11874 xen 2013-07-07
Fedora FEDORA-2013-11785 xen 2013-07-06
Fedora FEDORA-2013-11768 xen 2013-07-06
Fedora FEDORA-2013-11837 xen 2013-07-03
Mageia MGASA-2013-0197 xen 2013-07-01
openSUSE openSUSE-SU-2013:1392-1 xen 2013-08-30

Comments (none posted)

xml-security-c: code execution

Package(s): xml-security-c  CVE #(s): CVE-2013-2210
Created: June 28, 2013  Updated: July 3, 2013
Description:

From the Debian advisory:

Jon Erickson of iSIGHT Partners Labs discovered a heap overflow in xml-security-c, an implementation of the XML Digital Security specification. The fix to address CVE-2013-2154 introduced the possibility of a heap overflow in the processing of malformed XPointer expressions in the XML Signature Reference processing code, possibly leading to arbitrary code execution.

Alerts:
Mageia MGASA-2013-0193 xml-security-c 2013-07-01
Debian DSA-2717-1 xml-security-c 2013-06-28

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The 3.10 kernel was released on June 30. Linus's announcement said: "In the bigger picture (ie since 3.9) this release has been pretty typical and not particularly prone to problems, despite my waffling about the exact release date. As usual, the bulk patch-wise is all drivers (pretty much exactly two thirds), while the rest is evenly split between arch updates and 'misc'. No major new subsystems this time around, although there are individual new features." Some of those new features include a number of Ftrace enhancements, the memory pressure notification mechanism, tickless operation, ARM multi-cluster power management support (part of the big.LITTLE solution), the bcache block caching layer, and much more. See the KernelNewbies 3.10 page for lots of details.

The 3.11 merge window is open as of this writing; see the separate article below for a summary of what has been merged so far.

Stable updates: 3.9.8, 3.4.51, and 3.0.84 were released on June 27, 3.2.48 came out on June 30, and 3.9.9, 3.4.52, and 3.0.85 were released on July 3.

Comments (none posted)

Quotes of the week

Hmm, I bet lockdep and the branch tracer probably don't play well together. They both are bullies, and want to beat up the same kid. The problem is, they want sole access to beat up that kid, and don't want help.
Steven Rostedt

In my defence, it didn't actually say the patch did this. Just that we "can".
Rusty Russell

At this point in the process, I want testers who choose to test. Hapless victim testers come later. Well, other than randconfig testers, but I consider them to be voluntary hapless victims.
Paul McKenney

Comments (none posted)

A flash filesystem tuning guide

Tim Bird has announced the availability of an extensive guide to tuning Linux for flash-based storage devices [PDF]. "This is the culmination of several months of effort, to determine the results of using different tuning options in the Linux kernel, with different filesystems running on flash-based block devices. The document was prepared by Cogent Embedded, and funded by the CE Workgroup of the Linux Foundation. In addition to describing different tuning options available, the document also gives methodologies for measuring performance on the filesystems and has extensive graphs showing the results of the different tuning options."

Full Story (comments: 12)

2013 Kernel Summit call for topics/proposals

The planning process for the 2013 Kernel Summit (October 23-25, Edinburgh) has begun; as in previous years, the program committee is looking for proposals for interesting topics in need of discussion. "The best topics for the kernel summit tend to focus on topics which are not appropriate for any of the subsystem-specific workshops or minisummits, and which can not be easily resolved using the normal e-mail and IRC channels. These include issues about our overall development process, and topics which span multiple subsystems." The deadline for proposals is July 19.

Full Story (comments: none)

Per-CPU reference counts

By Jonathan Corbet
July 3, 2013
Reference counting is used by the kernel to know when a data structure is unused and can be disposed of. Most of the time, reference counts are represented by an atomic_t variable, perhaps wrapped by a structure like a kref. If references are added and removed frequently over an object's lifetime, though, that atomic_t variable can become a performance bottleneck. The 3.11 kernel will include a new per-CPU reference count mechanism designed to improve scalability in such situations.

This mechanism, created by Kent Overstreet, is defined in <linux/percpu-refcount.h>. Typical usage will involve embedding a percpu_ref structure within the data structure being tracked. The counter must be initialized with:

    int percpu_ref_init(struct percpu_ref *ref, percpu_ref_release *release);

Where release() is the function to be called when the reference count drops to zero:

    typedef void (percpu_ref_release)(struct percpu_ref *);

The call to percpu_ref_init() will initialize the reference count to one. References are added and removed with:

    void percpu_ref_get(struct percpu_ref *ref);
    void percpu_ref_put(struct percpu_ref *ref);

These functions operate on a per-CPU array of reference counters, so they will not cause cache-line bouncing across the system. There is one potential problem, though: percpu_ref_put() must determine whether the reference count has dropped to zero and call the release() function if so. Summing an array of per-CPU counters would be expensive, to the point that it would defeat the whole purpose. This problem is avoided with a simple observation: as long as the initial reference is held, the count cannot be zero, so percpu_ref_put() does not bother to check.

The implication is that the thread which calls percpu_ref_init() must indicate when it is dropping its reference; that is done with a call to:

    void percpu_ref_kill(struct percpu_ref *ref);

After this call, the reference count degrades to the usual model with a single shared atomic_t counter; that counter will be decremented and checked whenever a reference is released.

The performance benefits of a per-CPU reference count will clearly only be realized if most of the references to an object are added or removed while the initial reference is held. In practice that is often the case. This mechanism has found an initial use in the control group code; the comments in the header file claim that it is used by the asynchronous I/O code as well, but that is not the case in the current mainline.
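
As a rough illustration, a kernel object might use this interface along the following lines; the structure and function names here are hypothetical and not taken from any in-tree user:

    #include <linux/percpu-refcount.h>
    #include <linux/slab.h>

    struct my_object {
        struct percpu_ref ref;
        /* ... other fields ... */
    };

    static void my_object_release(struct percpu_ref *ref)
    {
        struct my_object *obj = container_of(ref, struct my_object, ref);

        kfree(obj);
    }

    /* Creation: the embedded count starts at one (the initial reference). */
    static struct my_object *my_object_create(void)
    {
        struct my_object *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

        if (!obj || percpu_ref_init(&obj->ref, my_object_release)) {
            kfree(obj);
            return NULL;
        }
        return obj;
    }

    /* Hot path: cheap per-CPU operations, no shared cache line. */
    static void my_object_use(struct my_object *obj)
    {
        percpu_ref_get(&obj->ref);
        /* ... work with the object ... */
        percpu_ref_put(&obj->ref);
    }

    /* Teardown: drop the initial reference; from here on the count falls
       back to a single atomic_t and release() runs once it reaches zero. */
    static void my_object_destroy(struct my_object *obj)
    {
        percpu_ref_kill(&obj->ref);
    }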

Comments (1 posted)

Kernel development news

The 3.11 merge window opens

By Jonathan Corbet
July 3, 2013
Once upon a time, Linus tried to limit merge window activity to roughly 1,000 commits in any given day. On July 2, the day he began pulling changes for 3.11, over 3,000 commits made their way into the mainline. Clearly, a lethargic 1,000 commits/day pace won't cut it in the 3.x era. Expect this to be another busy development cycle. That said, the number of new features merged for 3.11 so far is relatively small. Much of the work pulled to date consists of code cleanups (in the staging tree, for example), reworking of ARM architecture code to use common abstractions, and the removal of board-file support for some ARM subarchitectures.

The user-visible changes that have been pulled so far include:

  • The f2fs filesystem now supports security labels, enabling it to be used with security modules.

  • The Lustre distributed filesystem has been merged into the staging tree. It is disabled in the build system, though, since it has build problems on a number of architectures.

  • The ARM architecture (both 32- and 64-bit) has gained better huge page support, in the form of both the hugetlbfs filesystem and transparent huge pages.

  • The ARM64 architecture now supports virtualization with both KVM and Xen.

  • The new O_TMPFILE option to the open() and openat() system calls allows filesystems to optimize the creation of temporary files — files which need not be visible in the filesystem. When O_TMPFILE is present, the provided pathname is only used to locate the containing directory (and thus the filesystem where the temporary file should be). So, among other things, programs using O_TMPFILE should have fewer concerns about vulnerabilities resulting from symbolic link attacks. A brief usage sketch appears after this list.

  • New hardware support includes:

    • Systems and processors: Freescale i.MX6 SoloLite processors, Freescale Vybrid VF610 processors, Samsung EXYNOS5420 processors, Rockchip RK2928 and RK3xxx processors, TI Nspire processors, and STMicroelectronics STiH41x and STiH416 processors.

    • Miscellaneous: Marvell EBU device bus controllers, Marvell EBU PCIe controllers, ARM cache-coherent interconnect controllers, Microchip Technology MCP3204/08 analog-to-digital converters, Analog Devices AD7303 digital-to-analog converters, STMicroelectronics LPS331AP pressure sensors, and Samsung S3C24XX SoC pin controllers.

    • Networking: MTK USB Bluetooth interfaces.

    • USB: Faraday FUSBH200 host controllers and Cavium Networks Octeon host controllers.
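
As a rough illustration of the O_TMPFILE flag described in the list above, the following user-space sketch creates an anonymous temporary file and then, optionally, gives it a name. It assumes a C library that already exposes the O_TMPFILE definition and a filesystem that implements the flag; in the early 3.11 timeframe neither can be taken for granted:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char path[64];
        int fd;

        /* The pathname names a directory, not a file; it only selects the
           filesystem in which the unnamed temporary file is created. */
        fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
        if (fd < 0) {
            perror("open(O_TMPFILE)");
            return 1;
        }

        if (write(fd, "scratch\n", 8) < 0)
            perror("write");

        /* If the file should become visible later, it can be linked into
           place through its /proc/self/fd entry. */
        snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, path, AT_FDCWD, "/tmp/now-visible",
                   AT_SYMLINK_FOLLOW) < 0)
            perror("linkat");

        close(fd);
        return 0;
    }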

Changes visible to kernel developers include:

  • There is a new struct file_operations method:

        int (*iterate) (struct file *, struct dir_context *);
    

    Its job is to iterate through the contents of a directory. This method is meant to serve as a replacement for the readdir() method that eliminates persistent race conditions associated with updating the current read position. All internal users have been converted, and the readdir() method has been removed. A minimal example appears after this list.

  • There are a couple of new functions for working with atomic types:

        int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode);
        void wake_up_atomic_t(atomic_t *p);
    

    A call to wait_on_atomic_t() will block the calling thread until the given val goes to zero. Simply decrementing an atomic_t variable will not be sufficient to wake anybody waiting, though; an explicit call to wake_up_atomic_t() is required to do that. A short usage sketch appears after this list.

  • The CONFIG_HOTPLUG configuration option has been removed; all kernels are hotplug enabled these days.

  • The wait/wound mutex locking primitive has been merged.

  • As part of the read-copy-update simplification effort, the "tiny-preempt" version of RCU has been removed from the kernel. From the commit message: "People currently using TINY_PREEMPT_RCU can get much better memory footprint with TINY_RCU, or, if they really need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively minor degradation in memory footprint."

  • The kernel now has the concept of power-efficient workqueues; these are simply marked as "unbound," so that jobs queued to them can run on any CPU in the system. Per-CPU workqueues may perform better in some situations, but they can also cause sleeping CPUs to wake up; that wakeup can be avoided if work items can be run on CPUs that are not sleeping. If the CONFIG_WQ_POWER_EFFICIENT_DEFAULT configuration option is set, a number of workqueues observed to impact power performance will be switched to the unbound mode.

    Kernel code can explicitly request power-efficient behavior by creating workqueues with the WQ_POWER_EFFICIENT flag or by using a couple of new systemwide workqueues: system_power_efficient_wq or system_freezable_power_efficient_wq.

  • The d_hash() and d_compare() methods in struct dentry_operations have lost their inode argument.

  • A new per-CPU reference count mechanism has been added; see this article for details.
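
As a minimal sketch of the new iterate() method described earlier in this list, a trivial filesystem exposing a single fixed file might implement it along these lines; the filesystem name, file name, and inode number are made up for illustration:

    #include <linux/fs.h>

    static int examplefs_iterate(struct file *file, struct dir_context *ctx)
    {
        /* Emit "." and ".."; this helper advances ctx->pos past them. */
        if (!dir_emit_dots(file, ctx))
            return 0;

        /* One ordinary file, reported at directory position 2. */
        if (ctx->pos == 2) {
            if (!dir_emit(ctx, "hello", 5, 42 /* inode number */, DT_REG))
                return 0;
            ctx->pos++;
        }
        return 0;
    }

    static const struct file_operations examplefs_dir_ops = {
        .read    = generic_read_dir,
        .iterate = examplefs_iterate,
    };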

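A short sketch of how the atomic wait helpers described above might be used to wait for a count of outstanding operations to drain; the structure and function names are hypothetical:

    #include <linux/atomic.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    struct frobnicator {
        atomic_t pending;       /* outstanding operations */
    };

    /* The action callback simply sleeps; returning nonzero would abort
       the wait. */
    static int frob_wait_action(atomic_t *p)
    {
        schedule();
        return 0;
    }

    /* Completion side: the decrement alone wakes nobody, so an explicit
       wake_up_atomic_t() call is needed once the count hits zero. */
    static void frob_complete(struct frobnicator *f)
    {
        if (atomic_dec_and_test(&f->pending))
            wake_up_atomic_t(&f->pending);
    }

    /* Waiting side: block until f->pending reaches zero. */
    static void frob_drain(struct frobnicator *f)
    {
        wait_on_atomic_t(&f->pending, frob_wait_action, TASK_UNINTERRUPTIBLE);
    }
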
A normal two-week merge window could be expected to close on July 16, but Linus has occasionally shortened the merge window in recent development cycles. If the development cycle as a whole lasts for the usual 70 days, then the 3.11 kernel can be expected around September 10.

Comments (3 posted)

Supporting KVM on the ARM architecture

July 3, 2013

This article was contributed by Christoffer Dall and Jason Nieh

One of the new features in the 3.9 kernel is KVM/ARM: KVM support for the ARM architecture. While KVM is already supported on i386, x86-64, PowerPC, and s390, ARM support required more than just reimplementing the features and approaches of the other architectures. The reason is that the ARM virtualization extensions are quite different from those of other architectures.

Historically, the ARM architecture has not been virtualizable, because a number of sensitive instructions do not trap when executed in an unprivileged mode. However, the most recent 32-bit ARM processors, like the Cortex-A15, include hardware support for virtualization as an ARMv7 architectural extension. A number of research projects have attempted to support virtualization on ARM processors without hardware virtualization support, but they require various levels of paravirtualization and have never been stabilized. KVM/ARM is designed specifically to run unmodified guest operating systems on ARM processors with the virtualization extensions enabled.

The ARM hardware extensions differ quite a bit from their x86 counterparts. A simplified view of the ARM CPU modes is that the kernel runs in SVC mode and user space runs in USR mode. ARM introduced a new CPU mode for running hypervisors called HYP mode, which is a more privileged mode than SVC mode. An important characteristic of HYP mode, which is central to the design of KVM/ARM, is that HYP mode is not an extension of SVC mode, but a distinct mode with a separate feature set and a separate virtual memory translation mechanism. For example, if a page fault is taken in HYP mode, the faulting virtual address is stored in a different register in HYP mode than in SVC mode. As another example, for the SVC and USR modes, the hardware has two separate page table base registers, which are used to provide the familiar address space split between user space and kernel. HYP mode only uses a single page table base register and therefore does not allow the address space split between user mode and kernel.

The design of HYP mode is a good fit with a classic bare-metal hypervisor design because such a hypervisor does not reuse any existing kernel code written to work in SVC mode. KVM, however, was designed specifically to reuse existing kernel components and integrate these with the hypervisor. In comparison, the x86 hardware support for virtualization does not provide a new CPU mode, but provides an orthogonal concept known as "root" and "non-root". When running as non-root on x86, the feature set is completely equivalent to a CPU without virtualization support. When running as root on x86, the feature set is extended to add additional features for controlling virtual machines (VMs), but all existing kernel code can run unmodified as both root and non-root. On x86, when a VM traps to the hypervisor, the CPU changes from non-root to root. On ARM, when a VM traps to the hypervisor, the CPU traps to HYP mode.

HYP mode controls virtualization features by configuring sensitive operations to trap to HYP mode when executed in SVC and USR mode; it also allows hypervisors to configure a number of shadow register values used to hide information about the physical hardware from VMs. HYP mode also controls Stage-2 translation, a feature similar to Intel's "extended page table" used to control VM memory access. Normally when an ARM processor issues a load/store instruction, the memory address used in the instruction is translated by the memory management unit (MMU) from a virtual address to a physical address using regular page tables, like this:

  • Virtual Address (VA) -> Physical Address (PA)

The virtualization extensions add an extra stage of translation known as Stage-2 translation which can be enabled and disabled only from HYP mode. When Stage-2 translation is enabled, the MMU translates address in the following way:

  • Stage-1: Virtual Address (VA) -> Intermediate Physical Address (IPA)
  • Stage-2: Intermediate Physical Address (IPA) -> Physical Address (PA)

The guest operating system controls the Stage-1 translation independently of the hypervisor and can change mappings and page tables without trapping to the hypervisor. The Stage-2 translation is controlled by the hypervisor, and a separate Stage-2 page table base register is accessible only from HYP mode. The use of Stage-2 translations allows software running in HYP mode to control access to physical memory in a manner completely transparent to a VM running in SVC or USR mode, because the VM can only access pages that the hypervisor has mapped from an IPA to the page's PA in the Stage-2 page tables.

KVM/ARM design

KVM/ARM is tightly integrated with the kernel and effectively turns the kernel into a first class ARM hypervisor. For KVM/ARM to use the hardware features, the kernel must somehow be able to run code in HYP mode because HYP mode is used to configure the hardware for running a VM, and traps from the VM to the host (KVM/ARM) are taken to HYP mode.

Rewriting the entire kernel to run only in HYP mode is not an option, because it would break compatibility with hardware that doesn't have the virtualization extensions. A HYP-mode-only kernel also would not work when run inside a VM, because the HYP mode would not be available. Support for running both in HYP mode and SVC mode would be much too invasive to the source code, and would potentially slow down critical paths. Additionally, the hardware requirements for the page table layout in HYP mode are different from those in SVC mode in that they mandate the use of LPAE (ARM's Large Physical Address Extension) and require specific bits to be set on the page table entries, which are otherwise clear on the kernel page tables used in SVC mode. So KVM/ARM must manage a separate set of HYP mode page tables and explicitly map in code and data accessed from HYP mode.

We therefore came up with the idea to split execution across multiple CPU modes and run as little code as possible in HYP mode. The code run in HYP mode is limited to a few hundred instructions and isolated to two assembly files: arch/arm/kvm/interrupts.S and arch/arm/kvm/interrupts_head.S.

For readers not familiar with the general KVM architecture, KVM on all architectures works by exposing a simple interface to user space to provide virtualization of core components such as the CPU and memory. Device emulation, along with setup and configuration of VMs, is handled by a user space process, typically QEMU. When such a process decides it is time to run the VM, it will call the KVM_RUN ioctl() on the virtual CPU file descriptor, which executes VM code natively on the CPU. On ARM, the ioctl() handler in arch/arm/kvm/arm.c switches to HYP mode by issuing an HVC (hypercall) instruction, which changes the CPU mode to HYP mode, context switches all hardware state between the host and the guest, and finally jumps to the VM SVC or USR mode to natively execute guest code. When KVM/ARM runs guest code, it enables Stage-2 memory translation, which completely isolates the address space of VMs from the host and other VMs. The CPU will be executing guest code until the hardware traps to HYP mode, because of a hardware interrupt, a Stage-2 page fault, or a sensitive operation. When such a trap occurs, KVM/ARM switches back to the host hardware state and returns to normal KVM/ARM host SVC code with the full kernel mappings available.
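
From user space, the generic flow just described looks roughly like the following. This is a deliberately incomplete sketch: guest memory setup, register initialization, and error handling are omitted, and a real VMM such as QEMU does far more:

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int run_guest(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm = ioctl(kvm, KVM_CREATE_VM, 0);
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

        /* The kernel shares a kvm_run structure with user space; it
           describes why each KVM_RUN call returned. */
        int size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
        struct kvm_run *run = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpu, 0);

        for (;;) {
            ioctl(vcpu, KVM_RUN, 0);        /* run guest code natively */

            switch (run->exit_reason) {
            case KVM_EXIT_MMIO:
                /* The guest touched an address with no RAM behind it (a
                   Stage-2 fault on ARM); hand the access to device
                   emulation here. */
                break;
            case KVM_EXIT_INTR:
                break;                      /* interrupted; just re-enter */
            default:
                fprintf(stderr, "unhandled exit %u\n", run->exit_reason);
                return -1;
            }
        }
    }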

When returning from a VM, KVM/ARM examines the reason for the trap, and performs the necessary emulation or resource allocation to allow the VM to resume. For example, if the guest performs a memory-mapped I/O (MMIO) operation to an emulated device, that will generate a Stage-2 page fault, because only physical RAM dedicated to the guest will be mapped in the Stage-2 page tables. KVM/ARM will read special system registers, available only in HYP mode, which contain the address causing the fault, and report that address to QEMU through a memory-mapped structure shared between QEMU and the kernel. QEMU knows the memory map of the emulated system and can forward the operation to the appropriate device emulation code. As another example, if a hardware interrupt occurs while the VM is executing, this will trap to HYP mode, and KVM/ARM will switch back to the host state and re-enable interrupts, which will cause the hardware interrupt to fire once again; this time the kernel's normal interrupt handlers execute without trapping to HYP mode. While every hardware interrupt ends up interrupting the CPU twice, the actual trap cost on ARM hardware is negligible compared to the world-switch from the VM to the host.

HYP mode

Providing access to HYP mode from KVM/ARM was a non-trivial challenge, since HYP mode is a more privileged mode than the standard ARM kernel modes and there is no architecturally defined ABI for entering HYP mode from less privileged modes. One option would be to expect bootloaders to either install secure monitor handlers or hypercall handlers that would allow the kernel to trap back into HYP mode, but this method is brittle and error-prone, and prior experience with establishing TrustZone APIs has shown that it is hard to create a standard across different implementations of the ARM architecture.

Instead, Will Deacon, Catalin Marinas, and Ian Jackson proposed that we rely on the kernel being booted in HYP mode if the kernel is going to support KVM/ARM. In version 3.6, a patch series developed by Dave Martin and Marc Zyngier was merged that detects if the kernel is booted in HYP mode and, if so, installs a small stub handler that allows other subsystems like KVM/ARM to take control of HYP mode later on. As it turns out, it is reasonable to recommend that bootloaders always boot the kernel in HYP mode if it is available because even legacy kernels always make an explicit switch to SVC mode at boot time, even though they expect to boot into SVC mode already. Changing bootloaders to simply boot all kernels in HYP mode is therefore backward-compatible with legacy kernels.

Installing the hypervisor stub when the kernel is booted in HYP mode was an interesting implementation challenge. First, ARM kernels are often loaded as a compressed image, with a small uncompressed pre-boot environment known as the "decompressor" which decompresses the kernel image into memory. If the decompressor detects that it is booted in HYP mode, then a temporary stub must be installed at this stage allowing the CPU to fall back to SVC mode to run the decompressor code. The reason is that the decompressor must turn on the MMU to enable caches, but doing so in HYP mode requires support for the LPAE page table format used by HYP mode, which is an unwanted piece of complexity in the decompressor code. Therefore, the decompressor installs the temporary HYP stub, falls back to SVC mode, decompresses the kernel image, and finally, immediately before calling the uncompressed initialization code, switches back to HYP mode again. Then, the uncompressed initialization code will again detect that the CPU is in HYP mode and will install the main HYP stub to be used by kernel modules later in the boot process or after the kernel has finally booted. The HYP stub can be found in arch/arm/kernel/hyp-stub.S. Note that the uncompressed initialization code doesn't care whether the uncompressed code is started directly in HYP mode from a bootloader or from the decompressor.

Because HYP mode is a more privileged mode than SVC mode, the transition from SVC mode to HYP mode occurs only through a hardware trap. Such a trap can be generated by executing the hypercall (HVC) instruction, which will trap into HYP mode and cause the CPU to execute code from a jump entry in the HYP exception vectors. This allows a subsystem to use the hypervisor stub to fully take over control of HYP mode, because the hypervisor stub allows subsystems to change the location of the exception vectors. The HYP stub is called through the __hyp_set_vectors() function, which takes the physical address of the HYP exception vector as its only parameter, and replaces the HYP Vector Base Address Register (HVBAR) with that address. When KVM/ARM is initialized during normal kernel boot (after all main kernel initialization functions have run), it creates an identity mapping (a one-to-one mapping of virtual addresses to physical addresses) of the HYP mode initialization code, which includes an exception vector, and sets the physical address of that vector using the __hyp_set_vectors() function. Further, the KVM/ARM initialization code calls the HVC instruction to run the identity-mapped initialization code, which can safely enable the MMU, because the code is identity mapped.

Finally, KVM/ARM initialization sets up the HVBAR to point to the main KVM/ARM HYP exception handling code, now using the virtual addresses for HYP mode. Since HYP mode has its own address space, KVM/ARM must choose an appropriate virtual address for any code or data, which is mapped into HYP mode. For convenience and clarity, the kernel virtual addresses are reused for pages mapped into HYP mode, making it possible to dereference structure members directly as long as all relevant data structures are mapped into HYP mode.

Both traps from sensitive operations in VMs and hypercalls from the host kernel enter HYP mode through an exception on the CPU. Instead of changing the HYP exception vector on every switch between the host and the guest, a single HYP exception vector is used to handle both HVC calls from the host kernel and to handle traps from the VM. The HYP vector handling code checks the VMID field on the Stage-2 page table base register, and VMID 0 is reserved for the host. This field is only accessible from HYP mode and guests are therefore prevented from escalating privilege. We introduced the kvm_call_hyp() function, which can be used to execute code in HYP mode from KVM/ARM. For example, KVM/ARM code running in SVC mode can make the following call to invalidate TLB entries, which must be done from HYP mode:

    kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);

Virtual GIC and timers

ARMv7 architectures with hardware virtualization support also include virtualization support for timers and the interrupt controller. Marc Zyngier implemented support for these features, which are called "generic timers" (a.k.a. architected timers) and the Virtual Generic Interrupt Controller (VGIC).

Traditionally, timer operations on ARM systems have been MMIO operations to dedicated timer devices. Such MMIO operations performed by VMs would trap to QEMU, which would involve a world-switch from the VM to host kernel, and a switch from the host kernel to user space for every read of the time counter or every time a timer needed to be programmed. Of course, the timer functionality could be emulated inside the kernel, but this would require a trap from the VM to the host kernel, and would therefore add substantial overhead to VMs compared to running on native hardware. Reading the counter is a very frequent operation in Linux. For example, every time a task is enqueued or dequeued in the scheduler, the runqueue clock is updated, and in particular multi-process workloads like Apache benchmarks clearly show the overhead of trapping on each counter read.

ARMv7 allows for an optional extension to the architecture, the generic timers, which makes counter and timer operations part of the core architecture. Now, reading a counter or programming a timer is done using coprocessor register accesses on the core itself, and the generic timers provide two sets of timers and counters: the physical and the virtual. The virtual counter and timer are always available, but access to the physical counter and timer can be limited through control registers accessible only in HYP mode. If the kernel is booted in HYP mode, it is configured to use the physical timers; otherwise the kernel uses the virtual timers. This allows an unmodified kernel running inside a VM to program timers without trapping to the host, while still providing the necessary isolation of the host from VMs.
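
To make the difference concrete, reading the virtual counter on an ARMv7 CPU with the generic timers is a plain coprocessor access, roughly as the kernel's arch_timer code does it: no MMIO load and, when running inside a VM, no trap to the host. This fragment is only a sketch and builds only for ARMv7:

    /* Read CNTVCT, the 64-bit virtual counter, through CP15. */
    static inline unsigned long long read_cntvct(void)
    {
        unsigned long long cval;

        asm volatile("isb" ::: "memory");   /* keep the read ordered */
        asm volatile("mrrc p15, 1, %Q0, %R0, c14" : "=r" (cval));
        return cval;
    }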

If a VM programs a virtual timer, but is preempted before the virtual timer fires, KVM/ARM reads the timer settings to figure out the remaining time on the timer, and programs a corresponding soft timer in the kernel. When the soft timer expires, the timer handler routine injects the timer interrupt back into the VM. If the VM is scheduled before the soft timer expires, the virtual timer hardware is re-programmed to fire when the VM is running.

The role of an interrupt controller is to receive interrupts from devices and forward them to one or more CPUs. ARM's Generic Interrupt Controller (GIC) provides a "distributor" which is the core logic of the GIC and several CPU interfaces. The GIC allows CPUs to mask certain interrupts, assign priority, or set affinity for certain interrupts to certain CPUs. Finally, a CPU also uses the GIC to send inter-processor interrupts (IPIs) from one CPU core to another and is the underlying mechanism for SMP cross calls on ARM.

Typically, when the GIC raises an interrupt to a CPU, the CPU will acknowledge the interrupt to the GIC, interact with the interrupting device, signal end-of-interrupt (EOI) to the GIC, and resume normal operation. Both acknowledging and EOI-signaling interrupts are privileged operations that will trap when executed from within a VM, adding performance overhead to common operations. The hardware support for virtualization in the VGIC comes in the form of a virtual CPU interface that CPUs can query to acknowledge and EOI virtual interrupts without trapping to the host. The hardware support further provides a virtual control interface to the VGIC, which is accessed only by KVM/ARM, and is used to program virtual interrupts generated from virtual devices (typically emulated by QEMU) to the VGIC.

Since access to the distributor is typically not a common operation, the hardware does not provide a virtual distributor, so KVM/ARM provides in-kernel GIC distributor emulation code as part of the support for VGIC. The result is that VMs can acknowledge and EOI virtual interrupts directly without trapping to the host. Actual hardware interrupts received during VM execution always trap to HYP mode, and KVM/ARM lets the kernel's standard ISRs handle the interrupt as usual, so the host remains in complete control of the physical hardware.

There is no mechanism in the VGIC or generic timers to let the hardware directly inject physical interrupts from the virtual timers as virtual interrupts to the VMs. Therefore, VM timer interrupts will trap as any other hardware interrupt, and KVM/ARM registers a handler for the virtual timer interrupt and injects a corresponding virtual timer interrupt using software when the handler function is called from the ISR.

Results

During the development of KVM/ARM, we continuously measured the virtualization overhead and ran long-running workloads to test stability and measure performance. We have used various kernel configurations and user space environments (both ARM and Thumb-2) for both the host and the guest, and validated our workloads with SMP and UP guests. Some workloads have run for several weeks at a time without crashing, and the system behaves as expected when exposed to extreme memory pressure or CPU over-subscription. We therefore feel that the implementation is stable and encourage users to try and use the system.

Our measurements using both micro and macro benchmarks show that the overhead of KVM/ARM is within 10% of native performance on multicore platforms for balanced workloads. Purely CPU-bound workloads perform almost at native speed. The relative overhead of KVM/ARM is comparable to KVM on x86. For some macro workloads, like Apache and MySQL, KVM/ARM even has less overhead than on x86 using the same configuration. A significant source of this improved performance can be attributed to the optimized path for IPIs and thereby process rescheduling caused by the VGIC and generic timers hardware support.

Status and future work

KVM/ARM started as a research project at Columbia University and was later supported by Virtual Open Systems. After the 3.9 merge, KVM/ARM continues to be maintained by the original author of the code, Christoffer Dall, and the ARMv8 (64-bit) port is maintained by Marc Zyngier. System support for KVM on ARMv7 has been merged upstream in QEMU, and kvmtool also supports KVM/ARM on both ARMv7 and ARMv8. ARMv8 support is scheduled to be merged for the 3.11 kernel release.

Linaro is supporting a number of efforts to make KVM/ARM itself feature complete, which involves debugging and full migration features including migration of the in-kernel support for the VGIC and the generic timers. Additionally, virtio has so far relied on a PCI backend in QEMU and the kernel, but a significant amount of work has already been merged upstream to refactor the QEMU source code concerning virtio to allow better support for MMIO-based virtio devices to accelerate virtual network and block devices. The remaining work is currently a priority for Linaro, as is support for the mach-virt ARM machine definition, which is a simple machine model designed to be used for virtual machines and is based only on virtio devices. Finally, Linaro is also working on ARMv8 support in QEMU, which will also take advantage of mach-virt and virtio support.

Conclusion

KVM/ARM is already used heavily in production by the SUSE Open Build Service on Arndale boards, and we can only speculate about its future uses: in green data centers, as the hypervisor of choice for ARM-based networking equipment, or even in ARM-based laptops and desktops.

For more information, help on how to run KVM/ARM on your specific board or SoC, or to participate in KVM/ARM development, the kvmarm mailing list is a good place to start.

Comments (12 posted)

When the kernel ABI has to change

By Jonathan Corbet
July 2, 2013
Maintaining user-space ABI compatibility is one of the key guiding principles of Linux kernel development; changes that break user space are likely to be reverted quickly, often after an incendiary message from Linus. But what is to be done in cases where an ABI is deemed to be unworkable and unmaintainable? Control group maintainer Tejun Heo is trying to solve that problem, but, in the process, he is running into opposition from one of Linux's highest-profile users.

Control groups ("cgroups") allow an administrator to divide the processes in a system into a hierarchy of groups; this hierarchy need not match the process tree. The grouping function alone is useful; systemd uses it to keep track of all of the processes involved with a given service, for example. But the real purpose of control groups is to allow resource control policies to be applied to the processes within each group; to that end, the kernel contains a range of "controllers" that enforce policies on CPU time, block I/O bandwidth, memory usage, and more. Control groups are managed with a virtual filesystem exported by the kernel; see Documentation/cgroups/cgroups.txt for a thorough (if slightly dated) description of how this subsystem works.
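
For readers who have not poked at that filesystem, the low-level interface amounts to ordinary directory and file operations. The following sketch creates a group under the cpu controller and moves the calling process into it; the mount point and group name are assumptions that vary between distributions:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[32];
        int fd;

        /* Creating a directory creates a new group in the cpu
           controller's hierarchy (assuming it is mounted here). */
        if (mkdir("/sys/fs/cgroup/cpu/demo", 0755) && errno != EEXIST) {
            perror("mkdir");
            return 1;
        }

        /* Writing a PID to the group's "tasks" file moves that task in;
           controller knobs (cpu.shares and friends) are plain files too. */
        fd = open("/sys/fs/cgroup/cpu/demo/tasks", O_WRONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        snprintf(buf, sizeof(buf), "%d\n", (int)getpid());
        if (write(fd, buf, strlen(buf)) < 0)
            perror("write");
        close(fd);
        return 0;
    }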

The trouble with control groups

There is no doubt that the functionality provided by control groups is both extensive and flexible. Indeed, part of the problem is that it is too flexible. Consider, for example, the support for multiple hierarchies in the control group subsystem. Cgroups allow the creation of a hierarchy of processes to be used in dividing up a limited resource, such as available CPU time. But they allow the creation of an entirely different hierarchy for the control of a different resource. Thus, for example, CPU time could be placed under a policy that favors certain users over others, while memory use could, instead, be regulated depending on what program a process is running. Processes can be grouped in entirely different ways in each hierarchy.

The problem here is that, while the design allowing each controller to have its own hierarchy seems nice and orthogonal, the implementation cannot be that way. The controllers for memory usage, I/O bandwidth, and writeback throttling all look independent on the surface, but those problems are all intertwined in the memory management system in the kernel. All three of those controllers will need to associate pages of memory with specific control groups; if a given process is in one cgroup from the memory controller's point of view, but a different cgroup for the I/O bandwidth controller, that tracking quickly becomes difficult or impossible. It is easy to set up policies that conflict or that simply cannot be properly implemented within the kernel.

Another perceived problem is that the virtual filesystem interface is too low-level, exposing too many details of how control groups are implemented in the kernel. As the number of users of control groups grows, it will become increasingly hard to make changes without breaking existing applications. It's not clear what the correct cgroup interface should be, but those who spend enough time looking at the current implementation tend to come away convinced that changes are needed.

This problem is aggravated by an increasing tendency to use file permissions to hand subtrees of a cgroup hierarchy over to unprivileged processes. There are legitimate reasons to want to delegate authority in that way; complex applications may want to use cgroups to implement their own internal policies, for example. There are also use cases associated with virtualization and containers. But that delegation greatly increases the number of programs with an intimate understanding of how cgroups work, complicating any future changes. There are also any number of security issues that come with unprivileged access to a cgroup hierarchy; it is trivially easy to run denial-of-service attacks against a system if one has write access to a cgroup hierarchy. In short, the interface was just never meant to be used in this way.

For these reasons and more, there is a strong desire to rework the cgroup interface into something that is more maintainable, more secure, and easier to use. Getting there, though, is likely to be a long and painful process, as can be seen by the early discussions around the subject.

The solution and its discontents

The plan for control groups can be described in relatively few words; the resulting discussion, instead, is rather more verbose. Multiple hierarchies are seen to be misconceived and unmaintainable on their face; the plan is to phase out that functionality so that, in the end, all controllers are attached to a single, unified hierarchy of processes. Unprivileged access to the cgroup hierarchy will be strongly discouraged; the hope is to have a single, privileged process handling all of the cgroup management tasks. That process will, in turn, provide some sort of higher-level interface to the rest of the system.

Tim Hockin is charged with making Google's massive cluster of machines work properly for a wide variety of internal users. Google uses cgroups extensively for internal resource management; more to the point, the company also makes extensive use of multiple hierarchies. So, needless to say, Tim is not at all pleased with the prospect of that functionality going away. As he put it:

So yeah, I'm in a bit of a panic. You're making a huge amount of work for us. You're breaking binary compatibility of the (probably) largest single installation of Linux in the world. And you're being kind of flip about the reality of it...

Part of the reason for Tim's panic is that he was under the impression that the existing functionality would be removed within a year or two. That is decidedly not the case; the kernel's ABI rules have not been suspended for control groups. The plan is to add a new control interface, and any new features will probably only work with that new interface, but the existing interface, including multiple hierarchies, will continue to be supported until it's clear that it is no longer being used.

Tim described, in general terms, how Google uses multiple hierarchies. Essentially, every job in the system has two attributes: whether it's a production or "batch" job, and whether it gets I/O bandwidth guarantees. The result is a 2x2 matrix describing resource allocation policies (though one of the entries, batch processes with I/O guarantees, makes little sense and is not used). Using two independent cgroup hierarchies makes this set of policies relatively easy to express; Tim asserts that a unified hierarchy would not be usable in the same way.

Tejun was unimpressed, responding that this case could be managed by setting up three cgroups at the same level of the hierarchy, each of which would implement one of the three useful policy combinations. The problem with this solution, according to Tim, is that the processes without I/O bandwidth guarantees would be split into two groups, whereas in the current solution they are in one group. If one of those two groups has far more members than the other, the members of that larger group will get far less of the available bandwidth than the members of the small group. Tejun still thinks that the problem should be solvable, perhaps with the use of a user-space management daemon that would adjust the relative bandwidth allocations depending on the workload. Tim has answered that the situation is actually a lot more complicated, but he has not yet shared the details of how, so it is hard to understand what the real difficulties with a single hierarchy are.

A single management process?

Tim also dislikes the plan to have a single process managing the control group hierarchy. That process could be made to provide the functionality that Google (along with others) needs, though there are performance concerns associated with adding a process in the middle. But Tim was not alone in being concerned by this message from Lennart Poettering on the nature of that single process:

This hierarchy becomes private property of systemd. systemd will set it up. Systemd will maintain it. Systemd will rearrange it. Other software that wants to make use of cgroups can do so only through systemd's APIs.

Google does not currently run systemd and is not thrilled by the prospect of having to switch to be able to make use of cgroup functionality. So Tim responded that "If systemd is the only upstream implementation of this single-agent idea, we will have to invent our own, and continue to diverge rather than converge." There is no particular judgment against systemd implied by that position; it is simply that making that switch would affect a whole lot of things beyond cgroups, and that is more than Google feels like it would want to take on at the moment. But, in general, it would not be surprising if, in the long term, some users remain opposed to the idea of systemd as the only interface to cgroups. That suggests that we will be seeing competing implementations of the cgroup management daemon concept.

One of those alternatives may be about to come into view; Serge Hallyn confessed that he is working on a cgroup management daemon of his own. In some situations, a separate daemon might meet a lot of needs, but Lennart was clear that he would never have systemd defer to such a daemon. His position — not an entirely unreasonable one — is that the init process, as the creator of all other processes in the system, should not be dependent on any other process for its normal operation. He also seems to feel that it would not be possible to put the cgroup management code into a library that could be used in multiple places. So we are likely to see multiple implementations of this functionality in use before this story is done. That, in turn, could create headaches for developers of applications that need to interface with the cgroup subsystem.

The discussion, thus far, seems to have changed few minds. But Tejun has made it clear that he doesn't intend to just ignore complaints from users:

While the bar to overcome is pretty high, I do want to learn about the problems you guys are foreseeing, so that I can at least evaluate the graveness properly and hopefully compromises which can mitigate the most sore ones can be made wherever necessary.

He also acknowledged the biggest problem faced by the development community: despite having accumulated some experience on wrong ways to solve the problem, nobody really knows what the right solution is. More mistakes are almost certain, so it's too soon to try to settle on final solutions.

In the early years of Linux, most of the ABIs implemented by the kernel were specified by groups like POSIX or by prior implementation in other kernels. That made the ABI design problem mostly go away; it was just a matter of doing what had already been done before. For current problems, though, there are rather fewer places to look for guidance, so we are having to figure out the best designs as we go. Mistakes are certain to happen in such a setting. So we are going to have to get better at learning from those mistakes, coming up with better designs, and moving to them without causing misery for our users. The control group transition is likely to set a lot of precedents regarding how these changes should (or should not) be handled in the future.

Comments (35 posted)

Patches and updates

Kernel trees

Architecture-specific

Core kernel code

Development tools

Device drivers

Documentation

Filesystems and block I/O

Memory management

Virtualization and containers

Miscellaneous

  • Lucas De Marchi: kmod 14 . (July 3, 2013)

Page editor: Jonathan Corbet

Distributions

Fedora 19 and Apple hardware

By Jonathan Corbet
July 3, 2013
The Fedora 19 release brings a lot of goodies for Fedora users, but there is one class of users that may be a bit less happy: those who want to run Fedora on an Apple Mac system in a dual-boot configuration with OS X. A late bug in the Anaconda installer makes the creation of such systems nearly impossible. One might wonder why Fedora 19 shipped with this kind of problem; a look at the reasons gives a few insights into how the Fedora release process works.

The decision to proceed with the Fedora 19 release was announced on June 27. Unfortunately, bug #979205 had been filed shortly before. The installer fails to create the needed partitions for a dual-boot system on an OS X machine, causing the installation to fail. As Matthew Garrett put it when calling attention to the problem: "This is rather frustrating, since Fedora's the only distribution with any significant support for running on Apple hardware." A glance at any Linux-related conference will show that Apple systems are popular among developers; it seems a bit strange that a distribution that has put significant effort into working on that hardware would ship with a known problem of this nature. The explanation for what happened involves a number of separate issues.

The first is that the bug was introduced very late in the development cycle; according to Adam Williamson, it went into Anaconda 19.30.10, which first saw wide testing in the RC1 release on June 25. Naturally, the patch that caused the problem was a response to another bug; even so, the patch was the subject of some discussion before being merged into the otherwise-frozen Anaconda source. In the end, the patch was deemed to be sufficiently low-risk to be accepted — a judgment which, like many, is easy to criticize after the fact. At the time, though, it looked like a way to fix a known problem in the release.

The new code took several days to find its way into a build that would see wider testing; it was committed on a Thursday, and the build did not happen until after the following weekend. That left a period of about two days between the bug's general availability and the Fedora 19 go/no-go decision — not very long for an installation-time issue to surface. Some participants have suggested that, in the future, the time between an RC release and the go/no-go decision should be lengthened to increase the chances of catching a last-minute problem. But that probably would not have helped in this case.

The fact that the Fedora quality-assurance team only appears to have a single Mac system, and that they don't test it for dual-boot installations, also did not help. There was a clear hole in the QA net that this problem slipped through. One might argue that this does not necessarily indicate a problem: as Chris Murphy pointed out, Macs are not officially supported by the distribution. So it is not surprising that the testing resources available are unable to catch every problem. It also means that, even if the problem had been found before the go/no-go decision, it would not have been entitled to "blocker" status and, thus, might not have affected that decision.

While not saying that the release should have been delayed to fix this problem, Matthew did question one interesting bit of Fedora policy: once the go/no-go decision has been made in the "go" direction, the process becomes unstoppable. That means that, even if this bug were deemed to have a "blocker" level of severity, it still would not have blocked the release. Kevin Fenzi defended this policy, describing the long series of events that starts to unfold once the decision to make the release has been made. The explanation was not satisfying to everybody, but the policy exists and doesn't appear to be subject to change.

So Fedora 19 simply will not install properly in a dual-boot OS X configuration without a lot of extra work. And things are likely to stay that way; an installer problem cannot be fixed through the normal Fedora update process. There was some talk of a 19.1 release, but, as Kevin put it, "We are currently pretty unsetup for any kind of point releases." So this problem is likely to remain in the official Fedora distribution until Fedora 20. Not an ideal outcome by any means, but one that may have been hard to avoid.

Comments (19 posted)

Brief items

Distribution quote of the week

Maintainers shouldn't have to do the work to support any configuration they're not comfortable testing/etc, but if somebody else comes along to do it for them, the solution is cooperation, not revert wars.
-- Rich Freeman

Comments (none posted)

DoudouLinux 2.0 "Hyperborea" released

Version 2.0 of the DoudouLinux educational distribution is out. "But DoudouLinux is not just a CD/DVD of educative stuffs for children. DoudouLinux is now a vast project on its own. We have published with version 2.0 a manifesto that defines the philosophy and the ethics of our project: we want our children be able to fully master the digital world they are going to live in, instead of undergoing it. As a result we now feel very concerned about user privacy, especially when it comes to children." LWN looked at DoudouLinux in 2011.

Full Story (comments: none)

Fedora 19 released

The Fedora 19 release is now available. As usual, this release offers a lot of new features; see the announcement or the release notes for details.

Update: the Fedora 19 for ARM release is also available. The Fedora ARM team is clearly getting up to speed and is now able to offer releases on the same day as the primary architectures.

Full Story (comments: 72)

Announcing the release of Fedora 19 for Power

The Fedora Secondary Arch Team for Power has announced the release of Fedora 19 for Power architecture.

Full Story (comments: none)

GNU Linux-libre 3.10-gnu

GNU Linux-libre 3.10-gnu is out. "No big deblobbing news for this one: a handful of new drivers that requested blobs had to be deblobbed, a few others had to be updated because of new blob requests."

Full Story (comments: none)

13.10 (Saucy Salamander) Alpha 1 available for some flavors

Kubuntu, Lubuntu, Ubuntu GNOME, and UbuntuKylin have released a first alpha version of 13.10 "Saucy Salamander".

Full Story (comments: none)

Distribution News

Fedora

Cooperative Bug Isolation for Fedora 19 x86_64 and i386

The Cooperative Bug Isolation Project (CBI) is now available for Fedora 19. CBI is a research effort designed to find out what went wrong when software crashes. Download CBI packages and help squash bugs.

Full Story (comments: none)

Reminder: Fedora 17 end of life

Fedora 17 will reach its end-of-life on July 30, 2013. No further updates will be available after that time.

Full Story (comments: 1)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Kuhn: Berkeley DB 6.0 license change and Debian

Oracle recently changed the Berkeley DB license to AGPLv3, prompting a discussion on the Debian lists about possible conflicts between GPLv2-licensed software in Debian and the new AGPLv3 BDB. Bradley Kuhn sent an email to the Debian-legal mailing list with his point of view. "I know that some have complained that compliance with AGPLv3 may require more work by Debian redistributors. That is a reasonable concern, but I think the issue can be mitigated. The argument is roughly analogous to this one: complying with GPLv2 is more difficult than complying with the Apache license. But, unless Debian wants to take a wholesale position opposed to copyleft, I don't think this issue is or should be considered insurmountable."

Full Story (comments: 29)

Riddell: Kubuntu Won't be Switching to Mir or XMir

Jonathan Riddell has announced that the Kubuntu distribution will not be following Ubuntu in its switch to the Mir display server. "Here at Kubuntu we still want to work as part of the community development, taking the fine software from KDE and other upstreams and putting it on computers worldwide. So when Ubuntu Desktop gets switched to Mir we won't be following. We'll be staying with X on the images for our 13.10 release now in development and the 14.04LTS release next year. After that we hope to switch to Wayland which is what KDE and every other Linux distro hopes to do."

Comments (153 posted)

Zacchiroli: all Debian source are belong to us

Stefano Zacchiroli has announced the sources.debian.net (sources.d.n) web site, which hosts the source code for Debian packages. "Via sources.d.n you can therefore browse the content of Debian source packages with usual code viewing features like syntax highlighting. More interestingly, you can search through the source code (of unstable only, though) via integration with http://codesearch.debian.net. You can also use sources.d.n programmatically to query available versions or link to specific lines, with the possibility of adding contextual pop-up messages (example)."

Comments (21 posted)

Page editor: Rebecca Sobol

Development

FOSSology gains SPDX support

By Nathan Willis
July 3, 2013

A new release of the FOSSology source-code analysis tool is out. Although there have been minor updates, this is the first update in 2013 to bring additional functionality. The 2.0 release in 2012 marked a major shift for the project, debuting a new, more modular design and paving the way for faster releases. The newest update, version 2.2.0, includes a new permissions scheme and some usability improvements, but in the long run, the most notable feature in this release may be the improved compatibility with the Software Package Data Exchange (SPDX) standard for tracking software components, licenses, and copyrights.

FOSSology is designed to be a flexible platform for analyzing source code, but it is best known for its ability to scan large collections of files and pick out licenses and copyright statements. The resulting license and copyright information is then used to help an organization stay in compliance with the licensing requirements it inherits from upstream open source projects. However, there are other use cases—for instance, at LinuxCon Japan, Armijn Hemel mentioned using FOSSology to help automate the process for finding license violations in the source code of software shipped in embedded Linux devices. It is not hard to imagine the tool being adapted for other source-scanning tasks, such as assembling a list of contributors needed to sign off on a license change.

Users can upload source packages to FOSSology, then queue scanning jobs that analyze the packages for various types of information handled by scanning "agents." As new code is added, components are updated, and trees are rearranged, these scans can be run periodically, to help check for problematic license combinations or missing information. The basic agents available include a license recognizer, a copyright recognizer, a MIME-type analyzer, and a package header parser (which looks for the packaging information defined for RPM or .deb files). However, users can write their own agents to scan for arbitrary information.

All of the agents work by matching text patterns, which is a tricky business, considering all of the ways a licensing statement can be phrased and the wide assortment of licenses that may be encountered; FOSSology's list currently includes around 600 of them. Although copyright statements are sometimes less critical from a legal-compliance standpoint, recognizing them is also a pattern-matching exercise; FOSSology looks for text blocks that resemble copyright statements, as well as for email addresses and URLs.
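
FOSSology's agents are written in C and are far more thorough than anything that fits in a few lines, but the general approach can be sketched. The following Python fragment is purely illustrative; the patterns, file handling, and output format are invented for this example and have nothing to do with FOSSology's actual agent interface:

    # Illustrative sketch only: a toy license and copyright scanner in the
    # spirit of FOSSology's agents.  The patterns and output format here are
    # invented for the example.
    import re
    import sys

    # A few simple license "signatures" (a real scanner carries hundreds).
    LICENSE_PATTERNS = {
        "GPL-2.0": re.compile(r"GNU General Public License.*version 2", re.I | re.S),
        "Apache-2.0": re.compile(r"Apache License,?\s*Version 2\.0", re.I),
        "MIT": re.compile(r"Permission is hereby granted, free of charge", re.I),
    }

    # Copyright statements are also found by pattern matching.
    COPYRIGHT_PATTERN = re.compile(r"Copyright\s+(?:\(c\)|©)?\s*[0-9]{4}.*", re.I)

    def scan_file(path):
        """Return (license names, copyright lines) found in one source file."""
        with open(path, errors="replace") as f:
            text = f.read()
        licenses = [name for name, pat in LICENSE_PATTERNS.items() if pat.search(text)]
        copyrights = [m.group(0).strip() for m in COPYRIGHT_PATTERN.finditer(text)]
        return licenses, copyrights

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            licenses, copyrights = scan_file(path)
            print(path)
            print("  licenses:", ", ".join(licenses) or "none recognized")
            for statement in copyrights:
                print("  copyright:", statement)

A real agent also has to cope with reformatted and partially quoted license texts, which is why FOSSology's matching is considerably more elaborate than a handful of regular expressions.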

Historically, FOSSology has been deployed on a web server backed by a PostgreSQL database, with multiple users uploading source code bundles and performing scans through the web UI. In October 2012, version 2.1.0 added a pair of command-line utilities, fo_nomos_license_list and fo_copyright_list, with which users could query the FOSSology database for license or copyright information. The command-line utilities free users from the web UI, make the FOSSology repository more accessible to scripting, and are reported to run faster. Execution speed can be a major issue with large repositories, where a scan run in the web UI could time out if it took too long. But in the 2.1.0 release the tools were limited in scope, since both required scanning an entire upload (that is, one package or source archive). The 2.2.0 release updates the utilities to accept a sub-tree as the starting node from which to perform a scan.

Version 2.2.0 also introduces a new permissions scheme that allows administrators to limit access on a per-file and per-user basis. The system implements its own set of internal user groups (i.e., separate from the Unix groups that may be associated with accounts); each user in a group can be granted read permission, write permission, and user/group-administration permission. The ability to upload source packages to the application is governed by a separate permission table, perm_upload, which grants upload permission for each folder to specific groups; each user gets his or her own group by default, which enables per-user upload restrictions. It is a fairly straightforward system, but it replaces the permission system used in previous releases (which bound permissions to each individual application plugin), so administrators may have to do some work migrating existing installations.
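
As a conceptual model only (the class, field, and helper names below are invented for illustration and are not FOSSology's schema or code), the scheme might be pictured like this:

    # Conceptual model only: the class, field, and helper names are invented
    # for illustration and are not FOSSology's actual schema or code.
    from dataclasses import dataclass, field

    @dataclass
    class Group:
        name: str
        readers: set = field(default_factory=set)   # may view uploads
        writers: set = field(default_factory=set)   # may modify and queue scans
        admins: set = field(default_factory=set)    # may manage users and groups

    # Upload rights are tracked separately, per folder and per group, in the
    # spirit of the perm_upload table described above.
    upload_perms = {}   # (folder, group name) -> bool

    def new_user(username):
        """Every user gets a personal group by default, enabling per-user rules."""
        return Group(name=username, readers={username},
                     writers={username}, admins={username})

    def may_upload(username, folder, groups):
        """True if any group the user belongs to has upload rights on the folder."""
        return any(upload_perms.get((folder, g.name), False)
                   for g in groups if username in g.readers)

    # Grant alice's personal group upload rights on a single folder.
    alice = new_user("alice")
    upload_perms[("Software Repository", "alice")] = True
    assert may_upload("alice", "Software Repository", [alice])

The point is simply that read/write/admin rights live inside a group, while upload rights are attached to folders.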

Licenses galore

There is, naturally, the usual collection of bugfixes and stability improvements in this release, plus the noteworthy addition of the ability to pull up the full text of a software license from within FOSSology itself (useful for those rare users who do not have the differences between GFDL v1.1 and GFDL v1.2 memorized, no doubt).

But the bigger news item on the license-presentation front is the fact that FOSSology has migrated its list of license names to be compatible with the canonical list supported by SPDX. The SPDX project is a relatively new effort (dating back to 2010); it defines a metadata format for describing the "bill of materials" of a software package, including everything from its creator and definitive name to its URL of origin and file checksums. In the list of mandatory items, as one might guess, is the "concluded license" that governs the package as a whole. SPDX is meant to be both human-readable and machine-parsable (RDF is the preferred file format), so the specification includes a list of open source licenses.
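
As an illustration of the kind of data SPDX carries, the short sketch below emits a package-level record as tag:value pairs. The tag names approximate the SPDX syntax of the time (RDF is the preferred serialization, as noted above), and the package, URL, and tool names are made up; treat it as a sketch rather than a validated SPDX writer:

    # Rough illustration of SPDX-style "bill of materials" data emitted as
    # tag:value pairs.  Tag names approximate the SPDX specification; the
    # package, URL, and tool names are hypothetical.
    from datetime import datetime, timezone

    def spdx_package_record(name, download_url, concluded_license, sha1):
        fields = [
            ("SPDXVersion", "SPDX-1.1"),
            ("DataLicense", "CC0-1.0"),
            ("Creator", "Tool: example-scanner"),
            ("Created", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
            ("PackageName", name),
            ("PackageDownloadLocation", download_url),
            ("PackageChecksum", "SHA1: " + sha1),
            ("PackageLicenseConcluded", concluded_license),
        ]
        return "\n".join("%s: %s" % (tag, value) for tag, value in fields)

    print(spdx_package_record("frobnicator",
                              "http://example.org/frobnicator-1.0.tar.gz",
                              "GPL-2.0+",
                              "da39a3ee5e6b4b0d3255bfef95601890afd80709"))

An SPDX document for a real package would typically also enumerate per-file license and copyright findings alongside the package-level record.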

SPDX is also in use by a few other source analysis tools, such as the Ninka scanner and the commercial tools used by Black Duck Software. The specification is written by a Linux Foundation workgroup, which is currently drafting a new revision.

What SPDX support brings is the ability to use FOSSology data in conjunction with other tools that share a common file format. The license-compliance problem is no longer one that organizations can ignore. Last week, Harald Welte won a GPL infringement case in Germany in which the court held that the violator had to ascertain on its own that it was in compliance with the licensing requirements it inherited from upstream suppliers. In other words, even if a device maker contracts out the software to a third party, it is still required to verify that the source code it offers in compliance with the GPL actually corresponds to the software on the device. For a device maker that does not do development itself, that could be a tricky undertaking. But with independent tools able to report licensing information in a compatible format, the problem becomes easier (although still not trivial) to solve.

For its part, FOSSology has adopted SPDX's names for the licenses already on its list of recognized licenses, and the 2.2.0 release notes comment that the application also added support for a few SPDX licenses not previously recognized by its license agent. FOSSology is most certainly a specialist's tool at this stage, but the refactoring that went into the 2.x series may make it useful for a wider variety of applications, if developers write scanning modules of their own to look for interesting nuggets buried in the source code. There was a one-year wait between version 1.4 and 2.0, but in the year since, the project has picked up the pace and delivered two stable releases with functional additions. Hopefully, that signals a platform that more developers will wish to contribute to. After all, the free software community is (justifiably) nitpicky where licenses and copyrights are concerned, but there are far more potentially useful bits of information to glean from a corpus of source code, given the proper tool to find them.

Comments (1 posted)

Brief items

Quote of the week

A friend asked yesterday if I knew of a tool to print a web page as a single-page PDF, i.e., making the PDF page as tall as necessary to keep everything on one page.

As a result, I know that Obnam's bug list is 4915 mm tall.

-- Lars Wirzenius

We want to thank all our loyal fans.
-- Google, after shutting down Google Reader.

Comments (none posted)

Qt 5.1 released

Version 5.1 of the Qt toolkit has been announced. "We have added many new modules that largely extend the functionality offered in 5.0. The new Qt Quick Controls and Qt Quick Layouts modules finally offer ‘widgets’ for Qt Quick. They contain a set of fully functional controls and layout items that greatly simplify the creation of Qt Quick based user interfaces."

Comments (1 posted)

Upstart 1.9 released

Version 1.9 of the Upstart init-replacement has been released. This version adds support for AppArmor through two new stanzas, adds a stateful re-exec, and allows inherited environment variables to be unset for Session inits. In addition, a new D-Bus signal bridge has been added, as has a client library (libupstart) through which applications can communicate with Upstart.

Full Story (comments: none)

systemd 205 available

A new version of the systemd init-replacement has been released. Version 205 includes "a number of major new concepts, such as transient units, scopes and slices, which turn systemd into something that is far more dynamic than it ever was," adds a new systemd-run tool, and is the first release in which systemd assumes management of control groups (cgroups).

Full Story (comments: none)

GNUstep Objective-C Runtime 1.7 available

Version 1.7 of the GNUstep Objective-C runtime has been released. Changes include the move to a CMake-based build system, a CTest-based test suite, and significant improvements in property introspection. The test suite itself has also been improved, as has integration with libdispatch and with foreign exceptions (e.g., exceptions from C++). Finally, MIPS64 is now supported in the assembly routines.

Full Story (comments: none)

Rust 0.7 released

Version 0.7 of the Rust language is out. "This release had a markedly different focus from previous releases, with fewer language changes and many improvements to the standard library. The highlights this time include a rewrite of the borrow checker that makes working with borrowed pointers significantly easier and a comprehensive new iterator module (std::iterator) that will eventually replace the previous closure-based iterators." See the release notes for details.

Comments (none posted)

Newsletters and articles

Development newsletters from the past week

Comments (none posted)

Akademy 2013 Keynote: Jolla's Vesa-Matti Hartikainen (KDE.News)

KDE.News has an interview with Jolla engineer Vesa-Matti Hartikainen who will be giving a keynote at KDE's Akademy conference in mid-July. The interview covers various topics, from the history of Jolla (and how it came out of MeeGo and the N9 Nokia phone efforts) to the use of Qt in Jolla's Sailfish OS. "For Jolla, Qt is a first class citizen. For developing apps using QML, we put a lot of effort into making them as good as possible. We have an awesome team working on the Sailfish Silica component set. It includes many of the original core developers of the QML language and runtime. And we have really experienced app developers from N9 and other Nokia projects. On the middleware level, a lot of the lower level APIs now have quite good Qt bindings for C++ developers."

Comments (none posted)

Swift: The Easy Scripting Language for Parallel Computing (Linux.com)

Linux.com introduces the Swift parallel scripting language. It may offer some assistance in solving the parallel programming problems noted by Andreas Olofsson in his keynote at this year's Linux Foundation Collaboration Summit. "Swift plays a simple but 'pervasively parallel' coordination role to create the upper level logic of more complex applications, [Argonne National Laboratory and the University of Chicago's Michael] Wilde said. 'It makes it very easy to parallelize what we often call the "outer loops".' Highly parallel applications can thus be composed by gluing together serial algorithms because Swift creates the parallelism automatically at runtime, without explicit direction from the programmer. It does this by first encapsulating the applications that are called within a script as 'functions' with uniform interfaces, and then applying automatic data flow, he said."
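
Swift has its own C-like script syntax, but the pattern Wilde describes can be loosely illustrated in ordinary Python: serial programs are wrapped as functions with uniform interfaces, and only the outer loop over independent inputs is parallelized. The ./simulate binary and the file names below are hypothetical, and this is an analogy rather than Swift code:

    # Loose analogy in plain Python, not Swift: serial external programs are
    # wrapped as functions with uniform interfaces, and only the outer loop
    # over independent inputs runs in parallel.  The ./simulate binary and
    # the file names are hypothetical.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def simulate(input_file, output_file):
        """Wrap one run of a serial program as a function."""
        subprocess.check_call(["./simulate", input_file, "-o", output_file])
        return output_file

    inputs = ["run%03d.in" % i for i in range(100)]
    outputs = [name.replace(".in", ".out") for name in inputs]

    # Each call is still an ordinary serial program; only the loop is parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(simulate, inputs, outputs))

In Swift itself the data-flow analysis is automatic, so the script's author never writes the equivalent of the thread pool at all.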

Comments (4 posted)

Nemeth: How Google pulled the plug on the public Jabber Network

At his blog, Adam Nemeth has harsh words to share about Google's recent decision to move away from the XMPP instant messaging protocol. Specifically, he criticizes XMPP itself: "Jabber failed to provide good enough spam protection, failed to provide a scalable protocol, failed to provide easy transfer of accounts between providers (if I change e-mail address, I don't have to re-add all my friends, it's enough to set a simple forward or inbox pulling - that's not true for Jabber IDs!)." The result, he argues, was that client application developers never found the protocol all that compelling.

Comments (94 posted)

Heilmann: The Fox is out of the bag #FirefoxOS

On his blog, FirefoxOS developer Christian Heilmann reflects on why he is excited about the phone operating system in light of the announcement of the first FirefoxOS smartphones. One of five things he highlights: "FirefoxOS does not assume a fast, stable and always available connection. When traveling I start hating my Android phone which I love to bits otherwise. Having dozens of megabyte updates over roaming is out of the question and neither is using flaky and slow wireless connections. Firefox OS has no native apps – all of them, including the system apps are written in HTML, CSS and JavaScript. Thus they are much smaller and can have atomic updates instead of having to be replaced as a unit every single time."

Comments (46 posted)

Page editor: Nathan Willis

Announcements

Brief items

AMD joins The Document Foundation advisory board

The Document Foundation has announced that AMD has joined its advisory board and will be working to support the acceleration of LibreOffice on its processors using the HSA architecture. "HSA is an innovative computing architecture that enables CPU, GPU and other processors to work together in harmony on a single piece of silicon by seamlessly moving the right tasks to the best suited processing element. This makes it possible for larger, more complex applications to take advantage of the power that has traditionally been reserved for more focused tasks. While the biggest impact will be for AMD APU users, supporting benefits of the work will improve the LibreOffice core data structures enabling larger spreadsheets to calculate faster for all users."

Full Story (comments: 10)

Bruce Schneier Joins EFF Board of Directors

The Electronic Frontier Foundation has announced that security expert Bruce Schneier has joined its Board of Directors. "Schneier is widely acclaimed for his criticism and commentary on everything from network security to national security. His insight is particularly important as we learn more and more about the unconstitutional surveillance programs from the National Security Agency and the depth and breadth of data the NSA is collecting on the public."

Full Story (comments: none)

Doug Engelbart RIP

GigaOM is reporting that Doug Engelbart, famous as the inventor of the mouse, has passed away. Another pioneer is gone.

Comments (1 posted)

Articles of interest

Free Software Supporter, Issue 63

The June edition of the Free Software Foundation newsletter covers a statement on the PRISM revelations, an FSF-certified device from ThinkPenguin, MediaGoblin 0.4.0, LibreWRT, and several other topics.

Full Story (comments: none)

IRS Puts Open Source Projects Under Microscope, Spawns Nonprofit Black Hole (Wired)

Wired looks at delays for open-source-oriented groups in getting their applications for non-profit status accepted—or denied—by the US Internal Revenue Service (IRS). Open source is on a list of organization types that require extra scrutiny from the IRS—"Tea Party" groups making that list have been in the news over the last month or so. "That has provided the documentary evidence for a phenomenon that many open source project leaders know all too well: For the past four years, it's been close to impossible to get an open source project approved for 501(c)(3) classification — a nonprofit status that allows supporters to make tax-exempt donations to the organization. Take the Open Source Geospatial Foundation, which builds open-source mapping software called OSGeo. It first applied for 501(c)(3) status more than five years ago, according to Tyler Mitchell, the former executive director of the foundation. 'It's not resolved today,' he says. 'You'll just keep thinking that it will be resolved in a couple of weeks. It never will be.'"

Comments (29 posted)

Unix luminary among seven missing at sea (The Register)

The Register reports that the schooner Nina, carrying Evi Nemeth and others, has been lost at sea. "One of the shining lights of the world of Unix, retired CU professor Evi Nemeth, is among a group of sailors missing at sea near New Zealand. The author of system administration tomes covering both Unix and Linux – and, incidentally a mathematician of sufficient quality to identify problems with Diffie-Helman encryption – has spent much of her retirement sailing."

Comments (15 posted)

FSFE: German Parliament elections: The parties' positions on Free Software

The Free Software Foundation Europe has published its Free Software related election questions for this fall's elections to the German parliament. "First, something pleasant: SPD, the Greens, the Pirate party, the Linke and the Free Voters want software where development was funded by the public administration to be published under a free licence."

Full Story (comments: none)

Calls for Presentations

CFP Deadlines: July 4, 2013 to September 2, 2013

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event Dates | Event | Location
July 6 | September 23-27 | Tcl/Tk Conference | New Orleans, LA, USA
July 8 | October 21-23 | Open Source Developers Conference | Auckland, New Zealand
July 15 | August 16-18 | PyTexas 2013 | College Station, TX, USA
July 15 | October 22-24 | Hack.lu 2013 | Luxembourg, Luxembourg
July 19 | October 23-25 | Linux Kernel Summit 2013 | Edinburgh, UK
July 20 | January 6-10 | linux.conf.au | Perth, Australia
July 21 | October 21-23 | KVM Forum | Edinburgh, UK
July 21 | October 21-23 | LinuxCon Europe 2013 | Edinburgh, UK
July 21 | October 19 | Central PA Open Source Conference | Lancaster, PA, USA
July 22 | September 19-20 | Open Source Software for Business | Prato, Italy
July 25 | October 22-23 | GStreamer Conference | Edinburgh, UK
July 28 | October 17-20 | PyCon PL | Szczyrk, Poland
July 29 | October 28-31 | 15th Real Time Linux Workshop | Lugano, Switzerland
July 29 | October 29-November 1 | PostgreSQL Conference Europe 2013 | Dublin, Ireland
July 31 | November 5-8 | OpenStack Summit | Hong Kong, Hong Kong
July 31 | October 24-25 | Automotive Linux Summit Fall 2013 | Edinburgh, UK
August 7 | September 12-14 | SmartDevCon | Katowice, Poland
August 15 | August 22-25 | GNU Hackers Meeting 2013 | Paris, France
August 18 | October 19 | Hong Kong Open Source Conference 2013 | Hong Kong, China
August 19 | September 20-22 | PyCon UK 2013 | Coventry, UK
August 21 | October 23 | TracingSummit2013 | Edinburgh, UK
August 22 | September 25-27 | LibreOffice Conference 2013 | Milan, Italy
August 30 | October 24-25 | Xen Project Developer Summit | Edinburgh, UK
August 31 | October 26-27 | T-DOSE Conference 2013 | Eindhoven, Netherlands
August 31 | September 24-25 | Kernel Recipes 2013 | Paris, France
September 1 | November 18-21 | 2013 Linux Symposium | Ottawa, Canada

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

openSUSE Conference Sponsors Announced

The openSUSE Conference team has announced the sponsors for the openSUSE Conference 2013, which takes place July 18-22 in Thessaloniki, Greece. Sponsors include SUSE Linux GmbH, ARM, DevHdR, Oracle, and more.

Full Story (comments: none)

LPI Hosts Exam Labs at openSUSE 2013 Conference

The Linux Professional Institute will host an exam lab at the openSUSE Conference in Thessaloniki, Greece on July 20.

Full Story (comments: none)

Events: July 4, 2013 to September 2, 2013

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
July 1-5 | Workshop on Dynamic Languages and Applications | Montpellier, France
July 1-7 | EuroPython 2013 | Florence, Italy
July 2-4 | OSSConf 2013 | Žilina, Slovakia
July 3-6 | FISL 14 | Porto Alegre, Brazil
July 5-7 | PyCon Australia 2013 | Hobart, Tasmania
July 6-11 | Libre Software Meeting | Brussels, Belgium
July 8-12 | Linaro Connect Europe 2013 | Dublin, Ireland
July 12-14 | GNU Tools Cauldron 2013 | Mountain View, CA, USA
July 12-14 | 5th Encuentro Centroamerica de Software Libre | San Ignacio, Cayo, Belize
July 12 | PGDay UK 2013 | near Milton Keynes, England, UK
July 13-19 | Akademy 2013 | Bilbao, Spain
July 15-16 | QtCS 2013 | Bilbao, Spain
July 18-22 | openSUSE Conference 2013 | Thessaloniki, Greece
July 22-26 | OSCON 2013 | Portland, OR, USA
July 27-28 | PyOhio 2013 | Columbus, OH, USA
July 27 | OpenShift Origin Community Day | Mountain View, CA, USA
July 31-August 4 | OHM2013: Observe Hack Make | Geestmerambacht, the Netherlands
August 1-8 | GUADEC 2013 | Brno, Czech Republic
August 3-4 | COSCUP 2013 | Taipei, Taiwan
August 6-8 | Military Open Source Summit | Charleston, SC, USA
August 7-11 | Wikimania | Hong Kong, China
August 9-13 | PyCon Canada | Toronto, Canada
August 9-11 | XDA:DevCon 2013 | Miami, FL, USA
August 9-12 | Flock - Fedora Contributor Conference | Charleston, SC, USA
August 11-18 | DebConf13 | Vaumarcus, Switzerland
August 12-14 | YAPC::Europe 2013 “Future Perl” | Kiev, Ukraine
August 16-18 | PyTexas 2013 | College Station, TX, USA
August 22-25 | GNU Hackers Meeting 2013 | Paris, France
August 23-24 | Barcamp GR | Grand Rapids, MI, USA
August 24-25 | Free and Open Source Software Conference | St.Augustin, Germany
August 30-September 1 | Pycon India 2013 | Bangalore, India

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds