
LWN.net Weekly Edition for November 15, 2012

LCE: The failure of operating systems and how we can fix it

By Michael Kerrisk
November 14, 2012

The abstract of Glauber Costa's talk at LinuxCon Europe 2012 started with the humorous note "I once heard that hypervisors are the living proof of operating system's incompetence". Glauber acknowledged that hypervisors have indeed provided a remedy for certain deficiencies in operating system design. But the goal of his talk was to point out that, for some cases, containers may be an even better remedy for those deficiencies.

Operating systems and their limitations

Because he wanted to illustrate the limitations of traditional UNIX systems that hypervisors and containers have been used to address, Glauber commenced with a recap of some operating system basics.

In the early days of computing, a computer ran only a single program. The problem with that mode of operation is that valuable CPU time was wasted when the program was blocked because of I/O. So, Glauber noted "whatever equivalent of Ingo Molnar existed back then wrote a scheduler" in order that the CPU could be shared among processes; thus, CPU cycles were no longer wasted when one process blocked on I/O.

A later step in the evolution of operating systems was the addition of virtual memory, so that (physical) memory could be more efficiently allocated to processes and each process could operate under the illusion that it had an isolated address space.

However, nowadays we can see that the CPU scheduling and virtual memory abstractions have limitations. For example, suppose you start a browser or another program that uses a lot of memory. As a consequence, the operating system will likely start paging out memory from processes. However, because the operating system makes memory-management decisions at a global scope, typically employing a least recently used (LRU) algorithm, excessive memory use by one process can easily cause the pages of other, unrelated processes to be evicted.

There is an analogous problem with CPU scheduling. The kernel allocates CPU cycles globally across all processes on the system. Processes tend to use as much CPU as they can. There are mechanisms to influence or limit CPU usage, such as setting the nice value of a process to give it a relatively greater or lesser share of the CPU. But these tools are rather blunt. The problem is that while it is possible to control the priority of individual processes, modern applications employ groups of processes to perform tasks. Thus, an application that creates more processes will receive a greater share of the CPU. In theory, it might be possible to address that problem by dynamically adjusting process priorities, but in practice this is too difficult, since processes may come and go quite quickly.
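
For what it's worth, those traditional knobs look something like the following minimal sketch (the PID is, of course, hypothetical):

    $ nice -n 10 make            # start a build at a lower priority
    $ renice +5 -p 1234          # reduce the priority of an already-running process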

The other side of the resource-allocation problem is denial-of-service attacks. With traditional UNIX systems, local denial-of-service attacks are relatively easy to perpetrate. As a first example, Glauber gave the following small script:

    $ while true; do mkdir x; cd x; done

This script will create a directory structure that is as deep as possible. Each subdirectory "x" will create a dentry (directory entry) that is pinned in non-reclaimable kernel memory. Such a script can potentially consume all available memory before filesystem quotas or other filesystem limits kick in, and, as a consequence, other processes will not receive service from the kernel because kernel memory has been exhausted. (One can monitor the amount of kernel memory being consumed by the above script via the dentry entry in /proc/slabinfo.)
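
As a rough illustration, one could watch the dentry slab grow while such a script runs with something like the following (the exact /proc/slabinfo column layout varies between kernel versions, and reading the file may require root privileges):

    # watch -n1 'grep "^dentry " /proc/slabinfo'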

Fork bombs create a similar kind of problem that affects unrelated processes on the system. As Glauber noted, when an application abuses system resources in these ways, then it should be the application's problem, rather than being everyone's problem.

Hypervisors

Hypervisors have been the traditional solution to the sorts of problems described above; they provide the resource isolation that is necessary to prevent those problems.

By way of an example of a hypervisor, Glauber chose KVM. Under KVM, the Linux kernel is itself the hypervisor. That makes sense, Glauber said, because all of the resource isolation that should be done by the hypervisor is already done by the operating system. The hypervisor has a scheduler, as does the kernel. So the idea of KVM is to simply re-use the Linux kernel's scheduler to schedule virtual machines. The hypervisor has to manage memory, as does the kernel, and so on; everything that a hypervisor does is also part of the kernel's duties.

There are many use cases for hypervisors. One is simple resource isolation, so that, for example, one can run a web server and a mail server on the same physical machine without having them interfere with one another. Another use case is to gather accurate service statistics. Thus, for example, the system manager may want to run top in order to obtain statistics about the mail server without seeing the effect of a database server on the same physical machine; placing the two servers in separate virtual machines allows such independent statistics gathering.

Hypervisors can be useful in conjunction with network applications. Since each virtual machine has its own IP address and port number space, it is possible, for example, to run two different web servers that each use port 80 inside different virtual machines. Hypervisors can also be used to provide root privilege to a user on one particular virtual machine. That user can then do anything they want on that virtual machine, without any danger of damaging the host system.

Finally, hypervisors can be used to run different versions of Linux on the same system, or even to run different operating systems (e.g., Linux and Windows) on the same physical machine.

Containers

Glauber noted that all of the above use cases can be handled by hypervisors. But, what about containers? Hypervisors handle these use cases by running multiple kernel instances. But, he asked, shouldn't it be possible for a single kernel to satisfy many of these use cases? After all, the operating system was originally designed to solve resource-isolation problems. Why can't it go further and solve these other problems as well by providing the required isolation?

From a theoretical perspective, Glauber asked, should it be possible for the operating system to ensure that excessive resource usage by one group of processes doesn't interfere with another group of processes? Should it be possible for a single kernel to provide resource-usage statistics for a logical group of processes? Likewise, should the kernel be able to allow multiple processes to transparently use port 80? Glauber noted that all of these things should be possible; there's no theoretical reason why an operating system couldn't support all of these resource-isolation use cases. It's simply that, historically, operating systems were not built with these requirements in mind. The only notable use case above that couldn't be satisfied is for a single kernel to run a different kernel or operating system.

The goal of containers is, of course, to add the missing pieces that allow a kernel to support all of the resource-isolation use cases, without the overhead and complexity of running multiple kernel instances. Over time, various patches have been made to the kernel to add support for isolation of various types of resources; further patches are planned to complete that work. Glauber noted that although all of those kernel changes were made with the goal of supporting containers, a number of other interesting uses had already been found (some of these were touched on later in the talk).

Glauber then looked at some examples of the various resource-isolation features ("namespaces") that have been added to the kernel. Glauber's first example was network namespaces. A network namespace provides a private view of the network for a group of processes. The namespace includes private network devices and IP addresses, so that each group of processes has its own port number space. Network namespaces also make packet filtering easier, since each group of processes has its own network device.
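
As a minimal sketch of what that looks like from user space, the ip utility from a reasonably recent iproute2 can create and use such a namespace (the namespace name here is arbitrary):

    # ip netns add testns                 # create a new network namespace
    # ip netns exec testns ip link set lo up
    # ip netns exec testns ip addr show   # only this namespace's devices are visible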

Mount namespaces were one of the earliest namespaces added to the kernel. The idea is that a group of processes should see an isolated view of the filesystem. Before mount namespaces existed, some degree of isolation was provided by the chroot() system call, which could be used to limit a process (and its children) to a part of the filesystem hierarchy. However, the chroot() system call did not change the fact that the hierarchical relationship of the mounts in the filesystem was global to all processes. By contrast, mount namespaces allow different groups of processes to see different filesystem hierarchies.

User namespaces provide isolation of the "user ID" resource. Thus, it is possible to create users that are visible only within a container. Most notably, user namespaces allow a container to have a user that has root privileges for operations inside the container without being privileged on the system as a whole. (There are various other namespaces in addition to those that Glauber discussed, such as the PID, UTS, and IPC namespaces. One or two of those namespaces were also mentioned later in the talk.)

Control groups (cgroups) provide the other piece of infrastructure needed to implement containers. Glauber noted that cgroups have received a rather negative response from some kernel developers, but he thinks that somewhat misses the point: cgroups have some clear benefits.

A cgroup is a logical grouping of processes that can be used for resource management in the kernel. Once a cgroup has been created, processes can be migrated in and out of the cgroup via a pseudo-filesystem API (details can be found in the kernel source file Documentation/cgroups/cgroups.txt).
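
A minimal sketch of that pseudo-filesystem API follows; mount points and controller names vary between distributions, and many systems mount the cgroup hierarchies automatically:

    # mkdir -p /sys/fs/cgroup/cpu
    # mount -t cgroup -o cpu none /sys/fs/cgroup/cpu    # skip if the controller is already mounted
    # mkdir /sys/fs/cgroup/cpu/mygroup                  # create a child cgroup
    # echo $$ > /sys/fs/cgroup/cpu/mygroup/tasks        # migrate the current shell into it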

Resource usage within cgroups is managed by attaching controllers to a cgroup. Glauber briefly looked at two of these controllers.

The CPU controller mechanism allows a system manager to control the percentage of CPU time given to a cgroup. The CPU controller can be used both to guarantee that a cgroup receives a minimum percentage of CPU time on the system, regardless of other load, and to set an upper limit on the amount of CPU time used by the cgroup, so that a rogue process can't consume all of the available CPU time. CPU scheduling is done first at the cgroup level, and then across the processes within each cgroup. As with some other controllers, CPU cgroups can be nested, so that the percentage of CPU time allocated to a top-level cgroup can be further subdivided across cgroups under that top-level cgroup.
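
For example, reusing the cgroup from the earlier sketch, the relevant knobs look roughly like this (the cpu.cfs_quota_us and cpu.cfs_period_us files assume a kernel with CFS bandwidth control, which was merged in Linux 3.2):

    # echo 512 > /sys/fs/cgroup/cpu/mygroup/cpu.shares             # relative weight (default is 1024)
    # echo 100000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_period_us   # 100ms accounting period
    # echo 50000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_quota_us     # hard cap at 50% of one CPU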

The memory controller mechanism can be used to limit the amount of memory that a process uses. If a rogue process runs over the limit set by the controller, the kernel will page out that process, rather than some other process on the system.
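
A similar sketch for the memory controller, again assuming a conventional cgroup mount and using an arbitrary limit value:

    # mkdir -p /sys/fs/cgroup/memory/mygroup
    # echo 256M > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
    # echo $$ > /sys/fs/cgroup/memory/mygroup/tasks    # processes started from this shell inherit the limit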

The current status of containers

It is possible to run production containers today, Glauber said, but not with the mainline kernel. Instead, one can use the modified kernel provided by the open source OpenVZ project that is supported by Parallels, the company where Glauber is employed. Over the years, the OpenVZ project has been working on upstreaming all of its changes to the mainline kernel. By now, much of that work has been done, but some still remains. Glauber hopes that within a couple of years ("I would love to say months, but let's get realistic") it should be possible to run a full container solution on the mainline kernel.

But, by now, it is already possible to run subsets of container functionality on the mainline kernel, so that some people's use cases can already be satisfied. For example, if you are interested in just CPU isolation, in order to limit the amount of CPU time used by a group of processes, that is already possible. Likewise, the network namespace is stable and well tested, and can be used to provide network isolation.

However, Glauber said, some parts of the container infrastructure are still incomplete or need more testing. For example, fully functional user namespaces are quite difficult to implement. The current implementation is usable, but not yet complete, and consequently there are some limitations to its usage. Mount and PID namespaces are usable, but likewise still have some limitations. For example, it is not yet possible to migrate a process into an existing instance of either of those namespaces; that is a desirable feature for some applications.

Glauber noted some of the kernel changes that are still to be merged to complete the container implementation. Kernel memory accounting is not yet merged; that feature is necessary to prevent exploits (such as the dentry example above) that consume excessive kernel memory. Patches to allow kernel-memory shrinkers to operate at the level of cgroups are also still to be merged. Filesystem quotas that operate at the level of cgroups remain to be implemented; thus, it is not yet possible to specify quota limits on a particular user inside a user namespace.

There is already a wide range of tooling in place that makes use of container infrastructure, Glauber said. For example, the libvirt library makes it possible to start up an application in a container. The OpenVZ vzctl tool is used to manage full OpenVZ containers. It allows for rather sophisticated management of containers, so that it is possible to do things such as running containers using different Linux distributions on top of the same kernel. And "love it or hate it, systemd uses a lot of the infrastructure". The unshare command can be used to run a command in a separate namespace. Thus, for example, it is possible to fire up a program that operates in an independent mount namespace.
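
A rough example of that last case, using the unshare command from util-linux (root privileges are required here, since creating a mount namespace is a privileged operation in the absence of user namespaces):

    # unshare --mount /bin/bash        # start a shell in a new mount namespace
    # mount -t tmpfs none /mnt         # this mount is invisible to the rest of the system
    # exit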

Glauber's overall point is that containers can already be used to satisfy several of the use cases that have historically been served by hypervisors, with the advantages that containers don't require the creation of separate full-blown virtual machines and provide much finer granularity when controlling what is or is not shared between the processes inside the container and those outside the container. After many years of work, there is by now a lot of container infrastructure that is already useful. One can only hope that Glauber's "realistic" estimate of two years to complete the upstreaming of the remaining container patches proves accurate, so that complete container solutions can at last be run on top of the mainline kernel.


LCE: All watched over by machines of loving grace

By Michael Kerrisk
November 14, 2012

Karsten Gerloff is the current president of the Free Software Foundation Europe, the European sister organization of the FSF. He began his LinuxCon Europe 2012 talk, entitled "All watched over by machines of loving grace", by focusing on a number of technological trends. However, he prefaced that by saying that what he really wanted to focus on were the more interesting topics of society, power, and control.

Technological advances

The number of computers in our lives is increasing. Karsten noted that he could count at least 17 computers in his home: in his camera, freezer, alarm clock, network router, car IVI system, laptop, and so on. All of those computers can in principle perform any computable task, but, in many cases, the software turns them into appliances.

At the same time as the number of computers in our lives has increased, the cost of communication has plummeted, and the range of knowledge to which we have access has vastly increased. But it is not so long since things were very different. To illustrate these points, Karsten drew a couple of examples from his own life. In 1994, he went as an exchange student from Germany to a high school in the US. The following Christmas, his girlfriend asked her parents for just one thing in lieu of all other presents: a 30-minute phone call to her boyfriend. By contrast, today we think nothing of firing up a VOIP call to almost anywhere in the world.

At university in the 1990s, when Karsten wanted to learn about a new subject, he went to the library. Where the resources of the library ended, so too did his research, more or less. Beyond that, the best he might attempt was to venture to a university library in another city, or request a book or two on inter-library loan, gambling that it might be relevant to his research. By contrast, today we start our research on a new topic by going to Wikipedia or a search engine, and increasingly, cutting-edge information appears first on the net.

Karsten noted that these huge changes in the cost of communication and accessibility of information are based on two powerful tools: general-purpose computers that will do anything we teach them to do, and general-purpose networks that will transmit whatever we want.

Restricted devices and products

However, the technological advances described above are under threat from those who see profit in turning our general-purpose computers into limited appliances, or into devices that are controlled by someone other than the owner. So, Karsten says, when we approach a service or device, we need to ask: what can we do with this? For example, can I make the powerful computer in my phone do something it was not intended to do?

Restrictions on functionality are often added when marketing gets involved in the product-design cycle. At this point, product features that get in the way of business goals are eliminated. Here, Karsten mentioned a couple of examples. All digital cameras produce raw image output. However, after converting that format to JPEG, low-end cameras then discard the raw data. Photographers who want the option of keeping the raw data must instead purchase a more expensive "professional" camera that doesn't discard the raw data. In the telecoms world, mobile phone operators often try to block VOIP over their data services, in an effort to force their customers to make and to pay for calls over the operator's own service.

These sorts of marketing-driven restrictions are very much against our interests, because the kinds of technological leaps described above—the fall in the cost of sending information, the increase in speed of sending information, and the consequent increase in the amount of information that we have access to—were only possible because someone took a general-purpose computer and connected it to a general-purpose network, and made it do something that no one had thought of before. Allowing this sort of generality of operation paves the way for innovations that are often unforeseen. Thus, when Bob Kahn and Vint Cerf devised TCP in 1974, they didn't think of the world wide web, but they designed TCP in a way that allowed the web to happen. Similarly, Tim Berners-Lee didn't conceive of Wikipedia when he designed the World Wide Web, but the Web was designed in a way that allowed Wikipedia to happen.

A restricted device—a smart phone that can't make VOIP calls, for example—is inconvenient. But the real cost of these sorts of restrictions is the limitations they place on our ability to create and innovate, to come up with the next big idea. We thereby lose opportunities to improve the world, and, Karsten noted, in a world where thousands of people die of hunger and preventable diseases each day, that's a cost we can't afford.

Free software and the forces against it

Given unrestricted hardware and networks, how do we implement our next big ideas? That is, of course, where free software comes in. Free software is powerful, Karsten said, because it allows us to share and reuse work, and to work across physical, geographical, and company boundaries. Everyone working on free software has their own purpose, but all benefit from the work. By being accessible and promoting a spirit of investigation, free software lets us control the technology we use, rather than vice versa.

However, there are forces that work against free software innovation. Karsten looked at some current and notable examples of such forces: UEFI secure boot, DRM, patents, and centralized control of information.

UEFI is the successor to BIOS. UEFI's secure boot protocol, or "restricted boot" as Karsten likes to call it, is a boot mechanism whereby UEFI checks that a boot loader is signed by a recognized key; if the check fails, the machine will not boot. Secure boot provides a way for the person who does the authorizing to control what software you install on your machine. Of course, in this case, the authorizer is Microsoft. Hardware vendors that want to sell computers with a "Windows compatible" logo must comply with the rules that Microsoft defines (and can change).

For example, Microsoft says that vendors of Intel PCs must provide a mechanism for users to disable secure boot. But hardware vendors that want to use the "Windows compatible" logo on an ARM device are not allowed to provide a mechanism to disable UEFI secure boot. Thus, since Windows phones started shipping in October, millions of restricted computers are flooding onto the market every day. (Of course, this is in addition to the millions of Android and iPhone devices that are already locked down via vendor-specific equivalents of secure boot.) Returning to his earlier point that the real test of freedom is whether you can make your computer do something that was unintended by the manufacturer, Karsten said that if you buy a Windows phone (or another restricted phone) then you don't own it, in the sense of having a general-purpose computing device.

DRM (Digital Rights Management, or as Karsten prefers, "Digital Restrictions Management") started as the music industry's attempt to preserve a business model. But it has crept elsewhere, into devices such as the Kindle. The Kindle is a powerful tablet device that thanks to DRM has been turned into, essentially, a digital shopping catalog. "Once upon a time, companies used to send me catalogs at their expense. Nowadays, I'm being asked to buy the catalog. I'm not sure it's a good deal."

Karsten then turned his attention to patents and, in particular, the strategies of some companies that are acquiring large numbers of patents. He began with the observation that he really likes 3D printers because "I hope they'll do for manufacturing what free software did for computing: put control back in our hands." A few weeks ago someone obtained a patent on a DRM system for 3D printers. That patent is like other DRM patents: a patent for a piece of software in the printer that checks with a server to see if the hash of the to-be-printed file is in the set of files that the user is allowed to print; if it is not, then the file is not printed.

Who obtained the patent? A company called Intellectual Ventures, which was cofounded by Nathan Myhrvold, former CTO of Microsoft. Intellectual Ventures is a company with a reputation: "Calling Intellectual Ventures a patent troll would be like calling the Atlantic Ocean a puddle." The company is rather secretive, so information about its workings is hard to come by. However, some researchers recently published a paper entitled The Giants Among Us that pulled together all of the information that they could find. By now, Intellectual Ventures controls tens of thousands of patents. The company's strategy is to monetize those patents in any way it can. Sometimes that is done by licensing the patents to people who make things, but more commonly it is done by "extorting" money from people who make something without asking Intellectual Ventures's permission first. The researchers identified around 1300 shell companies operated by Intellectual Ventures (but they suspect they haven't found them all).

Intellectual Ventures pursues strategies such as threatening people with patent litigation, and demanding that companies pay money to avoid that litigation. The paper notes that "patent mass aggregators" such as Intellectual Ventures are also believed to employ hitherto unusual strategies for acquiring patents—for example, offering universities in developing countries contracts where, in exchange for cash, the university gives the company rights on all innovations that the universities create for the next N years.

In short, Intellectual Ventures is trying to create a monopoly on innovation. And they are not alone: Intellectual Ventures is the largest of the mass aggregators, but there are many other companies now doing the same.

However, Karsten is not without optimism. Patents and patent trolls constitute a powerful threat to innovation and free software. But free software is a powerful opponent, because "we work together and have a large shared interest…we don't acknowledge company or country borders, [and] we're very creative at working around restrictions and eventually beating them." It's Karsten's hope that the current patent system will start shaking at its foundations within five years, and will be breaking down within ten years.

The problem of centralized control

By design, the Internet was built with no central point of control. And on top of that distributed network, we've built distributed, decentralized systems such as email. But the general-purpose nature of our networks and computers is not a given natural order. It can be reversed. And indeed that is what is happening in many areas as companies erect structures that centralize control. Thus "Facebook defines who we are, Google defines what we think, and Amazon defines what we want", because we feed them information, and they place us in a "comfortable bubble" where we no longer see other opinions and cease being curious.

The problem is that those companies will sell us out when it is in their interests to do so. Here, Karsten mentioned the case where Yahoo! surrendered the details of a Chinese blogger to the Chinese government. Most likely what happened was that the Chinese government threatened to exclude Yahoo! from doing business in China. Consequently, Yahoo! provided details identifying the blogger, despite apparently knowing that the blogger had antigovernment sympathies and was therefore at risk of persecution.

But, asked Karsten, this only happens in dictatorships, right? Well, no. In September, Twitter handed over to New York prosecutors messages posted by some of its users who were part of the Occupy Wall Street movement. Twitter originally declined to do so on the grounds of protecting free speech, but a judge then threatened the company with a fine based on a percentage of its earnings. This threat constituted a double blow, since it would have required Twitter, a private company, to actually reveal its earnings. Given a choice between loyalty to private shareholders and loyalty to users, Twitter chose the former.

We can, of course, leave these centralized structures of control. But, we do so much as dissidents left the Soviet Union, leaving behind friends and family. Yes, Karsten remarked, it is all only digital, but there is still some pain in leaving if you have invested part of your life in these structures.

Rather than submitting to centralized control, we can build decentralized structures, running on servers in every home. We already have enough computers to do that. And in fact we already have most of the required software components: it's just a matter of putting them together, which is the goal of the Freedom Box project. (A Freedom Box talk by Bdale Garbee was unfortunately scheduled at the conference at exactly the same time as Karsten's talk.) Here, Karsten listed a number of the tools and protocols that already exist: Diaspora and GNU social for social networking; protocols such as OStatus, WebFinger, and Federated Social Web that allow these services to work together; distributed file storage through ownCloud and GNUnet; user-owned ISPs; Bitcoin as a currency; and distributed search engines such as YaCy. We can replicate (in a decentralized fashion) all the things that we today use in a centralized fashion.

The key to controlling our own future is to master the necessary technologies and skills. Karsten quoted from Douglas Rushkoff's book Program or Be Programmed:

The real question is, do we direct technology, or do we let ourselves be directed by it and those who have mastered it? Choose the former and you gain access to the control panel of civilization. Choose the latter, and it could be the last real choice you get to make.

So, said Karsten, when you see a system or a service, ask yourself: who controls this? If you don't like the answer, don't buy the product or the service. Go and build something better.

The work of the FSFE

Karsten concluded his talk with a few words about the activities of the FSFE. Like its sister organizations (FSF, FSF Latin America, FSF India), FSFE is a non-profit organization that engages in various activities to support free software. The organization does a lot of work in politics, trying to get rid of bad laws and to get good laws made. It runs campaigns to increase people's awareness of free software. For example, it is currently running a "Free your Android" campaign that shows people how to put a freer version of Android on their devices. (Interestingly, the campaign is getting a lot of interest from people who characterize themselves as non-technical, but who are concerned about where their data is going.)

Karsten extended an invitation to support the FSFE. There are many ways to do this, from simply signing up as a supporter to more actively engaging as a Fellow of the foundation. For individuals and companies that have the resources, financial donations are of course useful. "But, more important than this, stop taking the road of least resistance. Start doing things that make yourselves more free."

[For those who are curious, the title of Karsten's talk comes from the title poem in a book of poetry by the American writer Richard Brautigan, probably by way of a BBC documentary TV series of the same name. Both the book and the TV series have resonances for the topic of the talk. Brautigan licensed the poems to be freely reprinted in books and newspapers if they are given away for free. Wikipedia notes that the TV series "argues that computers have failed to liberate humanity and instead have distorted and simplified our view of the world around us".]


RTS and the GPL

By Jake Edge
November 14, 2012

GPL violations, or allegations thereof, are typically handled in private—unless, of course, they get to the lawsuit stage. That makes it rather surprising to see a GPL violation accusation on the linux-kernel mailing list. The code in question is the LIO SCSI target implementation, which was merged for 3.1 after a prolonged struggle with another contender; the alleged violator is Rising Tide Systems (RTS), which was the original developer and contributor of LIO. As might be guessed, the dispute is rather messy; there are more questions than answers.

The opening salvo was from Andy Grover to SCSI target maintainer (and RTS CTO) Nicholas Bellinger, with a copy to linux-kernel: "Your company appears to be shipping kernel features in RTS OS that are not made available under the GPL, specifically support for the EXTENDED_COPY and COMPARE_AND_WRITE SCSI commands, in order to claim full Vmware vSphere 5 VAAI support." Bellinger, however, didn't quite see things that way. He maintains that RTS owns the copyright to the code it contributed to Linux, so it can license it to its customers any way that it wants:

In fact, we are not violating GPL. In short, this is because we wrote the code you are referring to (the SCSI target core in our commercial RTS OS product), we have exclusive copyright ownership of it, and this code contains no GPL code from the community. GPL obligations only apply downstream to licensees, and not to the author of the code. Those who use the code under GPL are subject to its conditions; we are not.

But there is a wrinkle here. RTS is shipping a Linux kernel, along with its proprietary SCSI target module, as RTS OS. Proprietary kernel modules have always been controversial, but they have largely been ignored when they are descended from another operating system's driver (e.g. NVIDIA) and are not distributed with the kernel. If that doesn't hold, it could become a bigger problem, as Dave Airlie pointed out:

But if you distribute a kernel and a module in one place which I assume RTS OS does, then you are in dangerous territory and could be hit with cease and desist notices, which has happened to people shipping kernels and linked nvidia binary drivers before.

In response, Bellinger made a common mistake regarding the kernel's symbol export macros. The EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL() macros are an advisory mechanism: kernel developers use the latter in an attempt to mark symbols that may only be used by GPL-covered modules. Importantly, avoiding EXPORT_SYMBOL_GPL() symbols is not, in itself, an indication that a module is not a derived work of the kernel (and thus subject to the terms of the GPL). It is a somewhat tricky distinction, but anyone releasing a proprietary kernel module should very likely be getting legal advice; lawyers should have little trouble understanding the intent.

Specifically, Bellinger responded: "we only use Linux kernel symbols that are not marked as GPL". That provoked a quick response from Alan Cox, who noted that the distinction lies in the question of whether the SCSI target module is a derivative work—"The symbol tags are just a guidance". Furthermore, "either your work is [truly] not derivative of the kernel (which I find wildly improbable) or you have a problem". Cox also mentioned one other potential problem for RTS: certain patents (notably those covering read-copy-update) are only licensed for use by GPL-covered code.

There are some other things to consider as well. Bellinger clearly indicated that RTS maintains a separate repository for its proprietary SCSI target. The kernel's version of LIO has seen numerous changes and fixes from others, but those were contributed under the GPL; Grover questioned whether some of those changes were flowing back into the proprietary version. Without seeing the code, it's a little hard to say for sure.

Enter the lawyer

As it turns out, RTS has retained Lawrence Rosen, the well-known free software savvy attorney, for advice. He stepped into the thread in an effort to try to smooth things over a bit. He reiterated the RTS stance that it maintains a separate version that is completely owned by the company, which it licenses to its customers under non-GPL terms.

Rosen also maintains that Oracle v. Google in the US (and, in Europe, SAS v. World Programming) debunks the claim that using the Linux APIs automatically creates a derivative work. That's a bit lawyerly, as no one in the thread was making that particular argument. Cox, Airlie, Grover, and others were arguing that using the APIs could make a derivative work—in fact is likely to make one—but not that it automatically does so. In fact, Airlie gave two examples (the Andrew Filesystem and binary GPU drivers) where at least some in the kernel community believe derivative works are not created. In the end, though, the courts have not given us a definitive decision on what constitutes a derivative work in the software realm, which is part of the reason for the murkiness here.

Beyond that, Bradley Kuhn pointed out that Rosen's interpretation of those two cases is not universally held. He also noted that there simply isn't enough information available for anyone outside of RTS to make any real assessment regarding the proprietary code. Whether that code is a derivative work or not requires more information, which has, as yet, not been forthcoming.

There is yet another wrinkle here. Grover has been posting using his redhat.com email address, which could give the impression that this is some kind of official Red Hat accusation (though Grover never claims that it is). Both Bellinger and Rosen drag Red Hat into the conversation, as do others in the thread. In fact, Rosen's response is copied to lawyers for both Red Hat and the Linux Foundation. That escalates the issue further, of course.

Enter the SCSI maintainer

Kernel SCSI maintainer James Bottomley was not particularly pleased with Grover's accusation. He questioned some of Grover's assumptions about which SCSI commands were actually supported by RTS OS, but also strongly suggested verifying how the software worked before slinging any more potential "libel". Bottomley seems concerned—perhaps overly—that Grover needlessly "exposes us all to jeopardy of legal sanction". Kuhn, at least, thought the legal dangers were overblown, and suggested that Grover continue to "seek facts".

As of November 13, no further responses from Bellinger or Rosen have been posted. In his initial post, Grover noted that he had tried to discuss the problem with Bellinger and RTS CEO Marc Fleischmann via private email, without getting a "useful response". Grover's public post finally did elicit some response, one that he thinks clearly indicates a GPL violation. But he also pointed out another sticky issue:

But let's forget licenses and talk community. Looking back, can anyone say that your push to get LIO accepted into mainline as the kernel target was in good faith? Back before LIO was merged, James chose LIO over SCST saying to the SCST devs:

"Look, let me try to make it simple: It's not about the community you bring to the table, it's about the community you have to join when you become part of the linux kernel."

RTS behaved long enough to get LIO merged, and then forget community. James is right, community is more important than code, and licenses enforce community expectations. RTS has appeared just community-focused enough to prevent someone else's code from being adopted, so it can extract the benefits and still maintain its proprietary advantage.

It is an unprecedented situation and one that is likely to be messy to unwind. The Linux SCSI target appears to be missing features because its maintainer is adding them to a proprietary driver that ships with the GPL-covered kernel. There is certainly nothing that forces Bellinger or RTS to make their improvements available unless, of course, there is a GPL requirement to do so. But if some other developer were to come along with an implementation of the missing features, is Bellinger going to accept them into the mainline driver, forcing (further) divergence with the proprietary code?

The open nature of the accusation, and the fact that it is associated with Red Hat (fairly or not), will also complicate the issue. It is unlikely that the company would have set out to address a possible violation in this way, but it now finds itself in a sticky situation. Extracting itself may be difficult, and completely ignoring the problem may not be possible either.

If it turns out that there is no GPL violation, there has still been some damage done, it would seem. The idea behind the choice of the GPL for the kernel is that competitors all have equal access, but in this case RTS appears to be in a privileged position with respect to SCSI target features. That would seem to violate the spirit of Linux development, even if it doesn't violate the license.


Page editor: Jonathan Corbet

Security

Potential pitfalls in DNS handling

By Jake Edge
November 14, 2012

The domain name system (DNS) seems relatively straightforward, at least from a high level, but there are some darker corners of the protocol that could easily trip up the unwary—or even the wary. A recent vulnerability in the Exim mail transfer agent shows one such example, but there are more. In fact, Exim developer Phil Pennock, who patched the recent vulnerability, has collected up a number of these places where DNS parsing can go awry.

The Exim hole was a fairly standard buffer overflow, but it came about because of the way DNS messages are structured. When a program requests a TXT record (for, say, a DomainKeys Identified Mail (DKIM) public key), the reply is broken up into multiple DNS "strings". The TXT record itself can be up to 64K in size, with an overall length specified in the "resource record" (RR) header, but it is broken up into multiple strings, each of which is prefaced with a length.
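
That split is visible with an ordinary query tool; dig, for instance, prints each constituent string as a separate quoted chunk. In the following rough sketch, the domain name and key material are invented purely for illustration:

    $ dig +short TXT selector._domainkey.example.org
    "v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A..." "...remainder of the base64-encoded key"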

Each string is a one-octet length value, followed by that many octets of data. To construct the full TXT record, one collects each of the string payloads into a buffer, which is where Exim went astray. For DKIM verification, a 4K buffer was allocated for the TXT record. Each string was length checked, so that it couldn't overrun the buffer, but the loop did not terminate once the buffer was exhausted. An attacker-controlled DNS server (or a benign server that just had a TXT record larger than 4K) could send a large record and either crash Exim or execute arbitrary code.

The fix is simple and makes two changes: it checks for buffer exhaustion before looking at the next string, and it increases the size of the buffer to 64K. Either of those alone would have been sufficient to fix the problem; doing both just provides a more robust fix. It's not clear why the original 4K buffer size was chosen, but Pennock speculated that it seemed a reasonable limit to the original developer, given that there was a test for overflow (though that test turned out to be incorrect).

The problem was found in an Exim DKIM code inspection that was done after a US-CERT advisory and a Wired article raised DKIM issues. While the specific problems reported were not present in Exim, Pennock was concerned that increased attention would be focused on that code; thus the code review.

There are other implications to consider with the strings that make up a TXT record. At first blush, joining the strings directly (rather than with a space or newline character) makes sense, but there are protocols that depend on the strings within a TXT record being treated as separate entities. DKIM and Sender Policy Framework (SPF) both explicitly say that the strings should be joined directly, but forcing that behavior for all TXT records retrieved by Exim broke some ad hoc uses.

Likewise, there is a question of how to handle multiple TXT records. Those records will be returned in random order, so two DKIM key TXT records (i.e. prefaced with "v=DKIM1;") could be returned in a query. If applications don't check for that possibility, or handle it differently than the DNS administrator creating the TXT records expected, problems could result. Once again, DKIM and SPF explicitly disallow multiple TXT records for their information, so compliant programs need to check. Other protocols may not be as clear.

Beyond that, DNS has some surprises in the kinds of names it allows. Many believe that domain and host names are restricted to certain subsets of characters, but that is not true. As RFC 2181 specifies, the limits are purely length-based (63 octets per component, 255 octets for a domain name). Each octet of the name can contain any value from 0 to 255. Looking at the host names returned by the following command is rather interesting:

    $ host -lva test.globnix.net nlns.globnix.net
    ...
    foo\\.bar.test.globnix.net. 600 IN      A       192.0.2.8
    ...
    cr\013\010lf.test.globnix.net. 600 IN   AAAA    2a02:898:31:dead:beef::32
    ...
    i-want-nul.test.globnix.net. 600 IN     CNAME   nul\000gap.test.globnix.net.
    ...

That domain is one that Pennock has had for years, and the entries are meant to be somewhat eye-opening. For example, note that '.' is legal in the components of a host name (represented textually as foo\.bar...). And that brackets ([, ]), colons, NULs (\000), newlines, backslashes, and so on are all legal. Any of those could pose a problem for a program that didn't expect to receive them. One of the ways that might happen is with a reverse lookup, where an IP address to host name mapping is sought.

For actual domain names, it may be difficult or impossible to register any with "weird" characters, but they are definitely legal as far as DNS is concerned. The registrars will shy away from those kinds of domains because they aren't legal in email addresses or URLs. But, as Pennock's examples show, domains with their own DNS can create all sorts of problematic host names.

These dark corners are hopefully well-known to DNS server and library developers, but they aren't necessarily obvious to those outside of those specialties. One can well imagine that there are bugs lurking in applications and tools that use DNS at a medium or low level. Some of those could easily result in security vulnerabilities.

[I would like to thank Phil Pennock for sharing his research and answering questions about DNS handling.]


Brief items

Security quotes of the week

Put another way, having the career of the beloved CIA Director and the commanding general in Afghanistan instantly destroyed due to highly invasive and unwarranted electronic surveillance is almost enough to make one believe not only that there is a god, but that he is an ardent civil libertarian.
-- Glenn Greenwald

In part it is because encryption with customer controlled keys is inconsistent with portions of their business model. This architecture limits a cloud provider's ability to data mine or otherwise exploit the users' data. If a provider does not have access to the keys, they lose access to the data for their own use. While a cloud provider may agree to keep the data confidential (i.e., they won't show it to anyone else) that promise does not prevent their own use of the data to improve search results or deliver ads. Of course, this kind of access to the data has huge value to some cloud providers and they believe that data access in exchange for providing below-cost cloud services is a fair trade.
-- Richard Falkenrath and Paul Rosenzweig at Nextgov

The concept is simple enough. We need to make abuse of the patent and copyright enforcement system so painful that even the most dedicated corporate executive masochist will think twice before pulling the trigger on their attacks.

Threats and the filing of takedowns, lawsuits, and other actions in the absence of strong and verifiable evidence of significant wrongdoing, not just haphazard shotgun barrages based on mere suspicion and wishful thinking, must trigger significant financial penalties and perhaps other serious sanctions as well.

How about a fine of a million dollars per false attack? Or 1% of gross earnings? And perhaps a five year prohibition against more filings?

If these sound draconian, or unrealistic, that's OK -- consider these to be the outer bounds starting points for discussion.

-- Lauren Weinstein


New vulnerabilities

catdoc: denial of service

Package(s):catdoc CVE #(s):
Created:November 13, 2012 Updated:November 21, 2012
Description: From the Red Hat bugzilla:

A Debian bug report noted that there is a buffer overflow in catdoc's src/xlsparse.c, which contains:

        for (i=0;i<NUMOFDATEFORMATS; i++);
        FormatIdxUsed[i]=0;

Because of the ";" at the end of the first line, it effectively sets i to NUMOFDATEFORMATS, which will cause it to write past defined buffer. This could lead to a denial of service (crash of catdoc). The Debian bug report indicates that this could possibly be used for worse things than a crash, but I'm not sure (I can see it writing past the end of the buffer, but all it is writing is 0's and not anything user-defined).

Alerts:
Fedora FEDORA-2012-17588 catdoc 2012-11-13
Fedora FEDORA-2012-17554 catdoc 2012-11-13


cgit: code execution

Package(s):cgit CVE #(s):CVE-2012-4548
Created:November 12, 2012 Updated:November 28, 2012
Description: From the openSUSE advisory:

Specially-crafted commits can cause code to be executed on the clients due to improperly quoted arguments.

Alerts:
Fedora FEDORA-2012-18462 cgit 2012-11-28
openSUSE openSUSE-SU-2012:1461-1 cgit 2012-11-12
Fedora FEDORA-2012-18464 cgit 2012-11-28
openSUSE openSUSE-SU-2012:1460-1 cgit 2012-11-12


gegl: code execution

Package(s):gegl CVE #(s):CVE-2012-4433
Created:November 13, 2012 Updated:October 7, 2013
Description: From the Red Hat advisory:

An integer overflow flaw, leading to a heap-based buffer overflow, was found in the way the gegl utility processed .ppm (Portable Pixel Map) image files. An attacker could create a specially-crafted .ppm file that, when opened in gegl, would cause gegl to crash or, potentially, execute arbitrary code.

Alerts:
Gentoo 201310-05 gegl 2013-10-06
Fedora FEDORA-2013-12115 gegl 2013-07-12
Fedora FEDORA-2013-12108 gegl 2013-07-12
Fedora FEDORA-2013-12075 gegl 2013-07-12
Mandriva MDVSA-2013:081 gegl 2013-04-09
openSUSE openSUSE-SU-2013:0159-1 gegl 2013-01-23
openSUSE openSUSE-SU-2012:1627-1 ppm 2012-12-07
Scientific Linux SL-gegl-20121112 gegl 2012-11-12
Red Hat RHSA-2012:1455-01 gegl 2012-11-12
Mageia MGASA-2012-0335 gegl 2012-11-21
Oracle ELSA-2012-1455 gegl 2012-11-12
CentOS CESA-2012:1455 gegl 2012-11-12


glance: access restriction bypass

Package(s):openstack-glance CVE #(s):CVE-2012-4573
Created:November 8, 2012 Updated:December 11, 2012
Description:

From the SUSE advisory:

OpenStack glance had a bug where image deletion was allowed for all logged in users (CVE-2012-4573).

Alerts:
Red Hat RHSA-2012:1558-01 openstack-glance 2012-12-10
Ubuntu USN-1626-2 glance 2012-11-09
Ubuntu USN-1626-1 glance 2012-11-08
SUSE SUSE-SU-2012:1455-1 openstack-glance 2012-11-08
Fedora FEDORA-2012-18085 openstack-glance 2012-11-21


icedtea-web: code execution

Package(s):icedtea-web CVE #(s):CVE-2012-4540
Created:November 8, 2012 Updated:January 23, 2013
Description:

From the Red Hat advisory:

A buffer overflow flaw was found in the IcedTea-Web plug-in. Visiting a malicious web page could cause a web browser using the IcedTea-Web plug-in to crash or, possibly, execute arbitrary code. (CVE-2012-4540)

Alerts:
openSUSE openSUSE-SU-2015:1595-1 icedtea-web 2015-09-22
Gentoo 201406-32 icedtea-bin 2014-06-29
SUSE SUSE-SU-2013:1520-1 icedtea-web 2013-10-02
openSUSE openSUSE-SU-2013:1511-1 icedtea-web 2013-09-30
openSUSE openSUSE-SU-2013:1509-1 icedtea-web 2013-09-30
openSUSE openSUSE-SU-2013:0174-1 icedtea-web 2013-01-23
openSUSE openSUSE-SU-2012:1524-1 icedtea-web 2012-11-22
Fedora FEDORA-2012-17745 icedtea-web 2012-11-11
CentOS CESA-2012:1434 icedtea-web 2012-11-08
Scientific Linux SL-iced-20121107 icedtea-web 2012-11-07
Fedora FEDORA-2012-17762 icedtea-web 2012-11-11
Mandriva MDVSA-2012:171 icedtea-web 2012-11-09
Mageia MGASA-2012-0329 icedtea-web 2012-11-09
Ubuntu USN-1625-1 icedtea-web 2012-11-07
Oracle ELSA-2012-1434 icedtea-web 2012-11-07
Red Hat RHSA-2012:1434-01 icedtea-web 2012-11-07


libav: multiple unspecified vulnerabilities

Package(s):libav CVE #(s):CVE-2012-2772 CVE-2012-2775 CVE-2012-2776 CVE-2012-2777 CVE-2012-2779 CVE-2012-2784 CVE-2012-2786 CVE-2012-2787 CVE-2012-2788 CVE-2012-2789 CVE-2012-2790 CVE-2012-2793 CVE-2012-2794 CVE-2012-2796 CVE-2012-2798 CVE-2012-2800 CVE-2012-2801 CVE-2012-2802
Created:November 12, 2012 Updated:February 18, 2013
Description: From the CVE entries:

Unspecified vulnerability in the ff_rv34_decode_frame function in libavcodec/rv34.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to "width/height changing with frame threading." (CVE-2012-2772)

Unspecified vulnerability in the read_var_block_data function in libavcodec/alsdec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to a large order and an "out of array write in quant_cof." (CVE-2012-2775)

Unspecified vulnerability in the decode_cell_data function in libavcodec/indeo3.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to an "out of picture write." (CVE-2012-2776)

Unspecified vulnerability in the decode_pic function in libavcodec/cavsdec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to "width/height changing in CAVS," a different vulnerability than CVE-2012-2784. (CVE-2012-2777)

Unspecified vulnerability in the decode_frame function in libavcodec/indeo5.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to an invalid "gop header" and decoding in a "half initialized context." (CVE-2012-2779)

Unspecified vulnerability in the decode_pic function in libavcodec/cavsdec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to "width/height changing in CAVS," a different vulnerability than CVE-2012-2777. (CVE-2012-2784)

Unspecified vulnerability in the decode_wdlt function in libavcodec/dfa.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to an "out of array write." (CVE-2012-2786)

Unspecified vulnerability in the decode_frame function in libavcodec/indeo4.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to the "setup width/height." (CVE-2012-2787)

Unspecified vulnerability in the avi_read_packet function in libavformat/avidec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to an "out of array read" when a "packet is shrunk." (CVE-2012-2788)

Unspecified vulnerability in the avi_read_packet function in libavformat/avidec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to a large number of vector coded coefficients (num_vec_coeffs). (CVE-2012-2789)

Unspecified vulnerability in the read_var_block_data function in libavcodec/alsdec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to the "number of decoded samples in first sub-block in BGMC mode." (CVE-2012-2790)

Unspecified vulnerability in the lag_decode_zero_run_line function in libavcodec/lagarith.c in FFmpeg before 0.11 has unknown impact and attack vectors related to "too many zeros." (CVE-2012-2793)

Unspecified vulnerability in the decode_mb_info function in libavcodec/indeo5.c in FFmpeg before 0.11 has unknown impact and attack vectors in which the "allocated tile size ... mismatches parameters." (CVE-2012-2794)

Unspecified vulnerability in the vc1_decode_frame function in libavcodec/vc1dec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to inconsistencies in "coded slice positions and interlacing" that trigger "out of array writes." (CVE-2012-2796)

Unspecified vulnerability in the decode_dds1 function in libavcodec/dfa.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to an "out of array write." (CVE-2012-2798)

Unspecified vulnerability in the ff_ivi_process_empty_tile function in libavcodec/ivi_common.c in FFmpeg before 0.11 has unknown impact and attack vectors in which the "tile size ... mismatches parameters" and triggers "writing into a too small array." (CVE-2012-2800)

Unspecified vulnerability in libavcodec/avs.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to dimensions and "out of array writes." (CVE-2012-2801)

Unspecified vulnerability in the ac3_decode_frame function in libavcodec/ac3dec.c in FFmpeg before 0.11 has unknown impact and attack vectors, related to the "number of output channels" and "out of array writes." (CVE-2012-2802)

Alerts:
Gentoo 201406-28 libav 2014-06-26
Gentoo 201310-12 ffmpeg 2013-10-25
Mandriva MDVSA-2013:079 ffmpeg 2013-04-09
Debian DSA-2624-1 ffmpeg 2013-02-16
Ubuntu USN-1705-1 libav 2013-01-28
Ubuntu USN-1675-1 ffmpeg 2012-12-19
Ubuntu USN-1674-1 libav 2012-12-19
Mageia MGASA-2012-0331 ffmpeg 2012-11-17
Ubuntu USN-1630-1 libav 2012-11-12


mantisbt: multiple vulnerabilities

Package(s):mantisbt CVE #(s):CVE-2011-3578 CVE-2011-3755 CVE-2012-1121 CVE-2012-2691
Created:November 9, 2012 Updated:November 14, 2012
Description:

Cross-site scripting (XSS) vulnerability in bug_actiongroup_ext_page.php in MantisBT before 1.2.8 allows remote attackers to inject arbitrary web script or HTML via the action parameter, related to bug_actiongroup_page.php, a different vulnerability than CVE-2011-3357. (CVE-2011-3578)

MantisBT 1.2.4 allows remote attackers to obtain sensitive information via a direct request to a .php file, which reveals the installation path in an error message, as demonstrated by view_all_inc.php and certain other files. (CVE-2011-3755)

MantisBT before 1.2.9 does not properly check permissions, which allows remote authenticated users with manager privileges to (1) modify or (2) delete global categories. (CVE-2012-1121)

The mc_issue_note_update function in the SOAP API in MantisBT before 1.2.11 does not properly check privileges, which allows remote attackers with bug reporting privileges to edit arbitrary bugnotes via a SOAP request. (CVE-2012-2691)

Alerts:
Fedora FEDORA-2012-18294 mantis 2012-11-24
Gentoo 201211-01 mantisbt 2012-11-08
Fedora FEDORA-2012-18299 mantis 2012-11-24

Comments (none posted)

nspluginwrapper: insecure Private Browsing

Package(s):nspluginwrapper CVE #(s):CVE-2011-2486
Created:November 13, 2012 Updated:November 22, 2012
Description: From the Red Hat advisory:

It was not possible for plug-ins wrapped by nspluginwrapper to discover whether the browser was running in Private Browsing mode. This flaw could lead to plug-ins wrapped by nspluginwrapper using normal mode while they were expected to run in Private Browsing mode.

Alerts:
Mageia MGASA-2012-0336 nspluginwrapper 2012-11-21
Oracle ELSA-2012-1459 nspluginwrapper 2012-11-13
Red Hat RHSA-2012:1459-01 nspluginwrapper 2012-11-13
Scientific Linux SL-nspl-20121113 nspluginwrapper 2012-11-13
CentOS CESA-2012:1459 nspluginwrapper 2012-11-13

Comments (none posted)

openvswitch: unintended file access

Package(s):openvswitch CVE #(s):CVE-2012-3449
Created:November 13, 2012 Updated:November 14, 2012
Description: From the CVE entry:

Open vSwitch 1.4.2 uses world writable permissions for (1) /var/lib/openvswitch/pki/controllerca/incoming/ and (2) /var/lib/openvswitch/pki/switchca/incoming/, which allows local users to delete and overwrite arbitrary files.

Alerts:
Fedora FEDORA-2012-17477 openvswitch 2012-11-13

Comments (none posted)

plib: buffer overflow

Package(s):plib CVE #(s):CVE-2012-4552
Created:November 12, 2012 Updated:November 22, 2012
Description: From the Red Hat bugzilla:

Plib is prone to a stack based buffer overflow in the error function in ssg/ssgParser.cxx when it loads 3d model files as X (Direct x), ASC, ASE, ATG, and OFF, if a very long error message is passed to the function.

Alerts:
openSUSE openSUSE-SU-2013:0146-1 plib 2013-01-23
openSUSE openSUSE-SU-2012:1506-1 plib 2012-11-20
Mageia MGASA-2012-0334 plib 2012-11-21
Fedora FEDORA-2012-17482 plib 2012-11-11
Fedora FEDORA-2012-17465 plib 2012-11-11

Comments (none posted)

radsecproxy: SSL certificate verification weakness

Package(s):radsecproxy CVE #(s):CVE-2012-4523 CVE-2012-4566
Created:November 12, 2012 Updated:November 14, 2012
Description: From the Debian advisory:

Ralf Paffrath reported that Radsecproxy, a RADIUS protocol proxy, mixed up pre- and post-handshake verification of clients. This vulnerability may wrongly accept clients without checking their certificate chain under certain configurations.

Raphael Geissert spotted that the fix for CVE-2012-4523 was incomplete, giving origin to CVE-2012-4566.

Alerts:
Debian DSA-2573-1 radsecproxy 2012-11-10

Comments (none posted)

xen: denial of service

Package(s):xen CVE #(s):CVE-2012-4544
Created:November 12, 2012 Updated:February 8, 2013
Description: From the CVE entry:

The PV domain builder in Xen 4.2 and earlier does not validate the size of the kernel or ramdisk (1) before or (2) after decompression, which allows local guest administrators to cause a denial of service (domain 0 memory consumption) via a crafted (a) kernel or (b) ramdisk.

Alerts:
SUSE SUSE-SU-2014:0470-1 Xen 2014-04-01
SUSE SUSE-SU-2014:0446-1 Xen 2014-03-25
SUSE SUSE-SU-2014:0411-1 Xen 2014-03-20
Debian DSA-2636-2 xen 2013-03-03
Debian DSA-2636-1 xen 2013-03-01
Scientific Linux SL-xen-20130207 xen 2013-02-07
Oracle ELSA-2013-0241 xen 2013-02-07
CentOS CESA-2013:0241 xen 2013-02-07
Red Hat RHSA-2013:0241-01 xen 2013-02-07
SUSE SUSE-SU-2012:1487-1 Xen 2012-11-16
openSUSE openSUSE-SU-2012:1573-1 XEN 2012-11-26
openSUSE openSUSE-SU-2012:1572-1 XEN 2012-11-26
SUSE SUSE-SU-2012:1486-1 Xen 2012-11-16
Fedora FEDORA-2012-17408 xen 2012-11-09
SUSE SUSE-SU-2012:1503-1 libvirt 2012-11-19
Fedora FEDORA-2012-17204 xen 2012-11-09

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 3.7-rc5, released on November 11. "This is quite a small -rc, I'm happy to say. -rc4 was already fairly calm, and -rc5 has fewer commits still. And more importantly, apart from one revert, and a pinctl driver update, it's not just a fairly small number of commits, they really are mostly one-liners."

Stable updates: no stable updates have been released in the last week. 3.2.34 is in the review process as of this writing; it can be expected on or after November 16.

Comments (none posted)

Quotes of the week

I for one do mourn POSIX, and standardization in general. I think it's very sad that a lot of stuff these days is moving forward without going through a rigorous standardization. We had this little period known affectionately as the "Unix Wars" in the 1980s/90s and we're well on our way to a messy repeat in the Linux space.
Jon Masters

You're missing something; that is one of the greatest powers of open source. The many eyes (and minds) effect. Someone out there probably has a solution to whatever problem, the trick is to find that person.
Russell King

Unfortunately there is no EKERNELSCREWEDUP, so we usually use EINVAL.
Andrew Morton

Comments (8 posted)

Introducing RedPatch (Ksplice Blog)

Back in early 2011, we looked at changes to the way Red Hat distributed its kernel changes. Instead of separate patches, it switched to distributing a tarball of the source tree—a move which was met with a fair amount of criticism. The Ksplice team at Oracle has just announced the availability of a Git tree that breaks the changes up into individual patches again. "The Ksplice team is happy to announce the public availability of one of our git repositories, RedPatch. RedPatch contains the source for all of the changes Red Hat makes to their kernel, one commit per fix and we've published it on oss.oracle.com/git. With RedPatch, you can access the broken-out patches using git, browse them online via gitweb, and freely redistribute the source under the terms of the GPL." (Thanks to Dmitrijs Ledkovs.)

Comments (85 posted)

Masters: ARM atomic operations

Jon Masters has put together a summary of how atomic operations work on the ARM architecture for those who are not afraid of the grungy details. "To provide for atomic access to a given memory location, ARM processors implement a reservation engine model. A given memory location is first loaded using a special 'load exclusive' instruction that has the side-effect of setting up a reservation against that given address in the CPU-local reservation engine. When the modified value is later written back into memory, using the corresponding 'store exclusive' processor instruction, the reservation engine verifies that it has an outstanding reservation against that given address, and furthermore confirms that no external agents have interfered with the memory commit. A register returns success or failure."
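
As a rough illustration of that loop (a sketch only; the function name, the GCC inline-assembly form, and the ARMv7 target are assumptions, not code from Jon's article or from the kernel), an atomic increment built on those instructions might look like this:

    /* Sketch: atomically increment *addr on ARMv7 using the exclusive
       load/store instructions described above; assumes GCC inline asm. */
    static inline void atomic_add_one(int *addr)
    {
        int tmp, failed;

        __asm__ __volatile__(
        "1: ldrex   %0, [%2]\n"      /* load value, set reservation on addr   */
        "   add     %0, %0, #1\n"    /* modify the loaded value               */
        "   strex   %1, %0, [%2]\n"  /* try the store; %1 becomes 0 on success */
        "   cmp     %1, #0\n"
        "   bne     1b\n"            /* reservation was lost; retry           */
        : "=&r" (tmp), "=&r" (failed)
        : "r" (addr)
        : "cc", "memory");
    }

If another agent touches the location between the exclusive load and the exclusive store, the store fails and the loop simply retries — exactly the behavior the reservation engine exists to provide.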

Comments (5 posted)

A break for linux-next

Linux-next maintainer Stephen Rothwell has announced that the November 15 linux-next release will be the last until November 26.

Full Story (comments: none)

3.5.x to get extended stable support

Herton Ronaldo Krzesinski has announced that Canonical intends to maintain the 3.5.x stable kernel, which is shipped in the Ubuntu 12.10 release. This kernel will be supported for as long as 12.10 is, currently planned to be through the end of March 2014.

Full Story (comments: 1)

Kernel development news

NUMA in a hurry

By Jonathan Corbet
November 14, 2012
The kernel's behavior on non-uniform memory access (NUMA) systems is, by most accounts, suboptimal; processes tend to get separated from their memory, leading to lots of cross-node traffic and poor performance. Until now, the work to improve this situation has been a story of two competing patch sets; it recently appeared that one of them may be set to be merged as the result of decisions made outside of the community's view. But nothing in memory management is ever simple, so it should be unsurprising that the NUMA scheduling discussion has become more complicated.

On November 6, memory management hacker Mel Gorman, who had not contributed code of his own toward NUMA scheduling so far, posted a new patch series called "Foundation for automatic NUMA balancing," or "balancenuma" for short. He pointed out that there were objections to both of the existing approaches to NUMA scheduling and that it was proving hard to merge the best from each. So his objective was to add enough infrastructure to the memory management subsystem to make it easy to experiment with different NUMA placement policies. He also implemented a placeholder policy of his own:

The actual policy it implements is a very stupid greedy policy called "Migrate On Reference Of pte_numa Node (MORON)". While stupid, it can be faster than the vanilla kernel and the expectation is that any clever policy should be able to beat MORON.

In short, the MORON policy works by instantly migrating pages whenever a cross-node reference is detected using the NUMA hinting mechanism. Mel's second version, posted one week later, fixes a number of problems, adds the "home node" concept (that tries to keep processes and their memory on a single "home" NUMA node), and adds some statistics gathering to implement a "CPU follows memory" policy that can move a process to a new home node if it appears that better memory locality would result.

Andrea Arcangeli, author of the AutoNUMA approach, said that balancenuma "looks OK" and that AutoNUMA could be built on top of it. Ingo Molnar, instead, was less accepting, saying "I've picked up a number of cleanups from your series and propagated them into tip:numa/core tree." He later added a request that Mel rebase his work on top of the numa/core tree. He clearly did not see the patch set as a "foundation" on which to build. A new numa/core patch set was posted on November 13.

Peter Zijlstra, meanwhile, has posted an "enhanced NUMA scheduling with adaptive affinity" patch set. This one does away with the "home node" concept altogether; instead, it looks at memory access patterns to determine where a process's memory lives and who that memory might be shared with. Based on that information, the CPU affinity mechanism is used to move processes to the appropriate nodes. Peter says:

Note that this adaptive NUMA affinity mechanism integrated into the scheduler is essentially free of heuristics - only the access patterns determine which tasks are related and grouped. As a result this adaptive affinity code is able to move both threads and processes close(r) to each other if they are related - and let them spread if they are not.

This patch set has not gotten a lot of review comments, and it does not appear to have been folded into the numa/core series as of this writing.

What will happen in 3.8?

The numa/core approach remains in linux-next, which is meant to be the final staging area for code headed into the mainline. And, indeed, Ingo has reiterated that he plans to merge this code for the 3.8 cycle, saying "numa/core sums up the consensus so far." The use of that language might rightly raise some eyebrows; when there are between two and four competing patch sets (depending on how one counts) aimed at the same problem, the term "consensus" does not usually come to mind. But it seems that this consensus does not yet exist.

Andrew Morton has been overtly grumpy; the existence of numa/core in linux-next has made the management of his tree (which is based on linux-next) difficult — his tree needs to be ready for the 3.8 merge window where, he thinks, numa/core should not be under consideration:

And yes, I'm assuming you're not targeting 3.8. Given the history behind this and the number of people who are looking at it, that's too hasty... And I must say that I deeply regret not digging my heels in when this went into -next all those months ago. It has caused a ton of trouble for me and for a lot of other people.

Hugh Dickins, a developer who is not normally associated with this sort of discussion, chimed in as well:

People are still reviewing and comparing competing solutions. Maybe this latest will prove to be closest to the right answer, maybe it will not. It's, what, about two days old right now?

If we had wanted to push in a good solution a little prematurely, we would surely have chosen Andrea's AutoNUMA months ago, despite efforts to block it; and maybe we shall still want to go that way.

Please, forget about v3.8, cut this branch out of linux-next, and seek consensus around getting it right for v3.9.

Rik van Riel agreed, saying "Having unreviewed (some of it NAKed) code sitting in tip.git and you trying to force it upstream is not the right way to go." He also suggested that, if anything should be considered for merging in 3.8, it would be Mel's foundation patches.

And that is where the discussion stands as of this writing. There is a lot of uncertainty about what might happen with NUMA scheduling in 3.8, meaning that, most likely, nothing will happen at all. It is highly unlikely that Linus would merge the numa/core set in the face of the above complaints; he would be far more likely to sit back and tell the developers involved to work out something they can all agree with. So this is a discussion that might go on for a while yet.

Making changes to the memory management subsystem is a famously hard thing to do, especially when the changes are as large as those being considered here. But there is another factor that is complicating this particular situation. As the term "NUMA scheduling" suggests, this is not just a memory management problem. The path to improved NUMA performance will require coordinated changes to — and greater integration between — the memory management subsystem and the CPU scheduler. It's telling that the developers on one side of this divide are primarily associated with scheduler development, while those on the other side are mostly memory management folks. Each camp is, in a sense, invading the other's turf in an attempt to create a comprehensive solution to the problem; it is not surprising that some disagreements have emerged.

Also implicit in this situation is that Linus is unlikely to attempt to resolve the disagreement by decree. There are too many developers and too many interrelated core subsystems involved. So some sort of rough consensus will have to be found. Your editor's explicitly unreliable prediction is that little NUMA-related work will be merged in the 3.8 development cycle. Under pressure from several directions, the developers involved will figure out how to resolve their biggest differences in the next few months. The resulting code will likely be at least partially merged for 3.9 — later than many would wish, but the end result is likely to be better than would be seen with a patch set rushed into 3.8.

Comments (none posted)

vmpressure_fd()

By Jonathan Corbet
November 14, 2012
One of the nice features of virtual memory is that applications do not have to concern themselves with how much memory is actually available in the system. One need not try to get too much work done to realize that some applications (or their developers) have taken that notion truly to heart. But it has often been suggested that the system as a whole would work better if interested applications could be informed when memory is tight. Those applications could react to that news by reducing their memory requirements, hopefully heading off thrashing or out-of-memory situations. The latest proposal along those lines is a new system call named vmpressure_fd(); it is unlikely to be merged in its current form, but it still merits a look.

The idea behind Anton Vorontsov's vmpressure_fd() patch set is to create a mechanism by which the kernel can inform user space when the system is under memory pressure. An application using this call would start by filling out a vmpressure_config structure:

    #include <linux/vmpressure.h>

    struct vmpressure_config {
        __u32 size;
        __u32 threshold;
    };

The size field should hold the size of the structure; it is there as a sort of versioning mechanism should more fields be added to the structure in the future. The threshold field indicates the minimum level of notification the application is interested in; the available levels are:

VMPRESSURE_LOW
The system is out of free memory and is having to reclaim pages to satisfy new allocations. There is no particular trouble in performing that reclamation, though, so the memory pressure, while non-zero, is low.

VMPRESSURE_MEDIUM
A medium level of memory pressure is being experienced — enough, perhaps, to cause some swapping to occur.

VMPRESSURE_OOM
Memory pressure is at desperate levels, and the system may be about to fall prey to the depredations of the out-of-memory killer.

An application might choose to do nothing at low levels of memory pressure, clean up some low-value caches at the medium level, and clean up everything possible at the highest level of pressure. In this case, it would probably set threshold to VMPRESSURE_MEDIUM, since notifications at the VMPRESSURE_LOW level are not actionable.

Signing up for notifications is a simple matter:

    int vmpressure_fd(struct vmpressure_config *config);

The return value is a file descriptor that can be read to obtain pressure events in this format:

    struct vmpressure_event {
        __u32 pressure;
    };

The current interface only supports blocking reads, so a read() on the returned file descriptor will not return until a pressure notification has been generated. Applications can use poll() to determine whether a notification is available; the current patch does not support asynchronous notification via the SIGIO signal.
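
Putting those pieces together, a complete caller might look like the sketch below. This is illustrative only: the header comes from Anton's patches rather than from mainline, there is no glibc wrapper so the call goes through syscall() with a placeholder __NR_vmpressure_fd number, and it assumes that the pressure field carries one of the level constants described above.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/vmpressure.h>   /* provided by the patch set, not mainline */

    int main(void)
    {
        struct vmpressure_config config = {
            .size = sizeof(config),
            .threshold = VMPRESSURE_MEDIUM, /* low-level events are not actionable */
        };
        struct vmpressure_event event;
        int fd;

        /* No glibc wrapper exists; the syscall number here is hypothetical. */
        fd = syscall(__NR_vmpressure_fd, &config);
        if (fd < 0) {
            perror("vmpressure_fd");
            return 1;
        }

        /* Blocking read: returns only when a notification has been generated. */
        while (read(fd, &event, sizeof(event)) == (ssize_t) sizeof(event)) {
            if (event.pressure == VMPRESSURE_OOM)
                puts("severe pressure: drop every cache we can");
            else
                puts("medium pressure: trim low-value caches");
        }
        return 0;
    }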

Internally, the virtual memory subsystem has no simple concept of memory pressure, so the patch has to add one. To that end, the "reclaimer inefficiency index" is calculated by looking at the number of pages examined by the reclaimer and how many of those pages could not be reclaimed. The need to look at large numbers of pages to find reclaim candidates indicates that reclaimable pages are getting hard to find — in other words, that the system is under memory pressure. The index is simply the ratio of reclamation failures to the number of pages examined, expressed as a percentage.

This percentage is calculated over a "window" of pages examined; by default, it is generated each time the reclaimer looks at 256 pages. This value can be changed by tweaking the new vmevent_window sysctl knob. There are also controls to set the levels at which the various notifications occur: vmevent_level_medium (default 60) and vmevent_level_oom (default 99); the "low" level is wired at zero, so it will trigger anytime that the system is actively looking for pages to reclaim.
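
To make the arithmetic concrete, the helpers below (an invented illustration, not code from the patch set) compute the index for one window and map it to a level using the default thresholds just mentioned:

    /* Illustrative only: the reclaimer inefficiency index for one window of
       scanned pages, i.e. the percentage of scanned pages not reclaimed. */
    static unsigned int pressure_index(unsigned long scanned, unsigned long reclaimed)
    {
        if (!scanned)
            return 0;
        return (unsigned int)(((scanned - reclaimed) * 100) / scanned);
    }

    /* Map the index to a level using the defaults: medium = 60, oom = 99;
       "low" is wired at zero, so any active reclaim qualifies. */
    static int pressure_level(unsigned int index)
    {
        if (index >= 99)
            return 2;   /* OOM level */
        if (index >= 60)
            return 1;   /* medium level */
        return 0;       /* low level */
    }

For example, a window in which 256 pages were scanned but only 80 reclaimed yields an index of 68, which would generate a medium-level notification under the default settings.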

An additional mechanism exists to detect the out-of-memory case, since it can be hard to distinguish it using just the reclaimer inefficiency index. The reclaim code includes the concept of a "priority" which controls how aggressive it can be to reclaim pages; its value starts at 12 and falls over time as attempts to locate enough pages fail. If the priority falls to four (by default; it can be set with the vmevent_level_oom_priority knob), the system is deemed to be heading into an out-of-memory state and the notification is sent.

Some reviewers questioned the need for a new system call. We already have a system call — eventfd() — that exists to create file descriptors for notifications from the kernel. Actually using eventfd() tends to involve an interesting dance where the application gets a file descriptor from eventfd(), opens a sysfs file, and writes the file descriptor number into the sysfs file to connect it to a specific source of events. But it is an established pattern that might be best maintained here. Another reviewer suggested using the perf events subsystem, but Anton believes, not without reason, that perf brings a lot of complexity to something that should be relatively simple.

The other complaint has to do with the integration (or lack thereof) with the "memcg" control-group-based memory usage controller. Memcg already has a notification mechanism (described in Documentation/cgroups/memory.txt) that can inform a process when a control group is running out of memory; it might make sense to use the same mechanism for this purpose. Anton responded that the memcg mechanism does not provide the same information, it does not account for all memory use, and that it requires the use of control groups — not always a popular kernel feature. Still, even if vmpressure_fd() is merged as a separate mechanism, it will at least have to be extended to work at the control group level as well. The code shows that this integration has been thought about, but it has not yet been implemented.
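
For comparison, the memcg threshold notifications already work through exactly the eventfd() dance described earlier. The sketch below is illustrative only; it assumes a cgroup-v1 memory controller mounted at /sys/fs/cgroup/memory and registers a 512MB usage threshold, then waits for it to be crossed.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/eventfd.h>

    int main(void)
    {
        int efd = eventfd(0, 0);
        int ufd = open("/sys/fs/cgroup/memory/memory.usage_in_bytes", O_RDONLY);
        int cfd = open("/sys/fs/cgroup/memory/cgroup.event_control", O_WRONLY);
        char cmd[64];
        uint64_t count;

        if (efd < 0 || ufd < 0 || cfd < 0)
            return 1;

        /* The "dance": connect the eventfd to a 512 MB usage threshold by
           writing "<event_fd> <target_fd> <threshold>" to the control file. */
        snprintf(cmd, sizeof(cmd), "%d %d %llu", efd, ufd, 512ULL << 20);
        if (write(cfd, cmd, strlen(cmd)) < 0)
            return 1;

        /* Each read() reports how many times the threshold was crossed. */
        while (read(efd, &count, sizeof(count)) == (ssize_t) sizeof(count))
            printf("memory threshold crossed %llu time(s)\n",
                   (unsigned long long) count);
        return 0;
    }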

Given these concerns, it seems unlikely that the current patch set will find its way into the mainline. But there is a clear desire for this kind of functionality in all kinds of use cases from very large systems to very small ones (Anton's patches were posted from a linaro.org address). So, one way or another, a kernel in the near future will probably have the ability to inform processes that it is experiencing some memory pressure. The next challenge will then be getting applications to use those notifications and reduce that pressure.

Comments (1 posted)

LCE: Realtime, present and future

By Jonathan Corbet
November 13, 2012
As the standing-room-only crowd at Thomas Gleixner's LinuxCon Europe 2012 talk showed, there is still a lot of interest in the progress of the realtime preemption patch set. Your editor attended with the main purpose of heckling Thomas, thinking that our recent Realtime Linux Workshop coverage would include most of the information to be heard in Barcelona. As it turns out, though, there were some new things to be learned, along with some concerns about the possible return of an old problem in a new form.

At the moment, realtime development is concentrated in three areas. The first is the ongoing work to mitigate problems with software interrupts; that has been covered here before and has not changed much since then. On the memory management front, the SLUB allocator is now the default for realtime kernels. A few years ago, SLUB was considered hopeless for the realtime environment, but it has improved considerably since then. It now works well when allocating objects from the caches. Anytime it becomes necessary to drop into the page allocator, though, behavior will not be deterministic; there is little to be done about that.

Finally, the latest realtime patches include a new option called PREEMPT_LAZY. Thomas has been increasingly irritated by the throughput penalty experienced by realtime users; PREEMPT_LAZY is an attempt to improve that situation. It is an option that applies only to the scheduling of SCHED_OTHER tasks (the non-realtime part of the workload); it works by deferring context switches after a task is awakened, even if the newly-awakened task has a higher priority. Doing so reduces determinism, but SCHED_OTHER was never meant to be deterministic in the first place. The benefit is a reduction in context switches and a corresponding improvement in SCHED_OTHER throughput.

When SLUB and PREEMPT_LAZY are enabled, the realtime kernel shows a 60% throughput increase with some workloads. Someday, Thomas said (not entirely facetiously), realtime will be faster than mainline, especially for workloads involving a lot of networking. He is looking forward to the day when the realtime kernel offers superior network performance; there should be some interesting conversations with the networking developers when that happens.

The realtime kernel, he claimed in summary, is at production quality.

Looking at all the code that has been produced in the realtime effort, Thomas concluded that, at this point, 95% of it has been upstreamed into the mainline kernel. What is missing before the rest can go upstream is "mainline sellable" solutions for memory management, high-resolution timers (hrtimers), and software interrupts. The memory management work is the most complex, and the patches are still "butt ugly." A significant amount of cleanup work will be required before those patches can make it into the mainline.

The hrtimer code, instead, just requires harassing the maintainer (a certain Thomas Gleixner) to get it into shape; it's just a matter of time. There needs to be a "less horrible" way to solve the software interrupt problem; the search is on. The rest of the realtime tree, including the infamous locking patches, is all nicely self-contained and should not be a problem for upstreaming.

So what is coming in the future? The next big feature looks to be CPU isolation. This is not strictly a realtime need, but it is useful for some realtime users. CPU isolation gives one or more processors over to user-space code, with no kernel overhead at all (as long as that code does not make any system calls, naturally). It is useful for applications that cannot wait even for a deterministic interrupt response; instead, they poll a device so that they can respond even more quickly to events. There are also high-performance users who pour vast amounts of money into expensive hardware; these users are willing to expend great effort for a 1% performance gain. For some kinds of workloads, the increased cache locality offered by CPU isolation can give an improvement of 3-4%, so it is unsurprising that these users are interested. A number of developers are working on this problem; some sort of solution should be ready before too long.

Also coming is the long-awaited deadline scheduler. According to Thomas, this code shows that, sometimes, it is possible to get useful work out of academic institutions. The solution is close, he said, and could possibly even be ready for the 3.8 merge window. There is also interest in doing realtime work from a KVM guest system. That will allow us to offload our realtime automation tasks into the cloud. Thomas clearly thought that virtualized realtime was a bit of a silly idea, but there is evidently a user community looking for this capability.

Where are the contributors?

Given that things seem so close, Thomas asked himself why things were taking so long; the realtime effort has been going for some eight years now. The answer is that the problems are hard and that the manpower to solve them has been lacking. Noting that few developers have been contributing to the realtime tree, Thomas started to wonder if there was a lack of interest in the concept overall. Perhaps all this work was being done, but nobody was using it?

An opportunity to answer that question presented itself when kernel.org went down for an extended period in 2011. It became necessary to provide an alternative site for people wanting to grab the realtime patches; that, in turn, was the perfect opportunity to obtain download statistics. It turns out that most realtime patch set releases saw about 3,000 downloads within the first three days. About 30% of those went to European corporations, 25% to American corporations, 20% to Asian corporations, 5% to academic institutions, and 20% "all over." 75% of the downloads, he said, were done by users at identifiable corporations constituting a "who's who" of the industry.

All told, there were 2,250 corporations that downloaded the realtime patch set while this experiment was taking place. Of all those companies, less than 5% reported back to the realtime developers in any way, be it a patch, a bug report, or even an "it works, thanks" sort of message. A 5% response rate may seem good; it should be enough to get the bugs fixed. But a further look showed that 80% of the reports came from Red Hat and the Open Source Automation Development Laboratory; add in Thomas's company Linutronix, and the number goes up to 90%. "What," he asked the audience, "are the rest of you doing?"

Thomas's conclusion is that something is wrong. Perhaps we are seeing a return of the embedded nightmare in a new form? As in those days, he does see private reports from companies that are keeping all of their work secret. Private reports are better than nothing, but he would really like to see more participation in the community: more success reports, bug reports, documentation contributions, and fixes. Even incorrect fixes are a great thing; they give a lot of information about the problem and ease the process of making a proper fix.

To conclude, Thomas noted that some people have complained that his roadmap slides are insufficiently serious. In response, he said, he took a few days off and took a marketing course; that has allowed him to produce a more suitable roadmap that looks like this:

[Thomas's new roadmap]

Perhaps the best conclusion to be drawn from this roadmap is that Thomas is unlikely to switch from development to marketing anytime in the near future. That is almost certainly good news for Linux users — and probably for marketing people as well.

[Your editor would like to thank the Linux Foundation for funding his travel to Barcelona.]

Comments (45 posted)

Patches and updates

Kernel trees

Architecture-specific

Development tools

Device drivers

Documentation

Filesystems and block I/O

Memory management

Networking

Security-related

Virtualization and containers

Miscellaneous

  • Lucas De Marchi: kmod 11 . (November 11, 2012)

Page editor: Jonathan Corbet

Distributions

Crowding out OpenBSD

By Jonathan Corbet
November 13, 2012
Unix as a whole predates Linux by many years, and even the rather younger BSD variant was well into its teens by the time Linus released his first kernel. BSD networking defined and enabled the Internet. This illustrious history notwithstanding, BSD has long since ceded the spotlight to Linux in most settings. As Linux has come to dominate the free software development world, the result has been some occasional pain for other operating system distributions. Now, as a recent discussion on an OpenBSD mailing list shows, BSD developers are feeling that pain in a heightened manner. This situation has some serious implications.

On November 6, longtime OpenBSD hacker Marc Espie complained to the OpenBSD project's "tech" list about behavior from "upstream vendors" that, in his view, is proving harmful to the OpenBSD project. In short, projects like desktop environments are increasingly adding dependencies on changes being made at other levels of the (Linux) systems on which they are developed. That makes it harder for OpenBSD to port and support that code, to the point that "if you don't have tens of people, it becomes more and more of a losing battle". The OpenBSD project doesn't have those people, so it is hurting. Marc continued, saying:

It's also quickly turning Posix and Unix into a travesty: either you have the linux goodies, or you don't. And if you don't, you can forget anything modern...

I'm pretty sure there's a lot of good intention behind the "progress" in recent desktops. But this is turning the field of OS distributions into a wasteland. Either you're a modern linux with pulseaudio and pam and systemd, or you're dying. So much for the pioneer spirit of opensource, where you were free to innovate and do cool things, and more or less have interesting software able to run on your machine...

One could easily poke holes in this complaint; the characterization of PAM as "modern" is somewhat amusing; it is 1990s technology. There is an evident case of cognitive dissonance shown in the simultaneous desire for the comfortable "Posix and Unix" world of decades past and the ability to "innovate and do cool things." It is difficult to simultaneously innovate and stand still, but that is what Marc seems to be asking for here.

In a subsequent message, Marc acknowledged the real source of the problem: OpenBSD simply does not have enough developers to influence the direction of projects like X.org, GNOME, or KDE. Antoine Jacoutot, who works on GNOME for OpenBSD, went further, stating that almost all of the work is being done by "Linux people" with little or no representation from other systems. Why, he asked, should they be concerned about portability in that situation?

In most free software projects, the developers who write the code have the most say over the direction the project takes. The BSD distributions have trouble coming up with enough developers to do the ports to their own systems; finding developers to help push projects forward — and influence their direction — is an even taller order. In the absence of active developers, they are, in a sense, just another user, able to make requests but with no ability to create the changes they would like to see. So big software projects move forward in directions that are not always convenient for systems like BSD.

One could argue that the Linux community is throwing its weight around. But we are really just seeing the way that free software development projects work; in the early 1990s, BSD-oriented developers were equally unconcerned about the difficulty of porting their code to Linux. When developers have enough problems of their own to solve, trying to make life easier for operating systems they do not use tends to end up fairly low on the list of priorities.

Where this is heading seems reasonably clear: without the ability to participate in these projects, or at least to keep up with them, the BSD projects will have an increasingly hard time supporting contemporary desktop environments. Their hardware support will continue to lag. They will not be able to take advantage of the work that is being done to operate well on mobile systems. There will be fewer and fewer settings where BSD-based systems will operate in the way their users want.

That, needless to say, is a recipe for irrelevance and, eventually, disappearance.

It may be tempting to shrug one's shoulders and say that none of this matters anymore. Your editor, whose first kernel hacking experience was on a BSD-running VAX, would not be so sanguine. But, in truth, even the most determined Linux fanboy should be concerned about a development like this. BSD is more than a repository of a great deal of Unix history and the continued home of a great many talented developers. It is an important part of the free software ecosystem.

BSD is a place where developers can experiment with different approaches to kernel design, filesystems, packaging systems, and more. OpenBSD remains a center for security-related work that does not exist to the same degree in the Linux world. The existence of alternative systems gives us resilience in case Linux is undermined by legal issues, security problems, or corporate mismanagement. Monocultures are unhealthy in general; a Linux monoculture may be the ultimate vindication of our approach to development, but it still would not be a good thing for the world as a whole. As in natural ecosystems, diversity is a source of strength.

Even so, a monoculture may be where we are headed, sometime years into the future. Economies of scale and network effects push in that direction; the fact that Linux is the best system for so many purposes helps to ensure that it will continue to get better in the future while alternatives will languish. Few developers will be able to find the time to, in effect, subsidize alternative operating systems by holding back progress in Linux. It is an outcome to anticipate and, possibly, plan for, but it is not one to celebrate or to try to hasten. If other free operating systems start to vanish, we will eventually realize that we were better off when they were still around.

Comments (237 posted)

Brief items

Distribution quote of the week

It turns out that software development is hard. It's especially hard when you have a hugely complicated system with no central management and no real incentive for most of the skilled workers to cooperate on sections of the project that influence each other. It's nigh-near impossible when you have the same set of people tasked to simultaneously stabalise an upcoming release and do the development work for the forthcoming release. The miracle isn't that Anaconda is taking longer than desirable. It's that it's as close to finished as it is.
-- Matthew Garrett

Comments (none posted)

Parsix GNU/Linux 4.0 released

Parsix GNU/Linux is a Debian derivative distribution with a focus on desktop performance and time-based releases. The 4.0 release is available now; see the release notes for details. "Parsix GNU/Linux 4.0 (code name Gloria) brings tons of updated packages, faster live boot, improved installer system and quality new features. This version has been synchronized with Debian testing repositories as of November 7, 2012 and brings lot of updated packages compared to Parsix 3.7 aka Raul. Parsix Gloria is project's first release with GNOME 3 series and ships with LibreOffice productivity suit by default."

Full Story (comments: 3)

ROSA Enterprise Linux Server "Helium" 2012

ROSA has announced the release of ROSA Enterprise Linux Server "Helium" 2012. RELS 2012 is based on Red Hat Enterprise Linux, and includes additional applications from upstream sources as well as ROSA tools and applications.

Full Story (comments: none)

Distribution News

Debian GNU/Linux

Bits from the release team - Freeze update

The Debian release team has an update on the Wheezy freeze status. "At the time of writing, we have 403 RC Bugs left that need your attention. Once this number gets lower, we will increase the usage of tags similar to last release, making it clear which bugs will be ignored and which are blocking the release process." The bits also lists a number of upcoming Bug Squashing Parties.

Full Story (comments: none)

Fedora

Fedora 18 now scheduled for January 2013

The Fedora Engineering Steering Committee (FESCo) decided to push the Fedora 18 beta back two weeks until November 27. That in turn pushes the full release until January 8 of next year. "Today at FESCo meeting [1] it was decided to slip Fedora 18 Beta release by *two* weeks to give the Installer team, the new upgrade tool and Secure Boot time to finish and polish these features to meet our release quality standards." We looked at some of the problems facing Fedora 18 in last week's edition.

Full Story (comments: 14)

Announcing the rawhide kernel nodebug repository

Fedora rawhide kernels with debug turned off are available from the new nodebug repository. "Bugs against this kernel should be filed in bugzilla against the rawhide kernel."

Full Story (comments: none)

openSUSE

The openSUSE Board Election 2012

The openSUSE Election Committee has announced the opening of the 2012 Board Election. "So, if you want to participate in the openSUSE board and influence the future direction of the project please stand up and announce your candidacy. If you want to vote for the candidates, please make sure your openSUSE membership is approved. If you are a contributor of openSUSE but you are not a member yet, apply for membership now and be a part of the changes to come."

Comments (none posted)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Page editor: Rebecca Sobol

Development

KDE Bodega

By Jake Edge
November 14, 2012

"App" stores are all the rage these days, with everyone from Apple and Amazon to mobile phone carriers and free software projects trying their hand. KDE's Aaron Seigo recently announced Bodega, which is a platform for publishing and distributing digital content of various sorts. Bodega is initially targeted at Plasma Active, KDE's touch-device-friendly mobile user interface, but making it more widely applicable is definitely on the agenda. The first version of Bodega actually shipped with Plasma Active 3 in mid-October "and people are, indeed, using it", Seigo said.

Bodega goes beyond just serving up "apps", as it is meant to handle anything that can be delivered over the network, including books, music, artwork, services, and, yes, applications. Plasma Active's "Add Ons" application uses the Bodega client code, which is based on lots of KDE-specific libraries and frameworks. The server, on the other hand, doesn't use KDE or even Qt, but instead uses node.js, PostgreSQL, and Redis, none of which are particularly KDE-related. One would not normally expect to see a program like that as part of the KDE repository.

But the whole project—client and server—is being proposed for inclusion into the KDE project. Seigo addresses questions about the server side in a blog post. He notes that the recently adopted KDE Manifesto makes it easier to see why Bodega makes sense as a KDE project. In the past, it was more difficult:

Prior to the Manifesto, it was a lot harder to identify if something like Bodega ought to belong under the KDE umbrella. Other server-side projects struggled with this exact issue in the past, at times with rather unfortunate consequences.

But, using the Manifesto and the related Principles of a KDE Project, he makes a convincing case for bringing all of Bodega into KDE. "Now it is quite straight-forward; we simply have to ask, 'Does it push forward KDE's technical agenda, and does it meet KDE's documented principles and commitments?'"

Bodega is organized around storefronts, each of which can give a different view into a collective pile of content items, organized using tags. The example Seigo uses is the KDE project itself, which could run one Bodega instance that would allow each sub-project to have its own "catalog" (i.e. storefront view). That catalog could contain items from the common pool and items that are specific to the sub-project, along with content from elsewhere on the net (e.g. free e-books from Project Gutenberg).

Purchases are made using a points-based system, which is modeled on online video game stores. Those points can be earned in a variety of ways, or they can be purchased via credit card. Importantly, there is no requirement for pricing the items at all. Free (as in beer) Bodegas are definitely part of the plan.

The existing client integrates well with Plasma Workspaces, Seigo said, and an HTML 5 version is likely. Right now, the client can install applications (via PackageKit), Plasma packages, e-books, and wallpapers, but it can be extended to install other kinds of content.

In addition to putting Bodega out for review, and possible inclusion into KDE, Seigo is, of course, looking for more contributors. There is a fairly extensive, if rough, "to do" list on the home page, which is one place to start. He is also interested in feedback, naturally.

Since Bodega is free software, one of the first complaints heard was about the name. In this case, though, the complaints may be somewhat more than just bikeshedding. Evidently, depending on one's location, "bodega" can mean anything from a small mini-market or grocery store (likely in a Spanish-speaking area) to a winery to a cheap place to drink and get drunk. The latter is an association some would rather avoid. While that meaning is used in several places in Europe, there has not been any huge push to change the name—at least not yet.

More substantively, Josef Spillner suggested two possibilities to add to Bodega: services and physical goods. Basically the idea would be that Bodega could streamline delivery and payment options for people to sell or share different kinds of physical goods. In addition, services like online storage or ownCloud synchronization accounts could be integrated into Bodega.

Both of those ideas seemed plausible to Seigo. In fact, work has already been done on integration with ownCloud. Physical goods have requirements like shipping and inventory management that certainly could be added, though they are likely to be further out, he said, "but i won't exclude it as an idea for the future".

Overall, Bodega is an interesting vision of a free software marketplace. It is clearly targeted at many different kinds of uses, for lots of different projects and, perhaps eventually, companies. While it ticks the "app store" checkbox for Plasma Active, it is aimed far beyond just that.

Comments (11 posted)

Brief items

Quotes of the week

I personally believe LLVM is the latest shiny thing. It's not better than gcc, it's just the new new new new cloud cloud cloud! equivalent of compiler technology so everyone is falling over themselves to get on an LLVM bandwagon in time to fragment the existing support we have, thus requiring support for two compilers over just one. But that's a digression. Suffice it to say, I'm not a fanboy.
Jon Masters

public boolean isUserAGoat ()

Used to determine whether the user making this call is subject to teleportations.

Returns
whether the user making this call is a goat

Android 4.2 reference manual

Comments (7 posted)

GNOME 3.8 to drop fallback mode

It's official: the GNOME project will be dropping the GNOME 2-like fallback mode in the 3.8 release. "We've come to the conclusion that we can't maintain fallback mode in reasonable quality, and are better off dropping it."

Comments (130 posted)

Matplotlib 1.2.0 released

Matplotlib 1.2.0 has been released. It is the first version of the Python 2D plotting library to support Python 3. Beyond that, it adds support for PGF/TikZ output, 3D trisurface plots, streamplots, new features for Tripcolor, boxplot, colorbars, and contour plots, and more. "After months of hard work by a veritable army of contributors, I'm pleased to announce the release of matplotlib 1.2.0. This is the first time we've released without the assistance of John Hunter, who is sorely missed. I hope this is at least a small way to say thanks for all of his great work."

Full Story (comments: 3)

Open Letter from the Hildon Foundation Board to the Maemo Community

Tim Samoff has penned an open letter to Maemo and MeeGo community members on behalf of the Hildon Foundation Board, a new entity hoping to pick up the role vacated by Nokia when the company switched to the Windows Phone platform. "It is the Hildon Foundation that will oversee the transition of the Maemo Community away from Nokia and into the hands of the community. Of course, the Foundation is also very concerned with ongoing development within both the Mer and Nemo projects, so facilitating their future is also quite important."

Comments (3 posted)

The Shumway open SWF runtime project

The Mozilla Research blog introduces Shumway, a new, open-source Flash runtime. "Mozilla’s mission is to advance the Open Web. We believe that we can offer a positive experience if we provide support for the SWF format that is still used on many web sites, especially on mobile devices where the Adobe Flash Player is not available." Source is available on Github.

Comments (30 posted)

Newsletters and articles

Development newsletters from the last week

Comments (none posted)

Phipps: Stop patent mischief by curbing patent enforcement

At InfoWorld, Simon Phipps responded to Richard Stallman's WIRED essay on limiting the effects of software patents, via a counterproposal that he says will "adjust the system so that these patents cannot be used to harm the software industry, which to date hasn't needed patents to drive innovation." Phipps' proposal is to "make software patents only enforceable against implementations of standards where the patent was declared in the standards process. All other software contexts should become off-limits for patent enforcement." Nevertheless, he said, there will still need to be other measures to completely fix the software patent mess. (Thanks to Davide Del Vento)

Comments (51 posted)

Seigo: ending the cults of personality in free software

On his blog, Aaron Seigo has a thoughtful look at the role personality cults play in free software. Why should we worry about what Linus Torvalds runs on his desktop, he asks, as it is just one data point—one that gets hugely inflated because of who Torvalds is. "Let's step to the side and consider this from a different angle: Imagine that someone made Linus' perfect desktop environment. Something that satisfied him entirely and which he could happily talk about whenever he felt like it. Would that environment be interesting and useful for the general public, or would it be something great for kernel developers and grumpy-heads like Linus? It could go either way, really, because (once again) the fact that Linus liked it would not be useful information when held in isolation by itself."

Comments (71 posted)

Mena-Quintero: A Friday rant on Gnome 3, journalists, and power users

Longtime GNOME hacker Federico Mena-Quintero reflects on the kinds of complaints that occur frequently in and around free software communities. In a sharply worded blog post, complete with animated cat GIFs, he looks at some history, and adds a bit of ranting about complainers, bloggers, journalists, and so on. "We think, "good riddance" when someone threatens to stop using Gnome. (And our next thought is probably, poor people in the next project, who are going to suffer this person soon.) [...] All of those poisonous people are relatively easy to brush away. The crazies. The slashdot hordes, the peanut gallery. We make names for them — we encapsulate them, give them a name, go up one level of abstraction, take a gulp of Pepto, and move that named entity into a mental /dev/null. But they leave some residue."

Comments (235 posted)

Page editor: Nathan Willis

Announcements

Brief items

Cyanogenmod.com falls off the net

Anybody trying to get at cyanogenmod.com at the moment is likely to be wondering what is wrong. According to this blog post, the problem is a dispute with a (former) member of the CyanogenMod team. "Today, it happened: all of our records were deleted, and cyanogenmod.com is slowly expiring out of the Internet and being replaced by blank pages and non-existing sites. @cyanogenmod.com e-mail is now being directed to a mailserver completely out of our control, too." The project can still be reached at cyanogenmod.org while this is playing out.

Update: This situation has now been resolved, but the project plans to remain with the .org address as its home in the future.

Comments (2 posted)

FSFE: Finnish activist, Danish hacker share Nordic Free Software Award 2012

The Free Software Foundation Europe has announced that Finnish Free Software activist Otto Kekäläinen and Danish hacker Ole Tange are the recipients of the 2012 Nordic Free Software Award.

Full Story (comments: none)

Survey: Newcomer experience and contributor behavior

Kevin Carillo, a PhD student at the School of Information Management of Victoria University of Wellington in New Zealand, is conducting a survey on newcomer experiences. From Carillo's invitation: "If you have joined one of the following FOSS communities within the last 3 years (after January 2010): Debian, GNOME, Gentoo, KDE, Mozilla, Ubuntu, NetBSD, or OpenSUSE, I would like to invite you to complete an online survey. I am interested in hearing from people who are either technical or non-technical contributors, and who have had either positive or negative newcomer experiences."

Comments (1 posted)

Calls for Presentations

Fosdem 2013 - Crossdesktop devroom call for talks

FOSDEM (Free and Open source Software Developers' European Meeting) 2013 will be held in Brussels, Belgium on February 2-3, 2013. There will be a CrossDesktop DevRoom at FOSDEM, which will host Desktop-related talks. The deadline for desktop-related talk proposals is December 14, 2012. "Topics accepted include, but are not limited to: Enlightenment, Gnome, KDE, Unity, XFCE, Windows, Mac OS X, general desktop matters, applications that enhance desktops and web (when related to desktop). Talks can be very specific, such as developing mobile applications with Qt Quick; or as general as predictions for the fusion of Desktop and web in 5 years time. Topics that are of interest to the users and developers of all desktop environments are especially welcome."

Full Story (comments: none)

Upcoming Events

Events: November 15, 2012 to January 14, 2013

The following event listing is taken from the LWN.net Calendar.

November 10-16: SC12, Salt Lake City, UT, USA
November 12-16: 19th Annual Tcl/Tk Conference, Chicago, IL, USA
November 12-17: PyCon Argentina 2012, Buenos Aires, Argentina
November 16: PyHPC 2012, Salt Lake City, UT, USA
November 16-19: Linux Color Management Hackfest 2012, Brno, Czech Republic
November 20-24: 8th Brazilian Python Conference, Rio de Janeiro, Brazil
November 24: London Perl Workshop 2012, London, UK
November 24-25: Mini Debian Conference in Paris, Paris, France
November 26-28: Computer Art Congress 3, Paris, France
November 29-30: Lua Workshop 2012, Reston, VA, USA
November 29-December 1: FOSS.IN/2012, Bangalore, India
November 30-December 2: CloudStack Collaboration Conference, Las Vegas, NV, USA
November 30-December 2: Open Hard- and Software Workshop 2012, Garching bei München, Germany
December 1-2: Konferensi BlankOn #4, Bogor, Indonesia
December 2: Foswiki Association General Assembly, online and Dublin, Ireland
December 5: 4th UK Manycore Computing Conference, Bristol, UK
December 5-7: Open Source Developers Conference Sydney 2012, Sydney, Australia
December 5-7: Qt Developers Days 2012 North America, Santa Clara, CA, USA
December 7-9: CISSE 12, Everywhere, Internet
December 9-14: 26th Large Installation System Administration Conference, San Diego, CA, USA
December 27-29: SciPy India 2012, IIT Bombay, India
December 27-30: 29th Chaos Communication Congress, Hamburg, Germany
December 28-30: Exceptionally Hard & Soft Meeting 2012, Berlin, Germany

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds