LWN.net Weekly Edition for September 11, 2015
Automating architecture bootstrapping in Debian
Debian supports a lengthy list of hardware architectures—twelve on the official list, plus twelve unofficial ports and a variety of other "port-like" projects such as distributions based on non-Linux kernels. Nevertheless, starting a new architecture-support effort involves a lot of repetitive work that Helmut Grohne (and others) think could be automated. Grohne presented the topic at DebConf 2015 in Heidelberg, discussing the issues involved when bootstrapping a new architecture and what needs to be improved. The good news is that progress is being made and that the work benefits the rest of the project, even those not interested in architecture bootstrapping.
In fact, Grohne started the session by discussing why everyone in Debian should care about automating the architecture-bootstrap process. "Bootstrapping," he said, just means the process of getting the initial, core suite of Debian packages up and running on the new platform. Roughly speaking, that means getting the new architecture to the point where the build-essential metapackage can be used; at that point most other Debian packages can be built on the target system.
The project averages about one new bootstrap per year, he said; ARM64 and PowerPC64-EL are the most recently added architectures, while MIPS64-EL, RISC-V, and OpenRISC are on the horizon. Improving the bootstrapping process will only make Debian a more inviting distribution in areas like embedded development, he said, where Debian may not be the OS of choice. But it also forces the project to re-examine much of its build-from-source tool set, which might otherwise languish, and improving the process could encourage new projects like bootstrapping sub-architectures (for example, creating an x32-optimized port of Debian, or a port that uses the musl C library).
Grohne is the author of rebootstrap, a QA tool for bootstrapping a new architecture. It currently runs on Debian's Jenkins server, testing 20 different architectures about once each week. Each test tries to cross-build about 100 packages, which is only a subset of the packages build-essential pulls in or depends on. Nevertheless, rebootstrap has caught 190 bugs so far (120 of which have been fixed). Grohne plans to expand the package set covered by rebootstrap, but said that one of the lasting benefits of the process is catching and fixing bugs in the core package set.
Cross toolchains and cross-building
He then turned his attention to outlining the steps involved in bootstrapping an architecture, beginning with a description of the cross toolchains used in Debian. Two options are in common usage; both include a version of GCC that can cross-compile for the target architecture, plus target-architecture versions of binutils, glibc, glibc headers, and gcc-defaults. The two toolchains differ in how dependencies are handled: one expects multi-architecture builds to be available on the build system for all dependencies, while the other expects target-architecture versions of all dependencies.
![Helmut Grohne at DebConf](https://static.lwn.net/images/2015/09-debconf-grohne-sm.jpg)
Both of the approaches work, Grohne said. The toolchain packages are now in Debian unstable (which was not true as recently as two years ago). Today, though, most bootstrapping projects can begin with the back-and-forth GCC/glibc "dance." First the user cross-compiles a minimalist version of GCC for the new architecture, which is then used to build the glibc-header package. Then a bit more of GCC can be built, which in turn allows more of glibc to be built, and so forth.
There are, however, a few architectures where cross toolchain support is still problematic. Alpha and HPPA have glibc conflicts, while OpenRISC, RISC-V, armel, armhf, and SuperH have GCC bugs. Patches are available to fix each of these problems, but they have not yet been merged. Thus, anyone needing to bootstrap or cross-compile on those architectures will need to get the patches from the bug-tracking system and apply them before proceeding. Grohne encouraged anyone who saw their "favorite architecture" on the problematic list to get in touch after the talk.
He then described the process for cross-building an individual package. Thanks to the Emdebian team, some packages have supported cross-building for close to ten years. For the rest, most Debian packages can be cross-built using sbuild or dpkg-buildpackage, so long as the appropriate flags are set to build for the target architecture. What does cause problems, though, is satisfying a package's Build-Depends dependencies when cross-building.
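For those who want to experiment, a cross-build of a single package might look roughly like the following sketch; the target architecture, the package name, and the exact profile flags are illustrative assumptions rather than a recipe from the talk:

```sh
# Install a cross toolchain for the target architecture (arm64 is only an example)
sudo apt-get install crossbuild-essential-arm64

# Fetch a source package and its build dependencies for the host (target) architecture
apt-get source hello                    # "hello" is a stand-in package
sudo apt-get build-dep -a arm64 hello

# Cross-build it; the cross and nocheck profiles skip steps that cannot
# run on the build machine (such as the test suite).
# sbuild can do much the same job with its --host option.
cd hello-*/
dpkg-buildpackage -a arm64 -Pcross,nocheck -us -uc -b
```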
Problems and solutions
A lot of packages in the Debian archive are multi-architecture, which should allow the build system's version to satisfy Build-Depends for a cross-build. But, in reality, the long chains of transitive dependencies can break down if just one package without multi-architecture support is involved. Grohne said that out of Debian's 20,000 packages, Build-Depends problems mean that only about 3,000 can be automatically cross-built. There is a web page available that monitors the status of the dependency issues; interested developers can check there for packages that need attention.
In many cases, he said, the fixes required to unstick a problematic Build-Depends chain are simple enough—such as rewriting dependency rules that inadvertently assume that the build architecture and host architecture are the same. For example, he said, the dependency rule:
Build-Depends: g++ (>= 4:5)
is probably meant to specify that the package should be built with a recent version of G++, but when cross-building the rule is interpreted as requiring a g++ package for the target architecture. For now, bootstrappers usually solve these problems through a lot of manual effort. Better solutions have been proposed, such as special "compiler for host" packages, which could be specified in dependency rules:
Build-Depends: g++-for-host (>= 4:5)
A proof-of-concept package for this idea is in Debian experimental.
Interested Debian contributors can also make a significant difference by adding multi-architecture support to more and more packages in the archive. Most of the work required involves straightforward fixes, such as changing compiler references to use target triplets (which allow different build and host architectures).
There are a few "funky issues" that arise when working on multi-architecture support, however. The most common is encountered in interpreted languages. For example, an "Architecture: any" Perl application may depend on an "Architecture: all" Perl module, which in turn depends on an "Architecture: any" Perl extension. But "all" and "any" are not the same to the dependency resolver. Whereas "all" usually designates a package that will work, unaltered, on any processor (such as a collection of Perl scripts), "any" means that the package can be built for any architecture.
Unfortunately, due to that minor distinction, passing through the "all" architecture rule in the middle of the chain breaks the chain, since the build system's version of the package satisfies that dependency. At that point, the dependency resolver stops looking for packages in the target architecture. The bootstrapping team has not yet decided on a solution to this problem, he said, although there is a workaround: manually changing the all to an any and adding another rule (Multi-Arch: same) to every dependency in the chain.
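A rough sketch of that workaround, with hypothetical package names, might look like the control fragment below; the key change is that the middle package in the chain stops being "Architecture: all":

```
# debian/control fragment (package names are hypothetical).
# The module was "Architecture: all"; it is switched to "any" and marked
# "Multi-Arch: same" so the resolver keeps looking for target-architecture
# packages further down the chain.  Every dependency in the chain needs
# the same treatment.
Package: libfoo-perl
Architecture: any
Multi-Arch: same
Depends: ${perl:Depends}, libfoo-xs-perl (= ${binary:Version})
```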
There are, of course, quite a few other problems encountered when cross-building a large set of packages. Grohne gave multiple examples, some of which raise difficult-to-answer questions. For example, there are some packages that are their own build dependency (he noted cracklib2 and nss in this group) because they expect to access certain data files during the build process, and those files are shipped in the same package as the source code. Fixing that circular dependency without breaking native builds requires careful thought, he said.
Grohne closed the session with a brief status report and some ideas for future development. Bootstrapping a new architecture currently involves about 500 source packages. His rebootstrap tool only tests 100 of those, which means it would require a lot of additional work to be comprehensive. Instead, he has proposed implementing the Build Profiles specification, which would essentially allow developers to define a separate set of build dependencies and compilation targets to be used for cross-builds. If widely implemented, it can reduce the amount of manual tweaking required. The architecture-bootstrapping team has added Build Profile support to a number of core packages already, but more remains to be done.
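To give a flavor of what Build Profiles look like in practice (the package and dependency names here are made up), a build dependency can be annotated so that it disappears when a given profile is active:

```
# debian/control: libbar-dev is not needed for a stage1 bootstrap build,
# so it is skipped when the stage1 profile is active, breaking a
# dependency cycle without affecting normal builds.
Build-Depends: debhelper (>= 9), libbar-dev <!stage1>
```

The profile itself is then selected at build time, for instance by setting DEB_BUILD_PROFILES="stage1 nocheck" in the environment or by passing the -P option to dpkg-buildpackage, so the annotated dependency is simply dropped for that build.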
At the conclusion of the talk, the audience had quite a few questions for Grohne, most of which focused on the particulars of cross-compilation or of specifying build dependencies. On the whole, it seems as though the Debian community is interested in doing what it can to make cross-building packages more reliable. For developers interested in bringing Debian up from scratch on a new processor architecture, the long-term outlook may be good, but there is considerable work to be done in the days ahead.
[The author would like to thank the Debian project for travel assistance to attend DebConf 2015.]
Realtime KVM
Realtime virtualization may sound like an oxymoron to some, but (with some caveats) it actually works and is yet another proof of the flexibility of the Linux kernel. The first two presentations at KVM Forum 2015 looked at realtime KVM from the ground up. The speakers were Rik van Riel, who covered the kernel side of the work (YouTube video and slides [PDF]) and Jan Kiszka, who explained how to configure the hosts and how to manage realtime virtual machines (YouTube video and slides [PDF]). This article recaps both talks, beginning with Van Riel's.
The PREEMPT_RT kernel
Realtime is about determinism, not speed. Realtime workloads are those where missing deadlines is bad: it results in voice breaking up in telecommunications equipment, missed opportunities in stock trading, and exploding rockets in vehicle control and avionics. These applications can have thousands of deadlines a second; the maximum allowed response time can be as low as a few dozen microseconds, and it has to be met 99.999% of the time, if not simply always. Speed is useful, but guaranteeing this kind of latency bound almost always results in lower throughput.
Nearly every latency source in a system comes from the kernel. For example, a driver could disable interrupts and prevent high-priority programs from being scheduled. Spinlocks are another cause of latency in a non-realtime kernel, because Linux cannot schedule() while holding a spinlock. These issues can be controlled by running a kernel built with PREEMPT_RT, the realtime kernel patch set. A PREEMPT_RT kernel tries hard to make every part of the Linux kernel preemptible, except for short sections of code.
Most of the required changes have been merged into Linus's kernel tree: kernel preemption support, priority inheritance, high-resolution timers, support for interrupt handling in threads, annotation of "raw" spinlocks, and NO_HZ_FULL mode. The PREEMPT_RT patch, while still large, has to do much less than it used to. The three main things it does are: turn non-raw spinlocks into mutexes with priority inheritance, actually run all interrupt handlers in threads so that realtime tasks can preempt them, and provide an RCU implementation that supports preemption.
The main remaining problem is in firmware. System management interrupts (SMIs) for x86 take care of things such as fan speed, even on servers. SMIs cannot be blocked by the operating system and can take up to milliseconds to run in extreme cases. During this time, the operating system is completely blocked from running. There is no solution other than buying hardware that behaves well. A kernel module, hwlatdetect, can help detect the problem; it blocks interrupts on a CPU, looks for unexpected latency spikes, and uses model-specific registers (MSRs) to correlate the spikes to SMIs.
Realtime virtualization, really?
Now, realtime virtualization may sound implausible, but it can be done. Of course, there are problems: for example, the priority of the tasks in the virtual machine (VM) is not visible to the host and neither are lock holders inside a guest. This limits the scheduler's flexibility and prevents priority inheritance, so all of the virtual CPUs (VCPUs) have to be placed at a very high priority. Only ksoftirqd has a higher priority, since it delivers interrupts to the virtual CPUs. In order to avoid starving the host, systems have to be partitioned between CPUs running system tasks and isolated CPUs (marked with the isolcpus and nohz_full kernel command-line arguments) running realtime guests. The guest has to be partitioned in the same way between realtime VCPUs and those that run generic tasks. The latter could occasionally cause exits to the host user space, which are potentially long and—much like SMIs on bare metal—prevent the guest scheduler from running.
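As a concrete sketch (the CPU numbers are arbitrary assumptions), an eight-CPU host might keep CPUs 0 and 1 for housekeeping and hand CPUs 2-7 to realtime guests by booting with:

```
# Host kernel command line: keep the scheduler and the timer tick off
# the CPUs that will run realtime VCPU threads
isolcpus=2-7 nohz_full=2-7
```

The VCPU threads of the realtime guests are then pinned to the isolated CPUs, while QEMU's housekeeping threads and the rest of the host stay on CPUs 0 and 1.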
Thus, a virtualized realtime guest uses more resources than the same workload running on bare metal, and those resources have to be dedicated to a particular guest. But this can be an acceptable price to pay for the improved isolation, manageability, and hardware compatibility that virtualization provides. In addition, lately each generation of processors has made more and more cores available within one CPU socket; Moore's Law seems to be compensating for this problem, at least for now.
Once the design of realtime KVM was worked out as above, the remaining piece is to fix the bugs. A lot of the fixes were either not specific to KVM, or not specific to PREEMPT_RT, so they will benefit all real-time users and all virtualization users. For example, RCU was changed to have an extended quiescent state while the guest runs. NOHZ_FULL support was extended to disable the timer tick altogether when running a SCHED_FIFO (realtime) task. In this case, that task will not be rescheduled, because anything with a higher priority would have already preempted it, so the timer tick is not needed. A few knobs were added to disable unnecessary KVM features that can introduce latency, such as synchronization of time from the host to the guest; this can take several microseconds and the solution is simply to run ntpd in the guest.
Virtualization overhead can be limited by using PREEMPT_RT's "simple wait queues" instead of the full-blown Linux wait queues. These only take locks for a bounded time so that the length of the operations is also bounded (wakeups often happen from interrupt handlers, so their cost directly affects latency). Merging simple wait queues in the mainline kernel is being discussed.
Another trick is to schedule KVM's timers a little in advance to compensate for the overhead of injecting virtual interrupts. It takes a few microseconds for the hypervisor to pass an interrupt down to the guest, and a parameter in the kvm kernel module allows for tuning the adjustment based on the guest's benchmarked latency.
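On recent kernels that knob is exposed as a kvm module parameter; the name and value below are assumptions to verify against the running kernel rather than settings from the talk:

```sh
# Fire the guest's LAPIC timer a bit early to hide interrupt-injection
# overhead; the value (in nanoseconds) has to be found by benchmarking the guest
echo 1000 > /sys/module/kvm/parameters/lapic_timer_advance_ns
```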
And finally, new processor technology can help too. This is the case for Intel's "Cache Allocation Technology" (CAT), available on some Haswell CPUs. The combined cost of loads from DRAM and TLB misses can add up to over 50 microseconds for a single uncached context switch. CAT allows reserving parts of the cache for specific applications, preventing one workload from evicting another workload from the cache, and it is controlled nicely with a control-groups-based interface. The patches, however, have not yet been included in Linux.
The results, measured with cyclictest, are surprisingly good. Bare-metal latencies are less than 2 microseconds, but KVM's measurement of 6-microsecond latencies is also a very good result. To achieve these numbers, of course, the system needs to be carefully set up to avoid all kinds of high-latency system operations: no CPU frequency changes, no CPU hotplug, no loading or unloading of kernel modules, and no swapping. The applications also have to be tuned to avoid slow devices (e.g. disks or sound devices) except in non-realtime helper programs. So deploying realtime KVM requires deep knowledge of the system (for example, to ensure the time stamp counter is stable and the system will never fall back to another clock source) and the workload. Some new bottlenecks will be found as people use realtime KVM more, but the work on the kernel side is, in general, proceeding well.
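For reference, a cyclictest run of the kind used for such measurements might look like this (the CPU number, priority, and duration are arbitrary choices):

```sh
# One SCHED_FIFO priority-99 measurement thread pinned to CPU 2, with
# locked memory, clock_nanosleep-based waits, and a latency histogram
cyclictest -m -n -p 99 -a 2 -t 1 -h 100 -q -D 10m
```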
"Can I have this in my cloud?"
At this point, Van Riel left the stage to Kiszka, who talked more about the host configuration, how to automate it, and how to manage the systems with libvirt and OpenStack.
Kiszka is a long-time KVM contributor who works for Siemens. He started using KVM many years ago to tackle hardware-compatibility problems with legacy software [PDF]. He has been toying with realtime KVM [YouTube] for several years, and people are now asking: "Can I have this in my cloud?".
The answer is "yes", but there are some restrictions. This is not something for a public cloud, of course. Doing realtime control for an industrial plant will not go well if you need to do I/O from some data center far away. "The cloud" here is a private cloud with a fast Ethernet link between the industrial process and the virtual machine. Many features of a cloud environment will also be left behind, because they do not provide deterministic latencies. For example, the realtime path must not use disks or live migration, but this is generally not a problem.
In going beyond the basic configuration that Van Riel had explained, the first thing to look at is networking. Most of QEMU is still protected by a "big QEMU lock", and device passthrough has latency problems too. While progress is being made on these fronts, it's already possible to use a paravirtualized device (virtio-net) together with a non-QEMU backend.
KVM supports two such virtio-net backends, namely vhost-net and vhost-user. vhost-net lives in the kernel; it connects a TAP device from the Linux network stack to a virtio-net device in a virtual machine. However, it does not yet provide acceptable latency either. vhost-user, instead, lets any user-space process provide networking, and can be used together with specialized network libraries.
Examples of realtime-capable network libraries include the Data Plane Development Kit (DPDK) and SnabbSwitch. These alternative stacks opt for an aggressive polling strategy; this reduces the amount of event signaling and, as a consequence, latency as well. Kiszka's setup uses DPDK as a vhost-user client; of course, it runs at a realtime priority too. For the client to deliver interrupts to VCPUs in a timely fashion, it has to be placed at a higher priority than the VCPU threads.
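A minimal sketch of that priority arrangement with chrt (the process names and priority values are illustrative assumptions, not settings from the talk):

```sh
# Give the vhost-user switch a SCHED_FIFO priority one step above the
# VCPU threads so that it can always preempt them to deliver packets
chrt -f -p 2 $(pidof qemu-system-x86_64)   # realtime guest's VCPU threads
chrt -f -p 3 $(pidof dpdk-switch)          # hypothetical switch process
```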
Kiszka's application does not have high packet rates, so a single physical CPU is enough to run the switch for all the network interfaces in the systems; more demanding applications might require one physical CPU for each interface.
After prototyping realtime virtualization in the lab, moving it to the data center requires a lot more work. There are hundreds of VMs and many different networks, some of them realtime and some not; all of that needs to be managed and accounted for flexibly. This requires a cloud-management stack, so OpenStack was chosen and extended with realtime capabilities. The reference architecture then includes (from the bottom up): the PREEMPT_RT kernel, QEMU (which has to be there for the guest's non-realtime tasks and to set up the vhost-user switch), the DPDK-based switch, libvirt, and OpenStack. Each host, or "compute node", is set up with isolated physical CPUs as explained in the first half of the talk. IRQ affinities also have to be set explicitly (or through the irqbalance daemon) because, by default, they do not respect the kernel's isolcpus setting. But, depending on the workload, little tuning may be needed and, in any case, the setup is easily replicated if there are many similar hosts. There is also a tool called partrt that helps to set up isolation.
Libvirt and OpenStack
Higher up comes libvirt, which doesn't require much policy, as it only executes commands from the higher layers. All required tunables are available in libvirt 1.2.13: setting the scheduling parameters (policy, priority, pinning to physical CPUs), asking QEMU to mlock() all guest RAM, and starting VMs connected to vhost-user processes. The consumer for these parameters is OpenStack's compute-node-handling Nova component.
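A sketch of how those settings appear in a libvirt domain definition (the CPU numbers, the priority, and the vhost-user socket path are illustrative assumptions):

```xml
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
</cputune>
<memoryBacking>
  <locked/>    <!-- ask QEMU to mlock() all guest RAM -->
</memoryBacking>
<interface type='vhostuser'>
  <source type='unix' path='/var/run/vhostuser/vm0.sock' mode='client'/>
  <model type='virtio'/>
</interface>
```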
Nova can already be configured to enable VCPU pinning and dedicated physical CPUs. Other settings, though, are missing in OpenStack, and are being discussed in a blueprint. While it is not yet complete (for example, it doesn't support associating non-realtime physical CPUs with non-realtime QEMU threads), the blueprint will enable the usage of the remaining libvirt knobs. Patches for it are being discussed and the target is OpenStack's "Mitaka" release, due in the first half of 2016. Kiszka's team is integrating the patches into its deployment; the team will come up with extensions to the patches and to the blueprint.
OpenStack also controls networking through the Neutron component. However, realtime networks tend to be special: they might not use TCP/IP at all, and Neutron really wants to manage its networks in its own way. Siemens is thus introducing "unmanaged" networks (which do no DHCP and possibly even no IP) into Neutron.
All in all, work in the higher layers of the stack is mostly about standardizing the basic setup of realtime-capable compute nodes, and a lot of the work will be about improving the tuning process in tools such as partrt. As mentioned during the Q&A session, tuned is also being extended to support a realtime tuning profile. However, Kiszka also plans to take another look lower in the stack; the newest chipsets have functionality that eliminates the interrupt latency introduced by device assignment, routing interrupts to the guest directly without involving the hypervisor. In addition, Kiszka's older work [PDF] to let QEMU emulate realtime devices could be brought back sometime in the future.
Tor's .onion domain approved by IETF/IANA
The Tor project gained an important piece of official recognition this week when two key Internet oversight bodies gave their stamp of approval to Tor's .onion top-level domain (TLD). While .onion has been in use on the Tor network for several years, it was always as a "pseudo-domain" in the past. Its official recognition should make wider interoperability possible (as well as shield the domain from being claimed by a domain registrar).
To recap, Tor first introduced .onion in a 2004 white paper that described how hidden services on the Tor network could be accessed. An application designed for Internet usage (such as a web browser) needs the hostnames of servers to be looked up through a DNS-like mechanism that returns an IP address. The .onion TLD serves the corresponding purpose for a server running on the Tor network rather than on the Internet, but .onion hostnames are substantially different.
The server has a foo.onion hostname, where "foo" is the hash of the server's public encryption key. When the browser sends an HTTPS request to foo.onion, rather than performing a DNS lookup, the Tor proxy looks up the hash in Tor's distributed hash table and, assuming the server is online, gets the address of a Tor "rendezvous" node in return. Tor then contacts the rendezvous node and establishes the connection. The end result is functionally the same as the DNS case—the client gets a working connection to the server—but the .onion protocol makes the connection happen without either endpoint learning about the other's location.
Informalities
The .onion mechanism works reliably enough that recent years have seen several high-profile service providers add Tor hidden-service entry points. Facebook famously crunched through a massive set of hash calculations before it stumbled onto its easily remembered Tor address, facebookcorewwwi.onion [Tor link]. Search engine DuckDuckGo, news outlet The Intercept, and several other well-known web sites have followed suit (albeit without Facebook's easy-to-memorize hash).
Nevertheless, as long as .onion remained an unofficial TLD, nothing would formally prevent a new registrar from applying to the Internet Corporation for Assigned Names and Numbers (ICANN) to register and manage a .onion TLD on the public Internet. ICANN opened the doors to applications for new TLDs in 2012, and has received several thousand.
There have been other well-known pseudo-domains in years past—readers with long memories may recall .uucp or .bitnet—but those pseudo-domains were never formally specified. ICANN's new policy for accepting open submissions for new TLDs means that such informal conventions are a risky proposition. For example, RFC 6762 lists several TLDs "recommended" for private usage on internal networks, including .home, .lan, .corp, and .internal. Of those, .lan and .internal still seem to be unclaimed, but the ICANN site lists six registrar applications to manage .corp and eleven for the .home domain.
Consequently, Tor's Jacob Appelbaum (along with Facebook engineer Alec Muffett) submitted an Internet Draft proposal to the IETF to have .onion officially recognized as a "special-use domain name." The proposal specifies the expected behavior for application software and domain-name resolvers, and it forbids DNS registrars and DNS servers from interfering with Tor's usage of .onion. Specifically, it requires registrars to refuse any registrations for .onion domain names and it requires DNS servers to respond to all lookup requests for .onion domains with the "non-existent domain" response code, NXDOMAIN. Application software and caching DNS resolvers need to either resolve .onion domains through Tor or generate the appropriate error indicating that the domain cannot be resolved.
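For a resolver that does not hand .onion queries to Tor, the requirement boils down to answering them locally with NXDOMAIN; in unbound, for example, that can be expressed with a local zone. The snippet below is a sketch to be checked against the resolver's documentation, not a configuration taken from the proposal:

```
# unbound.conf: never send .onion lookups to the public DNS;
# with no local data defined, queries under the zone get NXDOMAIN
server:
    local-zone: "onion." static
```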
On September 9, the IETF approved Appelbaum and Muffett's proposal as a Draft RFC, and ICANN's Internet Assigned Numbers Authority (IANA) added .onion to the official list of special-use domain names. That list, unlike RFC 6762, is a formal one; apart from the reverse lookups for the reserved IP-address blocks, only a few domains are included (such as .test, .localhost, .local, .invalid, and several variations of "example").
What's next
The most immediate effect of the approval will likely be that general-purpose software can implement support for .onion, since there is now no concern that the TLD could be "overloaded" in the future by being adopted in a non-Tor setting. Appelbaum, of course, has lobbied the free-software community in recent years to start building in support for Tor as a generic network-transport layer. He proposed the idea at GUADEC 2012, and raised it again at DebConf 2015. Implementing system-wide Tor support would not be trivial, but it is perhaps now a more reasonable request.
In the longer term, though, the official recognition of .onion may have other ripple effects. Facebook's Tor team posted an announcement about the change, and noted that it raises the possibility of getting SSL certificates for .onion domains:
Together, this assures the validity and future availability of SSL certificates in order to assert and protect the ownership of Onion sites throughout the whole of the Tor network....
The CAB Forum ballot linked to by the announcement proposed a set of validation rules for issuing certificates for .onion domains and for certificate authorities (CAs) to sign those certificates. It makes straightforward arguments—namely, that users benefit if site owners can publicly prove their ownership of a .onion address. Apart from Facebook, after all, most .onion URLs are quite difficult to remember.
That said, the forum ballot passed with six "yes" votes from CAs, two "no" votes, and 13 abstentions, plus "yes" votes from three browser vendors. That result might not be interpreted as a strong mandate among CAs. In addition, the CAB Forum is not a governing body, so its approval does not necessarily dictate that any particular CA will issue .onion certificates in the future.
Nevertheless, approval for the .onion TLD is undoubtedly a positive sign for Tor and for hidden services in particular. The project can point to it as acceptance that the technology has grown in popularity among Internet users and is a far cry from the "dark web" so often alluded to in the general press. Just as importantly, developers can count on .onion as a stable service-naming scheme, which may lead to interesting new developments down the line.
Security
Hardware technologies for securing containers
There are plenty of security concerns with running containers and applications that have been containerized—some of those concerns can be reduced or eliminated using hardware techniques. In a presentation at LinuxCon North America, Intel's Arjan van de Ven described some x86 technologies that can help with some of the security problems that containers face. One of the technologies is brand new, having only been announced a few days before the talk.
Many people are downloading and running containers from the internet without any real checking on their provenance, which "should scare the hell out of you", Van de Ven said. That is a "sharp knife problem" that cannot be solved with hardware technologies, since it all comes down to trust. There are a number of trust issues with that, including whether a container truly comes from where it purports to originate with the binaries that are expected, whether it contains software that has vulnerabilities that have been discovered since it was created, and whether the contents are complying with the licenses that govern the code. Those are all of the same problems that users face when downloading a Linux distribution—the same kinds of solutions will need to be applied to containers.
![Arjan van de Ven](https://static.lwn.net/images/2015/lcna-vandeven-sm.jpg)
But if you look "beyond the sharp knife", there are security problems where hardware can help. One major concern is that the container is leaky somehow, such that the containerized application can escape its containment. An attacker may use that ability to directly attack the host operating system (OS) or they may attack another container running on the host. In addition, how does a container know that the OS it is running on has not been compromised? These are places where "hardware-assisted security" can help.
Intel's Kernel-Guard Technology (KGT) tries to protect the kernel against certain kinds of malware, Van de Ven said. It places a small monitor between the kernel and the hardware to protect certain kernel data structures or CPU registers from modification. The monitor is not a full hypervisor, but uses similar techniques to protect the system from certain kinds of attacks. Kernel code pages, interrupt descriptor table contents, and page table mappings could be protected using KGT, as could CPU control registers and model-specific registers (MSRs).
Containers, applications, and other components will be able to detect changes in the underlying system and its software using the attestation feature that the Intel Cloud Integrity Technology (CIT) provides. Attestation is a way to prove that the binaries for components like the firmware, bootloader, kernel, and, say, Docker daemon or rkt binary, have not changed. A chain of hashes is calculated for the elements and the Trusted Platform Module (TPM) is used to sign the hash in such a way that others can verify that those elements have not been changed.
The attestation can be extended to prove that a container is running in the right data center or in the right country. That may be important for countries that require their citizens' data to be stored domestically, for example.
It is a "picky and fragile" solution in some ways, since anything that gets changed will change the hash chain. So upgrades need to be handled carefully. In addition, it only proves the state of the software when it was started; if the binary gets changed later by way of a compromise, it won't be detected. There is also a performance cost associated with the feature, so it does not come for free, he said. Attestation is "not for the faint of heart", but can help solve some security problems for containers.
Clear Containers are another technology that can help secure "containers". It provides the isolation of virtualization with the performance of containers by actually running the container in a lightweight virtual machine. He didn't go into much detail about Clear Containers, as he gave another full talk on that subject at the conference. Support for Clear Containers has been added to the rkt container engine as a proof of concept. It works, but there are still plenty of "interesting problems" left to solve, he said.
The supervisor mode access prevention (SMAP) and supervisor mode execution prevention (SMEP) features of some x86 processors are changing some of the things that we learned in school about CPUs, Van de Ven said. Instead of the traditional ring model, where the most-privileged ring has access to the data in all rings, SMAP and SMEP make the rings almost completely disjoint. If an exploit tricks the kernel into accessing or running user-space code, the CPU will simply fault, stopping the attack in its tracks.
Of course, the kernel needs to access user-space data at times, which is where the overlap between the rings comes into play. The Linux kernel already has special methods to access user-space data; those can lift the SMAP protections for the duration of that access. Any other access will trigger the fault. It doesn't prevent all attacks using bad kernel pointers, but it does make it harder to exploit them. (Support for a feature similar to SMAP for ARM processors has been merged for the 4.3 kernel.)
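As a generic illustration (not code from the talk) of what those special methods look like, a kernel function that wants a value from user space goes through copy_from_user(), which briefly lifts the SMAP restriction for just that access:

```c
#include <linux/errno.h>
#include <linux/uaccess.h>

/* Sketch: fetch an int from user space under SMAP.  copy_from_user()
 * brackets the access (with the stac/clac instructions on x86) so the
 * supervisor-mode read of a user page is permitted only in that window;
 * dereferencing uptr directly here would fault instead. */
static long fetch_user_value(const int __user *uptr)
{
	int val;

	if (copy_from_user(&val, uptr, sizeof(val)))
		return -EFAULT;
	return val;
}
```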
The final feature he covered had only been announced two days earlier: Intel Software Guard Extensions (SGX). This new feature is "a little weird", Van de Ven said. It allows the system to define a special zone of memory (called an "enclave") that will be used to hold encrypted memory for both code and data. The enclave will also have some defined entry points. Only code that is running inside the enclave can see the unencrypted contents of the memory. Even the kernel cannot access the code and data inside the enclave from the outside.
The typical use case for SGX would be for secure cryptography; the key can be placed in the enclave and cannot be extracted from it. The entry points would provide services using the key, like signing. In addition, the CPU can attest to a remote server that the code is running from within the enclave.
It is effectively a "black box with a call table". You may be able to trick the enclave into signing things that it shouldn't have signed, he said, but getting the key out is not possible. If there is a security hole in the code inside the enclave, though, all bets are off. In addition, debugging the code inside the enclave is difficult—you can't simply attach GDB.
The enclave is populated from a driver, Van de Ven said in answer to a question from the audience. Another attendee suggested the "Intel SGX for Dummies" site for more information on the feature.
He circled back around to KGT as he was winding down the talk. That feature will perhaps be the most generally useful for protecting against various kinds of attacks. It can protect all of the read-only memory in the kernel along with all of the MSRs and CPU configuration registers. Many of the data structures in the kernel can be made read-only and be protected using KGT. It can be configured with a set of rules that, for example, would allow only certain functions to change certain parts of memory. So KGT could enforce that only the user-space access methods in the kernel are allowed to change the SMEP and SMAP settings.
KGT is implemented as a mini-hypervisor that requires no kernel changes. The code is available (under the Apache 2.0 license) for those interested.
These hardware technologies are certainly not limited to protecting containers or containerized applications—they are more widely applicable. SMEP and SMAP have been around for a while, but Clear Containers, CIT, KGT, and definitely SGX are all relatively new, so Van de Ven's talk provided a nice quick overview of those ideas. It will be interesting to see how they get used in the future.
[I would like to thank the Linux Foundation for travel assistance to Seattle for LinuxCon North America.]
Brief items
Security quotes of the week
Using such a system, attackers could trick a self-driving car into thinking something is directly ahead of it, thus forcing it to slow down. Or they could overwhelm it with so many spurious signals that the car would not move at all for fear of hitting phantom obstacles.
We have a difficult enough time building secure systems without backdoors, and the presence of a backdoor must necessarily weaken the security of the system still further. With the dreadful history of backdoors, its little wonder most security professionals believe building backdoors right is practically impossible.
Mozilla: Improving Security for Bugzilla
The Mozilla blog has disclosed that the official Mozilla instance of Bugzilla was recently compromised by an attacker who stole "security-sensitive information" related to unannounced vulnerabilities in Firefox—in particular, the PDF Viewer exploit discovered on August 5. The blog post explains that Mozilla has now taken several steps to reduce the risk of future attacks using Bugzilla as a stepping stone. "As an immediate first step, all users with access to security-sensitive information have been required to change their passwords and use two-factor authentication. We are reducing the number of users with privileged access and limiting what each privileged user can do. In other words, we are making it harder for an attacker to break in, providing fewer opportunities to break in, and reducing the amount of information an attacker can get by breaking in."
New vulnerabilities
bind: denial of service
Package(s): bind    CVE #(s): CVE-2015-5986
Created: September 3, 2015    Updated: September 10, 2015
Description: From the Arch Linux advisory:
CVE-2015-5986 (An incorrect boundary check can trigger a REQUIRE assertion failure in openpgpkey_61.c): An incorrect boundary check in openpgpkey_61.c can cause named to terminate due to a REQUIRE assertion failure. This defect can be deliberately exploited by an attacker who can provide a maliciously constructed response in answer to a query.
bind: denial of service
Package(s): bind    CVE #(s): CVE-2015-5722
Created: September 3, 2015    Updated: October 5, 2015
Description: From the Red Hat advisory:
A denial of service flaw was found in the way BIND parsed certain malformed DNSSEC keys. A remote attacker could use this flaw to send a specially crafted DNS query (for example, a query requiring a response from a zone containing a deliberately malformed key) that would cause named functioning as a validating resolver to crash. (CVE-2015-5722)
drupal6-ctools: multiple vulnerabilities
Package(s): drupal6-ctools    CVE #(s): (none)
Created: September 8, 2015    Updated: September 10, 2015
Description: From the Drupal advisory:
Cross Site Scripting (XSS): Ctools in Drupal 6 provides a number of APIs and extensions for Drupal, and is a dependency for many of the most popular modules, including Views, Panels and Entityreference. Many features introduced in Drupal Core once lived in ctools. This vulnerability can be mitigated by the fact that ctools must load its javascript on the page and the user has access to submit data through a form (such as a comment or node) that allows 'a' tags.
Access bypass: This module provides a number of APIs and extensions for Drupal, and is a dependency for many of the most popular modules, including Views, Panels and Features. The module doesn't sufficiently verify the "edit" permission for the "content type" plugins that are used on Panels and similar systems to place content and functionality on a page. This vulnerability is mitigated by the fact that the user must have access to edit a display via a Panels display system, e.g. via Panels pages, Mini Panels, Panel Nodes, Panelizer displays, IPE, Panels Everywhere, etc. Furthermore, either a contributed module provides a CTools content type plugin, or a custom plugin must be written that inherits permissions from another plugin and must have a different permission defined; if no "edit" permission is set up for the child object CTools did not check the permissions of the parent object. One potential scenario would allow people who did not have edit access to Fieldable Panels Panes panes, which were specifically set to not be reusable, to edit them despite the person's lack of access.
drupal6-views_bulk_operations: access bypass
Package(s): drupal6-views_bulk_operations    CVE #(s): CVE-2015-5515
Created: September 8, 2015    Updated: September 10, 2015
Description: From the Drupal advisory:
The Views Bulk Operations module enables you to add bulk operations to administration views, executing actions on multiple selected rows. The module doesn't sufficiently guard user entities against unauthorized modification. If a user has access to a user account listing view with VBO enabled (such as admin/people when the administration_views module is used), they will be able to edit their own account and give themselves a higher role (such as "administrator") even if they don't have the "administer users" permission. This vulnerability is mitigated by the fact that an attacker must have access to such a user listing page and that the bulk operation for changing Roles is enabled.
freeimage: integer overflow
Package(s): freeimage    CVE #(s): CVE-2015-0852
Created: September 8, 2015    Updated: October 6, 2016
Description: From the Mageia advisory:
FreeImage is vulnerable to an integer overflow in PluginPCX.cpp, making the PCX loader vulnerable to malicious images with a bad window specification.
jsoup: cross-site scripting
Package(s): jsoup    CVE #(s): CVE-2015-6748
Created: September 8, 2015    Updated: September 10, 2015
Description: From the Mageia advisory:
Jsoup before 1.8.3 was vulnerable to a possible XSS issue in the validator, related to how it handled tags without a closing '>' when reaching EOF.
libvdpau: multiple vulnerabilities
Package(s): libvdpau    CVE #(s): CVE-2015-5198 CVE-2015-5199 CVE-2015-5200
Created: September 4, 2015    Updated: November 3, 2015
Description: From the CVE entries:
It was discovered that libvdpau incorrectly checks if the process underwent a security transition at startup, related to processing of the VDPAU_DRIVER_PATH environment variable. This may allow local attackers to gain additional privileges. (CVE-2015-5198)
It was discovered that libvdpau does not guard against directory traversal while processing the VDPAU_DRIVER environment variable. This may allow local attackers to gain additional privileges. (CVE-2015-5199)
It was discovered that the trace functionality of libvdpau can be used to overwrite arbitrary files if the process underwent a trust transition at startup. This may allow local attackers to gain additional privileges. (CVE-2015-5200)
mediawiki: multiple vulnerabilities
Package(s): mediawiki    CVE #(s): CVE-2013-7444 CVE-2015-6737 CVE-2015-6736 CVE-2015-6727 CVE-2015-6733 CVE-2015-6732 CVE-2015-6731 CVE-2015-6730 CVE-2015-6728 CVE-2015-6729 CVE-2015-6735 CVE-2015-6734
Created: September 4, 2015    Updated: September 10, 2015
Description: From the CVE entries:
CVE-2013-7444 - The Special:Contributions page in MediaWiki before 1.22.0 allows remote attackers to determine if an IP is autoblocked via the "Change block" text.
CVE-2015-6737 - Cross-site scripting (XSS) vulnerability in the Widgets extension for MediaWiki allows remote attackers to inject arbitrary web script or HTML via vectors involving base64 encoded content.
CVE-2015-6736 - The Quiz extension for MediaWiki allows remote attackers to cause a denial of service via regex metacharacters in a regular expression.
CVE-2015-6727 - The Special:DeletedContributions page in MediaWiki before 1.23.10, 1.24.x before 1.24.3, and 1.25.x before 1.25.2 allows remote attackers to determine if an IP is autoblocked via the "Change block" text.
CVE-2015-6733 - GeSHi, as used in the SyntaxHighlight_GeSHi extension and MediaWiki before 1.23.10, 1.24.x before 1.24.3, and 1.25.x before 1.25.2, allows remote attackers to cause a denial of service (resource consumption) via unspecified vectors.
CVE-2015-6732 - Multiple cross-site scripting (XSS) vulnerabilities in the SemanticForms extension for MediaWiki allow remote attackers to inject arbitrary web script or HTML via the (1) wpSummary parameter to Special:FormEdit, the (2) "Template label (optional)" field in a form, or a (3) Field name in a template.
CVE-2015-6731 - Multiple cross-site scripting (XSS) vulnerabilities in the SemanticForms extension for MediaWiki allow remote attackers to inject arbitrary web script or HTML via a (1) section_*, (2) template_*, (3) label_*, or (4) new_template parameter to Special:CreateForm or (5) target or (6) alt_form parameter to Special:FormEdit.
CVE-2015-6730 - Cross-site scripting (XSS) vulnerability in thumb.php in MediaWiki before 1.23.10, 1.24.x before 1.24.3, and 1.25.x before 1.25.2 allows remote attackers to inject arbitrary web script or HTML via the f parameter, which is not properly handled in an error page, related to "ForeignAPI images."
CVE-2015-6728 - The ApiBase::getWatchlistUser function in MediaWiki before 1.23.10, 1.24.x before 1.24.3, and 1.25.x before 1.25.2 does not perform token comparison in constant time, which allows remote attackers to guess the watchlist token and bypass CSRF protection via a timing attack.
CVE-2015-6729 - Cross-site scripting (XSS) vulnerability in thumb.php in MediaWiki before 1.23.10, 1.24.x before 1.24.3, and 1.25.x before 1.25.2 allows remote attackers to inject arbitrary web script or HTML via the rel404 parameter, which is not properly handled in an error page.
CVE-2015-6735 - The reset functionality in the TimedMediaHandler extension for MediaWiki does not create a new transcode, which allows remote attackers to cause a denial of service (transcode deletion) by resetting a transcode.
CVE-2015-6734 - Cross-site scripting (XSS) vulnerability in contrib/cssgen.php in the GeSHi, as used in the SyntaxHighlight_GeSHi extension and MediaWiki before 1.23.10, 1.24.x before 1.24.3, and 1.25.x before 1.25.2, allows remote attackers to inject arbitrary web script or HTML via unspecified vectors.
ntp: multiple vulnerabilities
Package(s): ntp    CVE #(s): CVE-2015-5194 CVE-2015-5195 CVE-2015-5196 CVE-2015-5219
Created: September 9, 2015    Updated: November 11, 2016
Description: From the Mageia advisory:
It was found that ntpd could crash due to an uninitialized variable when processing malformed logconfig configuration commands.
It was found that ntpd exits with a segmentation fault when a statistics type that was not enabled during compilation (e.g. timingstats) is referenced by the statistics or filegen configuration command.
It was found that the :config command can be used to set the pidfile and driftfile paths without any restrictions. A remote attacker could use this flaw to overwrite a file on the file system with a file containing the pid of the ntpd process (immediately) or the current estimated drift of the system clock (in hourly intervals).
It was discovered that sntp would hang in an infinite loop when a crafted NTP packet was received, related to the conversion of the precision value in the packet to double (CVE-2015-5219).
openafs: denial of service
Package(s): openafs    CVE #(s): CVE-2015-6587
Created: September 8, 2015    Updated: September 10, 2015
Description: From the Mageia advisory:
The vlserver allows pattern matching on volume names via regular expressions when listing attributes. Because the regular expression is not checked for situations which can overflow the buffers used, an attack is possible which reads arbitrary memory beyond the end of the buffer and can act on it as part of the expression evaluation, potentially crashing the process.
openshift: denial of service
Package(s): openshift    CVE #(s): CVE-2015-5250
Created: September 4, 2015    Updated: September 10, 2015
Description: From the Red Hat advisory:
Improper error handling in the API server can cause the master process to crash. A user with network access to the master could cause this to happen.
openslp: denial of service
Package(s): openslp-dfsg    CVE #(s): CVE-2015-5177
Created: September 3, 2015    Updated: September 10, 2015
Description: From the Debian-LTS advisory:
CVE-2015-5177: A double free in the SLPDProcessMessage() function could be used to cause openslp to crash.
openstack-nova: denial of service
Package(s): openstack-nova    CVE #(s): CVE-2015-3241
Created: September 4, 2015    Updated: October 16, 2015
Description: From the Red Hat advisory:
A denial of service flaw was found in the OpenStack Compute (nova) instance migration process. Because the migration process does not terminate when an instance is deleted, an authenticated user could bypass user quota and deplete all available disk space by repeatedly re-sizing and deleting an instance.
oxide-qt: code execution
Package(s): oxide-qt    CVE #(s): CVE-2015-1332
Created: September 9, 2015    Updated: September 10, 2015
Description: From the Ubuntu advisory:
A heap corruption issue was discovered in oxide::JavaScriptDialogManager. If a user were tricked into opening a specially crafted website, an attacker could potentially exploit this to cause a denial of service via application crash, or execute arbitrary code with the privileges of the user invoking the program.
php: multiple vulnerabilities
Package(s): php    CVE #(s): CVE-2015-6834 CVE-2015-6835 CVE-2015-6836 CVE-2015-6837 CVE-2015-6838
Created: September 9, 2015    Updated: October 8, 2015
Description: The php package has been updated to version 5.6.13, which fixes several security issues and other bugs. See the upstream ChangeLog for more details. The oss-security CVE assignment contains additional information.
screen: denial of service
Package(s): screen    CVE #(s): CVE-2015-6806
Created: September 4, 2015    Updated: September 10, 2015
Description: From the Red Hat bug report:
A vulnerability was found in screen causing a stack overflow which results in crashing the screen server process. After running a malicious command inside screen, it will recursively call MScrollV to a depth of n/256. This is time consuming and will overflow the stack if 'n' is huge.
spice: code execution
Package(s): spice    CVE #(s): CVE-2015-3247
Created: September 4, 2015    Updated: September 15, 2015
Description: From the Red Hat advisory:
A race condition flaw, leading to a heap-based memory corruption, was found in spice's worker_update_monitors_config() function, which runs under the QEMU-KVM context on the host. A user in a guest could leverage this flaw to crash the host QEMU-KVM process or, possibly, execute arbitrary code with the privileges of the host QEMU-KVM process.
struts: input validation bypass
Package(s): struts    CVE #(s): CVE-2015-0899
Created: September 4, 2015    Updated: September 10, 2015
Description: From the Red Hat bug report:
The Validator in Apache Struts 1.1 and later contains a function to efficiently define rules for input validation across multiple pages during screen transitions. This function contains a vulnerability where input validation may be bypassed. When the Apache Struts 1 Validator is used, the web application may be vulnerable even when this function is not used explicitly.
thunderbird: code execution
Package(s): iceape thunderbird    CVE #(s): CVE-2015-4496
Created: September 8, 2015    Updated: September 10, 2015
Description: From the CVE entry:
Multiple integer overflows in libstagefright in Mozilla Firefox before 38.0 allow remote attackers to execute arbitrary code via crafted sample metadata in an MPEG-4 video file.
tor: information disclosure
Package(s): tor    CVE #(s): (none)
Created: September 8, 2015    Updated: September 10, 2015
Description: From the Tor advisory:
When a socks5 client application sends a request with a malformed hostname, the following is logged: "Your application (using socks5 to port 42) gave Tor a malformed hostname: [host.example.com]. Rejecting the connection." It should say [scrubbed] as SafeLogging was not set to 0. Bug is in src/or/buffers.c :: parse_socks(), where it uses escaped() on the request address rather than escaped_safe_str_client().
util-linux: file name collision
Package(s): util-linux    CVE #(s): CVE-2015-5224
Created: September 9, 2015    Updated: September 10, 2015
Description: From the Mageia advisory:
The chfn and chsh commands in util-linux's login-utils are vulnerable to a file name collision due to incorrect mkstemp usage. If the chfn and chsh binaries are both setuid-root, they eventually call mkostemp in such a way that an attacker could repeatedly call them and eventually be able to overwrite certain files in /etc.
vorbis-tools: buffer overread
Package(s): vorbis-tools    CVE #(s): CVE-2015-6749
Created: September 9, 2015    Updated: October 27, 2015
Description: From the Mageia advisory:
A buffer overread is possible in vorbis-tools in oggenc/audio.c when opening a specially crafted AIFF file.
webmin: cross-site scripting
Package(s): webmin    CVE #(s): CVE-2015-1990
Created: September 9, 2015    Updated: September 10, 2015
Description: From the Mageia advisory:
A malicious website could create links or Javascript referencing the xmlrpc.cgi script, triggered when a user logged into Webmin visits the attacking site.
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The 4.3 merge window is still open; see the separate article below for a summary of what has been merged in the last week.

Stable updates: none have been released in the last week.
Quotes of the week
The Linux Test Project has been released for September 2015
The Linux Test Project (LTP) has made a stable release for September 2015. The previous release was in April. This release has a number of new test cases including ones for user namespaces, virtual network interfaces, umount2(), getrandom(), and more. In addition, the network namespace test cases were rewritten and regression tests have been added for inotify, cpuset, futex_wake(), and recvmsg(). We looked at writing LTP test cases back in January.
Kernel development news
4.3 Merge window, part 2
As of this writing, some 10,200 non-merge changesets have been pulled into the mainline repository — 6,200 since last week's summary. The 4.3 development cycle thus looks to be a busy one, even if it doesn't quite match the volume seen in 4.2. Quite a few interesting features have been pulled into the mainline over the last week.

First, though, a couple of items from last week deserve a followup mention:
- As predicted, the removal of the ext3
filesystem eventually went through. Linus was worried about the
effect of the removal on ext3 users, but was eventually convinced that
the ext4 maintainers will continue to support those users without
forcing their filesystems forward to the ext4 format.
- The disabling of the VM86 feature described last week appears to have been a bit premature; some complaints have made it clear that it's a feature that would be missed. So VM86 will likely come back before the 4.3 kernel is released. Linus had an interesting idea, though: setting the mmap_min_addr parameter to a non-zero value effectively makes VM86 unusable for DOS emulation, so it would be reasonable to disable VM86 in that case. The kernel's default setting is 4,096, and most distributions use a value at least that high, so the end result would be to disable VM86 on the vast majority of systems where it cannot be used anyway.
Other interesting, user-visible activity in the last week includes:
- The user-space page-fault handling patch
set has been merged at last. The main use case for this feature
is live migration of virtualized guests, but others probably exist as
well. See Documentation/vm/userfaultfd.txt for more
information.
- The ambient capabilities work has been
merged, changing the way capability inheritance is managed. See this
commit message for lots of details.
- Support for IPv6 is now built into the kernel by default. Tom Herbert
justified this change in the changelog by saying: "IPv6 now has
significant traction and any remaining vestiges of IPv6 not being
provided parity with IPv4 should be swept away. IPv6 is now core to the
Internet and kernel."
- The networking layer now has "lightweight tunnel" support. In the
networking pull request, Dave Miller said: "I've heard rumblings that the
lightweight tunnels infrastructure has been voted networking change of
the year. But what do I know?" Indeed it may be a while before any of us
know, since this feature appears to be quite thoroughly undocumented. A
bit of information does appear in this merge commit, though.
- Equally undocumented is the virtual routing domains feature, which
allows the splitting of the kernel's routing tables into disjoint
planes. It appears to be a virtualization feature. See the
merge commit for some information.
- The identifier locator addressing
feature is aimed at communication within data centers where tasks can
migrate from one machine to another.
- The discard_max_bytes parameter associated with block devices
is now writable. Administrators who are concerned about massive
latencies caused by large discard operations can tweak this parameter
downward, causing those operations to be split into smaller
operations.
- The Open vSwitch subsystem has gained a new module providing access to
the kernel's network connection-tracking mechanism.
- The new "overflow scheduler" in the IP virtual server subsystem
"
directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight
" - The MIPS architecture has gained support for the user-space probes (uprobes) mechanism.
- There is a new ptrace() operation
(PTRACE_O_SUSPEND_SECCOMP) that can be used to suspend secure
computing (seccomp) filtering. This operation can only be invoked by
a process with CAP_SYS_ADMIN in the initial namespace; it is
intended to make it possible to checkpoint processes running in the
seccomp mode.
- The Smack security module has gained the ability to associate labels
with IPv6 addresses.
- The SELinux security module has a new ability to check
ioctl() calls on a per-command basis.
- Audit rules can now target the actions of a process based on which
executable it is running.
- New hardware support includes:
- Audio:
Cirrus Logic CS4349 codecs,
Option GTM601 UMTS modem audio codecs,
InvenSense ICS-43432 I2S MEMS microphones,
Realtek ALC298 codecs, and
STI SAS codecs.
- DMA:
NXP LPC18xx/43xx DMA engines,
Allwinner A10 DMA controllers,
ZTE ZX296702 DMA engines, and
Analog Devices AXI-DMAC DMA controllers.
- Media:
Toshiba TC358743 HDMI to MIPI CSI-2 bridges,
Renesas JPEG processing units,
Sony Horus3A and Ascot2E tuners,
Sony CXD2841ER DVB-S/S2/T/T2/C demodulators,
STM LNBH25 SEC controllers,
NetUP Universal DVB cards, and
STMicroelectronics C8SECTPFE DVB cards.
- Miscellaneous:
NXP LPC SPI flash interfaces,
IBM CXL-attached flash accelerator SCSI controllers,
ZTE ZX GPIO controllers,
LG LG4573 TFT liquid crystal displays,
Freescale DCU graphics adapters,
NXP LPC178x/18xx/408x/43xx realtime clocks,
NXP LPC178x/18xx/408x/43xx I2C controllers,
Zynq Ultrascale+ MPSoC realtime clocks,
Renesas EMEV2 IIC controllers,
Atmel SDMMC controllers, and
Intel OPA Gen1 InfiniBand adapters.
- Multi-function devices:
Wolfson Microelectronics WM8998 controllers and
Dialog Semiconductor DA9062 power-management ICs.
- Networking:
Teranetics TN2020 PHYs,
Synopsys DWC Ethernet QOS v4.10a controllers,
Mellanox Technologies switches,
Microchip LAN78XX-based USB Ethernet adapters,
Samsung S3FWRN5 NCI NFC controllers, and
Fujitsu Extended Socket network devices.
- Pin control: Freescale i.MX6UL pin controllers, UniPhier PH1-LD4, PH1-Pro4, PH1-sLD8, PH1-Pro5, ProXstream2, and PH1-LD6b SoC pin controllers, Qualcomm SSBI PMIC pin controllers, and Qualcomm QDF2xxx pin controllers.
Changes visible to kernel developers include:
- The handling of block I/O errors has been simplified. There is a new
bi_error field in struct bio; when something goes
wrong an error code will be stored there. The two older
error-handling methods (clearing BIO_UPTODATE and passing
errors to bi_end_io()) have been removed.
- The patch sets adding atomic logic
operations and relaxed atomic operations have been merged.
- The static-key interface has changed in ways that, one hopes, will
reduce the number of recurrent bugs caused by confusing naming in the
previous API. See Documentation/static-keys.txt for
details.
- The ARM architecture has a new, software-implemented "privileged access
never" mode that prevents kernel code from accessing user-space
addresses. With this mode enabled (the default), only accesses via
the kernel's accessor functions will succeed. ARM64 also supports
this mode, but it's a direct hardware mode in this case.
- There are two new functions for the allocation and freeing of multiple
objects from a slab cache:
bool kmem_cache_alloc_bulk(struct kmem_cache *cache, gfp_t gfp,
                           size_t count, void **objects);
void kmem_cache_free_bulk(struct kmem_cache *cache, size_t count,
                          void **objects);
These functions are useful in performance-critical situations (networking, for example) where the fixed costs of allocation and freeing need to be amortized across a large number of objects; a brief usage sketch appears after this list.
- Module signing now uses the PKCS#7 message format.
One change that results is that the openssl-devel library (or equivalent)
must be installed to build the kernel with signing enabled.
- The memremap() mechanism for the remapping of device-hosted memory has been merged. Also merged is the "struct page provider" patch set (described in this article) that creates page structures for nonvolatile memory as needed.
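Returning to the bulk slab-allocation functions listed above, here is a minimal, hypothetical sketch of how a caller might use them. Nothing here is taken from an in-tree user: the cache name, structure, and error handling are all invented for illustration.

/*
 * Hypothetical use of the bulk slab interfaces; "my_item" and "my_cache"
 * are invented names and error handling is kept simple.
 */
#include <linux/slab.h>

#define NOBJS 32

struct my_item {
        u64 payload;
};

static struct kmem_cache *my_cache;

static int my_cache_init(void)
{
        my_cache = kmem_cache_create("my_item", sizeof(struct my_item),
                                     0, 0, NULL);
        return my_cache ? 0 : -ENOMEM;
}

static int my_alloc_burst(void)
{
        void *objs[NOBJS];
        int i;

        /* Allocate all NOBJS objects in a single call. */
        if (!kmem_cache_alloc_bulk(my_cache, GFP_KERNEL, NOBJS, objs))
                return -ENOMEM;

        for (i = 0; i < NOBJS; i++) {
                struct my_item *item = objs[i];

                item->payload = i;
        }

        /* ... use the objects, then return them all in a single call ... */
        kmem_cache_free_bulk(my_cache, NOBJS, objs);
        return 0;
}

The point is simply that one allocation call and one free call replace NOBJS of each, which is where the amortization of fixed costs comes from.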
The merge window is set to remain open through September 13, but the pace has clearly slowed. It is probably fair to say that we have seen the bulk of the changes that will go into the 4.3 kernel. That said, tune in next week for a summary of any remaining changes that slip in before the merge window closes.
Identifier locator addressing
Companies that run huge data centers have an obvious incentive to get the most performance possible out of each of their vast number of machines. Virtualization and live migration help by allowing tasks to be moved between machines so that each can be run at full utilization, but there is a problem: how do cooperating jobs find each other as they are moved across the data center? Numerous solutions to this problem exist; the 4.3 kernel will have another one, in the form of a technology called identifier locator addressing, or ILA.

ILA, which will work only with IPv6, is built on a simple idea: each task in the data center is assigned a unique identifier that is not tied to any specific location in the net. That identifier is built into that task's IPv6 network address; the networking subsystem then does the necessary magic to route packets properly between them, changing the routing as needed as the task moves between machines.
The details of how ILA works can be found in this draft RFC, written by Tom Herbert, who also happens to be the author of the ILA patches merged into the mainline for 4.3. In short, ILA splits the 128-bit IPv6 network address space into two 64-bit fields; one contains the identifier, the other the locator. The identifier is, as described above, a unique number identifying the task in the center. With 64 bits to play with, ILA can identify enough tasks to work in even the biggest data center — for the foreseeable future, at least. The identifier is not tied in any way to any specific physical machine in the data center. The locator, instead (stored in the upper 64 bits of the IPv6 address), uniquely identifies a physical interface on the network; a packet with an ILA address can be routed across the network using just the locator field.
A task wishing to communicate with another does not know that locator, though; all it knows is the identifier of the task it needs to talk to. This task will put a special "standard identifier representation" (SIR) prefix into the locator field, while the destination task's identifier goes into the lower 64 bits. The resulting "SIR address," which does not correspond to any actual system on the net, indicates to the networking subsystem that the address uses ILA and that the true locator must be filled in at transmission time. In practice, this SIR address will likely be obtained via a DNS lookup and need not be constructed by the task at all, of course.
The task will then open a network connection to the SIR address for the service it needs to contact. The networking stack cannot route the SIR address as-is, though, since that address doesn't correspond to any specific target on the net. Instead, it must find the real machine hosting the task with the given identifier and replace the SIR prefix with a proper locator corresponding to that system. It is thus almost like performing an ARP lookup on the identifier portion of the address. Once the real destination has been placed into the locator field, the packet can be sent on its way. The receiving system will, prior to handing the packet to the application, convert the ILA address back to a SIR address by putting the SIR prefix back into the locator field.
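The translation itself is simple enough to sketch. What follows is a small user-space illustration of the idea, not the kernel's implementation; the SIR prefix, the identifier values, and the lookup table are all invented for the example.

/*
 * User-space sketch of the ILA translation step.  The upper 64 bits of
 * the IPv6 address hold the locator (or the SIR prefix); the lower 64
 * bits hold the task identifier.  All values here are made up.
 */
#include <stdint.h>
#include <stdio.h>

#define SIR_PREFIX 0x20010db8aaaa0000ULL   /* made-up SIR prefix */

struct ila_addr {
        uint64_t locator;      /* upper 64 bits of the IPv6 address */
        uint64_t identifier;   /* lower 64 bits: task identifier */
};

/* Stand-in for the identifier-to-locator table that a job-management
 * system would maintain. */
static uint64_t lookup_locator(uint64_t identifier)
{
        static const struct { uint64_t id, loc; } map[] = {
                { 0x42, 0x20010db8000100aaULL },
                { 0x43, 0x20010db8000200bbULL },
        };
        size_t i;

        for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
                if (map[i].id == identifier)
                        return map[i].loc;
        return 0;
}

/* Outgoing packet: replace the SIR prefix with the real locator. */
static int ila_xlat_out(struct ila_addr *addr)
{
        uint64_t loc;

        if (addr->locator != SIR_PREFIX)
                return 0;               /* not a SIR address */
        loc = lookup_locator(addr->identifier);
        if (!loc)
                return -1;              /* unknown identifier */
        addr->locator = loc;
        return 0;
}

/* Incoming packet: restore the SIR prefix before the application sees it. */
static void ila_xlat_in(struct ila_addr *addr)
{
        addr->locator = SIR_PREFIX;
}

int main(void)
{
        struct ila_addr dst = { SIR_PREFIX, 0x42 };

        if (ila_xlat_out(&dst) == 0)
                printf("routable: %016llx:%016llx\n",
                       (unsigned long long)dst.locator,
                       (unsigned long long)dst.identifier);
        ila_xlat_in(&dst);
        return 0;
}

The kernel version does the equivalent substitution per packet in the transmit and receive paths, with the mapping table configured from user space.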
The SIR address will be used for the duration of the connection; it will continue to work even if the addressed task is migrated in the middle. That naturally means that the identifier lookup and SIR-prefix replacement must be done on each outgoing packet. It's worth noting that SIR addresses can be used for both endpoints of a connection, but it's not mandatory. The end result of all this should be a low-overhead mechanism for virtualization of network addresses within a data center. There is no encapsulation or other trickery required; it essentially comes down to a single address-translation step.
There is one little catch, of course: the kernel must somehow keep up with the proper locator value for each identifier of interest. As documented in the networking merge commit, the table of translations can be maintained by way of some extensions to the ip command. In practice, of course, nobody who needs a technology like ILA is going to mess around with ip commands; there will, instead, be some sort of central job-management system that maintains that mapping. How mappings (and changes) will be propagated through a data center is not addressed by the code in the kernel; that's a task for higher-level software. The good news is that mappings are not expected to change all that often (task migration is expensive, so it shouldn't be done more often than is strictly necessary), so the identifier-to-locator mapping can be effectively cached most of the time.
The ILA implementation in 4.3 appears to be a bit of a work in progress. It works, but it suffers a 10% performance penalty with respect to routing without ILA. The source of the slowdown seems to be known, and Tom has promised that it will be dealt with in a forthcoming patch set. There are also difficulties in the interaction with IPvlan [PDF] that should be fixed in the future. Meanwhile, the core of the new feature is in the mainline and available for those who would like to play with it.
Among other things, ILA is a sign that IPv6 is finally coming into its own. It was not that long ago that IPv6 would not have been considered for performance-sensitive settings like data centers; it is easy enough to use an isolated IPv4 network and avoid the performance issues and application compatibility issues that came with IPv6. But most of those issues have been resolved, and the pressure to move toward IPv6 continues to increase. As technologies like ILA come along and make use of the greatly expanded IPv6 address space, IPv6 may increasingly come to look like the more fully featured alternative.
The LPC Android microconference, part 1
The Linux Plumbers Android microconference was held in Seattle on August 20th and looked at a number of topics needing coordination between various players in the Android ecosystem. It was split up into two separate sessions; this summary covers the first three-hour session. Topics covered the state of the staging tree, USB gadgets and ConfigFS, running mainline on consumer devices, partitions and customization, a single binary image for multiple devices, Project Ara, and kdbus.
The microconference started after lunch, following the Graphics microconference, where there was also quite a bit of Android-related discussion. The Android microconference was held in two parts: in the afternoon and evening. In total, it went on for about six and a half hours until 8pm.
State of staging
After a brief outline of the schedule by microconference lead Karim Yaghmour, the sessions started with a review of the Android-related code in the staging tree and a quick discussion of what to do with it all, led by staging maintainer Greg Kroah-Hartman. Ashmem, timed_gpio and timed_output, the low-memory-killer, the sync framework, and ION were all listed, and one by one they were addressed.
Ashmem has close parallels with memfd, which is really only missing the memory-unpinning feature of ashmem. Memory unpinning, which marks memory as eligible to be discarded by the kernel, was unsuccessfully submitted upstream via the "volatile range" patches. While that effort has stalled out, it seems like ashmem going upstream as-is would be somewhat duplicative, and adding something like memory unpinning to memfd (generically or via an ioctl()/shrinker interface similar to what ashmem uses) would be preferred. The outcome of the discussion was to leave ashmem in staging for now and to revisit it again next year.
For the timed_gpio and timed_output drivers, it was claimed that no one was using them, and that the LED-trigger infrastructure should be sufficient to replace them. The Android developers mentioned that the Android vibrator hardware abstraction layer (HAL) was still using them, but shrugged about removing them. Everyone seemed to be OK with dropping them from the kernel, if only to see if anyone would complain, but many worried that vendors would just re-add them instead of moving to LED triggers.
The mempressure notifier functionality was merged upstream to replace the low-memory-killer back in 3.10, and the Google developers did work to create a user-space low-memory-killer daemon that uses it. Unfortunately, it was found not to perform suitably on real devices and the Google developers fell back to using the low-memory-killer again. Not enough debugging has been done to determine why exactly the mempressure notifiers aren't working well. So work is needed, and removing the in-kernel low-memory-killer would be premature. That said, the in-kernel low-memory-killer isn't problem-free, and has been getting bug reports and patches to change its heuristics from various vendors. Since there is no clear maintainer, there was a call for volunteers to review submitted patches; Riley Andrews and John Stultz volunteered.
Next up was the sync framework. As had been discussed in the Graphics microconference (Etherpad notes), sync points/fences still needed to be integrated with much of the atomic mode-setting work that was recently merged upstream. There is a need for an open-source user space before changes are merged into DRM; specifically, folks would like to see an open-source HWComposer implementation for Android that uses KMS/DRM (though there is some dispute over whether that already exists), where that sort of sync integration would take place. So, at the moment, while de-staging is clearly on a number of folks' to-do lists, it's not clear that anyone is actually working on it yet, so the code will remain in place for now.
Finally, the ION memory allocator was discussed. As noted in the Graphics microconference, a few different proposals have been made, but there is no clear decision on what to do with them or which to prioritize. Since no real resolution was found through that discussion, folks explained there would be a mini-BOF the next day to try to hash things out. The core issue with ION is that, while it is used to solve the problem of how to allocate memory to share between different devices with different constraints, it requires that the applications using it have a detailed understanding of the constraints of the hardware, which makes it not very useful for writing applications that support different devices. There is also the problem that there aren't good examples of upstream users that need ION, but it continues to be needed for a number of out-of-tree solutions (which are difficult to merge without ION-like functionality properly upstream). New patches are being submitted against ION that add new features and try to establish things like device tree bindings, which Kroah-Hartman has been stalling for now. It shows that the longer it takes for an alternative solution to be merged, the more entrenched ION potentially becomes.
Kroah-Hartman also asked why nothing new has been added to staging, and it was explained that, while there is a fair amount of out-of-tree code used by Android, the remaining parts are not as independent as what has historically been put into staging. For the most part, what is left is stuff that modifies existing in-kernel code in a way that isn't generic enough or suited yet for upstream.
So the main outcome of the discussion was that timed_gpio would be dropped from staging, and that users should move to the LED-trigger infrastructure, or complain loudly.
USB gadgets and ConfigFS: status & future
The next topic was the "ConfigFS gadget driver" (slides [ODP]), which is a gadget driver that is controlled using ConfigFS. The Android developers are starting to use this driver as they are migrating away from the Android gadget driver, which was a forerunner to the ConfigFS gadget (see the summary from Plumbers 2013 and these slides [PDF] for further background). Andrzej Pietrasiewicz, who is the ConfigFS gadget maintainer from Samsung, outlined why runtime dynamic gadget functionality is useful: because it allows various functionality to be used in different combinations, unlike the statically configured gadgets that previously had to be defined at compile time. Pietrasiewicz then covered a bit of detail on how one configures the ConfigFS gadget to support various "functions" through the ConfigFS filesystem interface. One issue he brought up was that since ConfigFS modifications happen in multiple steps, there isn't one single moment when resources are allocated and bound in the kernel. Instead, allocations and binds are done at various different points as the gadget is configured and enabled. Pietrasiewicz sees this as somewhat problematic, and thinks that all the resources needed for a function should be generated when the directory is created in the ConfigFS mount.
At this point, Badhri Jagan Sridharan, a developer on the Android team who is working on migrating Android to use the ConfigFS gadget, agreed that resource allocation during the mkdir() operation would make more sense and require less time to switch between configurations. He had a few questions about Android's migration to ConfigFS, and asked if there were any guidelines for allocating instances versus functions and when to bind. Pietrasiewicz replied that it all depends on the function, and that there aren't any guidelines currently.
There were some further questions on details, but the session was running over time, so Sridharan and Pietrasiewicz carried on further conversation in the hallway track.
Barriers to running mainline on form-factor devices
The next session was a hurried presentation (slides [PDF]) by John Stultz from Linaro. The benefits of running mainline on off-the-shelf consumer devices (i.e. form-factor devices), while not particularly compelling to end users, are mostly for kernel and Android developers. It provides a way to do continuous testing so that upstream issues can be caught early. It also brings more awareness of the unique aspects of mobile environments to upstream developers. The main barriers to doing this break down to hardware-related and software-related issues.
For hardware, the core requirements are an unlockable bootloader and access to a serial UART, which usually requires breaking the case open and soldering. Google has an interesting solution with its headphone debug UART cables on Nexus devices, but not all vendors implement it. A serial alternative mode for USB-C would be nice, but still requires standardization. Binary blobs, while not actually hardware, do effectively constrain the hardware that can be used with upstream kernels.
On the software side, the out-of-tree Android patch set has classically been a blocker but, since 3.4, enough of the Android tree has been merged upstream that limited testing can be done. Also, over the last few Android common kernels there has been a steady decline in the amount of code kept out of tree. There are still a few substantial features out of tree but, for the most part, it's all fairly easily forward-ported, so it's not as much of a problem as before.
The biggest issue right now, as Tim Bird and others have been talking about, is probably lagging vendor system-on-chip (SoC) support upstream. For the Nexus device kernels that Google releases, we're looking at 1-2 million lines of out-of-tree code, and non-Google trees are apparently as bad as 3 million lines. Even so, a few vendors have been getting much better at upstreaming recently, so hopefully with this year's device releases we'll see if things have improved.
Given the above constraints, the 2013 Nexus 7 seems like a good device to target, and work is in progress. Currently the device boots from MMC, has a working serial port, and has USB ConfigFS gadget support for ADB; MTP support is there as well (with some quirks). GPIO support for the power and volume buttons, as well as touch-panel input, also seems to be working. All of this is with around 32 patches, most of which are simple device-tree changes or Android build-integration changes.
Display support was hoped to be working by now, but isn't; however, Bjorn Anderson and Rob Clark already have it working on the Sony Z3 (which is a related device). Apparently Anderson also has WiFi and the modem working on his device. Thanks are due to the folks who have been actually writing and upstreaming the device support required; so far, this has mostly been an integration effort, not a development one, which is a credit to everyone who did the hard work to make that possible.
There are still a number of areas that need work, such as display, battery charging, power management, WiFi, sensors, camera, and so on. And it's interesting that those issues are common sticking points for most SoCs, which the upstream community should take to heart. What we have upstream may be incomplete or too difficult to work with; we really need more folks working in these areas.
Benefits are already being seen from the effort, as it's helped crystallize which out-of-tree patches are most critical to get merged, and has provided a testing ground for the ConfigFS gadget transition.
Android, partitions, and customization
The next session (slides [PDF]) was by Rom Lemarchand from the Google Android team and covered some of the changes the team made for the Android One effort to ship common kernels and disk images for multiple devices. In the ideal world, there would be one user-space image per architecture, the bootloader would detect the device, populate the device-tree tables so that a common kernel with no out-of-tree patches would boot, and all would be well. But, in reality, every device has customized user space and kernels; vendor kernels have huge diffs from mainline and lots of functionality is being pushed out to proprietary user-space logic or even to trusted OS environments.
For the Android One effort, it was decided to have a single kernel and user space for every device "family", which is basically a set of devices using the same SoC with minor changes in hardware like sensors or cameras. For Android One, the /system partition should only have truly generic architecture-specific code. Since it previously contained all binaries and assets for specific devices, a common /system partition is achieved by keeping the device-specific data in separate partitions.
The /vendor partition that was introduced with the Nexus 9 provides a way for SoC-specific drivers to be included, like graphics drivers or core power-management libraries. Android One introduced an /odm partition that is meant to contain device-specific drivers, such as the sensor HAL.
Finally, the /oem partition was introduced to store things like background graphics, ringtones, and other OEM-level customization for the device. This allows the different partitions to be updated independently (though the /oem partition is not verified and thus is not able to be updated in the current scheme), and allows each partner involved to do the customization wanted at the right level.
It was asked if the /odm partition could contain kernel modules. Lemarchand said that it could and the immediate follow-up question was who signs those modules. He clarified that while the original design manufacturer (ODM) signs the /odm partition and can do over-the-air (OTA) updates independently, any kernel modules included in that partition have to be signed by Google to be loaded since the kernel uses signed modules. Thus vendors can't create their own signed kernel modules.
Mark Gross from Intel also brought up a pain point that he has: in order to test with new kernel modules, he has to re-flash the system partition with those modules, which causes dm-verity to fail. He'd like to see an untrusted partition for kernel modules, since those are already signed. It was asked if being able to load custom modules remotely via ADB would help; Gross said that it would be interesting, but probably not helpful since the whole kernel and all the modules would need to be updated and tested together.
Running a single Android binary image on multiple devices
The next talk (slides [PDF]), which continued on the theme, was by Samuel Ortiz and covered the work done by Intel's Open Source Technology Centre on the Intel Reference Design for Android (IRDA) effort. He said that IRDA was similar to Android One, but for x86 tablets. The goal was to minimize changes to the Android Open Source Project (AOSP) code and to support fairly small hardware differences like changes to GPS, WiFi, and Bluetooth.
The first major issue the project ran into is that, with the current build system design, changing any single device results in changes in the user-space images. And there are many issues in trying to solve this. For instance, kernel modules can help, but Android's insmod lacks the module-dependency handling normally found in modprobe. There's no easy way to dynamically select different HALs, so for devices that have different types of GPSes, you have to have different HALs for each one. For subsystems like WiFi, there are different wpa_supplicant binaries needed to support different vendors. And some things require configuration values or permission files generated at build time. So the desire is to find some way for this to all be done dynamically at boot time, rather than at build time.
IRDA's approach is to use an autodetection daemon that uses ACPI tables to detect the hardware on the device, then, for that hardware, it triggers predefined actions. The project modified libhardware, so that it talks to the daemon to determine which HAL should be used for the device. For some hardware, like GPUs, the drivers need to be in specific locations, so its image includes multiple HAL drivers in different directories, then its daemon bind mounts the appropriate HAL directory into the right place in the file tree. It also includes a FUSE filesystem driver for /etc/permissions/ that generates the right XML permissions files for the device at runtime.
He ran through some examples of how this works for specific devices, and also outlined some areas where the most trouble was seen. Dynamic HAL selection is probably the biggest issue; something that would help would be better reference HALs in AOSP, as most vendors are shipping heavily hacked-up implementations, usually based on ancient versions. The project has worked on improving Bluetooth support upstream so that the AOSP reference HAL can work more generically and there is no need for vendor-specific Bluetooth HALs. Solving kernel-module dependency resolution would be nice to have as well.
Yaghmour, of Opersys, asked if this worked when devices disappeared and reappeared at runtime, or if it was mostly for boot time only. Ortiz clarified that it could be used with dynamic changes at runtime, but various XML permissions files need to be removed. Yaghmour noted that parts of the framework are unhappy when devices disappear.
It was asked which of the reference HALs Ortiz saw as worst overall. To that, he replied "the binary ones" and wished that the reference HALs were more robust, allowing hardware vendors to not waste time with implementing bad custom HALs. He called out GPS, Bluetooth, NFC, and Graphics specifically as being bad and noted he couldn't point to any one AOSP reference HAL being widely used. Though Brown noted that the Audio HAL was fairly good and, while vendors do tend to copy it, it's reused for the most part, so that's progress.
Dmitry Shmidt from Google asked if the probing and autodetection impacted boot time. Ortiz replied that it varies but can be on the order of 500ms or so, which, for the Android boot time, is quite a small percentage.
Adapting Android for Ara
Karim Yaghmour was up next with his talk (slides [PDF]) about Project Ara and how it's adapting Android for the project's needs. After clarifying that he doesn't speak for anyone but himself, he began by covering the overall goals of Ara, which are to introduce more variety into the hardware ecosystem and to provide a platform for hardware developers that is something like what the mobile app platform is for software developers.
Yaghmour provided an overview of some of the most interesting technologies being used by Ara, starting with the MIPI UniPro interconnect that is used as the bus to connect the modules. He also covered the endoskeleton frame, the contactless capacitive connectors used by the modules, as well as the Electro-Permanent-Magnets (EPMs). The endoskeleton connects all the modules and basically acts as a UniPro switch, allowing for even direct module-to-module communication. The EPMs are currently used to hold the modules to the endoskeleton, but the news of the day was that the EPMs weren't working out for the project. Each module can either consume or provide charge, allowing for battery modules to be included with other module functionality, and allowing for multiple batteries to be used together. Finally, each module can have a custom printed cover, which allows for even further user-directed design.
Yaghmour noted that UniPro supports a number of standard protocols, but for Ara there was quite a lot that needed to be developed. So the project created Greybus, which provides the protocol that handles setting up the switch, module notifications, power-management, and standard class protocols for a number of different devices (camera, audio, Bluetooth, GPS, etc.). Since initially there weren't devices implementing these class protocols, it also provides "bridged PHY" native protocols like i2c, USB, UART, SDIO, and so on, which can be run over Greybus. Yaghmour also noted that, while hardware isn't out yet, there is the gbsim simulator for Greybus, which folks can check out now if they're interested.
There were some questions on how Greybus handles dynamically adding buses that cannot be probed like i2c, and to what extent the existing drivers needed to be changed. After some confusion about the question, Yaghmour said that the module provides a manifest of what it contains, and the related Greybus driver creates the platform device based on that knowledge of the module, so that the existing driver can be reused with as little change as possible. It was also noted that the plan is that the devices would use the class drivers, so the bridged PHYs are sort of a short-term solution until class-based hardware is available.
Yaghmour then talked about how Ara was extending Android to handle dynamic device changes at runtime. He clarified that, after various proposals were run by the Android team, the Ara project settled on adding the dynamic device handling in the HAL layer, so that changes to the framework layer could be minimized. The project will then work on each HAL to try to extend it so that the framework can cope when devices are unplugged. For the most part, this means it supports the HAL interfaces for devices even if the device isn't present; for example, when no camera module is loaded, the camera HAL provides a "no camera" screen to the interface. This allows iterative changes through each subsystem, which will likely have more success than trying to change all the subsystems at once.
There was a comment from Marcel Holtmann, the Bluetooth maintainer from Intel, that this seems problematic since, for some functionality like Bluetooth, the pairing is bound to the physical hardware address of the radio. Yaghmour agreed and noted that if you replace your Bluetooth module with a different module, you would have to re-pair your devices.
There was also a question of where the sensors live; Yaghmour answered that they are usually paired with other devices. An example is that the rotation sensor is included with the screen, which was useful because it allowed a known orientation in space (i.e. which way is up), whereas if it were on other modules, it could be inserted in different slots or with a different orientation, so the system would have to figure out the module orientation relative to the display to understand the sensor data.
Yaghmour then gave a brief recap of the previous sessions, noting that there are a number of different approaches being used to try to make Android more generic and to allow it to support more devices without custom builds. The first, by Lemarchand, handled supporting different devices at flashing time by using a combination of generic and hardware-specific partitioning. The second, by Ortiz, talked about supporting different devices at boot time from a single disk image, using bootloader/firmware data to do bind mounts and FUSE filesystems. And, finally, his talk covered handling different devices at runtime through extending the HAL layer to support hotplugging standard classes of devices.
Integrating kdbus in Android
The last talk (slides [PDF]) of the afternoon session was from Pierre Langlois from ARM about his summer project of trying to run Android's libbinder over kdbus. Langlois covered some potential benefits of the project, which included alleviating the concern of duplicating work and maintainership with two new interprocess communication (IPC) mechanisms being introduced into the kernel, the curiosity of whether it could work, and what each side might be able to learn from trying to run libbinder over kdbus. Being a summer project, Langlois did limit the scope of his work to just see if it was possible to make libbinder run over kdbus; security implications or performance were not a focus.
He provided a basic outline of how binder works, with a simple example of how one might add two numbers together using binder. He also covered how the remote services are named, registered, and found through a special process called the ServiceManager that all processes register with. Langlois then dived into details of the binder kernel driver, and how it maintains per-process memory pools, worker threads, reference counts, and dispatches data from one process to another. The kernel driver also has a special call to identify the ServiceManager process. All of the binder kernel interfaces are abstracted out by the libbinder library.
He then covered an overview of kdbus and what it provides, noting that kdbus keeps track of service names internally in a registry, which can be probed by processes to find services, and can also provide notifications. His assessment was that it was sufficient to implement binder transactions, so Langlois implemented libkdbinder, which is a drop-in replacement for libbinder that is implemented using kdbus.
There are some conceptual differences that had to be resolved. Since kdbus provides a service-name registry, Langlois got rid of the ServiceManager component in Android and provided compatibility calls into the kdbus service-list functionality. Also, since binder normally handles thread pools to service requests, and kdbus does not manage worker threads, libkdbinder needed to spawn off and manage the threads itself. With these issues worked out, he was able to successfully parcel up binder transactions, send them with kdbus synchronous messages, and unparcel the returned results.
So far, he has a working proof of concept that passes the binder test cases found in AOSP. This progress actually surprised some of the Android developers, who quickly asked if it supported some of the more obscure binder features. Langlois noted that, while the really obscure weak-references feature, which doesn't actually seem to be used in binder, isn't implemented, most of the other functionality is there.
He then mentioned that he did try booting Android with libkdbinder to see if it would work, but ran into one unexpected problem: kdbus seemed to fail when passing ashmem file descriptors over it. It seems that kdbus only allows sealed memfd file descriptors to be sent, which is more restrictive than what Android uses for ashmem file descriptors shared over binder. Folks in the room were a little confused, as they thought it really ought to work, but there wasn't anyone who could speak authoritatively on the issue. Langlois thought that kdbus probably needs to be extended to handle ashmem file descriptors, but had run out of time to debug the issue further. He suspects that after the ashmem issue is resolved, there will probably be other blockers that pop up, but in the end he thinks getting it to feature parity is entirely doable.
For future work, Langlois sees reviewing the security implications as a high priority. There were also questions of allowing for more than one bus, since running binder names on a desktop kdbus might cause problems. Folks in the room seemed confident that it would be easily doable to implement a separate binder kdbus instance.
Andrews, who is one of the binder maintainers from Google, was a little skeptical that performance parity would be possible, as binder is well-optimized for its use case. One of his current tasks is reworking the binder kernel driver to break apart some of the locking that is causing lock contention on devices with a higher CPU count. One thing he did note is that in doing his rework, he's planning to greatly extend the binder test cases, so that he can be confident he doesn't break anything, and that work will likely provide better test cases for libkdbinder to be tested against.
The code isn't yet released, but Langlois and Serban Constantinescu from ARM were working to get it released soon, and possibly plan to hand the project off to Linaro. Stultz pressed that getting this implementation out, along with an analysis of the performance difference and of any restrictions kdbus imposes (like the ashmem issue), would be important: if any changes are needed in kdbus, it would be particularly useful to know about them soon, as that could affect the conversation around kdbus getting merged.
And with that, the first session was over, and folks took a break. The second session will be summarized in an upcoming article.
[Thank you to all the presenters for their discussions, Karim Yaghmour for organizing and running the microconference, and Rom Lemarchand for helping get so many of the Google Android team to attend.]
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Development tools
Device drivers
Device driver infrastructure
Filesystems and block I/O
Memory management
Security-related
Miscellaneous
Page editor: Jonathan Corbet
Distributions
Bringing Git workflows to Debian with dgit
When introducing his talk at DebConf 2015 in Heidelberg, Ian Jackson said he was there to "plug" dgit, which is a system that lets users treat the Debian archive as a remote Git repository. In fact, of course, dgit has already proven itself popular among Debian Maintainers (DMs) and other users, since it allows Git-based workflows for patching and uploading Debian packages—and does so without disrupting Debian's existing infrastructure. But there are still quite a few DMs who do not use dgit for the packages they maintain, so Jackson made a case that the tool enriches Debian as a whole the more it is used.
Debian volunteers take on a variety of different roles, he said, which boil down to "package maintainer" (i.e., DM) and "everything else" (a category that covers those people doing bug squashing, downstream projects and derivatives, users, and those doing non-maintainer uploads or NMUs). Dgit offers different advantages for DMs than for others, he said, so he addressed the groups separately.
Dgit for non-maintainers
For people in the "everyone else" group, the point of dgit is that they can access the archive like a repository. A user can clone any package in any suite (e,g, "stable," "unstable," or "experimental") and will get a source tree that is identical to the output of dpkg-source -x. This result is the same regardless of any choices made by the package maintainer (e.g., preferred packaging format, Git workflow, etc.). As Jackson addressed from time to time, adhering to that principle of identical output affects how dgit behaves in a number of key situations.
![Ian Jackson [Ian Jackson at DebConf]](https://static.lwn.net/images/2015/09-debconf-jackson-sm.jpg)
With package source fetched through dgit, Jackson said, the user can then work with the code exactly as they would with any other Git project: local commits, cherry-picking changes from other branches, resetting, cleaning, rebasing, and all of the other "gittish stuff" is supported. Actually, he added, git log and git blame are a bit different, but he would explain why during the talk.
For typical tasks, such as creating patches, users would use Git itself. Users can push their changes to any Git server that they have access to—although pushing to the dgit server does not automatically pass changes through to the Debian archive. Only a user with the proper permissions—namely, a DM or a Debian Developer (DD)—can do that. That rule guarantees that what is fetched with dgit is always identical to what resides in the archive.
But because a source tree fetched with dgit can then be pushed to any other remote Git server, downstream projects and derivative distributions can use dgit and do away with manually wrangling all of Debian's source packages—downloading and importing them into a version-control system in particular.
It is important to understand, he said, that dgit does not replace any build infrastructure (though it does provide wrappers for several common Debian package-building tools). Even DMs using dgit still have to perform builds before they upload a new binary package to the archive. As Debian improves support for source-only uploads, that may change, but for now there is a distinction between source uploads and binary uploads, and dgit does what it can to support the build process.
Behind the scenes, he explained, dgit provides a set of Git repositories that is parallel to the archive, although it runs on a different server. When a privileged user runs dgit push on a package, two things happen. First, dgit tags and pushes to the remote dgit server. Then it performs a traditional package upload to the archive—except that one additional field is added to the package's .dsc source-control file: the git commit hash.
After that push, whenever any user does a dgit clone or dgit fetch, dgit looks for the hash field in the archive's source package. If there is one, dgit uses the corresponding commit from the Git history on the dgit server to complete the operation. If there is no commit-hash field, that means the package's most recent upload (and, perhaps, many or even all uploads) did not come through dgit, so dgit imports the package into Git. If necessary, it stitches the newly imported version into any existing Git history in the dgit server.
The dgit server only stores changes pushed to the server using dgit. As mentioned, when a DM does a dgit push, the altered package is uploaded to the archive. When an unprivileged user doing bug-fixing runs dgit push, something different happens: dgit takes the user's sequence of commits and turns it into an ordered sequence of patches that the package maintainer can use. That behavior, Jackson said, means that it always remains up to the DM's discretion whether or not to use dgit. Those that do not use dgit still get patches that they can incorporate into their workflow, and the history of other users' work is still available to the public. On the other hand, he cautioned, users creating patches with dgit must not also submit their patches some other way, lest the DM (and dgit) get confused.
Sadly, he said, when DMs do not use dgit for a package, dgit's history for the package in question will clearly not include everything in the DM's history, so some potentially useful information is lost. Dgit attempts to work around this choice; it looks in each source package for an X-Vcs-Git header, which a maintainer might use to indicate that they are working from some other Git server. If the header is found when a user clones a package, dgit adds the indicated server as a remote in the user's Git configuration.
But, even then, dgit still bases its repository contents on the source package in the archive. That preserves the principle that dgit mirrors the archive, and it covers those situations when the Git server listed in the header drops offline or simply does not exist. In addition, he said, there are quite a few maintainer trees specified in the X-Vcs-Git header that only contain packaging data (like the debian/ directory or a set of incoming patches).
Dgit for maintainers
Quite a few users like dgit and would love for more DMs to start using it, too, Jackson said. It makes the maintainer's history visible to the users in a uniform fashion, whereas relying on X-Vcs-Git means that each user must learn each DM's "special snowflake workflow." But, he said, there are several other reasons why DMs could benefit from using dgit.
First, it has the potential to simplify maintenance tasks. Dgit makes users' branches and patches readily accessible for merging, and when both the user and the DM are using dgit, one can more readily count on their Git histories being in sync. Second, using dgit automatically publishes the DM's Git history online (at browse.dgit.debian.org), saving the DM the additional overhead of publishing that history.
Dgit also ensures that the source the maintainer uploads to the Debian archive is exactly the same as the contents of their Git HEAD. And it can spare DMs some additional tests and sanity checks on the .dsc and other package control files.
Jackson then explained how dgit integrates with the various Debian packaging workflow tools in use by DMs. The simplest case is for a DM using native or source-format 1.0 packages. Those packaging options require no changes to the source tree fetched with dgit, so DMs using them can adopt dgit immediately.
For packages in the 3.0 "quilt" format, things are more complicated, because the quilt source-tree format might differ from the Git tree. If the DM uses git-buildpackage, then what dgit produces is essentially what Jackson called a "patches-applied packaging branch without a .pc directory." In other words, all changes made against the upstream source have been applied in the source tree and are also included in a patch that is kept in the debian/patches/ directory. Jackson said he had been collaborating with git-buildpackage maintainer Guido Guenther to bridge the gap between what git-buildpackage expects and what dgit currently produces.
If the DM uses git-dpm (the other main tool for working with quilt), however, the outlook is less rosy. The big hurdle there is that git-dpm ignores .gitignore files in the package when it performs the build, which means that those files are then lost when the resulting package is uploaded to the archive. That breaks the cardinal "always be identical to the archive" rule of dgit, Jackson said, but so far he has not been able to convince git-dpm maintainer Bernhard Link to change git-dpm's behavior. If Link cannot be convinced, Jackson said, he may have to add a git-dpm–specific workaround.
Jackson closed out the session by listing a few items still on his to-do list for dgit. One outstanding problem, for example, is that many packages currently include files that are not in the DM's Git branch—such as autotools output. There is no one-size-fits-all solution to handling these extra files, since maintainers' workflows vary, but he thinks it is solvable. There are also new potential uses for dgit, he said. The server already manages access control for dgit users, for instance, so perhaps it could offload some of that responsibility from the Alioth project-hosting server. There is also a rumor that Ubuntu is interested in running its own dgit server, which could bring new developers and patches to the project.
From the outside, dgit may appear to go to a lot of trouble to unite two disparate ways of working with software: the Debian archive provides a central, world-readable store of source packages, while Git is aimed at enabling multiple remote developers to work in a distributed network. But Jackson reminded the audience what it gets out of the deal: anyone can download a Debian source package and see both the original product of the upstream developers and every patch applied by Debian. That is a valuable record to preserve; dgit simply makes it accessible from within the world's leading version-control software, too.
[The author would like to thank the Debian project for travel assistance to attend DebConf 2015.]
Brief items
Distribution quotes of the week
Distribution News
Debian GNU/Linux
Debian stable releases
Debian 8.2, the second update to the stable distribution v8 "jessie", has been released. The oldstable distribution, Debian 7 "wheezy", has been updated to v7.9.
These updates mainly add corrections for security problems and other serious issues.
Newsletters and articles of interest
Distribution newsletters
- Debian Project News (September 2)
- DistroWatch Weekly, Issue 626 (September 7)
- openSUSE weekly review (September 10)
- Ubuntu Weekly Newsletter, Issue 433 (September 6)
Page editor: Rebecca Sobol
Development
Easier Python string formatting
Some languages pride themselves on providing many ways to accomplish any given task. Python, instead, tends to focus on providing a single solution to most problems. There are exceptions, though; the creation of formatted strings would appear to be one of them. Despite the fact that there are (at least) three mechanisms available now, Python's developers have just adopted a plan to add a fourth. With luck, this new formatting mechanism (slated for Python 3.6) will improve the traditionally cumbersome string-formatting facilities available in Python.

Like many interpreted languages, Python is used heavily for string processing tasks. At the output end, that means creating formatted text. Currently, there are three supported ways to get the same result:
import string

'The answer is %d' % (42,)
'The answer = {answer}'.format(answer = 42)
s = string.Template('The answer is $answer')
s.substitute(answer=42)
The traditional "%" operator suffers from some interesting lexical traps and only supports a small number of types. The format() string method is more flexible, but is somewhat verbose, and the Template class seems to combine the shortcomings of the previous two methods and throws in yet another syntax to boot. All three methods require a separation between the format string and the values that are to be formatted into it, increasing verbosity and, arguably, decreasing readability, while other languages have facilities that do not require that separation.
f-strings
Other languages, such as Perl and Ruby, have more concise string-formatting operations. With the debut of the string interpolation mechanism described in PEP 498, Python will have a similar facility. This PEP introduces a new type of string, called an "f-string" ("formatted string") denoted by an "f" character before the opening quote:
f'This is an f-string'
F-strings thus join the short list of special string types in Python; others include r'raw' and b'byte' strings. The thing that makes an f-string special is that it is evaluated as a particular type of expression when it is executed. Thus, to replicate the above examples:
answer = 42
f'The answer is {answer}'
As can be seen, f-strings obtain the value to be formatted directly from the local (and global) namespace; there is no need to pass it in as a parameter to a formatting function or operator. Beyond that, though, what appears between the brackets can be an arbitrary expression:
import math

answer = 42
f'The answer is not {answer+1}'
f'The root of the answer is {math.sqrt(answer)}'
So formatted output can be created with expressions of just about any complexity. These expressions might even have side effects, though one suspects that would rarely be a good idea.
Under the hood, the execution of f-strings works by evaluating each expression found in curly brackets, then invoking the __format__() method on each result. So the following two lines would have an equivalent effect:
f'The answer is {answer}'
'The answer is ' + answer.__format__('')
A format string to be passed to __format__() can be appended to the expression with a colon, thus, for example:
f'The answer is {answer:04d}'
One can also append "!s" to the expression (before any format specification) to pass the value to str() first, "!r" to use repr(), or "!a" to use ascii(). So, once again, the following two lines would do the same thing:
f'The answer is {answer!r:>10}'
'The answer is ' + repr(answer).__format__('>10')
That is the core of the change. There are other details, of course; see the PEP for the full story. The PEP was accepted by Python benevolent dictator for life Guido van Rossum on September 8, so, unless something goes surprisingly wrong somewhere, f-strings will be a part of the Python 3.6 release.
Where next?
PEP 498 was somewhat controversial over the course of its development. There were a number of concerns about how f-strings fit into the Python worldview in general, but there was also a specific concern: security. In particular, Nick Coghlan expressed concerns that f-strings would make it easy to write insecure code. Examples of such usage would be:
os.system(f'cat {file}')
SQL.run(f'select {column} from {table}')
In either case, if any of the values substituted into the strings are supplied by the user, the result could be the compromise of the whole system. The problem is not that f-strings make it possible to incorporate untrusted data into trusted strings — that can just as easily be done with existing string-formatting mechanisms. And the problem is certainly not that f-strings make string formatting easier in general; Nick's specific concern is that f-strings will be the easiest way to put strings together, while more secure methods remain harder. Using an f-string to format an SQL query will be easier to code (and to read later) than properly escaping the parameters, so developers will be drawn toward the insecure alternative.
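To make the shell case concrete (a hypothetical illustration; the value of file stands in for untrusted user input):

import os
file = 'notes.txt; rm -rf ~'     # attacker-supplied "file name"
os.system(f'cat {file}')         # the shell runs "cat notes.txt" and then "rm -rf ~"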
His suggestion, as described in PEP 501, is to make the secure way as easy to use as the insecure way. The result is "i-strings"; they look a lot like f-strings in that the syntax is nearly identical:
i'The answer is {answer}'
There is a key difference, though: while f-strings produce a formatted string immediately on execution, i-strings delay that formatting. An explicit call to a format function is required to do the job. To see the difference, consider the two lines below, which have equivalent effect:
print(f'The answer is {answer}')
print(format(i'The answer is {answer}'))
The key to Nick's proposal is that format() can be replaced with another formatting function that knows how to escape dangerous characters in the intended usage scenario. Thus:
os.system(sh(i'cat {file}'))
SQL.run(sql(i'select {column} from {table}'))
The sh() formatter would ensure that no shell metacharacters get through, while sql() would prevent SQL-injection attacks. These formatters would be easy enough to use that developers would not be tempted to bypass them. Just as importantly, static analysis software could easily distinguish between safe and unsafe string usage for a given API, making it possible to automatically detect when the wrong type of string is being used.
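Neither sh() nor i-strings exist yet, and PEP 501 does not spell these helpers out, but a rough sketch of the idea might look like the following; the sh() name, its interface, and the use of an ordinary function with keyword arguments in place of a real i-string are all assumptions made purely for illustration:

import os
import shlex

def sh(template, **values):
    # Hypothetical stand-in for a PEP 501-style renderer: every
    # interpolated value is shell-quoted before being substituted.
    quoted = {name: shlex.quote(str(value)) for name, value in values.items()}
    return template.format(**quoted)

file = 'notes.txt; rm -rf ~'
os.system(sh('cat {file}', file=file))   # runs: cat 'notes.txt; rm -rf ~'

Because the attacker-supplied string ends up as a single, quoted argument to cat, the trailing rm command never runs.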
PEP 501 has been through a long series of revisions, involving significant changes, since first being posted. At times the syntax was rather more complicated, prompting Guido to ask: "Have I died and gone to Perl?". Nick's proposal had originally been intended as an alternative to PEP 498, but, over time, Nick warmed to the f-string approach and came out in favor of its adoption. PEP 501 remains outstanding, though, and will likely be pursued as an extension to f-strings.
That work, too, could conceivably happen in time for the 3.6 release, which is planned to happen in late 2016. Given its volatile history thus far, chances are that the end result will look somewhat different from what has been proposed to date. However it turns out, though, Python should no longer have to defer to other languages when it comes to the ease of creating formatted output.
Brief items
Quotes of the week
Samba 4.3.0 released
Samba 4.3.0 is out. This release has a lot of new features, including a reworked logging system, a new FileChangeNotify subsystem, better trusted domains support, SMB 3.1.1 support, and more.
systemd v226 is available
Version 226 of systemd has been released. Notable changes include support for the kernel's unified control-group hierarchy, predictable interface names for virtio devices, support for filtering out kernel threads when counting a control group's tasks with cgtop, and several new features in networkd's DHCP implementation.
Newsletters and articles
Development newsletters from the past week
- What's cooking in git.git (September 9)
- Git Rev News (September 9)
- LLVM Weekly (September 7)
- OCaml Weekly News (September 8)
- OpenStack Community Weekly Newsletter (September 4)
- Perl Weekly (September 7)
- PostgreSQL Weekly News (September 6)
- Python Weekly (September 3)
- Python Weekly (September 10)
- Ruby Weekly (September 3)
- Ruby Weekly (September 10)
- This Week in Rust (September 7)
- Tor Weekly News (September 4)
- Tor Weekly News (September 10)
- Wikimedia Tech News (September 7)
A closer look at the world's first open digital cinema camera (Opensource.com)
Opensource.com takes a look at the AXIOM Beta camera, a new professional digital image capturing platform. "The goal of the AXIOM camera, and the global-community-driven apertus° project, is to create a variety of powerful, affordable, open source licensed and sustainable digital cinema tools. The apertus° project was started by filmmakers who felt limited by the available proprietary tools. AXIOM Beta will provide full and open documentation, the ability to add new features and change the behavior of existing features, and the option to add custom accessories." AXIOM Beta is intended primarily for software and hardware developers.
Shah: QEMU Maintainers on the 2.4 Release
On his blog, QEMU developer Amit Shah gathered up information on the recent QEMU 2.4 release from the maintainers. It takes the form of a video made at KVM Forum, as well as some email comments from those who were not present. "Many contributors to the QEMU and KVM projects meet at the annual KVM Forum conference to talk about new features, new developments, what changed since the last conference, etc. The QEMU project released version 2.4 just a week before the 2015 edition of KVM Forum. I thought that was a good opportunity to gather a few developers and maintainers, and get them on video where we can see them speak about the improvements they made in the 2.4 release, and what we can expect in the 2.5 release."
Page editor: Nathan Willis
Announcements
Brief items
Linux Plumbers Conference 2016 call for organizers
It's time to figure out who will be organizing the Linux Plumbers Conference in 2016, which is planned to be held in Santa Fe, New Mexico, at the beginning of November, alongside the Kernel Summit. Interested organizers should put together a bid and submit it to the Linux Foundation's Technical Advisory Board by October 5; see this page for details on how the process works. "This is your chance to put your stamp on one of our community's most important gatherings in a year when we will be celebrating 25 years of the Linux kernel."
Articles of interest
FSFE Newsletter - September 2015
The September edition of the Free Software Foundation Europe newsletter covers the User Data Manifesto 2.0, compulsory routers, new German Coordinators, a Fellowship interview with Neil McGovern, and several other topics.
The Free Software Foundation: 30 years in (Opensource.com)
Jono Bacon interviews John Sullivan, executive director of the FSF, at Opensource.com. "What we have been focusing on now are the challenges I highlighted in the first question. We are in desperate need of hardware in several different areas that fully supports free software. We have been talking a lot at the FSF about what we can do to address this, and I expect us to be making some significant moves to both increase our support for some of the projects already out there—as we having been doing to some extent through our Respects Your Freedom certification program—and possibly to launch some projects of our own. The same goes for the network service problem. I think we need to tackle them together, because having full control over the mobile components has great potential for changing how we relate to services, and decentralizing more and more services will in turn shape the mobile components."
Calls for Presentations
Netdev 1.1 conference
Netdev 1.1 (year 1, conference 1) will take place February 10-12, 2016 in Seville, Spain. "Netdev 1.1 is a community-driven conference geared towards Linux netheads. Linux kernel networking and user space utilization of the interfaces to the Linux kernel networking subsystem are the focus. If you are using Linux as a boot system for proprietary networking, then this conference _may not be for you_." The call for proposals closes December 1.
CFP Deadlines: September 11, 2015 to November 10, 2015
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
| Deadline | Event Dates | Event | Location |
|---|---|---|---|
| September 12 | October 10 | Poznańska Impreza Wolnego Oprogramowania | Poznań, Poland |
| September 15 | November 9-11 | PyData NYC 2015 | New York, NY, USA |
| September 15 | November 14-15 | NixOS Conference 2015 | Berlin, Germany |
| September 20 | October 26-28 | Samsung Open Source Conference | Seoul, South Korea |
| September 21 | March 8-10 | Fluent 2016 | San Francisco, CA, USA |
| September 25 | December 5-6 | openSUSE.Asia Summit | Taipei, Taiwan |
| September 27 | November 9-11 | KubeCon | San Francisco, CA, USA |
| September 28 | November 14-15 | PyCon Czech 2015 | Brno, Czech Republic |
| September 30 | November 28 | Technical Dutch Open Source Event | Eindhoven, The Netherlands |
| September 30 | November 7-8 | OpenFest 2015 | Sofia, Bulgaria |
| September 30 | December 27-30 | 32. Chaos Communication Congress | Hamburg, Germany |
| October 1 | April 4-6 | Web Audio Conference | Atlanta, GA, USA |
| October 2 | October 29 | FOSS4G Belgium 2015 | Brussels, Belgium |
| October 2 | December 8-9 | Node.js Interactive | Portland, OR, USA |
| October 15 | November 21 | LinuxPiter Conference | Saint-Petersburg, Russia |
| October 26 | April 11-13 | O’Reilly Software Architecture Conference | New York, NY, USA |
| October 30 | January 30-31 | Free and Open Source Developers Meeting | Brussels, Belgium |
| October 30 | January 21-24 | SCALE 14x - Southern California Linux Expo | Pasadena, CA, USA |
| October 31 | November 20-22 | FUEL GILT Conference 2015 | Pune, India |
| November 2 | December 4-5 | Haskell in Leipzig | Leipzig, Germany |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: September 11, 2015 to November 10, 2015
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| September 10-13 | International Conference on Open Source Software Computing 2015 | Amman, Jordan |
| September 10-12 | FUDcon Cordoba | Córdoba, Argentina |
| September 11-13 | vBSDCon 2015 | Reston, VA, USA |
| September 15-16 | verinice.XP | Berlin, Germany |
| September 16-18 | PostgresOpen 2015 | Dallas, TX, USA |
| September 16-18 | X.org Developer Conference 2015 | Toronto, Canada |
| September 19-20 | WineConf 2015 | Vienna, Austria |
| September 21-25 | Linaro Connect San Francisco 2015 | San Francisco, CA, USA |
| September 21-23 | Octave Conference 2015 | Darmstadt, Germany |
| September 22-24 | NGINX Conference | San Francisco, CA, USA |
| September 22-23 | Lustre Administrator and Developer Workshop 2015 | Paris, France |
| September 23-25 | LibreOffice Conference | Aarhus, Denmark |
| September 23-25 | Surge 2015 | National Harbor, MD, USA |
| September 24 | PostgreSQL Session 7 | Paris, France |
| September 25-27 | PyTexas 2015 | College Station, TX, USA |
| September 28-30 | Nagios World Conference 2015 | Saint Paul, MN, USA |
| September 28-30 | OpenMP Conference | Aachen, Germany |
| September 29-30 | Open Source Backup Conference 2015 | Cologne, Germany |
| September 30 - October 2 | Kernel Recipes 2015 | Paris, France |
| October 1-2 | PyConZA 2015 | Johannesburg, South Africa |
| October 2-3 | Ohio LinuxFest 2015 | Columbus, OH, USA |
| October 2-4 | PyCon India 2015 | Bangalore, India |
| October 5-7 | LinuxCon Europe | Dublin, Ireland |
| October 5-7 | Qt World Summit 2015 | Berlin, Germany |
| October 5-7 | Embedded Linux Conference Europe | Dublin, Ireland |
| October 8 | OpenWrt Summit | Dublin, Ireland |
| October 8-9 | CloudStack Collaboration Conference Europe | Dublin, Ireland |
| October 8-9 | GStreamer Conference 2015 | Dublin, Ireland |
| October 9 | Innovation in the Cloud Conference | San Antonio, TX, USA |
| October 10 | Programistok | Białystok, Poland |
| October 10 | Poznańska Impreza Wolnego Oprogramowania | Poznań, Poland |
| October 10-11 | OpenRISC Conference 2015 | Geneva, Switzerland |
| October 14-16 | XII Latin American Free Software | Foz do Iguacu, Brazil |
| October 17 | Central Pennsylvania Open Source Conference | Lancaster, PA, USA |
| October 18-20 | 2nd Check_MK Conference | Munich, Germany |
| October 19-23 | Tcl/Tk Conference | Manassas, VA, USA |
| October 19-22 | ZendCon 2015 | Las Vegas, NV, USA |
| October 19-22 | Perl Dancer Conference 2015 | Vienna, Austria |
| October 21-22 | Real Time Linux Workshop | Graz, Austria |
| October 23-24 | Seattle GNU/Linux Conference | Seattle, WA, USA |
| October 24-25 | PyCon Ireland 2015 | Dublin, Ireland |
| October 26 | Korea Linux Forum | Seoul, South Korea |
| October 26-28 | Kernel Summit | Seoul, South Korea |
| October 26-28 | OSCON | Amsterdam, The Netherlands |
| October 26-28 | Samsung Open Source Conference | Seoul, South Korea |
| October 27-30 | OpenStack Summit | Tokyo, Japan |
| October 27-29 | Open Source Developers' Conference | Hobart, Tasmania |
| October 27-30 | PostgreSQL Conference Europe 2015 | Vienna, Austria |
| October 29 | FOSS4G Belgium 2015 | Brussels, Belgium |
| October 30 | Software Freedom Law Center Conference | New York, NY, USA |
| November 3-5 | EclipseCon Europe 2015 | Ludwigsburg, Germany |
| November 5-8 | mini-DebConf | Cambridge, UK |
| November 5-7 | systemd.conf 2015 | Berlin, Germany |
| November 6-8 | Dublin blockchain hackathon | Dublin, Ireland |
| November 6-8 | Jesień Linuksowa 2015 | Hucisko, Poland |
| November 7-8 | OpenFest 2015 | Sofia, Bulgaria |
| November 7-8 | PyCON HK 2015 | Hong Kong, Hong Kong |
| November 7-9 | PyCon Canada 2015 | Toronto, Canada |
| November 8-13 | Large Installation System Administration Conference | Washington, D.C., USA |
| November 9-11 | PyData NYC 2015 | New York, NY, USA |
| November 9-11 | KubeCon | San Francisco, CA, USA |
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol