LWN.net Weekly Edition for April 14, 2016
What's new in MythTV 0.28
The MythTV project released its latest stable version, 0.28, on April 11. While there are a few entirely new features worthy of users' attention, most of the changes are incremental improvements. But the improved components include services like Universal Plug and Play (UPnP) support, where MythTV has lagged behind other open-source media centers like Kodi, and the external API, which will hopefully make MythTV more developer-friendly. MythTV remains the most widely used open-source digital video recorder (DVR) but, as cord-cutting trends increase, it will need to offer more functionality to continue attracting users.
It has been just over two and a half years since the last major MythTV release, 0.27. From a functional standpoint, the most significant change to MythTV's core DVR and video-playback features is that the new release has migrated to FFmpeg 3.0 (released in February 2016), thus enabling hardware-accelerated video playback of Google's VP9 codec and providing a much better AAC encoder. VP9 is still not terribly widespread, but the new MythTV release also enables hardware acceleration for H.265 which, like VP9, targets ultra-high-definition video (e.g., 4K resolution).
UPnP and media browsing
UPnP is a specification meant to allow "smart" media-playback devices to discover compatible servers on the local network and automatically find the servers' audio, video, and photo collections. The MythTV back-end server has included UPnP support since version 0.20, but that support has never been particularly impressive. The 0.28 release brings it up to speed, providing full compatibility with the current (2014) version of the standard and adding quite a few improvements to the experience of browsing the available media from a UPnP device.
For instance, UPnP supports an "album art" metadata feature; MythTV will now pick up plausibly named album-art images in music folders (e.g., album.png), and it will generate thumbnail images of videos and recorded TV programs (in multiple sizes, which is beneficial for those who may use both a smartphone and a smart TV at varying times). It also allows users to search through a media collection by length, by user-applied rating, and by several general metadata fields (e.g., director, studio, year, season, maturity rating, and so on). When you put those features together, it makes for a considerably more pleasant experience than the old UPnP support offered, which tended to present the user with a flat list of filenames, each accompanied by the same generic icon (designating "music" or "video," for example). Finally, rewind and fast-forward were broken for a number of UPnP client devices in earlier releases; these should all function correctly now.
While more and more devices ship with UPnP support these days (I personally know there are at least four devices in my home that can act as a UPnP client; all of them were purchased for their other functionality), MythTV also acts as a media-center front-end in its own right, providing the "couch ready" playback interface. Among other features, the MythTV front-end provides the same functionality as a UPnP client: browsing video, audio, and image collections. The image-browsing feature, MythGallery, was rewritten for the 0.28 release, updating the user interface to conform to the latest menu and theme updates and allowing multiple front-end devices to access the same image collection simultaneously.
The audio player component, MythMusic, received updates as well, including a lyrics-display option, a significantly refreshed collection of streaming audio services, some new visualizations, better support for metadata fields in FLAC files, and initial support for retrieving track metadata through the MusicBrainz service. Both MythMusic and MythGallery have also been updated to support MythTV's Storage Groups feature. Storage Groups allow MythTV back-ends to transparently use a variety of underlying file servers and disks in a single pool, and allow the user to share the pool between several back-ends.
Linux users accustomed to Logical Volume Management and network-attached storage may not find the feature particularly novel, since that type of functionality is provided at the operating-system level, but it can be useful at the application level, too. One could set up a separate Storage Group for recordings that resides on a machine attached to a TV, while keeping music in another Storage Group closer to a different part of the house. Or one might prefer to keep different types of media in different filesystems or configured for different back-up options. In prior releases, such choices were limited by the fact that MythMusic and MythGallery did not use the Storage Groups feature at all.
Alternative interfaces
The new release includes some less visible work that is likely to be good for MythTV in the long run. The primary example is the continued development of the MythTV Services API, an API framework introduced in version 0.25 that allows external applications to access and even configure a MythTV installation. The Services API is currently used by just a handful of applications (such as remote-control apps for Android phones), but it is a big improvement over the old, XML-based API.
Version 0.28 of the Services API introduces an entirely new Image service (for working with still-image collections, like MythGallery) and includes updates to several others. The DVR service received the most attention; it is now possible to manage almost every facet of a set of recording rules through the API. That means that external applications can provide the proper hooks to set up and manage scheduled recordings, not just display the schedule and play already-recorded items. The Guide service, which hooks into the electronic program guide, was also enhanced; applications can now filter and group channels (which is particularly useful for showing guide data on small screens). And the Frontend service, which controls playback, gained one important new feature: a generic SendKey command, which enables developers to fully customize the playback commands sent. Since MythTV key bindings are configurable, providing only a fixed set of key commands in the Frontend service was a serious limitation.
The last new feature worth pointing out is a still-in-development experiment: a complete rewrite of the MythTV web interface. The new interface is called WebFrontend, and it will eventually replace the existing interface, MythWeb. WebFrontend runs from a built-in web server, while MythWeb required configuring a separate Apache server. Although the design is still in flux, it aims to simplify the setup and configuration tasks presented in the interface, and to provide a better online-video-playback service.
For those who have used MythWeb, this is good news. Although it is functional enough to get by, the old interface could hardly be described as smooth. Furthermore, it had only rudimentary configuration and monitoring options—most implemented by providing direct access to the MythTV back-end's database tables. If something went wrong with a recording, one might be just as likely to make it worse as to fix it by poking around the database in MythWeb. Hopefully, the project will also take this opportunity to make WebFrontend more secure than MythWeb; strong authentication and TLS (neither of which were implemented for MythWeb) would be a welcome start.
Back in March, I expressed some criticism of MythTV for its complexity and awkward management features. It is hard to say at the outset whether or not version 0.28 improves that situation any. MythWeb is a legacy feature that is ripe for removal (as is the old XML-based API now supplanted by the Services API). At the same time, large features like MythMusic and MythGallery seem to still be undergoing periodic rewrites that do not make them any simpler. But perhaps improved UPnP support offers some hope. After all, if a MythTV back-end is perfectly usable through some other, UPnP-based client application, then there is less for the user to worry about. On the whole, though, each new MythTV release still makes progress. It might be slower than some users would like, but it is moving in the right direction.
This is why we can't have safe cancellation points
Signals have been described as an "unfixable design" aspect of Unix. A recent discussion on the linux-kernel mailing list served to highlight some of the difficulties yet again. There were two sides to the discussion, one that focused on solving a problem by working with the known challenges and the existing semantics, and one that sought to fix the purportedly unfixable.
The context for this debate is the pthread_cancel(3) interface in the Pthreads POSIX threading API. Canceling a thread is conceptually similar to killing a process, though with significantly different implications for resource management. When a process is killed, the resources it holds, like open file descriptors, file locks, or memory allocations, will automatically be released.
In contrast, when a single thread in a multi-threaded process is terminated, the resources it was using cannot automatically be cleaned up, since other threads might be using them. If a multi-threaded process needs to be able to terminate individual threads — if, for example, it turns out that the work they are doing is no longer needed — it must keep track of which resources have been allocated and where they are used. These resources can then be cleaned up, if a thread is canceled, by a cleanup handler registered with pthread_cleanup_push(3). For this to be achievable, there must be provision for a thread to record the allocation and deallocation of resources atomically with respect to the actual allocation or deallocation. To support this, Pthreads introduces the concept of "cancellation points".
These cancellation points are optional and can be disabled with a call to pthread_setcanceltype(3). If the cancel type is set to PTHREAD_CANCEL_ASYNCHRONOUS then a cancellation can happen at any time. This is useful if the thread is not performing any resource allocation or not even making any system calls at all. In this article, though, we'll be talking about the case where cancellation points are enabled.
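To make the cleanup-handler mechanism concrete, here is a minimal sketch of deferred cancellation protecting a file descriptor; the use of /dev/urandom and the one-second delay are purely illustrative choices:

    #include <fcntl.h>
    #include <pthread.h>
    #include <unistd.h>

    static void close_fd(void *arg)
    {
        /* Runs if the thread is canceled (or exits) while the handler is
         * registered, so the descriptor is not leaked. */
        close(*(int *)arg);
    }

    static void *worker(void *unused)
    {
        char buf[64];
        int fd = open("/dev/urandom", O_RDONLY);

        if (fd < 0)
            return NULL;
        pthread_cleanup_push(close_fd, &fd);
        /* read() is a cancellation point: with the default deferred type, a
         * pending pthread_cancel() takes effect here, the cleanup handler is
         * run, and the thread exits without returning from read(). */
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        pthread_cleanup_pop(1);         /* also close fd on a normal exit */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, worker, NULL);
        sleep(1);
        pthread_cancel(t);
        pthread_join(t, NULL);
        return 0;
    }

The question the rest of this article wrestles with is how the C library can guarantee that the descriptor returned by open() is never leaked, no matter how the arrival of the cancellation signal is timed.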
On cancellation points and their implementation
From the perspective of an application, a "cancellation point" is any one of a number of POSIX function calls, such as open(), read(), and many others. If a cancellation request arrives at a time when none of these functions is running, it must take effect when the next cancellation-point function is called. Rather than performing the normal function of the call, it must call all cleanup handlers and cause the thread to exit.
If the cancellation occurs while one of these function calls is waiting for an event, the function must stop waiting. If it can still complete successfully, such as a read() call for which some data has been received but a larger amount was requested, then it may complete and the cancellation will be delayed until the next cancellation point. If the call cannot complete successfully, the cancellation must happen within that call. The thread must clean up and exit and the interrupted function will not return.
From the perspective of a library implementing the POSIX Pthreads API, such as the musl C library (which was the focus of the discussions), the main area of interest is the handling of system calls that can block waiting for an event, and how this interacts with resource allocation. Assuming that pthread_cancel() is implemented by sending a signal (and there aren't really any alternatives), the exact timing of the arrival of the cancellation signal can be significant.
- If the signal arrives after the function has checked for any pending cancellation, but before actually making a system call that might block, then it is critical that the system call is not made at all. The signal handler must not simply return but must arrange to perform the required cleanup and exit, possibly using a mechanism like longjmp().
- If the signal arrives during or immediately after a system call that performs some sort of resource allocation or de-allocation, then the signal handler must behave differently. It must let the normal flow of code continue so that the results can be recorded to guide future cleanup. That code should notice if the system call was aborted by a cancellation signal and start cancellation processing. The signal handler cannot safely do that directly; it must simply set a flag for other code to deal with.
There are quite a number of system calls that can both wait for an event and allocate resources; accept() is a good example as it waits for an incoming network connection and then allocates and returns a file descriptor describing that connection. For this class of system calls, both requirements must be met: a signal arriving immediately before the system call must be handled differently than a signal arriving during or immediately after the system call.
There are precisely three Linux system calls for which the distinction between "before" and "after" is straightforward to manage: pselect(), ppoll(), and epoll_pwait(). Each of these takes a sigset_t argument that lists some signals that are normally blocked before the system call is entered. These system calls will unblock the listed signals, perform the required action, then block them again before returning to the calling thread. This behavior allows a caller to block the cancellation signal, check if a signal has already arrived, and then proceed to make the system call without any risk of the signal being delivered just before the system call actually starts. Rich Felker, the primary author of musl, did lament that if all system calls took a sigset_t and used it this way, then implementing cancellation points correctly would be trivial. Of course, as he acknowledged, "this is obviously not a practical change to make."
Without this ability to unblock signals as part of every system call, many implementations of Pthread cancellation are racy. The ewontfix.com web site goes into quite some detail on this race and its history and reports that the approach taken in glibc is:
    ENABLE_ASYNC_CANCEL();
    ret = DO_SYSCALL(...);
    RESTORE_OLD_ASYNC_CANCEL();
    return ret;
where ENABLE_ASYNC_CANCEL() directs the signal handler to terminate the thread immediately and RESTORE_OLD_ASYNC_CANCEL() directs it to restore the behavior appropriate for the pthread_setcanceltype() setting.
If the signal is delivered before or during the system call this works correctly. If, however, the signal is delivered after the system call completes but before RESTORE_OLD_ASYNC_CANCEL() is called, then any resource allocation or deallocation performed by the system call will go unrecorded. The ewontfix.com site provides a simple test case that reportedly can demonstrate this race.
A clever hack
The last piece of background before we can understand the debate about signal handling is that musl has a solution for this difficulty that is "clever" if you ask Andy Lutomirski and "a hack" if you ask Linus Torvalds. The solution is almost trivially obvious once the problem is described as above so it should be no surprise that the description was developed with the solution firmly in mind.
The signal handler's behavior must differ depending on whether the signal arrives just before or just after a system call. The handler can make this determination by looking at the code address (i.e. instruction pointer) that control will return to when the handler completes. The details of getting this address may require poking around on the stack and will differ between different architectures but the information is reliably available.
As Lutomirski explained when starting the thread, musl uses a single code fragment (a thunk) like:
    cancellable_syscall:
        test whether a cancel is queued
        jnz cancel_me
        int $0x80
    end_cancellable_syscall:
to make cancellable system calls. ("int $0x80" is the traditional way to enter the kernel for a system call by behaving like an interrupt). If the signal handler finds the return address to be at or beyond cancellable_syscall but before end_cancellable_syscall, then it must arrange for termination to happen without ever returning to that code or letting the system call be performed. If it has any other value, then it must record that a cancel has been requested so that the next cancellable system call can detect that and jump to cancel_me.
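A rough sketch of such a handler for x86-64 with glibc-style ucontext access might look like this; the external symbols are the labels from the thunk above, and the register-manipulation details are assumptions for illustration rather than musl's actual implementation:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <ucontext.h>

    extern const char cancellable_syscall[], end_cancellable_syscall[];
    extern void cancel_me(void);            /* runs cleanup, never returns */

    static volatile sig_atomic_t cancel_pending;

    /* Installed with sigaction() and SA_SIGINFO as the cancellation handler. */
    static void cancel_handler(int sig, siginfo_t *info, void *vctx)
    {
        ucontext_t *ctx = vctx;
        uintptr_t ip = ctx->uc_mcontext.gregs[REG_RIP];

        if (ip >= (uintptr_t)cancellable_syscall &&
            ip <  (uintptr_t)end_cancellable_syscall) {
            /* The system call has not been performed yet: divert control so it
             * never is, and run the cancellation/cleanup path instead. */
            ctx->uc_mcontext.gregs[REG_RIP] = (greg_t)(uintptr_t)cancel_me;
        } else {
            /* The call is already past the point of no return (or no call was
             * in progress): just record the request so that the next
             * cancellable system call jumps to cancel_me. */
            cancel_pending = 1;
        }
    }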
This "clever hack" works correctly and is race free, but is not perfect. Different architectures have different ways to enter a system call, including sysenter on x86_64 and svc (supervisor call) on ARM. For 32-bit x86 code there are three possibilities depending on the particular hardware: int $0x80 always works but is not always the fastest. The syscall and sysenter instructions may be available and are significantly faster. To achieve best results, the preferred way to make system calls on a 32-bit x86 CPU is to make an indirect call through the kernel_vsyscall() entry point in the "vDSO" virtual system call area. This function will use whichever instruction is best for the current platform. If musl tried to use this for cancellable system calls it would run into difficulties, though, as it has no way to know where the instruction is, or to be certain that any other instructions that run before the system call are all located before that instruction in memory. So musl currently uses int $0x80 on 32-bit x86 systems and suffers the performance cost.
Cancellation for faster system calls
Now, at last, we come to Lutomirski's simple patch that started the thread of discussion. This patch adds a couple of new entry points to the vDSO; the important one for us is pending_syscall_return_address, which determines whether the current signal was delivered during kernel_vsyscall handling and reports the address of the system call instruction. The caller can then determine if the signal happened before, during, or after that system call.
Neither Linus nor Ingo Molnar liked this approach, though their exact reasons weren't made clear. Part of the reason may have been that the semantics of cancellation appear clumsy, so it is hard to justify much effort to support them. According to Molnar, "it's a really bad interface to rely on". Even Lutomirski expressed surprise that musl "didn't take the approach of 'pthread cancellation is not such a great idea -- let's just not support it'." Szabolcs Nagy's succinct response "because of standards" seemed to settle that issue.
One clear complaint from Molnar was that there was "so much complexity", and it is true that the code would require some deep knowledge to fully understand. This concern is borne out by the fact that Lutomirski, who has that knowledge, hastily withdrew his first and second attempts. While complexity is best avoided where possible, it should not, by itself, be a justification for keeping something out of Linux.
Torvalds and Molnar contributed both by exploring the issues to flesh out the shared understanding and by proposing extra semantics that could be added to the Linux signal facility so that a more direct approach could be used.
Molnar proposed "sticky signals" that could be enabled with an extra flag when setting up a signal handler. The idea was that if the signal is handled other than while a system call is active, then the signal remains pending but is blocked in a special new way. When the next system call is attempted, it is aborted with EINTR and the signal is only then cleared. This change would remove the requirement that the signal handler must not allow the system call to be entered at all if the signal arrives just before the call, since the system call would now immediately exit.
Torvalds's proposal was similar but involved "synchronous" signals. He saw the root problem being that signals can happen at any time and this is what leads to races. If a signal were marked as "synchronous" then it would only be delivered during a system call. This is exactly the effect achieved with pselect() and friends and so could result in a race-free implementation.
The problem with both of these approaches is that they are not selective in the correct way. POSIX does not declare all system calls to be cancellation points and, in fact, does not refer to system calls at all; only certain API functions are defined as cancellation points. Torvalds clearly agreed that being able to use the faster system-call entry made available in the vDSO was important, but neither he nor Molnar managed to provide a workable alternative to the solution proposed by Lutomirski.
Felker made his feelings on the progress of the discussion quite clear.
It is certainly important to get the best design, and exploring alternatives to understand why they were rejected is a valid part of the oversight provided by a maintainer. When that leads to the design being improved, we can all rejoice. When it leads to an understanding that the original design, while not as elegant as might be hoped, is the best we can have, it shouldn't prevent that design from being accepted. Once Lutomirski is convinced that he has all the problems resolved, it is to be hoped that a re-submission results in further progress towards efficient race-free cancellation points. Maybe that would even provide the incentive to get race-free cancellation points in other libraries like glibc.
LXD 2.0 is released
LXD is a relatively new entrant in the container-management arena; the project started roughly a year and a half ago. It provides a REST-based interface to Linux containers as implemented by the LXC project. LXD made its 2.0 release on April 11, which is the first production-ready version.
At its heart, LXD is a daemon that provides a REST API to manage LXC containers. It is called a "hypervisor" for containers and seeks to replicate the experience of using virtual machines but without the overhead of hardware virtualization. LXC containers are typically "system containers" that look similar to an OS running on bare metal or a virtual machine, unlike Docker (and other) container systems that focus on "application containers". The intent is to build a more user-friendly approach to containers than what is provided by LXC.
The REST API is the only way to talk to the LXD daemon. Unless it is configured to listen for remote connections, it simply opens a Unix socket for local communication. Then the lxc command-line tool can be used to configure both the daemon and any containers that will be run on the system. For remote connections, TLS 1.2 with a "very limited set of allowed ciphers" is used.
The easiest ways to get started with LXD are all based on Ubuntu systems, which is not surprising given that Canonical is the main sponsor of the project. There are provisions for other distributions (Gentoo, presently) and for building the Go code from source, though. There is also an online demo that can be used to try out LXD from a web browser.
The LXD daemon uses a number of kernel technologies to make the containers it runs more secure. For example, it uses namespaces and, in particular, user namespaces to separate the container users from those of the system at large. As outlined in lead developer Stéphane Graber's Introduction to LXD (which is part of his still-in-progress series on LXD), one of the core design principles "was to make it as safe as possible while allowing modern Linux distributions to run inside it unmodified".
Beyond namespaces, AppArmor is used to add restrictions on mounts, files, sockets, and ptrace() access to prevent containers from interfering with each other. Seccomp is used to restrict certain system calls. In addition, Linux capabilities are used to prevent containers from loading kernel modules and other potentially harmful activities.
While control groups (cgroups) are used to prevent denial-of-service attacks from within containers, they can also be used to parcel up the resources (e.g. CPU, memory) of the system among multiple containers. Another entry in Graber's series shows the kinds of limits that can be imposed on containers for disk, CPU, memory, network I/O, and block I/O.
A container in LXD consists of a handful of different pieces. It has a root filesystem, some profiles that contain configuration information (e.g. resource limits), devices (e.g. disks, network interfaces), and some properties (e.g. name, architecture). The root filesystems are all image-based, which is something of a departure from the template-based filesystems that LXC uses. The difference is that instead of building the filesystem from a template when the container is launched (and possibly storing the result), LXD uses a pre-built filesystem image that typically comes from a remote image server (and is then cached locally).
These images are generally similar to fresh distribution images like those used for VMs. LXD is pre-configured with three remote image servers (for Ubuntu stable, Ubuntu daily builds, and a community-run server that has other Linux distributions). The images themselves are identified with an SHA-256 hash, so a specific image or simply the latest Ubuntu stable or daily build can be requested. Users can add their own remote LXD image servers (either public or private) as well.
Profiles provide a way to customize the container configuration and devices. A container can use multiple profiles, which are applied in order, with later profiles potentially overwriting earlier configuration entries. In addition, local container configuration is applied last for configuration entries that only apply to a single container, so they do not belong in the profiles. By default, LXD comes with two profiles, one that simply defines an "eth0" network device and a second that is suitable for running Docker images.
LXD uses Checkpoint/Restore In Userspace (CRIU) to allow snapshotting containers, either to restore them later on the same host or to migrate them elsewhere to be restored. These container snapshots look much the same as regular containers, but they are immutable and contain some extra state information that CRIU needs to restore the running state of the container.
LXD needs its own storage back-end for containers and images. Given the announcement that Canonical will be shipping ZFS with Ubuntu 16.04, it will not come as a surprise that the recommended filesystem for LXD is ZFS. But other options are possible, as described in another post in the series. In particular, Btrfs and the logical volume manager (LVM) can be used.
LXD can scale beyond just a single system running multiple containers; it can also be used to handle multiple systems each running LXD. But for huge deployments, with many systems and thousands of containers, there is an OpenStack plugin (nova-lxd) that provides a way for the OpenStack Nova compute-resource manager to treat LXD containers like VMs. That way, LXD can be integrated into OpenStack deployments.
As befits the "production-ready" nature of the release, LXD 2.0 has a stable API. Until June 2021, all of the existing interfaces will be maintained; any additions will be done using extensions that clients can discover. In addition, there will be frequent bug-fix releases, with backports from the current development tree.
There is a fair amount of competition in the container-management (or orchestration) world these days. Kubernetes, Docker Swarm, Apache Mesos, and others are all solving similar problems. LXD looks like it could have a place in that ecosystem, especially given the strong support it is receiving from Canonical and Ubuntu. For those looking for a container manager, taking a peek at LXD 2.0 may be well worth the time.
Security
A new stable security tree
The history of the stable kernel tree goes all the way back to some mailing list discussions in 2005. It has clearly been a great success that now spans multiple supported kernel versions, some of which are supported for two years as part of the long-term support initiative and some that are supported for far longer than that. But there have always been worries in some quarters that following the stable kernels—with fairly frequent updates including many patches—might introduce more bugs than would be fixed (at least bugs relevant to a particular user). So, a "security fix only" stable kernel has some appeal, which is exactly what Sasha Levin announced on April 11.
Levin described his "linux-stable security tree" (which is available as a Git tree) as "a derivative tree from the regular stable tree that would contain only commits that fix security vulnerabilities". The repository contains branches and -security tags for all stable series starting with 3.0 (though some of those are now unsupported), which gives an idea of what kinds of fixes he is targeting. Levin plans to support all of the maintained stable kernels with a -security variant. He went on to outline the reasons behind his plan:
Given this, a few projects preferred to delay important kernel updates, and a few even stopped updating the tree altogether, exposing them to critical vulnerabilities.
This project provides an easy way to receive only important security commits, which are usually only a few in each release, and makes it easy to incorporate them into existing projects.
One major problem, of course, is defining what constitutes a "security fix". Greg Kroah-Hartman, who is the founder of the stable trees and still maintains several of them, wondered how Levin would "define 'security vulnerabilities'". He noted that everyone's needs are different, so they should either do the cherry-picking themselves or simply rely on one of the stable trees: "It's not that much churn at all, given the % of the mainline tree". He also pointed out that there are fixes that have gone into the stable trees that weren't recognized as security flaws until long after those trees were released. Anyone who was cherry-picking fixes might well have missed one or more of those and been exposed to vulnerabilities for far longer than necessary.
Levin replied that the security tree would carry patches for "anything that qualified for a CVE plus anything exploitable by a local unprivileged user". He noted that the security trees would simply be "a subset of [its] corresponding -stable tree". But Kroah-Hartman expressed further concern about the goals of the project:
Putting up a tree like this isn't going to cause people to update their trees any quicker. If they haven't been doing it now, they aren't going to do it with this tree, as their workloads don't allow them to take updated kernel trees.
In short, it's not the fact that we have stable trees that are "too big" for them to take, it's the fact that they just can't take and test updates properly.
But there are plenty of patches that end up in the stable trees that actually introduce bugs, Levin said. He pointed to the number of "Fixes:" tags that refer to commits in the stable tree as an indicator of that.
There are also fixes for CVEs that have not made it into the stable tree, he said, typically because they were not sent to the stable maintainers when they were merged and a CVE was requested later once the security implications were known. While Levin did not say so directly, it would seem that the idea is that the maintainers of this new tree could help ensure those fixes get into the stable tree, since that provides the path for them to reach the stable security tree as well. But, he acknowledged that the tree would not be able to achieve "100% success rate, but this is the same story as the stable tree and the patch selection there".
Beyond that, though, Levin said that reviewing a few cherry-picked fixes, rather than the 100+ fixes all over the tree that often come with a stable release, will help certain projects.
The stable kernels do tend to have a lot of patches. 4.5.1 had 238, the 4.4 series has had 1004 patches through 4.4.7, and the 3.14 series contains 4580 patches through 3.14.66. The trees in Levin's repository show markedly fewer patches.
Willy Tarreau, who had been maintaining the 2.6.32 stable kernel until its recent end of life, was worried that the use of "security" in the name of the tree would lead users astray.
Tarreau suggested an alternate strategy to create an "easily searchable" list of mainline commits that were security fixes. Those could perhaps be put into a tree: "We could even amend the commit messages to mention the CVE id once it's assigned, and sometimes 'relies on this preliminary patch'." But it would be "tough work" and difficult to find someone interested in doing it. In the end, though, he believes the underlying problem boils down to "more of an issue educating our users than an issue with the code we distribute".
Kroah-Hartman agreed with Tarreau about the education problem, but also complained that there are "users that just don't want to upgrade as it's 'hard' for them". There is no real difference in taking 200 patches in an update versus 100, he said, so the "small subset" argument doesn't make sense to him. But Levin had previously suggested that stable-security tree releases might carry on the order of five patches, so it's clear that the two have clashing visions of how it will all play out.
The one participant in the discussion who is strictly a user of the stable kernels (rather than a maintainer and user) was Eddie Chapman. He said that the quality of the stable kernels has always been excellent in his experience; he has run into regressions only rarely and uses the stable tree for "virtually all my own projects, as well as many clients' projects". He agreed with much of what Kroah-Hartman and Tarreau said in the thread and warned: "IMO you deserve everything you get if you only applied the fixes in Sasha's new tree and ignored the stable releases completely."
With that said, though, Chapman is in favor of the new tree. He recognizes that the line between a bug fix and a security fix is blurry, but any assistance in narrowing down which fixes are applied is welcome: "So anything which improves visibility, which this certainly does, is a good thing in my opinion."
But Tarreau is skeptical that the tree will generally be used in the right way. The security fixes will only appear in it if they are already in the stable tree, which means that fixes that are missed by the stable maintainers (which is common for him, he said) won't be present.
Chapman agreed that there is a risk, but "I think it is better than no repository". Users who are blindly applying fixes are already in trouble, "so things can't really effectively get any worse", he said. Blindly applying a small set of fixes from the stable-security tree will perhaps help in a relatively small way: "Their situation might then be upgraded from 100% screwed to maybe only 70% screwed".
That's where the conversation trailed off, at least for now. In the end, the proof will be in the pudding. If Levin can provide a tree that is useful to some projects—and perhaps help get some additional fixes into the stable tree itself—it will be a success. If, on the other hand, it ends up causing more headaches than it solves, it seems likely that it will just go away at some point. Only time will tell.
Brief items
Security quotes of the week
This problem was also demonstrated by Fang [Binxing, architect of China's infamous Great Firewall], with his internet connection cutting out twice while trying to access Facebook and Google. The problem was so bad that he ended up resorting to using Baidu to find a screenshot of a Google homepage.
The incident was so embarrassing that Fang ended up ducking out of a planned Q and A session that was supposed to take place after the speech, though not before exhorting his listeners to do as he said and not as he did.
The privately held company, valued at more than $60 billion, said the agencies requested information on trips, trip requests, pickup and dropoff areas, fares, vehicles, and drivers.
HTTPS Everywhere: Encryption for All WordPress.com Sites
WordPress has announced free HTTPS for all custom domains hosted on WordPress.com. "The Let’s Encrypt project gave us an efficient and automated way to provide SSL certificates for a large number of domains. We launched the first batch of certificates in January 2016 and immediately started working with Let’s Encrypt to make the process smoother for our massive and growing list of domains. For you, the users, that means you’ll see secure encryption automatically deployed on every new site within minutes. We are closing the door to un-encrypted web traffic (HTTP) at every opportunity."
The "Badlock" vulnerability
The details for the "Badlock" vulnerability in Samba have been disclosed. "Please update your systems. We are pretty sure that there will be exploits soon. Engineers at Microsoft and the Samba Team worked together during the past months to get this problem fixed."
New vulnerabilities
cairo: denial of service
Package(s): cairo
CVE #(s): CVE-2016-3190
Created: April 12, 2016
Updated: April 13, 2016
Description: From the openSUSE bug report:
A vulnerability was found in cairo. A maliciously crafted file can cause out of bounds read in fill_xrgb32_lerp_opaque_spans function in cairo, thus crashing the software.
ImageMagick: multiple vulnerabilities
Package(s): imagemagick
CVE #(s): (none listed)
Created: April 12, 2016
Updated: April 13, 2016
Description: From the Debian advisory:
Several vulnerabilities were discovered in ImageMagick, a program suite for image manipulation. This update fixes a large number of potential security problems such as null-pointer access and buffer-overflows that might lead to memory leaks or denial of service.
imlib2: two vulnerabilities
Package(s): imlib2
CVE #(s): CVE-2016-3994 CVE-2011-5326
Created: April 13, 2016
Updated: April 14, 2016
Description: From the Red Hat bugzilla:
CVE-2016-3994: A vulnerability was found in a way imlib2 processes GIF files. A specially crafted file could cause the imlib2 to crash, or even expose some of the host memory.
CVE-2011-5326: A vulnerability was found in imlib2. Attempting to draw a 2x1 radi ellipse with imlib_image_draw_ellipse() will result in a floating point exception.
kernel: multiple vulnerabilities
Package(s): kernel
CVE #(s): CVE-2016-3136 CVE-2016-2187 CVE-2016-3140 CVE-2016-3138 CVE-2016-2185 CVE-2016-2188 CVE-2016-2186 CVE-2016-3137 CVE-2016-2184
Created: April 11, 2016
Updated: April 13, 2016
Description: From the Red Hat bugzilla:
CVE-2016-3136: Kernel crash occurs when presented a buggy USB device which requires mct_u232 driver, causing null pointer dereference.
CVE-2016-2187: Kernel crash occurs when presented a buggy USB device which requires gtco driver, causing null pointer dereference.
CVE-2016-3140: Kernel crash occurs when presented a buggy USB device which requires digi_acceleport driver, causing null pointer dereference.
CVE-2016-3138: Kernel crash occurs when presented a buggy USB device which requires cdc_acm driver, causing null pointer dereference.
CVE-2016-2185: Kernel crash occurs when presented a buggy USB device which requires ati_remote2 driver, causing null pointer dereference.
CVE-2016-2188: Kernel crash occurs when presented a buggy USB device which requires iowarrior driver, causing null pointer dereference.
CVE-2016-2186: Kernel crash occurs when presented a buggy USB device which requires powermate driver, causing null pointer dereference.
CVE-2016-3137: Kernel crash occurs when presented a buggy USB device which requires cypress_m8 driver, causing null pointer dereference.
CVE-2016-2184: Kernel crash occurs when presented a buggy USB device which requires snd_usb_audio driver, causing null pointer dereference.
kernel: denial of service
Package(s): kernel
CVE #(s): CVE-2015-1339
Created: April 12, 2016
Updated: April 13, 2016
Description: From the openSUSE bug report:
Kernel memory leak in the CUSE driver using stress-ng was found. It is possible for privileged attacker to cause a local DoS via memory exhaustion by repeatedly opening /dev/cuse for reading.
kernel: two vulnerabilities
Package(s): kernel
CVE #(s): CVE-2016-2143 CVE-2016-3139
Created: April 13, 2016
Updated: April 13, 2016
Description: From the SUSE advisory:
- CVE-2016-2143: On zSeries a fork of a large process could have caused memory corruption due to incorrect page table handling. (bnc#970504)
- CVE-2016-3139: A malicious USB device could cause a kernel crash in the wacom driver. (bnc#970909)
libmaxminddb: multiple vulnerabilities
Package(s): libmaxminddb
CVE #(s): (none listed)
Created: April 7, 2016
Updated: April 13, 2016
Description: From the Red Hat bugzilla entry:
There were found several segmentation faults caused by missing bounds checking and missing verification of data type.
Upstream patch: https://github.com/maxmind/libmaxminddb/commit/51255f113fe3c7b63ffe957636a7656a3ff9d1ff
libreswan: denial of service
Package(s): libreswan
CVE #(s): CVE-2016-3071
Created: April 13, 2016
Updated: April 19, 2016
Description: From the Libreswan advisory:
The Libreswan Project found a bug in the default proposal set for IKEv2. This code, introduced in version 3.16, includes the AES_XCBC integrity algorithm. It wrongly assumes that the NSS cryptographic library supports this algorithm. As a result, the IKE daemon crashes and restarts when the aes_xcbc transform is selected. No remote code execution is possible.
python-pillow: buffer overflow
Package(s): python-pillow
CVE #(s): CVE-2016-3076
Created: April 11, 2016
Updated: April 14, 2016
Description: From the Fedora advisory:
This update fixes an integer overflow in Jpeg2KEncode.c causing a buffer overflow.
python-rsa: unspecified
Package(s): python-rsa
CVE #(s): (none listed)
Created: April 7, 2016
Updated: April 18, 2016
Description: From the Fedora advisory:
Long-unfixed security vulnerabilities: https://bugzilla.redhat.com/show_bug.cgi?id=1170702 [ Though the bug report doesn't make it at all clear what is being fixed. ]
samba: multiple vulnerabilities
Package(s): samba
CVE #(s): CVE-2015-5370 CVE-2016-2110 CVE-2016-2111 CVE-2016-2112 CVE-2016-2113 CVE-2016-2114 CVE-2016-2115 CVE-2016-2118
Created: April 13, 2016
Updated: June 10, 2016
Description: From the Red Hat advisory:
* Multiple flaws were found in Samba's DCE/RPC protocol implementation. A remote, authenticated attacker could use these flaws to cause a denial of service against the Samba server (high CPU load or a crash) or, possibly, execute arbitrary code with the permissions of the user running Samba (root). This flaw could also be used to downgrade a secure DCE/RPC connection by a man-in-the-middle attacker taking control of an Active Directory (AD) object and compromising the security of a Samba Active Directory Domain Controller (DC). (CVE-2015-5370) Note: While Samba packages as shipped in Red Hat Enterprise Linux do not support running Samba as an AD DC, this flaw applies to all roles Samba implements.
* A protocol flaw, publicly referred to as Badlock, was found in the Security Account Manager Remote Protocol (MS-SAMR) and the Local Security Authority (Domain Policy) Remote Protocol (MS-LSAD). Any authenticated DCE/RPC connection that a client initiates against a server could be used by a man-in-the-middle attacker to impersonate the authenticated user against the SAMR or LSA service on the server. As a result, the attacker would be able to get read/write access to the Security Account Manager database, and use this to reveal all passwords or any other potentially sensitive information in that database. (CVE-2016-2118)
* Several flaws were found in Samba's implementation of NTLMSSP authentication. An unauthenticated, man-in-the-middle attacker could use this flaw to clear the encryption and integrity flags of a connection, causing data to be transmitted in plain text. The attacker could also force the client or server into sending data in plain text even if encryption was explicitly requested for that connection. (CVE-2016-2110)
* It was discovered that Samba configured as a Domain Controller would establish a secure communication channel with a machine using a spoofed computer name. A remote attacker able to observe network traffic could use this flaw to obtain session-related information about the spoofed machine. (CVE-2016-2111)
* It was found that Samba's LDAP implementation did not enforce integrity protection for LDAP connections. A man-in-the-middle attacker could use this flaw to downgrade LDAP connections to use no integrity protection, allowing them to hijack such connections. (CVE-2016-2112)
* It was found that Samba did not validate SSL/TLS certificates in certain connections. A man-in-the-middle attacker could use this flaw to spoof a Samba server using a specially crafted SSL/TLS certificate. (CVE-2016-2113)
* It was discovered that Samba did not enforce Server Message Block (SMB) signing for clients using the SMB1 protocol. A man-in-the-middle attacker could use this flaw to modify traffic between a client and a server. (CVE-2016-2114)
* It was found that Samba did not enable integrity protection for IPC traffic by default. A man-in-the-middle attacker could use this flaw to view and modify the data sent between a Samba server and a client. (CVE-2016-2115)
xen: information disclosure
Package(s): xen
CVE #(s): CVE-2016-3158 CVE-2016-3159
Created: April 11, 2016
Updated: April 13, 2016
Description: From the Red Hat bugzilla:
There is a workaround in Xen to deal with the fact that AMD CPUs don't load the x86 registers FIP (and possibly FCS), FDP (and possibly FDS), and FOP from memory (via XRSTOR or FXRSTOR) when there is no pending unmasked exception. However, this workaround does not cover all possible input cases. This is because writes to the hardware FSW.ES bit, which the current workaround is based on, are ignored; instead, the CPU calculates FSW.ES from the pending exception and exception mask bits. Xen therefore needs to do the same. Note that part of said workaround was the subject of XSA-52. A malicious domain may be able to leverage this to obtain sensitive information such as cryptographic keys from another domain.
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 4.6-rc3, released on April 10. "The biggest single patch is the resurrection of the olpc_dcon staging driver, that wasn't so obsolete after all. There was a missed opportunity there, since the resurrection of that driver missed Easter by a week. We'll do better in the comedic timing department next time, I promise."
Stable updates: 4.5.1, 4.4.7, and 3.14.66 were released on April 12.
Quotes of the week
It's a _horrible_ idea.
If you think that "design with extensions in mind" is a good idea, you're basically saying "I don't know what I might want to do".
The linux-stable security tree project
Sasha Levin has announced the creation of the "linux-stable security tree" project. The idea is to take the current stable updates and filter out everything that isn't identified as a security fix. "Quite a few users of the stable trees pointed out that on complex deployments, where validation is non-trivial, there is little incentive to follow the stable tree after the product has been deployed to production. There is no interest in 'random' kernel fixes and the only requirements are to keep up with security vulnerabilities."
Kernel development news
Tracepoints with BPF
One of the attractive features of tracing tools like SystemTap or DTrace is the ability to load code into the kernel to perform first-level analysis on the trace data stream. Tracing can produce vast amounts of data, but that data can often be reduced considerably by some simple processing — incrementing histogram buckets, for example. Current kernels have a wealth of tracepoints, but they lack the ability to perform arbitrary processing of trace events in kernel space before exporting the result. It would appear, though, that this situation will change as the result of a set of patches targeted for the 4.7 release.
It should come as no surprise to regular LWN readers at this point that the technology being used for the loading of code into the kernel is the BPF virtual machine. BPF allows code to be executed in kernel space under tight constraints; among other things, it can only access data that is explicitly provided to it and it cannot contain loops; thus, it is guaranteed to run within a bounded time. BPF code can also be translated to native code with the in-kernel just-in-time compiler, making it fast to run. This combination of attributes has helped BPF to move beyond the networking stack and make inroads into a number of kernel subsystems.
Every BPF program loaded into the kernel has a specific type assigned to it; that type restricts the places where the program may be run. The patch set from Alexei Starovoitov creates a new type (BPF_PROG_TYPE_TRACEPOINT) for programs intended to be attached to tracepoints. Those programs can then be loaded into the kernel with the bpf() system call. Actually attaching a program to a tracepoint is done by opening the tracepoint file (in debugfs or tracefs), reading the tracepoint ID, then using the PERF_EVENT_IOC_SET_BPF ioctl() command. That command exists in current kernels to allow BPF programs to be attached to kprobes; the patch set extends it to do the right thing depending on the type of BPF program passed to it.
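In rough terms, the attachment sequence looks something like the following sketch (error handling is abbreviated, and the decision to monitor all processes on CPU 0 is simply an illustrative choice):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static int attach_bpf_to_tracepoint(const char *id_path, int bpf_prog_fd)
    {
        struct perf_event_attr attr;
        FILE *f = fopen(id_path, "r");  /* e.g. .../events/sched/sched_switch/id */
        int id, fd;

        if (!f)
            return -1;
        if (fscanf(f, "%d", &id) != 1) {
            fclose(f);
            return -1;
        }
        fclose(f);

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_TRACEPOINT;
        attr.size = sizeof(attr);
        attr.config = id;               /* the tracepoint ID read above */
        attr.sample_period = 1;
        attr.wakeup_events = 1;

        /* perf_event_open() has no glibc wrapper; watch all tasks on CPU 0. */
        fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
        if (fd < 0)
            return -1;

        /* Hand over the already-loaded BPF program and enable the event. */
        if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0 ||
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }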
When a tracepoint with a BPF program attached to it fires, that program will be run. The "context" area passed to the program is simply the tracepoint data as it would be passed to user space, except that the "common" fields are not accessible. As an example, the patch set includes a sample that attaches to the sched/sched_switch tracepoint, which fires when the scheduler switches execution from one process to another. The format file for that tracepoint (found in the tracepoint directory in debugfs or tracefs) provides the following data:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;
    field:char prev_comm[16]; offset:8; size:16; signed:1;
    field:pid_t prev_pid; offset:24; size:4; signed:1;
    field:int prev_prio; offset:28; size:4; signed:1;
    field:long prev_state; offset:32; size:8; signed:1;
    field:char next_comm[16]; offset:40; size:16; signed:1;
    field:pid_t next_pid; offset:56; size:4; signed:1;
    field:int next_prio; offset:60; size:4; signed:1;
Any program that accesses tracepoint data is expected to read this file to figure out which data is available and where it is to be found; failure to do so risks trouble in the future should the data associated with this tracepoint change. An in-kernel BPF program cannot read this file, so another solution must be found. That solution is for the developer to read the format file and turn it into a C structure; there is a tool (called tplist) that will do this job. The patch set contains the following structure, which was generated with tplist:
    /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
    struct sched_switch_args {
        unsigned long long pad;
        char prev_comm[16];
        int prev_pid;
        int prev_prio;
        long long prev_state;
        char next_comm[16];
        int next_pid;
        int next_prio;
    };
The pad field exists because the first four fields (those common to all tracepoints) are not accessible to BPF programs. The rest, however, can be accessed by name in a C program (which will be compiled to BPF and loaded into the kernel). This program will likely extract the data of interest from this structure, process it in its own special way, and store the result in a BPF map; user space can then access the result.
As with the other BPF program types, the helper code supplied with the kernel uses section names as a directive for what should be done with a specific program. So a program meant to be attached to a tracepoint should be explicitly placed in a section called "tracepoint/name", where "name" is the name of the tracepoint of interest. So, for the sample program, the section name is "tracepoint/sched/sched_switch".
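Putting the pieces together, a program along the lines of the sample might look like the sketch below. It is written in the style of the kernel's samples/bpf directory, so the helper macros, map definition, and build machinery are assumptions borrowed from there and may differ in detail from the actual sample; it simply counts how often each process is switched in:

    #include <uapi/linux/bpf.h>
    #include "bpf_helpers.h"

    struct sched_switch_args {          /* generated with tplist, as above */
        unsigned long long pad;
        char prev_comm[16];
        int prev_pid;
        int prev_prio;
        long long prev_state;
        char next_comm[16];
        int next_pid;
        int next_prio;
    };

    struct bpf_map_def SEC("maps") switch_count = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(int),
        .value_size = sizeof(long),
        .max_entries = 1024,
    };

    SEC("tracepoint/sched/sched_switch")
    int count_switches(struct sched_switch_args *ctx)
    {
        int pid = ctx->next_pid;
        long one = 1, *count;

        count = bpf_map_lookup_elem(&switch_count, &pid);
        if (count)
            (*count)++;                 /* existing entry: bump the counter */
        else
            bpf_map_update_elem(&switch_count, &pid, &one, BPF_ANY);
        return 0;
    }

    char _license[] SEC("license") = "GPL";

User space can then walk the map with bpf() lookups to read back the per-process counts.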
The mechanism works, and, importantly, a tracepoint-attached BPF program is quite a bit more efficient than placing a kprobe and attaching a program there. There are already tools in development (argdist, for example) that will create BPF programs for specific tasks; argdist will create a program to make a histogram of the values of a given tracepoint field. All told, it looks like a useful advance in the kernel's instrumentation.
There is a potential catch, though: the old issue of tracepoints and ABI stability. Tracepoints expose the inner workings of the kernel, which suggests that they must change if the kernel does. Changing tracepoints can, however, break applications that use them; this is an issue that has come up many times in the past. It is also why certain subsystems (the virtual filesystem layer, for example) do not allow tracepoints at all: the maintainers are worried that they might be unable to make important changes because they may break applications dependent on those tracepoints.
For user-space programs, the issue has been mitigated somewhat by providing library code to access tracepoint data. An application that uses these utilities should be portable across multiple kernel versions. BPF programs, though, do not have access to such libraries and will break, perhaps silently, if the tracepoints they are using are changed. ABI concerns have stalled the merging of this capability in the past, but there was little discussion of ABI worries this time around. Alexei maintains that the interface available to BPF programs is the same as that seen by user-space programs, so there should be no new ABI worries. Whether the BPF interface truly brings no new ABI issues is something that will have to be seen over the coming years.
And it does appear that we will have the chance to see how that plays out; David Miller has applied the patches to the net-next tree, meaning that they should reach the mainline in the 4.7 merge window. Users wanting more visibility into what's happening inside the kernel will likely be happy to have it.
Toward less-annoying background writeback
It's an experience many of us have had: write a bunch of data to a relatively slow block device, then try to get some other work done. In many cases, the system will slow to a crawl or even appear to freeze for a while; things do not recover until the bulk of the data has been written to the device. On a system with a lot of memory and a slow I/O device, getting things back to a workable state can take a long time, sometimes measured in minutes. Linux users are understandably unimpressed by this behavior pattern, but it has been stubbornly present for a long time. Now, perhaps, a new patch set will improve the situation.

That patch set, from block subsystem maintainer Jens Axboe, is titled "Make background writeback not suck." "Background writeback" here refers to the act of flushing block data from memory to the underlying storage device. With normal Linux buffered I/O, a write() call simply transfers the data to memory; it's up to the memory-management subsystem to, via writeback, push that data to the device behind the scenes. Buffering writes in this manner enables a number of performance enhancements, including allowing multiple operations to be combined and enabling filesystems to improve layout locality on disk.
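The effect is easy to observe with a small test program; the sketch below (illustrative only, not part of the patch set) pushes 256MB through the page cache and shows how quickly the write() calls return compared to the fsync() call that actually waits for the device:

    /* Demonstrate buffered writes: write() returns once data is in the
     * page cache, while fsync() waits for writeback to the device. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        static char buf[1 << 20];               /* 1MB of dummy data */
        memset(buf, 'x', sizeof(buf));

        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        double t0 = now();
        for (int i = 0; i < 256; i++)           /* 256MB of buffered writes */
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                perror("write");
                return 1;
            }
        double t1 = now();
        fsync(fd);                              /* now wait for the device */
        double t2 = now();

        printf("write() calls: %.2fs   fsync(): %.2fs\n", t1 - t0, t2 - t1);
        close(fd);
        return 0;
    }

On a machine with plenty of free memory and a slow disk, the write() loop typically finishes almost immediately, while the fsync() call absorbs nearly all of the elapsed time.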
So how is it that a performance-enhancing technique occasionally leads to such terrible performance? Jens's diagnosis is that it has to do with the queuing of I/O requests in the block layer. When the memory-management code decides to write a range of dirty data, the result is an I/O request submitted to the block subsystem. That request may spend some time in the I/O scheduler, but it is eventually dispatched to the driver for the destination device. Getting there requires passing through a series of queues.
The problem is that, if there is a lot of dirty data to write, there may end up being vast numbers (as in thousands) of requests queued for the device. Even a reasonably fast drive can take some time to work through that many requests. If some other activity (clicking a link in a web browser, say, or launching an application) generates I/O requests on the same block device, those requests go to the back of that long queue and may not be serviced for some time. If multiple, synchronous requests are generated — page faults from a newly launched application, for example — each of those requests may, in turn, have to pass through this long queue. That is the point where things appear to just stop.
In other words, the block layer has a bufferbloat problem that mirrors the issues that have been seen in the networking stack. Lengthy queues lead to lengthy delays.
As with bufferbloat, the answer lies in finding a way to reduce the length of the queues. In the networking stack, techniques like byte queue limits and TCP small queues have mitigated much of the bufferbloat problem. Jens's patches attempt to do something similar in the block subsystem.
Taming the queues
Like networking, the block subsystem has queuing at multiple layers. Requests start in a submission queue and, perhaps after reordering or merging by an I/O scheduler, make their way to a dispatch queue for the target device. Most block drivers also maintain queues of their own internally. Those lower-level queues can be especially problematic since, by the time a request gets there, it is no longer subject to the I/O scheduler's control (if there is an I/O scheduler at all).
Jens's patch set aims to reduce the amount of data "in flight" through all of those queues by throttling requests when they are first submitted. To put it simply, each device has a maximum number of buffered-write requests that can be outstanding at any given time. If an incoming request would cause that limit to be exceeded, the process submitting the request will block until the length of the queue drops below the limit. That way, other requests will never be forced to wait for a long queue to drain before being acted upon.
In the real world, of course, things are not quite so simple. Writeback is not just important for ensuring that data makes it to persistent storage (though that is certainly important enough); it is also a key activity for the memory-management subsystem. Writeback is how dirty pages are made clean and, thus, available for reclaim and reuse; if writeback is impeded too much, the system could find itself in an out-of-memory situation. Running out of memory can lead to other user-disgruntling delays, along with unleashing the OOM killer. So any writeback throttling must be sure to not throttle things too much.
The patch set tries to avoid such unpleasantness by tracking the reason behind each buffered-write operation. If the memory-management subsystem is just pushing dirty pages out to disk as part of the regular task of making their contents persistent, the queue limit applies. If, instead, pages are being written to make them free for reclaim — if the system is running short of memory, in other words — the limit is increased. A higher limit also applies if a process is known to be waiting for writeback to complete (as might be the case for an fsync() call). On the other hand, if there have been any non-writeback requests within the last 100ms, the limit is reduced below the default for normal writeback requests.
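In rough pseudocode, the limit selection described above might look something like the sketch below; the names, numbers, and reasons are illustrative and are not taken from the actual patches:

    #include <stdbool.h>

    /* illustrative writeback reasons; the kernel has its own WB_REASON_* list */
    enum wb_reason { WB_REASON_BACKGROUND, WB_REASON_VMSCAN, WB_REASON_SYNC };

    static unsigned int wb_queue_limit(enum wb_reason reason, bool recent_other_io)
    {
        unsigned int limit = 1024;             /* default for background writeback */

        if (reason == WB_REASON_VMSCAN)        /* writing to free pages for reclaim */
            limit *= 4;
        else if (reason == WB_REASON_SYNC)     /* a process is waiting, e.g. fsync() */
            limit *= 2;
        else if (recent_other_io)              /* non-writeback I/O in the last 100ms */
            limit /= 2;

        return limit;
    }

    /*
     * Submitters of buffered-write requests would block (not shown here) whenever
     * the number of writeback requests in flight for the device exceeds the value
     * returned by wb_queue_limit() for the current situation.
     */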
There is also a potential trap in the form of drives that do their own write caching. Such drives will indicate that a write request has completed once the data has been transferred, but that data may just be sitting in a cache within the drive itself. In other words, the drive, too, may be maintaining a long queue. In an attempt to avoid overfilling that queue, the block layer will impose a delay between write operations on drives that are known to do caching. That delay is 10ms by default, but can be tweaked via a sysfs knob.
Jens tested this work by having one process write 100MB each to 50 files while another process tries to read a file. The reading process will, on current kernels, be penalized by having each successive read request placed at the end of a long queue created by all those write requests; as might be expected, it performs poorly. With the patches applied, the writing processes take a little longer to complete, but the reader runs much more quickly, with far fewer requests taking an inordinately long period of time.
This is an early-stage patch set; it is not expected to go upstream in the near future. Patches that change memory-management behavior can often cause unexpected problems with different workloads, so it takes a while to build confidence in a significant change, even after the development work is deemed to be complete (which is not the case here). Indeed, Dave Chinner has already reported a performance regression with one of his testing workloads. The tuning of the queue-size limits also needs to be made automatic if possible. There is clearly work still to be done here; the patch set is also likely to be a subject of discussion at the upcoming Linux Storage, Filesystem, and Memory-Management Summit. So users will have to wait a bit longer for this particular annoyance to be addressed.
Static code checks for the kernel
At the 2016 Embedded Linux Conference in San Diego, Arnd Bergmann presented a session on what he called a "lighter topic," his recent efforts to catch and fix kernel bugs through static tests. Primarily, his method involved automating a large number of builds, first to catch compilation errors that caused build failures, then to catch compiler warning messages. He has done these builds for years, progressively fixing the errors and then the warnings for a range of kernel configurations.
There are two motives for this particular side project, he said: to help automate the testing of the many pull requests seen in the arm-soc tree (for which the sheer number of SoCs presents a logistical challenge), and to put significant code-refactoring work to the test. Previously, he explained, he had attempted to review every pull request in arm-soc and fix every regression, but that quickly proved too time-consuming to be done manually. Testing each patch automatically first reduced the time required. As for refactoring, he noted that he was a veteran of the big kernel lock removal days and was now helping out with the effort to implement year-2038 compliance. In both cases, the refactoring touched hundreds of separate drivers, which can mean a glut of regressions.
Broadly speaking, he said, there are two approaches to testing scores of builds. One can either record all known warnings and send an email whenever a new warning appears, or one can try to eliminate all known warnings. Bergmann has opted for the second approach, running a near-constant stream of kernel builds, and creating a patch for every compiler warning he sees. At present, he reported, there are about 500 such patches, most of them tiny. He is currently automating builds with a script he wrote that creates a random kernel configuration and attempts a build. He is averaging 50 builds a day, almost all for 32-bit ARM, with occasional forays into 64-bit ARM and, rarely, other architectures.
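Such a build loop need not be complicated. The sketch below is a guess at its general shape rather than Bergmann's actual script; the tree location and toolchain prefix are assumptions:

    #!/usr/bin/env python3
    """Repeatedly build random kernel configurations, keeping noisy logs."""

    import os
    import subprocess
    import time

    KERNEL_TREE = os.path.expanduser("~/arm-soc")        # assumed tree location
    LOG_DIR = os.path.join(KERNEL_TREE, "logs")
    ENV = dict(os.environ, ARCH="arm", CROSS_COMPILE="arm-linux-gnueabi-")

    def one_build(tag):
        # pick a random configuration, then try to build it
        subprocess.run(["make", "randconfig"], cwd=KERNEL_TREE, env=ENV, check=True)
        result = subprocess.run(["make", "-j8"], cwd=KERNEL_TREE, env=ENV,
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                universal_newlines=True)
        # keep the log if the build failed or produced any warnings
        if result.returncode != 0 or "warning:" in result.stdout:
            with open(os.path.join(LOG_DIR, "build-%d.log" % tag), "w") as log:
                log.write(result.stdout)

    if __name__ == "__main__":
        os.makedirs(LOG_DIR, exist_ok=True)
        while True:
            one_build(int(time.time()))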
Getting to this current state has taken some time. In 2011, he began by fixing all of the failures produced by running make defconfig (that is, "default configuration") and make allmodconfig (that is, configuring as many symbols to "module" as possible) builds in the arm-soc tree. By 2012, those failures were eliminated, so he set out to eliminate all compiler warnings produced by defconfig builds. By 2013, those warnings had been eliminated, and he began running his build tests with make randconfig—which creates a randomized kernel configuration. In 2013, he had eliminated all randconfig failures, and turned to eliminating the allmodconfig warnings. He began chipping away at randconfig warnings in mid-2014. Although that process is not yet complete, he has also begun to run build tests using the Clang compiler instead of GCC, which, as one would expect, generates entirely different errors and warnings.
The most common bugs he discovers with randconfig builds are missing dependency statements, he said, which cause necessary parts of the kernel to not get built. In particular, he cited missing Netfilter dependencies and ALSA codec dependencies as common, although he also noted that x86 developers seem to forget that, at least on ARM, I2C can be configured as a module and thus needs to be listed as a dependency if it is needed. The ALSA problems suggest that we need a better way to express codec dependencies, he said, although he conceded that kernel configurations are confusing in plenty of ways. For example, he showed this patch he had written:
    --- a/net/openvswitch/Kconfig
    +++ b/net/openvswitch/Kconfig
    @@ -7,7 +7,9 @@ config OPENVSWITCH
         depends on INET
         depends on !NF_CONNTRACK || \
                    (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
    -                                 (!NF_NAT || NF_NAT)))
    +                                 (!NF_NAT || NF_NAT) && \
    +                                 (!NF_NAT_IPV4 || NF_NAT_IPV4) && \
    +                                 (!NF_NAT_IPV6 || NF_NAT_IPV6)))
         select LIBCRC32C
         select MPLS
         select NET_MPLS_GSO
and asked "what does it even mean for it to depend on NF_NAT or not NF_NAT?" The answer, he said, is that the test is being used to set an "is it a module or not" dependency for later usage, but it is hardly surprising that such syntax leads to bugs.
After "modules, modules, and more modules," Bergmann said, the next most common class of bugs he catches is uninitialized variables. He noted that Rusty Russell has written about how uninitialized variables are useful for error catching, but argued that they cause plenty of other errors. He showed a few examples, noting that often the flow of the code may mean that a reference to an uninitialized variable can never be reached, but he writes patches anyway to eliminate the warning. He also pointed out Steven Rostedt's patch to override if (for tracing purposes), saying it totally confused GCC, but that it helps to uncover quite a few bugs.
Next, Bergmann discussed some of the other code-checking tools available for kernel development, like scripts/checkpatch.pl and Sparse. Checkpatch looks for basic coding-style issues, he said, so while it is beneficial for submitting patches, it is not particularly valuable to run against existing code.
Sparse, however, makes use of annotations in the kernel source and can therefore catch problems that GCC, with its lack of "domain-specific knowledge," simply cannot. Its big drawback is that it generates a lot of false positives. From the audience, Darren Hart noted that he uses Sparse regularly but finds it problematic because it runs on complete files rather than on individual patches, so it tends to generate a lot of warnings that, upon inspection, were already present in the original file rather than introduced by the patch. Mauro Carvalho Chehab replied that some subsystem maintainers, though far from all of them, have made an effort to remove all Sparse warnings in order to eliminate that particular problem.
Bergmann also said he makes use of some extra GCC warnings to catch additional bugs. Kernel builds can employ a sort of "graduated" warning level thanks to work by Michael Marek: the W=12 switch includes all warnings from W=1 and W=2; W=123 adds the W=3 warnings as well. Using make W=1 is generally useful, he said, with W=12 adding little of value amid a lot more noise, and W=123 being clear overkill, mostly due to an "explosion" of false positives. In the arm-soc tree, for instance, W=1 generates 631 instances of its most common warning, W=12 tops out at 94,235 for its top offender, and W=123 generates 782,719. The additional warnings of greatest interest to Bergmann include missing headers and missing prototypes. He also noted that he has recently run build tests with GCC 6, with promising results among the new warnings; so far, he has written 32 patches based on GCC 6 warnings, and most have already been applied.
Bergmann touched briefly on his experiments looking for build errors and warnings with Clang. That effort requires support from the LLVMLinux project, of course, and at the moment the patch set necessary to even compile the kernel with Clang is broken for mainline. But, since January (when he started his experiments), he has found "tons of new warnings." He eliminated the build errors found with Clang on randconfig builds, but has not yet tackled writing patches for the warnings. Clang also has a built-in static analyzer, he noted, which can produce rather nice-looking output and for which you can write your own checks, but he has not yet had the time to work with it.
Moving a bit further afield, he mentioned the proprietary Coverity scanning tool, for which Dave Jones has done "some amazing work" to record and annotate the known findings (which is necessary because Coverity requires manual categorization of the bugs it finds). The downside from Bergmann's perspective, though, is that Coverity is x86-only. He also pointed the audience to Julia Lawall's Coccinelle, which can do sophisticated pattern matching. He has worked with it for his own static checking, he said, though he has found it "really slow." Consequently, it is not a tool he would use in his own work, though he admitted he may be doing something wrong.
Another tool Bergmann does not use regularly, but that he cited for its "surprisingly good" warnings, is Dan Carpenter's Smatch. Carpenter has used it to catch thousands of bugs, he said, and pointed audience members to Carpenter's recent blog post for further information. Next, Bergmann highlighted the convenience of the 0day build bot maintained by Fengguang Wu; in addition to monitoring public Git trees, it recently started testing patch submissions and generating patches. And, finally, he noted the kernelci.org build-and-boot testing infrastructure. The most interesting part of the project for Bergmann is that the service is ARM-centric and the build farm includes a wide variety of machines.
By that point in the session, time had run out, so there was not much opportunity for the audience to ask questions. Nevertheless, it was surely an informative look at how static code checking benefits the arm-soc tree, where the ever-expanding list of supported hardware makes for a daunting maintainer workload. Furthermore, as Bergmann pointed out more than once, there are benefits to squashing warnings in addition to compilation errors, regardless of what code one is testing.
[The author would like to thank the Linux Foundation for travel assistance to attend ELC 2016.]
Patches and updates
Kernel trees
Architecture-specific
Build system
Core kernel code
Device drivers
Device driver infrastructure
Filesystems and block I/O
Memory management
Networking
Security-related
Miscellaneous
Page editor: Jonathan Corbet
Distributions
OpenBMC, a distribution for baseboard management controllers
The Intelligent Platform Management Interface (IPMI) is a set of system-management-and-monitoring APIs typically implemented on server motherboards via an embedded system-on-chip (SoC) that functions completely outside of the host system's BIOS and operating system. While it is intended as a convenience for those who must manage dozens or hundreds of servers in a remote facility, IPMI has been called out for its potential as a serious hole in server security. At the 2016 Embedded Linux Conference in San Diego, Tian Fang presented Facebook's recent work on OpenBMC, a Linux distribution designed to replace proprietary IPMI implementations with an open-source alternative built around standard facilities like SSH.
To recap, IPMI SoCs are known as baseboard management controllers (BMCs). The BMC is connected to most of the standard buses on the motherboard, so it can monitor temperature and fan sensors, storage devices and expansion cards, and even access the network (through its own virtual network interface that includes a separate MAC address). But BMCs almost invariably ship with a proprietary IPMI implementation in binary-blob form (though, in most cases, that blob is running on Linux), which is limited in functionality to what the vendor chooses. Furthermore, as Matthew Garrett outlined quite memorably at linux.conf.au in 2015, IPMI is riddled with poor security and, thus, leaves servers vulnerable to all sorts of attacks. Once the BMC has been compromised, the attacker has direct access to essentially every part of the server.
Most server vendors, Fang said at the start of his talk, provide their own BMCs. The "white box" server or generic server motherboard markets, however, are dominated by ARM-based BMC SoCs from ASPEED. So when Facebook started working on the latest machine design in its Open Compute Project (OCP)—which creates and releases open hardware designs for white-box data-center machines—it went with an ASPEED BMC as well. But, in keeping with OCP's philosophy, it decided to build its own Linux distribution to run on the BMC, rather than buy a proprietary IPMI stack.
The resulting distribution is called OpenBMC. It was first deployed in 2013 with the Wedge, a "top-of-rack network switch" design from OCP, although Fang noted that, from a design standpoint, the team looked at the switch as "just another server." OpenBMC has subsequently been deployed on several of Wedge's successor designs.
Currently, OpenBMC runs on only a few ASPEED SoCs, and Fang pointed out that the distribution is tailored to the specific OCP machines used by the project, even though similar SoCs may be found in non-OCP servers. The supported SoCs from ASPEED include the common AST2300 and AST2400 (both operating at 400MHz), plus the newer AST2500 (at 800MHz and adding support for PCIe Host) that is now being tested (but has not yet been put into production servers). All of these SoCs come with a large set of GPIO pins (232 in the AST2500) and multiple independent I2C controllers, so that they can monitor many hardware features simultaneously. But, Fang said, OpenBMC provides "standard Linux server" tools rather than a traditional IPMI interface; it exposes those hardware-monitoring features through lm-sensors. Administrators can connect to the OpenBMC instance using key-based SSH authentication, rather than IPMI's authentication scheme, and Facebook has written a REST-ful web service for monitoring multiple machines remotely. Among the other services included are DHCP, Link Layer Discovery Protocol (LLDP) utilities, Python, and various hardware-monitoring tools.
Fang then outlined the process of porting OpenBMC to a new board. The system uses the U-Boot bootloader and an embedded Linux distribution created with Yocto. The Yocto build includes three layers. The first is a common layer (meta-openbmc) that defines all of the packages used in OpenBMC. On top of that comes a general BMC layer that enables the bootloader, C library, and kernel recipes for a particular SoC vendor (at present, meta-aspeed). At the top comes a machine-specific layer (e.g., meta-wedge) that contains the specific kernel, bootloader, and sensor-software configurations and enables the recipes for Facebook's custom user-space monitoring programs. For example, meta-aspeed includes recipes that define the GPIO tables for the various ASPEED SoCs, while meta-wedge configures each of those GPIOs correctly for the Wedge server (which pins are voltage monitors, which are used for fan control, and so on).
He showed several example recipes (which are available in the session slides [PDF]), including a GPIO configuration, an lm-sensors configuration, and a libpal example. Libpal is a network-packet assembly library; Facebook uses it to generate a small set of IPMI messages to provide compatibility with other monitoring tools. He also briefly discussed the REST-ful web service developed by Facebook. The feature set does not veer much (if any) from the ordinary—administrators connect and request server status, which is returned in JSON—although he did point out that the service works with cURL and can, therefore, be scripted, in addition to working through the browser. The notable feature, of course, is that the service runs on the BMC rather than on the server's CPU.
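As a rough illustration of that kind of scripting, a status query might look like the sketch below; the host name, port number, and endpoint path are hypothetical and will differ between OpenBMC builds:

    import json
    import urllib.request

    # Hypothetical BMC address and endpoint; real deployments will differ.
    BMC = "http://wedge-bmc.example.com:8080"
    ENDPOINT = "/api/sys"

    with urllib.request.urlopen(BMC + ENDPOINT, timeout=5) as response:
        status = json.loads(response.read().decode("utf-8"))

    # The service returns JSON; print whatever fields it reports
    # (assuming a JSON object comes back).
    for key, value in sorted(status.items()):
        print("%s: %s" % (key, value))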
There are currently thousands of OCP Wedge switches (and their successors) running OpenBMC in Facebook data centers, Fang said. OpenBMC allows Facebook to run more recent kernel releases than one would typically find in a vendor-provided IPMI stack; the latest deployments use 4.1. That said, upgrading a BMC involves flashing a new root filesystem, kernel, and bootloader to the SoC's EEPROM, a process that can occasionally go awry. But the BMC on the Wedge and other OCP switches sits in a removable socket, so it can be pulled and replaced if necessary.
Fang cited two issues as the focus of current OpenBMC development. The first is driver stability. Vendor board-support packages, he said, typically focus only on board bring-up, while thousands of machines running different workloads encounter a wide variety of bugs. "When you deploy thousands of machines every 'edge case' becomes common." The second is improving the tooling provided in OpenBMC. He noted that more development tools are being ported to the distribution, along with the Chef provisioning system and additional monitoring programs.
The session ended with a brief question-and-answer period. Fang reported that, so far, no non-ASPEED BMCs are supported, but that the OpenBMC team would love to work with any BMC company that was interested. He also said that Facebook has started work on a BMC development board that will hopefully serve as a low-cost entry point for interested developers.
But the biggest question was whether Facebook would be pushing any of its OpenBMC work into the mainline kernel. From the audience, Grant Likely mentioned that Linaro would like to see patches, to which Fang replied that the OpenBMC team is interested, but that it does not consider most of the code to be in a ready-to-upstream, fully working state. ARM SoC tree co-maintainer Arnd Bergmann, also in the audience, encouraged Fang and others working on OpenBMC not to wait for some point when everything is "complete," but to start sending in patches incrementally, even if they will require additional revisions and effort. "'Working' has never been a requirement for upstream," he said, which the audience found amusing. "The main thing we want to see is that the code is maintainable and that someone is interested in working on it."
Fang seemed agreeable to that suggestion, so perhaps it is only a matter of time before the mainline kernel will run on at least some portion of the BMCs found in today's data centers and network closets. It may be only a start, and it may cover only one BMC vendor, but that is still noteworthy progress compared to the unappealing security and software-freedom issues that IPMI brings to so many systems.
[The author would like to thank the Linux Foundation for travel assistance to attend ELC 2016.]
Brief items
Distribution quote of the week
I am confident they are forcing me to wake up at this ungodly hour simply because they have hearts of pure ice. (Other than that, they're nice guys.)
CoreOS "Ignition" released
CoreOS has announced the release of its "Ignition" provisioning tool. "At the most basic level, Ignition is a tool for manipulating disks during early boot. This includes partitioning disks, formatting partitions, writing files, and configuring users." It runs as the first process — before systemd — to get the system into the proper shape before the ordinary boot process takes over.
OpenStack Mitaka released
OpenStack Mitaka has been released. "OpenStack Mitaka, the 13th release of the most widely deployed open source software for building clouds, now offers greater manageability and scalability as well as an enhanced end-user experience. The Mitaka release was designed and built by an international community of 2,336 developers, operators and users from 345 organizations. OpenStack has become the cloud platform of choice for enterprises and service providers, as an integration engine to manage bare metal, virtual machines, and container orchestration frameworks with a single set of APIs." More information can be found in the release notes. There is also a press release available.
Newsletters and articles of interest
Distribution newsletters
- DistroWatch Weekly, Issue 656 (April 11)
- Linux Mint Monthly News (March)
- Tails report (March)
- Ubuntu Kernel Team - Weekly Newsletter (April 5)
- Ubuntu Weekly Newsletter, Issue 461 (April 10)
Page editor: Rebecca Sobol
Development
Python looks at paths
The Python core developers cannot be accused of insufficient discussion when considering changes. A recent topic, which was something of a spin-off from the to_file() discussion we covered last week, spanned multiple threads in two different mailing lists—with several hundred posts in total. The question at hand concerned filesystem paths and the pathlib module that was added to the standard library for Python 3.4 on a provisional basis. Pathlib provides many useful features, but is not integrated with the rest of the standard library, making it difficult to use—changing that in one way or another is currently under consideration.
Pathlib is an object-oriented interface to filesystem paths that provides semantics that are appropriate to the underlying operating system (OS) where the program is running. PEP 428 described the motivations and goals for including the third-party pathlib module into the standard library; it was approved in November 2013.
The idea is to provide OS-independent ways to manipulate paths. For example, the / operator can be used to build up paths:
    from pathlib import Path

    p = Path('foo')
    q = p / 'bar' / 'baz'

On Unix systems, that will create a path object that resolves to foo/bar/baz, while on Windows it will be foo\bar\baz.
But there is a problem. Path objects are not strings and cannot be used where strings are expected. Furthermore, since pathlib was adopted provisionally, the rest of the language has not been changed to support it. Thus, path objects cannot be passed to built-ins like open() and friends, nor to standard library routines that might be expected to support them.
p-strings?
The discussion of adding a to_file() method for a quick and easy way to write a file from a string noted the presence of the write_text() method for path objects, but the need to import pathlib made it less "quick and easy" than some had hoped. Initially, that led Koos Zevenhoven to propose the addition of "p-strings", which would be similar in some ways to the f-strings for formatted output that are described in PEP 498 (and will appear in Python 3.6, which is scheduled for the end of 2016). P-strings would create path objects from strings:
p"targetfile.txt".write_text(string_to_write)That code would create a path object and write a string to the file it represents, much as the proposed to_file() interface would have done.
But, in that post to the python-ideas mailing list, Zevenhoven acknowledged that making pathlib.Path a subclass of the built-in str (string) class would probably solve most of the compatibility issues between strings and paths that were being discussed. If functions like open() could just treat a passed-in path object as a string, there would be no need for programmers using pathlib to manually call str(path) before passing a path to those routines. Several were in favor of subclassing str for paths, noting that other third-party path libraries had done so over the years—seemingly with no ill effects. Michel Desmoulin asked: "Can somebody defend, with practical examples, the imperative to refuse to inherit from str?"
Brett Cannon took up that request by describing the reasons he thinks it does not make sense for path objects to be a subclass of strings (also as a blog post). He agreed that it would solve some short to medium term compatibility problems, which might indicate that its "practicality beats purity" (as The Zen of Python spells out). But there is another entry in that document that people may be forgetting, he said, "explicit is better than implicit".
Part of the problem that he sees is that, while all paths are strings, not all strings are paths. That goes beyond strings that result in "file not found" errors; there are strings that are not legal paths at all. He likened the choice to the compatibility break made for Unicode in Python 3.
For Cannon, the "explicit" principle overrides the "practicality" argument, though others see it differently. He did suggest that getting support for pathlib objects into various places in Python would help alleviate the current push for "path strings"; he has opened a bug for importlib with the aim of doing just that.
The subclassing and p-string ideas were discussed at length, but there was a fairly clear divide among the participants. Guido van Rossum noticed that and wondered if it came from different kinds of use cases for the language.
The future of pathlib
Several other ideas were batted around in python-ideas before most of the conversation moved over to the python-dev mailing list. Cannon moved the discussion there with a question: "When should pathlib stop being provisional?" Van Rossum replied that it should either be removed or made non-provisional by the time 3.6 is released. Given the discussion, though, keeping it means enhancing pathlib in some fashion—either by making Python and the standard library more pathlib-friendly or by making pathlib objects inherit from the string class. Van Rossum made it clear that he is not unwilling to determine that pathlib is a failed experiment and remove it from the standard library for 3.6 and beyond.
But there is a mechanism that can be used by libraries and built-in functions to ease the pain of supporting both string-based and path-based arguments. In versions 3.4.5 and 3.5.2 (both upcoming maintenance releases), pathlib objects will have a new path attribute that provides a string representation of the path. That allows code like the following to transparently handle both path and string arguments:
    arg = getattr(arg, 'path', arg)

That code will extract a path attribute from the argument or simply return the argument if it is missing that attribute (e.g. a string). While it may be a little ugly looking, code like that sprinkled around in the right places would go a long way toward making pathlib objects "just work".
Nick Coghlan proposed broadening that idea somewhat.
His wording was perhaps a bit over the top—Van Rossum jokingly called it "a classic 'serious programming' solution (objects, abstractions, serialization, all big important words)"—but the idea behind it was sound. Coghlan was suggesting adding a new protocol so that objects could define their own path-as-string representation using a special attribute (which, for Python, means one that begins and ends with double underscores). He suggested __path__, which would also be called "dunder path" in Python circles. In another message, he acknowledged that bikeshedding was possible (or, perhaps, likely) and that __fspath__ might make a better name.
Coghlan pointed out that there weren't that many places that needed to support this new protocol and that those changes didn't necessarily need to come all at once. It turned out to be a fairly popular solution to the quandary—modulo some amount of wrangling over details. Ethan Furman stepped up to help implement the changes, as did Cannon.
Those details include bikeshedding on the name, as predicted, though __fspath__ or __fspathname__ seem to be the clear leaders at this point. There was a question of whether the new member should be a method or an attribute (i.e. a function or a value), which seems to have been resolved in favor of a method. The idea of a helper function to replace the getattr() mechanism described above also came up. It would return the value produced by anything that implements the __fspath__() protocol, or simply the argument itself if it doesn't, which would allow libraries and other packages to support those kinds of objects relatively easily. The idea was popular; the only real question was where it should live (as a built-in, in pathlib, or elsewhere). The os module seems to be the most popular choice at this point.
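A rough sketch of how the protocol and helper under discussion might fit together appears below; the names, the method-versus-attribute choice, and the bytes handling were all still being settled, so everything here is illustrative rather than the final design:

    def fspath(arg):
        """Assumed helper, similar in spirit to the proposed os-module
        function: return a plain path representation of arg."""
        if isinstance(arg, (str, bytes)):
            return arg
        fspath_method = getattr(type(arg), '__fspath__', None)
        if fspath_method is not None:
            return fspath_method(arg)
        raise TypeError("expected a path-like object, got %r" % type(arg).__name__)

    class BackupDir:
        """A user-defined class opting in to the (hypothetical) protocol."""
        def __init__(self, root):
            self.root = root

        def __fspath__(self):
            return self.root + '/backups'

    print(fspath('/tmp/notes.txt'))      # plain strings pass straight through
    print(fspath(BackupDir('/srv')))     # prints /srv/backups

A function like open() could then run its path argument through such a helper and accept strings, pathlib objects, and third-party path types alike.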
Consensus
Throughout the discussion (also spanning multiple threads), summaries of the rough consensus were posted by Furman and Cannon. The most recent at the time of this writing were posted within minutes of each other on April 11 and show remarkable progress with only a few open issues. Van Rossum has put Cannon and Chris Angelico in charge of drafting a proposal, Cannon reported. It will probably be done as an addendum to PEP 428. That addendum will also clearly spell out the reasons that pathlib objects do not inherit from the string class, which was a complaint about the PEP as it stands.
Both Cannon and Furman agree that the remaining sticking points have to do with whether __fspath__() and os.fspath() can return byte arrays (i.e. the Python bytes type) as well as strings. For working with filesystems that support more than just names in the default encoding, it may be difficult to avoid handling byte arrays, sometimes without the language knowing the encoding (which is part of why bytes exists). For now, it appears that allowing them to return bytes is favored.
While it was a massive and scattered discussion, it proceeded rather smoothly. Van Rossum noted that there was some hostility toward pathlib that concerned him, but none of that leaked into the tone of the discussion. Opponents and advocates alike participated without rancor and tried to find a solution that would best solve the problems. Being able to peer into this kind of "design in the open" is one of the benefits of open-source projects. Meanwhile, a semi-unloved piece of the language has gotten the attention it needs—pathlib seems likely to stay for Python 3.6 and beyond.
Brief items
Quote of the week
WordPress 4.5 released
Version 4.5 of the WordPress blogging platform has been released. Notable changes include a new, less obtrusive pop-up tool for inserting hyperlinks while editing a post, live previews for mobile and tablet layouts, and a faster image resizer.
LXD 2.0 is available
Version 2.0 of the LXD container hypervisor has been released. Based on LXC 2.0, this is "the first production-ready release of LXD", serving as "essentially a system wide daemon offering a REST API to interact with containers." The announcement notes that, among other features, LXD supports a variety of storage backends and live migration via Checkpoint/Restore in Userspace (CRIU).
Newsletters and articles
Development newsletters from the past week
- What's cooking in git.git (April 7)
- What's cooking in git.git (April 12)
- OCaml Weekly News (April 12)
- Perl Weekly (April 11)
- PostgreSQL Weekly News (April 10)
- Python Weekly (April 7)
- Ruby Weekly (April 7)
- This Week in Rust (April 11)
- Wikimedia Tech News (April 11)
Hutterer: Why libinput doesn't have a lot of config options
Peter Hutterer writes about the cost of configuration options. "You see, whenever you write 'it's just 5 lines of code to make this an option', what I think is 'once the patch is reviewed and applied, I'll spend two days to write test cases and documentation. I'll need to handle any bug reports related to this, and I'm expected to make sure this option works indefinitely. Any addition of another feature may conflict with this option, so I need to make sure the right combination is possible and test cases are written.' So your work ends after writing a 5 line patch, my work as maintainer merely starts."
Page editor: Nathan Willis
Announcements
Brief items
Let's Encrypt is no longer "beta"
The Let's Encrypt project, which is working to enable encrypted communications across the web, has announced that it has gained more sponsors and no longer considers itself to be in a "beta" state. "Since our beta began in September 2015 we’ve issued more than 1.7 million certificates for more than 3.8 million websites. We’ve gained tremendous operational experience and confidence in our systems. The beta label is simply not necessary any more."
FSFE: Joint Statement on the Radio Lockdown Directive
The Free Software Foundation Europe joined 23 organizations in proposing measures to EU institutions and EU member states to avoid negative implications on users' rights and Free Software imposed by the EU Radio Equipment Directive 2014/53/EU. "The Radio Lockdown Directive that will be applicable in the EU since 13 June 2016 threatens software freedom, users' rights, fair competition, innovation, environment, and volunteering – without comparable benefits for security. It introduces disproportionate ‘essential requirement’ in the form of forcing device manufacturers to prove radio regulatory compliance for every possible software able to run on every product using the radio frequency spectrum. In practice, this means that in the future only particular software authorised by the manufacturers can be installed on any device connecting through wireless and mobile networks or GPS: e.g. routers, mobile phones, WiFi cards and the laptops they are built in, or almost all devices including network functionality."
Articles of interest
Moglen: How Should the Free Software Movement View the Linux Foundation?
Eben Moglen opines on the role of the Linux Foundation, and on GPL enforcement in general. "LF will be as favorable to copyleft as its members are. Copyleft licensing is easy for businesses to doubt: required sharing of work that could be instead 'owned' by the capital investors seems to be mere loss in conventional calculations. I have spent most of my adult lifetime not telling businesses that copyleft was in their interest, but educating them about copyleft and others’ experience with it, in order to allow them to draw their own conclusions. Experience has taught me that this process, though uncertain and unscalable, is absolutely crucial to the attainment of the free software movement’s fundamental objectives. It is, however, all too easily destroyed by any form of overly aggressive copyleft enforcement that fully confirms businesspeople’s skepticism."
FSF: Interpreting, enforcing and changing the GNU GPL, as applied to combining Linux and ZFS
Richard Stallman looks at the GPL and how it is incompatible with the CDDL (Common Development and Distribution License), which is the license used by ZFS. "Likewise, the copyright holders of ZFS (the version that is actually used) can give permission to use it under the GNU GPL, version 2 or later, in addition to any other license. This would make it possible to combine that version with Linux without violating the license of Linux. This would be the ideal resolution and we urge the copyright holders of ZFS to do so. Some copyright holders choose not to enforce their licenses in specific situations. That enables users to operate as if permission were granted. However, this does not alter the meaning of the GNU GPL, and does not cause uses that the GPL disallows to either suddenly or slowly become permitted by the GPL. Such acquiescence is not the case in regard to linking Linux and ZFS; indeed, some Linux copyright holders have said they consider this copyright infringement. We have explained above the reasons why that is so."
Calls for Presentations
LPC Call for Refereed-Track Proposals
The Call for Refereed-Track Proposals for the Linux Plumbers Conference is open until September 1. LPC will take place November 2-4 in Santa Fe, NM. "Refereed track presentations are 50 minutes in length and should focus on a specific aspect of the "plumbing" in the Linux system. Examples of Linux plumbing include core kernel subsystems, core libraries, windowing systems, management tools, device support, media creation/playback, and so on. The best presentations are not about finished work, but rather problems, proposals, or proof-of-concept solutions that require face-to-face discussions and debate."
CFP Deadlines: April 14, 2016 to June 13, 2016
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
Deadline | Event Dates | Event | Location
---|---|---|---
April 15 | June 27 - July 1 | 12th Netfilter Workshop | Amsterdam, Netherlands
April 15 | June 22 - June 26 | openSUSE Conference 2016 | Nürnberg, Germany
April 24 | August 20 - August 21 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan
April 26 | August 22 - August 24 | LinuxCon NA | Toronto, Canada
April 27 | August 12 - August 14 | GNOME Users and Developers European Conference | Karlsruhe, Germany
April 30 | June 11 | TÜBIX 2016 | Tübingen, Germany
April 30 | October 27 - October 28 | Rust Belt Rust | Pittsburgh, PA, USA
April 30 | August 25 - August 26 | The Prometheus conference | Berlin, Germany
May 1 | August 24 - August 26 | KVM Forum 2016 | Toronto, Canada
May 2 | June 24 - June 25 | devopsdays Silicon Valley 2016 | Mountain View, CA, USA
May 6 | October 26 - October 27 | All Things Open | Raleigh, NC, USA
May 6 | July 13 - July 15 | LinuxCon Japan | Tokyo, Japan
May 8 | August 12 - August 16 | PyCon Australia 2016 | Melbourne, Australia
May 15 | July 2 - July 9 | DebConf16 | Cape Town, South Africa
May 15 | September 1 - September 8 | QtCon 2016 | Berlin, Germany
May 15 | June 11 - June 12 | Linuxwochen Linz | Linz, Austria
May 16 | October 31 - November 2 | O’Reilly Security Conference | New York, NY, USA
May 23 | October 17 - October 19 | O'Reilly Open Source Convention | London, UK
May 23 | August 20 - August 21 | FrOSCon - Free and Open Source Software Conference | Sankt-Augustin, Germany
May 24 | August 18 - August 21 | Camp++ 0x7e0 | Komárom, Hungary
May 24 | November 9 - November 11 | O’Reilly Security Conference EU | Amsterdam, Netherlands
May 25 | October 5 - October 7 | International Workshop on OpenMP | Nara, Japan
May 29 | September 20 - September 23 | PyCon JP 2016 | Tokyo, Japan
May 30 | September 13 - September 16 | PostgresOpen 2016 | Dallas, TX, USA
June 3 | June 24 - June 25 | French Perl Workshop 2016 | Paris, France
June 4 | July 30 - July 31 | PyOhio | Columbus, OH, USA
June 5 | September 26 - September 27 | Open Source Backup Conference | Cologne, Germany
June 5 | September 9 - September 10 | RustConf 2016 | Portland, OR, USA
June 10 | August 25 - August 26 | Linux Security Summit 2016 | Toronto, Canada
June 11 | October 3 - October 5 | OpenMP Conference | Nara, Japan
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
PGConf US 2016
PGConf US will take place April 18-20 in New York. The Postgres community will also celebrate the database’s 20th birthday with a party open to the broader New York tech community on April 19.

Live Kernel Patching Microconference Accepted into LPC 2016
There will be a Live Kernel Patching Microconference at the Linux Plumbers Conference, which will take place November 2-4 in Santa Fe, NM.

Events: April 14, 2016 to June 13, 2016
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location
---|---|---
April 15 - April 18 | Libre Graphics Meeting | London, UK
April 15 - April 17 | PyCon Italia Sette | Firenze, Italia
April 15 - April 17 | Akademy-es 2016 | Madrid, Spain
April 16 | 15. Augsburger Linux Info Tag | Augsburg, Germany
April 18 - April 19 | Linux Storage, Filesystem & Memory Management Summit | Raleigh, NC, USA
April 18 - April 20 | PostgreSQL Conference US 2016 | New York, NY, USA
April 20 - April 21 | Vault 2016 | Raleigh, NC, USA
April 21 - April 24 | GNOME.Asia Summit | Delhi, India
April 23 | DevCrowd 2016 | Szczecin, Poland
April 23 - April 24 | LinuxFest Northwest | Bellingham, WA, USA
April 25 - April 27 | Cuba International Free Software Conference | Havana, Cuba
April 25 - April 29 | OpenStack Summit | Austin, TX, USA
April 26 | Open Source Day 2016 | Warsaw, Poland
April 26 - April 28 | Open Source Data Center Conference | Berlin, Germany
April 28 - May 1 | Mini-DebCamp & DebConf | Vienna, Austria
April 28 - April 30 | Linuxwochen Wien 2016 | Vienna, Austria
April 30 | Linux Presentation Day 2016.1 | many cities, Germany
May 1 - June 29 | Open Source Innovation Spring | Paris, France
May 2 - May 5 | FOSS4G North America | Raleigh, NC, USA
May 2 - May 3 | PyCon Israel 2016 | Tel Aviv, Israel
May 9 - May 13 | ApacheCon North America | Vancouver, Canada
May 10 - May 12 | Samba eXPerience 2016 | Berlin, Germany
May 14 - May 15 | Community Leadership Summit 2016 | Austin, TX, USA
May 14 - May 15 | Open Source Conference Albania | Tirana, Albania
May 16 - May 19 | OSCON 2016 | Austin, TX, USA
May 17 - May 21 | PGCon - PostgreSQL Conference for Users and Developers | Ottawa, Canada
May 24 - May 25 | Cloud Foundry Summit | Santa Clara, CA, USA
May 26 | NLUUG - Spring conference 2016 | Bunnik, The Netherlands
May 28 - June 5 | PyCon 2016 | Portland, OR, USA
June 1 - June 2 | Apache MesosCon | Denver, CO, USA
June 4 - June 5 | Coliberator 2016 | Bucharest, Romania
June 11 - June 12 | Linuxwochen Linz | Linz, Austria
June 11 | TÜBIX 2016 | Tübingen, Germany
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol