LWN.net Weekly Edition for February 25, 2016
Trouble at Linux Mint — and beyond
When the Linux Mint project announced that, for a while on February 20, its web site had been changed to point to a backdoored version of its distribution, the open-source community took notice. Everything we have done is based on the ability to obtain and install software from the net; this incident was a reminder that this act is not necessarily as safe as we would like to think. We would be well advised to think for a bit on the implications of this attack and how we might prevent similar attacks in the future.

It would appear that the attackers were able to take advantage of a WordPress vulnerability on the Linux Mint site to obtain a shell; from there, they were able to change the web site's download page to point to their special version of the Linux Mint 17.3 Cinnamon edition. It also appears that the Linux Mint site was put back on the net without being fully secured; the attackers managed to compromise the site again on the 21st, restoring the link to the corrupted download. Anybody who downloaded this distribution anywhere near those two days will want to have a close look at what they got.
The Linux Mint developers have taken a certain amount of grief for this episode, and for their approach to security in general. They do not bother with security advisories, so their users have no way to know if they are affected by any specific vulnerability or whether Linux Mint has made a fixed package available. Putting the web site back online without having fully secured it mirrors a less-than-thorough approach to security in general. These are charges that anybody considering using Linux Mint should think hard about. Putting somebody's software onto your system places the source in a position of great trust; one has to hope that they are able to live up to that trust.
It could be argued that we are approaching the end of the era of amateur distributions. Taking an existing distribution, replacing the artwork, adding some special new packages, and creating a web site is a fair amount of work. Making a truly cohesive product out of that distribution and keeping the whole thing secure is quite a bit more work. It's not that hard to believe that only the largest and best-funded projects will be able to sustain that effort over time, especially when faced with an increasingly hostile and criminal net.
There is just one little problem with that view: it's not entirely clear that the larger, better-funded distributions are truly doing a better job with security. It probably is true that they are better able to defend their infrastructure against attacks, have hardware security modules to sign their packages, etc. But a distribution is a large collection of software, and few distributors can be said to be doing a good job of keeping all of that software secure.
So, for example, we have recently seen this article on insecure WordPress packages in Debian, and this posting on WebKit security in almost all distributions. Both are heavily used pieces of software that are directly exposed to attackers on the net; one would hope that distributors would be focused on keeping them secure. But that has not happened; the projects and companies involved simply have not found the resources to stay on top of the security of these packages.
It is not hard to see how this could be a widespread problem. When users evaluate distributions, the range of available packages tends to be an important criterion. Distributors thus have an incentive to include as many packages as they can — far more than they can support at a high level. The one exception might be the enterprise distributions which, one would hope, would be more conservative in the packages they choose to provide for their customers. But such distributions tend to ship old software (which can have problems of its own) and are often accompanied by add-on repositories filling in the gaps — and possibly introducing security problems of their own.
The situation is seemingly getting murkier rather than better. Some projects try to get users to install their software directly rather than use the distribution's packages. That might lead to better support for that one package, but it adds another moving part to the mix and shorts out all of the mechanisms put in place to get security updates to users. Language-specific packages are often more easily installed from a repository like CPAN or PyPI, but these organizations, too, do not issue security advisories and almost certainly do not have the resources in place to ensure that they are not distributing packages with known vulnerabilities. Many complex applications support some form of plugins and host repositories for them; the attention to security there is mixed at best. Projects like Docker host repositories of images for download. Public hosting sites deliver a lot of software, but are in no position to guarantee the security of that software. And so on.
Combine all this with a net full of bad actors who are intent on installing malware onto users' systems, and the stage is set for a lot of unhappiness. Indeed, it is surprising that there have not been more incidents than we have seen so far. There can be no doubt that we will see more of them in the future.
As a community, we take a certain amount of pride in the security of our software. But, regardless of whether that pride is truly justified, we are all too quick to grab software off some server on the net and run it on our systems — and to encourage others to do the same. As a community, we are going to have to learn to do a better job of keeping our infrastructure secure, of not serving insecure software to others, and of critically judging the security of providers before accepting software from them. It is not fun to feel the need to distrust software given to the community by our peers, but the alternatives may prove to be even less fun than that.
New video and music features in Kodi 16
Version 16.0 of the Kodi free-software media center was released on February 21. Unlike some previous releases that debuted major new features (such as Android support in 13.0 or H.265 video support in 14.0), this new release appears comparatively quiet on the surface. Under the hood, though, there are several changes that will make life easier for users, particularly where user-interface issues and system maintenance are concerned.
The installation procedure is a rather cut-and-dried affair at this point. Official builds are provided by the project for several Ubuntu-based distributions, as are community-maintained builds for Fedora, Debian, and OpenELEC. There are also packages for download for Android (both ARM and x86), a variety of Raspberry-Pi–specific distributions, Windows, iOS, OS X, BSD, and for an ever-growing list of consumer hardware devices like Amazon's Fire TV. Apart from the iOS and consumer-hardware cases (which may involve jailbreaking steps), little is required to get up and running.
The Kodi user interface, likewise, has stabilized over the past few release cycles, and navigation is more streamlined and logical than it was in years past. There are still UI inconsistencies to be found, such as whether the "close" button appears on the right or the left of a pop-over window. And in some instances, multi-page blocks of text (such as the "Description" block in the Add-ons Manager) can only be scrolled by using the mouse, not with arrow or page up/down keys. But, on the whole, media sources and configuration options are reachable in only a few clicks, it is difficult if not impossible to get lost, and the interface is virtually devoid of arcane internal terminology. The latter is a considerable accomplishment indeed, given that the interface has to explain concepts like video calibration and debug logging.
Speaking of logging, one of the most-highlighted new features in this release is Kodi's new event-logging framework. This is a mechanism that provides an in-application, browseable view of a wide variety of events: adding new media to the library, altering settings, installing add-ons, and so forth. The logged events include errors and warnings, which the release notes highlight as a feature that users have missed in past releases—leaving them unable to troubleshoot problems when newly-added media fails to show up in the library, for example.
An example of the subtle improvements in 16.0 is the revamped Music Library feature. Kodi has developed a reputation for admirable handling of video content (both local and remote), but has let its support for serving as a music manager languish. The 16.0 release marks the start of a renewed effort to rectify the situation. Adding new audio content is now much simpler, and Kodi automatically scans and adds the relevant metadata from the files.
There is also a new framework in place to support advanced audio processing. Though it is not active yet, in future releases it will pave the way for a number of audio features, like multi-channel equalizers, "fake surround sound," and a variety of other effects.
Deeper under the hood, the new release brings two changes to the way skins and other add-ons are stored. The first is that the file layout used within skin add-ons has been changed to match that of other add-on types; this was primarily done to make it easier to migrate settings from one skin to another. The second and perhaps more interesting change is that add-ons can now share image resources. It may take some time for add-on authors to begin taking advantage of the feature, but it will enable skins to, for example, provide a customized look to other add-ons (such as theming the icons of media sources to match the UI).
On the video front, perhaps the most obvious new addition is support for non-linear stretching of 4:3 content to fit on 16:9 displays. The technique employed tries to retain the center of the screen without visible distortion and progressively stretches out the image closer to the sides of the display. Of course, purists still might scoff at anyone deigning to watch Citizen Kane in anything other than the original aspect ratio, but there are surely instances when such elongation is necessary. Users who employ Kodi as a digital video recorder (DVR) will be pleased to note that Kodi's PVR module (from "personal" video recorder) now supports "series" recording rules, a staple of most other DVR applications.
Those users with 3D displays (either 3D-capable TVs or virtual-reality headsets) will get to sample a unique new UI feature: Kodi UI "skins" can now employ 3D depth effects, with the default skin ("Confluence") providing an example. The much larger group of users without 3D displays also get some UI improvements, however. Most notably, the "long press" action is now supported in Kodi's remote-control command mapping. That makes it possible to use Kodi with a number of modern, simple remotes—where the recent trend is toward directional arrow keys, a "Select" or "OK" button, and little else.
This style of remote is particularly popular with consumer hardware like the Fire TV; users who control Kodi through other means (such as a wireless keyboard) are unaffected. The long press is bound to Kodi's context-menu action by default, so it pops up a menu of additional commands. Those using Kodi on a Linux box with touchscreen support have yet another UI option, as Kodi now supports multi-touch gestures. Gesture support has been available in the Android and iOS releases for some time; there is a small set of gestures recognized by default, though it is configurable.
Finally, the Android rendering stack has been reworked to cope with 4K displays. In earlier releases, both the Kodi UI and any video content being displayed were rendered to the same surface, using libstagefright. But that made it impossible to render the UI and the video at different resolutions. Rendering the 4K version of the Kodi UI brought interactivity to a crawl on most Android devices, while limiting video playback to 720p or 1080p resolution would defeat the purpose of 4K support. Starting with the 16.0 release, the video stream and the UI are rendered to separate MediaCodec surfaces (rather than libstagefright), thus enabling 4K hardware-accelerated video while keeping the UI at its native, non-4K resolution.
As a project, Kodi relies heavily on the community of add-on and skin developers for implementing new user-facing features. So as the core application matures, there may be fewer big developments in every release cycle. Nevertheless, as the 16.0 release shows, there will always be room for improvements. Some of the new under-the-hood functionality will take time to trickle out as developers update add-ons and skins, but there is certainly enough in the new release for users to be happy with the upgrade.
Systemd vs. Docker
There were many different presentations at DevConf.cz, the developer conference sponsored by Red Hat in Brno, Czech Republic this year, but containers were the biggest theme of the conference. Most of the presentations were practical, either tutorials showing how to use various container technologies like Kubernetes and Atomic.app, or guided tours of new products like Cockpit.
However, the presentation about containers that was unquestionably the most entertaining was given by Dan Walsh, Red Hat's head of container engineering. He presented on one of the core conflicts in the Linux container world: systemd versus the Docker daemon. This is far from a new issue; it has been brewing since Ubuntu adopted systemd, and CoreOS introduced Rocket, a container system built around systemd.
Systemd vs. Docker
"This is Lennart Poettering," said Walsh, showing a picture. "This is Solomon Hykes", showing another. "Neither one of them is willing to compromise much. And I get to be in the middle between them."
Since Walsh was tasked with getting systemd to work with Docker, he detailed a history of code, personal, and operational conflicts between the two systems. In many ways, it was also a history of patch conflicts between Red Hat and Docker Inc. Poettering is the primary author of systemd and works for Red Hat, while Hykes is a founder and CTO of Docker, Inc.
According to Walsh's presentation, the root cause of the conflict is that the Docker daemon is designed to take over a lot of the functions that systemd also performs for Linux. These include initialization, service activation, security, and logging. "In a lot of ways Docker wants to be systemd," he claimed. "It dreams of being systemd."
The first conflict he detailed was about service initialization and restart. In the systemd model, all of this is controlled by systemd; in the Docker world, it is all controlled by the Docker daemon. For example, services can be defined in systemd unit files as "docker run" statements to run them as containers, or they can be defined as "autorestart" containers in the Docker daemon. Either approach can work, but mixing them doesn't. The Docker documentation recommends Docker autorestart, except when mixing containerized services with services not in a container; there it recommends systemd or Upstart.
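As a sketch of the two competing models (the unit name and the fedora/httpd image name here are hypothetical; the directives are standard systemd and Docker-CLI syntax of that era), the same containerized service can be declared to either manager, but not both:

```
# httpd-container.service — systemd owns the lifecycle
[Unit]
Description=Containerized httpd, managed by systemd
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f httpd
ExecStart=/usr/bin/docker run --name httpd fedora/httpd
Restart=always

[Install]
WantedBy=multi-user.target

# ...or, instead, let the Docker daemon own restarts — but not both:
#   docker run -d --restart=always --name httpd fedora/httpd
```

If both managers think they own the restart policy, each will fight the other's attempts to stop or replace the container.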
Where this breaks down, however, is when services running as containers depend on other containerized services. For regular services, systemd has a feature called sd_notify that passes messages about when services are ready, so that services that depend on them can then be started. However, Docker has a client-server architecture. docker run and other commands are called in the client for each user session, but the containers are started and managed in the Docker daemon (the "server" in this relationship). The client can't send sd_notify status messages because it doesn't actually manage the container service and doesn't know when the services are up, and the daemon can't send them because it wasn't called by the systemd unit file. This resulted in Walsh's team attempting an elaborate workaround to enable sd_notify:
- systemd requests sd_notify from the Docker client
- That client sends an sd_notify message to the Docker daemon
- The daemon sets up a container to do sd_notify
- The daemon gets an sd_notify from the container
- The daemon sends an sd_notify message to the client
- The client sends an sd_notify message to tell systemd that the Docker container is ready
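The notification message being relayed through all those hops is tiny: a datagram such as READY=1 sent to the Unix socket that systemd names in the NOTIFY_SOCKET environment variable. A minimal sketch of the client side of that protocol (this is just the sd_notify(3) wire format, not Walsh's patches):

```python
import os
import socket

def sd_notify(state: str) -> bool:
    """Send a state string (e.g. "READY=1") to the socket named by
    $NOTIFY_SOCKET, as systemd's sd_notify(3) does. Returns False when
    no notification socket is configured (i.e. not run under systemd)."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket address.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

The difficulty described above is not the protocol itself but *who* can speak it: only the process that systemd spawned (the Docker client) inherits NOTIFY_SOCKET, while only the daemon knows when the container is actually ready.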
Walsh was unsurprised when the patches to enable this byzantine system were not accepted by the Docker project. sd_notify does work for the Docker daemon itself, so systemd services can depend on the daemon running. But there is still no way to do sd_notify for individual containerized services, so the Docker project still has no reliable way to manage containerized service dependency startup order.
Systemd has a feature called "socket activation", where services start automatically upon receiving a request to a particular network socket. This lets servers support "occasionally needed" services without running them all the time. There used to be support for socket activation of the Docker daemon itself, but the feature was disabled because it interfered with Docker autorestart.
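For an ordinary (non-container) service, the pattern looks like this (unit and binary names hypothetical); the point is that systemd, not the service, owns the listening socket and starts the service only when a client connects:

```
# example.socket — systemd listens and starts the service on first use
[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

# example.service — must accept the already-open socket, typically via
# sd_listen_fds(3), rather than binding the port itself
[Service]
ExecStart=/usr/local/bin/exampled
```

Making this work for a container would require handing that open file descriptor through the Docker client to the daemon and on into the container, which is the problem described next.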
Walsh's team was more interested in socket activation of individual containers. This would have the benefit of eliminating the overhead of "always on" containers. However, the developers realized that they'd have to do something similar to the sd_notify workaround, only they'd be passing around a socket instead of just a message. They didn't even try to implement it.
Linux control groups, or cgroups, let you define system resource allocations per service, such as CPU, memory, and I/O limits. Systemd allows defining cgroup limits in the initialization files, so that you can define resource profiles for services when they start. With Docker, though, this runs afoul of the client-server model again. The systemd cgroup settings affect only the client; they do not affect the daemon process, where the container is actually running. Instead, each container inherits the cgroup settings of the Docker daemon. Users can set cgroup limits by passing flags to the docker run statement instead, which works but does not integrate with the overall administrative policies for the system.
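As a sketch of the two places such a limit can be expressed (image name hypothetical; the directives are systemd's and the flags are from the Docker CLI of that era) — only the Docker-side flags actually reach the container:

```
# systemd side: directives in the unit file that launches "docker run".
# These constrain the short-lived Docker *client*, not the container:
[Service]
MemoryLimit=512M
CPUShares=512

# Docker side: the limits must instead be repeated per container, on
# the command line, outside systemd's resource-policy machinery:
docker run -d --memory=512m --cpu-shares=512 fedora/httpd
```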
The only success story Walsh had to relate was regarding logging. Docker logging also did not originally work with systemd's journald. Logging of container output was local to each container, which would cause all logs to be automatically erased whenever a container was deleted. This was a major failing in the eyes of security auditors. Docker 1.9 now supports the --log-driver=journald switch, which logs to journald instead. However, using journald is not the default for Docker containers, so the switch needs to be passed each time.
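In practice that looks something like the following (container name hypothetical; the option and the CONTAINER_NAME journal field are as documented for Docker 1.9-era releases):

```
# Send this container's stdout/stderr to journald instead of the
# per-container JSON log file:
docker run -d --log-driver=journald --name web fedora/httpd

# The records then survive "docker rm" and can be queried like any
# other journal entries:
journalctl CONTAINER_NAME=web
```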
Systemd inside containers
Walsh also wanted to get systemd working in Fedora, Red Hat Enterprise Linux (RHEL), and CentOS container base images, partly because many packages require the systemctl utility in order to install correctly. His first effort was something called "fakesystemd" that replaced systemctl with a service that satisfied the systemctl requirement for packages and did nothing else. This turned out to cause problems for users and he soon abandoned it, but not soon enough to prevent it from being released in RHEL 7.0.
In RHEL 7.1, the team added something called "systemd-container", which was a substantially reduced version of systemd. This still caused problems for users who needed full systemd for their software, and Poettering pressured the container team to change it. As of RHEL 7.2, containers ship the real systemd, with reduced dependencies so that the image can be a little smaller. Walsh's team is working on reducing these dependencies further.
The biggest problem with not having systemd in the container, according to Walsh, is that it goes "back to the days before init scripts." Each image author creates his or her own crazy startup script for the application inside the container, instead of using the startup scripts crafted by the packagers. He demonstrated how easily service initialization is done inside a container that has systemd available, using the three-line Dockerfile that is all that is required to create a container running the Apache httpd server:
FROM fedora
RUN yum -y install httpd; yum clean all; systemctl enable httpd;
CMD [ "/sbin/init" ]
There is a major roadblock to making systemd inside Docker work, though: running a container with systemd inside requires running it with the --privileged flag, which makes it insecure. This is because the Docker daemon requires the "service" application run by the container to always be PID 1. In a container with systemd inside, systemd is PID 1 and the application has some other PID, which causes Docker to think the container has failed and shut it down.
Poettering says that PID 1 has special requirements. One of these is killing "zombie" processes that have been abandoned by their calling session. This is a real problem for Docker since the application runs as PID 1 and does not handle the zombie processes. For example, containers running the Oracle database can end up with thousands of zombie processes. Another requirement is writing to syslog, which goes to /dev/null unless you've configured the container to log to journald.
Walsh tried several approaches to make systemd work in non-privileged containers, submitting four different pull requests (7685, 10994, 13525, and 13526) to the Docker project. Each of these pull requests (PRs) was rejected by the Docker maintainers. Arguments around these changes peaked when Jessie Frazelle, a Docker committer, came to DockerCon.EU 2015 with the phrase "I say no to systemd specific PRs" printed on her badge.
The future of systemd and containers
The Red Hat container team has also been heavily involved in developing the runC tool of the Open Container Project. That project is the practical output of the Open Container Initiative (OCI), the non-profit council established through the Linux Foundation in 2015 in order to set industry standards for container APIs. The OCI also maintains libcontainer, the library that Docker uses to launch containers. According to Walsh, Docker will eventually need to adopt runC as part of its stack in order to be able to operate on other platforms, particularly Windows.
Using work from runC, Red Hat staff have created a patch set called "oci-hooks" that adds a lot of the systemd-supporting functionality to Docker. It makes use of a "hook" that can activate any executables found in a specific directory between the time the container starts up and when the application is running. Among the things executed by this method is the RegisterMachine hook, which notifies systemd's machinectl on the host that the container is running. This lets users see all Docker containers, as well as runC containers, using the machinectl command:
# machinectl
MACHINE CLASS SERVICE
9a65036e4a6dc769d0e40fa80871f95a container docker
fd493b71a79c2b7913be54a1c9c77f1c container runc
2 machines listed.
The hooks also allow running systemd in non-privileged containers. This PR (17021) was also rejected by the Docker project. Nevertheless, it is being included in the Docker packages that are shipped by Red Hat. So part of the future of Docker and systemd may involve forking Docker.
Walsh also pointed out that cgroups, sd_notify, and socket activation all work out-of-the-box with runC. This is because runC does not use Docker's client-server model; it is just an executable. He does not see the breach between Docker Inc. and Red Hat over systemd healing in the future. Walsh predicted that Red Hat would probably be moving more toward runC and away from the Docker daemon. According to him, Docker is working on "containerd", its new alternative to systemd, which will take over the functions of the init system.
Given the rapid changes in the Linux container ecosystem in the short time since the Docker project was launched, though, it is almost impossible to predict what the relationship between systemd, Docker, and runC will look like a year from now. Undoubtedly there will be plenty more changes and conflicts to report.
[ Josh Berkus works for Red Hat. ]
Security
The Glibc DNS resolution vulnerability
While the recently disclosed GNU C library (Glibc) DNS bug (CVE-2015-7547) is quite serious, one of the interesting aspects is that the real scope of the problem is not yet known. The ability to exploit this bog-standard buffer overflow is dependent on a number of other factors, so there are no vectors for widespread code execution—none publicly known, anyway. There are certainly millions of vulnerable systems out there, many of which are not likely to ever be patched, but it is truly unclear if that will lead to large numbers of compromised systems.
There are a number of obstacles in the way of an attacker wishing to exploit this bug. First off, a client application must call getaddrinfo() to resolve a domain name and use the AF_UNSPEC address family. That family indicates that either an IPv4 or IPv6 address is acceptable, which is the normal way that getaddrinfo() is called these days. Glibc then does two parallel queries for the A and AAAA records for the domain. It is the buffer handling in this parallel query step where things go awry.
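In Python, whose socket.getaddrinfo() is a thin wrapper over the libc call, the AF_UNSPEC dual lookup that triggers glibc's parallel-query path looks like this (the resolve() helper is just for illustration):

```python
import socket

def resolve(name):
    """Look up a name the way most modern clients do: AF_UNSPEC asks
    for both IPv4 (A) and IPv6 (AAAA) results in a single call, which
    is what sends glibc down the parallel-query code path."""
    results = socket.getaddrinfo(name, None, family=socket.AF_UNSPEC,
                                 type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return [(family, sockaddr[0]) for family, _, _, _, sockaddr in results]

print(resolve("localhost"))
```

A program that explicitly asks for AF_INET or AF_INET6 only, or that calls the older gethostbyname(), does not take this code path.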
Many systems are not configured to query a local caching nameserver; instead Glibc will make a query to the remote nameserver that was configured (or auto-configured by DHCP or the like) for the system. That means these two queries leave the system and, crucially, replies are received. Typically, DNS replies are short, but they can be as large as 64KB. Glibc allocates a 2KB buffer on the stack for the reply, but it has provisions to increase that by allocating a heap buffer for replies that are larger. Unfortunately, if the query needs to be retried, the stack buffer gets used instead of the larger, newly allocated buffer, so roughly 62KB of attacker-controlled data can be written to the stack.
There are still more requirements to make all of this happen, though. Normally, UDP is used to do the query, which is typically limited to 512-byte replies, but a man-in-the-middle (MITM) attacker could send more data. But any server could set the "truncation bit" in the reply to cause the client to switch to TCP for its query. Causing the client to retry is evidently tricky, but can be done. The net result can be as bad as a bunch of attacker data on the stack, but even that may be difficult to turn into code execution due to address-space layout randomization (ASLR) and other defensive measures.
Ostensibly it would seem that an attacker could simply set up a DNS server for their domain that would send malicious responses (while causing retries), then "force" clients into looking up this compromised domain. But there are complications; most notably any caching resolvers between the attacker and victim will reject most or all of the malicious responses because they aren't well-formed. It is unclear, however, whether cache-surviving, malicious responses can be constructed.
In a detailed advisory, Glibc developer Carlos O'Donell of Red Hat indicated that the possibility exists, and Dan Kaminsky followed up on that in his own detailed analysis.
But Kaminsky goes on to posit that "some networks are going to be vulnerable to some cache traversal attacks sometimes", under the theory that attacks only get better over time. The emphasis on the cache is important. An MITM attacker does not need the malicious responses to reside in intermediary caching resolvers (and an MITM can do plenty of other malicious things), but others who might want to exploit this flaw do need that. If a way is found to get these malicious responses into caches, CVE-2015-7547 gets a whole lot worse.
The scope of programs affected by the vulnerability is rather surprising as well. As Kaminsky and others have noted, the problem affects many different programs, from sudo and httpd to gpg and ssh—and beyond. Languages like Python, Haskell, JavaScript, and others are also affected. Some of these "memory-safe" languages protect against buffer overflows in programs written in the language, but the runtimes for those languages use Glibc, so flaws at that level can still affect them. And plenty of programs look up domain names for a variety of reasons.
Clearly the best "mitigation" is to update affected systems, but that may not be possible in many cases. There are an enormous number of Glibc-using devices out there (e.g. some home routers) that rarely, if ever, get updates. Even if updates are released, getting them into the hands of users and onto the devices is decidedly non-trivial. That has some looking for other types of mitigation.
One that is often mentioned is limiting the size of DNS replies. If no reply is large enough to tickle the bug, then devices running the old code won't be affected. That still doesn't solve the MITM problem, but Kaminsky also argued that length-limiting will have other hard-to-diagnose effects, so it should be avoided. There is a reason that DNS has been engineered to allow for larger responses, so it is effectively too late to put that cat back in the bag.
Using a local caching resolver, rather than requiring Glibc to query the network, will also help in environments where that is possible. If cache-traversing responses eventually surface, they can be handled at that level. Both local and remote caching servers can be changed as we learn more over time.
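The configuration side of that mitigation is small; the substantive work is running the caching resolver itself (unbound and dnsmasq are common choices; this fragment assumes one of them is listening on the loopback interface):

```
# /etc/resolv.conf — send all of glibc's queries to a local caching
# resolver rather than directly to a remote (and possibly hostile)
# network path:
nameserver 127.0.0.1
```

The local resolver, not glibc's stub code, then does the actual network traffic, and it can be patched or hardened independently of every glibc-linked binary on the system.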
Some devices, Android devices in particular, use different C libraries, which are presumably not vulnerable to this particular flaw. There are undoubtedly other vulnerabilities in those, with unknown effects and scope—at least publicly. The bug in Glibc has existed for almost eight years (it was introduced in Glibc 2.9 in May 2008); it is hard to guess what else lurks there—or elsewhere.
It is refreshing to see a security vulnerability disclosed without a name, logo, animated GIF, and hype-ridden web page touting it. Instead we have the disclosure announcements along with some sober analysis of what it all might mean. That used to be the norm and, while it may be a little awkward to use "CVE-2015-7547" rather than some catchy name, it is a welcome change from the hoopla surrounding Heartbleed, GHOST, and others.
Brief items
Security quotes of the week
It’s decided to seek a precedent that would allow it to force every American company to create a backdoor for the Government to snoop on anyone it so pleases.
The logic is outrageous: “People got shot. So we need a backdoor into your phone.”
Passwords were changed under FBI orders that should not have been. San Bernardino officials did not avail themselves of common device management software that could have prevented this entire problem -- software of a sort that most responsible corporations and other organizations already use with company-owned smartphones in employee hands.
Kaminsky: A Skeleton Key of Unknown Strength
Dan Kaminsky looks at the Glibc DNS bug (CVE-2015-7547). "We’ve investigated the DNS lookup path, which requires the glibc exploit to survive traversing one of the millions of DNS caches dotted across the Internet. We’ve found that it is neither trivial to squeeze the glibc flaw through common name servers, nor is it trivial to prove such a feat is impossible. The vast majority of potentially affected systems require this attack path to function, and we just don’t know yet if it can. Our belief is that we’re likely to end up with attacks that work sometimes, and we’re probably going to end up hardening DNS caches against them with intent rather than accident. We’re likely not going to apply network level DNS length limits because that breaks things in catastrophic and hard to predict ways."
New vulnerabilities
chromium: code execution
Package(s): chromium-browser
CVE #(s): CVE-2016-1628
Created: February 22, 2016
Updated: February 24, 2016

Description: From the CVE entry:

pi.c in OpenJPEG, as used in PDFium in Google Chrome before 48.0.2564.109, does not validate a certain precision value, which allows remote attackers to execute arbitrary code or cause a denial of service (out-of-bounds read) via a crafted JPEG 2000 image in a PDF document, related to the opj_pi_next_rpcl, opj_pi_next_pcrl, and opj_pi_next_cprl functions.
chromium: code execution
Package(s): chromium
CVE #(s): CVE-2016-1629
Created: February 22, 2016
Updated: February 24, 2016

Description: From the CVE entry:

Google Chrome before 48.0.2564.116 allows remote attackers to bypass the Blink Same Origin Policy and a sandbox protection mechanism via unspecified vectors.
didiwiki: unintended access
Package(s): didiwiki
CVE #(s): CVE-2013-7448
Created: February 22, 2016
Updated: April 12, 2016

Description: From the Debian advisory:

Alexander Izmailov discovered that didiwiki, a wiki implementation, failed to correctly validate user-supplied input, thus allowing a malicious user to access any part of the filesystem.
ffmpeg: denial of service
Package(s): ffmpeg
CVE #(s): CVE-2016-2329
Created: February 22, 2016
Updated: February 24, 2016

Description: From the CVE entry:

libavcodec/tiff.c in FFmpeg before 2.8.6 does not properly validate RowsPerStrip values and YCbCr chrominance subsampling factors, which allows remote attackers to cause a denial of service (out-of-bounds array access) or possibly have unspecified other impact via a crafted TIFF file, related to the tiff_decode_tag and decode_frame functions.
GraphicsMagick: out-of-bounds read flaw
Package(s): GraphicsMagick
CVE #(s): CVE-2015-8808
Created: February 24, 2016
Updated: February 24, 2016

Description: From the Red Hat bugzilla:

An out-of-bounds read flaw was found in the parsing of GIF files using GraphicsMagick.
hamster-time-tracker: two denial of service flaws
Package(s): hamster-time-tracker
CVE #(s):
Created: February 18, 2016
Updated: February 25, 2016

Description: The Red Hat bugzilla entries (1 and 2) have some more information about two different crashes of the server processes.
kernel: privilege escalation
Package(s): kernel
CVE #(s): CVE-2016-1576 CVE-2016-1575
Created: February 23, 2016
Updated: February 24, 2016

Description: From the Ubuntu advisory:

halfdog discovered that OverlayFS, when mounting on top of a FUSE mount, incorrectly propagated file attributes, including setuid. A local unprivileged attacker could use this to gain privileges. (CVE-2016-1576)

halfdog discovered that OverlayFS in the Linux kernel incorrectly propagated security sensitive extended attributes, such as POSIX ACLs. A local unprivileged attacker could use this to gain privileges. (CVE-2016-1575)
libssh: insecure ssh sessions
Package(s): libssh
CVE #(s): CVE-2016-0739
Created: February 23, 2016
Updated: March 24, 2016

Description: From the Debian LTS advisory:

Aris Adamantiadis of the libssh team discovered that libssh, an SSH2 protocol implementation used by many applications, did not generate sufficiently long Diffie-Hellman secrets. This vulnerability could be exploited by an eavesdropper to decrypt and to intercept SSH sessions.
libssh2: insecure ssh sessions
Package(s): libssh2
CVE #(s): CVE-2016-0787
Created: February 23, 2016
Updated: November 23, 2016

Description: From the Debian advisory:

Andreas Schneider reported that libssh2, an SSH2 client-side library, passes the number of bytes to a function that expects the number of bits during the SSHv2 handshake when libssh2 is to get a suitable value for 'group order' in the Diffie-Hellman negotiation. This significantly weakens the handshake security, potentially allowing an eavesdropper with enough resources to decrypt or intercept SSH sessions.
libxmp: multiple vulnerabilities
Package(s): libxmp
CVE #(s):
Created: February 18, 2016
Updated: February 24, 2016

Description: From the Mageia advisory:

The libxmp package has been updated to version 4.3.11, fixing several bugs, including possible crashes when loading corrupted input data. See the upstream changelog for details.
mariadb: multiple vulnerabilities
Package(s): mariadb mysql
CVE #(s): CVE-2015-4807 CVE-2016-0599 CVE-2016-0601
Created: February 22, 2016
Updated: February 24, 2016

Description: From the CVE entries:

Unspecified vulnerability in Oracle MySQL Server 5.5.45 and earlier and 5.6.26 and earlier, when running on Windows, allows remote authenticated users to affect availability via unknown vectors related to Server : Query Cache. (CVE-2015-4807)

Unspecified vulnerability in Oracle MySQL 5.7.9 allows remote authenticated users to affect availability via unknown vectors related to Optimizer. (CVE-2016-0599)

Unspecified vulnerability in Oracle MySQL 5.7.9 allows remote authenticated users to affect availability via unknown vectors related to Partition. (CVE-2016-0601)
ntp: three vulnerabilities
Package(s): ntp
CVE #(s): CVE-2015-7973 CVE-2015-7975 CVE-2015-7976
Created: February 24, 2016
Updated: February 24, 2016

Description: From the Red Hat bugzilla:

It was found that when NTP is configured in broadcast mode, a man-in-the-middle attacker or a malicious client could replay packets received from the broadcast server to all (other) clients. This could cause the time on affected clients to become out of sync over a longer period of time. (CVE-2015-7973)

It was found that ntpq did not implement a proper length check when calling nextvar(), which executes a memcpy(), on the name buffer. A remote attacker could potentially use this flaw to crash an ntpq client instance. (CVE-2015-7975)

The ntpq saveconfig command does not do adequate filtering of special characters from the supplied filename. Note: the ability to use the saveconfig command is controlled by the 'restrict nomodify' directive, and the recommended default configuration is to disable this capability. If the ability to execute a 'saveconfig' is required, it can easily (and should) be limited and restricted to a known small number of IP addresses. (CVE-2015-7976)
obs-service-download_files: code injection
Package(s): obs-service-download_files
CVE #(s):
Created: February 22, 2016
Updated: February 24, 2016

Description: From the openSUSE advisory:

Various code/parameter injection issues could have allowed a malicious service definition to execute commands or make changes to the user's file system.
php-horde-horde: cross-site scripting
Package(s): php-horde-horde
CVE #(s): CVE-2015-8807 CVE-2016-2228
Created: February 22, 2016
Updated: February 29, 2016

Description: From the Red Hat bugzilla:

An XSS vulnerability was found in _renderVarInput_number in Horde/Core/Ui/VarRenderer/Html.php, where input in a numeric field wasn't properly escaped. (CVE-2015-8807)

A cross-site scripting vulnerability was found in the php-horde application framework. No input validation was put in place while searching via the menu bar. (CVE-2016-2228)
poco: SSL server spoofing
Package(s): poco
CVE #(s): CVE-2014-0350
Created: February 22, 2016
Updated: February 24, 2016

Description: From the CVE entry:

The Poco::Net::X509Certificate::verify method in the NetSSL library in POCO C++ Libraries before 1.4.6p4 allows man-in-the-middle attackers to spoof SSL servers via crafted DNS PTR records that are requested during comparison of a server name to a wildcard domain name in an X.509 certificate.
websvn: cross-site scripting
Package(s): websvn
CVE #(s): CVE-2016-2511
Created: February 24, 2016
Updated: March 21, 2016

Description: From the Debian advisory:

Jakub Palaczynski discovered that websvn, a web viewer for Subversion repositories, does not correctly sanitize user-supplied input, which allows a remote user to run reflected cross-site scripting attacks.
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 4.5-rc5, released on February 20. "Things continue to look normal, and things have been fairly calm. Yes, the VM THP cleanup seems to still be problematic on s390, but other than that I don't see anything particularly worrisome."
Stable updates: 4.3.6 (the final 4.3.x update) and 3.10.97 were released on February 19. The 4.4.3, 3.14.62, and 3.10.98 updates are in the review process as of this writing; they can be expected on or after February 24.
Quotes of the week
Kernel development news
A BoF on kernel network performance
Whether one measures by attendance or by audience participation, one of the most popular sessions at the Netdev 1.1 conference in Seville, Spain was the network-performance birds-of-a-feather (BoF) session led by Jesper Brouer. The session was held in the largest conference room to a nearly packed house. Brouer and seven other presenters took the stage, taking turns presenting topics related to finding and removing bottlenecks in the kernel's packet-processing pipeline; on each topic, the audience weighed in with opinions and, often, proposed fixes.
The BoF was not designed to produce final solutions, but rather to encourage debate and discussion—hopefully fostering further work. Debate was certainly encouraged, to the point where Brouer was not able to get to every topic on the agenda before time had elapsed. But what was covered provides a good snapshot of where network-optimization efforts stand today.
DDoS mitigation
The first to speak was Gilberto Bertin from web-hosting provider CloudFlare. The company periodically encounters network bottlenecks on its Linux hosts, he said, with the most egregious being those caused by distributed denial-of-service (DDoS) attacks. Even a relatively small packet flood, such as two million UDP packets per second (2Mpps), will max out the kernel's packet-processing capabilities, saturating the receive queue faster than it can be emptied and causing the system to drop packets. 2Mpps is nowhere near the full 10G Ethernet wire speed of 14Mpps.
DDoS attacks are usually primitive; in principle, an iptables rule dropping traffic from each source address should suffice, but CloudFlare has found that iptables cannot keep up. Instead, the company is forced to offload traffic to a user-space packet handler. Bertin proposed two approaches to solving the problem: using Berkeley Packet Filter (BPF) programs shortly after packet ingress to parse incoming packets (dropping DDoS packets before they enter the receive queue), and using circular buffers to process incoming traffic (thus eliminating many memory allocations).
Brouer reported that he had tested several possible solutions himself, including using receive packet steering (RPS) and dedicating a CPU to handling the receive queue. Using RPS alone, he was able to handle 7Mpps in laboratory tests; by also binding a CPU, the number rose to 9Mpps. Audience members proposed several other approaches; Jesse Brandeburg suggested designating a queue for DDoS processing and steering other traffic away from it. Brouer discussed some tests he had run attempting to put drop rules as early as possible in the pipeline; none made a drastic difference in the throughput. When an audience member asked if BPF programs could be added to the network interface card's (NIC's) driver, David Miller suggested that running drop-only rules against the NIC's DMA buffer would be the fastest the kernel could possibly respond.
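The early-drop idea under discussion can be illustrated with a minimal sketch. This is plain Python for readability; an actual implementation would be a BPF program or driver code, and all names and offsets here are illustrative, assuming an untagged Ethernet II frame carrying IPv4:

```python
import struct

ETH_HLEN = 14          # Ethernet header length
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17

def should_drop(frame: bytes, blocked_sources: set) -> bool:
    """Return True if the frame is UDP from a blocked source address.

    This mimics the cheap parse a BPF filter would do before the packet
    is allowed onto the receive queue.
    """
    if len(frame) < ETH_HLEN + 20:
        return False                        # too short to hold an IPv4 header
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != ETHERTYPE_IPV4:
        return False
    proto = frame[ETH_HLEN + 9]             # IPv4 protocol field
    if proto != IPPROTO_UDP:
        return False
    src = struct.unpack_from("!I", frame, ETH_HLEN + 12)[0]
    return src in blocked_sources

# Build a minimal fake frame: zeroed MACs, IPv4 ethertype, then an IPv4
# header with protocol=UDP and source address 192.0.2.1.
ip_hdr = (bytes([0x45, 0]) + b"\x00\x28" + b"\x00\x00\x00\x00" +
          bytes([64, IPPROTO_UDP]) + b"\x00\x00" +
          bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7]))
frame = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_IPV4) + ip_hdr
blocked = {struct.unpack("!I", bytes([192, 0, 2, 1]))[0]}
```

The point of doing this so early is that a dropped packet never costs an sk_buff allocation or a trip through the stack.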
There was also a lengthy discussion about how to reduce the overhead caused by memory operations. Brouer reported that memcpy() calls accounted for as much as 43% of the time required to process a received packet. Jamal Hadi Salim asked whether sk_buff buffers could simply be recycled in a ring; Alexander Duyck replied that not all NIC drivers would support that approach. Ultimately, Brouer wrapped up the topic by saying there was no clear solution: latency hides in a number of places in the pipeline, so reducing cache misses, using bulk memory allocation, and re-examining the entire allocation strategy on the receive side may all be required.
Transmit powers
Brouer then presented the next topic, improving transmit performance. He noted that bulk transmission with the xmit_more API had solved the outgoing-traffic bottleneck, enabling the kernel to transmit packets at essentially full wire speed. But, he said, the "full wire speed" numbers are really achievable only in artificial workloads. For practical usage, it is hard to activate the bulk dequeuing discipline. Since the technique lowers CPU utilization, it would be beneficial to many users if it could be enabled well before one approaches the bandwidth limit.
He suggested several possible alternative means to activate xmit_more, including setting off a trigger whenever the hardware transmit queue gets full, tuning Byte Queue Limits (BQLs), and providing a user-space API to activate bulk sending. He had experimented some with the BQL idea, he reported: adjusting the BQL downward until the bulk queuing discipline kicks in resulted in a 64% increase in throughput.
Tom Herbert was not thrilled about that approach, noting that BQL was, by design, intended to be configuration-free; using it as a tunable feature seems like asking for trouble. John Fastabend asked if a busy driver could drop packets rather than queuing them, thus triggering the bulk discipline. Another audience member proposed adding an API through which the kernel could tell a NIC driver to split its queues. There was little agreement on approaches, although most in attendance seemed to feel that further discussion in this area was well warranted.
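The benefit of xmit_more-style batching comes from amortizing the expensive per-send hardware notification over a burst of packets. A toy sketch of the idea, with FakeNIC and its methods as illustrative stand-ins rather than the kernel's driver API:

```python
class FakeNIC:
    """Stand-in for a NIC driver's transmit path."""
    def __init__(self):
        self.tx_ring = []
        self.doorbells = 0           # counts expensive MMIO doorbell writes

    def xmit(self, pkt, xmit_more):
        self.tx_ring.append(pkt)     # queue the descriptor
        if not xmit_more:            # only the last packet of a burst
            self.doorbells += 1      # kicks the hardware to start DMA

def send_burst(nic, packets):
    """Hand a burst to the driver, flagging all but the last packet."""
    for i, pkt in enumerate(packets):
        nic.xmit(pkt, xmit_more=(i < len(packets) - 1))

nic = FakeNIC()
send_burst(nic, [b"a", b"b", b"c", b"d"])
```

Four packets are queued but the doorbell is written only once; the hard part, as the discussion above shows, is getting bursts to form in the first place under realistic workloads.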
The trials of small devices
Next, Felix Fietkau of the OpenWrt project spoke, raising concerns that recent development efforts in the kernel networking space focused too much on optimizing behavior for high-end Intel-powered machines, at the risk of hurting performance on low-end devices like home routers and ARM systems. In particular, he pointed out that these smaller devices have significantly smaller data cache sizes, comparable instruction cache sizes but without smart pre-fetchers, and smaller cache-line sizes. Some of the recent optimizations, particularly cache-line optimizations, can hurt performance on small systems, he said.
He showed some benchmarks of kernel 4.4 running on a 720MHz Qualcomm QCA9558 system-on-chip. Base routing throughput was around 268Mbps; activating nf_conntrack_rtcache raised throughput to 360Mbps. Also removing iptable_mangle and iptable_raw increased throughput to 400Mbps. The takeaway, he said, was that removing or conditionally disabling unnecessary hooks (such as statistics-gathering hooks) was vital, as was eliminating redundant accesses to packet data.
Miller commented that the transactional overhead of the hooks in question was the real culprit, and asked whether or not many of the small devices in question would be a good fit for hardware offloading via the switchdev driver model. Fietkau replied that many of the devices do support offload, but that it is usually crippled in some fashion, such as not being configurable.
Fietkau also presented some out-of-tree hacks used to improve performance on small devices, including using lightweight socket buffers and using dirty pointer tricks to avoid invalidating the data cache.
Caching
Brouer then moved on to the topic of instruction-cache optimization. The network stack, he said, does a poor job of utilizing the instruction cache, since the typical cache size is shorter than the code used to process the average Ethernet packet. Furthermore, even though many packets appearing in the same time window get handled in the same manner, processing each packet individually means each packet hits the same instruction-cache misses.
He proposed several possible ways to better utilize the cache, starting with processing packets in bundles, enabling several to be processed simultaneously at each stage. NIC drivers could bundle received packets, he said, for more optimal processing. The polling routine already processes many packets at once, but it currently calls "the full stack" for each packet. And the driver can view all of the packets available in the receive ring, so it could simply treat them all as having arrived at the same time and process them in bulk. A side effect of this approach, he said, would be that it hides latency caused by cache misses.
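The bundling proposal can be sketched as follows: rather than running the full stack once per packet, the driver drains everything already waiting in the RX ring and runs each processing stage across the whole bundle, so each stage's code stays hot in the instruction cache. The names and stages here are illustrative only:

```python
def process_bundle(rx_ring, stages):
    """Process all packets already in the ring, stage-at-a-time.

    Running one stage over every packet before moving on (instead of
    running every stage over one packet) means each stage's code is
    fetched into the instruction cache once per bundle, not once per
    packet.
    """
    bundle = list(rx_ring)          # treat all arrived packets as a batch
    rx_ring.clear()
    for stage in stages:
        bundle = [stage(pkt) for pkt in bundle]
    return bundle

ring = [b"p1", b"p2", b"p3"]
out = process_bundle(ring, [lambda p: p.upper(), lambda p: p + b"!"])
```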
A related issue, he said, is that the first cache miss often happens too soon for prefetching, in the eth_type_trans() function. By delaying the call to eth_type_trans() in the network stack's receive loop, the miss can be avoided. Even better, he said, would be to avoid calling eth_type_trans() altogether. The function is used to determine the packet's protocol ID, he said, which could also be determined from the hardware RX descriptor.
Brouer also proposed staging bundles of packets for processing at the generic receive offload (GRO) and RPS layers. GRO does this already, he said, though it could be further optimized. Implementing staged processing for RPS faces one hurdle in the fact that RPS takes cross-CPU locks for each packet. But Eric Dumazet pointed out that bulk enqueuing for remote CPUs should be easily doable. RPS already defers sending the inter-processor interrupt, which essentially amortizes the cost across multiple packets.
TC and other topics
Fastabend then spoke briefly (as time was running short) about the queuing discipline (qdisc) code path in the kernel's traffic control (TC) mechanism. Currently, the qdisc code takes six lock operations, even if the queue is empty and the packet is transmitted directly. He ran some benchmarks showing that the locks account for 70–82% of the time spent in qdisc, so he set out to re-implement qdisc in a lockless manner. He has posted an RFC implementation that reduces the lock count to two; the work is not complete yet, though, and other items remain on the to-do list: one is support for bulk dequeuing, another is gathering real-world numbers to determine whether the performance improvement is as large as anticipated.
Brouer then gave a quick overview of the "packet-page" concept: at a very early point in the receive process, a packet could be extracted from the receive ring into a memory page, allowing it to be sent on an alternative processing path. "It's a crazy idea," he warned the crowd, but it has several potential use cases. First, it could be a point for kernel-bypass tools (such as the Data Plane Development Kit) to hook into. It could also allow the outgoing network interface to simply move the packet directly into the transmit ring, and it could be useful for virtualization (allowing guest operating systems to rapidly forward traffic on the same host). Currently, implementing packet-page requires hardware support (in particular, hardware that marks packet types in the RX descriptor), but Brouer reported that he has seen some substantial and encouraging results in his own experiments.
As the session time finally elapsed for good, Brouer also briefly addressed some ideas for reworking the memory-allocation strategy for received packets (as alluded to in the first mini-presentation of the BoF). One idea is to write a new allocator specific to the network receive stack. There are a number of allocations identified as introducing overhead, so there is plenty of room for improvement.
But other approaches are possible, too, he said. Perhaps using a DMA mapping would be preferable, thus avoiding all allocations. There are clear pitfalls, such as needing a full page for each packet and the overhead of clearing out enough headroom for inserting each sk_buff.
Finally, Brouer reminded the audience of just how far the kernel networking stack has come in recent years. In the past two years alone, he said, the kernel has moved from a maximum transmit throughput of 4Mpps to the full wire speed of 14.8Mpps. IPv4 forwarding speed has increased from 1Mpps to 2Mpps on single-core machines (and even more on multi-core machines). Receive throughput started at 6.4Mpps and, with the latest experimental patches, now hovers around 12Mpps. Those numbers should be an encouragement; if the BoF attendees are anything to judge by, further performance gains are no doubt still on the horizon.
[The author would like to thank the Netconf and Netdev organizers for travel assistance to Seville.]
Sigreturn-oriented programming and its mitigation
In the good old days (from one point of view, at least), attackers had an easy life; all they had to do was to locate a buffer overrun vulnerability, then they could inject whatever code they liked into the vulnerable process. Over the years, kernel developers have worked to ensure that data that can be written by an application cannot be executed by that application; that has made simple code-injection unfeasible in most settings. Attackers have responded with techniques like return-oriented programming (ROP), but ROP attacks are relatively hard to get right. On some systems, attackers may be able to use the simpler sigreturn-oriented programming (SROP) technique instead; kernel patches have been circulating in an attempt to head off that class of attacks.
Some background
If data on the stack cannot be executed, a buffer overflow vulnerability cannot be used to inject code directly into an application. Such vulnerabilities can, however, be used to change the program counter by overwriting the current function's return address. If the attacker can identify code existing within the target process's address space that performs the desired task, they can use a buffer overflow to "return" to that code and gain control.
Unfortunately for attackers, most programs lack a convenient "give me a shell" location to jump to via an overwritten return address. But it is still likely that the program contains the desired functionality; it is just split into little pieces and scattered throughout the address space. The core idea behind return-oriented programming is to find these pieces in places where they are followed by a return instruction. The attacker, who controls the stack, can not only jump to the first of these pieces; they can also place a return address on the stack so that when this piece executes its return instruction, control goes to another attacker-chosen location — the next piece of useful code. By stringing together a set of these "gadgets," the attacker can create a new program within the target process.
There are various tools out there to help with the creation of ROP attacks. Scanners can pass through an executable image and identify gadgets of interest. "ROP compilers" can then create a program to accomplish the attacker's objective. But the necessary gadgets may not be available, and techniques like address-space layout randomization (ASLR) make ROP attacks harder. So ROP attacks tend to be fiddly affairs, often specific to the system being attacked (or even to the specific running process). Attackers, being busy people like the rest of us, cannot be blamed if they look for easier ways to compromise a system.
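The chaining mechanism can be modeled very loosely: the attacker-controlled stack is just a list of addresses, and each gadget does a small piece of work before "returning", which pops the next address. This is a toy model for illustration, not a real exploit; the gadget addresses and register names are invented:

```python
def run_rop_chain(gadgets, stack, state):
    """Interpret a fake ROP chain: pop an address, run that gadget, repeat.

    `gadgets` maps an address to a function standing in for the few
    instructions found at that address; `state` stands in for the CPU
    registers the gadgets manipulate.
    """
    while stack:
        addr = stack.pop(0)        # the next "return address" on the stack
        gadgets[addr](state)       # gadget body runs, then "returns"
    return state

# Two invented gadgets: one that loads a constant into a register
# ("pop rax; ret") and one that copies a value ("mov rdi, buf; ret").
gadgets = {
    0x1000: lambda s: s.__setitem__("rax", 59),
    0x2000: lambda s: s.__setitem__("rdi", s.get("buf")),
}
state = {"buf": "/bin/sh"}
result = run_rop_chain(gadgets, [0x1000, 0x2000], dict(state))
```

The real difficulty, as the article notes, is finding such gadgets in the target binary at all — which is exactly what makes the sigreturn() shortcut described next so attractive.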
Exploiting sigreturn()
Enter sigreturn(), a Linux system call that nobody calls directly. When a signal is delivered to a process, execution jumps to the designated signal handler; when the handler is done, control returns to the location where execution was interrupted. Signals are a form of software interrupt, and all of the usual interrupt-like accounting must be dealt with. In particular, before the kernel can deliver a signal, it must make a note of the current execution context, including the values stored in all of the processor registers.
It would be possible to store this information in the kernel itself, but that might make it possible for an attacker (of a different variety) to cause the kernel to allocate arbitrary amounts of memory. So, instead, the kernel stores this information on the stack of the process that is the recipient of the signal. Prior to invoking the signal handler, the kernel pushes an (architecture-specific) variant of the sigcontext structure onto the process's stack; this structure contains register information, floating-point status, and more. When the signal handler has completed its job, it calls sigreturn(), which restores all that information from the on-stack structure.
Attackers employing ROP techniques have to work to find gadgets that will store the desired values into specific processor registers. If they can call sigreturn(), though, life gets easier, since that system call sets the values of all registers directly from the stack. As it happens, the kernel has no way to know whether a specific sigreturn() call comes from the termination of a legitimate signal handler or not; the whole system was designed so that the kernel would not have to track that information. So, as Erik Bosman and Herbert Bos noted in this paper [PDF], sigreturn() looks like it might be helpful to attackers.
There is one obstacle that must be overcome first, though: an attacker must find a ROP gadget that makes a call to sigreturn() — and few applications do that directly. One way to do that would be to locate a more generic gadget for invoking system calls, then arrange for the appropriate number to be passed to indicate sigreturn(). But in many cases that is unnecessary; for years, the kernel developers conveniently put a sigreturn() call in a place where attackers could find it — at a fixed address that is not subject to ASLR. That address is in the "virtual dynamic shared object" (vDSO) area, a page mapped by the kernel in a known location into every process to optimize some system calls. On other systems, the sigreturn() call can be found in the C library; exploiting that one requires finding a way to leak some ASLR information first.
Bosman and Bos demonstrated that sigreturn() can be used to exploit processes with a buffer overflow vulnerability. Often, the sigreturn() gadget is the only one that is required to make the exploit work; in some cases, the exploit can be written in a system-independent way, able to be reused with no additional effort. More recent kernels have made these exploits harder (the vDSO area is no longer usable, for example), but they are still far from impossible. And, in any case, many interesting targets are running older kernels.
Stopping SROP
Scott Bauer recently posted a patch set meant to put an end to SROP attacks. Once the problem is understood, the solution becomes clear relatively quickly: the kernel needs a way to verify that a sigcontext structure on the stack was put there by the kernel itself. That would ensure that sigreturn() can only be called at the end of a real signal delivery.
Scott's patch works by generating a random "cookie" value for each process. As part of the signal-delivery process, that cookie is stored onto the stack, next to the sigcontext structure. Prior to being stored, it is XORed with the address of the stack location where it is to be stored, making it a bit harder to read back; future plans call for hashing the value as well, making the recovery of the cookie value impossible. Even without hashing, though, the cookie should be secure enough; an attacker who can force a signal and read the cookie off the stack is probably already in control.
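The store-and-verify logic can be sketched in a few lines. This is an illustrative model only — the kernel works on the real process stack, not a dictionary, and the function names here are invented:

```python
import os

# One random cookie per process, generated at process creation.
PROCESS_COOKIE = int.from_bytes(os.urandom(8), "little")

def store_cookie(stack, addr):
    """Signal delivery: store cookie XOR its own stack address."""
    stack[addr] = PROCESS_COOKIE ^ addr

def sigreturn_ok(stack, addr):
    """sigreturn(): recompute the expected value and compare."""
    return stack.get(addr, 0) == PROCESS_COOKIE ^ addr

stack = {}                      # fake stack: address -> stored word
store_cookie(stack, 0x7ffd1000)
```

An attacker forging a sigcontext structure would also have to place the correct cookie at the correct address, which requires already knowing the per-process secret.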
The sigreturn() implementation just needs to verify that the cookie exists in the expected location; if it's there, the call is deemed legitimate and can proceed. Otherwise the operation ends and a SIGSEGV signal is delivered to the process, killing it unless the process has made other arrangements.
There are some practical problems with the patch still; for example, it will not do the right thing in settings where checkpoint-restore in user space is in use (a restored process will have a new and different random cookie value, but old cookies may still be on the stack). Such problems can be worked around, but they may force the addition of a sysctl knob to turn this protection off in settings where it breaks things. It also does nothing to protect against ROP attacks in general; it just closes off one relatively easy-to-exploit form of those attacks. But, as low-hanging fruit, it is probably worth pursuing; there is no point in making an attacker's life easier.
DAX and fsync: the cost of forgoing page structures
DAX, the support library that can help Linux filesystems provide direct
access to persistent memory (PMEM), has seen
substantial ongoing development since we covered it nearly 18 months ago. Its main
goal is to bypass the page cache, allowing reads and writes to become
memory copies directly to and from the PMEM, and to support mapping that
PMEM directly into a process's address space with mmap().
Consequently, it was a little surprising to find that one of the challenges
in recent months was the correct implementation of fsync() and
related functions that are primarily responsible for synchronizing the page
cache with permanent storage.
While that primary responsibility of fsync() is obviated by
not caching any data in volatile memory, there is a secondary
responsibility that is just as important: ensuring that all writes that have
been sent to the device have landed safely and are not still in the
pipeline. For devices attached using SATA or SCSI, this involves sending (and
waiting for) a particular command; the Linux block layer provides the
blkdev_issue_flush() API (among a few others) for achieving
this. For PMEM we need something a little different.
There are actually two "flush" stages needed to ensure that CPU writes
have made it to persistent storage.
One stage is a very
close parallel to the commands sent by blkdev_issue_flush(). There
is a subtle distinction between PMEM "accepting" a write and
"committing" a write. If power fails between these events, data could
be lost. The necessary "flush" can be performed transparently by a
memory controller using Asynchronous
DRAM Refresh (ADR)
[PDF], or explicitly by the CPU using, for example,
the new x86_64 instruction PCOMMIT. This can be seen in the wmb_pmem() calls sprinkled
throughout the DAX and PMEM code in Linux; handling this stage is no
great burden.
The burden is imposed by the other requirement: the need to flush
CPU caches to ensure that the PMEM has "accepted" the writes. This
can be avoided by performing
"non-temporal
writes"
to bypass the
cache, but that cannot be ensured when the PMEM is mapped directly into
applications.
Currently, on x86_64 hardware, this requires explicitly flushing each cache
line that might be dirty by invoking the CLFLUSH (Cache Line Flush)
instruction or possibly a newer variant if available (CLFLUSHOPT, CLWB).
An easy approach, referred to in discussions as the "Big
Hammer", is to implement the blkdev_issue_flush() API by
calling CLFLUSH on every address of the entire persistent memory. While
CLFLUSH is not a particularly expensive operation, performing it over
potentially terabytes of memory was seen as worrisome.
The alternative is to keep track of which regions of memory might have
been written recently and to only flush those. This can be expected to
bring the amount of memory being flushed down from terabytes to gigabytes
at the very most, and hence to reduce run time by several orders of magnitude.
Keeping track of dirty memory is easy when the page cache is in use: a
flag in struct page records it. Since DAX bypasses the
page cache, there are no page structures for most of PMEM,
so an alternative is needed. Finding that alternative was the focus of most
of the discussions and of the implementation of fsync() support
for DAX, culminating in patch sets posted by Ross Zwisler (original
and fix-ups)
that landed
upstream for 4.5-rc1.
Is it worth the effort?
There was a subthread running through the discussion that wondered
whether it might be best to avoid
the problem rather than fix it. A filesystem does not have to use
DAX simply because it is mounted from a PMEM device. It can selectively
choose to use DAX or not based on usage patterns or policy settings (and,
for example, would never use DAX on directories, as metadata
generally needs to be staged out to storage in a controlled fashion).
Normal page-cache access
could be the default and write-out to PMEM would use non-temporal writes.
DAX would only be enabled while a file is memory mapped with a new
MMAP_DAX flag. In that case, the application would be
explicitly requesting DAX access (probably using the nvml library) and it
would take on the responsibility of calling CLFLUSH as
appropriate. It is
even conceivable that future processors could make cache flushing for a
physical address range much more direct, so keeping track of addresses to
flush would become pointless.
Dan Williams championed this position, putting his case quite succinctly:
DAX in my opinion is not a transparent accelerator of all existing apps, it's a targeted mechanism for applications ready to take advantage of byte addressable persistent memory.
He also expressed a concern that fsync() would end up being
painful for large amounts of data.
Dave Chinner didn't agree. He provided a demonstration suggesting that the proposed overheads needed for fsync() would be negligible. He asserted instead:
DAX is a method of allowing POSIX compliant applications get the best of both worlds - portability with existing storage and filesystems, yet with the speed and byte [addressability] of persistent storage through the use of mmap.
Williams' position resurfaced from time to time as it became clear that
there were real and ongoing challenges in making fsync() work,
but he didn't seem able to rally much support.
Shape of the solution
In general, the solution chosen is to
still use the page cache data structures, but not to store struct page pointers in them. The page cache uses a radix tree that can store a pointer and a few
tags (single bits of extra information) at every page-aligned offset in a
file. The space reserved for the page pointer can be used for anything
else by setting the least significant bit to mark it as an exception.
For example, the tmpfs filesystem uses exception entries to keep track of
file pages that have been written out to swap.
Keeping track of dirty regions of a file can be done by allocating
entries in this radix tree, storing a blank exception entry in place of the
page pointer, and setting the PAGECACHE_TAG_DIRTY tag.
Finding all entries with a tag set is quite efficient, so flushing all the
cache lines in each dirty page to perform fsync() should be
quite straightforward.
As this solution was further explored, it was repeatedly found that some
of those fields in struct page really are useful, so an
alternative needed to be found.
Page size: PG_head
To flush "all the cache lines in each dirty page," you need to know how big the page is: it could be a regular page (4KB on x86) or a huge page (2MB on x86). Huge pages are particularly important for PMEM, which is expected to come in very large capacities. If the filesystem creates files with the required alignment, DAX will automatically use huge pages to map them. There are even patches from Matthew Wilcox that aim to support direct mapping of extra-huge 1GB pages, referred to as "PUD pages" after the Page Upper Directory level in the four-level page tables from which they are indexed.
With a struct page the PG_head flag can be
used to determine the page size. Without that, something else is needed.
Storing 512 entries in the radix tree for each huge page would be an
option, but not an elegant option. Instead, one bit in the otherwise
unused pointer field is used to flag a huge-page entry, which is also known as a
"PMD" entry because it is linked from the Page Middle Directory.
Locking: PG_locked
The page lock is central to handling concurrency within filesystems and
memory management. With no struct page there is no page lock.
One place where this has caused
a problem is in managing races between one thread trying to sync a page
and mark it as clean and another thread dirtying that page. Ideally, clean
pages should be removed from the radix tree completely as they are not
needed there, but attempts to do that have, so far, failed to avoid the race.
Jan Kara suggested
that another bit in the pointer field could be used as a bit-spin-lock,
effectively duplicating the functionality of PG_locked. That
seems a likely approach but it has not yet been attempted.
Physical memory address
Once we have enough information in the radix tree to reliably track
which pages are dirty and how big they are, we just need to know where each
page is in PMEM so it can be flushed. This information is generally of
little interest to common code so handling it is left up to the filesystem.
Filesystems will normally attach something to the struct page
using the private pointer. In filesystems that use the
buffer_head library, the private pointer links to
a buffer_head that contains a b_blocknr field
identifying the location of the stored data.
Without a struct page, the address needs to be found some
other way. There are a number of options, several of which have been
explored.
The filesystem could be asked to perform the lookup from file offset to
physical address using its internal indexing tables. This is an
indirect approach and may require the filesystem to reload some indexing
data from the PMEM (it wouldn't use direct-access for that). While the
first patch set used this approach, it did not survive long.
Alternately, the physical address could be stored in the radix tree when the page is marked as dirty; the physical address will already be available at that time as it is just about to be accessed for write. This leads to another question: exactly how is the physical address represented? We could use the address where the PMEM is mapped into the kernel address space, but that leads to awkward races when a PMEM device is disabled and unmapped. Instead, we could use a sector offset into the block device that represents the PMEM. That is what the current implementation does, but it implicitly assumes there is just one block device, or at least just one per file. For a filesystem that integrates volume management (as Btrfs does), this may not be the case.
Finally, we could use the page frame number (PFN), which is a
stable index that is assigned by the BIOS when the memory is discovered.
Wilcox has patches to move in this direction, but the work is only 70%,
maybe 50%, done. Assuming that the PFN can be reliably
mapped to the kernel address that is needed for CLFLUSH, this seems
like the best solution.
Is this miniature struct page enough?
One way to look at this development is that a 64-bit miniature struct page has been created for the DAX use case to avoid the cost of a
full struct page. It currently contains a "huge page" flag
and a physical sector number. It may yet gain a lock bit and have a PFN in
place of the sector number. It seems prudent to ask if there is anything
else that might be needed before DAX functionality is complete.
As quoted above, Chinner appears to think that transparent support for full POSIX semantics should be the goal. He went on to opine that:
This is just another example of how yet another new-fangled storage technology maps precisely to a well known, long serving storage architecture that we already have many, many experts out there that know to build reliable, performant storage from... :)
Taking that position to its logical extreme would suggest that anything
that can be done in the existing storage architecture should work with PMEM
and DAX. One such item of functionality that springs to mind is
the pvmove
tool.
When a filesystem is built on an LVM2 volume, it is possible to use
pvmove to move some of the data from one device to another,
to balance the load, decommission old hardware, or start
using new hardware. Similar functionality could well be useful with
PMEM.
There would be a number of challenges to making this work with DAX, but
possibly the biggest would be tearing down memory mappings of a section of
the old memory before moving data across to the new. The Linux kernel has
some infrastructure for memory migration
that would be a perfect fit — if only the PMEM had a table of struct page as regular memory does. Without those page structures, moving
memory that is currently mapped becomes a much more
interesting task, though likely not an insurmountable one.
On the whole, it seems like DAX is showing a lot of promise but is still in its infancy. Currently, it can only be used on ext2, ext4, and XFS, and only where they are directly mounted on a PMEM device (i.e. there is no LVM support). Given the recent rate of change, it is unlikely to stay this way. Bugs will be fixed, performance will be improved, coverage and features will likely be added. When inexpensive persistent memory starts appearing on our motherboards it seems that Linux will be ready to make good use of it.
Page editor: Jonathan Corbet
Distributions
The end of the Iceweasel Age
For roughly the past decade, Debian has shipped the Mozilla desktop applications (Firefox, Thunderbird, and SeaMonkey) in a rebranded form that replaces the original, trademarked names and logos with alternatives (Iceweasel, Icedove, and Iceape). Originally, this effort was undertaken to work around incompatibilities between the Debian Free Software Guidelines (DFSG), the Mozilla trademark-usage policy, and the licenses of the Mozilla logos. But times—and policy wordings—change, and Debian now seems poised to resume calling its packages by the original, upstream Mozilla names.
It is important to understand that, despite the similarities in name, Debian's Iceweasel is not in the same category as GNU IceCat, which is an actual fork of the code. Iceweasel consists of binaries rebuilt by Debian with only minimal alterations—most obviously the removal of the Mozilla branding, but some functional changes as well (such as using system libraries and hooking into the Debian package manager).
The rebranding issue originated in 2004. At that time, the Mozilla trademark policy only permitted usage of the Firefox logo on downstream packages that adhered to a set of strict "Distribution Partners" guidelines that prohibited changing the search engines, extensions, directory structure, and other details—clearly making the Distribution Partner rules (and the less stringent "Community Edition" rules) incompatible with the DFSG.
Confusingly enough, the Community Edition rules would have allowed Debian to use the name "Firefox" but not to use the name "Mozilla Firefox" nor to use the Firefox logo. Yet another wrinkle for DFSG compliance was that the actual graphics files for the logo, as the FAQ page explained, were distributed under non-free license terms (prohibiting modification) anyhow. Furthermore, and perhaps even most problematic, the policy required redistributors to seek Mozilla's approval for any other modifications to the package. And Debian's Firefox packagers needed to make modifications, starting with rather fundamental necessities like integrating with the distribution's package manager, rather than using Firefox's built-in updater.
It was proposed that Mozilla could grant a trademark license to Debian, outside of the generic, public trademark policy, but Debian Project Leader (DPL) Branden Robinson contended that such an agreement would run afoul of section eight of the DFSG, which prohibits licensing agreements that are specific to the Debian project and, thus, are not transferred automatically to Debian users. After considerable debate, bug #354622 was opened in February 2006 by Mozilla's Mike Connor, and the Iceweasel name change was implemented to close it.
Re-discussion
It is now 2016, however, and most users or developers could be
forgiven for forgetting that Mozilla ever had "Distribution" and
"Community" partner programs,
much less what all of the details were. The Mozilla trademark
guidelines have morphed considerably over the years and, in
particular, they have become far more open. The logos and product names
are no longer subject to separate terms, and the current guidelines
only state that "making significant functional changes" prohibits a
downstream project from using the Mozilla trademarks.
On February 17, Mozilla's Sylvestre Ledru opened bug
#815006, stating that "the various issues mentioned in bug
#354622 have been now tackled" and including a patch that
renames the packaged version of Iceweasel to Firefox. It is not
entirely clear whether the original logos will return as well,
although now that they are available under the same terms as the name
trademarks, it seems like a possibility. Ledru's initial report
includes a recap of recent discussions between Mozilla and Debian. Of
particular note is the assessment by Mozilla of Debian's modifications
to the code:
More generally, Mozilla trusts the Debian packagers to use their best judgment to achieve the same quality as the official Firefox binaries.
In case of derivatives of Debian, Firefox branding can be used as long as the patches applied are in the same category as described above. Ubuntu having a different packaging, this does not apply to that distribution.
Furthermore, Ledru notes that Debian has adopted a new approach to backporting security patches. In the past, one of the key non-branding modifications Debian made to the Mozilla applications was backporting recent security fixes. This was necessary because Debian's stable releases remain supported for a lengthy period of time (two years), far longer than Firefox, which is now updated every six to eight weeks. It might seem like security patches would be uncontroversial, given the benefit to users, but Mozilla objected to them quite early on in the Iceweasel debate.
Now, however, Mozilla has implemented its Extended Support Release (ESR) program, which makes maintaining an old release simpler for both Mozilla and Debian. First, Debian has committed to providing security fixes for the ESR releases of Firefox, not to every Firefox release. In addition, once the ESR release initially shipped with a Debian "stable" release is no longer provided with security updates from Mozilla, Debian updates the package to the next ESR release.
In essence, then, the logo-licensing problem, the trademark-usage incompatibility, and the patch-maintenance problem have all been resolved, so, Ledru said, Debian could return to the Firefox branding.
Except that not everyone in the Debian project was easily convinced
that the trademark issue was resolved. For instance, Paul
Wise asked for clarification about how
the new trademark-usage guidelines
meshed with section eight of the DFSG. Stefano Zacchiroli replied,
however, that there is no formal or contractual
arrangement; in other words, Mozilla is not granting a trademark
license to Debian. Instead, Mozilla is acknowledging that the patches
and other work that have gone into the Debian packages over the past
ten years do not violate the trademark policy. Connor concurred as well.
Perhaps it feels strange to have a dilemma that Debian was forced
into by the specifics of policy documents and project governance guidelines be resolved
by such a seemingly informal statement. But it is important to
remember that Mozilla's casual-sounding blessing of Debian's Firefox
modifications is not the only change to have taken place. The Mozilla
trademark policy and logo-usage guidelines have evolved considerably
since 2006, and the ESR program has changed the face of long-term
maintenance not just for Debian, but for many other users as well.
The plan, as it stands presently, is for the Iceweasel package to
be renamed Firefox in the Debian 9 "stretch" release (slated for
an early 2017 release). For simplicity in package maintenance, the
Iceweasel package in the current stable release (Debian 8
"jessie") will not be renamed.
Similar changes should be expected for Icedove and Iceape, although those
discussions are still underway with the Debian package maintainers.
Wise had noted that "Mozilla's trademark policy isn't clear about how
much modification requires Mozilla's written consent". His underlying
concern was that, if Debian were being granted special permission from
Mozilla to use the trademarks in a manner different from the public
policy, that would constitute a licensing agreement that Debian could
not pass on to downstream users.
Brief items
Open source Zephyr Project aims to deliver an RTOS
The Linux Foundation has announced the Zephyr Project, which is aimed at building a real-time operating system (RTOS) for the Internet of Things (IoT). "Modularity and security are key considerations when building systems for embedded IoT devices. The Zephyr Project prioritizes these features by providing the freedom to use the RTOS as is or to tailor a solution. The project’s focus on security includes plans for a dedicated security working group and a delegated security maintainer. Broad communications and networking support is also addressed and will initially include Bluetooth, Bluetooth Low Energy and IEEE 802.15.4, with plans to expand communications and networking support over time." The Zephyr Kernel v1.0.0 Release Notes provide more details.
Linux Mint downloads (briefly) compromised
The Linux Mint blog announces that the project's web site was compromised and made to point to a backdoored version of the distribution. "As far as we know, the only compromised edition was Linux Mint 17.3 Cinnamon edition. If you downloaded another release or another edition, this does not affect you. If you downloaded via torrents or via a direct HTTP link, this doesn’t affect you either. Finally, the situation happened today, so it should only impact people who downloaded this edition on February 20th."
Update: it appears that the Linux Mint forums were compromised too; users should assume that their passwords have been exposed.
FreedomBox 0.8 Released
FreedomBox 0.8 has been released. New images have not been created for this release. It is available in Debian unstable as two packages, freedombox-setup 0.8 and plinth 0.8.1-1. Quassel, an IRC client that stays connected to IRC networks and can synchronize multiple frontends, has been added, and the first-boot user interface has been improved.
Ubuntu 14.04.4 LTS released
The fourth point release of Ubuntu 14.04 LTS is available for its Desktop, Server, Cloud, and Core products, as well as other flavors of Ubuntu with long-term support. "We have expanded our hardware enablement offering since 12.04, and with 14.04.4, this point release contains an updated kernel and X stack for new installations to support new hardware across all our supported architectures, not just x86."
Newsletters and articles of interest
Distribution newsletters
- Debian Project News (February 18)
- DistroWatch Weekly, Issue 649 (February 22)
- 5 things in Fedora (February 19)
- Tails report (January)
- Ubuntu Kernel Team newsletter (February 16)
- Ubuntu Kernel Team newsletter (February 23)
- Ubuntu Weekly Newsletter, Issue 455 (February 21)
Subgraph OS Wants to Make Using a Secure Operating System Less of a Headache (Motherboard)
Motherboard takes a look at Subgraph OS. "In my tests, Subgraph OS worked fine out of the box, aside from some bugs that [Subgraph president David Mirza Ahmad] pointed out and provided workarounds for (the project is still in a pre-alpha stage). Those fixes required some use of the Linux command line, and users will probably need some experience of using a terminal to get the most out of their system. In sum, Subgraph OS appears easier to get to grips with than other secure options, but likely still requires a learning curve for users switching from, say, Windows or OSX for the first time. I ran Subgraph OS in virtual machines with 2GB and 4GB RAM."
Page editor: Rebecca Sobol
Development
Rethinking the OpenStack development cycle
The OpenStack cloud-management system project is a relative newcomer, having first been announced in mid-2010. It has grown quickly since then, and is now a core part of many commercial offerings. That growth has inevitably led to some growing pains, though. Recent discussions on a pair of proposals — one rather more official than the other — shine some light into where those pains are being felt and how the project might evolve to address them.
Stabilization cycles
Back in January, Flavio Percoco (a member of the OpenStack technical committee) posted a proposal for the addition of "stabilization cycles" to the OpenStack development process. OpenStack's process uses a six-month cycle; its many sub-projects are expected to coordinate their major releases around these cycles. The OpenStack "Design Summits" are scheduled for the beginning of each cycle. Each of these cycles brings a whole set of new features and, naturally, new bugs.
What Flavio was proposing, after a discussion in the technical committee, was that, occasionally, one of these cycles could be designated for "stabilization" changes only. The proposal included a fair amount of flexibility, in that "stabilization" could include work like refactoring and code cleanups. The period allotted for this work could be a full six-month cycle, or one of the "milestone" periods designated within a cycle. Stabilization cycles, Flavio said, could bring a number of benefits, including more bugs fixed, a reduction in the review backlog, and the ability to focus on larger features that require more than one cycle to implement.
Much of the discussion focused on the potential costs of stabilization cycles. One cost that a number of participants found surprising was that, it seems, a number of companies give bonuses to developers when they get features merged into the OpenStack mainline; a stabilization cycle would thus cost those developers money. There seems to be fairly widespread agreement that this kind of compensation model runs counter to the interests of an open-source project in general, but that doesn't change the fact that this model appears to be in use at some companies.
The bigger problem, though, is that, over the years, it has been shown that stabilization cycles tend not to work well in large, fast-moving projects. The kernel used to work in that model; a look at the 2.4 cycle gives a good example of how these things can go wrong. The refusal to accept features into the mainline does not stop the development of those features or magically create more resources for the fixing of bugs. Companies continue to develop the features that they want; they will then either try to sneak them in as "bug fixes" or simply ship them without bothering to merge them upstream first. The results tend to be less stability and more fragmentation in deployed versions of the code.
Based on this experience, James Bottomley recommended against the idea of stabilization cycles, suggesting instead that the OpenStack cycle should be reworked to look more like the kernel's process.
To do this, he said, OpenStack would need to establish something like the linux-next tree for early integration and testing of new code. He also suggested that the design summit should be moved toward the middle of the development cycle to facilitate discussion of work that is aimed at the next merge window.
The discussion on this proposal eventually wound down without any firm conclusions beyond a sense that there was little interest in the establishment of project-wide stabilization cycles. Thierry Carrez (OpenStack's release manager and chair of the technical committee) suggested that the most important thing would be to communicate to the sub-projects that any of them could impose their own stabilization cycles if that seemed appropriate to them.
Reclaiming the design summit
James's suggestions for a reworked development cycle seem unlikely to be taken up by the project anytime soon, but one specific idea — changing the timing of the design summit — came back in a modified form in late February. The "design summit" event, which started as a small group of developers getting work done, has, over time, been overwhelmed by the OpenStack Summit, a rather larger, co-located event with a disturbingly high necktie-to-T-shirt ratio. That has led to developers feeling that the original purpose of the event has been lost; Jay Pipes (another technical committee member) made exactly that complaint.
Jay suggested that the design summit should be split off from the main conference so that the developers could gain a respite from the suits at a more focused, less glitzy, and less expensive gathering.
It turns out that Thierry had been working on just this type of idea; he
posted his proposal on February 22.
The plan is to split the OpenStack Summit into two separate events. The
first would be a technical event "held in a simpler, scaled-back
setting
" aimed at getting work done. This gathering would happen in
relatively inexpensive locations, in the hope that companies would be more
willing to send more of their developers.
The second event, instead, would be "the main downstream business
conference, with high-end keynotes, marketplace and breakout
sessions
". It would, presumably, remain in relatively fancy,
suit-friendly locations. In addition to serving the needs of the business
community, it would be intended to serve as a location where feedback on
releases could be gathered, along with requests and requirements for future
releases.
The new, currently unnamed developer conference would be held toward the end of the development cycle, a couple of weeks before the release happens. That would allow the discussion of work that is planned for the next cycle, and on any last-minute release problems as well. The business event, instead, would move to the middle of the development cycle. At that point, the previous release will have been around for long enough to find its way into products and for companies to have learned about its good and bad points. The next cycle, meanwhile, is far enough away that feedback from the conference can still be incorporated. The new scheme would be phased in next year, with the first "contributors event" happening in February 2017. Thierry provided a timeline diagram [PDF] to illustrate how it would work.
Response to the proposal has been mostly positive, though Michael Krotscheck worried that it heralded the beginning of the end for the design summit. Sales and marketing is where the money is, he said, and a conference that excluded them would not do well in the corporate priority-setting process. Another potential concern, raised in the previous discussion, is that the ability to meet the developers is one of the selling points of the main conference. If the developers are no longer there, that conference, too, will suffer.
In the end, though, a development conference needs to be a relatively small and focused affair if it is to be a place where a lot of work gets done. The proposed event split might just make that possible, though the size of the project now ensures that a gathering of its developers can only be so small. In general, the problems faced by OpenStack are the kind of problems that many other projects can only hope for. Success tends to force changes; we have probably only begun to see the ways in which OpenStack will need to change to remain successful in the coming years.
Brief items
Quotes of the week
This lowers the number of people contacting website maintainers with typeface complaints bordering on harassment.
Ardour 4.7 released
Version 4.7 of the Ardour digital-audio workstation has been released. The update includes two key new features: a dialog that displays detailed spectral and waveform analysis for exported files, and substantially improved support for Mackie Control brand hardware control consoles. Many other improvements are listed in the announcement, including preliminary support for importing work from ProTools 10 and 11.
GNU C Library 2.23 released
Version 2.23 of the GNU C Library (glibc) has been released. The headline feature this time around seems to be Unicode 8.0.0 support; there are a number of API changes, performance improvements, and security fixes as well.
Libinput 1.2 released
Version 1.2.0 of the libinput library is now available. New features
include support for three-finger "pinch" gestures and the ability to
independently toggle support for tap-and-drag and tapping in general.
Also noteworthy is that the motion hysteresis feature is now disabled
by default. "This provides smoother motion especially on small to tiny
motions, making single-pixel elements much easier to target. On some
devices, especially older touchpads the hysteresis may be required.
We've enabled a bunch of those already, if you notice the pointer
wobbling when hold the finger still, please file a bug so we can fix
this."
Newsletters and articles
Development newsletters from the past week
- What's cooking in git.git (February 22)
- State of the Haskell Ecosystem (February 2016 edition)
- LLVM Weekly (February 22)
- OCaml Weekly News (February 23)
- OpenStack Developer Digest (February 20)
- Perl Weekly (February 22)
- PostgreSQL Weekly News (February 21)
- Python Weekly (February 18)
- Ruby Weekly (February 18)
- This Week in Rust (February 22)
- Tahoe-LAFS Weekly News (February 23)
- Tor Weekly News (February 22)
- Wikimedia Tech News (February 22)
Upcoming features in GCC 6
The Red Hat developer blog looks at what's coming in version 6 of the GNU Compiler Collection. "The x86/x86_64 is a segmented memory architecture, yet GCC has largely ignored this aspect of the Intel architecture and relied on implicit segment registers. Low level code such as the Linux kernel & glibc often have to be aware of the segmented architecture and have traditionally resorted to asm statements to use explicit segment registers for memory accesses. Starting with GCC 6, variables may be declared as being relative to a particular segment. Explicit segment registers will then be used to access those variables in memory." The GCC 6 release can be expected sometime around April.
Qt Roadmap for 2016
At the Qt blog, Tuukka Turunen sets
out the project's roadmap for the coming year. Three releases are
currently scheduled: Qt 5.6 in March, Qt 5.7 in May, and Qt 5.8 in
October. Of note, the 5.6 release will be designated a Long Term
Support (LTS) release: "As part of our LTS promise, we guarantee
that Qt 5.6 will be supported for three years via standard support,
after which additional extended support can be purchased. During this
time period, even though following Qt releases (Qt 5.7, Qt 5.8 and so
on) are available, Qt 5.6 will receive patch releases providing bug
fixes and security updates throughout the three-year period after the
release." New features on the roadmap include high-DPI
support, C++11 support in Qt modules, and dropping LGPLv2.1 as a
licensing option (in favor of LGPLv3).
Page editor: Nathan Willis
Announcements
Brief items
OSI annual report
The Open Source Initiative has published its annual report [PDF] for 2015. "In 2015 we bid fond-farewell to long time President Simon Phipps and welcomed Allison Randal as the new OSI Board President. Simon, who was first elected to the Board in 2010 and became President in 2012, opted not to run for President in his final year on the Board in order to help transition the new President. Simon's presidency will be remembered for his work to transform the OSI into a member-led organization, giving voice to our individual and affiliate members in Board elections and opening up opportunities for direct participation through Working Groups and Incubator Projects." (Thanks to Martin Michlmayr)
The new Board of Directors of The Document Foundation
The Document Foundation has announced its new Board of Directors. "Elected as directors are, in order of votes: Marina Latini (Studio Storti), Michael Meeks (Collabora), Thorsten Behrens (CIB), Jan Holesovsky (Collabora), Osvaldo Gervasi (independent), Simon Phipps (independent) and Eike Rathke (Red Hat). Elected as deputies are, in order of votes: Norbert Thiebaud (independent), Bjoern Michaelsen (Canonical) and Andreas Mantke (independent). The board has elected Marina Latini as Chairwoman and Michael Meeks as Deputy Chairman."
Articles of interest
Kirkland: ZFS licensing and Linux
Dustin Kirkland justifies Ubuntu's plans to ship the ZFS filesystem kernel module. "And zfs.ko, as a self-contained file system module, is clearly not a derivative work of the Linux kernel but rather quite obviously a derivative work of OpenZFS and OpenSolaris. Equivalent exceptions have existed for many years, for various other stand alone, self-contained, non-GPL and even proprietary (hi, nvidia.ko) kernel modules."
Calls for Presentations
EuroPython 2016: Call for Proposals
EuroPython will take place July 17-24 in Bilbao, Spain. The call for proposals closes March 6. "We’re looking for proposals on every aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization. EuroPython is a community conference and we are eager to hear about your experience." Early-bird ticket sales are open. Regular sales will begin after the early-bird tickets sell out.
CfP: MiniDebConf Vienna
There will be a MiniDebConf at Linuxwochen Wien in Vienna, Austria. MiniDebCamp will be held April 28-29, followed by the mini-conference April 30-May 1. The call for proposals closes March 15.
Flock 2016 update
Registration is open for Flock to Fedora, which will be held August 2-5 in Krakow, Poland. The call for submissions for talks and workshops is open until April 8.
openSUSE Conference returns to Nuremberg
The openSUSE Conference will take place June 22-26 in Nuremberg, Germany. See the announcement for more details. The call for papers closes April 15.
CFP Deadlines: February 25, 2016 to April 25, 2016
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
| Deadline | Event Dates | Event | Location |
|---|---|---|---|
| February 28 | April 6 | PostgreSQL and PostGIS, Session #8 | Lyon, France |
| February 28 | May 10-May 12 | Samba eXPerience 2016 | Berlin, Germany |
| February 28 | April 18-April 19 | Linux Storage, Filesystem & Memory Management Summit | Raleigh, NC, USA |
| February 28 | June 21-June 22 | Deutsche OpenStack Tage | Köln, Deutschland |
| February 28 | June 24-June 25 | Hong Kong Open Source Conference 2016 | Hong Kong, Hong Kong |
| March 1 | April 23 | DevCrowd 2016 | Szczecin, Poland |
| March 6 | July 17-July 24 | EuroPython 2016 | Bilbao, Spain |
| March 9 | June 1-June 2 | Apache MesosCon | Denver, CO, USA |
| March 10 | May 14-May 15 | Open Source Conference Albania | Tirana, Albania |
| March 12 | April 26 | Open Source Day 2016 | Warsaw, Poland |
| March 15 | April 28-May 1 | Mini-DebCamp & DebConf | Vienna, Austria |
| March 20 | April 28-April 30 | Linuxwochen Wien 2016 | Vienna, Austria |
| March 25 | July 11-July 17 | SciPy 2016 | Austin, TX, USA |
| April 1 | May 26 | NLUUG - Spring conference 2016 | Bunnik, The Netherlands |
| April 2 | May 2-May 3 | PyCon Israel 2016 | Tel Aviv, Israel |
| April 7 | April 8-April 10 | mini Linux Audio Conference 2016 | Berlin, Germany |
| April 8 | August 2-August 5 | Flock to Fedora | Krakow, Poland |
| April 15 | June 27-July 1 | 12th Netfilter Workshop | Amsterdam, Netherlands |
| April 15 | June 22-June 26 | openSUSE Conference 2016 | Nürnberg, Germany |
| April 24 | August 20-August 21 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Get ready to Fork the System at LibrePlanet
Registration is open for LibrePlanet, which will be held March 19-20 in Cambridge, MA. "This year's conference program will examine how free software creates the opportunity of a new path for its users, allows developers to fight the restrictions of a system dominated by proprietary software by creating free replacements, and is the foundation of a philosophy of freedom, sharing, and change. Sessions like "Yes, the FCC might ban your operating system" and "GNU/Linux and Chill: Free software on a college campus" will offer insights about how to resist the dominance of proprietary software, which is often built in to university policies and government regulations."
Power Management and Energy-Awareness Microconference
The Power Management and Energy-Awareness Microconference has been accepted into the 2016 Linux Plumbers Conference, which will be held November 2-4 in Santa Fe, NM. "This microconference will look at elimination of timers from cpufreq governors, unifying idle management in SoCs with power resources shared between CPUs and I/O devices, load balancing utilizing workload consolidation and/or platform energy models to improve energy-efficiency and/or performance, improving CPU frequency selection efficiency by utilizing information provided by the scheduler in intel_pstate and cpufreq governors, idle injection (CFS-based vs. play idle with kthreads), ACPI compliance tests (from the power management perspective) and more."
Events: February 25, 2016 to April 25, 2016
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| February 24-February 25 | AGL Member's Meeting | Tokyo, Japan |
| February 27 | Open Source Days | Copenhagen, Denmark |
| March 1 | Icinga Camp Berlin | Berlin, Germany |
| March 1-March 6 | Internet Freedom Festival | Valencia, Spain |
| March 8-March 10 | Fluent 2016 | San Francisco, CA, USA |
| March 9-March 11 | 18th German Perl Workshop | Nürnberg, Germany |
| March 10-March 12 | Studencki Festiwal Informatyczny (Students' Computer Science Festival) | Cracow, Poland |
| March 11-March 13 | PyCon SK 2016 | Bratislava, Slovakia |
| March 11-March 13 | Zimowisko Linuksowe TLUG | Puck, Poland |
| March 14-March 17 | Open Networking Summit | Santa Clara, CA, USA |
| March 14-March 18 | CeBIT 2016 Open Source Forum | Hannover, Germany |
| March 16-March 17 | Great Wide Open | Atlanta, GA, USA |
| March 18-March 20 | FOSSASIA 2016 Singapore | Singapore, Singapore |
| March 19-March 20 | Chemnitzer Linux Tage 2016 | Chemnitz, Germany |
| March 19-March 20 | LibrePlanet | Boston, MA, USA |
| March 23 | Make Open Source Software 2016 | Bucharest, Romania |
| March 29-March 31 | Collaboration Summit | Lake Tahoe, CA, USA |
| April 1 | DevOps Italia | Bologna, Italy |
| April 4-April 8 | OpenFabrics Alliance Workshop | Monterey, CA, USA |
| April 4-April 6 | Web Audio Conference | Atlanta, GA, USA |
| April 4-April 6 | Embedded Linux Conference | San Diego, CA, USA |
| April 4-April 6 | OpenIoT Summit | San Diego, CA, USA |
| April 5-April 7 | Lustre User Group 2016 | Portland, OR, USA |
| April 6 | PostgreSQL and PostGIS, Session #8 | Lyon, France |
| April 7-April 8 | SRECon16 | Santa Clara, CA, USA |
| April 8-April 10 | mini Linux Audio Conference 2016 | Berlin, Germany |
| April 9-April 10 | OSS Weekend | Bratislava, Slovakia |
| April 11-April 13 | O’Reilly Software Architecture Conference | New York, NY, USA |
| April 15-April 18 | Libre Graphics Meeting | London, UK |
| April 15-April 17 | PyCon Italia Sette | Firenze, Italia |
| April 15-April 17 | Akademy-es 2016 | Madrid, Spain |
| April 16 | 15. Augsburger Linux Info Tag | Augsburg, Germany |
| April 18-April 19 | Linux Storage, Filesystem & Memory Management Summit | Raleigh, NC, USA |
| April 18-April 20 | PostgreSQL Conference US 2016 | New York, NY, USA |
| April 20-April 21 | Vault 2016 | Raleigh, NC, USA |
| April 21-April 24 | GNOME.Asia Summit | Delhi, India |
| April 23 | DevCrowd 2016 | Szczecin, Poland |
| April 23-April 24 | LinuxFest Northwest | Bellingham, WA, USA |
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol
