
LWN.net Weekly Edition for February 25, 2016

Trouble at Linux Mint — and beyond

By Jonathan Corbet
February 24, 2016
When the Linux Mint project announced that, for a while on February 20, its web site had been changed to point to a backdoored version of its distribution, the open-source community took notice. Everything we have done is based on the ability to obtain and install software from the net; this incident was a reminder that this act is not necessarily as safe as we would like to think. We would be well advised to think for a bit on the implications of this attack and how we might prevent similar attacks in the future.

It would appear that the attackers were able to take advantage of a WordPress vulnerability on the Linux Mint site to obtain a shell; from there, they were able to change the web site's download page to point to their special version of the Linux Mint 17.3 Cinnamon edition. It also appears that the Linux Mint site was put back on the net without being fully secured; the attackers managed to compromise the site again on the 21st, restoring the link to the corrupted download. Anybody who downloaded this distribution anywhere near those two days will want to have a close look at what they got.
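A close look starts with comparing the downloaded image against the checksums the project publishes (and, ideally, against a GPG signature over that checksum list, since an attacker who can edit a download page can usually edit a plain checksum list too). A minimal sketch in Python; the filename and the idea of a separately verified expected value are illustrative, not Mint's actual release data:

```python
import hashlib

def sha256_of(path):
    """Hash a file in 1MB chunks so a multi-gigabyte ISO need not
    fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# "expected" would come from the project's (ideally GPG-signed)
# checksum list; the filename here is hypothetical:
#   if sha256_of("linuxmint-17.3-cinnamon-64bit.iso") != expected:
#       print("checksum mismatch -- do not install this image")
```

The backdoored images necessarily had different checksums from the official ones, which is why a signed checksum list, verified out-of-band, is the usual defense against a compromised download page.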

The Linux Mint developers have taken a certain amount of grief for this episode, and for their approach to security in general. They do not bother with security advisories, so their users have no way to know if they are affected by any specific vulnerability or whether Linux Mint has made a fixed package available. Putting the web site back online without having fully secured it mirrors a less-than-thorough approach to security in general. These are charges that anybody considering using Linux Mint should think hard about. Putting somebody's software onto your system places the source in a position of great trust; one has to hope that they are able to live up to that trust.

It could be argued that we are approaching the end of the era of amateur distributions. Taking an existing distribution, replacing the artwork, adding some special new packages, and creating a web site is a fair amount of work. Making a truly cohesive product out of that distribution and keeping the whole thing secure is quite a bit more work. It's not that hard to believe that only the largest and best-funded projects will be able to sustain that effort over time, especially when faced with an increasingly hostile and criminal net.

There is just one little problem with that view: it's not entirely clear that the larger, better-funded distributions are truly doing a better job with security. It probably is true that they are better able to defend their infrastructure against attacks, have hardware security modules to sign their packages, etc. But a distribution is a large collection of software, and few distributors can be said to be doing a good job of keeping all of that software secure.

So, for example, we have recently seen this article on insecure WordPress packages in Debian, and this posting on WebKit security in almost all distributions. Both are heavily used pieces of software that are directly exposed to attackers on the net; one would hope that distributors would be focused on keeping them secure. But that has not happened; the projects and companies involved simply have not found the resources to stay on top of the security of these packages.

It is not hard to see how this could be a widespread problem. When users evaluate distributions, the range of available packages tends to be an important criterion. Distributors thus have an incentive to include as many packages as they can — far more than they can support at a high level. The one exception might be the enterprise distributions which, one would hope, would be more conservative in the packages they choose to provide for their customers. But such distributions tend to ship old software (which can have problems of its own) and are often accompanied by add-on repositories filling in the gaps — and possibly introducing security problems of their own.

The situation is seemingly getting murkier rather than better. Some projects try to get users to install their software directly rather than use the distribution's packages. That might lead to better support for that one package, but it adds another moving part to the mix and shorts out all of the mechanisms put in place to get security updates to users. Language-specific packages are often more easily installed from a repository like CPAN or PyPI, but these organizations, too, do not issue security advisories and almost certainly do not have the resources in place to ensure that they are not distributing packages with known vulnerabilities. Many complex applications support some form of plugins and host repositories for them; the attention to security there is mixed at best. Projects like Docker host repositories of images for download. Public hosting sites deliver a lot of software, but are in no position to guarantee the security of that software. And so on.

Combine all this with a net full of bad actors who are intent on installing malware onto users' systems, and the stage is set for a lot of unhappiness. Indeed, it is surprising that there have not been more incidents than we have seen so far. There can be no doubt that we will see more of them in the future.

As a community, we take a certain amount of pride in the security of our software. But, regardless of whether that pride is truly justified, we are all too quick to grab software off some server on the net and run it on our systems — and to encourage others to do the same. As a community, we are going to have to learn to do a better job of keeping our infrastructure secure, of not serving insecure software to others, and of critically judging the security of providers before accepting software from them. It is not fun to feel the need to distrust software given to the community by our peers, but the alternatives may prove to be even less fun than that.

Comments (22 posted)

New video and music features in Kodi 16

By Nathan Willis
February 24, 2016

Version 16.0 of the Kodi free-software media center was released on February 21. Unlike some previous releases that debuted major new features (such as Android support in 13.0 or H.265 video support in 14.0), this new release appears comparatively quiet on the surface. Under the hood, though, there are several changes that will make life easier for users, particularly where user-interface issues and system maintenance are concerned.

[Kodi 16 video playback]

The installation procedure is a rather cut-and-dried affair at this point. Official builds are provided by the project for several Ubuntu-based distributions, as are community-maintained builds for Fedora, Debian, and OpenELEC. There are also packages for download for Android (both ARM and x86), a variety of Raspberry-Pi–specific distributions, Windows, iOS, OS X, BSD, and for an ever-growing list of consumer hardware devices like Amazon's Fire TV. Apart from the iOS and consumer-hardware cases (which may involve jailbreaking steps), little is required to get up and running.

The Kodi user interface, likewise, has stabilized over the past few release cycles, and navigation is more streamlined and logical than it was in years past. There are still UI inconsistencies to be found, such as whether the "close" button appears on the right or the left of a pop-over window. And in some instances, multi-page blocks of text (such as the "Description" block in the Add-ons Manager) can only be scrolled by using the mouse, not with arrow or page up/down keys. But, on the whole, media sources and configuration options are reachable in only a few clicks, it is difficult if not impossible to get lost, and the interface is virtually devoid of arcane internal terminology. The latter is a considerable accomplishment indeed, given that the interface has to explain concepts like video calibration and debug logging.

Speaking of logging, one of the most-highlighted new features in this release is Kodi's new event-logging framework. This is a mechanism that provides an in-application, browseable view of a wide variety of events: adding new media to the library, altering settings, installing add-ons, and so forth. The logged events include errors and warnings, which the release notes highlight as a feature that users have missed in past releases—leaving them unable to troubleshoot problems when newly-added media fails to show up in the library, for example.

[Kodi music browser]

An example of the subtle improvements in 16.0 is the revamped Music Library feature. Kodi has developed a reputation for admirable handling of video content (both local and remote), but has let its support for serving as a music manager languish. The 16.0 release marks the start of a renewed effort to rectify the situation. Adding new audio content is now much simpler, and Kodi automatically scans and adds the relevant metadata from the files.

There is also a new framework in place to support advanced audio processing. Though it is not active yet, in future releases it will pave the way for a number of audio features, like multi-channel equalizers, "fake surround sound," and a variety of other effects.

Deeper under the hood, the new release brings two changes to the way skins and other add-ons are stored. The first is that the file layout used within skin add-ons has been changed to match that of other add-on types; this was primarily done to make it easier to migrate settings from one skin to another. The second and perhaps more interesting change is that add-ons can now share image resources. It may take some time for add-on authors to begin taking advantage of the feature, but it will enable skins to, for example, provide a customized look to other add-ons (such as theming the icons of media sources to match the UI).

[Kodi nonlinear video stretching]

On the video front, perhaps the most obvious new addition is support for non-linear stretching of 4:3 content to fit on 16:9 displays. The technique employed tries to retain the center of the screen without visible distortion and progressively stretches out the image closer to the sides of the display. Of course, purists still might scoff at anyone deigning to watch Citizen Kane in anything other than the original aspect ratio, but there are surely instances when such elongation is necessary. Users who employ Kodi as a digital video recorder (DVR) will be pleased to note that Kodi's PVR module (from "personal" video recorder) now supports "series" recording rules, which is a staple of most other DVR applications.
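The idea behind such a stretch is easy to model with a little math. This is an illustrative curve, not Kodi's actual implementation: map each normalized display coordinate to a source coordinate with a cubic whose slope at the center leaves those pixels undistorted, so the distortion piles up near the edges:

```python
def stretch_sample_pos(u, a=4/3):
    """Map a normalized 16:9 display coordinate u in [-1, 1] to the
    4:3 source coordinate to sample. The cubic hits the source edges
    exactly (s(+/-1) = +/-1) and has slope `a` at the center; a = 4/3
    is what leaves center pixels undistorted when 4:3 content fills a
    16:9 screen. Illustrative only -- not Kodi's actual curve."""
    return a * u + (1 - a) * u ** 3

# ds/du falls from 4/3 at the center to 1/3 at the edges, so edge
# pixels end up magnified about four times as much as center pixels.
```

Any monotonic curve with these boundary and center conditions would do; the cubic is just the simplest one to write down.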

Those users with 3D displays (either 3D-capable TVs or virtual-reality headsets) will get to sample a unique new UI feature: Kodi UI "skins" can now employ 3D depth effects, with the default skin ("Confluence") providing an example. The much larger group of users without 3D displays also get some UI improvements, however. Most notably, the "long press" action is now supported in Kodi's remote-control command mapping. That makes it possible to use Kodi with a number of modern, simple remotes—where the recent trend is toward directional arrow keys, a "Select" or "OK" button, and little else.

[Kodi logging settings]

This style of remote is particularly popular with consumer hardware like the Fire TV; users who control Kodi through other means (such as a wireless keyboard) are unaffected. The long press is bound to Kodi's context-menu action by default, so it pops up a menu of additional commands. Those using Kodi on a Linux box with touchscreen support have yet another UI option, as Kodi now supports multi-touch gestures. Gesture support has been available in the Android and iOS releases for some time; there is a small set of gestures recognized by default, though it is configurable.

Finally, the Android rendering stack has been reworked to cope with 4K displays. In earlier releases, both the Kodi UI and any video content being displayed were rendered to the same surface, using libstagefright. But that made it impossible to render the UI and the video at different resolutions. Rendering the 4K version of the Kodi UI brought interactivity to a crawl on most Android devices, while limiting video playback to 720p or 1080p resolution would defeat the purpose of 4K support. Starting with the 16.0 release, the video stream and the UI are rendered to separate MediaCodec surfaces (rather than libstagefright), thus enabling 4K hardware-accelerated video while keeping the UI at its native, non-4K resolution.

As a project, Kodi relies heavily on the community of add-on and skin developers for implementing new user-facing features. So as the core application matures, there may be fewer big developments in every release cycle. Nevertheless, as the 16.0 release shows, there will always be room for improvements. Some of the new under-the-hood functionality will take time to trickle out as developers update add-ons and skins, but there is certainly enough in the new release for users to be happy with the upgrade.

Comments (2 posted)

Systemd vs. Docker

February 24, 2016

This article was contributed by Josh Berkus


DevConf.cz

There were many different presentations at DevConf.cz, the developer conference sponsored by Red Hat in Brno, Czech Republic this year, but containers were the biggest theme of the conference. Most of the presentations were practical, either tutorials showing how to use various container technologies like Kubernetes and Atomic.app, or guided tours of new products like Cockpit.

However, the presentation about containers that was unquestionably the most entertaining was given by Dan Walsh, Red Hat's head of container engineering. He presented on one of the core conflicts in the Linux container world: systemd versus the Docker daemon. This is far from a new issue; it has been brewing since Ubuntu adopted systemd, and CoreOS introduced Rocket, a container system built around systemd.

Systemd vs. Docker

"This is Lennart Poettering," said Walsh, showing a picture. "This is Solomon Hykes", showing another. "Neither one of them is willing to compromise much. And I get to be in the middle between them."

Since Walsh was tasked with getting systemd to work with Docker, he detailed a history of code, personal, and operational conflicts between the two systems. In many ways, it was also a history of patch conflicts between Red Hat and Docker Inc. Poettering is the primary author of systemd and works for Red Hat, while Hykes is a founder and CTO of Docker, Inc.

[Dan Walsh]

According to Walsh's presentation, the root cause of the conflict is that the Docker daemon is designed to take over a lot of the functions that systemd also performs for Linux. These include initialization, service activation, security, and logging. "In a lot of ways Docker wants to be systemd," he claimed. "It dreams of being systemd."

The first conflict he detailed was about service initialization and restart. In the systemd model, all of this is controlled by systemd; in the Docker world, it is all controlled by the Docker daemon. For example, services can be defined in systemd unit files as "docker run" statements to run them as containers, or they can be defined as "autorestart" containers in the Docker daemon. Either approach can work, but mixing them doesn't. The Docker documentation recommends Docker autorestart, except when mixing containerized services with services not in a container; there it recommends systemd or Upstart.
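A sketch of the first approach; the unit, container, and image names below are hypothetical, and Docker's own restart policy is deliberately left off so that only one manager supervises the service:

```ini
# Hypothetical unit file: systemd owns the container's lifecycle.
[Unit]
Description=Containerized web front end
After=docker.service
Requires=docker.service

[Service]
# Remove any leftover container from a previous run; the leading
# "-" tells systemd to ignore failure if none exists.
ExecStartPre=-/usr/bin/docker rm -f webfront
ExecStart=/usr/bin/docker run --name webfront example/webfront
ExecStop=/usr/bin/docker stop webfront
Restart=always

[Install]
WantedBy=multi-user.target
```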

Where this breaks down, however, is when services running as containers depend on other containerized services. For regular services, systemd has a feature called sd_notify that passes messages about when services are ready, so that services that depend on them can then be started. However, Docker has a client-server architecture. docker run and other commands are called in the client for each user session, but the containers are started and managed in the Docker daemon (the "server" in this relationship). The client can't send sd_notify status messages because it doesn't actually manage the container service and doesn't know when the services are up, and the daemon can't send them because it wasn't called by the systemd unit file. This resulted in Walsh's team attempting an elaborate workaround to enable sd_notify:

  1. systemd requests sd_notify from the Docker client
  2. That client sends an sd_notify message to the Docker daemon
  3. The daemon sets up a container to do sd_notify
  4. The daemon gets an sd_notify from the container
  5. The daemon sends an sd_notify message to the client
  6. The client sends an sd_notify message to tell systemd that the Docker container is ready

Walsh was unsurprised when the patches to enable this byzantine system were not accepted by the Docker project. sd_notify does work for the Docker daemon itself, so systemd services can depend on the daemon running. But there is still no way to do sd_notify for individual containerized services, so the Docker project still has no reliable way to manage containerized service dependency startup order.
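The notification protocol underneath all this is small: a process sends a datagram of newline-separated KEY=VALUE strings to the Unix socket systemd names in the NOTIFY_SOCKET environment variable. A minimal Python rendition, equivalent in spirit to libsystemd's sd_notify(3) rather than to any Docker code:

```python
import os
import socket

def sd_notify(state):
    """Send a state string (e.g. "READY=1") to the Unix datagram
    socket named by $NOTIFY_SOCKET, as libsystemd's sd_notify(3)
    does. Returns False when no notification socket is configured."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.sendall(state.encode())
    return True
```

The protocol itself is trivial; the difficulty described above is entirely about *which* process holds the socket and knows when the service is actually ready.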

Systemd has a feature called "socket activation", where services start automatically upon receiving a request to a particular network socket. This lets servers support "occasionally needed" services without running them all the time. There used to be support for socket activation of the Docker daemon itself, but the feature was disabled because it interfered with Docker autorestart.

Walsh's team was more interested in socket activation of individual containers. This would have the benefit of eliminating the overhead of "always on" containers. However, the developers realized that they'd have to do something similar to the sd_notify workaround, only they'd be passing around a socket instead of just a message. They didn't even try to implement it.

Linux control groups, or cgroups, let you define per-service system resource allocations, such as CPU, memory, and I/O limits. Systemd allows defining cgroup limits in unit files, so that you can define resource profiles for services when they start. With Docker, though, this runs afoul of the client-server model again. The systemd cgroup settings affect only the client; they do not affect the daemon process, where the container is actually running. Instead, each container inherits the cgroup settings of the Docker daemon. Users can set cgroup limits by adding flags to the docker run command instead, which works but does not integrate with the overall administrative policies for the system.

The only success story Walsh had to relate was regarding logging. Docker logs also didn't work with systemd's journald. Logging of container output was local to each container, which would cause all logs to be automatically erased whenever a container was deleted. This was a major failing in the eyes of security auditors. Docker 1.9 now supports the --log-driver=journald switch, which logs to journald instead. However, using journald is not the default for Docker containers, so the switch needs to be passed each time.

Systemd inside containers

Walsh also wanted to get systemd working in Fedora, Red Hat Enterprise Linux (RHEL), and CentOS container base images, partly because many packages require the systemctl utility in order to install correctly. His first effort was something called "fakesystemd" that replaced systemctl with a service that satisfied the systemctl requirement for packages and did nothing else. This turned out to cause problems for users and he soon abandoned it, but not soon enough to prevent it from being released in RHEL 7.0.

In RHEL 7.1, the team added something called "systemd-container", which was a substantially reduced version of systemd. This still caused problems for users who needed full systemd for their software, and Poettering pressured the container team to change it. As of RHEL 7.2, containers have real systemd with decreased dependencies installed so that it can be a little smaller. Walsh's team is working on reducing these dependencies further.

The biggest problem with not having systemd in the container, according to Walsh, is that it goes "back to the days before init scripts." Each image author creates his or her own crazy startup script for the application inside the container, instead of using the startup scripts crafted by the packagers. He demonstrated how easy service initialization is inside a container that has systemd available by showing the three-line Dockerfile that is all that is required to create a container running the Apache httpd server:

    FROM fedora
    RUN yum -y install httpd; yum clean all; systemctl enable httpd;
    CMD [ "/sbin/init" ]
[DockerCon.EU badge]

There is a major roadblock to making systemd inside Docker work, though: running a container with systemd inside requires running it with the --privileged flag, which makes it insecure. This is because the Docker daemon requires the "service" application run by the container to always be PID 1. In a container with systemd inside, systemd is PID 1 and the application has some other PID, which causes Docker to think the container has failed and shut it down.

Poettering says that PID 1 has special requirements. One of these is reaping "zombie" processes that have been orphaned by their parents. This is a real problem for Docker since the application runs as PID 1 and does not handle the zombie processes. For example, containers running the Oracle database can end up with thousands of zombie processes. Another requirement is handling writes to syslog, which go to /dev/null unless the container has been configured to log to journald.
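The reaping duty is simple to state in code: PID 1 must wait() on any child whose parent has exited, or the dead process lingers in the process table as a zombie. A minimal sketch of the loop a bare-bones init would run on SIGCHLD (illustrative, not systemd's code):

```python
import os

def reap_zombies():
    """Collect every child process that has already exited -- the
    PID-1 duty described above. Returns the list of reaped PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break                # no children at all
        if pid == 0:
            break                # children exist, but none have exited
        reaped.append(pid)
    return reaped
```

An application that was never designed to be an init process has no reason to contain such a loop, which is exactly how a containerized database ends up with thousands of zombies.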

Walsh tried several approaches to make systemd work in non-privileged containers, submitting four different pull requests (7685, 10994, 13525, and 13526) to the Docker project. Each of these pull requests (PRs) was rejected by the Docker maintainers. Arguments around these changes peaked when Jessie Frazelle, a Docker committer, came to DockerCon.EU 2015 with the phrase "I say no to systemd specific PRs" printed on her badge (seen at right).

The future of systemd and containers

The Red Hat container team has also been heavily involved in developing the runC tool of the Open Container Project. That project is the practical output of the Open Container Initiative (OCI), the non-profit council established through the Linux Foundation in 2015 in order to set industry standards for container APIs. The OCI also maintains libcontainer, the library that Docker uses to launch containers. According to Walsh, Docker will eventually need to adopt runC as part of its stack in order to be able to operate on other platforms, particularly Windows.

Using work from runC, Red Hat staff have created a patch set called "oci-hooks" that adds a lot of the systemd-supporting functionality to Docker. It makes use of a "hook" that can activate any executables found in a specific directory between the time the container starts up and when the application is running. Among the things executed by this method is the RegisterMachine hook, which notifies systemd's machinectl on the host that the container is running. This lets users see all Docker containers, as well as runC containers, using the machinectl command:

    # machinectl
    MACHINE                          CLASS     SERVICE
    9a65036e4a6dc769d0e40fa80871f95a container docker 
    fd493b71a79c2b7913be54a1c9c77f1c container runc
    2 machines listed.

The hooks also allow running systemd in non-privileged containers. This PR (17021) was also rejected by the Docker project. Nevertheless, it is being included in the Docker packages that are shipped by Red Hat. So part of the future of Docker and systemd may involve forking Docker.

Walsh also pointed out that cgroups, sd_notify, and socket activation all work out of the box with runC. This is because runC does not use Docker's client-server model; it is just an executable. He does not see the breach between Docker Inc. and Red Hat over systemd healing in the future. Walsh predicted that Red Hat would probably be moving more toward runC and away from the Docker daemon. According to him, Docker is working on "containerd", its new alternative to systemd, which will take over the functions of the init system.

Given the rapid changes in the Linux container ecosystem in the short time since the Docker project was launched, though, it is almost impossible to predict what the relationship between systemd, Docker, and runC will look like a year from now. Undoubtedly there will be plenty more changes and conflicts to report.

[ Josh Berkus works for Red Hat. ]

Comments (151 posted)

Page editor: Jonathan Corbet

Security

The Glibc DNS resolution vulnerability

By Jake Edge
February 24, 2016

While the recently disclosed GNU C library (Glibc) DNS bug (CVE-2015-7547) is quite serious, one of the interesting aspects is that the real scope of the problem is not yet known. The ability to exploit this bog-standard buffer overflow is dependent on a number of other factors, so there are no known vectors for widespread code execution—publicly known, anyway. There are certainly millions of vulnerable systems out there, many of which are not likely to ever be patched, but it is truly unclear if that will lead to large numbers of compromised systems.

There are a number of obstacles in the way of an attacker wishing to exploit this bug. First off, a client application must call getaddrinfo() to resolve a domain name and use the AF_UNSPEC address family. That family indicates that either an IPv4 or IPv6 address is acceptable, which is the normal way that getaddrinfo() is called these days. Glibc then does two parallel queries for the A and AAAA records for the domain. It is the buffer handling in this parallel query step where things go awry.
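The calling pattern is easy to see from Python, whose socket.getaddrinfo() is a thin wrapper around the C library's getaddrinfo(); this sketch queries only "localhost":

```python
import socket

# AF_UNSPEC means "IPv4 or IPv6, whichever is available", which is
# what sends glibc down the parallel A/AAAA query path described
# above.
results = socket.getaddrinfo("localhost", 80, socket.AF_UNSPEC,
                             socket.SOCK_STREAM)
for family, _type, _proto, _canon, sockaddr in results:
    print(family, sockaddr)
```

This wrapping is also why "memory-safe" language runtimes are exposed to a C library bug, a point the article returns to below.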

Many systems are not configured to query a local caching nameserver; instead, Glibc will make a query to the remote nameserver that was configured (or auto-configured by DHCP or the like) for the system. That means these two queries leave the system and, crucially, replies are received. Typically, DNS replies are short, but they can be as large as 64KB. Glibc allocates 2048 bytes on the stack for the reply, but it has provisions to allocate a larger heap buffer for replies that do not fit. Unfortunately, if the query needs to be retried, the stack buffer gets used instead of the larger, newly allocated buffer, so roughly 62KB of attacker-controlled data can be written to the stack.
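The shape of the bug can be modeled in a few lines. This is a deliberately simplified toy, nothing like glibc's actual code: the buffer is enlarged correctly on the first pass, but the retry path reverts to the original stack buffer while still trusting the larger size:

```python
STACK_BUF_SIZE = 2048   # what glibc reserves on the stack

def choose_buffer(reply_len, retry):
    """Toy model of the CVE-2015-7547 mix-up. Returns which buffer
    is written to and how many bytes go into it."""
    buf, size = "stack", STACK_BUF_SIZE
    if reply_len > size:
        buf, size = "heap", reply_len    # correctly enlarged...
    if retry:
        buf = "stack"                    # ...but the retry forgets that
    return buf, min(reply_len, size)

# A 4000-byte reply on the retry path: 4000 bytes land in a buffer
# that only holds 2048 -- the stack overflow.
print(choose_buffer(4000, retry=True))
```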

There are still more requirements to make all of this happen, though. Normally, UDP is used for the query, which typically limits replies to 512 bytes, though a man-in-the-middle (MITM) attacker could send more data. Any server, however, can set the "truncation bit" in its reply to cause the client to switch to TCP for the query. Causing the client to retry is evidently tricky, but can be done. The net result can be as bad as a bunch of attacker data on the stack, but even that may be difficult to turn into code execution due to address-space layout randomization (ASLR) and other defensive measures.
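The TCP switch is driven by a single bit in the 12-byte DNS header, which a few lines of Python can illustrate (a hand-rolled check for illustration, not a resolver):

```python
import struct

def is_truncated(reply):
    """Check the TC (truncation) bit in a raw DNS reply header. A
    resolver that sees TC=1 retries the query over TCP, where the
    reply is not subject to UDP's traditional 512-byte limit."""
    if len(reply) < 12:
        raise ValueError("DNS header is 12 bytes")
    (flags,) = struct.unpack("!H", reply[2:4])
    return bool(flags & 0x0200)

# A minimal response header: ID, a flags word with QR and TC set,
# and zeroed section counts.
truncated_reply = struct.pack("!6H", 0x1234, 0x8200, 0, 0, 0, 0)
print(is_truncated(truncated_reply))   # True
```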

On the surface, it would seem that an attacker could simply set up a DNS server for their domain that would send malicious responses (while causing retries), then "force" clients into looking up this compromised domain. But there are complications; most notably, any caching resolvers between the attacker and victim will reject most or all of the malicious responses because they aren't well-formed. It is unclear, however, whether cache-surviving, malicious responses can be constructed.

In a detailed advisory, Glibc developer Carlos O'Donell of Red Hat indicated that the possibility exists:

A back of the envelope analysis shows that it should be possible to write correctly formed DNS responses with attacker controlled payloads that will penetrate a DNS cache hierarchy and therefore allow attackers to exploit machines behind such caches.

Dan Kaminsky followed up on that in his own detailed analysis:

I’m just going to state outright: Nobody has gotten this glibc flaw to work through caches yet. So we just don’t know if that’s possible.

But Kaminsky goes on to posit that "some networks are going to be vulnerable to some cache traversal attacks sometimes", under the theory that attacks only get better over time. The emphasis on the cache is important. An MITM attacker does not need the malicious responses to reside in intermediary caching resolvers (and an MITM can do plenty of other malicious things), but others who might want to exploit this flaw do need that. If a way is found to get these malicious responses into caches, CVE-2015-7547 gets a whole lot worse.

The scope of programs affected by the vulnerability is rather surprising as well. As Kaminsky and others have noted, the problem affects many different programs, from sudo and httpd to gpg and ssh—and beyond. Languages like Python, Haskell, JavaScript, and others are also affected. Some of these "memory-safe" languages protect against buffer overflows in programs written in the language, but the runtimes for those languages use Glibc, so flaws at that level can still affect them. And plenty of programs look up domain names for a variety of reasons. As Kaminsky put it:

If a DNS vulnerability could work through the DNS hierarchy, we would be in a whole new class of trouble, because it is just extraordinarily easy to compel code that does not trust you to retrieve arbitrary domains from anywhere in the DNS. You connect to a web server, it wants to put your domain in its logs, it’s going to look you up. You connect to a mail server, it wants to see if you’re a spammer, it’s going to look you up. You send someone an email, they reply. How does their email find you? Their systems are going to look you up.

Clearly the best "mitigation" is to update affected systems, but that may not be possible in many cases. There are an enormous number of Glibc-using devices out there (e.g. some home routers) that rarely, if ever, get updates. Even if updates are released, getting them into the hands of users and onto the devices is decidedly non-trivial. That has some looking for other types of mitigation.

One that is often mentioned is limiting the size of DNS replies. If no reply is large enough to tickle the bug, then devices running the old code won't be affected. That still doesn't solve the MITM problem, but Kaminsky also argued that length-limiting will have other hard-to-diagnose effects, so it should be avoided. There is a reason that DNS has been engineered to allow for larger responses, so it is effectively too late to put that cat back in the bag.

Using a local caching resolver, rather than requiring Glibc to query the network, will also help in environments where that is possible. If cache-traversing responses eventually surface, they can be handled at that level. Both local and remote caching servers can be changed as we learn more over time. Kaminsky described it this way:

Caching resolvers will learn how to specially handle the case of simultaneous A and AAAA requests. If we’re protected from traversing attacks it’s because the attacker just can’t play a lot of games between UDP and TCP and A and AAAA responses. As we learn more about when the attacks can traverse caches, we can intentionally work to make them not.

Some devices, Android devices in particular, use different C libraries, which are presumably not vulnerable to this particular flaw. There are undoubtedly other vulnerabilities in those, with unknown effects and scope—at least publicly. The bug in Glibc has existed for almost eight years (it was introduced in Glibc 2.9 in May 2008); it is hard to guess what else lurks there—or elsewhere.

It is refreshing to see a security vulnerability disclosed without a name, logo, animated GIF, and hype-ridden web page touting it. Instead we have the disclosure announcements along with some sober analysis of what it all might mean. That used to be the norm and, while it may be a little awkward to use "CVE-2015-7547" rather than some catchy name, it is a welcome change from the hoopla surrounding Heartbleed, GHOST, and others.

Comments (3 posted)

Brief items

Security quotes of the week

The San [Bernardino] shooters legally purchased weapons that resulted in all those deaths. And the big legal push the US Government has decided to make in response?

It’s decided to seek a precedent that would allow it to force every American company to create a backdoor for the Government to snoop on anyone it so pleases.

The logic is outrageous: “People got shot. So we need a backdoor into your phone.”

James Allworth

Based on information currently available, we can see that the government effectively locked themselves out of the iPhone in question -- I prefer to charitably assume through error and/or incompetence, rather than the darker possibility of a purposeful plan to force the crypto backdoor controversies more directly into the spotlight of politics during a contentious election year.

Passwords were changed under FBI orders that should not have been. San Bernardino officials did not avail themselves of common device management software that could have prevented this entire problem -- software of a sort that most responsible corporations and other organizations already use with company-owned smartphones in employee hands.

Lauren Weinstein

Somewhere between base arithmetic and x86 is a sandbox people can’t just walk in and out of. To put it bluntly, if this code had been written in JavaScript – yes, really – it wouldn’t have been vulnerable. Even if this network exposed code remained in C, and was just compiled to JavaScript via Emscripten, it still would not have been vulnerable. Efficiently microsandboxing individual codepaths is a thing we should start exploring. What can we do to the software we deploy, at what cost, to actually make exploitation of software flaws actually impossible, as opposed to merely difficult?
Dan Kaminsky on the Glibc DNS vulnerability

Comments (28 posted)

Kaminsky: A Skeleton Key of Unknown Strength

Dan Kaminsky looks at the Glibc DNS bug (CVE-2015-7547). "We’ve investigated the DNS lookup path, which requires the glibc exploit to survive traversing one of the millions of DNS caches dotted across the Internet. We’ve found that it is neither trivial to squeeze the glibc flaw through common name servers, nor is it trivial to prove such a feat is impossible. The vast majority of potentially affected systems require this attack path to function, and we just don’t know yet if it can. Our belief is that we’re likely to end up with attacks that work sometimes, and we’re probably going to end up hardening DNS caches against them with intent rather than accident. We’re likely not going to apply network level DNS length limits because that breaks things in catastrophic and hard to predict ways."

Comments (5 posted)

New vulnerabilities

chromium: code execution

Package(s):chromium-browser CVE #(s):CVE-2016-1628
Created:February 22, 2016 Updated:February 24, 2016
Description: From the CVE entry:

pi.c in OpenJPEG, as used in PDFium in Google Chrome before 48.0.2564.109, does not validate a certain precision value, which allows remote attackers to execute arbitrary code or cause a denial of service (out-of-bounds read) via a crafted JPEG 2000 image in a PDF document, related to the opj_pi_next_rpcl, opj_pi_next_pcrl, and opj_pi_next_cprl functions.

Alerts:
Mageia MGASA-2016-0127 chromium-browser-stable 2016-03-31
Gentoo 201603-09 chromium 2016-03-12
Debian DSA-3486-1 chromium-browser 2016-02-21

Comments (none posted)

chromium: code execution

Package(s):chromium CVE #(s):CVE-2016-1629
Created:February 22, 2016 Updated:February 24, 2016
Description: From the CVE entry:

Google Chrome before 48.0.2564.116 allows remote attackers to bypass the Blink Same Origin Policy and a sandbox protection mechanism via unspecified vectors.

Alerts:
Mageia MGASA-2016-0127 chromium-browser-stable 2016-03-31
Gentoo 201603-09 chromium 2016-03-12
Ubuntu USN-2905-1 oxide-qt 2016-02-23
Red Hat RHSA-2016:0286-01 chromium-browser 2016-02-23
openSUSE openSUSE-SU-2016:0525-1 chromium 2016-02-20
openSUSE openSUSE-SU-2016:0520-1 chromium 2016-02-20
openSUSE openSUSE-SU-2016:0529-1 Chromium 2016-02-20
Debian DSA-3486-1 chromium-browser 2016-02-21
Arch Linux ASA-201602-17 chromium 2016-02-21

Comments (none posted)

didiwiki: unintended access

Package(s):didiwiki CVE #(s):CVE-2013-7448
Created:February 22, 2016 Updated:April 12, 2016
Description: From the Debian advisory:

Alexander Izmailov discovered that didiwiki, a wiki implementation, failed to correctly validate user-supplied input, thus allowing a malicious user to access any part of the filesystem.

Alerts:
Debian DSA-3485-2 didiwiki 2016-04-12
Debian-LTS DLA-424-1 didiwiki 2016-02-22
Debian DSA-3485-1 didiwiki 2016-02-20

Comments (none posted)

ffmpeg: denial of service

Package(s):ffmpeg CVE #(s):CVE-2016-2329
Created:February 22, 2016 Updated:February 24, 2016
Description: From the CVE entry:

libavcodec/tiff.c in FFmpeg before 2.8.6 does not properly validate RowsPerStrip values and YCbCr chrominance subsampling factors, which allows remote attackers to cause a denial of service (out-of-bounds array access) or possibly have unspecified other impact via a crafted TIFF file, related to the tiff_decode_tag and decode_frame functions.

Alerts:
Gentoo 201606-09 ffmpeg 2016-06-19
openSUSE openSUSE-SU-2016:0528-1 ffmpeg 2016-02-20

Comments (none posted)

GraphicsMagick: out-of-bounds read flaw

Package(s):GraphicsMagick CVE #(s):CVE-2015-8808
Created:February 24, 2016 Updated:February 24, 2016
Description: From the Red Hat bugzilla:

An out-of-bounds read flaw was found in the parsing of GIF files using GraphicsMagick.

Alerts:
Mageia MGASA-2016-0252 graphicsmagick 2016-07-14
Debian-LTS DLA-484-1 graphicsmagick 2016-05-21
Debian DSA-3746-1 graphicsmagick 2016-12-24
Fedora FEDORA-2016-49bf88cd29 vdr-tvguide 2016-02-23
Fedora FEDORA-2016-49bf88cd29 vdr-skinnopacity 2016-02-23
Fedora FEDORA-2016-49bf88cd29 vdr-skinenigmang 2016-02-23
Fedora FEDORA-2016-49bf88cd29 octave 2016-02-23
Fedora FEDORA-2016-49bf88cd29 GraphicsMagick 2016-02-23
Fedora FEDORA-2016-49bf88cd29 gdl 2016-02-23

Comments (none posted)

hamster-time-tracker: two denial of service flaws

Package(s):hamster-time-tracker CVE #(s):
Created:February 18, 2016 Updated:February 25, 2016
Description: The Red Hat bugzilla entries: 1 and 2 have some more information about two different crashes of the server processes.
Alerts:
Fedora FEDORA-2016-c97f297cd6 hamster-time-tracker 2016-02-25
Fedora FEDORA-2016-7d556fdafa hamster-time-tracker 2016-02-17

Comments (none posted)

kernel: privilege escalation

Package(s):kernel CVE #(s):CVE-2016-1576 CVE-2016-1575
Created:February 23, 2016 Updated:February 24, 2016
Description: From the Ubuntu advisory:

halfdog discovered that OverlayFS, when mounting on top of a FUSE mount, incorrectly propagated file attributes, including setuid. A local unprivileged attacker could use this to gain privileges. (CVE-2016-1576)

halfdog discovered that OverlayFS in the Linux kernel incorrectly propagated security sensitive extended attributes, such as POSIX ACLs. A local unprivileged attacker could use this to gain privileges. (CVE-2016-1575)

Alerts:
Ubuntu USN-2910-2 linux-lts-vivid 2016-02-27
Ubuntu USN-2909-2 linux-lts-utopic 2016-02-27
Ubuntu USN-2908-5 linux-lts-wily 2016-02-27
Ubuntu USN-2908-4 kernel 2016-02-26
Ubuntu USN-2908-3 linux-raspi2 2016-02-22
Ubuntu USN-2908-2 linux-lts-wily 2016-02-22
Ubuntu USN-2910-1 linux-lts-vivid 2016-02-22
Ubuntu USN-2909-1 linux-lts-utopic 2016-02-22
Ubuntu USN-2907-2 linux-lts-trusty 2016-02-22
Ubuntu USN-2907-1 kernel 2016-02-22
Ubuntu USN-2908-1 kernel 2016-02-22

Comments (none posted)

libssh: insecure ssh sessions

Package(s):libssh CVE #(s):CVE-2016-0739
Created:February 23, 2016 Updated:March 24, 2016
Description: From the Debian LTS advisory:

Aris Adamantiadis of the libssh team discovered that libssh, an SSH2 protocol implementation used by many applications, did not generate sufficiently long Diffie-Hellman secrets.

This vulnerability could be exploited by an eavesdropper to decrypt and to intercept SSH sessions.

Alerts:
Gentoo 201606-12 libssh 2016-06-26
Red Hat RHSA-2016:0566-01 libssh 2016-04-01
openSUSE openSUSE-SU-2016:0880-1 libssh 2016-03-24
Fedora FEDORA-2016-dc9e8da03c libssh 2016-03-13
openSUSE openSUSE-SU-2016:0722-1 libssh 2016-03-11
Slackware SSA:2016-057-01 libssh 2016-02-26
Fedora FEDORA-2016-d9f950c779 libssh 2016-02-28
Mageia MGASA-2016-0082 libssh 2016-02-24
Debian DSA-3488-1 libssh 2016-02-23
Arch Linux ASA-201602-18 libssh 2016-02-23
Ubuntu USN-2912-1 libssh 2016-02-23
Debian-LTS DLA-425-1 libssh 2016-02-23

Comments (none posted)

libssh2: insecure ssh sessions

Package(s):libssh2 CVE #(s):CVE-2016-0787
Created:February 23, 2016 Updated:November 23, 2016
Description: From the Debian advisory:

Andreas Schneider reported that libssh2, a SSH2 client-side library, passes the number of bytes to a function that expects number of bits during the SSHv2 handshake when libssh2 is to get a suitable value for 'group order' in the Diffie-Hellman negotiation. This weakens significantly the handshake security, potentially allowing an eavesdropper with enough resources to decrypt or intercept SSH sessions.

Alerts:
Gentoo 201606-12 libssh 2016-06-26
Red Hat RHSA-2016:0428-01 libssh2 2016-03-10
CentOS CESA-2016:0428 libssh2 2016-03-10
CentOS CESA-2016:0428 libssh2 2016-03-10
Oracle ELSA-2016-0428 libssh2 2016-03-10
Scientific Linux SLSA-2016:0428-1 libssh2 2016-03-10
Fedora FEDORA-2016-7942ee2cc5 libssh2 2016-03-09
Oracle ELSA-2016-0428 libssh2 2016-03-10
openSUSE openSUSE-SU-2016:0639-1 libssh2_org 2016-03-03
Fedora FEDORA-2016-215a2219b1 libssh2 2016-02-26
Arch Linux ASA-201602-21 lib32-libssh2 2016-02-25
Arch Linux ASA-201602-20 libssh2 2016-02-25
Debian-LTS DLA-426-1 libssh2 2016-02-23
Debian DSA-3487-1 libssh2 2016-02-23
Mageia MGASA-2016-0392 libssh2 2016-11-21

Comments (none posted)

libxmp: multiple vulnerabilities

Package(s):libxmp CVE #(s):
Created:February 18, 2016 Updated:February 24, 2016
Description: From the Mageia advisory:

The libxmp package has been updated to version 4.3.11, fixing several bugs, including possible crashes when loading corrupted input data. See the upstream changelog for details.

Alerts:
Mageia MGASA-2016-0064 libxmp 2016-02-17

Comments (none posted)

mariadb: multiple vulnerabilities

Package(s):mariadb mysql CVE #(s):CVE-2015-4807 CVE-2016-0599 CVE-2016-0601
Created:February 22, 2016 Updated:February 24, 2016
Description: From the CVE entries:

Unspecified vulnerability in Oracle MySQL Server 5.5.45 and earlier and 5.6.26 and earlier, when running on Windows, allows remote authenticated users to affect availability via unknown vectors related to Server : Query Cache. (CVE-2015-4807)

Unspecified vulnerability in Oracle MySQL 5.7.9 allows remote authenticated users to affect availability via unknown vectors related to Optimizer. (CVE-2016-0599)

Unspecified vulnerability in Oracle MySQL 5.7.9 allows remote authenticated users to affect availability via unknown vectors related to Partition. (CVE-2016-0601)

Alerts:
Fedora FEDORA-2016-65a1f22818 community-mysql 2016-03-09
Fedora FEDORA-2016-5cb344dd7e community-mysql 2016-03-09
Fedora FEDORA-2016-868c170507 mariadb 2016-03-05
Fedora FEDORA-2016-e30164d0a2 mariadb 2016-02-21

Comments (none posted)

ntp: three vulnerabilities

Package(s):ntp CVE #(s):CVE-2015-7973 CVE-2015-7975 CVE-2015-7976
Created:February 24, 2016 Updated:February 24, 2016
Description: From the Red Hat bugzilla:

It was found that when NTP is configured in broadcast mode, a man-in-the-middle attacker or a malicious client could replay packets received from the broadcast server to all (other) clients. This could cause the time on affected clients to become out of sync over a longer period of time. (CVE-2015-7973)

It was found that ntpq did not implement a proper length check when calling nextvar(), which executes a memcpy(), on the name buffer. A remote attacker could potentially use this flaw to crash an ntpq client instance. (CVE-2015-7975)

The ntpq saveconfig command does not do adequate filtering of special characters from the supplied filename. Note: the ability to use the saveconfig command is controlled by the 'restrict nomodify' directive, and the recommended default configuration is to disable this capability. If the ability to execute a 'saveconfig' is required, it can easily (and should) be limited and restricted to a known small number of IP addresses. (CVE-2015-7976)
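The recommended restriction can be expressed in ntp.conf along these lines (the management address is a placeholder):

```
# Deny runtime modification (which includes saveconfig) by default
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Allow full access from a single trusted management host only
restrict 192.0.2.10
```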

Alerts:
Ubuntu USN-3096-1 ntp 2016-10-05
SUSE SUSE-SU-2016:2094-1 yast2-ntp-client 2016-08-17
SUSE SUSE-SU-2016:1912-1 ntp 2016-07-29
Gentoo 201607-15 ntp 2016-07-20
openSUSE openSUSE-SU-2016:1423-1 ntp 2016-05-27
SUSE SUSE-SU-2016:1311-1 ntp 2016-05-17
openSUSE openSUSE-SU-2016:1292-1 ntp 2016-05-12
SUSE SUSE-SU-2016:1247-1 ntp 2016-05-06
SUSE SUSE-SU-2016:1177-1 ntp 2016-04-28
SUSE SUSE-SU-2016:1175-1 ntp 2016-04-28
Slackware SSA:2016-054-04 ntp 2016-02-23

Comments (none posted)

obs-service-download_files: code injection

Package(s):obs-service-download_files CVE #(s):
Created:February 22, 2016 Updated:February 24, 2016
Description: From the openSUSE advisory:

Various code/parameter injection issues could have allowed malicious service definition to execute commands or make changes to the user's file system

Alerts:
openSUSE openSUSE-SU-2016:0521-1 obs-service-download_files, 2016-02-20

Comments (none posted)

php-horde-horde: cross-site scripting

Package(s):php-horde-horde CVE #(s):CVE-2015-8807 CVE-2016-2228
Created:February 22, 2016 Updated:February 29, 2016
Description: From the Red Hat bugzilla:

An XSS vulnerability was found in _renderVarInput_number in Horde/Core/Ui/VarRenderer/Html.php, where input in numeric field wasn't properly escaped. (CVE-2015-8807).

A cross-site scripting vulnerability was found in php-horde application framework. No input validation was put in place while searching via the menu bar. (CVE-2016-2228).

Alerts:
Debian DSA-3496-1 php-horde-core 2016-02-28
Debian DSA-3497-1 php-horde 2016-02-28
Fedora FEDORA-2016-3d1183830b php-horde-horde 2016-02-21
Fedora FEDORA-2016-5d0e7f15ef php-horde-horde 2016-02-21

Comments (none posted)

poco: SSL server spoofing

Package(s):poco CVE #(s):CVE-2014-0350
Created:February 22, 2016 Updated:February 24, 2016
Description: From the CVE entry:

The Poco::Net::X509Certificate::verify method in the NetSSL library in POCO C++ Libraries before 1.4.6p4 allows man-in-the-middle attackers to spoof SSL servers via crafted DNS PTR records that are requested during comparison of a server name to a wildcard domain name in an X.509 certificate.

Alerts:
Fedora FEDORA-2016-0b3a611401 poco 2016-02-21
Fedora FEDORA-2016-4a3e5618eb poco 2016-02-21

Comments (none posted)

websvn: cross-site scripting

Package(s):websvn CVE #(s):CVE-2016-2511
Created:February 24, 2016 Updated:March 21, 2016
Description: From the Debian advisory:

Jakub Palaczynski discovered that websvn, a web viewer for Subversion repositories, does not correctly sanitize user-supplied input, which allows a remote user to run reflected cross-site scripting attacks.

Alerts:
Fedora FEDORA-2016-11537160e9 websvn 2016-03-20
Fedora FEDORA-2016-657a1305aa websvn 2016-03-21
Debian-LTS DLA-428-1 websvn 2016-02-24
Debian DSA-3490-1 websvn 2016-02-24

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 4.5-rc5, released on February 20. "Things continue to look normal, and things have been fairly calm. Yes, the VM THP cleanup seems to still be problematic on s390, but other than that I don't see anything particularly worrisome."

Stable updates: 4.3.6 (the final 4.3.x update) and 3.10.97 were released on February 19. The 4.4.3, 3.14.62, and 3.10.98 updates are in the review process as of this writing; they can be expected on or after February 24.

Comments (none posted)

Quotes of the week

Oh geez, we have a spelling.txt! I think we can declare the kernel as done and go do something else with our lives...
Borislav Petkov

Which realistically won't actually matter because in 22 years time nobody will be able to find a 32bit system in common use. If you look at x86 platforms today a Pentium Pro is already a collectors item. All of today's locked down half-maintained embedded and phone devices will be at best the digital equivalent of toxic waste if connected to anything.
Alan Cox

Comments (none posted)

Kernel development news

A BoF on kernel network performance

By Nathan Willis
February 24, 2016

Netdev/Netconf

Whether one measures by attendance or by audience participation, one of the most popular sessions at the Netdev 1.1 conference in Seville, Spain was the network-performance birds-of-a-feather (BoF) session led by Jesper Brouer. The session was held in the largest conference room to a nearly packed house. Brouer and seven other presenters took the stage, taking turns presenting topics related to finding and removing bottlenecks in the kernel's packet-processing pipeline; on each topic, the audience weighed in with opinions and, often, proposed fixes.

The BoF was not designed to produce final solutions, but rather to encourage debate and discussion—hopefully fostering further work. Debate was certainly encouraged, to the point where Brouer was not able to get to every topic on the agenda before time had elapsed. But what was covered provides a good snapshot of where network-optimization efforts stand today.

DDoS mitigation

The first to speak was Gilberto Bertin from web-hosting provider CloudFlare. The company periodically encounters network bottlenecks on its Linux hosts, he said, with the most egregious being those caused by distributed denial-of-service (DDoS) attacks. Even a relatively small packet flood, such as two million UDP packets per second (2Mpps), will max out the kernel's packet-processing capabilities, saturating the receive queue faster than it can be emptied and causing the system to drop packets. 2Mpps is nowhere near the full 10G Ethernet wire speed of 14.8Mpps.

DDoS attacks are usually primitive enough that an iptables drop rule targeting each source address ought to suffice, but CloudFlare has found that approach insufficient in practice. Instead, the company is forced to offload traffic to a user-space packet handler. Bertin proposed two approaches to solving the problem: using Berkeley Packet Filter (BPF) programs shortly after packet ingress to parse incoming packets (dropping DDoS packets before they enter the receive queue), and using circular buffers to process incoming traffic (thus eliminating many memory allocations).

Brouer reported that he had tested several possible solutions himself, including using receive packet steering (RPS) and dedicating a CPU to handling the receive queue. Using RPS alone, he was able to handle 7Mpps in laboratory tests; by also binding a CPU, the number rose to 9Mpps. Audience members proposed several other approaches; Jesse Brandeburg suggested designating a queue for DDoS processing and steering other traffic away from it. Brouer discussed some tests he had run attempting to put drop rules as early as possible in the pipeline; none made a drastic difference in the throughput. When an audience member asked if BPF programs could be added to the network interface card's (NIC's) driver, David Miller suggested that running drop-only rules against the NIC's DMA buffer would be the fastest the kernel could possibly respond.

There was also a lengthy discussion about how to reduce the overhead caused by memory operations. Brouer reported that memcpy() calls accounted for as much as 43% of the time required to process a received packet. Jamal Hadi Salim asked whether sk_buff buffers could simply be recycled in a ring; Alexander Duyck replied that not all NIC drivers would support that approach. Ultimately, Brouer wrapped up the topic by saying there was no clear solution: latency hides in a number of places in the pipeline, so reducing cache misses, using bulk memory allocation, and re-examining the entire allocation strategy on the receive side may all be required.

Transmit powers

Brouer then presented the next topic, improving transmit performance. He noted that bulk transmission with the xmit_more API had solved the outgoing-traffic bottleneck, enabling the kernel to transmit packets at essentially full wire speed. But, he said, the "full wire speed" numbers are really achievable only in artificial workloads. For practical usage, it is hard to activate the bulk dequeuing discipline. Since the technique lowers CPU utilization, it would be beneficial to many users if it could be enabled well before one approaches the bandwidth limit.

He suggested several possible alternative means to activate xmit_more, including setting off a trigger whenever the hardware transmit queue gets full, tuning Byte Queue Limits (BQLs), and providing a user-space API to activate bulk sending. He had experimented some with the BQL idea, he reported: adjusting the BQL downward until the bulk queuing discipline kicks in resulted in a 64% increase in throughput.

Tom Herbert was not thrilled about that approach, noting that BQL was, by design, intended to be configuration-free; using it as a tunable feature seems like asking for trouble. John Fastabend asked if a busy driver could drop packets rather than queuing them, thus triggering the bulk discipline. Another audience member proposed adding an API through which the kernel could tell a NIC driver to split its queues. There was little agreement on approaches, although most in attendance seemed to feel that further discussion in this area was well warranted.

The trials of small devices

Next, Felix Fietkau of the OpenWrt project spoke, raising concerns that recent development efforts in the kernel networking space focused too much on optimizing behavior for high-end Intel-powered machines, at the risk of hurting performance on low-end devices like home routers and ARM systems. In particular, he pointed out that these smaller devices have significantly smaller data cache sizes, comparable instruction cache sizes but without smart pre-fetchers, and smaller cache-line sizes. Some of the recent optimizations, particularly cache-line optimizations, can hurt performance on small systems, he said.

He showed some benchmarks of kernel 4.4 running on a 720MHz Qualcomm QCA9558 system-on-chip. Base routing throughput was around 268Mbps; activating nf_conntrack_rtcache raised throughput to 360Mbps. Also removing iptable_mangle and iptable_raw increased throughput to 400Mbps. The takeaway, he said, was that removing or conditionally disabling unnecessary hooks (such as statistics-gathering hooks) was vital, as was eliminating redundant accesses to packet data.

Miller commented that the transactional overhead of the hooks in question was the real culprit, and asked whether or not many of the small devices in question would be a good fit for hardware offloading via the switchdev driver model. Fietkau replied that many of the devices do support offload, but that it is usually crippled in some fashion, such as not being configurable.

Fietkau also presented some out-of-tree hacks used to improve performance on small devices, including using lightweight socket buffers and using dirty pointer tricks to avoid invalidating the data cache.

Caching

Brouer then moved on to the topic of instruction-cache optimization. The network stack, he said, does a poor job of utilizing the instruction cache, since the typical cache size is shorter than the code used to process the average Ethernet packet. Furthermore, even though many packets appearing in the same time window get handled in the same manner, processing each packet individually means each packet hits the same instruction-cache misses.

He proposed several possible ways to better utilize the cache, starting with processing packets in bundles, enabling several to be processed simultaneously at each stage. NIC drivers could bundle received packets, he said, for more optimal processing. The polling routine already processes many packets at once, but it currently calls "the full stack" for each packet. And the driver can view all of the packets available in the receive ring, so it could simply treat them all as having arrived at the same time and process them in bulk. A side effect of this approach, he said, would be that it hides latency caused by cache misses.

A related issue, he said, is that the first cache miss often happens too soon for prefetching, in the eth_type_trans() function. By delaying the call to eth_type_trans() in the network stack's receive loop, the miss can be avoided. Even better, he said, would be to avoid calling eth_type_trans() altogether. The function is used to determine the packet's protocol ID, he said, which could also be determined from the hardware RX descriptor.

Brouer also proposed staging bundles of packets for processing at the generic receive offload (GRO) and RPS layers. GRO does this already, he said, though it could be further optimized. Implementing staged processing for RPS faces one hurdle in the fact that RPS takes cross-CPU locks for each packet. But Eric Dumazet pointed out that bulk enqueuing for remote CPUs should be easily doable. RPS already defers sending the inter-processor interrupt, which essentially amortizes the cost across multiple packets.

TC and other topics

Fastabend then spoke briefly (as time was running short) about the queuing discipline (qdisc) code path in the kernel's traffic control (TC) mechanism. Currently, qdisc takes six lock operations, even if the queue is empty and the packet is transmitted directly. He ran some benchmarks showing that the locks account for 70–82% of the time spent in qdisc, and thus set out to re-implement qdisc in a lockless manner. He has posted an RFC implementation that reduces the lock count to two; the work is not yet complete, though, and other items remain on the to-do list: support for bulk dequeuing, and gathering real-world numbers to determine whether the performance improvement is as large as anticipated.

Brouer then gave a quick overview of the "packet-page" concept: at a very early point in the receive process, a packet could be extracted from the receive ring into a memory page, allowing it to be sent on an alternative processing path. "It's a crazy idea," he warned the crowd, but it has several potential use cases. First, it could be a point for kernel-bypass tools (such as the Data Plane Development Kit) to hook into. It could also allow the outgoing network interface to simply move the packet directly into the transmit ring, and it could be useful for virtualization (allowing guest operating systems to rapidly forward traffic on the same host). Currently, implementing packet-page requires hardware support (in particular, hardware that marks packet types in the RX descriptor), but Brouer reported that he has seen some substantial and encouraging results in his own experiments.

As the session time finally elapsed for good, Brouer also briefly addressed some ideas for reworking the memory-allocation strategy for received packets (as alluded to in the first mini-presentation of the BoF). One idea is to write a new allocator specific to the network receive stack. There are a number of allocations identified as introducing overhead, so there is plenty of room for improvement.

But other approaches are possible, too, he said. Perhaps using a DMA mapping would be preferable, thus avoiding all allocations. There are clear pitfalls, such as needing a full page for each packet and the overhead of clearing out enough headroom for inserting each sk_buff.

Finally, Brouer reminded the audience of just how far the kernel networking stack has come in recent years. In the past two years alone, he said, the kernel has moved from a maximum transmit throughput of 4Mpps to the full wire speed of 14.8Mpps. IPv4 forwarding speed has increased from 1Mpps to 2Mpps on single-core machines (and even more on multi-core machines). The receive throughput started at 6.4Mpps and, with the latest experimental patches, now hovers around 12Mpps. Those numbers should be an encouragement; if the BoF attendees are anything to judge by, further performance gains are no doubt still on the horizon.

[The author would like to thank the Netconf and Netdev organizers for travel assistance to Seville.]

Comments (3 posted)

Sigreturn-oriented programming and its mitigation

By Jonathan Corbet
February 24, 2016
In the good old days (from one point of view, at least), attackers had an easy life; all they had to do was to locate a buffer overrun vulnerability, then they could inject whatever code they liked into the vulnerable process. Over the years, kernel developers have worked to ensure that data that can be written by an application cannot be executed by that application; that has made simple code-injection unfeasible in most settings. Attackers have responded with techniques like return-oriented programming (ROP), but ROP attacks are relatively hard to get right. On some systems, attackers may be able to use the simpler sigreturn-oriented programming (SROP) technique instead; kernel patches have been circulating in an attempt to head off that class of attacks.

Some background

If data on the stack cannot be executed, a buffer overflow vulnerability cannot be used to inject code directly into an application. Such vulnerabilities can, however, be used to change the program counter by overwriting the current function's return address. If the attacker can identify code existing within the target process's address space that performs the desired task, they can use a buffer overflow to "return" to that code and gain control.

Unfortunately for attackers, most programs lack a convenient "give me a shell" location to jump to via an overwritten return address. But it is still likely that the program contains the desired functionality; it is just split into little pieces and scattered throughout the address space. The core idea behind return-oriented programming is to find these pieces in places where they are followed by a return instruction. The attacker, who controls the stack, can not only jump to the first of these pieces; they can also place a return address on the stack so that when this piece executes its return instruction, control goes to another attacker-chosen location — the next piece of useful code. By stringing together a set of these "gadgets," the attacker can create a new program within the target process.

There are various tools out there to help with the creation of ROP attacks. Scanners can pass through an executable image and identify gadgets of interest. "ROP compilers" can then create a program to accomplish the attacker's objective. But the necessary gadgets may not be available, and techniques like address-space layout randomization (ASLR) make ROP attacks harder. So ROP attacks tend to be fiddly affairs, often specific to the system being attacked (or even to the specific running process). Attackers, being busy people like the rest of us, cannot be blamed if they look for easier ways to compromise a system.

Exploiting sigreturn()

Enter sigreturn(), a Linux system call that nobody calls directly. When a signal is delivered to a process, execution jumps to the designated signal handler; when the handler is done, control returns to the location where execution was interrupted. Signals are a form of software interrupt, and all of the usual interrupt-like accounting must be dealt with. In particular, before the kernel can deliver a signal, it must make a note of the current execution context, including the values stored in all of the processor registers.

It would be possible to store this information in the kernel itself, but that might make it possible for an attacker (of a different variety) to cause the kernel to allocate arbitrary amounts of memory. So, instead, the kernel stores this information on the stack of the process that is the recipient of the signal. Prior to invoking the signal handler, the kernel pushes an (architecture-specific) variant of the sigcontext structure onto the process's stack; this structure contains register information, floating-point status, and more. When the signal handler has completed its job, it calls sigreturn(), which restores all that information from the on-stack structure.

Attackers employing ROP techniques have to work to find gadgets that will store the desired values into specific processor registers. If they can call sigreturn(), though, life gets easier, since that system call sets the values of all registers directly from the stack. As it happens, the kernel has no way to know whether a specific sigreturn() call comes from the termination of a legitimate signal handler or not; the whole system was designed so that the kernel would not have to track that information. So, as Erik Bosman and Herbert Bos noted in this paper [PDF], sigreturn() looks like it might be helpful to attackers.

There is one obstacle that must be overcome first, though: an attacker must find a ROP gadget that makes a call to sigreturn() — and few applications do that directly. One way to do that would be to locate a more generic gadget for invoking system calls, then arrange for the appropriate number to be passed to indicate sigreturn(). But in many cases that is unnecessary; for years, the kernel developers conveniently put a sigreturn() call in a place where attackers could find it — at a fixed address that is not subject to ASLR. That address is in the "virtual dynamic shared object" (vDSO) area, a page that the kernel maps into every process at a known location to optimize some system calls. On other systems, the sigreturn() call can be found in the C library; exploiting that one requires finding a way to leak some ASLR information first.
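The vDSO mapping is easy to observe; this small, Linux-specific snippet lists it from /proc/self/maps (on current kernels the base address changes from run to run, which is exactly the randomization that the old fixed placement lacked):

```python
# Linux-specific: the vDSO appears as a "[vdso]" mapping in every process.
# On the older kernels discussed here it sat at a fixed address, which is
# what made it such a convenient source of a sigreturn() gadget.

with open("/proc/self/maps") as f:
    vdso_lines = [line.split()[0] for line in f if "[vdso]" in line]
print(vdso_lines)
```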

Bosman and Bos demonstrated that sigreturn() can be used to exploit processes with a buffer overflow vulnerability. Often, the sigreturn() gadget is the only one that is required to make the exploit work; in some cases, the exploit can be written in a system-independent way, able to be reused with no additional effort. More recent kernels have made these exploits harder (the vDSO area is no longer usable, for example), but they are still far from impossible. And, in any case, many interesting targets are running older kernels.

Stopping SROP

Scott Bauer recently posted a patch set meant to put an end to SROP attacks. Once the problem is understood, the solution becomes clear relatively quickly: the kernel needs a way to verify that a sigcontext structure on the stack was put there by the kernel itself. That would ensure that sigreturn() can only be called at the end of a real signal delivery.

Scott's patch works by generating a random "cookie" value for each process. As part of the signal-delivery process, that cookie is stored onto the stack, next to the sigcontext structure. Prior to being stored, it is XORed with the address of the stack location where it is to be stored, making it a bit harder to read back; future plans call for hashing the value as well, making the recovery of the cookie value impossible. Even without hashing, though, the cookie should be secure enough; an attacker who can force a signal and read the cookie off the stack is probably already in control.

The sigreturn() implementation just needs to verify that the cookie exists in the expected location; if it's there, the call is deemed legitimate and can proceed. Otherwise the operation ends and a SIGSEGV signal is delivered to the process, killing it unless the process has made other arrangements.
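A toy model of the scheme (an illustrative simplification, not the patch's actual code) makes the store-and-check logic concrete:

```python
import secrets

# Sketch of the per-process SROP cookie. The "stack" here is a dictionary
# mapping addresses to values; the real code, of course, operates on the
# process's actual stack from within the kernel's signal-delivery paths.

process_cookie = secrets.randbits(64)          # generated once per process

def push_cookie(stack, slot_addr):
    # Signal delivery: store the cookie XORed with its own storage
    # address, so a raw memory read does not reveal the cookie directly.
    stack[slot_addr] = process_cookie ^ slot_addr

def check_cookie(stack, slot_addr):
    # sigreturn(): the call is legitimate only if the expected value is
    # found at the expected location next to the sigcontext structure.
    return stack.get(slot_addr, 0) ^ slot_addr == process_cookie

stack = {}
push_cookie(stack, 0x7ffd1234)
print(check_cookie(stack, 0x7ffd1234))   # real signal frame: True
print(check_cookie(stack, 0x7ffd9999))   # forged frame, no cookie: False
```

An attacker forging a sigcontext structure on the stack has no way to supply the matching cookie value, so the forged sigreturn() call fails the check.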

There are still some practical problems with the patch; for example, it will not do the right thing in settings where checkpoint-restore in user space is in use (a restored process will have a new and different random cookie value, but old cookies may still be on the stack). Such problems can be worked around, but they may force the addition of a sysctl knob to turn this protection off in settings where it breaks things. It also does nothing to protect against ROP attacks in general; it just closes off one relatively easy-to-exploit form of those attacks. But, as low-hanging fruit, it is probably worth pursuing; there is no point in making an attacker's life easier.

Comments (4 posted)

DAX and fsync: the cost of forgoing page structures

February 24, 2016

This article was contributed by Neil Brown

DAX, the support library that can help Linux filesystems provide direct access to persistent memory (PMEM), has seen substantial ongoing development since we covered it nearly 18 months ago. Its main goal is to bypass the page cache, allowing reads and writes to become memory copies directly to and from the PMEM, and to support mapping that PMEM directly into a process's address space with mmap(). Consequently, it was a little surprising to find that one of the challenges in recent months was the correct implementation of fsync() and related functions that are primarily responsible for synchronizing the page cache with permanent storage.

While that primary responsibility of fsync() is obviated by not caching any data in volatile memory, there is a secondary responsibility that is just as important: ensuring that all writes that have been sent to the device have landed safely and are not still in the pipeline. For devices attached using SATA or SCSI, this involves sending (and waiting for) a particular command; the Linux block layer provides the blkdev_issue_flush() API (among a few others) for achieving this. For PMEM we need something a little different.

There are actually two "flush" stages needed to ensure that CPU writes have made it to persistent storage. One stage is a very close parallel to the commands sent by blkdev_issue_flush(). There is a subtle distinction between PMEM "accepting" a write and "committing" a write. If power fails between these events, data could be lost. The necessary "flush" can be performed transparently by a memory controller using Asynchronous DRAM Refresh (ADR) [PDF], or explicitly by the CPU using, for example, the new x86_64 instruction PCOMMIT. This can be seen in the wmb_pmem() calls sprinkled throughout the DAX and PMEM code in Linux; handling this stage is no great burden.

The burden is imposed by the other requirement: the need to flush CPU caches to ensure that the PMEM has "accepted" the writes. This can be avoided by performing "non-temporal writes" to bypass the cache, but that cannot be ensured when the PMEM is mapped directly into applications. Currently, on x86_64 hardware, this requires explicitly flushing each cache line that might be dirty by invoking the CLFLUSH (Cache Line Flush) instruction or possibly a newer variant if available (CLFLUSHOPT, CLWB). An easy approach, referred to in discussions as the "Big Hammer", is to implement the blkdev_issue_flush() API by calling CLFLUSH on every address of the entire persistent memory. While CLFLUSH is not a particularly expensive operation, performing it over potentially terabytes of memory was seen as worrisome.
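A sketch of the per-line loop shows why the Big Hammer worried people: flushing one terabyte at 64 bytes per CLFLUSH means on the order of 17 billion instructions. The clflush() below is a counting stub standing in for the real x86 instruction:

```python
CACHE_LINE = 64  # bytes; the x86 cache-line size

flushed = []

def clflush(addr):
    # Stand-in for the x86 CLFLUSH instruction: flush one cache line.
    flushed.append(addr)

def flush_range(start, length):
    # Flush every cache line overlapping [start, start+length) -- the
    # per-line loop behind a "Big Hammer" flush of a PMEM region.
    addr = start & ~(CACHE_LINE - 1)      # round down to a line boundary
    while addr < start + length:
        clflush(addr)
        addr += CACHE_LINE

flush_range(0x1008, 4096)                 # a misaligned 4K region
print(len(flushed))                       # 65 lines, not 64
```

Note that a misaligned region costs an extra line; the real cost problem, though, is simply the number of lines in a multi-terabyte device.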

The alternative is to keep track of which regions of memory might have been written recently and to only flush those. This can be expected to bring the amount of memory being flushed down from terabytes to gigabytes at the very most, and hence to reduce run time by several orders of magnitude. Keeping track of dirty memory is easy when the page cache is in use by using a flag in struct page. Since DAX bypasses the page cache, there are no page structures for most of PMEM, so an alternative is needed. Finding that alternative was the focus of most of the discussions and of the implementation of fsync() support for DAX, culminating in patch sets posted by Ross Zwisler (original and fix-ups) that landed upstream for 4.5-rc1.

Is it worth the effort?

There was a subthread running through the discussion that wondered whether it might be best to avoid the problem rather than fix it. A filesystem does not have to use DAX simply because it is mounted from a PMEM device. It can selectively choose to use DAX or not based on usage patterns or policy settings (and, for example, would never use DAX on directories, as metadata generally needs to be staged out to storage in a controlled fashion). Normal page-cache access could be the default and write-out to PMEM would use non-temporal writes. DAX would only be enabled while a file is memory mapped with a new MMAP_DAX flag. In that case, the application would be explicitly requesting DAX access (probably using the nvml library) and it would take on the responsibility of calling CLFLUSH as appropriate. It is even conceivable that future processors could make cache flushing for a physical address range much more direct, so keeping track of addresses to flush would become pointless.

Dan Williams championed this position, putting his case quite succinctly:

DAX in my opinion is not a transparent accelerator of all existing apps, it's a targeted mechanism for applications ready to take advantage of byte addressable persistent memory.

He also expressed a concern that fsync() would end up being painful for large amounts of data.

Dave Chinner didn't agree. He provided a demonstration suggesting that the proposed overheads needed for fsync() would be negligible. He asserted instead:

DAX is a method of allowing POSIX compliant applications get the best of both worlds - portability with existing storage and filesystems, yet with the speed and byte [addressability] of persistent storage through the use of mmap.

Williams' position resurfaced from time to time as it became clear that there were real and ongoing challenges in making fsync() work, but he didn't seem able to rally much support.

Shape of the solution

In general, the solution chosen is to still use the page cache data structures, but not to store struct page pointers in them. The page cache uses a radix tree that can store a pointer and a few tags (single bits of extra information) at every page-aligned offset in a file. The space reserved for the page pointer can be used for anything else by setting the least significant bit to mark it as an exception. For example, the tmpfs filesystem uses exception entries to keep track of file pages that have been written out to swap.

Keeping track of dirty regions of a file can be done by allocating entries in this radix tree, storing a blank exception entry in place of the page pointer, and setting the PAGECACHE_TAG_DIRTY tag. Finding all entries with a tag set is quite efficient, so flushing all the cache lines in each dirty page to perform fsync() should be quite straightforward.

As this solution was further explored, it was repeatedly found that some of those fields in struct page really are useful, so an alternative needed to be found.

Page size: PG_head

To flush "all the cache lines in each dirty page" you need to know how big the page is — it could be a regular page (4K on x86) or it could be a huge page (2M on x86). Huge pages are particularly important for PMEM, since PMEM devices are expected to be large. If the filesystem creates files with the required alignment, DAX will automatically use huge pages to map them. There are even patches from Matthew Wilcox that aim to support the direct mapping for extra-huge 1GB pages — referred to as "PUD pages" after the Page Upper Directory level in the four-level page tables from which they are indexed.

With a struct page the PG_head flag can be used to determine the page size. Without that, something else is needed. Storing 512 entries in the radix tree for each huge page would be an option, but not an elegant option. Instead, one bit in the otherwise unused pointer field is used to flag a huge-page entry, which is also known as a "PMD" entry because it is linked from the Page Middle Directory.
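The resulting entry encoding can be sketched as follows (an illustrative bit layout, not the kernel's exact assignments):

```python
# Sketch of a DAX radix-tree entry: the low bit marks an exception entry
# (i.e. "not a struct page pointer"), a second bit marks a PMD-sized
# (huge page) entry, and the remaining bits can carry location data.

RADIX_EXCEPTIONAL = 1 << 0   # not a page pointer
RADIX_DAX_PMD     = 1 << 1   # huge-page (PMD) entry

def make_entry(sector, pmd=False):
    entry = (sector << 2) | RADIX_EXCEPTIONAL
    if pmd:
        entry |= RADIX_DAX_PMD
    return entry

def entry_size(entry):
    # 2M huge page vs. regular 4K page (x86 sizes)
    return 2 * 1024 * 1024 if entry & RADIX_DAX_PMD else 4096

small = make_entry(sector=1234)
huge  = make_entry(sector=8192, pmd=True)
print(entry_size(small), entry_size(huge))   # 4096 2097152
```

With one PMD entry covering a whole huge page, there is no need to populate 512 separate slots in the tree for it.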

Locking: PG_locked

The page lock is central to handling concurrency within filesystems and memory management. With no struct page there is no page lock. One place where this has caused a problem is in managing races between one thread trying to sync a page and mark it as clean and another thread dirtying that page. Ideally, clean pages should be removed from the radix tree completely as they are not needed there, but attempts to do that have, so far, failed to avoid the race. Jan Kara suggested that another bit in the pointer field could be used as a bit-spin-lock, effectively duplicating the functionality of PG_locked. That seems a likely approach but it has not yet been attempted.

Physical memory address

Once we have enough information in the radix tree to reliably track which pages are dirty and how big they are, we just need to know where each page is in PMEM so it can be flushed. This information is generally of little interest to common code so handling it is left up to the filesystem. Filesystems will normally attach something to the struct page using the private pointer. In filesystems that use the buffer_head library, the private pointer links to a buffer_head that contains a b_blocknr field identifying the location of the stored data.

Without a struct page, the address needs to be found some other way. There are a number of options, several of which have been explored. The filesystem could be asked to perform the lookup from file offset to physical address using its internal indexing tables. This is an indirect approach and may require the filesystem to reload some indexing data from the PMEM (it wouldn't use direct-access for that). While the first patch set used this approach, it did not survive long.

Alternately, the physical address could be stored in the radix tree when the page is marked as dirty; the physical address will already be available at that time as it is just about to be accessed for write. This leads to another question: exactly how is the physical address represented? We could use the address where the PMEM is mapped into the kernel address space, but that leads to awkward races when a PMEM device is disabled and unmapped. Instead, we could use a sector offset into the block device that represents the PMEM. That is what the current implementation does, but it implicitly assumes there is just one block device, or at least just one per file. For a filesystem that integrates volume management (as Btrfs does), this may not be the case.

Finally, we could use the page frame number (PFN), which is a stable index that is assigned by the BIOS when the memory is discovered. Wilcox has patches to move in this direction, but the work is 70%, maybe 50%, done. Assuming that the PFN can be reliably mapped to the kernel address that is needed for CLFLUSH, this seems like the best solution.

Is this miniature struct page enough?

One way to look at this development is that a 64-bit miniature struct page has been created for the DAX use case to avoid the cost of a full struct page. It currently contains a "huge page" flag and a physical sector number. It may yet gain a lock bit and have a PFN in place of the sector number. It seems prudent to ask if there is anything else that might be needed before DAX functionality is complete.

As quoted above, Chinner appears to think that transparent support for full POSIX semantics should be the goal. He went on to opine that:

This is just another example of how yet another new-fangled storage technology maps precisely to a well known, long serving storage architecture that we already have many, many experts out there that know to build reliable, performant storage from... :)

Taking that position to its logical extreme would suggest that anything that can be done in the existing storage architecture should work with PMEM and DAX. One such item of functionality that springs to mind is the pvmove tool. When a filesystem is built on an LVM2 volume, it is possible to use pvmove to move some of the data from one device to another, to balance the load, decommission old hardware, or start using new hardware. Similar functionality could well be useful with PMEM.

There would be a number of challenges to making this work with DAX, but possibly the biggest would be tearing down memory mappings of a section of the old memory before moving data across to the new. The Linux kernel has some infrastructure for memory migration that would be a perfect fit — if only the PMEM had a table of struct page as regular memory does. Without those page structures, moving memory that is currently mapped becomes a much more interesting task, though likely not an insurmountable one.

On the whole, it seems like DAX is showing a lot of promise but is still in its infancy. Currently, it can only be used on ext2, ext4, and XFS, and only where they are directly mounted on a PMEM device (i.e. there is no LVM support). Given the recent rate of change, it is unlikely to stay this way. Bugs will be fixed, performance will be improved, coverage and features will likely be added. When inexpensive persistent memory starts appearing on our motherboards it seems that Linux will be ready to make good use of it.

Comments (7 posted)

Patches and updates

Kernel trees

Linus Torvalds Linux 4.5-rc5
Greg KH Linux 4.3.6
Steven Rostedt 3.18.27-rt25
Steven Rostedt 3.14.61-rt62
Steven Rostedt 3.12.54-rt72
Greg KH Linux 3.10.97
Steven Rostedt 3.10.97-rt105
Steven Rostedt 3.2.77-rt110

Architecture-specific

Build system

Core kernel code

Development tools

Device drivers

Device driver infrastructure

Filesystems and block I/O

Memory management

Networking

Security-related

Miscellaneous

Page editor: Jonathan Corbet

Distributions

The end of the Iceweasel Age

By Nathan Willis
February 24, 2016

For roughly the past decade, Debian has shipped the Mozilla desktop applications (Firefox, Thunderbird, and Seamonkey) in a rebranded form that replaces the original, trademarked names and logos with alternatives (Iceweasel, Icedove, and Iceape). Originally, this effort was undertaken to work around incompatibilities between the Debian Free Software Guidelines (DFSG), the Mozilla trademark-usage policy, and the licenses of the Mozilla logos. But times—and policy wordings—change, and Debian now seems poised to resume calling its packages by the original, upstream Mozilla names.

It is important to understand that, despite the similarities in name, Debian's Iceweasel is not in the same category as GNU IceCat, which is an actual fork of the code. Iceweasel consists of binaries rebuilt by Debian with only minimal alterations — most obviously the removal of the Mozilla branding, but a few functional changes as well (such as using system libraries and hooking into the Debian package manager).

The rebranding issue originated in 2004. At that time, the Mozilla trademark policy only permitted usage of the Firefox logo on downstream packages that adhered to a set of strict "Distribution Partners" guidelines that prohibited changing the search engines, extensions, directory structure, and other details—clearly making the Distribution Partner rules (and the less stringent "Community Edition" rules) incompatible with the DFSG.

Confusingly enough, the Community Edition rules would have allowed Debian to use the name "Firefox" but not to use the name "Mozilla Firefox" nor to use the Firefox logo. Yet another wrinkle for DFSG compliance was that the actual graphics files for the logo, as the FAQ page explained, were distributed under non-free license terms (prohibiting modification) anyhow. Furthermore, and perhaps even most problematic, the policy required redistributors to seek Mozilla's approval for any other modifications to the package. And Debian's Firefox packagers needed to make modifications, starting with rather fundamental necessities like integrating with the distribution's package manager, rather than using Firefox's built-in updater.

It was proposed that Mozilla could grant a trademark license to Debian, outside of the generic, public trademark policy, but Debian Project Leader (DPL) Branden Robinson contended that such an agreement would run afoul of section eight of the DFSG, which prohibits licensing agreements that are specific to the Debian project and, thus, are not transferred automatically to Debian users. After considerable debate, bug #354622 was opened in February 2006 by Mozilla's Mike Connor, and the Iceweasel name change was implemented to close it.

Re-discussion

It is now 2016, however, and most users or developers could be forgiven for forgetting that Mozilla ever had "Distribution" and "Community" partner programs, much less what all of the details were. The Mozilla trademark guidelines have morphed considerably over the years and, in particular, they have become far more open. The logos and product names are no longer subject to separate terms, and the current guidelines only state that "making significant functional changes" prohibits a downstream project from using the Mozilla trademarks.

On February 17, Mozilla's Sylvestre Ledru opened bug #815006, stating that "the various issues mentioned in bug #354622 have been now tackled" and including a patch that renames the packaged version of Iceweasel to Firefox. It is not entirely clear whether the original logos will return as well, although now that they are available under the same terms as the name trademarks, it seems like a possibility. Ledru's initial report includes a recap of recent discussions between Mozilla and Debian. Of particular note is the assessment by Mozilla of Debian's modifications to the code:

Mozilla recognizes that patches applied to Iceweasel/Firefox don't impact the quality of the product. Patches which should be reported upstream to improve the product always have been forward upstream by the Debian packagers. Mozilla agrees about specific patches to facilitate the support of Iceweasel on architecture supported by Debian or Debian-specific patches.

More generally, Mozilla trusts the Debian packagers to use their best judgment to achieve the same quality as the official Firefox binaries.

In case of derivatives of Debian, Firefox branding can be used as long as the patches applied are in the same category as described above. Ubuntu having a different packaging, this does not apply to that distribution.

Furthermore, Ledru notes that Debian has adopted a new approach to backporting security patches. In the past, one of the key non-branding modifications Debian made to the Mozilla applications was backporting recent security fixes. This was necessary because Debian's stable releases remain supported for a lengthy period of time (two years), far longer than Firefox, which is now updated every six to eight weeks. It might seem like security patches would be uncontroversial, given the benefit to users, but Mozilla objected to them quite early on in the Iceweasel debate.

Now, however, Mozilla has implemented its Extended Support Release (ESR) program, which makes maintaining an old release simpler for both Mozilla and Debian. First, Debian has committed to providing security fixes for the ESR releases of Firefox, not to every Firefox release. In addition, once the ESR release initially shipped with a Debian "stable" release is no longer provided with security updates from Mozilla, Debian updates the package to the next ESR release.

In essence, then, the logo-licensing problem, the trademark-usage incompatibility, and the patch-maintenance problem have all been resolved, so, Ledru said, Debian could return to the Firefox branding.

Except that not everyone in the Debian project was easily convinced that the trademark issue was resolved. For instance, Paul Wise asked for clarification about how the new trademark-usage guidelines meshed with section eight of the DFSG. "Mozilla's trademark policy isn't clear about how much modification requires Mozilla's written consent", he noted. If Debian, in order to use the trademarks in a manner different from the public policy, was being granted special permission from Mozilla, that would constitute a licensing agreement that Debian could not pass on to downstream users.

Stefano Zacchiroli replied, however, that there is no formal or contractual arrangement; in other words, Mozilla is not granting a trademark license to Debian. Instead, Mozilla is acknowledging that the patches and other work that have gone into the Debian packages over the past ten years do not violate the trademark policy. Connor concurred, adding:

The one point I'll clarify is that this isn't even something I'd call an exception. We have always sought to permit and enable modifications that do not negatively impact users (in terms of security/privacy, user expectations of Firefox stability/behaviour/compatibility, etc). Other distros have been following this process for more than a decade, so it's definitely not a special case for Debian. I'm thrilled that we're finally making this step forward with Debian.

Perhaps it feels strange to have a dilemma that Debian was forced into by the specifics of policy documents and project governance guidelines be resolved by such a seemingly informal statement. But it is important to remember that Mozilla's casual-sounding blessing of Debian's Firefox modifications is not the only change to have taken place. The Mozilla trademark policy and logo-usage guidelines have evolved considerably since 2006, and the ESR program has changed the face of long-term maintenance not just for Debian, but for many other users as well.

The plan, as it stands presently, is for the Iceweasel package to be renamed Firefox in the Debian 9 "stretch" release (slated for an early 2017 release). For simplicity in package maintenance, the Iceweasel package in the current stable release (Debian 8 "jessie") will not be renamed. Similar changes should be expected for Icedove and Iceape, although those discussions are still underway with the Debian package maintainers.

Comments (8 posted)

Brief items

Distribution quote of the week

I guess I'm a fan of cross-distro solutions in general, but it would be fair to say that the fact that something is broadly used doesn't automatically make it better. The reality is that Gentoo's approach to package management is completely different from how almost everybody does things but we wouldn't be here if we didn't like it.
-- Rich Freeman

Comments (9 posted)

Open source Zephyr Project aims to deliver an RTOS

The Linux Foundation has announced the Zephyr Project, which is aimed at building a real-time operating system (RTOS) for the Internet of Things (IoT). "Modularity and security are key considerations when building systems for embedded IoT devices. The Zephyr Project prioritizes these features by providing the freedom to use the RTOS as is or to tailor a solution. The project’s focus on security includes plans for a dedicated security working group and a delegated security maintainer. Broad communications and networking support is also addressed and will initially include Bluetooth, Bluetooth Low Energy and IEEE 802.15.4, with plans to expand communications and networking support over time." The Zephyr Kernel v1.0.0 Release Notes provide more details.

Comments (53 posted)

Linux Mint downloads (briefly) compromised

The Linux Mint blog announces that the project's web site was compromised and made to point to a backdoored version of the distribution. "As far as we know, the only compromised edition was Linux Mint 17.3 Cinnamon edition. If you downloaded another release or another edition, this does not affect you. If you downloaded via torrents or via a direct HTTP link, this doesn’t affect you either. Finally, the situation happened today, so it should only impact people who downloaded this edition on February 20th."

Update: it appears that the Linux Mint forums were compromised too; users should assume that their passwords have been exposed.

Comments (121 posted)

FreedomBox 0.8 Released

FreedomBox 0.8 has been released. New images have not been created for this release. It is available in Debian unstable as two packages, freedombox-setup 0.8 and plinth 0.8.1-1. Quassel, an IRC client that stays connected to IRC networks and can synchronize multiple frontends, has been added and the first boot user interface has been improved.

Full Story (comments: none)

Ubuntu 14.04.4 LTS released

The fourth point release of Ubuntu 14.04 LTS is available for its Desktop, Server, Cloud, and Core products, as well as other flavors of Ubuntu with long-term support. "We have expanded our hardware enablement offering since 12.04, and with 14.04.4, this point release contains an updated kernel and X stack for new installations to support new hardware across all our supported architectures, not just x86."

Full Story (comments: none)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Subgraph OS Wants to Make Using a Secure Operating System Less of a Headache (Motherboard)

Motherboard takes a look at Subgraph OS. "In my tests, Subgraph OS worked fine out of the box, aside from some bugs that [Subgraph president David Mirza Ahmad] pointed out and provided workarounds for (the project is still in a pre-alpha stage). Those fixes required some use of the Linux command line, and users will probably need some experience of using a terminal to get the most out of their system. In sum, Subgraph OS appears easier to get to grips with than other secure options, but likely still requires a learning curve for users switching from, say, Windows or OSX for the first time. I ran Subgraph OS in virtual machines with 2GB and 4GB RAM."

Comments (none posted)

Page editor: Rebecca Sobol

Development

Rethinking the OpenStack development cycle

By Jonathan Corbet
February 24, 2016
The OpenStack cloud-management system project is a relative newcomer, having first been announced in mid-2010. It has grown quickly since then, and is now a core part of many commercial offerings. That growth has inevitably led to some growing pains, though. Recent discussions on a pair of proposals — one rather more official than the other — shine some light into where those pains are being felt and how the project might evolve to address them.

Stabilization cycles

Back in January, Flavio Percoco (a member of the OpenStack technical committee) posted a proposal for the addition of "stabilization cycles" to the OpenStack development process. OpenStack's process uses a six-month cycle; its many sub-projects are expected to coordinate their major releases around these cycles. The OpenStack "Design Summits" are scheduled for the beginning of each cycle. Each of these cycles brings a whole set of new features and, naturally, new bugs.

What Flavio was proposing, after a discussion in the technical committee, was that, occasionally, one of these cycles could be designated for "stabilization" changes only. The proposal included a fair amount of flexibility, in that "stabilization" could include work like refactoring and code cleanups. The period allotted for this work could be a full six-month cycle, or one of the "milestone" periods designated within a cycle. Stabilization cycles, Flavio said, could bring a number of benefits, including more bugs fixed, a reduction in the review backlog, and the ability to focus on larger features that require more than one cycle to implement.

Much of the discussion focused on the potential costs of stabilization cycles. One cost that surprised a number of participants was that, it seems, some companies give bonuses to developers when they get features merged into the OpenStack mainline; a stabilization cycle would thus cost those developers money. There seems to be fairly widespread agreement that this kind of compensation model runs counter to the interests of an open-source project in general, but that doesn't change the fact that this model appears to be in use at some companies.

The bigger problem, though, is that, over the years, it has been shown that stabilization cycles tend not to work well in large, fast-moving projects. The kernel used to work in that model; a look at the 2.4 cycle gives a good example of how these things can go wrong. The refusal to accept features into the mainline does not stop the development of those features or magically create more resources for the fixing of bugs. Companies continue to develop the features that they want; they will then either try to sneak them in as "bug fixes" or simply ship them without bothering to merge them upstream first. The results tend to be less stability and more fragmentation in deployed versions of the code.

Based on this experience, James Bottomley recommended against the idea of stabilization cycles, suggesting instead that the OpenStack cycle should be reworked to look more like how the kernel does things:

I still think what OpenStack actually needs is simply a longer stabilisation time. Right now, in the 6 month cycle, there are about five months of development beginning with the design summit and one month of -rc stabilisation. In today's model, to extend stabilisation you have to steal time from feature development, which again causes a lot of argument. To fix this, I'd turn that around and make it one month of feature merging followed by five months of -rc stabilisation.

To do this, he said, OpenStack would need to establish something like the linux-next tree for early integration and testing of new code. He also suggested that the design summit should be moved toward the middle of the development cycle to facilitate discussion of work that is aimed at the next merge window.

The discussion on this proposal eventually wound down without any firm conclusions beyond a sense that there was little interest in the establishment of project-wide stabilization cycles. Thierry Carrez (OpenStack's release manager and chair of the technical committee) suggested that the most important thing would be to communicate to the sub-projects that any of them could impose their own stabilization cycles if that seemed appropriate to them.

Reclaiming the design summit

James's suggestions for a reworked development cycle seem unlikely to be taken up by the project anytime soon, but one specific idea — changing the timing of the design summit — came back in a modified form in late February. The "design summit" event, which started as a small group of developers getting work done, has, over time, been overwhelmed by the OpenStack Summit, a rather larger, co-located event with a disturbingly high necktie-to-T-shirt ratio. That has led to developers feeling that the original purpose of the event has been lost; as Jay Pipes (another technical committee member) put it:

With the OpenStack Summit growing more and more marketing- and sales-focused, the contributors attending the design summit are often unfocused. The precious little time that developers have to actually work on the next release planning is often interrupted or cut short by the large numbers of "suits" and salespeople at the conference event, many of which are peddling a product or pushing a corporate agenda.

Jay suggested that the design summit should be split off from the main conference so that the developers could gain a respite from the suits at a more focused, less glitzy, and less expensive gathering.

It turns out that Thierry had been working on just this type of idea; he posted his proposal on February 22. The plan is to split the OpenStack Summit into two separate events. The first would be a technical event "held in a simpler, scaled-back setting" aimed at getting work done. This gathering would happen in relatively inexpensive locations, in the hope that companies would be willing to send more of their developers.

The second event, instead, would be "the main downstream business conference, with high-end keynotes, marketplace and breakout sessions". It would, presumably, remain in relatively fancy, suit-friendly locations. In addition to serving the needs of the business community, it would be intended to serve as a location where feedback on releases could be gathered, along with requests and requirements for future releases.

The new, currently unnamed developer conference would be held toward the end of the development cycle, a couple of weeks before the release happens. That would allow discussion of work that is planned for the next cycle, and of any last-minute release problems as well. The business event, instead, would move to the middle of the development cycle. At that point, the previous release will have been around for long enough to find its way into products and for companies to have learned about its good and bad points. The next cycle, meanwhile, is far enough away that feedback from the conference can still be incorporated. The new scheme would be phased in next year, with the first "contributors event" happening in February 2017. Thierry provided a timeline diagram [PDF] to illustrate how it would work.

Response to the proposal has been mostly positive, though Michael Krotscheck worried that it heralded the beginning of the end for the design summit. Sales and marketing is where the money is, he said, and a conference that excluded them would not do well in the corporate priority-setting process. Another potential concern, raised in the previous discussion, is that the ability to meet the developers is one of the selling points of the main conference. If the developers are no longer there, that conference, too, will suffer.

In the end, though, a development conference needs to be a relatively small and focused affair if it is to be a place where a lot of work gets done. The proposed event split might just make that possible, though the size of the project now ensures that a gathering of its developers can only be so small. In general, the problems faced by OpenStack are the kind of problems that many other projects can only hope for. Success tends to force changes; we have probably only begun to see the ways in which OpenStack will need to change to remain successful in the coming years.

Comments (none posted)

Brief items

Quotes of the week

But just because there is a skill-based fence around your project, does not mean there's no gate in that fence. Let's back up to the hypothetical snob's argument: "We don't want just anyone throwing patches into the repository. They need to know what they are doing." This is very true. But the rest of that statement needs to be: "…and we need to show them how to participate."
Brian Proffitt

For some reason the httpd status pages (e.g. 404) use the Comic Sans typeface. This patch removes comic sans and sets the typeface to the default sans-serif typeface of the client.

This lowers the number of people contacting website maintainers with typeface complaints bordering on harassment.

Peter Krantz

Comments (none posted)

Ardour 4.7 released

Version 4.7 of the Ardour digital-audio workstation has been released. The update includes two key new features: a dialog that displays detailed spectral and waveform analysis for exported files, and substantially improved support for Mackie Control brand hardware control consoles. Many other improvements are listed in the announcement, including preliminary support for importing work from ProTools 10 and 11.

Comments (none posted)

GNU C Library 2.23 released

Version 2.23 of the GNU C Library (glibc) has been released. The headline feature this time around seems to be Unicode 8.0.0 support; there are a number of API changes, performance improvements and security fixes as well.

Full Story (comments: 6)

Libinput 1.2 released

Version 1.2.0 of the libinput library is now available. New features include support for three-finger "pinch" gestures and the ability to independently toggle support for tap-and-drag and tapping in general. Also noteworthy is that the motion hysteresis feature is now disabled by default. "This provides smoother motion especially on small to tiny motions, making single-pixel elements much easier to target. On some devices, especially older touchpads the hysteresis may be required. We've enabled a bunch of those already, if you notice the pointer wobbling when hold[ing] the finger still, please file a bug so we can fix this."

Comments (none posted)

Newsletters and articles

Development newsletters from the past week

Comments (none posted)

Upcoming features in GCC 6

The Red Hat developer blog looks at what's coming in version 6 of the GNU Compiler Collection. "The x86/x86_64 is a segmented memory architecture, yet GCC has largely ignored this aspect of the Intel architecture and relied on implicit segment registers. Low level code such as the Linux kernel & glibc often have to be aware of the segmented architecture and have traditionally resorted to asm statements to use explicit segment registers for memory accesses. Starting with GCC 6, variables may be declared as being relative to a particular segment. Explicit segment registers will then be used to access those variables in memory." The GCC 6 release can be expected sometime around April.

Comments (37 posted)

Qt Roadmap for 2016

At the Qt blog, Tuukka Turunen sets out the project's roadmap for the coming year. Three releases are currently scheduled: Qt 5.6 in March, Qt 5.7 in May, and Qt 5.8 in October. Of note, the 5.6 release will be designated a Long Term Support (LTS) release: "As part of our LTS promise, we guarantee that Qt 5.6 will be supported for three years via standard support, after which additional extended support can be purchased. During this time period, even though following Qt releases (Qt 5.7, Qt 5.8 and so on) are available, Qt 5.6 will receive patch releases providing bug fixes and security updates throughout the three-year period after the release." New features on the roadmap include high-DPI support, C++11 support in Qt modules, and dropping LGPLv2.1 as a licensing option (in favor of LGPLv3).

Comments (none posted)

Page editor: Nathan Willis

Announcements

Brief items

OSI annual report

The Open Source Initiative has published its annual report [PDF] for 2015. "In 2015 we bid fond-farewell to long time President Simon Phipps and welcomed Allison Randal as the new OSI Board President. Simon, who was first elected to the Board in 2010 and became President in 2012, opted not to run for President in his final year on the Board in order to help transition the new President. Simon's presidency will be remembered for his work to transform the OSI into a member-led organization, giving voice to our individual and affiliate members in Board elections and opening up opportunities for direct participation through Working Groups and Incubator Projects." (Thanks to Martin Michlmayr)

Comments (none posted)

The new Board of Directors of The Document Foundation

The Document Foundation has announced its new Board of Directors. "Elected as directors are, in order of votes: Marina Latini (Studio Storti), Michael Meeks (Collabora), Thorsten Behrens (CIB), Jan Holesovsky (Collabora), Osvaldo Gervasi (independent), Simon Phipps (independent) and Eike Rathke (Red Hat). Elected as deputies are, in order of votes: Norbert Thiebaud (independent), Bjoern Michaelsen (Canonical) and Andreas Mantke (independent). The board has elected Marina Latini as Chairwoman and Michael Meeks as Deputy Chairman."

Full Story (comments: none)

Articles of interest

Kirkland: ZFS licensing and Linux

Dustin Kirkland justifies Ubuntu's plans to ship the ZFS filesystem kernel module. "And zfs.ko, as a self-contained file system module, is clearly not a derivative work of the Linux kernel but rather quite obviously a derivative work of OpenZFS and OpenSolaris. Equivalent exceptions have existed for many years, for various other stand alone, self-contained, non-GPL and even proprietary (hi, nvidia.ko) kernel modules."

Comments (219 posted)

Calls for Presentations

EuroPython 2016: Call for Proposals

EuroPython will take place July 17-24 in Bilbao, Spain. The call for proposals closes March 6. "We’re looking for proposals on every aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization. EuroPython is a community conference and we are eager to hear about your experience." Early-bird ticket sales are open. Regular sales will begin after the early-bird tickets sell out.

Full Story (comments: none)

CfP: MiniDebConf Vienna

There will be a MiniDebConf at Linuxwochen Wien in Vienna, Austria. MiniDebCamp will be held April 28-29, followed by the mini-conference April 30-May 1. The call for proposals closes March 15.

Full Story (comments: none)

Flock 2016 update

Registration is open for Flock to Fedora, which will be held August 2-5 in Krakow, Poland. The call for submissions for talks and workshops is open until April 8.

Full Story (comments: none)

openSUSE Conference returns to Nuremberg

The openSUSE Conference will take place June 22-26 in Nuremberg, Germany. See the announcement for more details. The call for papers closes April 15.

Comments (none posted)

CFP Deadlines: February 25, 2016 to April 25, 2016

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event dates | Event | Location
February 28 | April 6 | PostgreSQL and PostGIS, Session #8 | Lyon, France
February 28 | May 10-May 12 | Samba eXPerience 2016 | Berlin, Germany
February 28 | April 18-April 19 | Linux Storage, Filesystem & Memory Management Summit | Raleigh, NC, USA
February 28 | June 21-June 22 | Deutsche OpenStack Tage | Köln, Deutschland
February 28 | June 24-June 25 | Hong Kong Open Source Conference 2016 | Hong Kong, Hong Kong
March 1 | April 23 | DevCrowd 2016 | Szczecin, Poland
March 6 | July 17-July 24 | EuroPython 2016 | Bilbao, Spain
March 9 | June 1-June 2 | Apache MesosCon | Denver, CO, USA
March 10 | May 14-May 15 | Open Source Conference Albania | Tirana, Albania
March 12 | April 26 | Open Source Day 2016 | Warsaw, Poland
March 15 | April 28-May 1 | Mini-DebCamp & DebConf | Vienna, Austria
March 20 | April 28-April 30 | Linuxwochen Wien 2016 | Vienna, Austria
March 25 | July 11-July 17 | SciPy 2016 | Austin, TX, USA
April 1 | May 26 | NLUUG - Spring conference 2016 | Bunnik, The Netherlands
April 2 | May 2-May 3 | PyCon Israel 2016 | Tel Aviv, Israel
April 7 | April 8-April 10 | mini Linux Audio Conference 2016 | Berlin, Germany
April 8 | August 2-August 5 | Flock to Fedora | Krakow, Poland
April 15 | June 27-July 1 | 12th Netfilter Workshop | Amsterdam, Netherlands
April 15 | June 22-June 26 | openSUSE Conference 2016 | Nürnberg, Germany
April 24 | August 20-August 21 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Get ready to Fork the System at LibrePlanet

Registration is open for LibrePlanet, which will be held March 19-20 in Cambridge, MA. "This year's conference program will examine how free software creates the opportunity of a new path for its users, allows developers to fight the restrictions of a system dominated by proprietary software by creating free replacements, and is the foundation of a philosophy of freedom, sharing, and change. Sessions like "Yes, the FCC might ban your operating system" and "GNU/Linux and Chill: Free software on a college campus" will offer insights about how to resist the dominance of proprietary software, which is often built in to university policies and government regulations."

Full Story (comments: none)

Power Management and Energy-Awareness Microconference

The Power Management and Energy-Awareness Microconference has been accepted into the 2016 Linux Plumbers Conference, which will be held November 2-4 in Santa Fe, NM. "This microconference will look at elimination of timers from cpufreq governors, unifying idle management in SoCs with power resources shared between CPUs and I/O devices, load balancing utilizing workload consolidation and/or platform energy models to improve energy-efficiency and/or performance, improving CPU frequency selection efficiency by utilizing information provided by the scheduler in intel_pstate and cpufreq governors, idle injection (CFS-based vs. play idle with kthreads), ACPI compliance tests (from the power management perspective) and more."

Full Story (comments: none)

Events: February 25, 2016 to April 25, 2016

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
February 24-February 25 | AGL Member's Meeting | Tokyo, Japan
February 27 | Open Source Days | Copenhagen, Denmark
March 1 | Icinga Camp Berlin | Berlin, Germany
March 1-March 6 | Internet Freedom Festival | Valencia, Spain
March 8-March 10 | Fluent 2016 | San Francisco, CA, USA
March 9-March 11 | 18th German Perl Workshop | Nürnberg, Germany
March 10-March 12 | Studencki Festiwal Informatyczny (Students' Computer Science Festival) | Cracow, Poland
March 11-March 13 | PyCon SK 2016 | Bratislava, Slovakia
March 11-March 13 | Zimowisko Linuksowe TLUG | Puck, Poland
March 14-March 17 | Open Networking Summit | Santa Clara, CA, USA
March 14-March 18 | CeBIT 2016 Open Source Forum | Hannover, Germany
March 16-March 17 | Great Wide Open | Atlanta, GA, USA
March 18-March 20 | FOSSASIA 2016 Singapore | Singapore, Singapore
March 19-March 20 | Chemnitzer Linux Tage 2016 | Chemnitz, Germany
March 19-March 20 | LibrePlanet | Boston, MA, USA
March 23 | Make Open Source Software 2016 | Bucharest, Romania
March 29-March 31 | Collaboration Summit | Lake Tahoe, CA, USA
April 1 | DevOps Italia | Bologna, Italy
April 4-April 8 | OpenFabrics Alliance Workshop | Monterey, CA, USA
April 4-April 6 | Web Audio Conference | Atlanta, GA, USA
April 4-April 6 | Embedded Linux Conference | San Diego, CA, USA
April 4-April 6 | OpenIoT Summit | San Diego, CA, USA
April 5-April 7 | Lustre User Group 2016 | Portland, OR, USA
April 6 | PostgreSQL and PostGIS, Session #8 | Lyon, France
April 7-April 8 | SRECon16 | Santa Clara, CA, USA
April 8-April 10 | mini Linux Audio Conference 2016 | Berlin, Germany
April 9-April 10 | OSS Weekend | Bratislava, Slovakia
April 11-April 13 | O’Reilly Software Architecture Conference | New York, NY, USA
April 15-April 18 | Libre Graphics Meeting | London, UK
April 15-April 17 | PyCon Italia Sette | Firenze, Italia
April 15-April 17 | Akademy-es 2016 | Madrid, Spain
April 16 | 15. Augsburger Linux Info Tag | Augsburg, Germany
April 18-April 19 | Linux Storage, Filesystem & Memory Management Summit | Raleigh, NC, USA
April 18-April 20 | PostgreSQL Conference US 2016 | New York, NY, USA
April 20-April 21 | Vault 2016 | Raleigh, NC, USA
April 21-April 24 | GNOME.Asia Summit | Delhi, India
April 23 | DevCrowd 2016 | Szczecin, Poland
April 23-April 24 | LinuxFest Northwest | Bellingham, WA, USA

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds