LWN.net Weekly Edition for September 29, 2016

GTK+, version numbering, and long-term support

By Nathan Willis
September 28, 2016

GTK+ version 3.22 was released on September 21, bringing with it a range of improvements to Wayland support, gesture support for pressure-sensitive tablets, several new widgets, and more. The release also marks a turning point for how stable and development branches of the code will be maintained. Moving forward, the project is adopting a new scheme that allows it to designate certain stable releases for long-term support. The plan also breaks with past releases where version numbering is concerned, though the project is keen to downplay that change in favor of focusing on the support that stable releases will offer to downstream projects.

The new release scheme was announced on September 1 in a Google+ post and an accompanying blog post written by Allan Day. The blog post explains more of the background issues that led up to the decision to adopt a new scheme.

GTK+ has long used the traditional "major.minor.micro" numbering scheme (sometimes called semantic versioning) that was once the approach favored by free-software projects. Bumping the major number indicated a significant API change that broke backward compatibility. Bumping the minor number to the next even value indicated a stable update, while the odd values designated development branches. Micro (or patch) releases were reserved for bug-fix updates.
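
The old even/odd convention is mechanical enough to express in a few lines. The sketch below uses hypothetical helpers (not part of GTK+ or any real tool) that classify a version string under the rules just described:

```python
def classify_old_scheme(version: str) -> str:
    """Old GTK+ convention: even minor number = stable, odd = development."""
    _major, minor, _micro = (int(part) for part in version.split("."))
    return "stable" if minor % 2 == 0 else "development"


def same_api(version_a: str, version_b: str) -> bool:
    """Under the old scheme, only a change in the major number signaled an API break."""
    return version_a.split(".")[0] == version_b.split(".")[0]


print(classify_old_scheme("3.20.9"))   # stable
print(classify_old_scheme("3.21.5"))   # development
print(same_api("2.24.30", "3.22.0"))   # False
```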

But, in the GTK+ 3.x era, Day notes, the project picked up significant development speed and also adopted a strict six-month release cycle. That pace has led to concerns over GTK+'s stability, particularly for projects other than GNOME, which shares many developers and other contributors with the GTK+ project. The GTK+ developers, however, want the project to be useful for a wide range of projects outside of GNOME, which prompted discussions earlier in 2016 about changing the release schedule.

Rethinking

In June, after the annual GTK+ hackfest, Allison Lortie announced the proposed change in a blog post that sparked a fair share of confusion and concern. Commenters were evidently perplexed by the proposal, which included difficult-to-parse statements like this:

"Gtk 4.0" is not "Gtk 4". "Gtk 4.0" is the first raw version of what will eventually grow into "Gtk 4", sometime around Gtk 4.6 (18 months later).

One early comment by "Alex" summed up much of the general reaction:

Why in all that is holy are you going to release unstable apis with x.0 versions? Why don’t you call them betas like all proper developers do and call the stable versions x.0?

To be fair, though, releasing x.0 versions that were unstable was certainly not the intent of the scheme announced. Rather, the plan was meant to suggest that GTK+ version 4 would continue to evolve over the course of the subsequent 4.y releases. Nevertheless, the confusion was demonstrably a problem. At GUADEC in August, the GTK+ team reexamined the topic with a promise to present an updated plan as soon as possible.

Rethinking again

The September 1 announcements, then, constitute that update, which will hopefully prove clearer to outsiders. In essence, moving forward, GTK+ x.0.0 releases will be designated stable, long-term support versions, with the project planning a new x.0.0 release about once every two to three years. In between these releases, minor updates will also appear that may introduce new functionality. The minor releases will not be bound to a fixed six-month release cycle, however.

Next, the GTK+ development branches will be numbered x.9, to indicate that they are unstable releases being built in preparation for release x+1. This means that, in the future, there may be (for example) a stable, long-term support GTK+ 5.0 available, a series of updated releases (GTK+ 5.2, 5.4, and so on), and a development branch numbered 5.9.
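
For comparison, here is a similar hypothetical helper for the new scheme as described in the announcement. It treats x.0 as long-term support, x.90 and above as development toward the next major version, and everything else as a stable minor update; the special case for 3.22 reflects its designation as the first long-term support release.

```python
def classify_new_scheme(version: str) -> str:
    """Hypothetical classifier for the GTK+ release scheme announced in September 2016."""
    major, minor, *_ = (int(part) for part in version.split("."))
    if minor >= 90:
        # x.90, x.92, ...: unstable work toward the next major release
        return f"development series toward {major + 1}.0"
    if minor == 0 or (major, minor) == (3, 22):
        # x.0 releases are long-term support; 3.22 is the transitional exception
        return "long-term support"
    # x.2, x.4, ...: stable minor releases that may add new widgets
    return "stable minor update"
```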

Furthermore, any features deprecated in one x.0 release will be removed in the following x+1.0 release. This is another area where GTK+ has not historically had a strict policy, so stating and adhering to a regular deprecation formula will no doubt please many outside developers. The new plan also states that minor releases may add new widgets and update the GDK drawing backends used by the various window systems supported, but that no other changes will be made. Finally, micro releases for bug fixes and security updates will be made for three years.

Thus, the total lifespan of the x.0 long-term support releases will be three years. The wording is a bit ambiguous as to whether x.2 and other minor updates will also be supported for three years from their own release dates (which would extend support several months beyond the x.0 release's three-year window), but that does not sound like the intent of the plan.

On a technical note, the blog post explains that future development releases of GTK+ will use the upcoming stable release's version number in their pkg-config file, in order to make them parallel-installable with the current release. So, for example, the pkg-config file for GTK+ 3.90 will be gtk+-4.0, so it will not conflict with the current stable release, GTK+ 3.22.
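
A hypothetical helper that maps a release to the pkg-config module name described above might look like this. It assumes the gtk+-N.0 naming quoted from the blog post; the names that future releases actually ship with could differ.

```python
def pkg_config_module(version: str) -> str:
    """Hypothetical: pkg-config module name for a GTK+ release under the new scheme."""
    major, minor, *_ = (int(part) for part in version.split("."))
    # Development releases (x.90 and up) already carry the *next* major number
    target_major = major + 1 if minor >= 90 else major
    return f"gtk+-{target_major}.0"


print(pkg_config_module("3.22.0"))   # gtk+-3.0
print(pkg_config_module("3.90.0"))   # gtk+-4.0
```

Because the two module names differ, a build can ask for either series (for example, `pkg-config --cflags gtk+-3.0`) without the development package shadowing the stable one.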

Development releases are expected to appear about once every six months, all bearing version numbers in the x.9 range (e.g., x.90, x.92, x.94, etc.). That puts some indirect pressure on the project to release a stable y.0 release once the development version's minor number reaches .98, as Sébastien Wilmet noted on the GTK+ development list.

The new plan sets out a fairly regular numbering and release scheme but, of course, transitioning between the old and new schemes will be a tad awkward. This awkwardness takes the form of the new stable release, GTK+ 3.22, being declared the first long-term support version, even though it is not branded with an x.0 version number. Hopefully, that will be seen as a small price to pay for more predictable releases.

Downstreaming

The hope is that the plan for major and minor releases will better serve downstream project developers and Linux distributions. A guarantee of three years of security fixes should be enough for most Linux distributions, while the promise to make no significant changes to GTK+ internals in minor releases ought to be welcome news for downstream application developers. For distributions that offer their own long-term support releases with a lifespan longer than three years, Day asks that distribution representatives get in touch with the GTK+ project to develop a support plan.

Day's blog post also assures downstream developers that the project is committed to doing a better job of communicating changes—and of doing so in advance:

While the GTK+ team reserves the right to change API during development series, this does not mean that the whole GTK+ API will constantly break each release; only specific, and hopefully rarely used parts of the API may change, and if the changes are too extensive they will very likely be delayed to the next major development cycle. We’ll ensure that these changes are well-communicated in advance.

One of the other criticisms the project has faced in the past was that too many decisions were made within the relatively small set of core GTK+ developers, with that information not always making its way out into the wider GTK+ community in a timely fashion. The project must still deliver on this promise to ensure that changes are well-communicated to the outside world, but acknowledging the concern and making a public commitment to doing better are important steps.

Despite the increased emphasis on meeting the needs of downstream developers, there has not yet been a public statement from GTK+'s largest downstream project, GNOME, on whether (or how) it will adopt the same updated version-numbering and stability plan. In the past, GNOME and GTK+ version numbers have stayed in sync; with the newly announced plan, GNOME would have to adjust its numbering and release schedule as well in order to maintain that relationship.

Then again, perhaps no such change is warranted. A big part of the rationale for GTK+'s change was to better serve non-GNOME projects; enabling those two projects to move at different paces could be just what the developers want.

The anatomy of a Vulkan driver

By Jake Edge
September 28, 2016

X.Org Developers Conference

Jason Ekstrand gave a presentation at the 2016 X.Org Developers Conference (XDC) on a driver that he and others wrote for the new Vulkan 3D graphics API on Intel graphics hardware. Vulkan is significantly different from OpenGL, which led the developers to make some design decisions that departed from those made for OpenGL drivers.

He started with an "obligatory brag slide" (slides [PDF]) that outlined the progress that had been made on the driver in only eight months, with roughly three and a half people. Ekstrand, Kristian Høgsberg, and Chad Versace, with help from a dozen others, got a Vulkan driver working that was released (as open source) on the same day that the Vulkan specification was released in February. Not everything was written from scratch; the driver uses the same internal representation and back-end compiler that Mesa uses. The driver passed the conformance tests on day one as well, which is not something that everyone in the industry can say, Ekstrand said.

Vulkan is a new industry-standard 3D rendering and compute API from Khronos, which is the same group that maintains OpenGL. It is not simply OpenGL++, he said, as it has been redesigned from the ground up. Vulkan is designed for modern GPUs and software. It will run on currently shipping (OpenGL ES 3.1 class) hardware.

A lot has happened since SGI released OpenGL 1.0 in 1992, which is why a new 3D API is needed. In the 24 years since that first release, GPUs have become more powerful and flexible, memory has become much cheaper, and multi-core CPUs have become common. OpenGL has done "amazingly well" over that time, but it is showing its age at this point.

Multi-threaded programs are now commonplace, which makes OpenGL's state machine based on a singleton context kind of obsolete. Off-screen rendering is common as well. Beyond that, GPU hardware has become more standardized, so application developers don't want the API to hide the details of what the GPU is doing as OpenGL does.

Vulkan takes a different approach. It has an object-based API where there is no global state. All state is stored in the command buffer and there can be multiple command buffers. It is more explicit about what the GPU is doing: texture formats, memory management, and synchronization are all client-controlled. Those things are needed to support multi-threading, but also make drivers simpler.
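
To make the contrast concrete, the following Python sketch models command buffers as objects that own their state, so two threads can record work at the same time. `CommandBuffer`, `bind_pipeline()`, and `draw()` are hypothetical stand-ins, not the Vulkan API; with a single OpenGL-style global context, the two threads would instead be overwriting the same current state.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CommandBuffer:
    """Hypothetical stand-in for a Vulkan-style command buffer: it owns its state."""
    name: str
    state: dict = field(default_factory=dict)
    commands: list = field(default_factory=list)

    def bind_pipeline(self, pipeline: str) -> None:
        self.state["pipeline"] = pipeline
        self.commands.append(("bind_pipeline", pipeline))

    def draw(self, vertex_count: int) -> None:
        # Each draw uses this buffer's own state, not a process-wide context
        self.commands.append(("draw", vertex_count, self.state["pipeline"]))


def record(name: str, pipeline: str) -> CommandBuffer:
    cb = CommandBuffer(name)
    cb.bind_pipeline(pipeline)
    cb.draw(3)
    return cb


# Two threads record independently; there is no shared "current context".
with ThreadPoolExecutor(max_workers=2) as pool:
    shadow_pass, main_pass = pool.map(record, ["shadow", "main"], ["depth_only", "lit"])
```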

Vulkan drivers do no error checking. Instead, there is a set of open-source, vendor-neutral validation layers that do much the same checking as is done in Mesa, but they are optional and can be switched off at runtime. The idea is for application developers to check their Vulkan code during development, so "why burn 10% of my CPU doing validation" when there are no errors in the Vulkan code?
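
The layering idea can be sketched as a dispatch chain in which validation is an optional wrapper around an unchecked driver entry point. All names here are hypothetical, and real Vulkan validation layers report problems through a debug-messaging mechanism rather than by raising exceptions:

```python
from typing import Callable

CreateBuffer = Callable[[int], dict]


class ValidationError(Exception):
    pass


def driver_create_buffer(size: int) -> dict:
    # The driver trusts its caller: no checks on the hot path.
    return {"size": size}


def validation_layer(next_layer: CreateBuffer) -> CreateBuffer:
    """Wrap the next entry point in the chain with argument checks."""
    def create_buffer(size: int) -> dict:
        if size <= 0:
            raise ValidationError(f"buffer size must be > 0, got {size}")
        return next_layer(size)
    return create_buffer


def build_dispatch(enable_validation: bool) -> CreateBuffer:
    """Decide once, at setup time, whether calls pass through validation."""
    entry = driver_create_buffer
    if enable_validation:
        entry = validation_layer(entry)
    return entry
```

With validation disabled, a call goes straight to the driver function, so a correct application pays nothing for checks it does not need.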

There is a short distance between the API call and the driver in Vulkan, rather than traversing multiple layers as in Mesa. There is also a short distance between the driver function and actually putting data into the command buffer for the GPU. There are "no extra layers", Ekstrand said.

To handle multiple generations of hardware, each with its own packet format and packing scheme, the Vulkan driver has header files that are generated using Python scripts to process an XML representation of the formats. There is a function that uses that header file information to pack the command data into the buffer in the right way. It has debugging support that can assert() for various problems and the code can be run under Valgrind to find other kinds of problems.
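
The following sketch shows the general shape of that approach in Python: parse an XML description of a command's fields and produce a packing function with assert()-style checks. The XML schema, the EXAMPLE_CMD command, and its field layout are invented for illustration; they are only loosely modeled on the driver's real format files, and the real scripts generate C headers rather than Python closures.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of one command; bit positions count across dwords.
SPEC = """
<instruction name="EXAMPLE_CMD" length="2">
  <field name="DWordLength" start="0" end="7"/>
  <field name="Opcode" start="24" end="31"/>
  <field name="Address" start="32" end="63"/>
</instruction>
"""


def make_packer(xml_text):
    """Turn an XML command description into a function that packs 32-bit dwords."""
    inst = ET.fromstring(xml_text.strip())
    length = int(inst.get("length"))
    fields = [(f.get("name"), int(f.get("start")), int(f.get("end")))
              for f in inst.iter("field")]
    known = {name for name, _, _ in fields}

    def pack(**values):
        unknown = set(values) - known
        assert not unknown, f"unknown fields: {sorted(unknown)}"
        dwords = [0] * length
        for name, start, end in fields:
            value = values.get(name, 0)
            width = end - start + 1
            # Debug-time checks, in the spirit of the driver's assert()s
            assert 0 <= value < (1 << width), f"{name}={value:#x} overflows {width} bits"
            assert start // 32 == end // 32, f"{name} crosses a dword boundary"
            dwords[start // 32] |= value << (start % 32)
        return dwords

    return pack


pack_example_cmd = make_packer(SPEC)
```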

To handle four separate Intel GPU generations, the code is compiled four times to create one version per generation. That allows the driver to keep up with new hardware more easily. The hardware-generation checks for each command function (as in the Mesa driver) are compiled away and the right thing is done for the generation in use. This is one example of where the team got to rethink things because it is a new, from-scratch driver.

One of the challenges faced by the team was in memory allocation. Vulkan provides a collection of heaps where clients can allocate VkDeviceMemory objects. The client can place VkImage or VkBuffer objects at explicit offsets within the VkDeviceMemory object. This doesn't map well to allocation from LibDRM, he said, but it does map well to Graphics Execution Manager (GEM) buffer objects. Other objects have small amounts of driver-allocated memory for state that the driver needs to track. The team had to figure out how to manage all those pieces of memory. Complicating matters was that the Intel hardware has different base addresses for different types of allocations (e.g., shaders and surface states), so each piece of state information needs to be stored alongside other state of the same type.
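
A rough model of the client-controlled placement described above: the client picks the offset, and the memory object only checks alignment and bounds. `DeviceMemory` and `bind()` are hypothetical names; the two checks mirror the alignment and size rules a Vulkan client must follow when binding a buffer or image to memory.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceMemory:
    """Hypothetical model of one VkDeviceMemory-like allocation."""
    size: int
    placements: dict = field(default_factory=dict)  # resource name -> (offset, size)

    def bind(self, name: str, offset: int, size: int, alignment: int) -> None:
        """Place a resource at a client-chosen offset, as a Vulkan client does."""
        if offset % alignment != 0:
            raise ValueError(f"{name}: offset {offset} is not a multiple of {alignment}")
        if offset + size > self.size:
            raise ValueError(f"{name}: [{offset}, {offset + size}) exceeds {self.size} bytes")
        # Overlapping placements are deliberately not rejected: Vulkan permits aliasing.
        self.placements[name] = (offset, size)
```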

He and Høgsberg came up with a "crazy" memory allocation structure that they are pretty proud of, Ekstrand said. For device memory objects, GEM buffers are used; there is also a pool of GEM buffers that are used for back buffers. For the state objects, there are block pools that are allocated as a buffer object that grows in both directions as needed. The pools are initialized to provide objects of a specific size. Allocating from either end of the pool is required because of some hardware-specific restrictions.
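
The two-ended growth can be sketched with offsets measured from the middle of the pool: allocations from one end get increasing non-negative offsets, and allocations from the other get decreasing negative ones. This is a hypothetical simplification, with no freeing and no bound on growth; it is not the driver's actual code.

```python
class BlockPool:
    """Hypothetical pool of fixed-size blocks growing in both directions from a center."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.back_next = 0   # next offset handed out when growing upward
        self.front_next = 0  # lowest offset handed out so far when growing downward

    def alloc_back(self) -> int:
        offset = self.back_next
        self.back_next += self.block_size
        return offset

    def alloc_front(self) -> int:
        self.front_next -= self.block_size
        return self.front_next

    def extent(self) -> tuple:
        """Range of offsets in use, relative to the center."""
        return self.front_next, self.back_next
```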

The block pools are implemented as a 2GB memfd that gets mmap()-ed into the driver. An address in the middle is then turned into a GEM buffer object. The block pool is used to implement both a traditional "allocate and free" style state pool as well as a pool that is used for state that is associated with a command buffer. The latter pool has no free function; it simply gets reset when the command buffer is thrown away. It is a complicated infrastructure, but has worked well, he said.
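
The two allocation styles layered on the block pool can be sketched as follows. Both classes are hypothetical simplifications that hand out offsets rather than real GPU memory: `StatePool` recycles fixed-size states through a free list, while `StreamPool` only bump-allocates and forgets everything on `reset()`, the way command-buffer state is discarded along with its command buffer.

```python
class StatePool:
    """Hypothetical 'allocate and free' pool of fixed-size states."""

    def __init__(self, state_size: int):
        self.state_size = state_size
        self.next_offset = 0
        self.free_list = []

    def alloc(self) -> int:
        if self.free_list:
            return self.free_list.pop()  # reuse a previously freed state
        offset = self.next_offset
        self.next_offset += self.state_size
        return offset

    def free(self, offset: int) -> None:
        self.free_list.append(offset)


class StreamPool:
    """Hypothetical per-command-buffer pool: bump allocation, no free(), bulk reset."""

    def __init__(self):
        self.next_offset = 0

    def alloc(self, size: int, alignment: int = 64) -> int:
        # Round up to the requested alignment, then bump the pointer
        offset = (self.next_offset + alignment - 1) // alignment * alignment
        self.next_offset = offset + size
        return offset

    def reset(self) -> None:
        self.next_offset = 0
```

The stream pool's lack of a free() is what makes it cheap: resetting is a single assignment, no matter how many states were allocated.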

Most hardware has support for compressed surfaces, but not all parts of the GPU understand all of the different formats. So a "resolve" operation is needed to decompress or recompress the surface at different points in the pipeline. Due to the multi-threaded nature of Vulkan, though, there is no real way to track when the resolves are needed on the CPU side. The Vulkan API provides two features ("render passes" and "layout transitions") that can help. Layout transitions are not currently used in the driver, but render passes delineate where resolves may be needed.

It is easier to write a Vulkan driver than one for OpenGL, Ekstrand said. The lack of error checking simplifies things to start with. The SPIR-V shader language is a bit easier to deal with than OpenGL's GLSL. Also, the Vulkan conformance tests consist of 115,000 tests that the driver developer doesn't have to write. It is a good set of tests, but there are still some holes, he said.

Some things are harder to do for Vulkan than for OpenGL. There is no CPU-side object state-tracking, for one thing. In addition, "applications have a lot more power for stupid". If the application is doing something wrong, which results in a bug filed against the driver, there is a good bit of work—without good tools—needed to track down the problem.

As far as sharing code between Vulkan and OpenGL drivers goes, there are a couple of different approaches. The approach taken was a "toolbox" that provides a number of different parts, from which a driver can be created. That approach has also provided better infrastructure for building other drivers in the future. Those looking for more details may want to view the YouTube video of the talk.

[I would like to thank the X.Org Foundation for sponsoring my travel to Helsinki for XDC.]

OpenType 1.8 and style attributes

By Nathan Willis
September 28, 2016

ATypI

In last week's look at the new revision of the OpenType font format, we focused primarily on the new variations font feature, which makes it possible to encode multiple design "masters" into a single font binary. This enables the renderer to generate a new font instance at runtime based on interpolating the masters in a particular permutation of their features (weight, width, slant, etc.). Such new functionality will, at least in some cases, mean that application software will have to be reworked in order to present the available font variations to the end user in a meaningful fashion.

But there is another change inherent in the new feature that may not be as obvious at first glance. Variations fonts redefine the relationships between individual font files and font "families." There is a mechanism defined in the new standard to bridge the gap between the old world and the new, called the Style Attributes (STAT) table. For it to work in a meaningful fashion, though, it must be implemented by traditional, non-variations fonts as well—which may not be an easy sell.

There is no formal definition of a font family, but in general usage the term refers to a set of fonts that share core design principles and, in most cases, use a single name and come from the same designer or design team. The Ubuntu Font Family, for example, includes upright and italic fonts in four weights at the standard width, one weight of upright-only condensed width, and two weights (in upright and italic) of a monospaced variant.

The designers clearly present the fonts as a single conceptual unit, even though (for example) the monospaced version has several characters that use considerably different designs than the proportional version. Some people might argue that the monospace fonts are a separate family, and that together with the proportional fonts, they form a "superfamily." Since no one is in charge of the terminology, such disagreements happen. Similar ambiguities could be found in the Source Code Pro, Source Sans Pro, and Source Serif Pro fonts from Adobe, which were developed separately and take their design cues from unrelated historical typefaces.

An indisputable key to a font family, though, is the fact that the fonts belong together when they are presented to the user. In an OpenType variations font, there is a technical challenge at present but, conceptually, the task is easy: each of the various instances of the font comes from the same source and it can be addressed and otherwise treated as a set of coordinates in the overall design space: (weight=bold, width=normal, italic=no), for example, or (weight=750, width=200, italic=0), to be a bit more numerical. But there has never been a consistent way to map those sorts of design-space characteristics onto standard, non-variations font files. Doing so is the purpose of the STAT table.

Family matters

At the top level, the table lists all of the axes of variation used in the font family. Each axis has a string that can be displayed in user interfaces and an optional axisOrdering number. That ordering has a couple of possible interpretations. One is the order in which the axes should be sorted in a font name. For instance, if width sorts before weight, then a list would look like:

    Foo Condensed
    Foo Condensed Bold
    Foo
    Foo Bold
    Foo Extended
    Foo Extended Bold

and so forth. If weight sorts before width, though, then one would see:

    Foo Condensed
    Foo 
    Foo Extended
    Foo Condensed Bold
    Foo Bold
    Foo Extended Bold

A different interpretation of the axisOrdering numbers would be to specify the order in which the various axes are shown in a font name. That is, whether to show "Foo Condensed Bold" or "Foo Bold Condensed" in the font menu.
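
Both interpretations of axisOrdering can be reproduced with a short sketch: one ordering decides how fonts are sorted, and the other decides the order of words in each name. The axis labels, the numeric ranks, and the treatment of "Normal" and "Regular" as elidable are assumptions made for this example:

```python
from itertools import product

# Hypothetical axis values: label -> numeric position on the axis
AXIS_VALUES = {
    "wdth": {"Condensed": 75, "Normal": 100, "Extended": 125},
    "wght": {"Regular": 400, "Bold": 700},
}
ELIDABLE = {"Normal", "Regular"}  # dropped from displayed names


def font_name(family, font, name_order):
    """Compose a name, listing axis labels in name_order."""
    parts = [font[axis] for axis in name_order if font[axis] not in ELIDABLE]
    return " ".join([family, *parts])


def font_menu(family, sort_order, name_order):
    """List every width/weight combination, sorted axis by axis in sort_order."""
    fonts = [dict(zip(AXIS_VALUES, labels)) for labels in product(*AXIS_VALUES.values())]
    fonts.sort(key=lambda f: tuple(AXIS_VALUES[axis][f[axis]] for axis in sort_order))
    return [font_name(family, f, name_order) for f in fonts]
```

Sorting by wdth then wght yields the first list above, and sorting by wght then wdth yields the second; passing `["wght", "wdth"]` as the name order would produce "Foo Bold Condensed" instead of "Foo Condensed Bold".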

Complicating this interpretation is the fact that OpenType already supports several other mechanisms with which to specify a font's name, including all of those design attributes, via the name table. There are three options: Name IDs 1 and 2, which specify a Font Family Name and Font Subfamily Name (respectively); Name IDs 16 and 17, which specify a Typographic Family Name and Typographic Subfamily Name; and Name IDs 21 and 22, which specify a Weight/Width/Slope (WWS) Family Name and WWS Subfamily Name. Each Name ID entry can hold an arbitrary string, and each pair is intended to be concatenated in "Family Subfamily" fashion. The redundancy of multiple similar options has not escaped the community's notice, of course; it remains for historical reasons.
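
To see why the redundancy matters, consider how one font might fill in all three pairs. The entries below are hypothetical, though they follow a common pattern in which the legacy pair is limited to Regular/Italic/Bold/Bold Italic styles and the WWS pair moves attributes other than weight, width, and slope (here, "Display") into the family name:

```python
# Hypothetical name-table entries for a single font, "Foo Display Semibold"
NAME_TABLE = {
    1: "Foo Display Semibold", 2: "Regular",   # legacy family/subfamily
    16: "Foo", 17: "Display Semibold",          # typographic family/subfamily
    21: "Foo Display", 22: "Semibold",          # WWS family/subfamily
}

NAME_ID_PAIRS = {"legacy": (1, 2), "typographic": (16, 17), "wws": (21, 22)}


def family_and_full_name(name_table, pair):
    """Read one (family, subfamily) pair and concatenate it 'Family Subfamily' style."""
    family_id, subfamily_id = pair
    family, subfamily = name_table[family_id], name_table[subfamily_id]
    return family, f"{family} {subfamily}"


for label, pair in NAME_ID_PAIRS.items():
    print(label, family_and_full_name(NAME_TABLE, pair))
```

Depending on which pair a program reads, the same font lands in a family called "Foo Display Semibold", "Foo", or "Foo Display".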

Complicating matters even more is the fact that different software platforms interpret name table data in their own peculiar ways, such as when parsing and tokenizing the strings in Name IDs. In the OpenType session at ATypI 2016, Peter Constable noted that Microsoft's Graphics Device Interface (GDI) and Windows Presentation Foundation (WPF) each has its own approach to assembling the font name from the Name IDs, and CSS uses a different approach altogether. The obvious question, he said, is why add yet another possible naming mechanism to the mix. The answer is that STAT does not impose a hierarchical solution like the Family/Subfamily options in name do; it defines the variation axes and that is all. Whereas name table entries can be arbitrary strings that may or may not make sense, the thinking goes, at least STAT axes are well-defined and can be reasoned about.

The mapping problem

Confusing though the naming issues may be, a more practical feature of the STAT table is that fonts can provide a mapping between the numeric values defined for OpenType 1.8 axes and the names commonly used by the various font classification systems and shown in user interfaces. The predefined axes each have an expected range. Italic (ital) must be between zero and one; slant (slnt) must be between -90 and 90 degrees; optical size (opsz) and width (wdth) must be greater than zero; weight (wght) must be between one and 1000. During the interpolation process for variable fonts, all of these values get normalized to [-1,1], but these human-readable ranges were selected to better map to how existing font families are described, including the conventions of CSS, GDI, and WPF.
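
The normalization step can be written out directly. This sketch follows the default normalization in the OpenType 1.8 specification, which maps an axis's minimum, default, and maximum to -1, 0, and 1, interpolating linearly on each side of the default; it omits the optional avar-table remapping and the final rounding to fixed-point, and the example axis values are made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    tag: str
    minimum: float
    default: float
    maximum: float


def normalize(axis: Axis, value: float) -> float:
    """Map a user-space coordinate onto [-1, 1], with the default at 0."""
    value = max(axis.minimum, min(axis.maximum, value))  # clamp to the axis range
    if value < axis.default:
        return (value - axis.default) / (axis.default - axis.minimum)
    if value > axis.default:
        return (value - axis.default) / (axis.maximum - axis.default)
    return 0.0


wght = Axis("wght", minimum=100, default=400, maximum=900)
```

Because each side of the default is scaled separately, 300 normalizes to -1/3 while 700 normalizes to 0.6, even though both are 300 units from the default.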

So it's quite simple to specify in the STAT table that the "Regular" weight of a font is 200, the "Semibold" is 450, and the "Bold" is 625, for example. The table even offers an ElidableAxisValueName flag to indicate that "Regular" (for example) can be dropped from the name shown in UIs, which is a nice convenience for end users.
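
A minimal sketch of how software might use such records to build a UI name, assuming format-1-style records that each pair a single axis value with a name. The record list and `ui_name()` are hypothetical; the flag value 0x0002 is the one the specification assigns to ElidableAxisValueName.

```python
ELIDABLE_AXIS_VALUE_NAME = 0x0002  # flag bit for STAT axis value records

# Hypothetical weight-axis value records: (value, name, flags)
WEIGHT_AXIS_VALUES = [
    (200, "Regular", ELIDABLE_AXIS_VALUE_NAME),
    (450, "Semibold", 0),
    (625, "Bold", 0),
]


def ui_name(family: str, weight: float) -> str:
    """Build a display name from the record that matches the weight exactly."""
    for value, name, flags in WEIGHT_AXIS_VALUES:
        if value == weight:
            if flags & ELIDABLE_AXIS_VALUE_NAME:
                return family  # "Regular" is dropped from the displayed name
            return f"{family} {name}"
    raise KeyError(f"no STAT record for weight {weight}")
```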

Where things become trickier, however, is when a font family starts out with a few font files at first (possibly even just one) and adds more later. For example, consider a variations font that supports the weight and width axes, all in roman (non-italic) style. In that case, there would be no ital axis defined in the font file. But if a matching italic variations font is released later, then the original font's STAT table is suddenly incomplete, because it does not indicate where that first font lies on the font family's new roman-to-italic axis.

The OpenType 1.8 specification offers a fix for this through the STAT table. The newly released italic font should include (naturally) an ital variation axis, and the axis's record in the STAT table would include the relevant entries for the new font, plus one entry for the old font as well. The old font's record gets marked with the OlderSiblingFontAttribute flag, which is meant to indicate to the application or operating system where the old font gets mapped into the new, expanded font family on the ital axis. In our simple example, the entry for the old font would be a zero on the ital axis, but lots of other permutations are certainly possible.

So this feature lets one font file supply data about a separate, older font file that software implementations are expected to read and adhere to. The specification does not dictate how a program should go about determining which older font file is the one referenced by entries flagged with OlderSiblingFontAttribute. Presumably, some Name ID(s) from the name table are involved but, as we have seen, there are several of those to choose from. And it is possible that more than one older font might need to be retroactively referenced in such a fashion—consider yet another new font added to our example above that adds an optical-size variant. That font would have to include OlderSiblingFontAttribute information for the older fonts as well.
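
The following sketch illustrates both halves of the problem: reading the flagged record is easy, but finding the file it refers to is not. The flag values come from the STAT definition, but the family-name matching rule, the font file names, and the helper functions are all hypothetical, since the specification does not define the lookup:

```python
OLDER_SIBLING_FONT_ATTRIBUTE = 0x0001
ELIDABLE_AXIS_VALUE_NAME = 0x0002

# STAT axis values shipped in the *new* italic font: (value, name, flags)
NEW_ITALIC_FONT_STAT = {
    "ital": [
        (0.0, "Roman", OLDER_SIBLING_FONT_ATTRIBUTE | ELIDABLE_AXIS_VALUE_NAME),
        (1.0, "Italic", 0),
    ]
}


def older_sibling_positions(stat):
    """Where the new font says previously released fonts sit (one value per axis here)."""
    return {
        axis: value
        for axis, records in stat.items()
        for value, _name, flags in records
        if flags & OLDER_SIBLING_FONT_ATTRIBUTE
    }


def candidate_older_fonts(installed, family, axis):
    """Hypothetical matching rule: same family name and no knowledge of the new axis."""
    return [f["file"] for f in installed
            if f["family"] == family and axis not in f["axes"]]


INSTALLED = [
    {"file": "Foo-Roman-VF.ttf", "family": "Foo", "axes": {"wght", "wdth"}},
    {"file": "Foo-Roman-2015.otf", "family": "Foo", "axes": set()},  # stale static font
    {"file": "Foo-Italic-VF.ttf", "family": "Foo", "axes": {"wght", "wdth", "ital"}},
]
```

Here two installed files match, and nothing in the new font's STAT table says which one is meant.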

Assuming that the old and new font files are released by the same (non-pathological) person and there are no naming conflicts with other fonts on the system, there should be no misunderstandings. But it is not quite clear how software should interpret matters when the font name in the new font file seems to match more than one old font file. And the specification recommends that all new non-variations font families supply the STAT table with OlderSiblingFontAttribute flags, too. For a traditional font family like the Ubuntu Font Family, there are lots of individual files to be built and distributed (13 in the Ubuntu Font Family's case), with a lot of tables that could get confused or out-of-sync as updates are installed.

Practically speaking, it will be quite some time before OpenType variations fonts become the norm on most users' systems. So type foundries can be expected to release variations-font versions of their binaries as well as sets of individual, non-variations font files. Getting the STAT tables right may take some time; deciphering the font-family information that the tables encode, on the software side, may take some time as well.

Page editor: Jonathan Corbet

Security

The trouble with new TLS version numbers

September 28, 2016

This article was contributed by Hanno Böck

The TLS working group in the IETF is currently working on the next version of the encryption protocol: TLS 1.3. The new protocol will bring performance improvements by avoiding round trips and will deprecate a lot of dangerous cryptographic constructions. But, apart from technical improvements, it will also bring something that may seem trivial, but that could cause a lot of trouble: a new version number. That will probably lead to a redesign of the TLS version-negotiation mechanism.

When a new version of a protocol gets introduced, there must be some mechanism to keep compatibility with existing implementations. Not everyone will move to TLS 1.3; many legacy implementations will keep using TLS 1.2 or older versions for years to come.

TLS uses a version mechanism that may seem relatively simple, but it has been the source of a surprising number of problems. When a client connects to a server, it sends the highest version number it supports in the ClientHello message. The server can reply with any version equal to or lower than that. Therefore, if a client connects to a server with a maximum version number of 1.2 and the server only supports TLS 1.0, it will answer with that version. As long as the client still has compatibility for TLS 1.0, a successful connection can be established.
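
The negotiation rule fits in a few lines. This sketch uses the wire values for the protocol versions (0x0300 for SSL 3 through 0x0303 for TLS 1.2); the function names are hypothetical, and the second server function models the version-intolerant behavior described next.

```python
from typing import Optional

TLS_VERSIONS = {0x0300: "SSL 3", 0x0301: "TLS 1.0", 0x0302: "TLS 1.1", 0x0303: "TLS 1.2"}


def server_choose_version(client_max: int, server_min: int, server_max: int) -> Optional[int]:
    """Correct behavior: answer with the highest version that is <= the client's offer."""
    chosen = min(client_max, server_max)
    return chosen if chosen >= server_min else None


def intolerant_server_choose_version(client_max: int, server_min: int,
                                     server_max: int) -> Optional[int]:
    """Buggy behavior: fail outright when offered a version newer than the server knows."""
    if client_max > server_max:
        return None
    return server_choose_version(client_max, server_min, server_max)


def client_accepts(chosen: Optional[int], client_min: int) -> bool:
    """The client still has to be willing to speak the version the server picked."""
    return chosen is not None and chosen >= client_min
```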

This ideal case often doesn't occur, however, due to faulty server implementations. Many servers simply fail once one tries to connect with a higher TLS version than they support. The failure can happen in a variety of ways. Some servers terminate the connection at the TCP level or send a TLS error alert, others simply wait until a timeout happens. Some also successfully send a TLS ServerHello and almost complete a handshake, but fail later during verification of the Finished message, which is the last part of the handshake. All these behaviors are bugs in the server software.

Version intolerance

This problem is known as "version intolerance" and it has cropped up every time browsers and TLS implementations have introduced new protocol versions. An old web page documents the problem; it was written by Netscape in 2003 and can be found in the Mozilla wiki. Most of the affected devices were enterprise TLS appliances, although occasionally free implementations like OpenSSL were also affected.

Browser vendors have reacted to these problems with a questionable strategy: after a connection failure, the browser tries to reconnect with a lower TLS or SSL version. Back then, the only versions in widespread use were SSL 3 and TLS 1.0. While this avoided problems with broken servers, it introduced another problem: these downgrades occasionally happened because of packets dropped on bad network connections. As a result, protocol features that were only supported in TLS 1.0 stopped working intermittently.

One extension that TLS 1.0 introduced is called Server Name Indication (SNI) and it removed a limitation of the old SSL protocol, by allowing multiple domains with different certificates to be hosted on the same IP address. SNI allows shared hosting services that often host hundreds of websites on the same IP to deploy HTTPS. The deployment of SNI was severely hampered by browsers' version fallbacks in TLS, because website visitors would randomly see the wrong certificate after a connection was downgraded to SSL 3.

The version fallbacks also introduced security issues. If browsers try to reconnect with a lower TLS or SSL version, then a man-in-the-middle attacker can force these version downgrades by blocking ClientHello messages with higher version numbers. At the Black Hat USA conference in 2014, Antoine Delignat-Lavaud presented an attack called "virtual host confusion" (YouTube video, paper [PDF]). The attack exploited the fact that an attacker can disable SNI by a forced version downgrade.
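
The downgrade dance and the attack on it can be simulated in a few lines. The handshake functions are hypothetical stand-ins that return the negotiated version, or None on failure; from the client's point of view, a dropped ClientHello looks exactly like a broken server:

```python
FALLBACK_ORDER = (0x0303, 0x0302, 0x0301, 0x0300)  # TLS 1.2 down to SSL 3


def make_server(server_min=0x0300, server_max=0x0303):
    """A correctly behaving server that still accepts SSL 3."""
    def handshake(client_max):
        chosen = min(client_max, server_max)
        return chosen if chosen >= server_min else None
    return handshake


def with_attacker(handshake, drop_above):
    """A man in the middle that drops every ClientHello offering more than drop_above."""
    def tampered(client_max):
        if client_max > drop_above:
            return None  # indistinguishable from a broken server or a lost packet
        return handshake(client_max)
    return tampered


def connect_with_fallback(handshake):
    """The browser 'downgrade dance': retry with ever-lower versions after any failure."""
    for version in FALLBACK_ORDER:
        result = handshake(version)
        if result is not None:
            return result
    return None
```

Against an honest path the client gets TLS 1.2, but an attacker who drops every ClientHello above SSL 3 walks it all the way down to SSL 3, which has no extensions and therefore no SNI.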

Later that year, Bodo Möller, Thai Duong, and Krzysztof Kotowicz discovered the POODLE attack — a padding oracle attack that exploits the fact that in SSL 3 the padding of the encryption was undefined and could have any value. But that alone wouldn't have been very interesting, because at that time SSL 3 was rarely used. In combination with version fallbacks, however, POODLE became a severe issue because almost all servers and clients still supported SSL 3. With version downgrades it was easy to force a connection to use the old protocol. The POODLE paper introduced the term "protocol downgrade dance" for the downgrade behavior of browsers.

In response to these kinds of problems, a mechanism called "Signaling Cipher Suite Value" (SCSV) was introduced. When a client retries a connection with a downgraded version, it includes a special cipher suite value, TLS_FALLBACK_SCSV, in its ClientHello. A server that supports a higher version than the one offered can conclude that the fallback was not caused by its own intolerance, so it aborts the handshake instead of accepting the downgraded connection. SCSV got standardized as RFC 7507, but it quickly became almost obsolete, because browser vendors decided that they could get rid of the questionable version fallbacks entirely.
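
The server-side rule from RFC 7507 is short enough to sketch. The constants are the values the RFC assigns (cipher suite 0x5600 and alert 86, inappropriate_fallback); the function names and the placeholder cipher-suite list are hypothetical:

```python
TLS_FALLBACK_SCSV = 0x5600       # cipher suite value {0x56, 0x00} from RFC 7507
INAPPROPRIATE_FALLBACK = 86      # alert number from RFC 7507


def client_hello(version, is_fallback_retry, suites=(0x002F,)):
    """The client adds the SCSV only when retrying with a lowered version."""
    cipher_suites = list(suites) + ([TLS_FALLBACK_SCSV] if is_fallback_retry else [])
    return version, cipher_suites


def server_hello(client_version, cipher_suites, server_max):
    """Apply the RFC 7507 check before normal version selection."""
    if TLS_FALLBACK_SCSV in cipher_suites and server_max > client_version:
        return ("alert", INAPPROPRIATE_FALLBACK)
    return ("version", min(client_version, server_max))
```

An attacker can still force the client to retry, but a server that speaks TLS 1.2 now refuses the SSL 3 retry instead of completing it, while a genuinely old TLS 1.0 server still accepts the fallback.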

SCSV is notable, though, because it is a feature of the TLS standard that exists solely to work around buggy implementations. But it's not the only such feature. Some devices from the company F5 fail to allow connections if the ClientHello message has a size between 256 and 512 bytes. Therefore, a padding extension was introduced that simply expands the handshake to avoid those sizes. However, it later turned out that this solution would cause other implementations to fail, because they don't accept handshakes larger than 512 bytes.
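
The padding arithmetic can be sketched as follows, loosely following how clients implement the padding extension defined in RFC 7685: if the ClientHello would fall in the 256 to 511 byte range, add an extension (a 4-byte header plus some zero bytes of data) that pushes it to at least 512 bytes. The function names are hypothetical, and the sketch only computes lengths; a real client must also update the length fields inside the message.

```python
EXTENSION_HEADER_LEN = 4  # 2-byte extension type + 2-byte extension length


def padding_data_len(client_hello_len: int):
    """Bytes of padding data to send, or None if no padding extension is needed."""
    if not 256 <= client_hello_len < 512:
        return None
    missing = 512 - client_hello_len
    if missing >= EXTENSION_HEADER_LEN + 1:
        # Spend 4 of the missing bytes on the header and the rest on data
        return missing - EXTENSION_HEADER_LEN
    # Too close to 512 for the header to fit exactly; send one byte of data
    return 1


def padded_len(client_hello_len: int) -> int:
    data = padding_data_len(client_hello_len)
    if data is None:
        return client_hello_len
    return client_hello_len + EXTENSION_HEADER_LEN + data
```

A 300-byte ClientHello gets 208 bytes of padding data and becomes exactly 512 bytes; a 510-byte one cannot fit a 4-byte header into the remaining 2 bytes, so it gets a single byte of data and ends up at 515.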

The return of fallbacks

Despite all the drama version fallbacks have caused, they may make a comeback. In a recent blog post, Google developer Adam Langley commented:

It's taken about 15 years to get to the point where web browsers don't have to work around broken version negotiation in TLS and that's mostly because we only have three active versions of TLS. When we try to add a fourth (TLS 1.3) in the next year, we'll have to add back the workaround, no doubt.

Langley was certain that there is no way to avoid TLS version fallbacks when TLS 1.3 gets introduced. The reason is that currently about three percent of the major web pages have problems with TLS 1.3 handshakes. In theory, browser vendors could skip the fallbacks and simply break non-compliant sites; however, that's unlikely to happen. A browser that breaks a large number of sites and devices will likely face a backlash from users and may push those users to choose another browser. Chrome has often faced heavy criticism from users when it deprecated insecure mechanisms in the past. When Google deprecated insecure Diffie-Hellman parameters, it broke connections to a Cisco RV042G router. While it is obvious that Cisco was at fault here, the user reactions that can be seen in Chrome's public forum blamed Google for its effort to make the Internet more secure.

TLS 1.3 contains a mechanism similar to SCSV that could avoid the worst consequences of version intolerance. A server that supports TLS 1.3 but ends up negotiating an older version writes a fixed value into the last bytes of the random number field of its handshake, so a client that offered TLS 1.3 can detect that the connection was downgraded and abort. Still, this is far from ideal, as it adds another layer of complexity. Ideally, vendors would simply fix their TLS implementations.
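The downgrade signal fits in a few lines. The sentinel values below are the ones published in the final TLS 1.3 specification (RFC 8446, section 4.1.3), which may differ in detail from the drafts current at the time of writing; the function names are hypothetical.

```python
# Sketch of TLS 1.3 downgrade protection via ServerHello.random.
DOWNGRADE_TLS12 = b"DOWNGRD\x01"   # negotiated TLS 1.2
DOWNGRADE_TLS11 = b"DOWNGRD\x00"   # negotiated TLS 1.1 or below

TLS1_2, TLS1_3 = 0x0303, 0x0304


def server_random(random32, negotiated, server_max=TLS1_3):
    """Server side: stamp the sentinel when a 1.3-capable server negotiates lower."""
    if len(random32) != 32:
        raise ValueError("server random must be 32 bytes")
    if server_max >= TLS1_3 and negotiated < TLS1_3:
        tail = DOWNGRADE_TLS12 if negotiated == TLS1_2 else DOWNGRADE_TLS11
        return random32[:24] + tail  # overwrite the last eight bytes
    return random32


def downgrade_detected(random32, negotiated, client_max=TLS1_3):
    """Client side: True if a 1.3-capable client should abort the handshake."""
    if client_max < TLS1_3 or negotiated >= TLS1_3:
        return False
    return random32[24:] in (DOWNGRADE_TLS12, DOWNGRADE_TLS11)
```

With (EC)DHE cipher suites the server random is covered by the server's signature, so an attacker who strips TLS 1.3 from the client's offer cannot also erase the sentinel.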

Vendor responses

The vendors responsible for broken version negotiation mostly don't seem to care much. I have tried to identify affected vendors. Many of the buggy web sites use Citrix Netscaler devices. Citrix has informed me that it is aware of this problem, although it doesn't consider it to be a security issue. Citrix was unable to give any timeline for when this bug will be fixed.

Several products from IBM, among them IBM HTTP Server and Lotus Domino, are also affected. At first, IBM security simply denied that there was a problem and claimed that the issue had already been fixed in the current HTTP Server release. After I told the company that I had actually tested the latest release and that it was still affected, IBM looked into it. IBM informed me that it doesn't treat the issue as a security vulnerability. IBM was unable to give a concrete timeline for a fix, but said that one will likely come with the next version of its TLS implementation, GSKit, which will be released by the end of the year. A while later, IBM went back into denial mode and informed me that the issue was closed because the company was unable to reproduce it, even though it had already confirmed that it was working on a fix.

So two major vendors didn't consider this issue a security vulnerability and didn't see any urgency to tackle it. While it is true that the issue itself doesn't cause a security problem for the device owner, past experience has shown that such bugs can cause security issues down the line, because they force client implementations to adopt dangerous behavior.

The third vendor that could be identified was Cisco; version intolerance affects its ACE load-balancer devices. These devices are out of support and no longer receive updates. It was made clear to me that Cisco won't consider any exceptions to its end-of-life policy. Since the software of these devices is proprietary, users cannot fix the bug themselves either, so people who still use them will simply have to live with it. Cisco did promise to check whether devices that are still supported are also affected by this bug.

I also tried to contact operators of major affected web sites, but with limited success. The most notable sites that fail with a TLS 1.3 handshake are apple.com, ebay.com, and various localized versions of PayPal. In many cases, only connections without a leading www are affected. The reason is probably that the www version of a site is often served by a content delivery network, while the domain without www is handled by another device that simply forwards connections.

Apple and eBay didn't answer questions about their version-intolerant web services; both sites are still affected. PayPal simply said that TLS issues aren't covered by its bug bounty program and refused to discuss the issue any further.

Server operators can test their server for TLS version intolerance with the SSL Labs test or with the testssl.sh tool. Both tests have limitations and don't catch all instances of version intolerance. The most reliable way to test right now is to use the Beta or Dev channel release of Chrome and manually enable TLS 1.3 (via chrome://flags option "Maximum TLS version enabled") or use Firefox Nightly (set "security.tls.version.max" and "security.tls.version.fallback-limit" to "4" in about:config). Trying to access version intolerant sites that usually support HTTPS will result in a connection failure.
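For a quick scripted check, a probe only needs to send one ClientHello carrying a higher version number than the server knows and look at what comes back. The Python sketch below is a rough approximation under stated assumptions, not how SSL Labs or testssl.sh work: it puts {3, 4} into the ClientHello version field, as early TLS 1.3 drafts did, offers a handful of TLS 1.2 cipher suites, and classifies the reply. A tolerant server answers with a ServerHello for an older version; an intolerant one typically closes the connection or sends an alert (an alert can also simply mean no shared cipher suite). All names are hypothetical, and like the tools above it will miss some forms of intolerance.

```python
import os
import socket
import struct

# Common TLS 1.2 cipher suites; a real probe would offer many more.
SUITES = [0xC02F, 0xC030, 0xC013, 0xC014, 0x009C, 0x002F, 0x0035]


def client_hello(host, client_version=0x0304):
    """Build a single ClientHello record advertising client_version."""
    name = host.encode("ascii")
    sni_entry = struct.pack("!BH", 0, len(name)) + name         # host_name(0)
    sni_list = struct.pack("!H", len(sni_entry)) + sni_entry
    exts = struct.pack("!HH", 0x0000, len(sni_list)) + sni_list  # server_name
    body = (struct.pack("!H", client_version)
            + os.urandom(32)                                     # client random
            + b"\x00"                                            # no session ID
            + struct.pack("!H", 2 * len(SUITES))
            + b"".join(struct.pack("!H", s) for s in SUITES)
            + b"\x01\x00"                                        # null compression
            + struct.pack("!H", len(exts)) + exts)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body    # client_hello(1)
    # Record header: handshake(22), record version pinned to TLS 1.0.
    return struct.pack("!BHH", 22, 0x0301, len(handshake)) + handshake


def classify(reply):
    """Interpret the first bytes the server sent back."""
    if len(reply) < 5:
        return "closed"                          # likely version intolerant
    if reply[0] == 21:
        return "alert"                           # often version intolerance
    if reply[0] == 22 and len(reply) >= 11 and reply[5] == 2:
        version = struct.unpack("!H", reply[9:11])[0]
        return f"server_hello 0x{version:04x}"   # tolerant: fell back cleanly
    return "unexpected"


def probe(host, port=443, timeout=5.0):
    """Send the ClientHello to host:port and classify the answer."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(client_hello(host))
        try:
            reply = sock.recv(4096)
        except (ConnectionResetError, socket.timeout):
            reply = b""
    return classify(reply)
```

A tolerant server that falls back to TLS 1.2 yields "server_hello 0x0303".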

Rethinking version negotiation

Given the situation, Google developer David Benjamin proposed a different route with a redesign of the whole version negotiation mechanism. He suggested that the version could be negotiated with an extension that sends a list of supported newer versions. Obviously the same problem with version intolerance could happen again with such a solution in the future: servers may simply not work if they see any version in the extension that they don't know.
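Extension-based negotiation can be sketched as follows. The sketch uses the wire format that was eventually standardized as the supported_versions extension (number 43) in RFC 8446, which may differ from the proposal as it stood at the time; function names are hypothetical. The key property lives in select_version(): a server should skip values it does not recognize instead of failing.

```python
import struct

SUPPORTED_VERSIONS = 43  # extension number assigned in RFC 8446


def encode_supported_versions(versions):
    """ClientHello form: 1-byte list length, then 2-byte version values."""
    body = b"".join(struct.pack("!H", v) for v in versions)
    data = struct.pack("!B", len(body)) + body
    return struct.pack("!HH", SUPPORTED_VERSIONS, len(data)) + data


def decode_supported_versions(ext):
    """Return the list of offered versions from an encoded extension."""
    ext_type, length = struct.unpack("!HH", ext[:4])
    if ext_type != SUPPORTED_VERSIONS:
        raise ValueError("not a supported_versions extension")
    data = ext[4:4 + length]
    return [struct.unpack("!H", data[i:i + 2])[0]
            for i in range(1, 1 + data[0], 2)]


def select_version(offered, server_supports):
    """Pick the highest mutually supported version; unknown values are
    skipped, which is exactly what an intolerant server fails to do."""
    usable = [v for v in offered if v in server_supports]
    if not usable:
        raise ValueError("no common protocol version")
    return max(usable)
```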

To avoid this, Benjamin proposed that browsers could randomly send bogus version numbers that are reserved with a guarantee that they will never be used for any real TLS version. Any correct implementation should simply ignore all unsupported version values. Bugs in servers that fail when they see a version number they don't support would then likely be discovered much earlier, so they would probably never make it into production releases. It is still possible that vendors could get this wrong by special-casing the reserved bogus version numbers and ignoring only those. However, it is hard to imagine anyone doing so without deliberately setting out to create non-compliant software.

Benjamin also proposed a generalized variant of this mechanism under the name Generate Random Extensions And Sustain Extensibility (GREASE). The same trick of sending bogus values could be applied to extensions and cipher suites to expose bugs in those areas.
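The GREASE values that were eventually reserved (published much later as RFC 8701) follow a simple pattern: 0x0A0A, 0x1A1A, and so on up to 0xFAFA. The Python sketch below generates them, inserts one at random into an offered list, and contrasts a tolerant server with a hypothetical buggy one. The buggy server fails on every GREASE-d handshake, so the bug surfaces in the first round of testing rather than years later. All function names are illustrative.

```python
import random

# The 16 reserved GREASE values: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = [(n << 12) | 0x0A00 | (n << 4) | 0x0A for n in range(16)]


def is_grease(value):
    return value in GREASE_VALUES


def grease(values, rng=random):
    """Return a copy of values with one random GREASE value inserted."""
    out = list(values)
    out.insert(rng.randrange(len(out) + 1), rng.choice(GREASE_VALUES))
    return out


def tolerant_select(offered, supported):
    """Correct behavior: ignore anything unknown, pick the best known value."""
    return max(v for v in offered if v in supported)


def buggy_select(offered, supported):
    """Hypothetical broken server: chokes on any value it does not know."""
    for v in offered:
        if v not in supported:
            raise ValueError(f"unknown value 0x{v:04x}")
    return max(offered)
```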

The proposal for TLS version negotiation via an extension was received with skepticism during the last IETF meeting in Berlin. It would further complicate an already complicated handshake. The existing ClientHello already contains two version numbers: the TLS record-layer version and the real ClientHello version. The record-layer version never had any real meaning, so most implementations simply set it to the version value of TLS 1.0 and ignore it; TLS 1.3 will make this official and says that it must be ignored. Adding to the confusion, the version numbers sent over the wire don't match the version numbers of the protocol. For historic reasons (all versions of TLS came after SSL version 3), TLS 1.0 is indicated by the value pair {3, 1}, and TLS 1.3 will be {3, 4}.
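The mismatch between protocol names and wire values is mechanical: TLS 1.x is sent as {3, x+1}. The small Python sketch below (helper names are hypothetical) converts between the two and pulls both version fields out of a raw ClientHello record, the record-layer one at bytes 1-2 and the ClientHello one at bytes 9-10 (counting from zero).

```python
NAMES = {(3, 0): "SSL 3.0", (3, 1): "TLS 1.0", (3, 2): "TLS 1.1",
         (3, 3): "TLS 1.2", (3, 4): "TLS 1.3"}


def tls_to_wire(tls_minor):
    """TLS 1.x is encoded as the pair {3, x + 1}."""
    return (3, tls_minor + 1)


def wire_name(pair):
    return NAMES.get(tuple(pair), "unknown")


def hello_versions(record):
    """Return (record-layer version, ClientHello version) from a raw record.

    Layout: 5-byte record header (type, version, length), then the
    handshake header (type, 3-byte length), then client_version.
    """
    if len(record) < 11 or record[0] != 22 or record[5] != 1:
        raise ValueError("not a ClientHello record")
    return (record[1], record[2]), (record[9], record[10])
```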

The TLS community was therefore uneasy with the idea of adding another layer of complexity. But Benjamin's latest proposal got more support on the mailing list than it did during the IETF meeting. It now has the status of rough consensus and will most likely be part of TLS 1.3.

The GREASE strategy is an interesting new paradigm for designing protocols in an ecosystem where many vendors ship low-quality products that implement specifications incorrectly, yet new protocols must stay compatible with that existing infrastructure of defective devices. Similar strategies have been used in other cases. HTTP/2, for example, is not negotiated via a normal HTTP request; instead, a TLS extension mechanism called Application-Layer Protocol Negotiation (ALPN) is used to negotiate the newer version.
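ALPN sidesteps old servers because the offer rides inside the TLS handshake: a server that has never heard of the extension is supposed to ignore it, and the client then simply speaks HTTP/1.1. The Python sketch below shows the extension's wire format and a server-preference selection; helper names are hypothetical, and real code would normally rely on its TLS library (Python's ssl module, for example, provides SSLContext.set_alpn_protocols() and SSLSocket.selected_alpn_protocol()).

```python
import struct

ALPN = 16  # application_layer_protocol_negotiation extension, RFC 7301


def encode_alpn(protocols):
    """ClientHello form: 2-byte list length, then 1-byte-length-prefixed names."""
    names = b"".join(struct.pack("!B", len(p)) + p for p in protocols)
    data = struct.pack("!H", len(names)) + names
    return struct.pack("!HH", ALPN, len(data)) + data


def decode_alpn(ext):
    """Return the list of protocol names from an encoded ALPN extension."""
    ext_type, length = struct.unpack("!HH", ext[:4])
    if ext_type != ALPN:
        raise ValueError("not an ALPN extension")
    data = ext[4:4 + length]
    end = 2 + struct.unpack("!H", data[:2])[0]
    names, pos = [], 2
    while pos < end:
        n = data[pos]
        names.append(data[pos + 1:pos + 1 + n])
        pos += 1 + n
    return names


def server_select(client_protocols, server_preference):
    """Pick the server's most preferred protocol that the client also offers.
    None means no overlap; RFC 7301 then calls for a fatal
    no_application_protocol alert."""
    for proto in server_preference:
        if proto in client_protocols:
            return proto
    return None
```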

David Benjamin's GREASE concept goes one step further and tries to anticipate potential failures. He has tried to design a protocol where bugs will show up before products are shipped. It will be interesting to see whether this leads to a less fragile TLS ecosystem.

Brief items

Security quotes of the week

John Gilmore, an American entrepreneur and civil libertarian, once famously quipped that “the Internet interprets censorship as damage and routes around it.” This notion undoubtedly rings true for those who see national governments as the principal threats to free speech.

However, events of the past week have convinced me that one of the fastest-growing censorship threats on the Internet today comes not from nation-states, but from super-empowered individuals who have been quietly building extremely potent cyber weapons with transnational reach.

More than 20 years after Gilmore first coined that turn of phrase, his most notable quotable has effectively been inverted — “Censorship can in fact route around the Internet.” The Internet can’t route around censorship when the censorship is all-pervasive and armed with, for all practical purposes, near-infinite reach and capacity. I call this rather unwelcome and hostile development the “The Democratization of Censorship.”

Brian Krebs

Instead, the attacks against KrebsOnSecurity harness so-called Internet-of-things devices—think home routers, webcams, digital video recorders, and other everyday appliances that have Internet capabilities built into them. Manufacturers design these devices to be as inexpensive and easy-to-use as possible. Consumers often have little technical skill. As a result, the devices frequently come with bug-ridden firmware that never gets updated and easy-to-guess login credentials that never get changed. Their lax security and always-connected status makes the devices easy to remotely commandeer by people who turn them into digital cannons that spray the Internet with shrapnel.
Dan Goodin

The RecentFiles object gives access to the history of recent documents. Most users, unless they just installed Word, are going to have opened more than two documents. However, on a testing virtual machine (VM), the software is normally not "broken in". When the VM is initially created, software is installed, maybe opened once or twice to make sure it works, and then the state is saved and every time a test needs to be made, that state is loaded again. These VM images may then be used in automated analysis and testing tools which execute malware and see how they behave. If malware can be smart enough to know when it's being tested in a VM, it can avoid doing anything suspicious or malicious and thereby increase the time it takes to be detected by such tools.
Caleb Fenton

OpenSSL security advisory for September 26

This OpenSSL security advisory is notable in that it's the second one in four days; sites that updated after the first one may need to do so again. "This security update addresses issues that were caused by patches included in our previous security update, released on 22nd September 2016. Given the Critical severity of one of these flaws we have chosen to release this advisory immediately to prevent upgrades to the affected version, rather than delaying in order to provide our usual public pre-notification."

New vulnerabilities

bash: code execution

Package(s):bash CVE #(s):CVE-2016-0634
Created:September 26, 2016 Updated:December 13, 2016
Description: From the Red Hat bugzilla:

A vulnerability was found in a way bash expands the $HOSTNAME. Injecting the hostname with malicious code would cause it to run each time bash expanded \h in the prompt string.

Alerts:
openSUSE openSUSE-SU-2016:2715-1 bash 2016-11-03
Fedora FEDORA-2016-62e6c462ef bash 2016-09-25
Fedora FEDORA-2016-a822b472c4 bash 2016-09-23
Gentoo 201612-39 bash 2016-12-13
openSUSE openSUSE-SU-2016:2961-1 bash 2016-12-01
Mageia MGASA-2016-0393 bash 2016-11-21

bind: denial of service

Package(s):bind CVE #(s):CVE-2016-2776
Created:September 28, 2016 Updated:October 25, 2016
Description: From the CVE entry:

buffer.c in named in ISC BIND 9 before 9.9.9-P3, 9.10.x before 9.10.4-P3, and 9.11.x before 9.11.0rc3 does not properly construct responses, which allows remote attackers to cause a denial of service (assertion failure and daemon exit) via a crafted query.

Alerts:
Fedora FEDORA-2016-cbef6c8619 bind99 2016-10-24
Fedora FEDORA-2016-3af8b344f1 bind 2016-10-24
Red Hat RHSA-2016:2099-01 bind 2016-10-25
Oracle ELSA-2016-2094 bind97 2016-10-21
Oracle ELSA-2016-2093 bind 2016-10-21
Gentoo 201610-07 bind 2016-10-11
Debian-LTS DLA-645-1 bind9 2016-10-05
Mageia MGASA-2016-0332 bind 2016-10-04
Fedora FEDORA-2016-cca77daf70 bind99 2016-10-03
Fedora FEDORA-2016-2d9825f7c1 bind 2016-10-01
Scientific Linux SLSA-2016:1945-1 bind97 2016-09-28
Scientific Linux SLSA-2016:1944-1 bind 2016-09-28
Oracle ELSA-2016-1945 bind97 2016-09-28
Oracle ELSA-2016-1944 bind 2016-09-28
Oracle ELSA-2016-1944 bind 2016-09-28
Oracle ELSA-2016-1944 bind 2016-09-28
CentOS CESA-2016:1945 bind97 2016-09-28
CentOS CESA-2016:1944 bind 2016-09-28
CentOS CESA-2016:1944 bind 2016-09-28
CentOS CESA-2016:1944 bind 2016-09-28
Ubuntu USN-3088-1 bind9 2016-09-27
SUSE SUSE-SU-2016:2405-1 bind 2016-09-27
SUSE SUSE-SU-2016:2401-1 bind 2016-09-27
SUSE SUSE-SU-2016:2399-1 bind 2016-09-27
Slackware SSA:2016-271-01 bind 2016-09-27
openSUSE openSUSE-SU-2016:2406-1 bind 2016-09-28
Debian DSA-3680-1 bind9 2016-09-27
Arch Linux ASA-201609-29 bind 2016-09-27
Red Hat RHSA-2016:1945-01 bind97 2016-09-28
Red Hat RHSA-2016:1944-01 bind 2016-09-28

drupal7-google_analytics: cross-site scripting

Package(s):drupal7-google_analytics CVE #(s):
Created:September 22, 2016 Updated:September 28, 2016
Description: The drupal "Google Analytics" module suffers from a cross-site scripting vulnerability. See this advisory for details. "This vulnerability is mitigated by the fact that an attacker must have a role with the permission 'Administer Google Analytics'."
Alerts:
Fedora FEDORA-2016-a3cc693fba drupal7-google_analytics 2016-09-21
Fedora FEDORA-2016-df1252db90 drupal7-google_analytics 2016-09-22

drupal panels: multiple vulnerabilities

Package(s):drupal7-panels CVE #(s):
Created:September 22, 2016 Updated:September 28, 2016
Description: The Drupal "Panels" contrib module suffers from multiple "critical" vulnerabilities. "Much of the functionality to modify these panels rely on backend routes that call administrative forms. These forms did not provide any access checks, or site specific encoded urls. This can allow an attacker to guess the backend url as an anonymous user and see data loaded for the form."
Alerts:
Fedora FEDORA-2016-703a5e621c drupal7-panels 2016-09-21
Fedora FEDORA-2016-c01e32e071 drupal7-panels 2016-09-22

dwarfutils: two vulnerabilities

Package(s):dwarfutils CVE #(s):CVE-2016-7510 CVE-2016-7511
Created:September 26, 2016 Updated:October 10, 2016
Description: From the Debian LTS advisory:

It was discovered that there were out-of-bounds read issues in dwarfutils, a library to consume and produce DWARF debug information.

Alerts:
Fedora FEDORA-2016-328754be1c libdwarf 2016-10-09
Debian-LTS DLA-635-1 dwarfutils 2016-09-24
Arch Linux ASA-201612-4 libdwarf 2016-12-04

firefox: multiple vulnerabilities

Package(s):firefox CVE #(s):CVE-2016-5256 CVE-2016-5271 CVE-2016-5273 CVE-2016-5275 CVE-2016-5279 CVE-2016-5282 CVE-2016-5283
Created:September 22, 2016 Updated:September 28, 2016
Description: Among the many vulnerabilities fixed in the firefox 49 release are CVE-2016-5256 (memory corruption bugs), CVE-2016-5271 (information disclosure), CVE-2016-5273 (code execution), CVE-2016-5275 (code execution), CVE-2016-5279 (information disclosure), CVE-2016-5282 (loading of favicons via non-whitelisted protocols), and CVE-2016-5283 (information disclosure).
Alerts:
openSUSE openSUSE-SU-2016:2386-1 firefox, nss 2016-09-26
Fedora FEDORA-2016-a6672dbd40 firefox 2016-09-25
Fedora FEDORA-2016-de277b9183 firefox 2016-09-25
openSUSE openSUSE-SU-2016:2368-1 firefox, nss 2016-09-24
Ubuntu USN-3076-1 firefox 2016-09-22
Arch Linux ASA-201609-22 firefox 2016-09-22
Mageia MGASA-2017-0059 iceape 2017-02-20
Gentoo 201701-15 firefox thunderbird 2017-01-04
Gentoo 201701-15 firefox 2017-01-03

freerdp: denial of service

Package(s):freerdp CVE #(s):CVE-2013-4118
Created:September 28, 2016 Updated:October 4, 2016
Description: From the openSUSE advisory:

Add a NULL pointer check to fix a server crash. See the openSUSE bug report for more information.

Alerts:
Mageia MGASA-2016-0331 freerdp 2016-10-04
openSUSE openSUSE-SU-2016:2400-1 freerdp 2016-09-27
openSUSE openSUSE-SU-2016:2402-1 freerdp 2016-09-27

Horde: cross-site scripting

Package(s):php-horde-Horde-Mime-Viewer CVE #(s):
Created:September 22, 2016 Updated:September 28, 2016
Description: According to this commit, Horde renders SVG images in the browser in a way that is subject to cross-site scripting attacks.
Alerts:
Fedora FEDORA-2016-a506d298bf php-horde-Horde-Mime-Viewer 2016-09-21
Fedora FEDORA-2016-d9fc52c251 php-horde-Horde-Mime-Viewer 2016-09-22

Horde: cross-site scripting

Package(s):php-horde-Horde-Text-Filter CVE #(s):
Created:September 22, 2016 Updated:September 28, 2016
Description: According to the Red Hat bug tracker, Horde suffers from a "possible XSS vulnerability with data:html and form action was found in Text Filter".
Alerts:
Fedora FEDORA-2016-084620f386 php-horde-Horde-Text-Filter 2016-09-21
Fedora FEDORA-2016-58bc2a649a php-horde-Horde-Text-Filter 2016-09-22

imagemagick: code execution

Package(s):imagemagick CVE #(s):
Created:September 26, 2016 Updated:September 28, 2016
Description: From the Debian advisory:

This updates fixes several vulnerabilities in imagemagick: Various memory handling problems and cases of missing or incomplete input sanitising may result in denial of service or the execution of arbitrary code if malformed SIXEL, PDB, MAP, SGI, TIFF and CALS files are processed.

Alerts:
Debian DSA-3675-1 imagemagick 2016-09-23

irssi: heap corruption

Package(s):irssi CVE #(s):CVE-2016-7045 CVE-2016-7044
Created:September 22, 2016 Updated:October 11, 2016
Description: According to the irssi advisory, a missing length check can cause a range of memory to be overwritten. Evidently, only zeroes can be written, so opinions differ on whether this vulnerability is exploitable for code execution.
Alerts:
openSUSE openSUSE-SU-2016:2524-1 irssi 2016-10-13
Fedora FEDORA-2016-0551065fe0 irssi 2016-10-11
Fedora FEDORA-2016-a64716084e irssi 2016-10-10
Ubuntu USN-3086-1 irssi 2016-09-21
Slackware SSA:2016-265-03 irssi 2016-09-21
Debian DSA-3672-1 irssi 2016-09-21
Arch Linux ASA-201609-20 irssi 2016-09-22

mactelnet: code execution

Package(s):mactelnet CVE #(s):CVE-2016-7115
Created:September 26, 2016 Updated:September 28, 2016
Description: From the CVE entry:

Buffer overflow in the handle_packet function in mactelnet.c in the client in MAC-Telnet 0.4.3 and earlier allows remote TELNET servers to execute arbitrary code via a long string in an MT_CPTYPE_ENCRYPTIONKEY control packet.

Alerts:
Debian-LTS DLA-639-1 mactelnet 2016-09-25

mod_cluster: "remote exploits"

Package(s):mod_cluster CVE #(s):
Created:September 22, 2016 Updated:September 28, 2016
Description: The Fedora advisory says: "Fixed remote exploits in Apache HTTP Server mod_manager and mod_proxy_cluster modules". Further information appears to be unavailable.
Alerts:
Fedora FEDORA-2016-249e92f700 mod_cluster 2016-09-22

mozilla: denial of service

Package(s):firefox, nss CVE #(s):CVE-2016-2827
Created:September 26, 2016 Updated:September 28, 2016
Description: From the CVE entry:

The mozilla::net::IsValidReferrerPolicy function in Mozilla Firefox before 49.0 allows remote attackers to cause a denial of service (out-of-bounds read and application crash) via a Content Security Policy (CSP) referrer directive with zero values.

Alerts:
openSUSE openSUSE-SU-2016:2386-1 firefox, nss 2016-09-26
openSUSE openSUSE-SU-2016:2368-1 firefox, nss 2016-09-24
Mageia MGASA-2017-0059 iceape 2017-02-20
Gentoo 201701-15 firefox thunderbird 2017-01-04
Gentoo 201701-15 firefox 2017-01-03

openssl: multiple vulnerabilities

Package(s):openssl CVE #(s):CVE-2016-2177 CVE-2016-2178 CVE-2016-2179 CVE-2016-2180 CVE-2016-2181 CVE-2016-2182 CVE-2016-2183 CVE-2016-6302 CVE-2016-6303 CVE-2016-6304 CVE-2016-6305 CVE-2016-6306
Created:September 22, 2016 Updated:January 23, 2017
Description: The September 22 2016 OpenSSL advisory lists a number of problems fixed in the 1.1.0a, 1.0.2i, and 1.0.1u releases. The most serious would appear to be CVE-2016-6305, a "moderate" denial-of-service vulnerability.
Alerts:
Red Hat RHSA-2016:2802-01 openssl 2016-11-17
openSUSE openSUSE-SU-2016:2788-1 mysql-community-server 2016-11-12
openSUSE openSUSE-SU-2016:2769-1 mysql-community-server 2016-11-10
SUSE SUSE-SU-2016:2470-2 nodejs4 2016-11-01
Oracle ELSA-2016-3627 openssl 2016-10-14
openSUSE openSUSE-SU-2016:2537-1 compat-openssl098 2016-10-14
openSUSE openSUSE-SU-2016:2496-1 nodejs 2016-10-11
Mageia MGASA-2016-0338 openssl 2016-10-12
Fedora FEDORA-2016-97454404fe openssl 2016-10-11
SUSE SUSE-SU-2016:2469-1 openssl1 2016-10-06
SUSE SUSE-SU-2016:2470-1 nodejs4 2016-10-06
SUSE SUSE-SU-2016:2468-1 compat-openssl098 2016-10-06
SUSE SUSE-SU-2016:2458-1 openssl 2016-10-05
Scientific Linux SLSA-2016:1940-1 openssl 2016-09-28
CentOS CESA-2016:1940 openssl 2016-09-29
CentOS CESA-2016:1940 openssl 2016-09-28
SUSE SUSE-SU-2016:2394-1 openssl 2016-09-27
Oracle ELSA-2016-1940 openssl 2016-09-27
Oracle ELSA-2016-1940 openssl 2016-09-27
openSUSE openSUSE-SU-2016:2407-1 openssl 2016-09-28
Fedora FEDORA-2016-a555159613 openssl 2016-09-28
SUSE SUSE-SU-2016:2387-1 openssl 2016-09-26
openSUSE openSUSE-SU-2016:2391-1 openssl 2016-09-27
Arch Linux ASA-201609-23 openssl 2016-09-26
Arch Linux ASA-201609-24 lib32-openssl 2016-09-26
Red Hat RHSA-2016:1940-01 openssl 2016-09-27
Ubuntu USN-3087-2 openssl 2016-09-23
Debian-LTS DLA-637-1 openssl 2016-09-25
Debian DSA-3673-2 openssl 2016-09-23
Slackware SSA:2016-266-01 openssl 2016-09-22
Ubuntu USN-3087-1 openssl 2016-09-22
Debian DSA-3673-1 openssl 2016-09-22
openSUSE openSUSE-SU-2017:0513-1 java-1_7_0-openjdk 2017-02-19
SUSE SUSE-SU-2017:0490-1 java-1_7_0-openjdk 2017-02-17
Ubuntu USN-3198-1 openjdk-6 2017-02-15
SUSE SUSE-SU-2017:0460-1 java-1_8_0-ibm 2017-02-14
CentOS CESA-2017:0269 java-1.7.0-openjdk 2017-02-13
CentOS CESA-2017:0269 java-1.7.0-openjdk 2017-02-13
CentOS CESA-2017:0269 java-1.7.0-openjdk 2017-02-13
Scientific Linux SLSA-2017:0269-1 java-1.7.0-openjdk 2017-02-13
Red Hat RHSA-2017:0269-01 java-1.7.0-openjdk 2017-02-13
Ubuntu USN-3194-1 openjdk-7 2017-02-08
Mageia MGASA-2017-0041 java-1.8.0-openjdk 2017-02-05
openSUSE openSUSE-SU-2017:0374-1 java-1_8_0-openjdk 2017-02-03
Ubuntu USN-3181-1 openssl 2017-01-31
SUSE SUSE-SU-2017:0346-1 java-1_8_0-openjdk 2017-01-31
Ubuntu USN-3179-1 openjdk-8 2017-01-25
Gentoo 201701-65 oracle-jre-bin 2017-01-25
Scientific Linux SLSA-2017:0180-1 java-1.8.0-openjdk 2017-01-24
Oracle ELSA-2017-0180 java-1.8.0-openjdk 2017-01-20
Oracle ELSA-2017-0180 java-1.8.0-openjdk 2017-01-20
CentOS CESA-2017:0180 java-1.8.0-openjdk 2017-01-21
CentOS CESA-2017:0180 java-1.8.0-openjdk 2017-01-21
Red Hat RHSA-2017:0180-01 java-1.8.0-openjdk 2017-01-20
Red Hat RHSA-2017:0175-01 java-1.8.0-oracle 2017-01-19
Red Hat RHSA-2017:0176-01 java-1.7.0-oracle 2017-01-19
Red Hat RHSA-2017:0177-01 java-1.6.0-sun 2017-01-19
Slackware SSA:2016-363-01 python 2016-12-28
Gentoo 201612-16 openssl 2016-12-07
Mageia MGASA-2016-0408 virtualbox 2016-12-05

openssl: multiple vulnerabilities

Package(s):openssl CVE #(s):CVE-2016-6305 CVE-2016-6307 CVE-2016-6308
Created:September 23, 2016 Updated:September 28, 2016
Description:

From the OpenSSL advisory:

CVE-2016-6305 - OpenSSL 1.1.0 SSL/TLS will hang during a call to SSL_peek() if the peer sends an empty record. This could be exploited by a malicious peer in a Denial Of Service attack.

CVE-2016-6307 - A TLS message includes 3 bytes for its length in the header for the message. This would allow for messages up to 16Mb in length. Messages of this length are excessive and OpenSSL includes a check to ensure that a peer is sending reasonably sized messages in order to avoid too much memory being consumed to service a connection. A flaw in the logic of version 1.1.0 means that memory for the message is allocated too early, prior to the excessive message length check. Due to way memory is allocated in OpenSSL this could mean an attacker could force up to 21Mb to be allocated to service a connection. This could lead to a Denial of Service through memory exhaustion.

CVE-2016-6308 - A DTLS message includes 3 bytes for its length in the header for the message. This would allow for messages up to 16Mb in length. Messages of this length are excessive and OpenSSL includes a check to ensure that a peer is sending reasonably sized messages in order to avoid too much memory being consumed to service a connection. A flaw in the logic of version 1.1.0 means that memory for the message is allocated too early, prior to the excessive message length check. Due to way memory is allocated in OpenSSL this could mean an attacker could force up to 21Mb to be allocated to service a connection. This could lead to a Denial of Service through memory exhaustion.
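Both of the 1.1.0 bugs above are ordering mistakes: the untrusted length field sized an allocation before it was checked. A minimal Python sketch of the intended order follows; the function name and the 100KB limit are hypothetical (OpenSSL applies its own limits per message type). A 3-byte length can describe at most 2^24 - 1 = 16,777,215 bytes, the "16Mb" in the advisory.

```python
MAX_MESSAGE_LEN = 100 * 1024  # hypothetical limit for this sketch


def read_handshake_message(stream, limit=MAX_MESSAGE_LEN):
    """Read one TLS handshake message (1-byte type, 3-byte length, body)."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated handshake header")
    msg_type = header[0]
    length = int.from_bytes(header[1:4], "big")  # at most 2**24 - 1
    if length > limit:
        # Reject before allocating anything based on the untrusted length.
        raise ValueError(f"excessive message length {length}")
    body = bytearray(length)  # allocation happens only after the check
    if stream.readinto(body) != length:
        # Simplification: a real reader would loop over short reads.
        raise EOFError("truncated handshake body")
    return msg_type, bytes(body)
```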

Alerts:
Slackware SSA:2016-266-01 openssl 2016-09-22
Gentoo 201612-16 openssl 2016-12-07
Mageia MGASA-2016-0408 virtualbox 2016-12-05

openssl: denial of service

Package(s):openssl CVE #(s):CVE-2016-7052
Created:September 27, 2016 Updated:September 28, 2016
Description: From the CVE entry:

crypto/x509/x509_vfy.c in OpenSSL 1.0.2i allows remote attackers to cause a denial of service (NULL pointer dereference and application crash) by triggering a CRL operation.

Alerts:
SUSE SUSE-SU-2016:2470-2 nodejs4 2016-11-01
openSUSE openSUSE-SU-2016:2496-1 nodejs 2016-10-11
Fedora FEDORA-2016-97454404fe openssl 2016-10-11
SUSE SUSE-SU-2016:2470-1 nodejs4 2016-10-06
Fedora FEDORA-2016-a555159613 openssl 2016-09-28
Arch Linux ASA-201609-30 openssl 2016-09-28
Arch Linux ASA-201609-28 lib32-openssl 2016-09-27
Slackware SSA:2016-270-01 openssl 2016-09-26
Gentoo 201612-16 openssl 2016-12-07
Mageia MGASA-2016-0408 virtualbox 2016-12-05

openvas-libraries: multiple vulnerabilities

Package(s):openvas-libraries CVE #(s):
Created:September 23, 2016 Updated:September 28, 2016
Description:

From the OpenVAS release notes:

A number of memory leaks have been fixed.

A bug which caused NASL arrays to be freed improperly causing memory corruption under certain circumstances has been fixed.

Alerts:
Fedora FEDORA-2016-63633 openvas-libraries 2016-09-23
Fedora FEDORA-2016-b9ab1def88 openvas-libraries 2016-09-23

openvas-scanner: denial of service

Package(s):openvas-scanner CVE #(s):
Created:September 23, 2016 Updated:September 28, 2016
Description:

From the OpenVAS release notes:

This release addresses a segmentation fault discovered after the release of OpenVAS Scanner 5.0.6 which could result in hanging or failing scans under certain circumstances.

Alerts:
Fedora FEDORA-2016-63633 openvas-scanner 2016-09-23
Fedora FEDORA-2016-b9ab1def88 openvas-scanner 2016-09-23

pidgin: mysterious vulnerabilities

Package(s):pidgin CVE #(s):CVE-2016-1000030 CVE-2016-2379
Created:September 22, 2016 Updated:September 28, 2016
Description: Pidgin suffers from a hashed-password disclosure vulnerability (the hash can be used to log in via a replay attack) and a problem described only as "X.509 certificates Improperly Imported" (CVE-2016-1000030).
Alerts:
Slackware SSA:2016-266-02 pidgin 2016-09-22
Slackware SSA:2016-265-01 pidgin 2016-09-21
Gentoo 201701-38 pidgin 2017-01-17

policycoreutils: sandbox escape

Package(s):policycoreutils CVE #(s):CVE-2016-7545
Created:September 26, 2016 Updated:November 23, 2016
Description: From the Debian LTS advisory:

It was discovered that there was a sandbox escape via the "TIOCSTI" ioctl in policycoreutils, a set of programs required for the basic operation of an SELinux-based system.

Alerts:
Oracle ELSA-2016-2702 policycoreutils 2016-11-14
Oracle ELSA-2016-2702 policycoreutils 2016-11-14
Red Hat RHSA-2016:2702-01 policycoreutils 2016-11-14
Debian-LTS DLA-638-1 policycoreutils 2016-09-25
Scientific Linux SLSA-2016:2702-1 policycoreutils 2016-11-21
CentOS CESA-2016:2702 policycoreutils 2016-11-19

python-django: cross-site request forgery

Package(s):python-django CVE #(s):CVE-2016-7401
Created:September 27, 2016 Updated:October 24, 2016
Description: From the Debian advisory:

Sergey Bobrov discovered that cookie parsing in Django and Google Analytics interacted such a way that an attacker could set arbitrary cookies. This allows other malicious web sites to bypass the Cross-Site Request Forgery (CSRF) protections built into Django.

Alerts:
Arch Linux ASA-201610-12 python2-django 2016-10-21
Arch Linux ASA-201610-13 python-django 2016-10-21
Fedora FEDORA-2016-3795497354 python-django 2016-10-11
Fedora FEDORA-2016-5706eeb875 python-django 2016-10-10
Red Hat RHSA-2016:2038-01 python-django 2016-10-10
Red Hat RHSA-2016:2039-01 python-django 2016-10-10
Red Hat RHSA-2016:2040-01 python-django 2016-10-10
Red Hat RHSA-2016:2041-01 python-django 2016-10-10
Debian-LTS DLA-649-1 python-django 2016-10-06
Mageia MGASA-2016-0334 python-django 2016-10-04
Ubuntu USN-3089-1 python-django 2016-09-27
Debian DSA-3678-1 python-django 2016-09-26

qemu: multiple vulnerabilities

Package(s):qemu CVE #(s):CVE-2016-6490 CVE-2016-6833 CVE-2016-6834 CVE-2016-6836 CVE-2016-6888 CVE-2016-7156 CVE-2016-7157 CVE-2016-7422
Created:September 26, 2016 Updated:September 28, 2016
Description: From the Gentoo advisory:

Multiple vulnerabilities have been discovered in QEMU. Local users within a guest QEMU environment can execute arbitrary code within the host or a cause a Denial of Service condition of the QEMU guest process.

Alerts:
Ubuntu USN-3125-1 qemu, qemu-kvm 2016-11-09
openSUSE openSUSE-SU-2016:2642-1 qemu 2016-10-26
SUSE SUSE-SU-2016:2589-1 qemu 2016-10-21
Fedora FEDORA-2016-a56fb613a8 qemu 2016-10-18
SUSE SUSE-SU-2016:2533-1 xen 2016-10-13
SUSE SUSE-SU-2016:2507-1 xen 2016-10-12
openSUSE openSUSE-SU-2016:2497-1 xen 2016-10-11
openSUSE openSUSE-SU-2016:2494-1 xen 2016-10-11
SUSE SUSE-SU-2016:2473-1 xen 2016-10-07
Gentoo 201609-01 qemu 2016-09-25
Fedora FEDORA-2017-12394e2cc7 qemu 2017-01-25
Fedora FEDORA-2017-b953d4d3a4 qemu 2017-01-20
openSUSE openSUSE-SU-2016:3237-1 qemu 2016-12-22

shiro: access control bypass

Package(s):shiro CVE #(s):CVE-2016-6802
Created:September 23, 2016 Updated:September 28, 2016
Description:

From the CVE entry:

Apache Shiro before 1.3.2, when using a non-root servlet context path, specifically crafted requests can be used to by pass some security servlet filters, resulting in unauthorized access.

Alerts:
Fedora FEDORA-2016-744 shiro 2016-09-23

wireshark: denial of service

Package(s):wireshark-cli CVE #(s):CVE-2016-7175
Created:September 27, 2016 Updated:September 28, 2016
Description: From the CVE entry:

epan/dissectors/packet-qnet6.c in the QNX6 QNET dissector in Wireshark 2.x before 2.0.6 mishandles MAC address data, which allows remote attackers to cause a denial of service (out-of-bounds read and application crash) via a crafted packet.

Alerts:
Arch Linux ASA-201609-27 wireshark-cli 2016-09-26

wordpress: multiple vulnerabilities

Package(s):wordpress CVE #(s):CVE-2015-8834 CVE-2016-4029 CVE-2016-6634 CVE-2016-6635
Created:September 23, 2016 Updated:October 3, 2016
Description:

From the Debian-LTS advisory:

CVE-2015-8834 - Cross-site scripting (XSS) vulnerability in wp-includes/wp-db.php in WordPress before 4.2.2 allows remote attackers to inject arbitrary web script or HTML via a long comment that is improperly stored because of limitations on the MySQL TEXT data type. NOTE: this vulnerability exists because of an incomplete fix for CVE-2015-3440

CVE-2016-4029 - WordPress before 4.5 does not consider octal and hexadecimal IP address formats when determining an intranet address, which allows remote attackers to bypass an intended SSRF protection mechanism via a crafted address.

CVE-2016-6634 - Cross-site scripting (XSS) vulnerability in the network settings page in WordPress before 4.5 allows remote attackers to inject arbitrary web script or HTML via unspecified vectors.

CVE-2016-6635 - Cross-site request forgery (CSRF) vulnerability in the wp_ajax_wp_compression_test function in wp-admin/includes/ajax-actions.php in WordPress before 4.5 allows remote attackers to hijack the authentication of administrators for requests that change the script compression option.
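CVE-2016-4029 is a classic parser mismatch: a blocklist that only understands dotted decimal is consulted, while the HTTP client later resolves the same string with the much looser rules of inet_aton(), which accepts octal, hexadecimal, and abbreviated forms. The Python sketch below (not WordPress's PHP code; all names are hypothetical) reimplements those loose rules so that "0177.0.0.1", "0x7f000001", and "127.1" all map to the loopback address, and contrasts the result with a naive decimal-only check.

```python
import ipaddress

HEX = set("0123456789abcdefABCDEF")
OCT = set("01234567")
DEC = set("0123456789")


def _part(p):
    """Parse one component: 0x.. is hex, a leading 0 means octal."""
    if p[:2].lower() == "0x":
        if p[2:] and set(p[2:]) <= HEX:
            return int(p[2:], 16)
    elif len(p) > 1 and p[0] == "0":
        if set(p) <= OCT:
            return int(p, 8)
    elif p and set(p) <= DEC:
        return int(p, 10)
    raise ValueError(f"bad address component {p!r}")


def parse_ipv4_loose(text):
    """inet_aton()-style parsing: 1-4 parts; the last part fills the rest."""
    nums = [_part(p) for p in text.split(".")]
    if not 1 <= len(nums) <= 4:
        raise ValueError(text)
    *head, last = nums
    tail_bits = 8 * (4 - len(head))
    if any(n > 0xFF for n in head) or last >= 1 << tail_bits:
        raise ValueError(text)
    value = 0
    for n in head:
        value = (value << 8) | n
    return (value << tail_bits) | last


def is_intranet(text):
    addr = ipaddress.IPv4Address(parse_ipv4_loose(text))
    return addr.is_private or addr.is_loopback or addr.is_link_local


def naive_is_intranet(text):
    """Flawed check: only understands four-part dotted decimal."""
    parts = text.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return False  # "not an IP address", so treated as external
    first, second = int(parts[0]), int(parts[1])
    return (first in (10, 127) or (first, second) == (192, 168)
            or (first == 172 and 16 <= second <= 31))
```

The naive check reads "0177.0.0.1" as 177.0.0.1, a public address, while the loose parser (like the resolver) sees 127.0.0.1.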

Alerts:
Debian DSA-3681-2 wordpress 2016-10-01
Debian DSA-3681-1 wordpress 2016-09-29
Debian-LTS DLA-633-1 wordpress 2016-09-22

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 4.8-rc8, released on September 25. Linus said: "Things actually did start to calm down this week, but I didn't get the feeling that there was no point in doing one final rc, so here we are. I expect the final 4.8 release next weekend, unless something really unexpected comes up."

The September 25 4.8 regression list has 15 entries.

Stable updates: 4.7.5 and 4.4.22 were released on September 24. The 4.7.6 and 4.4.23 updates are in the review process as of this writing; they can be expected on or after September 30.

Kernel development news

A look at the 4.8 development cycle

By Jonathan Corbet
September 28, 2016
As of this writing, the 4.8 development cycle is nearing its end. Linus has let it be known that a relatively unusual -rc8 release candidate will be required before the final release, but that still means that the cycle will only require 70 days, fitting into the usual pattern. Taking a look at the development statistics for this release is, likewise, part of the usual pattern at about this point.

With regard to the release cycle, it has become boringly regular in recent years. The 3.8 kernel, released on February 18, 2013, came out on a Sunday, as has every subsequent release with the exception of 3.11, which was released on Monday, September 2, 2013. In these last few years, the only cycle that has taken longer than 70 days was 3.13, which required 77 days. The extra week that time around was forced by Linus's travels, rather than anything inherent in that cycle itself. Since then, every cycle has taken 63 or 70 days, with the sole exception of 3.16, which showed up in 56 (and one could quibble that it was really a 63-day cycle as well — that was the time Linus experimented with opening the merge window before the previous final release had been made).

In this 70-day cycle, we have seen the addition of 13,253 non-merge changesets from 1,578 developers — so far; the numbers will increase slightly before the end. It is thus a busy cycle, though the record for the busiest (3.15, with 13,722 commits) remains unchallenged. Those developers grew the kernel by 350,000 lines this time around. The most active developers in this cycle were:

Most active 4.8 developers

By changesets
Mauro Carvalho Chehab        347   2.6%
Chris Wilson                 266   2.0%
Arnd Bergmann                180   1.4%
Daniel Vetter                144   1.1%
Geert Uytterhoeven           139   1.0%
Wei Yongjun                  129   1.0%
Hans Verkuil                 121   0.9%
Arnaldo Carvalho de Melo     117   0.9%
James Hogan                  107   0.8%
Paul Gortmaker               100   0.8%
Trond Myklebust               98   0.7%
David Hildenbrand             92   0.7%
Christoph Hellwig             90   0.7%
Krzysztof Kozlowski           88   0.7%
Ville Syrjälä                 86   0.6%
Daniel Lezcano                82   0.6%
Ben Dooks                     80   0.6%
Linus Walleij                 76   0.6%
Wolfram Sang                  75   0.6%
Christian König               75   0.6%

By changed lines
Mauro Carvalho Chehab     110741  13.2%
Markus Heiser              77196   9.2%
Hans Verkuil               17868   2.1%
Wolfram Sang               15211   1.8%
Moni Shoua                 13039   1.6%
Christoph Hellwig          12535   1.5%
Yuval Mintz                12467   1.5%
Jani Nikula                12397   1.5%
Chris Wilson               11003   1.3%
Darrick J. Wong             7453   0.9%
Arnaldo Carvalho de Melo    7204   0.9%
Marc Zyngier                6514   0.8%
Daniel Vetter               6499   0.8%
Megha Dey                   5844   0.7%
Florian Fainelli            5697   0.7%
Krzysztof Kozlowski         5600   0.7%
Gavin Shan                  5343   0.6%
Bryant G. Ly                5019   0.6%
Arnd Bergmann               4914   0.6%
Adrian Hunter               4906   0.6%

Mauro Carvalho Chehab, the maintainer for the media subsystem, is traditionally a highly active developer. To understand his position at the top of both columns this time around, one need only look back to the 4.8-rc1 announcement, where Linus said:

The merge window has been fairly normal, although the patch itself looks somewhat unusual: over 20% of the patch is documentation updates, due to conversion of the drm and media documentation from docbook to the Sphinx doc format.

Many of those documentation updates, part of the transition in the kernel's formatted documentation subsystem, came from Mauro, who jumped on the task of converting the (considerable) media documentation with gusto. Other developers at the top of the "by changesets" column include Chris Wilson, whose work was focused on the Intel i915 driver; Arnd Bergmann who, when he's not maintaining the arm-soc subsystem, stays busy eliminating warnings from the kernel build; Daniel Vetter, an active DRM developer; and Geert Uytterhoeven, who did a lot of system-on-chip support work.

In the "changed lines" column, Markus Heiser worked on the media documentation conversion and contributed a fair amount of code to make the new documentation system work. Hans Verkuil did a lot of media driver work (including removing some unused drivers), Wolfram Sang spent time on the ks7010 driver in the staging tree (along with maintaining the I2C subsystem), and Moni Shoua contributed a single patch adding the "RDMA over converged Ethernet" driver to the InfiniBand subsystem.

Normally, work in the staging tree figures prominently in these statistics, but it is almost absent this time around. Indeed, only 386 patches have been applied to the staging tree in the 4.8 cycle, far fewer than the 916 seen in 4.7, or the 1,852 in 4.6. One might be tempted to think that the staging tree is slowing down, but that seems likely to be a temporary state of affairs. Indeed, it appears that the 4.9 development cycle will see over 2,300 staging commits for the addition of the greybus subsystem alone.

Work on the 4.8 kernel was supported by 217 employers that we were able to identify. The most active employers this time around were:

Most active 4.8 employers

By changesets
Intel                   1960  14.8%
Red Hat                 1143   8.6%
(Unknown)                806   6.1%
(None)                   746   5.6%
Linaro                   662   5.0%
IBM                      654   4.9%
Samsung                  637   4.8%
SUSE                     338   2.6%
Google                   294   2.2%
AMD                      281   2.1%
Oracle                   259   2.0%
Texas Instruments        258   1.9%
Mellanox                 243   1.8%
Renesas Electronics      223   1.7%
Broadcom                 217   1.6%
ARM                      204   1.5%
Huawei Technologies      170   1.3%
NVidia                   166   1.3%
NXP Semiconductors       163   1.2%
(Consultant)             157   1.2%

By lines changed
Samsung               120693  14.4%
Intel                 104291  12.4%
(None)                102848  12.3%
Red Hat                48563   5.8%
IBM                    42298   5.0%
Mellanox               29226   3.5%
(Unknown)              27671   3.3%
Linaro                 22960   2.7%
Broadcom               18040   2.2%
Cisco                  17868   2.1%
MediaTek               16292   1.9%
QLogic                 15986   1.9%
ARM                    14397   1.7%
Renesas                14283   1.7%
(Consultant)           14146   1.7%
Free Electrons         11227   1.3%
Oracle                 10982   1.3%
Texas Instruments       9789   1.2%
Google                  9534   1.1%
Renesas Electronics     9482   1.1%

The documentation work has shifted the numbers around here a bit but, for the most part, this table is as boring and unsurprising as usual. Samsung's position at the top of the "lines changed" column is, once again, the result of the formatted documentation transition.

In summary, this would appear to be another relatively normal, busy development cycle. The kernel development machine appears to continue to hum along smoothly, with no serious process problems evident at this level though, as the recent discussion on backporting showed, there are issues elsewhere in the community. Both the 4.8 kernel and the community that produces it appear to be working well.

A low-level hibernation bug hunt

September 28, 2016

This article was contributed by Rafael J. Wysocki

This is a story about how several obscure and nasty hibernation bugs were fixed over the last few months and how hibernation on x86-64 was made to work correctly with kernel address space layout randomization (KASLR) at the same time. It is a success story, but it did not look like that in the beginning. That success would not have been possible without a series of bug reports that happened to appear in just the right order, one after another. Fortunately enough, in each case the bug in question was reliably reproducible on at least one system, which allowed it to be narrowed down to a particular kernel change or a specific piece of code. It also would not have been possible without the persistence and determination of the bug reporters and developers involved.

For me, it started with a problem report from Logan Gunthorpe forwarded to the Linux power-management development list by Ingo Molnar. In that report, Gunthorpe said that hibernation broke for him after a security-related change that had made the kernel set the "no execute" (NX) flag on memory pages in the gap between the kernel code and the read-only data section following it.

My initial idea about why that change might cause hibernation to fail was related to how resume from hibernation worked on x86-64, so let me explain that briefly to begin with.

Hibernation on x86-64

Hibernation is generally regarded as a power-management feature, but it really is a checkpoint/restore mechanism working on the system as a whole. When triggered, it creates a snapshot of all memory pages in use at that time and saves it in persistent storage. Of course, the snapshot of each page has to be saved along with the number of the page frame occupied by it, so that it can be put into the same page frame later on. All of that information combined is referred to as a "hibernation image".
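To make the "hibernation image" a little more concrete, here is a minimal user-space sketch of the data it carries: each saved page paired with the page frame number (PFN) it came from. The names (struct image_page, phys_mem, naive_restore()) and the toy page size are hypothetical and are not taken from the kernel's snapshot code. The naive restore shown here also ignores the real complication, discussed later in this article, that the restore kernel itself occupies some of those frames.

```c
#include <stddef.h>
#include <string.h>

/* Toy sizes for illustration only; x86-64 pages are 4096 bytes. */
#define TOY_PAGE_SIZE 64
#define TOY_NR_FRAMES 16

/* One saved page: its contents plus the frame it occupied. */
struct image_page {
	size_t pfn;                         /* frame used before hibernation */
	unsigned char data[TOY_PAGE_SIZE];  /* snapshot of the page contents */
};

/* Simulated physical memory, indexed by page frame number. */
static unsigned char phys_mem[TOY_NR_FRAMES][TOY_PAGE_SIZE];

/*
 * Naive restore: put every saved page back into its original frame.
 * This only works if nothing else lives in those frames, which is
 * exactly what cannot be assumed while the restore kernel is running.
 */
static void naive_restore(const struct image_page *img, size_t n)
{
	for (size_t i = 0; i < n; i++)
		memcpy(phys_mem[img[i].pfn], img[i].data, TOY_PAGE_SIZE);
}
```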

Next, the system is turned off (that can be done in a few different ways which are not relevant here). When turned on again later, it undergoes full initialization, starting with the platform firmware, which invokes the bootloader that, in turn, loads a new kernel (that is what happens in Linux; the resume control flow in other operating systems may be different). That new kernel is then responsible for loading the hibernation image created earlier back into memory and for restoring its previous state, so it will be referred to as the "restore kernel" in what follows. In turn, the kernel that created the hibernation image and, therefore, is included in it will be referred to as the "image kernel".

Of course, the restore kernel is always different from the image kernel, but it may come from the same kernel binary, in which case the kernel code is the same in both of them. That is not a requirement on x86-64, though. Moreover, even if the kernel code (often referred to as the "kernel text") is the same, the layout of code and data in memory created by the restore kernel may be different from what the image kernel had used. For instance, if kernel address space layout randomization is in use, the physical location of the kernel code in the restore and image kernels usually will be different. In addition, in Linux 4.8-rc1 (and later), KASLR will, as a rule, cause the virtual base address of the kernel identity mapping (the one that maps the entire physical address space of the system into the kernel's virtual address space) to differ between the two.

When the restore kernel runs, it will first initialize itself and the hardware; then it will look for a hibernation image header. If it finds one, it reads image description data from there and, if all looks good, it will start to load the image.

The goal here is to put each memory page included in the image into the page frame it occupied before hibernation and pass control to the image kernel, which can take over from that point on (as the memory will then look the same as before hibernation to it). That is not as straightforward as it sounds, however, because at least some of the page frames in question will be occupied by the restore kernel itself or its data. To overcome that difficulty, the restore kernel takes several steps that each get it closer to its goal.

First of all, it allocates enough memory to hold all of the data pages and metadata (basically consisting of the page frame numbers to put those data pages into eventually) from the image. It uses two bitmaps to track the memory allocated in this step, to keep a record of (1) which page frames have been allocated and (2) which of them were in use before hibernation. The allocated ones that were not used before hibernation (i.e. their numbers are not included in the image metadata) are referred to as "safe", because they won't be overwritten with data coming from the image going forward.
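The bookkeeping in this step can be sketched with two bitmaps and a single test for safety. This is an illustrative model, not the kernel's code: the bitmap and function names are made up, and one 64-bit word stands in for each bitmap, so only 64 page frames are tracked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmaps, one bit per page frame (64 frames in this toy). */
struct restore_bitmaps {
	uint64_t allocated;  /* (1) frames allocated by the restore kernel */
	uint64_t in_image;   /* (2) frames in use before hibernation */
};

static void set_frame(uint64_t *map, size_t pfn)
{
	*map |= UINT64_C(1) << pfn;
}

static bool test_frame(uint64_t map, size_t pfn)
{
	return (map >> pfn) & 1;
}

/*
 * A frame is "safe" if it was allocated in this step and no image page
 * belongs there, so copying the image will never overwrite it.
 */
static bool frame_is_safe(const struct restore_bitmaps *b, size_t pfn)
{
	return test_frame(b->allocated, pfn) && !test_frame(b->in_image, pfn);
}
```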

Second, all of the image data pages are loaded into the allocated memory. The trick here is to store as many data pages from the image as possible in the page frames they occupied before hibernation; the bitmaps mentioned above are used for that. Namely, before loading a data page from the image, the page frame it occupied before hibernation is looked up in the bitmaps and, if it is present there (i.e. it was allocated in the previous step), the data page is loaded into it directly without the need to remember where it has been stored. If the page frame occupied by that data page before hibernation was not allocated in the previous step, the data page has to be stored in a safe page frame whose number has to be recorded along with the "target" location of the data page stored in it.
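The placement decision for each incoming page can be sketched as follows, again as a simplified user-space model with hypothetical names (struct load_state, struct parked_page, place_image_page()). Boolean arrays stand in for the bitmaps, and the recorded (safe, target) pairs live in a fixed-size array rather than in whatever structure the kernel actually uses.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_FRAMES 32

/* An image page parked in a safe frame until it can go to its target. */
struct parked_page {
	size_t safe_pfn;
	size_t target_pfn;
};

struct load_state {
	bool allocated[TOY_NR_FRAMES];  /* frames allocated in the previous step */
	bool in_image[TOY_NR_FRAMES];   /* frames in use before hibernation */
	bool used[TOY_NR_FRAMES];       /* safe frames already holding a page */
	struct parked_page parked[TOY_NR_FRAMES];
	size_t nr_parked;
};

/* Find an unused safe frame: allocated, not an image target, not taken. */
static long take_safe_frame(struct load_state *s)
{
	for (size_t pfn = 0; pfn < TOY_NR_FRAMES; pfn++) {
		if (s->allocated[pfn] && !s->in_image[pfn] && !s->used[pfn]) {
			s->used[pfn] = true;
			return (long)pfn;
		}
	}
	return -1;
}

/*
 * Choose the frame the image page destined for target_pfn is loaded
 * into now. Returns that frame, or -1 if no safe frame is left.
 */
static long place_image_page(struct load_state *s, size_t target_pfn)
{
	long safe;

	if (s->allocated[target_pfn])
		return (long)target_pfn;  /* final location: nothing to record */

	safe = take_safe_frame(s);
	if (safe < 0)
		return -1;
	s->parked[s->nr_parked].safe_pfn = (size_t)safe;
	s->parked[s->nr_parked].target_pfn = target_pfn;
	s->nr_parked++;
	return safe;
}
```

Pages that land directly in their original frames need no further attention; only the parked ones must be moved in the final step.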

The next step is to quiesce all devices and all CPUs except for one; having done that, the restore kernel prepares to copy all of the image data pages previously stored in "safe" page frames to their "target" locations. That has to be done in an architecture-specific way, and it has to take into account the fact that the restore kernel itself and its data will be overwritten in the process, so the following step will not be reversible.

On x86-64, the restore kernel creates temporary page tables consisting of safe pages only, so that they will not get overwritten with image data. These page tables only need to cover two mappings: the identity mapping necessary for the image data pages copying operation itself and the kernel text mapping allowing the restore kernel to pass control back to the image kernel. This transfer of control is done by jumping to an address representing the image kernel's entry point (that can be read from the image header). In addition, the code that will copy the image data pages and perform the final jump to the image kernel's entry point has to be relocated to a safe page in order to prevent it from overwriting itself inadvertently; the page it has been relocated to must be marked as executable. With all that in place, the restore kernel only needs to jump to the relocated code that will switch over to the temporary page tables, copy the image data pages still held in "safe" page frames to their "target" locations, and jump to the image kernel's entry point.
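Leaving the page-table switch aside, the work done by the relocated code amounts to a copy loop followed by a jump. The sketch below models it in C with hypothetical names. In the real kernel this step is written in assembly, runs from the relocated safe page with the temporary page tables in place, and never returns; here the "jump" is an ordinary call to a stand-in entry point.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_NR_FRAMES 16

/* A page parked in a safe frame, waiting to be copied to its target. */
struct parked_page {
	size_t safe_pfn;
	size_t target_pfn;
};

static unsigned char phys_mem[TOY_NR_FRAMES][TOY_PAGE_SIZE];

/* Hypothetical stand-in for the image kernel's entry point. */
static int image_kernel_entered;
static void fake_image_entry(void)
{
	image_kernel_entered = 1;
}

/*
 * Copy every parked page to its target frame, then transfer control.
 * After the loop, the restore kernel's own memory may be gone, which is
 * why this code must itself live in a safe page.
 */
static void copy_and_jump(const struct parked_page *list, size_t n,
			  void (*image_entry)(void))
{
	for (size_t i = 0; i < n; i++)
		memcpy(phys_mem[list[i].target_pfn],
		       phys_mem[list[i].safe_pfn], TOY_PAGE_SIZE);

	image_entry();  /* the real jump never returns */
}
```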

Where things went wrong

That should sound reasonable enough, and it is what the restore kernel does today. At the time of Gunthorpe's bug report, however, the code in question was somewhat less straightforward.

Namely, it also created temporary page tables but, while the identity mapping covered by those tables was set up from scratch, the restore kernel's own text mapping was reused by hooking it up directly into the topmost page directory of the new page tables. That allowed the restore kernel to switch over to the temporary page tables before jumping to the relocated code, but it also imposed serious limitations on the final jump to the image kernel's entry point such that it would only work in quite specific conditions. As it turned out, those conditions were not guaranteed to be met in general; that was the source of the problem seen by Gunthorpe.

My first idea about what might have gone wrong was that, perhaps, the security change identified by Gunthorpe as the one that introduced the problem caused the page containing the image kernel's entry point to become non-executable in the restore kernel's text mapping. With that in mind I prepared a patch that would mark that page as executable at the right time and asked Gunthorpe to test it, but it did not make any difference.

That caused me to look at the addresses involved more closely; I quickly realized that reusing the restore kernel's text mapping in the temporary page tables was a mistake, because that mapping might very well be corrupted in the process of copying image data pages to their target locations. If that happened, the final jump to the image kernel's entry point would go to nowhere, triggering a page fault that couldn't be handled at that point. Clearly, the temporary page tables needed a kernel text mapping set up from scratch consisting of only safe pages, just like the identity mapping. I noticed, though, that it didn't have to cover the entire kernel text. In fact, it didn't have to cover the kernel text at all. It only had to cover the image kernel's entry point itself.

That was the case because the code performing the final jump to the image kernel's entry point would be relocated and it would be running from a page covered by the identity mapping, so it didn't need the kernel text mapping to run. Moreover, the virtual address of the image kernel's entry point passed in the image header had to be mapped to the physical address of its location in memory, but that might not match the restore kernel's text mapping. Hence, the kernel text mapping used for the final jump to the image kernel's entry point had to be based on the information provided by the image kernel. For that reason, I changed the image header format to include the physical address of the image kernel's entry point too.
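The minimal text mapping only has to translate the entry point's virtual address to the physical address recorded in the image header. The sketch below assumes that the entry point is covered by a single 2 MiB large page and that its virtual and physical addresses have the same offset within that page; the header structure and helper names are hypothetical.

```c
#include <stdint.h>

#define TOY_PMD_SIZE (UINT64_C(1) << 21)  /* 2 MiB large page */
#define TOY_PMD_MASK (~(TOY_PMD_SIZE - 1))

/* Hypothetical header fields: the image kernel's entry point. */
struct image_header_entry {
	uint64_t entry_virt;
	uint64_t entry_phys;
};

/* One large-page mapping: virtual base -> physical base. */
struct large_mapping {
	uint64_t virt_base;
	uint64_t phys_base;
};

/* Build the single mapping needed to reach the entry point. */
static struct large_mapping minimal_text_mapping(const struct image_header_entry *h)
{
	struct large_mapping m = {
		.virt_base = h->entry_virt & TOY_PMD_MASK,
		.phys_base = h->entry_phys & TOY_PMD_MASK,
	};
	return m;
}

/* Translate an address inside the mapped 2 MiB page. */
static uint64_t translate(const struct large_mapping *m, uint64_t virt)
{
	return m->phys_base + (virt - m->virt_base);
}
```

Because the mapping is derived from the image header rather than from the restore kernel's own page tables, it stays correct even when the two kernels place their text differently.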

It didn't take me too much time to come up with a patch implementing that idea. With that patch, however, the restore kernel would still switch over to the temporary page tables before jumping to the relocated code, so its text mapping still had to be reused to start with. It would be replaced with a new minimal kernel text mapping covering the image kernel's entry point just prior to the final jump to it.

The plot thickens

That patch fixed the resume problem for Gunthorpe, but it wasn't perfect. Namely, Borislav Petkov reported that it introduced a strange memory corruption during resume from hibernation for him. That new problem occurred on every resume from hibernation on his system and manifested itself as a corruption of the context of a user-space process that attempted to run after the image kernel had brought all CPUs back online and had completed the resume of I/O devices.

That was really unusual, so we spent quite a lot of time trying to understand why and how it might happen. Linus Torvalds suspected that the problem might be related to the way the patch played with the kernel-text mapping and he clearly didn't like that part of it anyway, so I decided to change the code flow to first jump to the relocated code and then switch over to the temporary page tables from there. That still allowed the kernel-text mapping in the temporary page tables to be minimal, but it avoided the need to replace one version of the kernel-text mapping with another one on the fly which, admittedly, had been an ugly hack.

I posted a patch created along these lines and, again, it worked for Gunthorpe, but it still triggered memory corruption during resume from hibernation for Petkov, so we went into a long debug session trying to figure out what was going on. Theories taken into consideration included platform firmware involvement, a hardware issue, or a bitmap implementation error in the hibernation core, but there were substantial weaknesses in every one of them.

Eventually, we were able to narrow the breakage down to a single line of code in a new function added by my patch, but it was completely unclear why that particular line of code would lead to the observed symptoms. Since that line of code looked like it might be using a local variable on the stack, I decided to check whether changing the new function to use fewer local variables would make any difference (the theory was that the stack might have been corrupted somehow, although how exactly that could have happened was still a mystery). Surprisingly enough, that change appeared to fix the problem for Petkov (in fact, it only hid the problem, but that was found to be the case quite a bit later). It did that so effectively that the memory corruption went away and could not be reproduced on Petkov's machine any more.

In the meantime, Yu Chen analyzed Gunthorpe's original report in detail and explained why the security-related kernel commit identified as the one that introduced the problem could actually make a difference. According to Chen, the setting of the NX flag on the gap between the kernel text and the read-only data was not as straightforward as it looked because it might cause kernel page tables to be split. Specifically, if the end of the kernel text fell into a large (2M) page, that page had to be split into normal (4K) pages for the NX bit to be set on the gap only. That required more page-table memory to be allocated dynamically; that allocation happened within the kernel-text mapping that would be overwritten by image data during resume from hibernation, so reusing it in the restore kernel's temporary page tables would lead to an unrecoverable error.
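Chen's explanation comes down to address arithmetic: NX can only be applied to part of a 2 MiB page after splitting it into 512 4 KiB pages, and that split needs freshly allocated page-table memory. The sketch below, with hypothetical helper names, checks whether a split is needed and counts the 4 KiB entries in the gap that would get NX; it assumes text_end is 4 KiB aligned and that rodata_start lies after it.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE  UINT64_C(4096)         /* 4 KiB */
#define TOY_LARGE_SIZE (UINT64_C(1) << 21)    /* 2 MiB */

/*
 * The gap [text_end, rodata_start) can be marked NX at large-page
 * granularity only if text_end is 2 MiB aligned; otherwise the large
 * page holding text_end must be split, which needs a new page-table page.
 */
static bool needs_split(uint64_t text_end)
{
	return (text_end % TOY_LARGE_SIZE) != 0;
}

/* When a split is needed: how many 4 KiB entries of that page are in the gap. */
static uint64_t nx_small_pages(uint64_t text_end, uint64_t rodata_start)
{
	uint64_t large_end = (text_end / TOY_LARGE_SIZE + 1) * TOY_LARGE_SIZE;
	uint64_t gap_end = rodata_start < large_end ? rodata_start : large_end;

	return (gap_end - text_end) / TOY_PAGE_SIZE;
}
```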

In addition to that, Kees Cook reported that the fix for the issue reported by Gunthorpe also made hibernation work with KASLR on x86-64. At that time, KASLR worked on the kernel's text mapping only and randomized its physical base. As a result, the physical base address of the kernel text mapping used by the restore kernel would, most of the time, differ from the one the image kernel had used. That prevented the restore kernel from mapping the virtual address of the image kernel's entry point (passed in the image header) to the correct physical address, so resume from hibernation didn't work. That changed with the introduction of the minimal kernel-text mapping used for the final jump to the image kernel's entry point in my patch, because it mapped virtual addresses to physical addresses in the same way as the image kernel did.

In the face of this, and because the memory corruption seen by Petkov was apparently not reproducible with the last version of the resume fix (and I was quite confident that it could not be introduced by that fix itself anyway), I decided to go ahead with the fix and it finally landed in Linux 4.7 as kernel commit 65c0554b73c9. While the immediate problem was fixed, it was quite possible that the previous versions of the resume fix simply uncovered some obscure latent bug, so I made a few changes in the hibernation core to make it easier to debug in case the memory corruption problem or anything similar to it showed up again in the future. When I did that, though, I wasn't expecting the memory corruption issue to reappear a few days later in a report pointing to the kernel commit that was the true source of it. But, first, another problem had to be solved.

MWAIT vs. HLT

Meanwhile, my attention had been caught by another serious bug related to resume from hibernation on x86-64, but limited to Intel CPUs. At that point it had already been investigated for several weeks by Chen, who had posted a couple of RFC patches to address it, but reviewers had pointed out some valid concerns with them.

That issue was related to the use of the MONITOR and MWAIT instructions of the CPU in the code that takes CPUs offline, in particular during resume from hibernation. CPU offlining is a complicated matter that involves migrating tasks and interrupts from the CPU going offline to ensure that it won't have anything to do from that point on. The last stage of the process is to make the CPU appear as though it is not functional from a software perspective. That is achieved by making it execute a "wait for something to happen" instruction in a tight endless loop with locally disabled interrupts.

There are two flavors of such "wait for something to happen" instructions in the Intel processors' instruction set. The first one is the old-school HLT instruction that causes the CPU to go into a relatively shallow low-power state and wait for an interrupt; if interrupts are locally disabled on the CPU, it will become almost completely unresponsive after executing that instruction (the only interrupts that can "revive" the CPU then are the non-maskable ones, but those are only used in very special situations). The second type of a "wait for something to happen" instruction is MWAIT, which goes together with MONITOR.

First, MONITOR takes an address identifying a range of memory that corresponds to a single line in the CPU's cache. Next, the MWAIT following it causes the CPU to enter a low-power state (and that state may be much deeper than the HLT-induced one) and wait for an event like an interrupt or a write to one of the MONITORed memory locations from another CPU in the system. Thus, from an energy consumption perspective, the MONITOR/MWAIT combination is much better than HLT, but that really wasn't important in the resume from hibernation case since CPUs stay offline for a very short time then. The important fact was that, during resume from hibernation, the memory locations MONITORed by the offline CPUs were almost guaranteed to be written to by the only online CPU that carries out the final resume stages described earlier.

Recall that, during those stages, the image data pages still held in safe page frames are copied into their target locations, which generally overlap with memory occupied by the restore kernel itself and by its data. In particular, with CPUs offline using MONITOR/MWAIT, they might (and usually did) overlap with the memory MONITORed by those offline CPUs. That was a recipe for disaster; because the page tables used by those CPUs might have been overwritten too at that point, an attempt to fetch the next instruction by any of them would lead to a page fault that could not be handled, so the kernel would panic and crash. Worse yet, the code those CPUs would be executing if woken up from the MWAIT-induced state inadvertently might have been overwritten at that point too.

The problem was figured out and a rough consensus about how to fix it had formed during the review of Chen's patches: everyone involved seemed to agree that, during resume from hibernation, the CPU offline code should use the HLT instruction instead of MONITOR/MWAIT. The question was how to implement that idea in the cleanest way possible.

Chen had already posted a couple of patches going in that direction when I started to look at the details of the code in question, but none of those approaches had been particularly attractive. My first attempts at fixing this issue were not any better, until I realized that the function to execute at the last stage of CPU offline was a callback pointed to by the play_dead field in the smp_ops structure, so replacing that callback temporarily with a special one using HLT during resume from hibernation would do the trick. The change needed for that was relatively isolated and, most importantly, it didn't add any overhead to the CPU offline code, so it was approved by Molnar and the final patch making the change shipped in Linux 4.8-rc1 as kernel commit 406f992e4a37.
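The callback-swapping approach can be sketched in user-space C as follows. The structure and function names are hypothetical stand-ins modeled on the smp_ops.play_dead callback mentioned above; the "idle" functions merely record which instruction a real implementation would end up executing, and the kernel's actual code may differ in detail.

```c
/* Hypothetical stand-in for the kernel's struct smp_ops. */
struct toy_smp_ops {
	void (*play_dead)(void);
};

enum idle_kind { IDLE_NONE, IDLE_MWAIT, IDLE_HLT };

static struct toy_smp_ops smp_ops_toy;
static enum idle_kind last_idle = IDLE_NONE;

/* Normal offline path: would loop on MONITOR/MWAIT. */
static void idle_with_mwait(void) { last_idle = IDLE_MWAIT; }
/* Resume path: would loop on HLT, touching no monitored memory. */
static void idle_with_hlt(void)   { last_idle = IDLE_HLT; }

/* Stand-in for taking a secondary CPU offline: it ends in play_dead(). */
static void offline_secondary_cpus(void)
{
	smp_ops_toy.play_dead();
}

/*
 * Swap in the HLT-based callback only for the resume-time offlining,
 * then put the original back. The normal path is left untouched.
 */
static void resume_offline_cpus(void)
{
	void (*saved)(void) = smp_ops_toy.play_dead;

	smp_ops_toy.play_dead = idle_with_hlt;
	offline_secondary_cpus();
	smp_ops_toy.play_dead = saved;
}
```

The appeal of this design is that the ordinary CPU-offline path gains no extra checks; the special behavior exists only while the callback is swapped.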

The mystery bug returns

At that point, I thought that the worst problems related to resume from hibernation on x86-64 had been fixed, but I had forgotten about the mystery memory corruption issue previously reported by Petkov. To my surprise, just then it was reported again by Andre Reinke. For Reinke, however, it was a regression introduced in Linux 4.6, and he was able to identify kernel commit ef0f3ed5a4ac as its source.

In retrospect, it was quite obvious that resume from hibernation would be broken by that commit, because it added a FRAME_BEGIN macro to the assembly code that would run as the first thing after the restore kernel had jumped to the image kernel's entry point. Among other things, that macro generated a PUSH instruction that would be executed before writing the address of the original image kernel's page tables into the CR3 register of the CPU. Thus the CPU would still be using the temporary page tables created by the restore kernel when executing it and the value of its stack pointer would contain the address of a memory area that might contain image data now. In that case, the PUSH instruction would corrupt those image data pages by overwriting them with a stale value read from another CPU register.

Ironically enough, the FRAME_BEGIN macro was there all the time when the memory corruption reported by Petkov was being investigated and nobody saw the problem with it then. It looks like everyone, myself included, was mentally blinded by the fact that it was a macro and no one could see the real sequence of CPU instructions it was resolving to. Had the PUSH instruction been located directly in that code, the issue probably would have been resolved earlier without a need for a pointer to the kernel commit that introduced it. That pointer did help a lot, though, because it made everyone look at the right places in the code and the bug was readily fixed by Josh Poimboeuf. His fix went into Linux 4.8-rc1 as kernel commit 4ce827b4cc58.

That would have ended the x86-64 hibernation saga, had KASLR not been extended during the Linux 4.8-rc1 merge window. That did happen, however, and it affected Petkov again, breaking resume from hibernation for him on another machine. He noticed that unsetting the new CONFIG_RANDOMIZE_MEMORY kernel configuration option (set by default) made hibernation work again on that system, so the investigation of the problem focused on the interactions between hibernation and the new KASLR-related changes.

After those changes, KASLR on x86-64 randomizes not only the (physical) base address of the kernel text mapping, but also the (virtual) base address of the kernel identity mapping, among other things. That obviously might not play well with resume from hibernation which, in principle, might not be prepared to deal with differences in kernel identity mapping base address between the restore and image kernels. Indeed, that turned out to be the case; two problems in that area were quickly found by KASLR developer Thomas Garnier, who posted prototype patches to fix them.

First, the assembly code carrying out the switch over to temporary page tables during resume from hibernation contained a direct reference to the __PAGE_OFFSET symbol, used with the assumption that it would always resolve to a number. However, with CONFIG_RANDOMIZE_MEMORY set that symbol resolves to a variable name and the code generated in that case was invalid. Clearly, it was necessary to avoid using __PAGE_OFFSET this way, but Garnier's prototype patch did that with the help of preprocessor directives, which wasn't particularly clean. There was a better way: pass the physical rather than the virtual address of the page tables to the assembly code. That physical address might be computed by the code written in C and passed to the assembly in the same variable that previously had been used to pass the virtual address of the temporary page tables. With that, the problematic reference to __PAGE_OFFSET from assembly would simply go away, so I posted a patch making that change which landed in Linux 4.8-rc1 as kernel commit c226fab47429.
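The idea of computing the physical address in C can be illustrated with a small model. Here page_offset_base_toy is a hypothetical stand-in for the run-time identity-mapping base that CONFIG_RANDOMIZE_MEMORY introduces, and toy_pa() mimics a virtual-to-physical conversion for identity-mapped addresses. Since CR3 holds a physical address anyway, the low-level code that receives this value no longer needs to know the base at all.

```c
#include <stdint.h>

/*
 * Hypothetical run-time base of the identity mapping. Without memory
 * randomization it is a fixed value; with it, it is chosen at boot.
 */
static uint64_t page_offset_base_toy = UINT64_C(0xffff880000000000);

/* Identity-mapped virtual address -> physical address. */
static uint64_t toy_pa(uint64_t virt)
{
	return virt - page_offset_base_toy;
}

/* Value handed to the low-level code: already physical, ready for CR3. */
static uint64_t restore_pgd_arg;

static void prepare_restore(uint64_t temp_pgt_virt)
{
	restore_pgd_arg = toy_pa(temp_pgt_virt);
}
```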

Second, the kernel_ident_mapping_init() function called by the low-level code that creates temporary page tables during resume from hibernation made an assumption regarding the alignment of the base address of the kernel identity mapping that generally wasn't satisfied with CONFIG_RANDOMIZE_MEMORY set. That was easy enough to fix, but Garnier's prototype patch overlooked a corner case that was pointed out by Yinghai Lu, who posted his own version of that fix. Lu's patch worked, but it increased the complexity of the code in question which wasn't strictly necessary, so I prepared and posted yet another version of it that was approved by everyone involved and went into Linux 4.8-rc2 as kernel commit e4630fdd4763.
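The alignment issue can be illustrated with a little index arithmetic. The sketch assumes x86-64 four-level paging, where a top-level (PGD) entry covers 512 GiB and a PUD entry covers 1 GiB; the helper names and the randomized base in the example are made up. If the offset between virtual and physical addresses is not a multiple of 512 GiB, physical address 0 no longer lands at PUD index 0, so mapping code that assumes it does would build the wrong tables.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PGDIR_SHIFT 39  /* one top-level entry covers 512 GiB */
#define TOY_PUD_SHIFT   30  /* one PUD entry covers 1 GiB */
#define TOY_PTRS        512 /* entries per table */

/* Index of the PUD entry that translates virtual address va. */
static unsigned pud_idx(uint64_t va)
{
	return (unsigned)((va >> TOY_PUD_SHIFT) & (TOY_PTRS - 1));
}

/* True if the virtual-to-physical offset is a multiple of 512 GiB. */
static bool offset_is_pgd_aligned(uint64_t offset)
{
	return (offset & ((UINT64_C(1) << TOY_PGDIR_SHIFT) - 1)) == 0;
}
```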

Still, those two fixes turned out to be insufficient to make the issue reported by Petkov go away. Moreover, the same issue was reported by Jiri Kosina in the meantime (the symptom seemed to be a triple fault during resume meaning, probably, an unhandled page fault). It was puzzling because it was reproducible on the affected systems 100% of the time, while other, similar, systems hibernated and resumed without any problems at all.

Fortunately, I had a test system that was similar to Petkov's failing one, so I was able to use his configuration file to generate a kernel for it. That allowed me to reproduce the problem locally and to verify that it was triggered by setting the CONFIG_DEBUG_LOCK_ALLOC configuration option. It still was not particularly clear why and how that option might lead to the observed failure, but Garnier was also able to reproduce it, and he found the reason why it appeared. That turned out to be a bug in the hibernation core introduced during the Linux 3.16 development cycle that caused a tracing function to be called before the processor state had been restored completely. As a result, a stale value of the GS register was used by that tracing function; that led to the observed triple fault, which Garnier was able to fix by simply changing the ordering of the code in question. That fix went into Linux 4.8-rc2 as kernel commit 62822e2ec4ad.

Working, at last

That finally made hibernation work for Petkov and Kosina again, even with both CONFIG_RANDOMIZE_MEMORY and CONFIG_DEBUG_LOCK_ALLOC set; only one question remained: why had CONFIG_DEBUG_LOCK_ALLOC made a difference at all? That was explained by Kosina, who looked at the assembly output generated by the compiler for the affected code both with and without CONFIG_DEBUG_LOCK_ALLOC set and found that it differed between the two cases. Next, he tracked the difference down to the definition of the __DECLARE_TRACE() macro, which generated additional code with CONFIG_DEBUG_LOCK_ALLOC set; that additional code used GS-relative addressing, which would lead to the observed failure if the GS value was stale.

In the end, as of Linux 4.8-rc3, resume from hibernation on x86-64 works at last, and it works with KASLR enabled. It took a couple of months to get to this point due to the nature of the bugs that needed to be fixed and the complexity of the affected code. As mentioned at the beginning, that wouldn't have been possible without all of the developers and bug reporters involved; in particular, I'd like to thank the following contributors for their input that shaped the final code changes: Logan Gunthorpe, Ingo Molnar, Borislav Petkov, Linus Torvalds, Chen Yu, Kees Cook, Andre Reinke, Josh Poimboeuf, Thomas Garnier, Yinghai Lu, and Jiri Kosina.


Patches and updates

Kernel trees

Linus Torvalds Linux 4.8-rc8 Sep 25
Greg KH Linux 4.7.5 Sep 24
Con Kolivas linux-4.7-ck5 Sep 23
Greg KH Linux 4.4.22 Sep 24

Architecture-specific

Core kernel code

Device drivers

Device driver infrastructure

Damien Le Moal ZBC / Zoned block device support Sep 26

Filesystems and block I/O

Memory management

Networking

Security-related

Miscellaneous

Page editor: Jonathan Corbet

Distributions

ARC++

By Jake Edge
September 28, 2016

X.Org Developers Conference

At the 2016 X.Org Developers Conference (XDC) in Helsinki, David Reveman gave a talk about the ARC++ project, which allows Android apps to run unchanged on Chrome OS. In order to make that work, there was some significant impedance matching that needed to be done. Reveman described how it all worked in a session on the first day of the conference.

[David Reveman]

The name "ARC++" comes from a previous project, called the "App Runtime for Chrome" or ARC, that was launched in 2014. It was a plugin for the Chrome browser, but required developers to change their apps to run in that environment. The plugin had to emulate multiple Android layers, which had performance implications. In the end, ARC "never really took off", he said.

So a new project was started, ARC++, with the goal of allowing access to all of the Play Store apps without requiring changes to those apps and with minimal changes to the underlying Android framework. The goals also included keeping Chrome OS secure, while maintaining the Chrome OS update model. In ARC++, Android apps are isolated from Chrome OS as much as possible, by using Linux containers. The apps run in the containers, while Chrome OS runs as it normally does.

Reveman then went into an overview of the graphics stack for ARC++. It is a complicated stack, as can be seen in the diagram from his slides [PDF] below on the left. A YouTube video of the talk is also available for those interested in further details.

Android apps typically use the hardware-accelerated Canvas API that has been available since Android 4.0. Some other apps, especially games, use OpenGL ES (GLES) directly, though they may use the new Vulkan 3D graphics API in the future.

[Graphics overview]

Everything in Android is rendered to a Surface; those Surfaces are produced by apps and placed into a queue that is consumed by SurfaceFlinger. The gralloc hardware abstraction layer (HAL) is used to allocate the buffers that underlie Surfaces, both in Android and ARC++. For ARC++, gralloc and the GLES driver use the Direct Rendering Manager (DRM) subsystem in the kernel for rendering. That allows apps to use fully accelerated GLES or to use other rendering APIs (e.g. Canvas) as needed. Some day, the apps may use Vulkan, but ARC++ doesn't care so long as the target is a gralloc buffer, he said.
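The producer/consumer relationship between apps and the compositor can be modeled as a small queue of reusable buffers. The following is a simplified Python model, not Android's actual BufferQueue API. The class and method names are hypothetical and only loosely follow the dequeue/queue/acquire/release cycle that BufferQueue uses, and the integer slots stand in for gralloc-allocated buffers:

```python
from collections import deque


class SurfaceQueue:
    """Toy model of the buffer queue behind a Surface (hypothetical)."""

    def __init__(self, num_buffers: int):
        self.free = deque(range(num_buffers))  # slots the producer may draw into
        self.queued = deque()                  # filled slots awaiting the consumer

    # Producer side: the app.
    def dequeue_buffer(self):
        return self.free.popleft() if self.free else None

    def queue_buffer(self, slot: int) -> None:
        self.queued.append(slot)

    # Consumer side: SurfaceFlinger in Android; in ARC++ the HWComposer
    # forwards the buffer to Chrome OS instead of compositing it locally.
    def acquire_buffer(self):
        return self.queued.popleft() if self.queued else None

    def release_buffer(self, slot: int) -> None:
        self.free.append(slot)
```

With two buffers, a producer that has dequeued both must wait until the consumer acquires and releases one before it can draw again.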

For compositing in Android, Surfaces are sent to SurfaceFlinger, which uses GLES to do the compositing. For ARC++, though, the Hardware Composer HAL (HWComposer) handles all of the Surfaces. They are forwarded to Chrome OS for compositing along with the rest of the Chrome OS user interface.

For window management, ARC++ takes advantage of some of the recent multi-window work that has been done for Android. Certain operations are handled by Android and the others are managed by Chrome OS. The absolute positioning and resizing of windows are done by Android, while maximize, minimize, and full-screen operations are managed by Chrome OS. In addition, app switching, multiple profiles, screen magnifiers, and the like are handled by Chrome OS.

DRM and kernel modesetting (KMS) are used on Chrome OS. DRM is also used by Android, and that is what allows Android and Chrome OS to be integrated efficiently, Reveman said. On both sides, DRM is used for rendering and buffer allocation, which makes it easy to share graphics buffers between the two. Chrome OS is a DRM master, so it can program the display controller, while Android does not need modesetting capabilities, as it just needs access to the GPU via a render node.

Low-level input and graphics on Chrome OS are handled by the Ozone abstraction layer that targets everything from embedded system-on-chip (SoC) graphics to X11 and its alternatives (e.g. Wayland). It uses a GpuMemoryBuffer object to hold DRM buffers that have been allocated using gralloc on the Android side or DRM itself on the Chrome OS side. That abstraction allows platform-independent code, such as the Chrome browser compositor, to take advantage of low-level graphics buffers.

The pixel formats (i.e. the color schemes and sizes used for in-memory pixel representation) in Chrome OS are limited and more had to be added for Android apps. In order for a DRM buffer to be imported into Chrome OS from Android, the pixel format of the buffer has to be supported. Some formats were only supported by falling back to converting them in software, though that is rare now, he said.
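The import check described above can be sketched as follows. DRM identifies pixel formats with four-character codes (fourcc). The packing function follows the kernel's fourcc_code() convention and the four format codes are the standard DRM ones, but which formats count as native or convertible here is a made-up example, not Chrome OS's actual list:

```python
from typing import Optional


def fourcc(code: str) -> int:
    """Pack four ASCII characters little-endian, as DRM's fourcc_code() macro does."""
    if len(code) != 4:
        raise ValueError("a fourcc code has exactly four characters")
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


# Standard DRM format codes.
DRM_FORMAT_ARGB8888 = fourcc("AR24")
DRM_FORMAT_XRGB8888 = fourcc("XR24")
DRM_FORMAT_ABGR8888 = fourcc("AB24")
DRM_FORMAT_RGB565 = fourcc("RG16")

# Hypothetical capabilities of the importing side.
NATIVE_FORMATS = {DRM_FORMAT_ARGB8888, DRM_FORMAT_XRGB8888}
SOFTWARE_CONVERSIONS = {DRM_FORMAT_ABGR8888: DRM_FORMAT_ARGB8888}  # swap R and B on the CPU


def import_plan(fmt: int) -> tuple[str, Optional[int]]:
    """Decide how a buffer of format fmt could be imported."""
    if fmt in NATIVE_FORMATS:
        return ("import", fmt)
    if fmt in SOFTWARE_CONVERSIONS:
        return ("convert", SOFTWARE_CONVERSIONS[fmt])  # slow software fallback
    return ("reject", None)
```

Keeping the software path as a separate branch makes it easy to see when the slow fallback is being taken.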

Exosphere is a Chrome OS component that allows other clients to connect to the user interface. It protects Chrome OS from potentially malicious clients, such as Android apps, by doing validation of the operations requested. It is built on top of the GpuMemoryBuffer framework.

Applications on Chrome OS run within the Chrome browser. Its compositor has a multi-process architecture, with one browser process that starts a renderer process for each tab. Those renderer processes produce frames that get sent back to the browser process. One difference between Chrome and Android is in synchronization. It is relatively simple for Chrome, where there is just one process that talks to the DRM driver from a single thread. In ARC++, though, multiple threads in the Android container make things more complicated. Right now, there is something of a "fence dance" that is done to ensure Android does not reuse a buffer before the GPU has finished with it; in the future, it is expected that explicit synchronization will allow that dance to be removed.
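The constraint behind the "fence dance" is that a buffer handed to the GPU must not be written again until the GPU signals that it is done with it. The following is a minimal Python sketch of that rule, using threading.Event as a stand-in for a GPU fence. The classes are hypothetical and say nothing about how ARC++ or the kernel's synchronization interfaces actually implement it:

```python
import threading


class Fence:
    """Stand-in for a GPU fence: signaled once the GPU is done with a buffer."""

    def __init__(self):
        self._done = threading.Event()

    def signal(self) -> None:
        self._done.set()

    def wait(self, timeout=None) -> bool:
        return self._done.wait(timeout)


class FencedBuffer:
    def __init__(self, name: str):
        self.name = name
        self.release_fence = None  # set when the buffer is handed to the GPU

    def submit_to_gpu(self) -> Fence:
        self.release_fence = Fence()
        return self.release_fence

    def reuse(self, timeout=None) -> bool:
        """The producer may only write again once the release fence has signaled."""
        if self.release_fence is not None and not self.release_fence.wait(timeout):
            return False  # GPU still busy; the buffer must not be touched yet
        self.release_fence = None
        return True
```

In this model, a second thread signals the fence once the (simulated) GPU work completes, and reuse() blocks until then or until the timeout expires.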

For the graphics, window management, and input communication between Android and Chrome OS, the Wayland protocol was chosen. There are a number of benefits to that approach, including Wayland's limited API that allows easier validation from a security point of view, Reveman said. Most of the interfaces needed were already present, but ARC++ did add a few. Another advantage is that Wayland is well-tested and has a set of existing clients that could be used to test and validate the ARC++ implementation.

The project is currently deciding which of the new interfaces should go upstream and which should be dropped in favor of existing upstream ones. Some existing Wayland interfaces did not do quite what was needed for ARC++, and the developers did not have time to work with upstream at the time. There is also interest in adding a few more interfaces, for things like explicit synchronization for releasing buffers and presentation timing, as well as protected buffers for digital rights management.

The code for ARC++ can be found in the Chromium source tree. For example, Exosphere can be found here and the Wayland extensions can be found in this repository.

Reveman gave a few demonstrations of ARC++, including the Play Store app running on Chrome OS and multiple YouTube apps running while he switched between them. There is gamepad support as well. On a system in developer mode that runs an environment for regular Linux applications, such as crouton, normal Wayland applications can also communicate with the Chrome OS compositor and run in Chrome OS.

In answer to some questions from the audience, Reveman said that the extensions made to the protocol cause no real problems for regular Wayland applications. Chrome OS supports regular Wayland just fine; applications could even take advantage of the ARC++ extensions, though he doesn't recommend doing that. So far, there is a single Wayland application on Chrome OS (Android), and there are no plans to change that right now, though that may happen down the road.

[I would like to thank the X.Org Foundation for sponsoring my travel to Helsinki for XDC.]


Brief items

Distribution quotes of the week

You are taking the word "sanctioned" to mean a lot more than anyone wants it to mean. We tried to wordsmith it but no English word gives the meaning that enough people recognize to mean the following:

Here is a list of repositories that other people have found will help you meet certain needs. Fedora makes no guarantee that it won't eat your system, but we also don't make any such guarantee about anything we ship. We try our best but someday the Grue is going to eat you no matter what. Thanks for playing.

Stephen J Smoogen

Debian is the worlds best cave-surveying distro. :-)
Wookey (Thanks to Paul Wise)


Debian Project mourns the loss of Kristoffer H. Rose

Ana Guerrero Lopez sadly reports that Kristoffer H. Rose died on September 17. "Kristoffer was a Debian contributor from the very early days of the project, and the upstream author of several packages that are still in the Debian archive nowadays, such as the LaTeX package Xy-pic and FlexML. On his return to the project after several years' absence, many of us had the pleasure of meeting Kristoffer during DebConf15 in Heidelberg. The Debian Project honours his good work and strong dedication to Debian and Free Software. Kristoffer's broad technical knowledge and his ability to share that knowledge with others will be missed. The contributions of Kristoffer will not be forgotten, and the high standards of his work will continue to serve as an inspiration to others."


Ubuntu 16.10 (Yakkety Yak) Final Beta released

The Ubuntu team has announced the final beta release of Ubuntu 16.10 Desktop, Server, and Cloud products. Beta images are also available for Kubuntu, Lubuntu, Ubuntu GNOME, Ubuntu Kylin, Ubuntu MATE, and Ubuntu Studio. The final release is expected on October 13.


Distribution News

Ubuntu family

Ubuntu Online Summit

The next Ubuntu Online Summit will take place November 15-16. "At the event we are going to celebrate the 16.10 release and all the great things which are new and get to talk about what's coming up in Ubuntu 17.04."


Newsletters and articles of interest

Distribution newsletters


Firefox OS, B2G OS, and Gecko

Ari Jaaksi and David Bryant posted a note to the B2G (Boot to Gecko) OS community looking at the end of Firefox OS development and at what happens to the code base going forward. "In the spring and summer of 2016 the Connected Devices team dug deeper into opportunities for Firefox OS. They concluded that Firefox OS TV was a project to be run by our commercial partner and not a project to be led by Mozilla. Further, Firefox OS was determined to not be sufficiently useful for ongoing Connected Devices work to justify the effort to maintain it. This meant that development of the Firefox OS stack was no longer a part of Connected Devices, or Mozilla at all. Firefox OS 2.6 would be the last release from Mozilla. Today we are announcing the next phase in that evolution. While work at Mozilla on Firefox OS has ceased, we very much need to continue to evolve the underlying code that comprises Gecko, our web platform engine, as part of the ongoing development of Firefox. In order to evolve quickly and enable substantial new architectural changes in Gecko, Mozilla’s Platform Engineering organization needs to remove all B2G-related code from mozilla-central. This certainly has consequences for B2G OS. For the community to continue working on B2G OS they will have to maintain a code base that includes a full version of Gecko, so will need to fork Gecko and proceed with development on their own, separate branch." (Thanks to Paul Wise)


Page editor: Rebecca Sobol

Development

Systemd programming, 30 months later

September 27, 2016

This article was contributed by Neil Brown

Some time ago, we published a pair of articles about systemd programming that extolled the value of providing high-quality unit files in upstream packages. The hope was that all distributions would use them and that problems could be fixed centrally rather than each distribution fixing its own problems independently. Now, 30 months later, it seems like a good time to see how well that worked out for nfs-utils, the focus of much of that discussion. Did distributors benefit from upstream unit files, and what sort of problems were encountered?

Systemd unit files for nfs-utils first appeared in nfs-utils-1.3.0, released in March 2014. Since then, there have been 26 commits that touched files in the systemd subdirectory; some of those commits are less interesting than others. Two, for example, make changes to the set of unit files that are installed when you run "make install". If distributors maintained their unit files separately (like they used to maintain init scripts separately), this wouldn't have been an issue at all, so these cannot be seen as a particular win for upstreaming.

Most of the changes of interest are refinements to the ordering and dependencies between various services, which is hardly surprising given that dependencies and ordering are a big part of what systemd provides. With init scripts we didn't need to think about ordering very much, as those scripts ran the commands in the proper order. Systemd starts different services in parallel as much as possible, so it should be no surprise that more thought needs to be given to ordering and more bugs in that area are to be expected.

As hoped, the fixes came from a range of sources, including one commit from an Ubuntu developer that removed the default dependency on basic.target. That enabled the NFS service to start earlier, which is particularly useful when /var is mounted via NFS. Another, from a Red Hat developer, removed an ordering cycle caused by the nfs-client.target inexplicably being told to start before the GSS services it relies on, rather than after. A third, from the developer of OSTree, made sure that /var/lib/nfs/rpc-pipefs wasn't mounted until after the systemd-tmpfiles.service had a chance to create that directory. This is important in configurations where /var is not permanent.

Each of these changes involved subtle ordering dependencies that were not easy to foresee when the unit files were initially assembled. Some of them have the potential to benefit many users by improving robustness or startup time. Others have much narrower applicability, but still benefit developers by documenting the needs that others have. This makes it less likely that future changes will break working use cases and can allow delayed collaboration, as the final example will show.

rpcbind dependencies

There were two changes deserving of special note, partly because they required multiple attempts to get right and partly because both involve dependencies that are affected by the configuration of the NFS services; they take quite different approaches to handling those dependencies. The first of these changes revised the dependency on rpcbind, a lookup service that maps an ONC-RPC program number to an Internet port number. When RPC services start, they choose a port number and register with rpcbind, so that it can tell clients which port each service can be reached on.
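The lookup that rpcbind performs can be pictured as a table keyed by program number, version, and transport. The following is a toy Python model rather than the real protocol. The class and method names are hypothetical and the mountd port is an arbitrary example. The program numbers, however, are the standard ONC RPC assignments, rpcbind does listen on port 111, and a port of 0 is how the portmapper protocol reports an unregistered program:

```python
PORTMAPPER = 100000
NFS = 100003
MOUNT = 100005
NLM = 100021   # the LOCK protocol
NSM = 100024   # the STATUS protocol


class Rpcbind:
    """Toy registry mapping (program, version, transport) to a port."""

    def __init__(self):
        # rpcbind itself is reached on the well-known port 111.
        self._map = {(PORTMAPPER, 2, "tcp"): 111, (PORTMAPPER, 2, "udp"): 111}

    def set(self, prog: int, vers: int, proto: str, port: int) -> None:
        """Called by an RPC service at startup to advertise its port."""
        self._map[(prog, vers, proto)] = port

    def getport(self, prog: int, vers: int, proto: str) -> int:
        """Called by clients; 0 means the program is not registered."""
        return self._map.get((prog, vers, proto), 0)
```

A client that wants to talk to mountd first asks for `getport(MOUNT, 3, "tcp")` and only then connects to the port it gets back.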

When version 2 or version 3 of NFS is in use, rpcbind is required. It is necessary for three auxiliary protocols (MOUNT, LOCK, and STATUS), and it is the preferred way to find the NFS service, though in practice that service always uses port 2049. When only version 4 of NFS is in use, rpcbind is not necessary, since NFSv4 incorporates all the functionality that was previously provided by the three extra protocols and it mandates the use of port 2049. Some system administrators prefer not to run unnecessary daemons and so don't want rpcbind started when only NFSv4 is configured. There are two requirements to bear in mind when meeting this need: one is to make sure rpcbind isn't started, and the other is to ensure that the main NFS service still starts even though rpcbind is missing.
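The rule in the paragraph above reduces to a simple check on which NFS versions are enabled. The following hypothetical Python helper is for illustration only. systemd itself cannot see this NFS configuration, which is why the administrator has to make the decision:

```python
AUX_PROTOCOLS = ("MOUNT", "LOCK", "STATUS")


def rpcbind_needed(enabled_versions: set[int]) -> bool:
    """NFSv2 and NFSv3 rely on rpcbind; an NFSv4-only setup does not."""
    return bool(enabled_versions & {2, 3})


def programs_to_register(enabled_versions: set[int]) -> list[str]:
    """RPC programs that would need rpcbind registrations (hypothetical helper)."""
    if not rpcbind_needed(enabled_versions):
        return []  # NFSv4 uses the fixed port 2049 and needs no lookup
    return ["NFS", *AUX_PROTOCOLS]
```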

As discussed in the earlier articles, systemd doesn't have much visibility into non-systemd configuration files, so it cannot easily detect whether NFSv3 is enabled and start rpcbind only if it is. Instead, it needs to be told explicitly to disable rpcbind with:

    systemctl mask rpcbind

There is subtlety hiding behind this command. rpcbind uses three unit files: rpcbind.target, rpcbind.service, and rpcbind.socket. Previously, I recommended using the target file to activate rpcbind but that was a mistake. Target files can be used for higher-level abstractions as described then, but there is no guarantee that they will be. rpcbind.target is defined by systemd only to provide ordering with rpcbind (or equally "portmap"). This provides compatibility with SysV init, which has a similar concept. rpcbind.target cannot be used to activate those services, and so should be ignored by nfs-utils. rpcbind.socket describes how to use socket-activation to enable rpcbind.service, the main service. nfs-utils only cares about the sockets being ready to listen, so it should only have (and now does only have) dependencies on rpcbind.socket.

Masking rpcbind ensures that rpcbind.service doesn't run. The socket activation is not directly affected, but systemd sorts this out soon enough. Systemd will still listen on the various sockets at first but, as soon as some process tries to connect to one of those sockets, systemd will notice the inconsistency and will shut down the sockets as well. So this simple and reasonably obvious command does what you might expect.

Ensuring that other services cope with rpcbind being absent is as easy as using a Wants dependency rather than a Requires dependency. A Wants dependency asks for the other unit to be started, but doesn't fail if it isn't. Some parts of NFS only "want" rpcbind to be running, but one, rpc.statd, cannot function without it, so it still Requires rpcbind. This has the effect of implicitly disabling rpc.statd when rpcbind is masked.

It's worth spending a while reflecting on why the command is "systemctl mask" rather than "systemctl disable", as I've often come across the expectation that enable and disable are the commands to enable or disable a unit file. As a concrete example, Martin Pitt stated in Ubuntu bug 1428486 that they are "the canonical way to enable/disable a unit", but this was not the first place that I found this expectation.

The reality is that enable is the canonical way to request activation of a unit file. It doesn't actually start it ("systemctl start" will do that), and it isn't the only way to activate a unit file, as some other unit file can do so with a Requires directive. This may seem to be splitting hairs, but the distinction is more clear with the disable command, which does not disable a unit file. Instead, it only reverts any explicit request made by enable that a unit be activated. It is quite possible that a unit file will still be fully functional even after running "systemctl disable" on it.

If you want to be sure that a unit file will be activated, then "systemctl enable" is probably the right thing to do. If you want to be sure that it is not activated, then "systemctl disable" won't provide that guarantee; you need "systemctl mask" instead. This command ensures that the unit file won't run even if some other unit file Requires it. So that is the command that we use to ensure rpcbind isn't running, and it could also be used to ensure rpc.statd isn't running, though that isn't really needed since, as mentioned, masking rpcbind effectively masks rpc.statd as well.
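The difference between disable and mask, and between Wants and Requires, can be captured in a deliberately simplified Python model. The Unit class, the activate() function, and the example dependency graph are hypothetical. Real systemd transactions also involve ordering, socket activation, job modes, and more:

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    requires: list[str] = field(default_factory=list)
    wants: list[str] = field(default_factory=list)


def activate(units: dict[str, Unit], enabled: set[str], masked: set[str]) -> set[str]:
    """Return the set of units that would end up running.

    Simplified semantics:
    - units in `enabled` are requested directly (what "systemctl enable" sets up);
    - Requires= and Wants= pull in other units whether or not they are enabled;
    - a masked unit never starts;
    - a unit fails if anything it Requires fails, but a missing Wants= unit
      is tolerated.
    """
    state: dict[str, bool] = {}

    def start(name: str) -> bool:
        if name in state:
            return state[name]
        state[name] = False  # provisional entry; stops infinite recursion on cycles
        if name in masked or name not in units:
            return False
        unit = units[name]
        required_ok = [start(dep) for dep in unit.requires]
        for dep in unit.wants:
            start(dep)  # outcome deliberately ignored: Wants= is best-effort
        state[name] = all(required_ok)
        return state[name]

    for name in sorted(enabled):
        start(name)
    return {name for name, running in state.items() if running}


# Hypothetical, simplified dependency graph (not the real nfs-utils units).
EXAMPLE_UNITS = {
    "rpcbind.service": Unit("rpcbind.service"),
    "rpc-statd.service": Unit("rpc-statd.service", requires=["rpcbind.service"]),
    "nfs-server.service": Unit(
        "nfs-server.service", wants=["rpcbind.service", "rpc-statd.service"]
    ),
}
```

With only nfs-server.service enabled, rpcbind.service still ends up running even though it was never enabled; this is the sense in which "systemctl disable" does not disable anything. Adding rpcbind.service to the masked set stops it and, through the Requires= edge, also stops rpc-statd.service. nfs-server.service still starts because it only Wants them.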

Ordering nfsd with respect to filesystem mounting using a generator

One dependency for the NFS server, which is particularly obvious in hindsight, is that it should only be started after the filesystems that it is exporting have been mounted. Without this ordering, an NFS client might manage to mount the filesystem that is about to have something mounted on top of it, which can cause confusion — or worse. The default dependencies imposed by systemd will start services after local-fs.target, which ensures all local filesystems are mounted. When the commit mentioned above removed the default dependencies to allow NFS to start earlier, it explicitly added local-fs.target. So this seems well in hand.

For remote filesystems mounted over NFS, we need the reverse ordering. In particular, if a filesystem is NFS mounted from the local host (a "loopback" mount), the NFS server should be started before the filesystem is mounted. This is particularly important during system shutdown when ordering is reversed. If the NFS server is stopped before the loopback NFS filesystem is unmounted, that unmount can hang indefinitely.

To avoid this hang, Pitt added a dependency so that nfs-server.service would start before (and so be stopped after) remote-fs-pre.target. This ensures that the NFS server will be running whenever a loopback NFS filesystem might be mounted. This seems like it makes perfect sense, but there is a wrinkle: sometimes, filesystems that are considered by systemd to be "remote" can be exported by NFS. A particular example is filesystems mounted from a network-attached block device, such as one accessed over iSCSI.
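Starting the server before remote-fs-pre.target also protects shutdown because systemd applies ordering constraints in reverse when it stops units. The following small Python sketch uses the standard library's graphlib to show the effect. The unit mnt-loop.mount is a hypothetical loopback NFS mount, and real systemd ordering involves many more edges:

```python
from graphlib import TopologicalSorter


def start_order(before: dict[str, set[str]]) -> list[str]:
    """before[a] is the set of units that a is ordered Before=."""
    # TopologicalSorter expects predecessors, so "a Before= b" becomes b -> {a}.
    preds: dict[str, set[str]] = {}
    for a, later in before.items():
        preds.setdefault(a, set())
        for b in later:
            preds.setdefault(b, set()).add(a)
    return list(TopologicalSorter(preds).static_order())


def stop_order(before: dict[str, set[str]]) -> list[str]:
    """At shutdown, the same constraints are applied in reverse."""
    return list(reversed(start_order(before)))


ORDERING = {
    "nfs-server.service": {"remote-fs-pre.target"},
    # Remote mounts are ordered after remote-fs-pre.target.
    "remote-fs-pre.target": {"mnt-loop.mount"},
}
```

Here the start order is nfs-server.service, remote-fs-pre.target, mnt-loop.mount, so at shutdown the loopback mount is unmounted before the server stops.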

Had I confronted the need to export iSCSI filesystems before Pitt had added the dependency on remote-fs-pre.target, I probably would have simply told systemd to start nfs-server.service "After remote-fs.target". This would have solved the iSCSI situation, but broken the loopback NFS situation. Had the unit files not been upstream, this is undoubtedly what would have happened.

Instead, a more general solution was needed. The NFS server needs to start after the mounting of any filesystems that are exported, but before any NFS filesystem is mounted. Systemd is not able to make this determination itself, but fortunately it has a flexible extension mechanism so it can have the details explained to it. Using this extension mechanism isn't quite as easy as adding a script to /etc/init.d, but perhaps that is a good thing. It should probably only be used as a last resort, but it is good to have it when that resort is needed.

Before systemd reads all its unit files, either at startup or in response to "systemctl daemon-reload", it will run any programs found in various "generator" directories such as /usr/lib/systemd/system-generators. These programs are run in parallel, are expected to complete quickly, and will normally read a foreign (i.e. non-systemd) configuration file and create new unit files or drop-ins (which extend existing unit files) in a directory given to the program, typically /run/systemd/generator. These will then be read when other unit files and drop-ins are read, so they can exercise a large degree of control over systemd.

To order the NFS server with respect to the various mount points, we want to read /etc/exports and add a RequiresMountsFor= directive for each exported directory. Then we want to read /etc/fstab and add a Before=MOUNT_POINT.mount directive for each MOUNT_POINT of an nfs or nfs4 filesystem. As library code already exists for reading both of these files, this all comes to fewer than 200 lines of code. Once the problem is understood, the answer is easy.
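The generator logic just described can be sketched in Python. This is not the nfs-utils generator itself. The function names and the drop-in file name are hypothetical, and the path escaping only approximates `systemd-escape --path` for ASCII paths. Quoted paths in /etc/exports and octal escapes such as \040 in /etc/fstab are not handled:

```python
import os

NFS_FS_TYPES = {"nfs", "nfs4"}


def escape_path(path: str) -> str:
    """Approximate `systemd-escape --path` for ASCII paths."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory's mount unit is "-.mount"
    out = []
    for i, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and i > 0):
            out.append(ch)
        else:
            out.append("\\x%02x" % ord(ch))
    return "".join(out)


def exported_paths(exports_text: str) -> list[str]:
    """First field of each non-comment line of /etc/exports."""
    paths = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            paths.append(line.split()[0])
    return paths


def nfs_mount_points(fstab_text: str) -> list[str]:
    """Mount points of nfs and nfs4 entries in /etc/fstab."""
    points = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[2] in NFS_FS_TYPES:
            points.append(fields[1])
    return points


def drop_in(exports_text: str, fstab_text: str) -> str:
    """Contents of a drop-in ordering nfs-server.service around mounts."""
    lines = ["[Unit]"]
    lines += [f"RequiresMountsFor={p}" for p in exported_paths(exports_text)]
    lines += [f"Before={escape_path(m)}.mount" for m in nfs_mount_points(fstab_text)]
    return "\n".join(lines) + "\n"


def read_or_empty(path: str) -> str:
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return ""


def generate(normal_dir: str, exports="/etc/exports", fstab="/etc/fstab") -> str:
    """Write the drop-in under <normal_dir>/nfs-server.service.d/ and return its path."""
    unit_dir = os.path.join(normal_dir, "nfs-server.service.d")
    os.makedirs(unit_dir, exist_ok=True)
    target = os.path.join(unit_dir, "order-with-mounts.conf")  # hypothetical name
    with open(target, "w") as f:
        f.write(drop_in(read_or_empty(exports), read_or_empty(fstab)))
    return target
```

systemd runs each generator with three output directories (normal, early, and late) as arguments, so a real generator built this way would call generate() with the first one.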

Generators everywhere?

Having experienced the power of systemd generators, I immediately started to wonder how else I might use them. It is tempting to use a generator to automatically disable rpcbind when only NFSv4 is in use, but I think that is a temptation best avoided. rpcbind isn't only used by NFS. NIS, the Network Information Service (previously called "yellow pages"), makes use of it, and sites could easily have their own local RPC services. It is best if disabling rpcbind remains a separate administrative decision, for which the "mask" function seems well suited.

In the earlier articles I described a modest amount of complexity required to pass local configuration through systemd to affect the parameters passed to various programs. Using a generator to process the configuration file could make all of that more transparent, or it might just replace one sort of complexity with another. While I don't agree with all the advice the systemd developers provide, this advice from the systemd.generator manual page is certainly worth considering:

Instead of heading off now and writing all kind of generators for legacy configuration file formats, please think twice! It is often a better idea to just deprecate old stuff instead of keeping it artificially alive.

Upstream now!

The evidence presented here supports the claim that keeping systemd unit files upstream can benefit all developers and users. The different experiences generated in different contexts were brought together into a single conversation so all could benefit from, and respond to, all the changes. This should not be surprising when one thinks of unit files as just another sort of code used to write the whole system. The only part that seems to be missing from upstream is a place to document the advice that "systemctl mask rpcbind" is the appropriate way to disable rpcbind and rpc-statd when only NFSv4 is in use. Maybe we need an nfs.systemd man page.


Brief items

Development quotes of the week

The canonical horrible, no good, very bad example of this is the JSON license, an MIT-family license plus “The Software shall be used for Good, not Evil.”. This kind of thing might be “very Crockford”. It is definitely a pain in the ass. Maybe the joke was supposed to be on the lawyers. But they laughed all the way to the bank.
Kyle E. Mitchell (Thanks to Jim Garrison)

This is about changing our developer attitude, our developer community. Maybe only we understand the battlefield, but still the fight is [in] our hands now. We must protect those we love and those we have never met. We must stand up and fight the fight for them, just the way we expect a doctor to have built the knowledge and to act ethically when taking care of us, just like we expect a civil engineer to build stuff with quality and respect for our well being, we must also act honorably like that, even if it means we [lose] money.

The developer community, hidden behind screens, have been bought by the big man money, and act to the people as blind policemen throwing bombs at the demanding population.

deltaprotocol (Thanks to Paul Wise)


Newsletters and articles

Development newsletters


Mitchell: The MIT License, Line by Line

At his blog, Kyle E. Mitchell ("who is not your attorney") takes a close, line-by-line look at the popular MIT software license. The details he points out begin on line one with the license's title: "'The MIT License' is a not a single license, but a family of license forms derived from language prepared for releases from the Massachusetts Institute of Technology. It has seen a lot of changes over the years, both for the original projects that used it, and also as a model for other projects. The Fedora Project maintains a kind of cabinet of MIT license curiosities, with insipid variations preserved in plain text like anatomical specimens in formaldehyde, tracing a wayward kind of evolution."

Despite the license being only 171 words, Mitchell finds quite a bit to expand on, such as the ambiguities of the phrase "to deal in the Software without restriction": "As a result of this mishmash of legal, industry, general-intellectual-property, and general-use terms, it isn’t clear whether The MIT License includes a patent license. The general language 'deal in' and some of the example verbs, especially 'use', point toward a patent license, albeit a very unclear one. The fact that the license comes from the copyright holder, who may or may not have patent rights in inventions in the software, as well as most of the example verbs and the definition of 'the Software' itself, all point strongly toward a copyright license." Nevertheless, Mitchell notes, "despite some crusty verbiage and lawyerly affectation, one hundred and seventy one little words can get a hell of a lot of legal work done".


Prodromou: Adopt a pump.io server

Evan Prodromou, creator of identi.ca and pump.io, has put out a call for interested parties to adopt the administration of public pump.io microblogging servers, which he is currently funding out of his own pocket. "Almost all of them are on $5/month Digital Ocean droplets, which makes them relatively cheap for a single person to support. If you decide you want to adopt a server, E14N will sell you the domain and all the software and data for $1. But you'll be obligated to keep the server running pump.io for at least a year, and if you decide you don't want to run it, you have to sell it back to me." There are currently around 25 servers in the federated network initially started by Prodromou, a figure that does not include other pump.io instances. He notes that one important exception is the identi.ca site, which is significantly larger than the rest, and which he would like to find a trusted non-profit organization to maintain.


Page editor: Rebecca Sobol

Announcements

Brief items

Announcing the KDE Advisory Board

KDE e.V. introduces the KDE Advisory Board. "One of the core goals of the Advisory Board is to provide KDE with insights into the needs of the various organizations that surround us. We are very aware that we need the ability to combine our efforts for greater impact and the only way we can do that is by adopting a more diverse view from outside of our organization on topics that are relevant to us. This will allow all of us to benefit from one another's experience."

Articles of interest

Garrett: Microsoft aren't forcing Lenovo to block free operating systems

Matthew Garrett looks at the real problem behind the inability of some Lenovo laptops to run Linux. "The real problem here is that Intel do very little to ensure that free operating systems work well on their consumer hardware - we still have no information from Intel on how to configure systems to ensure good power management, we have no support for storage devices in "RAID" mode and we have no indication that this is going to get better in future. If Intel had provided that support, this issue would never have occurred."

Calls for Presentations

LibrePlanet call for proposals

The next LibrePlanet will be held March 25-26, 2017 in the Boston, MA area. "This year, the theme of LibrePlanet is "The Roots of Freedom." This encompasses the historical "roots" of the free software movement -- the Four Freedoms, the GNU General Public License and copyleft, and a focus on strong security and privacy protections -- and the concept of roots as a strong foundation from which the movement grows." The call for proposals closes November 14.

CFP Deadlines: September 29, 2016 to November 28, 2016

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event dates | Event | Location
September 30 | November 12-13 | T-Dose | Eindhoven, Netherlands
September 30 | December 3 | NoSlidesConf | Bologna, Italy
September 30 | November 5-6 | OpenFest 2016 | Sofia, Bulgaria
September 30 | November 29-30 | 5th RISC-V Workshop | Mountain View, CA, USA
September 30 | December 27-30 | Chaos Communication Congress | Hamburg, Germany
October 1 | October 22 | 2016 Columbus Code Camp | Columbus, OH, USA
October 19 | November 19 | eloop 2016 | Stuttgart, Germany
October 25 | May 8-11 | O'Reilly Open Source Convention | Austin, TX, USA
October 26 | November 5 | Barcelona Perl Workshop | Barcelona, Spain
October 28 | November 25-27 | Pycon Argentina 2016 | Bahía Blanca, Argentina
October 30 | February 17 | Swiss Python Summit | Rapperswil, Switzerland
October 31 | February 4-5 | FOSDEM 2017 | Brussels, Belgium
November 11 | November 11-12 | Linux Piter | St. Petersburg, Russia
November 11 | January 27-29 | DevConf.cz 2017 | Brno, Czech Republic
November 13 | December 10 | Mini Debian Conference Japan 2016 | Tokyo, Japan
November 15 | March 2-5 | Southern California Linux Expo | Pasadena, CA, USA
November 15 | March 28-31 | PGConf US 2017 | Jersey City, NJ, USA
November 18 | February 18-19 | PyCaribbean | Bayamón, Puerto Rico, USA
November 20 | December 10-11 | SciPy India | Bombay, India
November 21 | January 16 | Linux.Conf.Au 2017 Sysadmin Miniconf | Hobart, Tas, Australia
November 21 | January 16-17 | LCA Kernel Miniconf | Hobart, Australia

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Netdev 1.2 updates

Netdev 1.2 takes place October 5-7 in Tokyo, Japan. The final program is available, along with travel tips and other information.

SFLC 2016 Annual Conference at Columbia Law School

The Software Freedom Law Center has announced the program for its conference, to be held October 28 in New York, NY. "We have assembled what we think will be a very lively and interesting program, which you can find summarized below. The ideas are free as in freedom, and---as always---attendance, NYS continuing legal education credits, lunch, and the various drinks are free as in beer."

Events: September 29, 2016 to November 28, 2016

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
September 27-29 | OpenDaylight Summit | Seattle, WA, USA
September 28-30 | Kernel Recipes 2016 | Paris, France
September 28-October 1 | systemd.conf 2016 | Berlin, Germany
September 30-October 2 | Hackers Congress Paralelní Polis | Prague, Czech Republic
October 1-2 | openSUSE.Asia Summit | Yogyakarta, Indonesia
October 3-5 | OpenMP Conference | Nara, Japan
October 4-6 | LinuxCon Europe | Berlin, Germany
October 4-6 | ContainerCon Europe | Berlin, Germany
October 5-7 | International Workshop on OpenMP | Nara, Japan
October 5-7 | Netdev 1.2 | Tokyo, Japan
October 6-7 | PyConZA 2016 | Cape Town, South Africa
October 7-8 | Ohio LinuxFest 2016 | Columbus, OH, USA
October 8-9 | Gentoo Miniconf 2016 | Prague, Czech Republic
October 8-9 | LinuxDays 2016 | Prague, Czechia
October 10-11 | GStreamer Conference | Berlin, Germany
October 11 | Real-Time Summit 2016 | Berlin, Germany
October 11-13 | Embedded Linux Conference Europe | Berlin, Germany
October 12 | Tracing Summit | Berlin, Germany
October 13 | OpenWrt Summit | Berlin, Germany
October 13-14 | Lua Workshop 2016 | San Francisco, CA, USA
October 17-19 | O'Reilly Open Source Convention | London, UK
October 18-20 | Qt World Summit 2016 | San Francisco, CA, USA
October 21-23 | Software Freedom Kosovo 2016 | Prishtina, Kosovo
October 22 | 2016 Columbus Code Camp | Columbus, OH, USA
October 22-23 | Datenspuren 2016 | Dresden, Germany
October 25-28 | OpenStack Summit | Barcelona, Spain
October 26-27 | All Things Open | Raleigh, NC, USA
October 27-28 | Rust Belt Rust | Pittsburgh, PA, USA
October 28-30 | PyCon CZ 2016 | Brno, Czech Republic
October 29-30 | PyCon.de 2016 | Munich, Germany
October 29-30 | PyCon HK 2016 | Hong Kong, Hong Kong
October 31 | PyCon Finland 2016 | Helsinki, Finland
October 31-November 1 | Linux Kernel Summit | Santa Fe, NM, USA
October 31-November 2 | O’Reilly Security Conference | New York, NY, USA
November 1-4 | PostgreSQL Conference Europe 2016 | Tallinn, Estonia
November 1-4 | Linux Plumbers Conference | Santa Fe, NM, USA
November 3 | Bristech Conference 2016 | Bristol, UK
November 4-6 | FUDCon Phnom Penh | Phnom Penh, Cambodia
November 5 | Barcelona Perl Workshop | Barcelona, Spain
November 5-6 | OpenFest 2016 | Sofia, Bulgaria
November 7-9 | Velocity Amsterdam | Amsterdam, Netherlands
November 9-11 | O’Reilly Security Conference EU | Amsterdam, Netherlands
November 11-12 | Seattle GNU/Linux Conference | Seattle, WA, USA
November 11-12 | Linux Piter | St. Petersburg, Russia
November 12-13 | T-Dose | Eindhoven, Netherlands
November 12-13 | Mini-DebConf | Cambridge, UK
November 12-13 | PyCon Canada 2016 | Toronto, Canada
November 13-18 | The International Conference for High Performance Computing, Networking, Storage and Analysis | Salt Lake City, UT, USA
November 14-16 | PGConfSV 2016 | San Francisco, CA, USA
November 14 | The Third Workshop on the LLVM Compiler Infrastructure in HPC | Salt Lake City, UT, USA
November 14-18 | Tcl/Tk Conference | Houston, TX, USA
November 16-18 | ApacheCon Europe | Seville, Spain
November 16-17 | Paris Open Source Summit | Paris, France
November 17 | NLUUG (Fall conference) | Bunnik, The Netherlands
November 18-20 | GNU Health Conference 2016 | Las Palmas, Spain
November 18-20 | UbuCon Europe 2016 | Essen, Germany
November 19 | eloop 2016 | Stuttgart, Germany
November 21-22 | Velocity Beijing | Beijing, China
November 24 | OWASP Gothenburg Day | Gothenburg, Sweden
November 25-27 | Pycon Argentina 2016 | Bahía Blanca, Argentina

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds