
LWN.net Weekly Edition for August 25, 2016

25 Years of Linux — so far

By Jonathan Corbet
August 24, 2016
On August 25, 1991, an obscure student in Finland named Linus Benedict Torvalds posted a message to the comp.os.minix Usenet newsgroup saying that he was working on a free operating system as a project to learn about the x86 architecture. He cannot possibly have known that he was launching a project that would change the computing industry in fundamental ways. Twenty-five years later, it is fair to say that none of us foresaw where Linux would go — a lesson that should be taken to heart when trying to imagine where it might go from here.

At the time of the announcement, Linux was vaporware; the first source release wouldn't come for another month. It wasn't even named "Linux"; we can all be happy that the original name ("Freax") didn't stick. When the code did come out, it was a mere 10,000 lines long; the kernel community now adds that much code over the course of about three days. There was no network stack, only Finnish keyboards were supported, many basic system calls were absent, and Linus didn't think it would ever be possible to port the kernel to a non-x86 architecture. It was, in other words, a toy system, not something that seemed poised to take over the world.

Some context

The computing industry in 1991 looked a little different than it does now. A whole set of Unix-based vendors had succeeded in displacing much of the minicomputer market but, in the process, they had turned Unix into numerous incompatible proprietary systems, each of which had its own problems and none of which could be fixed by its users. Unix, in moving down from minicomputers, had become much more widespread, but it also lost the code-sharing culture that had helped to make it Unix in the first place. The consequences of the Unix wars were already being felt, and we were being told by the trade press that the upcoming Windows NT release would be the end of Unix altogether. Unix vendors were developing NT-based systems, and the industry was being prepared for a Microsoft-only future.

Meanwhile, the GNU project had been underway for the better part of a decade. Impressive progress had been made on GCC and a whole set of low-level command-line utilities, but Richard Stallman's vision of an entirely free operating system remained unrealized and, in many minds, unattainable. We could put the GNU utilities on our proprietary Unix workstations and use them to build other free components — notably the X Window System — but we couldn't get away from that proprietary base. 32-Bit x86-based computers were becoming available at reasonable prices, but the Unix systems available on them were just as proprietary as the rest; there appeared to be little hope of a freely available BSD system at that time.

Linux jumped into this void with a kernel that was designed for 32-bit processors, a free license, and the ability to make use of the user-level free software that was already out there. Most importantly, Linux had a maintainer who was happy to take significant changes from others, and the Internet had become widespread enough to enable the creation of a large (for the time) development community. Suddenly, we had our free system that anybody could improve, and many people did. Before long, the gaps in Linux started to be filled.

Over the following years amazing things happened. Proprietary Unix did indeed die off as expected, but Microsoft's takeover of the rest of the computing industry did not quite go as planned. An industry that was doing its best to go completely closed was forced (after years of mocking and derision) to adopt a more open development model. Those of us who worked on Linux — the many thousands who worked at all levels, not just on the kernel — have changed the world in a huge and mostly positive way.

Forward to the present

A quarter of a century later, many things look very much the same. Linus is still running the project and many of the developers who contributed in the early days are still actively involved. We still have a free kernel that can serve as the base for a completely free operating system. Richard Stallman is still pushing for all software to be free. Much code is still developed by posting patches to mailing lists, much to the dismay of the younger GitHub generation. But a lot has changed over those years as well.

Linux in the early days was a decidedly noncommercial undertaking; few people made any sort of a reasonable living from it until the mid-to-late 1990s. It was a hobby, a way to have a reasonable operating system on commodity hardware, and a way to retain control over our computing environment. Some saw Linux as a weapon to use in the fight against "evil" companies like Microsoft but, for most of the community, it is probably fair to say that those companies weren't the enemy; they were simply irrelevant. They were not offering a system that we wanted, so we were building our own instead.

The entry of corporations into Linux development was viewed with a fair amount of concern and trepidation in the early days. The early hiring of Alan Cox by Red Hat had users worried (needlessly) about his ability to continue contributing to the kernel in the ways he thought best. Linus actively avoided working for Linux-oriented companies. As the corporate world started to take note of our noncommercial system, there were a lot of fears that it would be co-opted and its spirit would be lost.

But, without companies, Linux would not be what it is now. We depended on them early on to create and support distributions for us. The community was singularly unsuccessful at creating a proper web browser for Linux until the collapse of Netscape jump-started the development of the tool we now call Firefox. Corporate support for scalability work (making the kernel perform on "large" four-processor systems, for example) was key to having a kernel that performs well on today's consumer-level devices. A community that did not attract (and welcome) corporate participation would not have created the system that we are running now.

We have managed to avoid many of the worst-case outcomes from heavy corporate participation so far. Rent-seeking efforts like the SCO lawsuits have been defeated. We have not gotten off for free on the patent front, but neither have we suffered the outright disaster that many feared. Companies have managed to drive some projects into the ground, but the freedom to fork a mismanaged project has often come to the rescue. In general, a lot of the outcomes that people feared have not come to be.

Sometimes, though, it can be hard to avoid feeling that the companies have taken over and that, perhaps, some of the spirit has indeed been lost. The bulk of free-software development is now done on somebody's payroll; some software is well supported indeed, but other projects that have been unable to find a corporate benefactor languish. As we have seen with projects like OpenSSL or GnuPG, it's not just the obscure projects that fall by the wayside; important infrastructure can also go without support. Changing Linux may not require corporate permission but, often, it seems to require corporate interest and funding.

Linux has done well indeed from the involvement of companies; they have taken us far beyond the apparent limits on what purely voluntary developers can do. Still, it is hard, sometimes, to avoid feeling that the free-software development model, meant to change the world and assure our freedom, has mostly become a tool for companies to cast off some of their development and support costs and undercut their competitors' revenue streams. That is almost certainly not something we could have avoided, but, without care, it could take a lot of the spark out of the free-software community.

The next 25 years

Back in 1991, it would have been difficult indeed to look forward and envision the world we live in today. Any attempts to describe the world of 2041 will be equally vain. All we can do is think about where we would like to be and try to get there.

Corporate participation in free-software development isn't going away, or, at least, so we must hope. But we have to try not to sell out to it entirely. A crucial piece of that is not allowing any single company to control any important project on its own. Developers who work on independent projects tend to think of themselves as affiliated with the project first, and their employer second; that results in a strong incentive to avoid compromising the project's goals in favor of what today's employer wants. Single-company projects are never really under the community's control; independent projects can be.

We need to think about what we want from our licensing. Copyleft has been an important part of how our base of free software was developed, but there are many who are saying that copyleft is dying, and they may be right. Even in projects that are covered by copyleft licenses, companies (which tend to own the bulk of the copyrights now) have been markedly resistant to enforcing those licenses. If the GPL is routinely ignored, it might as well be a permissive license. The experience of the last few decades shows that a lot of great free software can be developed under permissive licenses, and perhaps that is the future. But we should not wander blindly into that future without an open-eyed discussion.

Linux owns much of the computing world at this point, but the continued dominance of Linux should not be taken for granted. A nearly useless Linux kernel grew to the point that it pushed aside established competition; similarly, the toy system we laugh at today might just supersede Linux in the coming years. If that system is free software and truly better, then perhaps its success will be for the best, but there is no guarantee of either. If we want a future full of free software, we will have to earn it, just as we have earned what we have now.

And, most of all, we need to keep in mind why we embarked on this project in the first place, and why we're still doing it 25 years later. If developing Linux is just another job, it will certainly provide employment for a while but it will end up being no more than just another job. If, instead, it is a project to build superior technology that is free in every sense and fully under our control, then it can remain a project that can change the world for the better. We have built something great; with work, focus, and passion we can build something greater yet. It may well be that the first quarter of a century was just the beginning.

Comments (29 posted)

Designing mass-transit support for GNOME Maps

By Nathan Willis
August 24, 2016

GUADEC
At GUADEC 2016 in Karlsruhe, Germany, Andreas Nilsson explained the methodology he employed to implement a new feature for the GNOME Maps application: support for routing trips through public transportation networks. The use of mass transit, as it turns out, differs significantly from how users plan travel routes for walking, cycling, or driving.

The problem space

The transit-routing project began with bug number 764107, Nilsson said. GNOME Maps supports route planning for travel by foot, by bicycle, and by car, but that left out a large number of possible users. Nilsson was intrigued by the idea of working on the problem, he said, because he grew up in a small village in Sweden that, essentially, had no public transportation. There was a bus out of town that left three times a day, but that was it. Now that he lives in a major metropolitan area (Gothenburg), there are mass-transit lines everywhere.

So he drew up some initial implementation ideas, presuming that he could employ his standard process: consider the use case, mock up some designs, then code it. Then, however, he had a conversation about mass transit with his girlfriend (who is from Rio de Janeiro), and quickly discovered that the two of them had wildly different expectations about how a mass-transit planner should operate.

[Andreas Nilsson]

He then began to look for research on how mass transit is used, only to discover that there was nothing useful available at the level he needed—namely, anything revealing how people plan their trips. After a few more conversations, he decided that the only way to move forward was to conduct his own end-user research, and interview a variety of people about route-planning and mass-transit usage.

User research and testing

There is still no standardized approach for conducting such user research within free-software projects, so Nilsson developed his own. Starting with family members, friends, and co-workers, he conducted a range of interviews over the course of several weeks.

In addition to the basics of planning trips, each interview included questions about other transportation systems (e.g., whether or not the person owns a car and, thus, has a mix of transport options available), what existing services and mobile apps the person uses, and whether the person prefers certain transit methods over others.

As it turned out, the answers not only covered the expected ground, but they revealed additional information Nilsson had not considered. For example, he had planned to have a "prefer this transit method" option, but one interviewee indicated that she planned her trips with (in a sense) a negative preference: she tries to avoid train lines whenever possible, because they give her motion sickness.

Nilsson took the interview results, developed "user personas" (profiles of hypothetical users), and proceeded to develop the UI mock-ups as originally intended. An audience member asked why the user-persona step was necessary, since many designers do not use it. Nilsson replied that he finds it helpful for keeping his own opinions from unduly influencing whether or not a feature makes it into the eventual code: "It's harder to say 'I don't like this' and 'we don't need this feature'."

The transit-routing feature has since been implemented in GNOME Maps based on Nilsson's work, mainly by Marcus Lundblad. Significant testing has followed, particularly where the wording and layout of directions is concerned. The feature should be available in the next stable GNOME Maps release.

Lessons learned

Behind the scenes, the transit-routing feature uses OpenTripPlanner to compute routes. That is a free-software web service that uses publicly available transit data published in the General Transit Feed Specification (GTFS) format designed by Google, on top of an OpenStreetMap base layer.

Any transit system that releases GTFS data is supported, and the information in the database is exactly as detailed (and as fresh) as the available GTFS data. Another audience member asked whether or not the system distinguished between various networks of transit (such as trains and trams in the same city, or Tokyo's multiple independent subway services). Nilsson replied that such information should be distinguished within GTFS, so GNOME Maps will use it automatically.

A lengthy question-and-answer period took up the remainder of the session, much of it focused on how GNOME can better employ user research when developing applications. Nilsson told one audience member that crafting the question set was not easy; he started by looking at other transit-planning implementations and asking "why this?" about many of the design choices.

Allan Day asked what he had learned about conducting user interviews. Nilsson replied that it is important to not talk too much, for several reasons. First, talking too much can inadvertently steer the interviewee's responses. Second, whenever there is an "awkward pause" most interviewees will naturally start talking more themselves, and the more they talk, the more they reveal about what they are thinking. Day added that he hopes GNOME can build up a guidebook for developers to use when conducting user research and interviewing; Nilsson added that he thinks the project will get better at the process as it keeps conducting research.

There were also a few questions about privacy and other GNOME Maps features. One audience member expressed concern about the Google Maps feature of marking locations as "home" or "work"; Nilsson replied that he has not implemented any such feature in GNOME Maps. Someone else asked whether the route planner showed pricing information, since that can be important when planning a trip. Nilsson responded that the idea came up in the interview process, but it has not yet been incorporated into the application. It could be tricky to implement in a reliable manner, given the volatility of prices.

[The author would like to thank the GNOME Foundation for travel assistance to attend GUADEC 2016.]

Comments (15 posted)

Page editor: Jonathan Corbet

Security

A different sort of "Fake Linus Torvalds"

By Jake Edge
August 24, 2016

A Linux Foundation publicity scheme once (in)famously created a "Fake Linus Torvalds" on Twitter, but a different sort of fake has more recently appeared. A message posted to the linux-kernel mailing list on August 15 announced the existence of a PGP key with the same short key ID as that of Torvalds's real key—something that could potentially lead programs and users to confuse the two keys. The problem with key ID collisions has been known for some time, but the message may have served to raise the profile of the dangers of using short PGP IDs.

PGP keys are typically used for encryption or for digitally signing data of some sort. Those signatures can be used to show that a private key corresponding to a particular public key was used, which strongly implies that the owner of the public key was the one who did it—as long as the private key remains private, of course. Signatures are often used on software distributions of various sorts, including packages, kernels, and, sometimes, commits in Git repositories.

These days, the PGP program itself is not widely used, though the standards (such as OpenPGP) it spawned have been picked up and carried forward by projects like GNU Privacy Guard (GnuPG or GPG). A PGP user's public key is an unwieldy blob of text, however, and without some kind of "extra" knowledge, there is no way to know that a given key is really owned by the user it purports to come from. There are two somewhat-related mechanisms to address those problems: keyservers and the web of trust.

Keyservers provide a way for users to get someone else's public key, while the web of trust gives users some level of assurance that a key actually belongs to whom it purports to. In order to have a key that is more trusted, users will try to get their key signed by other users' keys. When a new key is examined (normally by GPG or some other program), those signatures can be checked to see if the keys used in the signing are already trusted or if they belong to the "strong set" (a group of well-connected keys within the web of trust). Based on that examination, users can choose the trust level to place in the key.

Unfortunately, all of that is somewhat complex and hard to understand for those who are not particularly technically savvy, so tools have been built to simplify some of it. But one of those simplifications can lead to problems where users (both savvy and not) may be tricked into using (and trusting) keys that are not owned by the person or organization they think.

Keys have a "fingerprint" that uniquely identifies them, but fingerprints are relatively long hexadecimal strings (20 bytes, so 40 characters), which makes them unwieldy as well—at least for day-to-day use. For that reason, shorter substrings called key IDs (usually either four or eight bytes' worth of hexadecimal) are often used to "identify" keys. Ten or fifteen years ago, even four-byte IDs were relatively safe but, these days, it is rather easy to generate a key with a key ID that collides with an existing key. That's exactly what was done with Torvalds's key, as noted in the mailing-list post (a key for "Fake Greg Kroah-Hartman" was similarly outed in the message).
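To make the arithmetic concrete, here is a small example (not from the announcement itself) of how a short key ID relates to a version-4 key's fingerprint: per RFC 4880, the key ID is just the low-order bits of the 160-bit SHA-1 fingerprint, so a collision search only has to cover 2^32 candidates. The fingerprint value below is a made-up placeholder, not any real key.

    #include <stdint.h>
    #include <stdio.h>

    /* An OpenPGP v4 fingerprint is a 20-byte SHA-1 hash; the "short"
     * key ID is simply its final four bytes. */
    static uint32_t short_key_id(const uint8_t fp[20])
    {
        return ((uint32_t)fp[16] << 24) | ((uint32_t)fp[17] << 16) |
               ((uint32_t)fp[18] << 8)  |  (uint32_t)fp[19];
    }

    int main(void)
    {
        const uint8_t fp[20] = {    /* placeholder value only */
            0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
            0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
            0xde, 0xad, 0xbe, 0xef,
        };

        printf("short key ID: %08X\n", short_key_id(fp));
        return 0;
    }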

In fact, a project called "Evil 32" has created a collision for every 32-bit key ID in the strong set. It used its scallion program to create those collisions, each in roughly four seconds using a GPU. Key collisions might not be that big of a problem, except that GPG and other tools don't treat them as an error, so users can end up with the wrong key. GPG certainly warns that untrusted keys are being used, but that is a relatively common warning in "normal" GPG use, so it often goes unnoticed.

Evil 32 has an example of how the problem might manifest itself. It uses a package from Puppet Labs for the demonstration (though, now, the instructions at Puppet Labs use its full key fingerprint to avoid the problem). When asking the keyserver for the key ID provided (using the --recv-keys option), GPG would actually accept multiple keys with that key ID and add them to the keyring. Because the signature file contained only the key ID at that time, either of the keys could be used to verify the contents of the package. Thus, a version with a backdoor, say, and signed with the attacker's colliding key could be downloaded and would pass a verification step.

At some level though, the root problem is that the web of trust isn't really being used the way it was envisioned (or, as some would say, the way it should be used). If users were only trusting keys with signatures of other trusted entities or that had other indications of trustworthiness and GPG were configured to reject untrusted keys, the problem would largely not exist. But, for the most part, the "there is no indication that the signature belongs to the owner" warning message is expected by users—if it is even seen.

Given that GPG is used by other encryption tools, some of which also try to simplify the process for novice users, the fact that multiple keys match a particular key ID may be completely hidden by the interface. That's good for reducing complexity, perhaps, but not so good for security and package integrity. GPG has a well-earned reputation for being difficult to use correctly, though it must be said that no alternative seems to be overtaking it.

Kroah-Hartman reacted to the revelation of his fake key with some cogent observations about the situation:

Yes, there is now a "fake" short fingerprint for my kernel signing key out there on the key servers, and yes, it's not really mine, and yes, we know who did it, and yes, it's revoked, and no, it wasn't just targeted at kernel developers, but at all 24000 keys in the "strong" ring of PGP trust, and yes something like this has been possible for a very long time now so it's not really that much news, and yes, gpg really is horrible to use and almost impossible to use correctly.

As he noted, this problem has been known for some time. There is a blog post from 2011 that clearly indicates it was a known problem at that point. A recent post that LWN linked to in June noted that colliding key IDs had been found in the wild. The longtime existence of "vanity" key IDs (those that spell out some word or are based on an interesting number) clearly shows the problem—if people can choose their key IDs, nothing stops them from choosing someone else's. In the end, this most recent episode just provided yet another reason for users of PGP keys to pay attention and either use full key fingerprints or the web of trust—perhaps both, though that may be overkill.

Comments (13 posted)

Brief items

Security against Election Hacking (Freedom to Tinker)

Over at the Freedom to Tinker blog, Andrew Appel has a two-part series on security attacks and defenses for the upcoming elections in the US (though some of it will obviously be applicable elsewhere too). Part 1 looks at the voting and counting process with an eye toward ways to verify what the computers involved are reporting, but doing so without using the computers themselves (having and verifying the audit trail, essentially). Part 2 looks at the so-called cyberdefense teams and how their efforts are actually harming all of our security (voting and otherwise) by hoarding bugs rather than reporting them to get them fixed.

With optical-scan voting, the voter fills in the bubbles next to the names of her selected candidates on paper ballot; then she feeds the op-scan ballot into the optical-scan computer. The computer counts the vote, and the paper ballot is kept in a sealed ballot box. The computer could be hacked, in which case (when the polls close) the voting-machine lies about how many votes were cast for each candidate. But we can recount the physical pieces of paper marked by the voter’s own hands; that recount doesn’t rely on any computer. Instead of doing a full recount of every precinct in the state, we can spot-check just a few ballot boxes to make sure they 100% agree with the op-scan computers’ totals.

Problem: What if it’s not an optical-scan computer, what if it’s a paperless touchscreen (“DRE,” Direct-Recording Electronic) voting computer? Then whatever numbers the voting computer says, at the close of the polls, are completely under the control of the computer program in there. If the computer is hacked, then the hacker gets to decide what numbers are reported. There are no paper ballots to audit or recount. All DRE (paperless touchscreen) voting computers are susceptible to this kind of hacking. This is our biggest problem.

Comments (2 posted)

New vulnerabilities

cracklib2: code execution

Package(s): cracklib2    CVE #(s): CVE-2016-6318
Created: August 22, 2016    Updated: December 12, 2016
Description: From the Debian-LTS advisory:

It was discovered that there was a stack-based buffer overflow when parsing large GECOS fields in cracklib2, a pro-active password checker library.

Alerts:
Mageia MGASA-2016-0302 cracklib 2016-09-16
openSUSE openSUSE-SU-2016:2204-1 cracklib 2016-08-31
Debian-LTS DLA-599-1 cracklib2 2016-08-20
Fedora FEDORA-2016-b601141219 cracklib 2016-12-11
Fedora FEDORA-2016-bfa785e39e cracklib 2016-12-11
Gentoo 201612-25 cracklib 2016-12-08

Comments (none posted)

eog: out-of-bounds write

Package(s): eog    CVE #(s): CVE-2016-6855
Created: August 24, 2016    Updated: September 6, 2016
Description:

From the bug report:

An out-of-bounds write vulnerability in eog was found when processing specially crafted SVG file. Due to passing the error message containing invalid UTF-8 character to GMarkup, out-of-bounds access is triggered.

Alerts:
openSUSE openSUSE-SU-2016:2242-1 eog 2016-09-05
Mageia MGASA-2016-0297 eog 2016-08-31
Debian-LTS DLA-605-1 eog 2016-08-29
Ubuntu USN-3069-1 eog 2016-08-25
Fedora FEDORA-2016-0f8779baa6 eog 2016-08-24
Fedora FEDORA-2016-5abbc35b6a eog 2016-08-24

Comments (none posted)

firewalld: authentication bypass

Package(s): firewalld    CVE #(s): CVE-2016-5410
Created: August 22, 2016    Updated: January 30, 2017
Description: From the Red Hat bugzilla entry:

FirewallD provides dbus api for modification of configuration after user has been authenticated via polkit. This does not apply for 5 methods which can be called by any logged user using dbus api or firewall-cmd cli interface. Any predefined policy can be used, server or desktop. list of concerned dbus methods in firewalld.py: addPassthrough, removePassthrough, addEntry, removeEntry, and setEntries. Any locally logged in user, could use the above firewalld commands to tamper or change the firewall settings.

Alerts:
Oracle ELSA-2016-2597 firewalld 2016-11-10
Red Hat RHSA-2016:2597-02 firewalld 2016-11-03
Fedora FEDORA-2016-de55d2c2c9 firewalld 2016-08-19
Gentoo 201701-70 firewalld 2017-01-29
Scientific Linux SLSA-2016:2597-2 firewalld 2016-12-14

Comments (none posted)

glibc: denial of service

Package(s): glibc    CVE #(s): CVE-2016-6323
Created: August 22, 2016    Updated: October 20, 2016
Description: From the glibc bugzilla entry:

Since [__startcontext] transfers to a different stack it should be marked .cantunwind, so that the EABI unwinder does not try to unwind past it. This can cause _Unwind_Backtrace (used by backtrace_full in libbacktrace) to infloop.

also from Florian Weimer on oss-security:

Andreas Schwab of SuSE reported and fixed a glibc bug where the makecontext function would create an execution context which is incompatible with the unwinder, causing it to hang when the generation of a backtrace is attempted.

Alerts:
Fedora FEDORA-2016-b4c1b24a74 glibc-arm-linux-gnu 2016-10-19
Fedora FEDORA-2016-7e57edc4cc glibc-arm-linux-gnu 2016-10-19
openSUSE openSUSE-SU-2016:2443-1 glibc 2016-10-04
Fedora FEDORA-2016-87dde780b8 glibc 2016-09-02
Fedora FEDORA-2016-5f050a0a6d glibc 2016-08-19

Comments (none posted)

gnupg: flawed random number generation

Package(s): gnupg    CVE #(s): CVE-2016-6313
Created: August 18, 2016    Updated: December 2, 2016
Description: Felix Doerre and Vladimir Klebanov from the Karlsruhe Institute of Technology discovered a flaw in the mixing functions of GnuPG's random number generator. An attacker who obtains 4640 bits from the RNG can trivially predict the next 160 bits of output. A first analysis on the impact of this bug for GnuPG shows that existing RSA keys are not weakened. For DSA and Elgamal keys it is also unlikely that the private key can be predicted from other public information.
Alerts:
CentOS CESA-2016:2674 libgcrypt 2016-11-12
Oracle ELSA-2016-2674 libgcrypt 2016-11-10
Scientific Linux SLSA-2016:2674-1 libgcrypt 2016-11-08
Oracle ELSA-2016-2674 libgcrypt 2016-11-07
Red Hat RHSA-2016:2674-01 libgcrypt 2016-11-08
Gentoo 201610-04 libgcrypt 2016-10-10
openSUSE openSUSE-SU-2016:2423-1 libgcrypt 2016-09-30
Arch Linux ASA-201609-14 lib32-libgcrypt 2016-09-17
Fedora FEDORA-2016-3a0195918f gnupg 2016-09-14
Fedora FEDORA-2016-2b4ecfa79f libgcrypt 2016-09-07
openSUSE openSUSE-SU-2016:2208-1 libgcrypt 2016-08-31
Mageia MGASA-2016-0292 gnupg/libgcrypt 2016-08-31
Debian-LTS DLA-602-1 gnupg 2016-08-29
Fedora FEDORA-2016-9864953aa3 gnupg 2016-08-26
Slackware SSA:2016-236-02 libgcrypt 2016-08-23
Slackware SSA:2016-236-01 gnupg 2016-08-23
Debian-LTS DLA-600-1 libgcrypt11 2016-08-23
Arch Linux ASA-201608-18 libgcrypt 2016-08-22
Fedora FEDORA-2016-81aab0aff9 libgcrypt 2016-08-20
Ubuntu USN-3065-1 libgcrypt11, libgcrypt20 2016-08-18
Ubuntu USN-3064-1 gnupg 2016-08-18
Debian DSA-3650-1 libgcrypt20 2016-08-17
Debian DSA-3649-1 gnupg 2016-08-17
Gentoo 201612-01 gnupg 2016-12-02

Comments (none posted)

kernel: use-after-free

Package(s): kernel    CVE #(s): CVE-2016-6828
Created: August 23, 2016    Updated: August 24, 2016
Description:

From the Red Hat bug report:

A use after free vulnerability was found in tcp_xmit_retransmit_queue and other tcp_* functions.

Alerts:
Mageia MGASA-2016-0364 kernel-tmb 2016-11-04
openSUSE openSUSE-SU-2016:2625-1 kernel 2016-10-25
Mageia MGASA-2016-0347 kernel 2016-10-20
Ubuntu USN-3097-2 linux-ti-omap4 2016-10-13
Ubuntu USN-3099-4 linux-snapdragon 2016-10-11
Ubuntu USN-3099-3 linux-raspi2 2016-10-11
Ubuntu USN-3099-2 linux-lts-xenial 2016-10-11
Ubuntu USN-3098-2 linux-lts-trusty 2016-10-10
Ubuntu USN-3097-1 kernel 2016-10-10
Ubuntu USN-3098-1 kernel 2016-10-10
Ubuntu USN-3099-1 kernel 2016-10-11
openSUSE openSUSE-SU-2016:2290-1 kernel 2016-09-12
SUSE SUSE-SU-2017:0494-1 the Linux Kernel 2017-02-17
SUSE SUSE-SU-2017:0471-1 kernel 2017-02-15
Fedora FEDORA-2016-f1adaaadc6 kernel 2016-09-02
Fedora FEDORA-2016-2e5ebfed6d kernel 2016-09-02
Debian-LTS DLA-609-1 kernel 2016-09-03
Debian DSA-3659-1 kernel 2016-09-04
Fedora FEDORA-2016-723350dd75 kernel 2016-08-23
Fedora FEDORA-2016-5e24d8c350 kernel 2016-08-23
SUSE SUSE-SU-2017:0333-1 kernel 2017-01-30
CentOS CESA-2017:0086 kernel 2017-01-19
Scientific Linux SLSA-2017:0086-1 kernel 2017-01-17
Oracle ELSA-2017-0086 kernel 2017-01-17
Red Hat RHSA-2017:0113-01 kernel-rt 2017-01-17
Red Hat RHSA-2017:0091-01 kernel-rt 2017-01-17
Red Hat RHSA-2017:0086-01 kernel 2017-01-17
Scientific Linux SLSA-2017:0036-1 kernel 2017-01-12
Oracle ELSA-2017-3508 kernel 4.1.12 2017-01-12
Oracle ELSA-2017-3509 kernel 3.8.13 2017-01-12
Oracle ELSA-2017-3510 kernel 2.6.39 2017-01-12
CentOS CESA-2017:0036 kernel 2017-01-12
Oracle ELSA-2017-0036 kernel 2017-01-10
Red Hat RHSA-2017:0036-01 kernel 2017-01-10
SUSE SUSE-SU-2016:3304-1 kernel 2016-12-30
SUSE SUSE-SU-2016:3069-1 kernel 2016-12-09
openSUSE openSUSE-SU-2016:3021-1 kernel 2016-12-06
SUSE SUSE-SU-2016:2976-1 the Linux Kernel 2016-12-02
SUSE SUSE-SU-2016:2912-1 kernel 2016-11-25

Comments (none posted)

kernel: multiple vulnerabilities

Package(s): kernel    CVE #(s): CVE-2015-3288 CVE-2012-6701
Created: August 24, 2016    Updated: August 24, 2016
Description:

From the openSUSE advisory:

CVE-2015-3288 - A security flaw was found in the Linux kernel that there was a way to arbitrary change zero page memory.

From the CVE entry:

Integer overflow in fs/aio.c in the Linux kernel before 3.4.1 allows local users to cause a denial of service or possibly have unspecified other impact via a large AIO iovec.

Alerts:
Ubuntu USN-3127-2 linux-lts-trusty 2016-11-11
Ubuntu USN-3127-1 kernel 2016-11-11
openSUSE openSUSE-SU-2016:2144-1 kernel 2016-08-24

Comments (none posted)

knot: denial of service

Package(s): knot    CVE #(s): CVE-2016-6171
Created: August 22, 2016    Updated: August 24, 2016
Description: From the Red Hat bugzilla entry:

It was found that knot does not implement reasonable restrictions for zone sizes. This allows an explicitly configured primary DNS server for a zone to crash a secondary DNS server, affecting service of other zones hosted on the same secondary server.

Alerts:
Fedora FEDORA-2016-66c0c2105b knot 2016-08-19
Fedora FEDORA-2016-3479f8e060 knot 2016-08-19

Comments (none posted)

mingw-lcms2: heap memory leak

Package(s): mingw-lcms2    CVE #(s): CVE-2016-10165
Created: August 24, 2016    Updated: January 31, 2017
Description:

From the bug report:

An out-of-bounds read in cmstypes.c in Type_MLU_Read function was found, leading to heap memory leak triggered by crafted ICC profile.

Alerts:
Mageia MGASA-2016-0303 lcms2 2016-09-16
Fedora FEDORA-2016-8e55114267 lcms2 2016-09-04
Fedora FEDORA-2016-1ebd9e116b lcms2 2016-08-27
Fedora FEDORA-2016-24c2453d6c mingw-lcms2 2016-08-24
openSUSE openSUSE-SU-2017:0336-1 lcms2 2017-01-31
Debian DSA-3774-1 lcms2 2017-01-29
Debian-LTS DLA-803-1 lcms2 2017-01-26

Comments (none posted)

pagure: cross-site scripting

Package(s): pagure    CVE #(s): CVE-2016-1000037
Created: August 23, 2016    Updated: August 24, 2016
Description:

From the Red Hat bug report:

It was found that Pagure served uploaded files from its attachment endpoint with content types that instructed the browser to parse HTML files, which could lead to Cross-Site Scripting attacks.

Alerts:
Fedora FEDORA-2016-40d5f1d3c2 pagure 2016-08-23

Comments (none posted)

suckless-tools: screen locking bypass

Package(s): suckless-tools    CVE #(s): CVE-2016-6866
Created: August 22, 2016    Updated: November 21, 2016
Description: From the Debian-LTS advisory:

It was discovered that the slock screen locking tool would segfault when the user's account had been disabled. slock called crypt(3) and used the return value for strcmp(3) without checking to see if the return value of crypt(3) was a NULL pointer. If the hash returned by (getspnam()->sp_pwdp) was invalid, crypt(3) would return NULL and set errno to EINVAL. This would cause slock to segfault which leaves the machine unprotected.

Alerts:
Mageia MGASA-2016-0308 slock 2016-09-21
Fedora FEDORA-2016-7e817cbf55 slock 2016-09-09
Fedora FEDORA-2016-985b68721b slock 2016-09-09
Debian-LTS DLA-598-1 suckless-tools 2016-08-20
Arch Linux ASA-201611-21 slock 2016-11-21

Comments (none posted)

xen: denial of service

Package(s): xen    CVE #(s): CVE-2016-4963
Created: August 18, 2016    Updated: August 24, 2016
Description: From the SUSE advisory:

CVE-2016-4963: The libxl device-handling allowed local guest OS users with access to the driver domain to cause a denial of service (management tool confusion) by manipulating information in the backend directories in xenstore (bsc#979670).

Alerts:
SUSE SUSE-SU-2016:2533-1 xen 2016-10-13
openSUSE openSUSE-SU-2016:2497-1 xen 2016-10-11
openSUSE openSUSE-SU-2016:2494-1 xen 2016-10-11
SUSE SUSE-SU-2016:2100-1 xen 2016-08-18
SUSE SUSE-SU-2016:2093-1 xen 2016-08-17
Mageia MGASA-2017-0012 xen 2017-01-09

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 4.8-rc3, released on August 21. According to Linus: "It all looks pretty sane, I'm not seeing anything hugely scary here."

Stable updates: 4.7.2, 4.4.19, and 3.14.77 were released on August 21.

Comments (none posted)

Quotes of the week

The latest Jason Bourne movie was sufficiently bad that I spent time thinking how the tree_lock could be batched during reclaim.
Mel Gorman shows how memory-management development is done.

I talked a lot with Linus about design at this time, but never really participated in the kernel work (partly because disagreeing with Linus is a high-stress thing).
Lars Wirzenius looks back

I want the code, and I want the company that produced that code to join our community. So far we are doing really well in achieving that goal.
Greg Kroah-Hartman

Comments (13 posted)

Kernel development news

Restartable sequences restarted

By Jonathan Corbet
August 24, 2016
"Restartable sequences" is starting to look a bit like one of those bright ideas that floats around on the kernel list for years, but which never quite seems to make it into the mainline. In this case, the idea was first proposed over one year ago without, yet, having made appreciable progress toward merging; activity on this patch set died down after a while. But development on restartable sequences has picked up again under a new developer who has come up with yet another API for the feature.

As has happened in the kernel, scalability pressures are driving some user-space applications toward the use of lockless algorithms. In kernel space, such algorithms tend to be based on either disabling preemption or retrying an operation after contention is detected. Disabling preemption in user space is not an option, so retrying is the remaining alternative. That is where restartable sequences come in; they combine a kernel-facilitated mechanism for detecting possible contention with a means to quickly force a retry when contention happens.

The current version of restartable sequences, as posted by Mathieu Desnoyers, retains the core idea of its predecessors. A restartable sequence is based around a short segment of code; only the final instruction of that segment is allowed to have side effects visible outside of the current thread. There is also an abort sequence, called to clean up and retry should the thread be preempted while executing the sequence. The specifics have changed, though.

Code using restartable sequences needs to start with an rseq structure:

    struct rseq {
        int32_t cpu_id;
        uint32_t event_counter;
        struct rseq_cs *rseq_cs;
    };

(The actual structure is a bit more complex; various architecture-specific details have been omitted here in the interest of readability.) The cpu_id field always contains the number of the CPU on which the thread is running; event_counter is incremented whenever the thread is preempted — but only if rseq_cs is not null. The purpose of rseq_cs will be discussed below.

This structure must be registered with the kernel before restartable sequences can be used; the operative system call is:

    int rseq(struct rseq *rseq, int flags);

Only one rseq structure can be registered at a time in any given thread, but that structure can be registered multiple times, and the kernel will keep track of how many registrations (and unregistrations) there have been. The flags argument must be zero when registering a new structure. Unregistration is done by passing a null pointer for the rseq structure; setting flags to RSEQ_FORCE_UNREGISTER will cause the immediate removal of the structure, even if it has been registered multiple times.
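As a brand-new system call, rseq() would have no C-library wrapper, so it would be invoked via syscall(). A minimal registration sketch, assuming a hypothetical __NR_rseq syscall number and using the struct rseq defined above, might look like this:

    #include <sys/syscall.h>
    #include <unistd.h>

    /* The kernel updates the structure asynchronously on preemption
     * and migration, hence the volatile qualifier. */
    static volatile struct rseq rseq_state;

    static int sys_rseq(volatile struct rseq *rseq, int flags)
    {
        /* __NR_rseq is whatever number the patch set assigns. */
        return syscall(__NR_rseq, rseq, flags);
    }

    static void thread_register_rseq(void)
    {
        sys_rseq(&rseq_state, 0);   /* flags must be zero here */
    }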

In the past there have been concerns about how the restartable sequences feature would work when there are multiple users within an application (libraries, for example) that do not know about each other. If those users fight over which rseq structure is used, there will be problems with this interface as well; if, instead, they can all agree on the same structure, all will be well. Restartable sequences must be simple, so it makes no sense for code running within one to call another function at all, much less one that would start its own sequence. So there can only be a single sequence running at any given time.

To ensure that all users share a single rseq structure, the documentation recommends that each user declare it as a weak symbol and name it __rseq_abi. The linker will then ensure that, if there are multiple declarations within a given program, they will all refer to the same structure.
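In code, that recommendation amounts to a single declaration; a minimal sketch:

    /* Every application and library that uses restartable sequences
     * declares the same weak symbol; the linker collapses them all
     * into one shared instance. */
    __attribute__((weak)) struct rseq __rseq_abi;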

The other half of the puzzle is the rseq_cs structure pointed to from within the rseq structure above. This structure looks like (again, with some simplification applied):

    struct rseq_cs {
        void *start_ip;
        void *post_commit_ip;
        void *abort_ip;
    };

This structure describes an actual critical section that runs in the restartable mode. Here, start_ip is the address of the first instruction in the section, and post_commit_ip is the first instruction beyond the end of the section; any code running between those two instructions is running within the critical section. The abort_ip pointer is the address of the cleanup code to be executed should the thread be preempted while executing within the section.

With those pieces, a restartable sequence is run using something like this sequence of steps (assuming that the rseq structure is already registered):

  1. The event_counter field from the rseq structure is read and saved.
  2. The rseq_cs pointer in the rseq structure is set to point to the rseq_cs structure describing the critical section to be executed.
  3. The event_counter is read again and compared to the value read previously; if the values do not match, the rseq_cs field should be cleared and the process must be restarted from the beginning.
  4. The critical section can now be executed. In most cases, only the final instruction in the critical section should have visible side effects.
  5. The rseq_cs field should be set to NULL.

If execution makes it past the end of the section, then all is well. If, instead, the thread is preempted while running within the critical section, the kernel will cause it to jump to the abort_ip address. The code found there should clean up and prepare to retry.
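As an illustration of that control flow (and only that), a per-CPU counter increment might be structured as follows. This is a sketch, not working code: a real user must write the critical section in assembly so that the three pointers in rseq_cs can refer to actual instruction addresses, as described next; they are left as placeholders here.

    #include <stdint.h>

    extern struct rseq __rseq_abi;  /* registered at thread start */

    static void percpu_inc(intptr_t *counters)
    {
        struct rseq_cs cs = {
            .start_ip       = 0,    /* assembly label at step 4 */
            .post_commit_ip = 0,    /* label just past the final store */
            .abort_ip       = 0,    /* label of the cleanup/retry path */
        };
        uint32_t events;

    retry:
        events = __rseq_abi.event_counter;          /* step 1 */
        __rseq_abi.rseq_cs = &cs;                   /* step 2 */
        if (__rseq_abi.event_counter != events) {   /* step 3 */
            __rseq_abi.rseq_cs = NULL;
            goto retry;
        }
        counters[__rseq_abi.cpu_id]++;  /* step 4: the committing store */
        __rseq_abi.rseq_cs = NULL;      /* step 5 */
    }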

In principle, that is all there is to it. In practice, applications using this feature must still include some assembly code to set up the various instruction pointers; there is some complexity involved in making it all work properly. Those interested in examples can have a look at the self-tests included with the patch and, in particular, the rather frightening assembly-in-CPP code found here and here.

There have not been many comments on the implementation this time around; it seems that, perhaps, things are finally getting to a point where the developers who are paying attention are reasonably happy. The next obstacle, though, may be Linus, who wants more evidence that this is a feature that will actually be used. Convincing him is likely to require demonstrating some real-world code that benefits from the feature and benchmarks to prove that it is all worthwhile. Since restartable sequences are said to have been in use in places like Google for some time, that proof should be possible to come by. If the developers involved follow through, perhaps this sequence of patches will not need to be restarted too many more times.

Comments (5 posted)

Btrfs and high-speed devices

By Jake Edge
August 24, 2016

LinuxCon North America

At LinuxCon North America in Toronto, Chris Mason relayed some of the experiences that his employer, Facebook, has had using Btrfs, especially with regard to its performance on high-speed solid-state storage devices (SSDs). While Mason was the primary developer early on in the history of Btrfs, he is one of a few maintainers of the filesystem now, and the project has seen contributions from around 70 developers throughout the Linux community in the last year.

[Chris Mason]

He is on the kernel team at Facebook; one of the main reasons the company wanted to hire him was because it wanted to use Btrfs in production. Being able to use Btrfs in that kind of environment is also the primary reason he chose to take the job, he said. As the company is rolling Btrfs out, it is figuring out which features it wants to use and finding things that work well and not so well.

Mason went through the usual list of high-level Btrfs features, including efficient writable snapshots, internal RAID with restriping, online device management, online scrubbing to check in the background if the CRCs are the same as when the data was written, and so on. The CRCs for both data and metadata are a feature that "saved us a lot of pain" at Facebook, he said.

The Btrfs CRC checking means that a read from a corrupted sector will cause an I/O error rather than return garbage. Facebook had some storage devices that would appear to store data correctly in a set of logical block addresses (LBAs) until the next reboot, at which point reads to those blocks would return GUID partition table (GPT) data instead. He did not name the device maker because it turned out to actually be a BIOS problem. In any case, the CRCs allowed the Facebook team to quickly figure out that the problem was not in Btrfs when it affected thousands of machines as they were rebooted for a kernel upgrade.

Volume management in Btrfs is done in terms of "chunks", which are normally 1GB in size. That is part of what allows the filesystem to handle differently sized devices for RAID volumes, for example. Volumes can have specific chunks reserved for data or metadata and different RAID levels can be applied to each (e.g. RAID-1 for the metadata and RAID-5 for the data).

But Btrfs has had some lock-contention problems; it still has some of them, he said, though there are improvements coming. The filesystem is optimized for use on SSDs, but he ran an fs_mark benchmark in a virtual machine (for comparative rather than hard numbers) creating zero-length files and found that XFS could create roughly four times the number of files per second (33,000 versus 9,000). That was "not good", but before he started tuning Btrfs, he wanted to make XFS go as fast as he could.

To that end, he looked at what XFS was blocked on, which turned out to be locks for allocating filesystem objects. By increasing the allocation groups in the filesystem when it was created (from four to sixteen to match the number of CPUs in his test system), he could increase its performance to 200,000 file-creations per second. At that point, it was mostly CPU bound and the function using the most CPU was one that could not be easily tweaked away with a mkfs option.

So then he turned to Btrfs. Using perf, he was able to see that there was lock contention on the B-tree locks. The Btrfs B-tree stores all of its data in the leaves of the tree; when it is updating the tree, it has to lock non-leaf nodes on the way to the leaf, starting with the root node. For some operations, those locks have to be held as it traverses the tree. Hopefully only the leaf needs to be locked, but sometimes that is not the case and, since everything starts at the root, it is not surprising that there is contention for that lock.

As an experiment to make Btrfs go faster, he used the subvolume feature to effectively create more root nodes. Instead of the usual one volume (with one root node), he created sixteen subvolumes so that there was one per CPU, each with its own root node and lock. That allowed Btrfs to get close to the XFS performance at 175,000 file-creations per second.

But the goal was to make the filesystem faster without resorting to subvolumes, which led to a new B-tree locking scheme. By default, Btrfs has 16KB nodes, which is not changing, but instead of being treated as a single group, each node will now be broken up into sixteen groups, each with its own lock.

He has not yet picked the best number of groups for each node, but the change allows a default Btrfs filesystem to create 90,000 files per second. There are a lot of assumptions in Btrfs that there is only one lock per node, which he is working on removing. In addition, Btrfs switched to reader/writer locks a ways back and it turns out that those perform worse than expected, so he will be looking into that.

By some other measures, though, Btrfs compares favorably with XFS on the benchmark. XFS writes 120MB/second and does 3000 I/O operations/second (IOPS) for the benchmark, while Btrfs does 50MB/second and 300 IOPS to accomplish the same amount of work. That means that Btrfs is ordering things better and doing less I/O, Mason said.

The Gluster workloads at Facebook, which use rotational storage, are extremely sensitive to metadata latency to the point where one node's high latency can make the entire cluster slower than it should be. In the past, the company has used flashcache (which is similar to bcache) for both XFS and Btrfs to cache some data and metadata on SSDs, which improves the metadata latencies, but not enough.

To combat that, he has a set of patches to automatically put the Btrfs metadata on SSDs. The block layer provides information on whether the storage is rotational; for now, his patch assumes that if it is not rotational then it is fast. The patch has made a huge difference in the latencies and requires less flash storage (e.g. 450GB for a 40TB filesystem) for Facebook's file workload, which consists of a wide variety of file sizes. "You will need a lot more metadata if you have all 2KB files", he said.

That patch set is small (73 lines of code added), which is nice, he said. It is not entirely complete, though, as btrfs-utils needs changes to support it, but that should be a similarly sized change.

Another bottleneck he has encountered is in using the trim (or discard) command to tell SSDs about blocks that are no longer in use by the filesystem. That allows the flash translation layer to ignore those blocks when it is doing garbage collection and should, in theory, provide better performance. But many devices are slow when handling trim commands. Both XFS and Btrfs keep lists of blocks to trim, submit them as trim commands, and then must wait for those commands to complete during transaction commits, which stalls new filesystem operations. Those stalls can be huge, on the order of "tens of seconds", he said.

Ric Wheeler spoke up to say that trim is simply a request that the drive is free to ignore. He suggested that trim should not be performed during regular filesystem operations. Ted Ts'o agreed and said that the best practice for ext4 and probably other filesystems was to run the fstrim batch-trimming command regularly out of cron.

In answer to a question, Mason said that the disadvantages of not trimming are device-dependent. In some cases, it may reduce the lifetime of the device or add latencies during garbage collection, but it may also do nothing. Wheeler pointed out that if you are using thin provisioning, though, failing to trim could cause the storage to run out of space when there is actually space available.

Though it is not a flash-specific change, there have been some problems with large (> 16TB) Btrfs filesystems because of the free-space cache. Originally, free extents were not tracked, but that required scanning the entire filesystem at mount time, which was slow. When the free-space cache was added, it was kept per block group, and large filesystems have a lot of block groups, which meant more cache writing on each commit. In the 4.5 kernel, Omar Sandoval added a new free-space cache (which can be enabled with -o space_cache=v2) that is "dramatically faster", with commit latencies dropping from four seconds to zero.

For the near future, he plans to finalize the new B-tree locking and improve some fsync() bottlenecks, though he thinks that the new space cache will help there. There are also some other spinlocks slowing things down that he wants to look at.

He mentioned a few of the tools that he uses to find bottlenecks. Perf is the right tool when processing is "pegged in the CPU", but finding problems when things are blocking is much harder. For that, he recommended BPF and BCC. In particular, Brendan Gregg's offcputime BPF script is useful to show both kernel and application stack traces to help show the reasons why a process is blocked. In fact, Facebook likes offcputime so much that fellow Btrfs maintainer Josef Bacik has created a way to aggregate the output of the program across multiple systems.

There were a few questions at the end of the session. One person asked whether Mason had seen any uptake of Btrfs for smaller devices. Mason said that the filesystem "needs love and care" when it is being used, which is why Facebook can use it. Someone with an ARM background would need to be working on Btrfs upstream in order to provide that kind of care if it were to be adopted on ARM-powered devices, he said.

Another asked how much faster the current design of Btrfs could go. Mason seemed quite optimistic that it could go "much faster". The metadata format is flexible, so "if things are broken, we can fix them".

The last two questions regarded two different benchmarks, both of which are interesting, but neither of which Mason has run. Flashcache versus bcache would likely provide similar numbers, he thought, but flashcache worked for Facebook so there was no need to try bcache. He also has not run benchmarks against ZFS. When he started Btrfs, ZFS was not available. There is no reason not to do so now, he said, but he hasn't, though he would be interested in the results.

[I would like to thank the Linux Foundation for travel assistance to Toronto for LinuxCon North America.]

Comments (12 posted)

Network filtering for control groups

By Jonathan Corbet
August 24, 2016
Control groups (cgroups) perform two basic functions in the kernel: they allow the hierarchical grouping of processes, and they enable the use of controllers to apply resource limits to the processes in each group. Now there is interest in extending cgroups to allow for the control of network traffic as well, but there is a significant difference of opinion over the best way to implement this control. Naturally, the discussion involves another kernel technology that seems to be spreading out into all areas: the Berkeley packet filter (BPF) virtual machine.

The objective is to be able to apply a filter to network traffic going to or from any process contained within a given cgroup. The intent may be to improve security, by restricting the traffic that a particular system service or application (contained within its own cgroup) can generate. Or it could be a desire for simple resource control or accounting. Either way, the point is to have this control at the cgroup level, something that the kernel does not support now.

One possible solution, posted by Daniel Mack, is to allow a BPF program to be attached to a cgroup. To that end, the bpf() system call is extended with a new BPF_PROG_ATTACH operation. Exactly what the program is attached to depends on the type of the program; for now the only type supported is BPF_PROG_TYPE_CGROUP_SOCKET_FILTER, but the possibility exists that other types (to make other sorts of policy decisions for cgroups) could be supported in the future. Programs may be attached as either an ingress or an egress filter, controlled by a flag passed to the bpf() call. Naturally, there is also a BPF_PROG_DETACH operation to remove a BPF program from a cgroup.
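From user space, an attachment might look roughly like the sketch below. Only the BPF_PROG_ATTACH operation, the program type, and the existence of an ingress/egress flag come from the patch set as described here; the bpf_attr field names and everything else are assumptions made for illustration.

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical sketch: attach an already-loaded program (prog_fd,
     * from BPF_PROG_LOAD) to the cgroup whose cgroup2 directory is
     * open as cgroup_fd.  Field names are illustrative guesses, not
     * the patch set's actual ABI. */
    static int cgroup_filter_attach(int cgroup_fd, int prog_fd)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.target_fd = cgroup_fd;
        attr.attach_bpf_fd = prog_fd;
        /* an ingress-vs-egress flag would also be set here */

        return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
    }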

Once the program is attached, it will be run on each packet sent to or from a process in the cgroup, depending on how it was attached — though only the ingress side is implemented in the current patch set. If the program returns one, the packet will be allowed to pass; otherwise it will be dropped.
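The filter itself would be an eBPF program, typically written in restricted C and compiled with LLVM. A deliberately trivial example follows, just to show the return convention; the struct __sk_buff context is an assumption carried over from existing socket-filter program types, since the article does not spell out the program signature.

    #include <linux/bpf.h>

    /* Minimal cgroup socket filter: returning 1 lets the packet
     * pass, returning 0 drops it. */
    __attribute__((section("filter"), used))
    int cgroup_sock_filter(struct __sk_buff *skb)
    {
        return 1;   /* allow everything */
    }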

The idea is thus relatively straightforward; it is similar to the socket filters that an individual process can apply to a socket it owns now. Cgroup maintainer Tejun Heo had some quibbles with the implementation, but had no real objection to the overall design. It seems like something that could be added without a whole lot of trouble — except that one developer has different ideas.

That developer is Pablo Neira Ayuso, the maintainer of the netfilter subsystem. Perhaps unsurprisingly, he thinks that the proper solution is based on netfilter rather than BPF; in particular, he would like to see the establishment of a special table of rules that could be attached to a cgroup. In his opinion, a set of rules that can be queried with existing tools would be easier for administrators to deal with than a relatively opaque BPF program. Multiple sets of netfilter rules can be composed, while the BPF approach only allows for a single program to be attached to a cgroup, limiting flexibility in situations where more than one entity wants to add filtering rules. A netfilter-based approach could also take advantage of the connection tracking that, likely, is already being done, speeding the processing of most packets. Those reasons, he says, make netfilter the better tool for this particular job.

Daniel acknowledged the downsides of the BPF implementation, though he was less convinced about the importance of some of them. It seems that this project was looking at a netfilter-based solution early on, but chose to refocus on BPF. There were concerns that the netfilter developers did not actually want a cgroup-level hook, and that the performance of the netfilter system might not be up to the task. He summarized things this way:

The whole 'eBPF in cgroups' idea was born because through the discussions over the past months we had on all this, it became clear to me that netfilter is not the right place for filtering on local tasks. I agree the solution I am proposing in my patch set has its downsides, mostly when it comes to transparency to users, but I considered that acceptable. After all, we have eBPF users all over the place in the kernel already, and seccomp, for instance, isn't any better in that regard.

Even so, he said, he would be willing to look again at a solution based on netfilter, especially if Pablo were willing to help with the implementation — something that Pablo said he could do. BPF developer Alexei Starovoitov was rather less impressed, suggesting that a netfilter-based solution should be considered as a separate facility in the future, if a way can be found to implement it without slowing things down too much.

And that is where the discussion stands as of this writing. In a sense, netfilter and BPF were always destined to come into conflict at some point; both are, in essence, mechanisms for loading packet-filtering policy into the kernel. Even if this particular disagreement is solved without undue drama, this question is likely to come up again in other contexts. Thus far, there seem to be few bounds on places where BPF may be applicable but, perhaps, it still isn't the solution to every policy problem that comes along.

Comments (4 posted)

Semantics of MMIO mapping attributes across architectures

August 24, 2016

This article was contributed by Paul E. McKenney, Will Deacon, and Luis R. Rodriguez

Although both memory-mapped I/O (MMIO) and normal memory (RAM) are ultimately accessed using the same CPU instructions, they are used for very different purposes. Normal memory is used to store and retrieve data, of course, while MMIO is instead primarily used to communicate with I/O devices, to initiate I/O transfers and to acknowledge interrupts, for example. And while concurrent access to shared memory can be complex, programmers need not worry about what type of memory is in use, with only a few exceptions. In contrast, even in the single-threaded case, understanding the effects of MMIO read and write operations requires a detailed understanding of the specific device being accessed by those reads and writes. But the Linux kernel is not single-threaded, so we also need to understand MMIO ordering and concurrency issues.

This article looks under the hood of the Linux kernel's MMIO implementation, covering a number of topics:

  1. MMIO introduction
  2. MMIO access primitives
  3. Memory types
  4. x86 implementation
  5. ARM64 implementation
  6. PowerPC implementation
  7. Summary and conclusions

MMIO introduction

[MMIO write]

MMIO offers both read and write operations. MMIO writes are used for one-way communication, causing the device to change its state, as shown in the diagram on the right. The MMIO write operation transmits the data and a portion of the address to the device, and the device uses both quantities to determine how it should change its state.

Quick quiz 1: Why can't the device make use of the full address?
Answer
The size of the MMIO write is also significant; in fact, the device might react completely differently to a single-byte MMIO write than to (say) a four-byte MMIO write. The size of the access can therefore be thought of as additional bits feeding into the device's state-change logic.

MMIO reads are used for two-way communication, causing the device to return a value based on its current state. [MMIO read] The MMIO read operation transmits a portion of the address to the device, which the device can use to determine how to query its state in order to compute the return value. Interestingly enough, the device can also change its state based on the read, and many devices do exactly that. For example, the MMIO read operation that reads a character from a serial input device would be expected to also remove that character from the device's internal queue, so that the next MMIO read would read the next input character. As with writes, the size of the MMIO read is significant.

An MMIO read operation signals its completion by returning the value from the device. In contrast, the only way to determine when an MMIO write operation has completed is to do an MMIO read to poll for completion. Such polling is completely and utterly device-dependent, sometimes even requiring time delays between the initial MMIO write and the first subsequent MMIO read.
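As a sketch of that pattern (the register offsets and completion bit here are hypothetical; real drivers must take them from the device's documentation):

    #include <linux/io.h>
    #include <linux/kernel.h>

    #define REG_CMD      0x00    /* hypothetical command register */
    #define REG_STATUS   0x04    /* hypothetical status register */
    #define STATUS_DONE  0x01    /* hypothetical completion bit */

    static void dev_issue_command(void __iomem *base, u32 cmd)
    {
        writel(cmd, base + REG_CMD);    /* MMIO writes are "posted" */

        /* Only the device knows when the write has taken effect;
         * poll a status register until it says so. */
        while (!(readl(base + REG_STATUS) & STATUS_DONE))
            cpu_relax();
    }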

Given that both MMIO reads and writes can change device state, ordering is extremely important and, as will be discussed below, many of the Linux kernel's MMIO access functions provide strong ordering, both with each other and with locking primitives and value-returning atomic operations. However, for devices such as frame buffers, it is not helpful to provide strict ordering, as the order of writes to independent pixels is irrelevant. In fact, write combining (WC) is an important frame-buffer performance optimization, and this optimization explicitly ignores the order of non-overlapping writes. This means that the hardware and the Linux kernel need some way of specifying which MMIO locations can and cannot tolerate reordering.

The x86 family responded to this need with memory type range registers (MTRRs), which were used to set WC cache attributes for VGA memory; MTRRs have also been used to enable different caching policies for memory regions on different PCI devices. When an x86 system boots, the default cache attribute for physical memory is established via the model-specific register that holds the default memory type (MSR_MTRRdefType); MTRRs are then used to give other ranges different cache attributes, for example, uncached (UC) or WC for MMIO regions. Some BIOSes set MSR_MTRRdefType to writeback, which is a common default for DRAM. Other BIOSes might set MSR_MTRRdefType to UC, and then use MTRRs to set DRAM to writeback. One of the biggest issues with MTRRs is the limited number of them. In addition, using MTRRs on x86 requires the use of the heavyweight stop_machine() call whenever the MTRR configuration changes.

Quick quiz 2: Can you set up UC access to normal (non-MMIO) RAM?
Answer

The page attribute table (PAT) relies on paging to lift this limitation. Unfortunately, the BIOS runs in real mode, in which paging is not available, which means that the BIOS must continue to use MTRRs so x86 systems will continue to have them. However, Linux kernels can use paging, and can therefore use PAT when running on hardware providing it.

In short, MMIO reads and writes can be thought of as a message-passing communication mechanism for interacting with devices; they can be uncached for traditional device access or write combining for access to things like frame buffers and InfiniBand, and they require special attention in order to interact properly with synchronization primitives such as locks. The Linux kernel therefore provides architecture-specific primitives that implement MMIO accesses, as described in the next section.

MMIO access primitives

The readX() function does MMIO reads, with the X specifying the size of the read, so that readb() reads one byte, readw() reads two bytes, readl() reads four bytes, and, on some 64-bit systems, readq() reads eight bytes. These functions are all little-endian, but in some cases, big-endian behavior can be specified using an additional "_be" component to the X suffix.

The writeX() function does MMIO writes, with the X specifying write size as above, and again in some cases with an additional "_be" component to the X suffix.

There are also inX() and outX() functions that map to the x86 in and out instructions, respectively; here, the X suffix contains the size in bits and, where applicable, a "be" or "le" component to specify endianness. These functions are sometimes mapped to MMIO on non-x86 systems. The ioreadX() and iowriteX() functions, where X is the number of bits to operate on, can also be used to read and write MMIO; they were added to hide the differences between in/out operations and MMIO.
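As a quick illustration of the accessors described above, here is a minimal sketch; the device, its register offsets, and the mapping are hypothetical:

    #include <linux/io.h>
    #include <linux/kernel.h>

    #define MYDEV_REG_CTRL    0x00    /* hypothetical control register */
    #define MYDEV_REG_STATUS  0x04    /* hypothetical status register */

    static void mydev_start(void __iomem *base)
    {
        u32 status;

        writel(0x1, base + MYDEV_REG_CTRL);        /* four-byte MMIO write */
        status = readl(base + MYDEV_REG_STATUS);   /* four-byte MMIO read */

        /* The ioreadX() family would work equally well here: */
        status = ioread32(base + MYDEV_REG_STATUS);

        if (status & 0x1)
            pr_info("mydev: running\n");
    }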

Linus Torvalds created the following example on a Kernel Summit whiteboard to illustrate ordering requirements:

     1 unsigned long global = 0;
     2 DEFINE_SPINLOCK(a);
     3 void locked_device_output(void)
     4 {
     5   spin_lock(&a);
     6   unsigned long i = global++;
     7   writel(i, dev_slave_address);
     8   spin_unlock(&a);
     9 }

Line 5 acquires lock a, line 6 increments a global variable global under that lock, line 7 writes the previous value of global to an MMIO location at dev_slave_address, and finally line 8 releases the lock. In an ideal world, both lines 6 and 7 would be protected by the lock when locked_device_output() is invoked concurrently.

Of course, the normal variable global is protected by the lock, that being what locks are for. However, for the MMIO write to dev_slave_address, such protection requires that the implementations of spin_lock(), spin_unlock(), and writel() cooperate to provide the ordering needed to force the first locked_device_output() call's writel() of the value 0 to dev_slave_address to happen before the second call writes the value 1. This is what x86 does and what most developers would expect to happen. Weakly ordered systems must therefore insert whatever memory barriers are required to enforce this ordering.

Quick quiz 3: What do weakly ordered systems do to enforce this ordering?
Answer

Providing the required ordering can be expensive on weakly ordered systems. Because there are a number of situations where ordering is not required (for example, frame buffers), the Linux kernel provides relaxed variants (readX_relaxed() corresponding to readX() and writeX_relaxed() corresponding to writeX()) that do not guarantee strong ordering, which can be used as follows:

     1 unsigned long global = 0;
     2 DEFINE_SPINLOCK(a);
     3 void locked_device_output(void)
     4 {
     5   spin_lock(&a);
     6   unsigned long i = global++;
     7   writel_relaxed(i, dev_slave_address);
     8   spin_unlock(&a);
     9 }

Because this example uses writel_relaxed() instead of writel(), the writel_relaxed() can be reordered with the spin_unlock(), so that the write of the value 1 might well precede the write of the value 0. An MMIO write memory barrier, called mmiowb(), may be used to prevent MMIO writes from being reordered with each other or with locking primitives and value-returning atomic operations. This mmiowb() primitive can be used as shown below:

     1 unsigned long global = 0;
     2 DEFINE_SPINLOCK(a);
     3 void locked_device_output(void)
     4 {
     5   spin_lock(&a);
     6   unsigned long i = global++;
     7   writel_relaxed(i, dev_slave_address);
     8   mmiowb();
     9   spin_unlock(&a);
    10 }

Again, without the mmiowb(), the writel_relaxed() call might be reordered with its counterpart from a later instance of this critical section that was running on a different CPU.

Quick quiz 4: Why can't mmiowb() be used with writel()?
Answer

A more useful version of this example might do several writel_relaxed() invocations in the critical section followed by a final mmiowb(). It is worth noting that mmiowb() is a no-op on most architectures.
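Such a batched version might look like the following sketch (the register offsets are hypothetical):

    #include <linux/io.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(dev_lock);

    static void dev_program_regs(void __iomem *base, u32 a, u32 b, u32 c)
    {
        spin_lock(&dev_lock);
        writel_relaxed(a, base + 0x00);    /* these three writes are    */
        writel_relaxed(b, base + 0x04);    /* cheap, unordered stores...*/
        writel_relaxed(c, base + 0x08);
        mmiowb();    /* ...and one barrier pins all three inside the
                      * critical section */
        spin_unlock(&dev_lock);
    }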

However, _relaxed() accesses from a given CPU to a specific device are guaranteed to be ordered with respect to each other. Tighter semantics can of course be used: per-bus or even global, for example.

Nevertheless, the _relaxed() functions are not primitives that most device-driver developers normally consider using and, even if they did, there are still some kernel calls, such as locking calls, that might nullify the relaxed effects. It is unclear whether these implications have always been well thought out throughout the entire kernel. For instance, it is now understood that PowerPC's default kernel writel() uses a memory barrier; although one would typically expect write combining to happen in user space for frame buffers, kernel writes could nullify the write-combining effects.

Although asking more developers to use the relaxed primitives when write combining is desired might be the first instinct for addressing this situation, there are other possible issues that still need to be considered. For instance, would using a spin_lock() nullify any write-combining effects on some architectures even if relaxed primitives were used? If so, which architectures would be affected? Are we nullifying write combining in some areas of the kernel, even on x86, if locks are used? To answer these questions, we must review each architecture's MMIO and locking-primitive helpers and their implications for ordering.

Memory types

As noted earlier, this article covers the two most common flavors of MMIO: uncached MMIO and write-combining MMIO. In uncached MMIO, each read and write is independent and in some sense atomic, with no combining, prefetching, or caching of any kind; the ioremap_nocache() function is used to map uncached MMIO registers. In write-combining MMIO, reads and writes can be both coalesced and reordered, even the non-_relaxed() reads and writes. Memory that is write combining is also normally "prefetchable", and these terms sometimes appear to be used interchangeably. The ioremap_wc() function is used to map write-combining MMIO registers.
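In a driver, the two kinds of mapping might be set up as in this sketch (the physical addresses and lengths are hypothetical, and error handling is abbreviated):

    #include <linux/errno.h>
    #include <linux/io.h>

    static void __iomem *regs;    /* uncached: control registers */
    static void __iomem *fb;      /* write-combining: frame buffer */

    static int mydev_map(phys_addr_t reg_base, size_t reg_len,
                         phys_addr_t fb_base, size_t fb_len)
    {
        regs = ioremap_nocache(reg_base, reg_len);  /* UC: every access hits the device */
        fb   = ioremap_wc(fb_base, fb_len);         /* WC: writes may be combined */
        return (regs && fb) ? 0 : -ENOMEM;
    }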

For any given architecture, there are some questions about write combining:

  • What prevents reordering and combining? (Presumably mmiowb() and perhaps also mb().)
  • What operations flush the write buffers? (Hardware dependent, but reads from a given device typically flush prior writes to that same device.)

Five additional per-architecture questions will be addressed in tabular form:
  1. Must non-relaxed accesses to MMIO regions be confined to lock-based critical sections? (Presumably the answer is "yes".)
  2. Must relaxed accesses to MMIO regions be confined to lock-based critical sections? (Prudence would suggest "yes" as the answer, at least in the UC case.)
  3. Must reads from MMIO regions be ordered with each other? (Presumably the answer is "no" for _relaxed() primitives.)
  4. Must reads from MMIO regions be ordered with writes to other locations within the region? (Presumably the answer is "no".)
  5. Must accesses to specific locations in MMIO regions be ordered with other accesses to that same location? (Presumably the answer is "yes", even for accesses to WC MMIO regions, at least for completely overlapping updates. Otherwise you would get old pixels on your display, after all.)

It is natural to wonder what happens if a given range of MMIO registers is mapped as write combining at one virtual address and as uncached (non-write-combining) at some other address. The answer varies across both architectures and devices, so that the current Linux-kernel stance is "don't do that" unless absolutely necessary.

Regardless of what the answers are, they clearly need to be better documented.

Existing practice

There are more than 2,000 uses of writel_relaxed() and more than 1,000 uses of readl_relaxed(), so existing practice must be taken into account: changes might be made, but not lightly. Many uses of these primitives are in architecture-specific code, but there are common-code uses in some drivers. We took a look at a few of them:

  • drivers/ata/ahci_brcmstb.c: This driver uses brcm_sata_readreg() and brcm_sata_writereg() to wrap readl_relaxed() and writel_relaxed(), respectively. The code appears to expect that relaxed reads and writes from/to the same device will be ordered.
  • drivers/crypto/atmel-aes.c: This driver uses atmel_aes_read() and atmel_aes_write() to wrap readl_relaxed() and writel_relaxed(), respectively. The code appears to expect that relaxed reads and writes from/to the same device will be ordered.
  • drivers/crypto/img-hash.c: This driver uses img_hash_read() and img_hash_write() to wrap readl_relaxed() and writel_relaxed(), respectively. The code appears to expect that relaxed reads and writes from/to the same device will be ordered.
  • drivers/crypto/ux500/cryp/cryp.c appears to expect that relaxed reads and writes from/to the same device will be ordered. At present, this driver does not seem to be used outside of ARM, but crypto IP blocks are not necessarily tied to ARM.

Although the bulk of the uses of the relaxed I/O accessors are confined to one architecture or another, it would not necessarily be wise to define CPU-family-specific changes to their semantics. Such changes are likely to cause serious problems should one of the corresponding hardware IP blocks ever be used by an implementation of some other CPU family.

x86 implementation

The x86 mapping is as follows:

API                                           Implementation              Ordering
mmiowb()                                      barrier()                   Provided by x86 ordering
spin_unlock()                                 arch_spin_unlock()          Provided by x86 ordering
inb(), inw(), inl()                           inb, inw, and inl           See table below
                                              instructions
outb(), outw(), outl()                        outb, outw, and outl        See table below
                                              instructions
readb(), readb_relaxed(), ioread8(),          MMIO read                   See table below
readw(), readw_relaxed(), ioread16(),
readl(), readl_relaxed(), ioread32(),
readq(), readq_relaxed()
writeb(), writeb_relaxed(), iowrite8(),       MMIO write                  See table below
writew(), writew_relaxed(), iowrite16(),
writel(), writel_relaxed(), iowrite32(),
writeq(), writeq_relaxed()

The readX() and writeX() definitions are built by the build_mmio_read() and build_mmio_write() macros, respectively.

The x86 answers to the other questions appear to be as follows, based on a scan through "Intel 64 and IA-32 Architectures Software Developer's Manual V3":

  • What prevents reordering and combining? For non-write-combining MMIO regions, everything. For write-combining MMIO regions, mmiowb(), smp_mb(), an access to a non-write-combining MMIO region, an interrupt, or a locked instruction, that is, an instruction carrying the LOCK prefix, which makes it an atomic read-modify-write operation.
  • What operations flush the write buffers? The same operations that prevent reordering and combining.
x86             Within ioremap_wc()       Within ioremap_nocache()   Against normal memory
_relaxed()      Unordered, ordered to     Ordered.                   Ordered.
                same location.
non-relaxed()   Unordered, ordered to     Ordered.                   See [*] below.
                same location.

[*]: Accesses to ioremap_wc() memory are not ordered with accesses to normal memory unless both of the following hold:
  1. There is either an intervening smp_mb() or a normal-memory access using the lock prefix, and
  2. The I/O fabric is "sane" in that it avoids reordering and buffering invisible to the CPU. I/O fabrics that have multiple layers of I/O bus are all too often not sane.

Quick quiz 5: But if x86 always uses the same instructions for MMIO, how can the ordering semantic differ for ioremap_wc() and ioremap_nocache() regions?
Answer

To reiterate that last point, note that this all assumes sane hardware. It is possible to construct x86 systems with I/O bus structures that do not follow the above rules. Drivers written for such systems typically need to "confirm" prior MMIO writes by doing a later MMIO read that either forces ordering or verifies the state changes caused by the write. The exact confirmation method will depend on the details of the I/O device in question.

On older MTRR-only x86 systems, some frame-buffer drivers must also use arch_phys_wc_add(), because on such systems ioremap_wc() would otherwise produce an uncached, non-write-combining mapping for the corresponding device. This inability of ioremap_wc() to Do The Right Thing can be due to the limited number of MTRRs, limited MTRR size, I/O-mapping alignment constraints, page aliasing (for example, to provide both kernel- and user-mode access to MMIO registers), or old hardware that simply cannot be shoehorned into the nice new PAT-based Linux-kernel APIs.
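The resulting idiom in those drivers looks roughly like this sketch (the names are hypothetical):

    #include <linux/errno.h>
    #include <linux/io.h>

    static int wc_cookie;             /* handle from arch_phys_wc_add() */
    static void __iomem *fb;

    static int fb_map(unsigned long base, unsigned long len)
    {
        /* On PAT systems this is a no-op; on MTRR-only systems it adds
         * a WC MTRR covering the frame buffer (or fails harmlessly). */
        wc_cookie = arch_phys_wc_add(base, len);
        fb = ioremap_wc(base, len);
        return fb ? 0 : -ENOMEM;
    }

    static void fb_unmap(void)
    {
        iounmap(fb);
        arch_phys_wc_del(wc_cookie);
    }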

For more details, see commits "drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc()", "drivers/video/fbdev/atyfb: Clarify ioremap() base and length used", and "drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper". Commit "drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC" is particularly instructive, as it describes how old hacks were replaced by newer less-hacky hacks based on the new API.

Those interested in page aliasing should refer to Documentation/ia64/aliasing.txt, particularly the "POTENTIAL ATTRIBUTE ALIASING CASES" section. Fortunately, most device manufacturers now dedicate one full PCI base address register (BAR) to MMIO and another for frame-buffer use, which means that developers writing drivers for modern devices can for the most part simply use the ioremap_nocache() and ioremap_wc() APIs.

One important last note: On x86 systems, spinlock-release primitives usually use a plain store instruction. This will not order accesses within ioremap_wc() regions. Although this might seem strange at first glance, it has the advantage that the effectiveness of write combining is not limited by spin_unlock() invocations.

ARM64 implementation

The arm64 mapping is as follows:

API                                    Implementation      Ordering
mmiowb()                               do { } while (0)    Provided by ARM64 ordering
spin_unlock()                          arch_spin_unlock()  llsc or lse instruction
readb(), ioread8(), readw(),           MMIO read           Follow MMIO read by rmb()
ioread16(), readl(), ioread32(),
readq()
writeb(), iowrite8(), writew(),        MMIO write          Precede MMIO write with wmb()
iowrite16(), writel(), iowrite32(),
writeq()
readb_relaxed(), readw_relaxed(),      MMIO read           See table below
readl_relaxed(), readq_relaxed()
writeb_relaxed(), writew_relaxed(),    MMIO write          See table below
writel_relaxed(), writeq_relaxed()

Note that although ARM does distinguish between WC and non-WC flavors of MMIO regions in terms of ordering, the type of accessor (_relaxed() vs. non-_relaxed()) also has a big role to play. ARM64's non-_relaxed() accessors have ordering properties similar to total store order (TSO): that is, they order prior reads against later reads and writes, and also order prior writes against later writes, but they do not order prior writes against later reads.

The ARM64 answers to the questions are as follows:

  • What prevents reordering and combining? A non-relaxed MMIO access (aside from not ordering prior writes against later reads) or either mb(), rmb(), or wmb().
  • What operations flush the write buffers for write-combining regions? Either mb() or wmb(). But please note that this flushing takes effect only within the CPU; these memory barriers do not necessarily affect any write buffers that might reside on external I/O buses.
ARM64           Within ioremap_wc()         Within ioremap_nocache()    Against normal memory
_relaxed()      Unordered, but fully        Unordered, but fully        Unordered.
                ordered for accesses to     ordered for accesses to
                the same address.           the same device.
non-relaxed()   "TSO", but fully ordered    "TSO", but fully ordered    See below.
                for accesses to the same    for accesses to the same
                address.                    device.

In the above table, "TSO" allows prior writes to be reordered with later reads, but prevents any other reordering.

The lower right-hand cell's rules are as follows:

Prior Access Next Access Ordering
Non-Relaxed Read Plain Read Ordered (useful for reading from a DMA buffer).
Non-Relaxed Read Plain Write Ordered.
Non-Relaxed Write Plain Read Unordered.
Non-Relaxed Write Plain Write Unordered.
Plain Read Non-Relaxed Read Unordered (departure from TSO).
Plain Read Non-Relaxed Write Unordered (departure from TSO).
Plain Write Non-Relaxed Read Unordered.
Plain Write Non-Relaxed Write Ordered (useful for triggering DMA).

Just as with x86, it is possible to construct ARM systems with I/O bus structures that do not follow the above rules. Drivers written for such systems typically need to "confirm" prior MMIO writes by doing a later MMIO read that either forces ordering or verifies the writes' state changes. The exact confirmation method will depend on the details of the I/O device in question.

PowerPC implementation

Finally, the PowerPC mapping uses an ->io_sync field in the Linux kernel's PowerPC-specific per-CPU data. This field is set by PowerPC MMIO writes, and tested at unlock time. If this field is set, the unlock primitive executes a heavyweight sync instruction, which forces the last MMIO write to be contained within the critical section.
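Conceptually, the scheme works as in the following sketch; this is not the kernel's actual code (which lives under arch/powerpc), just the shape of it:

    /* Conceptual sketch of the ->io_sync scheme described above. */
    static inline void sketch_mmio_write32(u32 val, void __iomem *addr)
    {
        mb();                          /* "sync": order prior accesses */
        *(volatile u32 __force *)addr = val;
        get_paca()->io_sync = 1;       /* record a pending MMIO write */
    }

    static inline void sketch_spin_unlock(arch_spinlock_t *lock)
    {
        if (get_paca()->io_sync) {     /* MMIO written in this section? */
            mb();                      /* heavyweight sync before release */
            get_paca()->io_sync = 0;
        }
        /* ... the normal lock-release store follows ... */
    }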

The mapping is as follows:

API                                    Implementation            Ordering
mmiowb()                               sync and clear ->io_sync
spin_unlock()                          arch_spin_unlock()        If ->io_sync set, sync and
                                                                 clear ->io_sync
in_8(), in_be16(), in_be32(),          MMIO read                 sync followed by read
in_le16(), in_le32()                                             followed by twi;isync
out_8(), out_be16(), out_be32(),       MMIO write                sync followed by write
out_le16(), out_le32()                                           followed by set ->io_sync
readb(), inb(), ioread8(),             in_8(), in_le16(),        sync followed by read
readw(), inw(), ioread16(),            in_be16(), in_le32(),     followed by twi;isync
readw_be(), ioread16be(),              in_be32(), in_le64(),
readl(), inl(), ioread32(),            in_be64()
readl_be(), ioread32be(),
readq(), readq_be()
writeb(), outb(), iowrite8(),          out_8(), out_le16(),      sync followed by write
writew(), outw(), iowrite16(),         out_be16(), out_le32(),   followed by set ->io_sync
writew_be(), iowrite16be(),            out_be32(), out_le64(),
writel(), outl(), iowrite32(),         out_be64()
writel_be(), iowrite32be(),
writeq(), writeq_be()

The alert reader will note the duplication of some names in the "API" and "Implementation" columns, for example, in_8(). The definitions are in arch/powerpc/include/asm/io.h and arch/powerpc/include/asm/io-defs.h.

Other implementation strategies are possible, of course. One approach would be for mmiowb() and arch_spin_unlock() to both unconditionally execute the sync instruction and to dispense with the ->io_sync flag. Another approach would be to make mmiowb() a no-op, eliminate the test and sync instruction from arch_spin_unlock(), and replace the setting of ->io_sync with a sync instruction. However, both of these approaches would greatly increase the number of executions of the expensive sync instruction so, for PowerPC, the implementation in the above table is preferred.

Currently (v4.3) PowerPC's _relaxed() interfaces operate exactly the same as do their non-relaxed counterparts. Part of the motivation for the MMIO discussion during the technical day at the 2015 Linux Kernel Summit was to determine how and to what extent PowerPC could actually relax the _relaxed() implementations. However, this article limits itself to documenting current reality.

The PowerPC answers to the questions appear to be as follows:

  • What prevents reordering and combining? Any MMIO access, mmiowb(), or smp_mb(). This is not a good thing, as it makes for slow frame buffers.
  • What operations flush the write buffers? The same operations that prevent reordering and combining.
PowerPC         Within ioremap_wc()   Within ioremap_nocache()   Against normal memory
_relaxed()      Fully ordered.        Fully ordered.             Fully ordered.
non-relaxed()   Fully ordered.        Fully ordered.             Fully ordered.

Summary and conclusions

MMIO should be thought of as a message-passing mechanism that communicates with hardware rather than a variant of normal memory. As such, MMIO is not only device-specific, but also specific to the hardware path between the CPU and the device. In the general case, which includes ill-considered hardware designs, even memory barriers cannot always order accesses: in some cases, the device's state must be polled to determine when a prior access has completed.

Nevertheless, the Linux kernel offers a rich set of primitives with which to interact with MMIO devices, and this article has given a brief overview of how they work and how they may be used.

Acknowledgments

We are grateful to Michael Ellerman, Gautham Shenoy, Peter Zijlstra, Andy Lutomirski, and Boqun Feng for their review and comments. We owe thanks to Toshimitsu Kani, Dave Airlie, Christoph Hellwig, and Matt Fleming for a number of important discussions, and to Jim Wasko for his support of this effort.

Answers to Quick quizzes

Quick quiz 1: Why can't the device make use of the full address?

Answer: Because part of the address is used to select the device.

Back to Quick quiz 1.

Quick quiz 2: Can you set up UC access to normal (non-MMIO) RAM?

Answer: You can, and this is in fact used for GPU memory on GPUs that cannot snoop the CPU caches. One example may be found in ati_create_page_map(), which uses __get_free_page() to allocate a page of DRAM and then later uses set_memory_uc() to change the cache attribute. There is also a set_memory_wc(). Although set_memory_uc() and set_memory_wc() may also be used to set up MMIO, such use is likely to be strongly discouraged. In addition, it is quite possible that the set_memory_uc() and set_memory_wc() APIs will change.
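The pattern described in the answer looks roughly like this sketch (error handling abbreviated; the header providing set_memory_uc() varies by architecture and kernel version):

    #include <linux/gfp.h>

    static unsigned long make_uc_page(void)
    {
        unsigned long page = __get_free_page(GFP_KERNEL);

        if (page)
            set_memory_uc(page, 1);    /* make this one page uncached */
        return page;
    }

    static void free_uc_page(unsigned long page)
    {
        set_memory_wb(page, 1);        /* restore write-back before freeing */
        free_page(page);
    }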

Back to Quick quiz 2.

Quick quiz 3: What do weakly ordered systems do to enforce this ordering?

Answer: They enforce this ordering by a combination of hardware and software ordering constraints. Please read on for more information, leading up to descriptions of the ARM64 and the PowerPC implementations.

Back to Quick quiz 3.

Quick quiz 4: Why can't mmiowb() be used with writel()?

Answer: Actually, they really can be used together. But there is little point in doing so because writel() already provides strong ordering. Therefore, placing an mmiowb() after a writel() has no effect other than to slow things down.

Of course, in this case it would be simpler to just use writel() instead of both writel_relaxed() and mmiowb(). However, mmiowb() is quite useful when there are multiple writel_relaxed(), all of which need to be contained within the critical section. A single mmiowb() placed between the last writel_relaxed() and the unlock will contain all of them, and with the added memory-barrier overhead incurred only once at mmiowb() time instead of once for each and every writel().

Back to Quick quiz 4.

Quick quiz 5: But if x86 always uses the same instructions for MMIO, how can the ordering semantic differ for ioremap_wc() and ioremap_nocache() regions?

Answer: Because the ordering is controlled not by the instructions, but rather by the MTRR settings (in older systems) or by PAT (in newer systems).

Back to Quick quiz 5.

Comments (4 posted)

Patches and updates

Kernel trees

Linus Torvalds Linux 4.8-rc3 Aug 21
Greg KH Linux 4.7.2 Aug 20
Greg KH Linux 4.4.19 Aug 20
Ben Hutchings Linux 3.16.37 Aug 23
Greg KH Linux 3.14.77 Aug 20
Ben Hutchings Linux 3.2.82 Aug 23

Architecture-specific

Build system

Core kernel code

Development tools

Device drivers

Zhiyong Tao AUXADC: Mediatek auxadc driver Aug 18
Minghsiu Tsai Add MT8173 MDP Driver Aug 19
Suravee Suthikulpanit iommu/AMD: Introduce IOMMU AVIC support Aug 18
Stanimir Varbanov Venus remoteproc driver Aug 19
Benjamin Tissoires Synaptics RMI4 over SMBus Aug 18
Stanimir Varbanov Qualcomm video decoder/encoder driver Aug 22
Martin Blumenstingl meson: Meson8b and GXBB DWMAC glue driver Aug 20
Raghu Vatsavayi liquidio CN23XX support Aug 21
Thierry Reding Initial Tegra186 support Aug 19
Noralf Trønnes drm: add SimpleDRM driver Aug 22

Device driver infrastructure

Guenter Roeck Type-C Port Manager Aug 17
Rob Herring UART slave device bus Aug 17
Marek Szyprowski New feature: Framebuffer processors Aug 22

Documentation

Filesystems and block I/O

Andreas Gruenbacher Xattr inode operation removal Aug 22
Ross Zwisler re-enable DAX PMD support Aug 23

Networking

Security-related

Jens Wiklander generic TEE subsystem Aug 22

Page editor: Jonathan Corbet

Distributions

Bringing OSTree to real-world desktops

By Nathan Willis
August 24, 2016

GUADEC

At GUADEC 2016 in Karlsruhe, Germany, Owen Taylor presented a talk entitled "Reworking the desktop distribution" that outlined his vision for how next-generation packaging concepts like Flatpak applications and OSTree system images can be harmonized with the more flexible model long used by Linux distributions. As it turns out, while these newer packaging concepts work well for some basic systems, the story gets more complicated if the user wants to perform certain tasks on their machine—such as developing software.

The concept

He began with the rationale for moving toward OSTree and Flatpak. Traditionally, he said, a distribution is made of "lots and lots of packages." The downside is that this means every user's system is "custom" in a sense. Thus, when somebody says that they "tested Fedora 24," it means that they tested something quite different from what you might test on your own Fedora 24 machine. It also means distributions and application projects are unable to distinguish between corrupted and uncorrupted systems, modified and standard systems, and other divergent cases.

Users who install distribution updates repeatedly, Taylor added, will eventually end up with some configuration that varies from what would be found on a fresh install—whether it is some changed default setting, some different service running, or "some input method that we tried and dropped a long time ago." Consequently, no one is comfortable releasing packages and saying to users "here you go; it'll have no problems, so you don't need to call me," Taylor said.

Improving on this tangled status quo is one of the goals of the OSTree project. The basic idea, he said, is that the OS image is immutably fixed and all of the applications are self-contained (be they in Flatpak format or in some other flavor of bundle). Several end-user OSes have already taken this static-OS approach, he said, namely Android, ChromeOS, One Laptop Per Child, and now Endless OS. But bringing this model to a "classic Linux user" has remained a challenge.

The first implementation

There has, however, been recent progress via the Fedora Workstation project. [Owen Taylor] Fedora ships a relatively unaltered version of upstream GNOME, Taylor said, with just a few modifications (such as replacing GNOME Web with Firefox). And Atomic Host has already been shipping immutable OS images, "although they do different things in many places" compared to the immutable desktop design. So the Fedora Atomic Workstation effort has been working to combine the Atomic Host approach with Flatpak's self-contained application bundles—and Taylor has been exploring how to add support for software development.

OSTree is "like Git but for binary file trees," Taylor said, and is already used in the GNOME Continuous project. OSTree can be used to build the immutable OS images, he noted, "but how do we make this binary OS image? Out of packages." After a chuckle from the audience, he explained. Traditional package formats (like RPM and Debian) leave a lot undefined, but that does not make them useless. A distribution's kernel package, for example, holds a lot of configuration information. Furthermore, traditional package formats make it easier to distribute security fixes.

Thus, the rpm-ostree tool attempts to build on existing packaging knowledge and construct OSTree images out of existing packages. It supports a layering facility, so that sets of packages can be overlaid on a base image. That will be the approach used to deploy updates when rebuilding the entire system is not feasible. Though, Taylor confided, "especially here at the beginning, we'll probably get some things wrong." Getting the granularity of the layers right is tricky; "the more that you layer, the more you will experience 'package pain'," he said.

Currently, adding a new layer built with rpm-ostree onto a running system requires a reboot, although that may change. There is also a new "unlock" feature for administrators. It adds a writable layer on top of the root system so that administrators can make alterations; the layer disappears on reboot.

As for Flatpaks, he said, very few of the thousand or so applications packaged for Fedora Workstation have also been packaged as Flatpaks, so the plan is to develop a tool that will automatically create Flatpaks from RPM packages. Things appear to be on track for Fedora Atomic Workstation to be released with Fedora 25; the OSTree builds are currently unofficial but will move into the official Fedora infrastructure, and GNOME Software recently added support for managing Flatpak packages in addition to other formats.

About developers

Fedora Workstation is designed to be a distribution for software developers, Taylor said, but their needs differ from those of other desktop users—in ways that are challenging for the OSTree model. To develop software on Linux today, one typically has to install hundreds of packages (which vary by project); if you try to implement all of those packages as OSTree layers, Taylor said, then you might as well not use OSTree at all. So some other solution is called for.

The basic requirements, he explained, are, first, that the development environment should not be the same as the workstation environment. "You shouldn't have to break your GTK+ to work on it." Second, the development environment should be reproducible. Third, the environment should function on a variety of different host OSes and for both production and test systems. In addition, he noted, developers should be able to install multiple self-contained development environments in parallel.

The solution that Taylor has been exploring starts with considering different "flavors" of software development separately. For starters, he looked at two specific scenarios: web development and native GNOME application development. The development environments used for those tasks are quite different.

Web developers tend to use a lot of terminal windows and command-line tools for the frameworks of interest (Node.js, Ruby on Rails, etc.), and perhaps a code editor like Atom. "We want to accommodate this, but we want to do it in a way that's better than saying 'here's a terminal; have fun'," he explained.

He then showed a demo of a tool he created called PurpleEgg (which is purely a pre-production code name, he said). It enables web developers to set up and use development environments on an OSTree-based system by creating a separate container for each development project. Using the PurpleEgg tool, he initialized a new Django project and described the results. The tool created an empty Docker container, installed Git, Python, and other system packages, installed and configured Django inside of a Python virtual environment, and installed a project template.

The container was created with one user (created to match his user on the host system) and included a copy of his existing Git configuration. A desktop launcher was also created in the host OS that will launch the container and open up a terminal window inside it. There are still other features yet to be added, he said, such as starting a web server in the container for testing and doing more extensive Git configuration.

But the host OS remains untouched, which is the ultimate goal. Still, the web-development case was the simplest. GNOME application developers have a less predictable setup: some use JHBuild, some might use an IDE like Builder. And, whatever the setup of today looks like, it may change in the future when Flatpak usage increases. Plus, there are numerous other development fields that might expect a different solution, he said, such as scientific development, kernel hacking, mobile development, and game development. So further work is clearly called for.

The session ended with a brief question-and-answer period. Bastien Nocera pointed out that "system development" (meaning work on daemons, systemd, low-level configuration, etc.) may be yet another category of software development that needs to be treated separately. "Sometimes we would need to install an overlay and restart everything that lives," he said. "How do we handle that?" Taylor replied that OSTree also supports a "hotfix" mode that lets users make persistent changes to the system; it could enable making changes that persist after reboot.

Christian Hergert expressed concern about OSTree's extensive use of OverlayFS for layering, noting that the filesystem is not POSIX-compliant and, in fact, is so different that fixing it would amount to rewriting OverlayFS from scratch. Taylor replied that something else may be better, but also said that the OSTree layers are ultimately used for hacking, while updates to the full image are meant to be used for long-term updates, so perhaps weaker stability guarantees are acceptable.

Taylor emphasized that the PurpleEgg tool is merely a proof of concept, and there are many issues remaining to be worked out. But it could provide the template necessary to bridge the gap between the purely static OS environment of OSTree and the per-app isolation offered by Flatpak and other application-sandboxing formats.

[The author would like to thank the GNOME Foundation for travel assistance to attend GUADEC 2016.]

Comments (5 posted)

Brief items

Quote of the week

I think it's going to be somewhat painful, but, honestly, we've had a series of really excellent releases without much excitement, and I think we can afford a little — especially if the X fallback is functional and clearly documented.
Matthew Miller

Comments (none posted)

Android 7.0 "Nougat" released

Google has announced that the Android 7.0 release has started rolling out to recent-model Nexus devices. "It introduces a brand new JIT/AOT compiler to improve software performance, make app installs faster, and take up less storage. It also adds platform support for Vulkan, a low-overhead, cross-platform API for high-performance, 3D graphics. Multi-Window support lets users run two apps at the same time, and Direct Reply so users can reply directly to notifications without having to open the app. As always, Android is built with powerful layers of security and encryption to keep your private data private, so Nougat brings new features like File-based encryption, seamless updates, and Direct Boot." See this page for a video-heavy description of new features.

Comments (15 posted)

Distribution News

Debian GNU/Linux

The debian-private resolution decides nothing

The general resolution vote on the status of "classified" discussions on the debian-private mailing list has come to a conclusion — sort of. A total of 256 ballots were counted, but the final result was "further discussion." So the issue of the past contents of debian-private looks to remain unsolved indefinitely.

Comments (1 posted)

Fedora

Fedora 25 to run Wayland by default

The Fedora engineering steering committee has agreed that the upcoming Fedora 25 release should use the Wayland display server by default. "There are still some bugs that are important to solve. However, there is still time to work on them. And the legacy Xorg session option will not be removed, and will be clearly documented how to fallback in cases where users need it." If this plan holds, it may be an important step in the long-awaited move away from the X Window System.

Comments (89 posted)

Gentoo Linux

In Memory of Jonathan “avenj” Portnoy

The Gentoo community is mourning the loss of Jonathan Portnoy. "Jon was an active member of the International Gentoo community, almost since its founding in 1999. He was still active until his last day. His passing has struck us deeply and with disbelief. We all remember him as a vivid and enjoyable person, easy to reach out to and energetic in all his endeavors."

Comments (16 posted)

Ubuntu family

Ubuntu Yakkety Yak frozen

The Ubuntu "Yakkety Yak" release has gone into feature freeze. "This will let us create a solid and well-groomed Yak in October that we'll all want to take a ride on for the following nine months."

Full Story (comments: none)

Other distributions

New BlackArch Linux 2016.08.19 released

There is a new release of BlackArch Linux available. "BlackArch Linux is an Arch-based GNU/Linux distribution for pentesters and security researchers. The BlackArch package repository is compatible with existing Arch installs." Changes include more than 100 new tools and an update to the 4.7.1 kernel.

Full Story (comments: none)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Flock 2016 in Krakow – Recap (Fedora Magazine)

Fedora Magazine has a summary of the recently concluded Flock meeting in Krakow, Poland. "Over 200 developers and enthusiasts from different continents met to learn, present, debate, plan, and celebrate. Although Fedora is the innovation source for a major Red Hat product (Red Hat Enterprise Linux), this event received 'gold' level sponsorship from a sister community — openSUSE. openSUSE serves the same function for SuSE Linux Enterprise as Fedora does for RHEL. SUSE showed the fellowship that rules in the open source world, which is why we love it!"

Comments (none posted)

Page editor: Rebecca Sobol

Development

Key GNOME component updates

By Nathan Willis
August 24, 2016

GUADEC

GNOME is a massive project with a lot of moving pieces that are directed by separate individuals so, at times, it can be hard to keep up to date with where things are moving. At GUADEC 2016 in Karlsruhe, Germany, developers provided updates on the Flatpak packaging format, the GTK+ Scenegraph Kit (GSK) API, and a new logging system for GNOME applications.

Flatpak

Alexander Larsson presented an update on the progress within the Flatpak project over the past year, followed by a look at what comes next. In addition to "the great renaming," he said, the past year saw the project move from Freedesktop.org to GitHub, add several tools, and add lots of polish.

The new features include flatpak-builder, a program that builds Flatpak packages on the command line. Previously, he said, users had to write their own scripts using low-level APIs, which certainly hindered adoption. Support was also added for AppStream metadata, so Flatpak packages now provide user-level information (such as descriptions of the program). This allowed the GNOME Software installer tool to add support for managing Flatpak packages.

Package installations are now system-wide by default (although per-user packages are still supported). Permissions are now handled by PolicyKit, which makes system-wide installs more reliable. [Alexander Larsson] The first iteration of portals was added, enabling users to mediate an application's requests for access to potentially risky services. The project also released a stripped-down version of the basic container used in Flatpak packages, called bubblewrap, that can be used to isolate any program.

The Flatpak package format was also extended to support portable "single-file bundles" that can be sent via email or exchanged on USB sticks, with no system infrastructure required. The first Flatpak runtimes (separate packages that provide the system APIs a Flatpak package requires to run) were also released; a GNOME 3.20 runtime and a generic Freedesktop.org runtime have been produced so far. New architectures are now supported as well, including 32-bit and 64-bit ARM. Finally, several external adopters have started using Flatpak bundles, notably LibreOffice and Endless Mobile; KDE has also done some initial work toward building a compatible runtime.

Moving forward, Larsson said, portals will be an area of major work. Several new portals are needed, including one for access to DConf, one for sharing media (such as over social networks), one for accessing contacts, and several new device classes (including optical drives, USB sticks, and game controllers). More infrastructure work is needed to support some of the portals, however. Wayland is still missing pieces like cross-window processing (which makes clipboard sharing problematic), PulseAudio "is completely unsafe" at present, and the Pinos camera-access API lacks a viable permissions model.

Larsson added that he hopes to see the Flatpak work dispersed more throughout the GNOME project in the future, since, at present, it is almost a solo effort. The GNOME Release Team could maintain runtimes, he suggested, and individual application teams could build their own Flatpak packages and maintain the necessary build information and metadata. Further out, Larsson speculated that the project might need to consider "long-term support" runtimes that provide a minimal set of services but would be suitable for applications that are rarely updated (such as proprietary programs). It would also be nice, he said, to offer a personal package archive (PPA)-like site for Flatpak distribution and to support paid programs.

GSK

Emmanuele Bassi began his session by noting that "the GSK talk" had become a GUADEC staple, much like "the Clutter talk" was before it. GSK is Bassi's Clutter replacement, a drawing API that will allow application developers to create relatively free-form user interface elements by defining geometric primitives in hierarchical layers and transforming them using standard positioning, visibility, and animation operations.

But GSK is still in development, so, rather than cover the same ground as last year by discussing incremental progress, he chose instead to spend most of the allotted time recapping the history of how GTK+ has drawn pixels to the screen over its lifetime. The project is close to 20 years old, he said: the first release was in 1997, roughly when MP3s first hit the internet, Windows 95 dominated the OS marketplace, and the browser wars had not yet started.

At that time, "the apex of GUI toolkits was Motif," and GTK (with no "+") looked exactly like it. [Emmanuele Bassi] Each widget was backed by an X11 window, each had its own color map on the X server, and all drawing commands were sent over the wire. The 1.2 release in 1999 added the first theming support, but it was not until 2.0 in 2002 that GTK+ was "all grown up" and the project decided to "bring the light to other platforms" like Windows and Mac OS X. That meant reimplementing the X11 drawing API for those platforms, which is what led to GDK.

By the time GTK+ 2.6 came along, several new projects had been started, most notably Cairo and the XRender extension. GTK+ added support for them, replacing X11 calls with Cairo calls and moving to client-side window rendering—but support for the deprecated APIs was not removed. In 2011, GTK+ 3.0 was released, and OpenGL drawing was added—although it was not yet taking advantage of GPUs. GTK+ also added support for the CSS drawing model, which replaced the Cairo drawing model.

But the Cairo API has remained part of the 2D rendering stack, even though it is only optimal for a few specific pipelines, Bassi said. That is because it turns out that "OpenGL is bad at drawing GUI elements unless your elements look like Quake." Thus, GTK+ today continues to use Cairo "for what it's good at" while using OpenGL for what it is good at. And OpenGL has been getting better, he said; in the long run Cairo may eventually be dropped, but for the time being it will remain supported.

This approach is "the way out," he said; retaining both the Cairo and OpenGL APIs lets GTK+ make more efficient use of multiple cores, without forcing application developers to rewrite everything for a new drawing API. Moreover, maintaining support for an old API while adding a new one has always been GTK+'s approach: the same decision was made when adding GDK to X11, when adding Cairo to GDK, and when adding OpenGL to Cairo.

This history lesson, Bassi said, was meant to reassure developers that the current work on GSK would not disrupt their applications. He is still working hard on GSK itself, optimizing rendering on the GPU rather than the CPU. "So are we in 'the future' yet? We're pretty close," he concluded. GSK can do high-quality animations, 3D transformations, compositing and rendering of CSS primitives, and more. But it is important to remember that there will always be something new coming down the pipeline, he said.

GSK will not be available in the upcoming GTK+ 3.22 release, but it is likely to be merged for the subsequent version.

Structured logs

Philip Withnall discussed his recent work adding a structured logging facility to GLib. The library's existing logging interface is GLog, "which everyone has been using forever," he said. But GLog is rather limited, so he has been borrowing ideas about structured logging from journald and other projects in order to develop a superior replacement. The work is being done in upstream GLib, even though Withnall has been doing it on behalf of Collabora.

The upshot is that structured logging uses a set of "key: value" fields, rather than a simple string, for each log message. There are numerous benefits. In particular, structured logging allows each log message to have a unique message ID, enabling searches within the log and in the source code. There is a new writer function that replaces the old GLog log handlers, and the new GLib log API hooks into journald, so that it can tell whether journald is running and whether or not log messages are being directed to the journald socket.

The new API supports context pointers in log messages, which allows logging complex state information without the use of global variables. And the API also supports color console output, he added, "which I think is probably the most important change." [Philip Withnall] In light of the new features, he said, "we're encouraging people to log everything," then to do whatever filtering is necessary in the log viewer, rather than trying to fine-tune which messages are logged in advance or requiring users to set an "esoteric" log level via a command-line switch.

Withnall then recapped the shortcomings of GLog. In that system, each library would install its own log handlers to do "something different" with log messages from applications. How various log domains and log levels interacted was, essentially, undefined, and there was little that a developer could do to customize behavior. Some features could be tweaked in a log handler, but others (like how fatal errors were handled or when to abort on an error) were hard coded in GLib.

There was also some global state that various libraries on a system would fight over (such as which errors were regarded as fatal), leading to unpredictability. And there was no way for unit-testing code to test for complex sequences of messages; the GTest facility could watch for a specific sequence of messages, but it could not cope with optional messages or variation in the delivery order.

Next, he presented the new structured log format. The key-value pairs in a log message do not need to be formatted into a string immediately, so that task can be put off as long as is convenient. Keys can be namespaced to implement application-specific message features. Performance is potentially better than the old log API, he said, although it depends on how it is used. Applications that do string formatting on all of their log messages will, of course, pay the price for that. Without string formatting, there is a zero-copy path all the way to the socket output.

Porting existing programs to the new API is straightforward, he said. Developers merely need to include:

    #define G_LOG_USE_STRUCTURED
    #include <glib.h>

and add a g_message("foo") call in their code. This will pass through the message ("foo") to the structured logging facility, plus the file name, line number, and function name. Full control over the log message can be achieved with the g_log_structured() function, which lets the developer supply whatever key-value pairs are of interest.
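A fuller call might look like this sketch; the message ID and the custom field are arbitrary, application-chosen values, and server_name stands in for real application state:

    #define G_LOG_USE_STRUCTURED
    #include <glib.h>

    /* Fields other than MESSAGE and MESSAGE_ID are application-defined. */
    g_log_structured (G_LOG_DOMAIN, G_LOG_LEVEL_DEBUG,
                      "MESSAGE_ID", "4fc94c87b43e4df0b2f6de2891e3d9d9",
                      "MY_APP_STATE", "connecting",
                      "MESSAGE", "Connecting to %s", server_name);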

There is also a g_log_structured_array() function ("the turbo option") that can be used to pass in an entire array of structured log fields. Among the other changes is a new G_DEBUG_HERE macro that is expanded into the standard GLib debug-message call.

Each application can also now define its own log-writer function; this is a change from the old API, in which log handlers were implemented in libraries and application code had little to no flexibility over log message content or logging policy. There are default log-writer functions provided (one that simply logs to journald and one that sends output to stdout) that application developers can extend as required.
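
As a sketch of how an application might combine those two defaults (my_log_writer is a made-up name; the helper writers and g_log_set_writer_func() are part of the new API), a custom writer could prefer the journal and fall back to the console:

    static GLogWriterOutput
    my_log_writer (GLogLevelFlags log_level, const GLogField *fields,
                   gsize n_fields, gpointer user_data)
    {
        /* Prefer journald when it is available... */
        if (g_log_writer_journald (log_level, fields, n_fields,
                                   user_data) == G_LOG_WRITER_HANDLED)
            return G_LOG_WRITER_HANDLED;

        /* ...and fall back to stdout/stderr otherwise. */
        return g_log_writer_standard_streams (log_level, fields,
                                              n_fields, user_data);
    }

    /* Install the writer early in main(); it can only be set once
       per process. */
    g_log_set_writer_func (my_log_writer, NULL, NULL);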

There were a few questions from the audience at the end of the session. One person asked how message IDs are generated; Withnall replied that they can be any string that the developer wants—which prompted Christian Hergert to suggest that development tools like Builder could be configured to generate them automatically. Lennart Poettering asked if the facility could be extended to handle error codes; Withnall responded that it sounded like a feasible idea and promised to explore it.

[The author would like to thank the GNOME Foundation for travel assistance to attend GUADEC 2016.]

Comments (none posted)

Brief items

Quotes of the week

Occasionally I remember that all the technologies I work on rely on DNS, and I shed a tiny tear.
Chris Webber

I get more and more scared to start using Rust.... because it could just be "Persona'ed" or "FirefoxOS'ed"
Hubert Figuière

Comments (none posted)

KDE Applications 16.08.0 is available

Version 16.08 of the KDE Applications collection has been released. Several additional applications have been ported to KDE Frameworks 5, the Kontact Suite has undergone a significant clean-up effort and now offers improved VCard 4 support, and Marble 2.0 has been added, featuring experimental support for locally rendering vector map tiles.

Full Story (comments: none)

Microsoft announces PowerShell for Linux and Open Source

Microsoft has announced the release of its PowerShell automation and scripting platform under the MIT license, complete with a GitHub repository. "Last year we started down this path by contributing to a number of open source projects (e.g. OpenSSH) and open sourcing a number of our own components including DSC resources. We learned that working closely with the community, in the code and with our backlog and issues list, allowed us prioritize and drive the development much more responsively. We’ve always worked with the community but shifting to a fine-grain, tight, feedback loop with the code, energized the team and allowed us to focus on the things that had the most impact for our customers and partners. Now we are going big by making PowerShell itself an open source project and making it available on Mac OS X, Ubuntu, CentOS/RedHat and others in the future."

Comments (78 posted)

kdenlive 16.08.0 released

The kdenlive video editor project has announced the 16.08.0 release. "Kdenlive 16.08.0 marks a milestone in the project’s history bringing it a step closer to becoming a full-fledged professional tool." Highlights include three-point editing, pre-rendering of timeline effects, Krita image support, and more.

Comments (2 posted)

KDevelop 5.0 released

Version 5.0.0 of the KDevelop integrated development environment (IDE) has been released, marking the end of a two-year development cycle. The highlight is a move to Clang for C and C++ support: "The most prominent change certainly is the move away from our own, custom C++ analysis engine. Instead, C and C++ code analysis is now performed by clang." The announcement goes on to describe other benefits of using Clang, such as more accurate diagnostics and suggested fixes for many syntax errors. KDevelop has also been ported to KDE Frameworks 5 and Qt 5, which opens up the possibility of Windows releases down the line.

Comments (4 posted)

Introducing OpenStreetView

The OpenStreetMap (OSM) project has unveiled a new project named OpenStreetView that provides a "free and open street level imagery platform" compatible with OSM. A web interface and apps for Android and iOS are available; like OSM, the new project allows individual users to contribute imagery to the database. The mobile apps can also be tethered to the diagnostic ports of most cars (using a Bluetooth adapter), which enables the collection of finer-grained speed and position data for placing captured images on the map.

Comments (3 posted)

Newsletters and articles

Development newsletters from the past week

Comments (none posted)

Mozilla rebranding: Now for the fun part

Mozilla has launched an "open design" process to evaluate potential rebranding ideas with input and feedback from the public. Tim Murray has declared the process underway, noting "we’ve jumped off this cliff together, holding hands and bracing for the splash." Seven potential designs have been posted, and the project invites the public to "have a look at the seven options and tell us what you think." A variety of feedback mechanisms are provided.

Comments (2 posted)

Page editor: Nathan Willis

Announcements

Brief items

Xenomai project mourns Gilles Chanteperdrix

The Xenomai project is mourning Gilles Chanteperdrix, a longtime maintainer of the realtime framework, who recently passed away. In the announcement, Philippe Gerum writes: "Gilles will forever be remembered as a true-hearted man, a brilliant mind always scratching beneath the surface, looking for elegance in the driest topics, never jaded from such accomplishment. According to Paul Valéry, “death is a trick played by the inconceivable on the conceivable”. Gilles’s absence is inconceivable to me, I can only assume that for once, he just got rest from tirelessly helping all of us."

Comments (none posted)

Calls for Presentations

CFP Deadlines: August 25, 2016 to October 24, 2016

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline        Event dates         Event, location
August 31       November 12-13      PyCon Canada 2016, Toronto, Canada
August 31       October 31          PyCon Finland 2016, Helsinki, Finland
September 1     November 1-4        Linux Plumbers Conference, Santa Fe, NM, USA
September 1     November 14         The Third Workshop on the LLVM Compiler Infrastructure in HPC, Salt Lake City, UT, USA
September 5     November 17         NLUUG (Fall conference), Bunnik, The Netherlands
September 9     November 16-18      ApacheCon Europe, Seville, Spain
September 12    November 14-18      Tcl/Tk Conference, Houston, TX, USA
September 12    October 29-30       PyCon.de 2016, Munich, Germany
September 13    December 6          CHAR(16), New York, NY, USA
September 15    October 21-23       Software Freedom Kosovo 2016, Prishtina, Kosovo
September 25    November 4-6        FUDCon Phnom Penh, Phnom Penh, Cambodia
September 30    November 12-13      T-Dose, Eindhoven, Netherlands
September 30    December 3          NoSlidesConf, Bologna, Italy
September 30    November 5-6        OpenFest 2016, Sofia, Bulgaria
September 30    November 29-30      5th RISC-V Workshop, Mountain View, CA, USA
September 30    December 27-30      Chaos Communication Congress, Hamburg, Germany
October 1       October 22          2016 Columbus Code Camp, Columbus, OH, USA
October 19      November 19         eloop 2016, Stuttgart, Germany

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: August 25, 2016 to October 24, 2016

The following event listing is taken from the LWN.net Calendar.

Date(s)                  Event, location
August 24-26             KVM Forum 2016, Toronto, Canada
August 24-26             YAPC::Europe Cluj 2016, Cluj-Napoca, Romania
August 25-26             Xen Project Developer Summit, Toronto, Canada
August 25-26             Linux Security Summit 2016, Toronto, Canada
August 25-26             The Prometheus conference, Berlin, Germany
August 25-28             Linux Vacation / Eastern Europe 2016, Grodno, Belarus
August 27-September 2    Bornhack, Aakirkeby, Denmark
August 31-September 1    Hadoop Summit Melbourne, Melbourne, Australia
September 1-7            Nextcloud Conference, Berlin, Germany
September 1-8            QtCon 2016, Berlin, Germany
September 2-4            FSFE summit 2016, Berlin, Germany
September 7-9            LibreOffice Conference, Brno, Czech Republic
September 8              LLVM Cauldron, Hebden Bridge, UK
September 8-9            First OpenPGP conference, Cologne, Germany
September 9-10           RustConf 2016, Portland, OR, USA
September 9-11           GNU Tools Cauldron 2016, Hebden Bridge, UK
September 9-11           Kiwi PyCon 2016, Dunedin, New Zealand
September 9-15           ownCloud Contributors Conference, Berlin, Germany
September 13-16          PostgresOpen 2016, Dallas, TX, USA
September 15-17          REST Fest US 2016, Greenville, SC, USA
September 15-19          PyConUK 2016, Cardiff, UK
September 16-22          Nextcloud Conference, Berlin, Germany
September 19-23          Libre Application Summit, Portland, OR, USA
September 20-21          Lustre Administrator and Developer Workshop, Paris, France
September 20-22          Velocity NY, New York, NY, USA
September 20-23          PyCon JP 2016, Tokyo, Japan
September 21-23          X Developers Conference, Helsinki, Finland
September 22-23          European BSD Conference, Belgrade, Serbia
September 23-25          OpenStreetMap State of the Map 2016, Brussels, Belgium
September 23-25          PyCon India 2016, Delhi, India
September 26-27          Open Source Backup Conference, Cologne, Germany
September 26-28          Cloud Foundry Summit Europe, Frankfurt, Germany
September 27-29          OpenDaylight Summit, Seattle, WA, USA
September 28-30          Kernel Recipes 2016, Paris, France
September 28-October 1   systemd.conf 2016, Berlin, Germany
September 30-October 2   Hackers Congress Paralelní Polis, Prague, Czech Republic
October 1-2              openSUSE.Asia Summit, Yogyakarta, Indonesia
October 3-5              OpenMP Conference, Nara, Japan
October 4-6              LinuxCon Europe, Berlin, Germany
October 4-6              ContainerCon Europe, Berlin, Germany
October 5-7              International Workshop on OpenMP, Nara, Japan
October 5-7              Netdev 1.2, Tokyo, Japan
October 6-7              PyConZA 2016, Cape Town, South Africa
October 7-8              Ohio LinuxFest 2016, Columbus, OH, USA
October 8-9              Gentoo Miniconf 2016, Prague, Czech Republic
October 8-9              LinuxDays 2016, Prague, Czechia
October 10-11            GStreamer Conference, Berlin, Germany
October 11               Real-Time Summit 2016, Berlin, Germany
October 11-13            Embedded Linux Conference Europe, Berlin, Germany
October 12               Tracing Summit, Berlin, Germany
October 13               OpenWrt Summit, Berlin, Germany
October 13-14            Lua Workshop 2016, San Francisco, CA, USA
October 17-19            O'Reilly Open Source Convention, London, UK
October 18-20            Qt World Summit 2016, San Francisco, CA, USA
October 21-23            Software Freedom Kosovo 2016, Prishtina, Kosovo
October 22               2016 Columbus Code Camp, Columbus, OH, USA
October 22-23            Datenspuren 2016, Dresden, Germany

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds