LWN.net Weekly Edition for November 7, 2019
Welcome to the LWN.net Weekly Edition for November 7, 2019
This edition contains the following feature content:
- Digging for license information with FOSSology: an update on this venerable compliance tool.
- Filesystem sandboxing with eBPF: a stacking filesystem allowing unprivileged users to set access policies.
- Next steps for kernel workflow improvement: a report from a meeting held at OSS Europe.
- Identifying buggy patches with machine learning: AI technology has been used to find important kernel fixes; now can it be used to avoid the need for those fixes in the first place?
- Generalizing address-space isolation: separating address spaces in the kernel can increase security, but there is a cost.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Digging for license information with FOSSology
At Open Source Summit Europe 2019, Michael C. Jaeger and Maximilian Huber updated attendees on the FOSSology project, which is an open-source license-compliance tool. They introduced FOSSology and talked about how it can be used, but they also looked at the new features added in the last few releases. Beyond that, they presented some experiments the project has been doing with creating machine-learning models for license recognition.
FOSSology is a Linux Foundation (LF) project, Jaeger said, that started with code released by HP in 2008. It was initially a program for scanning Linux distributions for the licenses of the software they contained. The company had a lot of projects that used Linux and realized that it was scanning the same files over and over, so it came up with a server solution that would track the files that were scanned along with the licenses that were found.
![Michael C. Jaeger](https://static.lwn.net/images/2019/osseu-jaeger-sm.jpg)
For a number of years FOSSology was distributed and maintained by HP, until it became an LF project in 2015. It is easier for companies to collaborate on software in a project at an organization like the LF, he said; it makes for a safer harbor in which competitors can work together—in Germany, at least. He works for Siemens AG, which is a rather large German company.
Breaking up archive files into their constituent files—some of which may need to be unpacked themselves—then scanning the individual source and other files for their licenses is the basic task of FOSSology. It has a powerful license scanner, he said. Its web-based interface can then give an overview of the contents—which licenses apply to various parts of the tree, for example—and allow users to drill down into the file hierarchy to the individual files to see their copyrights and license-relevant text. When looking at the file, FOSSology highlights that license-relevant text and shows a comparison with the reference text of the license it has determined for the file.
Determining the license that applies to a file is challenging, however. Files have a wide variety of license-relevant text in them, some of which is ambiguous. It depends on the kind of source code you are working with, but the scanner is unable to decide on a license for up to 30% of files it sees, so it is up to a human reviewer to tag the right license. It is then important to also track what reviewers decide on files in the FOSSology database.
The Software Package Data Exchange (SPDX) format is used to describe various things in a package, including licensing information. FOSSology can both import and export SPDX information, which allows exchanging information between two FOSSology users to share analysis work. FOSSology is one of a few tools that can consume SPDX information; it can be used to review what another party has concluded about the licensing of a code base. In addition, when a package gets updated, the previous analysis can be used as a starting point; the new dependencies and other changes can be incorporated into that rather than starting from scratch.
Releases and new features
There have been two major releases so far this year: 3.5 in April and 3.6 in September. The 3.7 release is coming; the first release candidate came out at the end of October. It is important that users' large FOSSology databases are preserved in any upgrade, so the project is careful to ensure that works before a release is done. Some other license-scanning tools have an easier job preparing for a release, Jaeger said, since they do not have databases.
One of the new features this year is a REST API that will allow FOSSology to integrate with other tools. Over time, the plan is to add to that API, but on the basis of use cases. So anyone who has a use case for compliance automation that needs additional support from FOSSology should bring it to the project, he said.
Another new feature is the OJO agent for detecting SPDX license identifiers in scanned files. Its output can be considered with other findings on the source files; if none conflict with the license it found, that license can be determined to apply to the file.
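Such identifiers are single-line comments in a standardized form; the kernel's C files, for example, begin with lines like:

    // SPDX-License-Identifier: GPL-2.0

Matching that one line is far more reliable than recognizing pages of license boilerplate, which is presumably why a dedicated agent for the job is worthwhile.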
![Maximilian Huber](https://static.lwn.net/images/2019/osseu-huber-sm.jpg)
At that point, he turned things over to Huber, who started by digging a bit more deeply into the capabilities provided by the REST API. FOSSology provides a service, he said; you upload source code to it and then scan that code. You want to be able to automate that process, so that you can script things. There is also a Python library, fossdriver, for driving a FOSSology server; it can be used to do even more FOSSology operations than are supported by the REST API. Beyond that, FOSSology is made up of individual command-line scanners and other tools that can be used standalone in various ways.
The REST API is able to handle all four steps of the typical FOSSology workflow: prepare, scan, observe, and download. For the prepare step, there are API elements for listing and creating folders as well as for uploading packages to FOSSology into a folder. Scanning can be controlled via the API; scheduling and setting options for the scan are both supported. To observe the process, there is a way to list the running jobs and retrieve their status. Lastly, there are interfaces for downloading the reports in order to view them or to integrate their output into reporting from other tools. More information can be found in the "getting started" document or the REST API documentation.
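As a rough sketch of what driving the API looks like (the endpoint paths, headers, and field names here are assumptions drawn from the project's documentation rather than from the talk, and fossology.example.com stands in for a real instance):

    $ curl -H "Authorization: Bearer $TOKEN" -H "folderId: 1" \
           -F "fileInput=@package.tar.gz" \
           https://fossology.example.com/repo/api/v1/uploads
    $ curl -H "Authorization: Bearer $TOKEN" \
           https://fossology.example.com/repo/api/v1/jobs

The first command uploads a package into a folder (the prepare step); the second lists the jobs so that their status can be observed.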
Huber then gave a "short and probably boring" demo; "boring because it's so simple". He showed the web page of an instance of SW360, which is another open-source license-compliance tool, where he added a new entry for a package to its database. He hit the "magic button" that sent that information to FOSSology, which got the code and did the scan. He switched over to looking at the FOSSology web page to see that the scanning process had completed; when he went back to SW360, it had already downloaded the report and attached it to its database entry. All of that was done using the FOSSology REST API, Huber said.
Machine learning
In the last year, the project has been looking at how machine learning could be harnessed for license identification. Normally, license identification is done by way of regular expressions and rules that are created by hand. Instead, existing FOSSology databases could provide curated data for training a machine-learning model. That model could then be used to determine the licenses that applied to new code uploaded to the program.
The first step is to identify the features in the source files that will help lead to proper conclusions of which license applies. The source code is preprocessed to extract the comments, which are cleaned up and lemmatized. The training data consisted of around 1000 license texts and 4000 license statements.
The model that was generated still has a number of problems, he said. There are some licenses that are so similar they make it difficult for the model to distinguish between them. The next improvement that the project is working on is to have a multi-stage process; the first stage would simply determine if the file is a single-license file, multi-license file, or contains no license information at all. After that, there would be a stage that could determine the license family (e.g. GPL or MIT), then a stage to distinguish between the variants within a given family.
The data set being used for training is biased, however; most real FOSSology databases have roughly 28% of the files licensed under Apache 2.0, he said. Some licenses only appear a few times in the data set, which makes it difficult to train for them. That is a problem, but the multi-stage approach helps there too. The code to build a model is available on GitHub; it requires a FOSSology installation with a populated database. It is experimental at this point, so it is not distributed with FOSSology itself.
Concluding thoughts
Huber handed the microphone back to Jaeger to wrap up the presentation. He said that FOSSology participated in the Google Summer of Code (GSoC) for 2019; the project had three GSoC participants working on various projects. FOSSology has been working on integrating with three different open-source projects as well. Software Heritage is a repository of published software, while ClearlyDefined is a repository of metadata about published software. In both cases, FOSSology has plans to interact with them via their REST APIs. The third project is not as well known, he said. Atarashi takes a new approach in scanning for licenses. Instead of using regular expressions and rules, it uses text statistics and information-retrieval techniques.
Another initiative that the project has undertaken is FOSSology Slides, which is a site for gathering slides that can be used to talk and teach about FOSSology. They are all licensed under CC BY-SA 4.0 (as are the slides [PDF] from the OSS EU talk). They can be used as is, or adapted for other uses; he encouraged anyone to contribute their FOSSology slides as well. One nice outcome is that some Japanese FOSSology users translated slides from FOSSology Slides into Japanese and contributed them back, Jaeger said. Other translations would be welcome from those who want to contribute to the project but are not software developers.
A FOSSology user in the audience pointed out that the tool is only able to analyze the code it is given, so package dependencies have to be figured out separately. Jaeger agreed, noting that FOSSology is focused on understanding the licenses in the code it is given; there are other tools that can help figure out what the dependencies are and there are no plans to add that to FOSSology. He suggested the OSS Review Toolkit (ORT) as one possibility.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend Open Source Summit Europe in Lyon, France.]
Filesystem sandboxing with eBPF
Running untrusted code in a safe manner is generally the goal of sandboxing efforts. The sandbox technique presented by Georgia Tech PhD student Ashish Bijlani at Open Source Summit Europe 2019 is no exception. He has used something of a novel scheme to allow unprivileged code to implement the sandbox policies using BPF; the policies are then enforced by the kernel.
Background
There are lots of use cases for running untrusted third-party code without risking the contents of files on the system. Two that he mentioned were web-browser plugins obtained from potentially dodgy internet sites and machine-learning code that one might like to evaluate. There is a spectrum of code that can be run, from known-good code to known-bad code; in between those is unknown, untrusted code. Known-good code can be whitelisted and known-bad code can be blacklisted; sandboxing is a technique for the code in the middle. A sandbox is simply an isolated and controlled execution environment, he said.
![Ashish Bijlani](https://static.lwn.net/images/2019/osseu-bijlani-sm.jpg)
Bijlani is focused on a specific type of sandbox: a filesystem sandbox. The idea is to restrict access to sensitive data when running these untrusted programs. The rules would need to be dynamic as the restrictions might need to change based on the program being run. Some examples he gave were to restrict access to the ~/.ssh/id_rsa* files or to only allow access to files of a specific type (e.g. only *.pdf for a PDF reader).
He went through some of the existing solutions to show why they did not solve his problem, comparing them on five attributes: allowing dynamic policies, usable by unprivileged users, providing fine-grained control, meeting the security needs for running untrusted code, and avoiding excessive performance overhead. Unix discretionary access control (DAC)—file permissions, essentially—is available to unprivileged users, but fails most of the other measures. Most importantly, it does not suffice to keep untrusted code from accessing files owned by the user running the code. SELinux mandatory access control (MAC) does check most of the boxes (as can be seen in the talk slides [PDF]), but is not available to unprivileged users.
Namespaces (or chroot()) can be used to isolate filesystems and parts of filesystems, but cannot enforce security policies, he said. Using LD_PRELOAD to intercept calls to filesystem operations (e.g. open() or write()) is a way for unprivileged users to enforce dynamic policies, but it can be bypassed fairly easily. System calls can be invoked directly, rather than going through the library calls, or files can be mapped with mmap(), which will allow I/O to the files without making system calls. Similarly, ptrace() can be used, but it suffers from time-of-check-to-time-of-use (TOCTTOU) races, which would allow the security protections to be bypassed.
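To see why the LD_PRELOAD approach is so easily bypassed, it helps to look at what such an interceptor is: just a library function that happens to be found first. The following is a minimal, purely illustrative sketch; a real policy library would have to wrap many more entry points:

    /* shim.c: a sketch of LD_PRELOAD-based interception.
     * Build: gcc -shared -fPIC -o shim.so shim.c -ldl
     * Run:   LD_PRELOAD=./shim.so ./untrusted-program
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <string.h>

    int open(const char *path, int flags, ...)
    {
        static int (*real_open)(const char *, int, ...);
        mode_t mode = 0;

        if (!real_open)
            real_open = dlsym(RTLD_NEXT, "open");  /* next open() in link order */
        if (flags & O_CREAT) {                     /* fetch the optional mode */
            va_list ap;
            va_start(ap, flags);
            mode = va_arg(ap, mode_t);
            va_end(ap);
        }
        if (strstr(path, ".ssh")) {                /* crude deny policy */
            errno = EACCES;
            return -1;
        }
        return real_open(path, flags, mode);
    }

Any program that makes the system call directly, or that is statically linked, never enters this function at all; that is exactly the bypass described above.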
ptrace() also suffers from high performance overhead (roughly 50%), as does the final option that Bijlani outlined: Filesystem in Userspace (FUSE). A FUSE filesystem would check all of his boxes, but it suffers from nearly 80% performance overhead. He was looking for a solution that would only add 5-10% overhead, he said.
That is what he has created with SandFS. It is a stackable filesystem that can enforce unprivileged-user-specified policies on filesystem access. A user would invoke it this way:
    $ sandfs -s sandfs.o -d /home/user /bin/bash

The sandfs binary is unprivileged; it can be run by anyone. The example above would run bash within a sandbox for accesses to the /home/user directory. The sandbox is defined by sandfs.o, BPF bytecode compiled by LLVM from a policy written in C.
He talked a bit about BPF and how it can be used, calling BPF "a key enabling technology" for SandFS. BPF maps provide a mechanism to communicate between user space and BPF programs running in the kernel; they also have a major role to play for SandFS. More details on BPF can be found in this LWN article.
Architecture
He then turned to the architecture of SandFS; there are a few different components to it, starting with the SandFS daemon and SandFS library in user space. The daemon is what the sandfs binary talks to, and the library is available for those developing their own security policies. There is also a modified version of Wrapfs that is used to intercept the filesystem operations for the mounted filesystem. A set of SandFS BPF handlers is available in the kernel to implement the security checking for each of the filesystem operations intercepted by SandFS itself, which is the filesystem based on Wrapfs.
The basic operation is that the sandfs binary sends the BPF code to the daemon, which loads it into the kernel. If the BPF verifier does not find a problem with the code, the next step is to mount SandFS on the directory specified (/home/user in the example). Any filesystem operations will be intercepted by SandFS, which will call out to the BPF programs loaded from user space in order to get access decisions. SandFS itself does not perform I/O, it simply passes any operations that were allowed by the policies down to the lower-level filesystem (e.g. ext4 or XFS).
The policies can consult BPF maps, which can be written from user space; that allows for dynamic policies. The BPF programs passed in from user space may look things up in the maps, such as path names, to determine whether to allow access or not; it is even possible to alter parameters to the filesystem operations based on the policies (e.g. to make all open() calls read-only). SandFS handles kernel objects, rather than parameters directly passed by user space, so it avoids any TOCTTOU problems.
In the talk, he gave two examples of BPF programs that could be used to restrict access. The first would consult the BPF map for the path being used as part of the lookup() filesystem operation; if it found the path in the map, it would return -EACCES, thus providing a way for user space to restrict access to any part of the sandboxed directory. The second would look at the mode specified in open() operations, rejecting those with O_CREAT and changing the mode to O_RDONLY for the rest.
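SandFS's handler interface is not part of the mainline kernel, so the following can only be a schematic sketch of the first of those policies; the map layout, the context structure (sandfs_args), and the hook name are all assumptions based on the description above:

    /* Hypothetical SandFS lookup handler: deny any path that user space
     * has placed into the banned_paths map. */
    #include <linux/bpf.h>
    #include <errno.h>

    #define MAX_PATH 256

    /* Hash map populated from user space with paths to block. */
    struct bpf_map_def banned_paths = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = MAX_PATH,
        .value_size  = sizeof(__u32),
        .max_entries = 64,
    };

    int sandfs_lookup(struct sandfs_args *args)  /* context layout assumed */
    {
        /* bpf_map_lookup_elem() returns a pointer to the value, or NULL */
        if (bpf_map_lookup_elem(&banned_paths, args->path))
            return -EACCES;  /* blocked: the lower filesystem is never reached */
        return 0;            /* allowed: pass the operation through */
    }

Because the map can be updated from user space at any time, the policy stays dynamic without reloading the program.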
He then showed some performance numbers for a few different types of operations, comparing the time taken for them on ext4 versus SandFS. Creating a .tar.gz file of the 4.17 kernel showed the lowest overhead (4.57%, 61.05s vs. 63.84s). Decompressing and expanding the tar file had the most overhead (9.75%, 5.13s vs. 5.63s), while compiling the kernel (make -j 4 tinyconfig) came in at 9.28% (27.15s vs. 29.67s).
The SandFS framework could be used in a number of different ways, Bijlani said. It could restrict access to private user data such as SSH keys. It could also be used to compartmentalize certain operations of a complex application, such as a web browser; handling file and media formats could be put into separate sandboxed processes. Also, container-management systems could stack multiple layers of SandFS checks to harden the filesystem access from their containers.
He wrapped up the talk by noting that the SandFS code is available on GitHub. He has written an academic paper on it as well. In addition, he pointed to some related work that he presented at OSS North America in 2018 (slides [PDF]) and at the 2018 Linux Plumbers Conference (YouTube video).
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend Open Source Summit Europe in Lyon, France.]
Next steps for kernel workflow improvement
The kernel project's email-based development process is well established and has some strong defenders, but it is also showing its age. At the 2019 Kernel Maintainers Summit, it became clear that the kernel's processes are much in need of updating, and that the maintainers are beginning to understand that. It is one thing, though, to establish goals for an improved process; it is another to actually implement that process and convince developers to use it. At the 2019 Open Source Summit Europe, a group of 20 or so maintainers and developers met in the corner of a noisy exhibition hall to try to work out what some of the first steps in that direction might be.

The meeting was organized and led by Konstantin Ryabitsev, who is in charge of kernel.org (among other responsibilities) at the Linux Foundation (LF). Developing the kernel by emailing patches is suboptimal, he said, especially when it comes to dovetailing with continuous-integration (CI) processes, but it still works well for many kernel developers. Any new processes will have to coexist with the old, or they will not be adopted. There are, it seems, some resources at the LF that can be directed toward improving the kernel's development processes, especially if it is clear that this work is something that the community wants.
Attestation
Ryabitsev's first goal didn't feature strongly at the Maintainers Summit, but is an issue that he has been concerned about for some time: improving attestation for patches so that recipients can be sure of their provenance. Currently, there is no attestation at all, so recipients have to trust that patches come from the developer whose name appears on them. We all assume that maintainers are watching carefully and can catch spoofed emails, but the truth of the matter is that it is relatively easy to sneak malicious code past a maintainer. So an attacker could conceivably find a way to add a vulnerability to the kernel.
The first problem to solve is thus, according to Ryabitsev, to fix attestation. Linus Torvalds does verify the signed tags that are associated with pull requests, he said, so that part of the process is taken care of. But there are no signatures on actual patches, and no consensus on how they might be added.
His proposal is to introduce signatures on emailed patches as well. The mechanism used would be minisign, not GnuPG; one of the big advantages of minisign is that the attached signatures are much shorter than those created by GnuPG. Steve Rostedt interrupted at this point to question the value of this approach; he said that an attack, to be successful, would have to involve a relatively complex patch written in a style that mimics that of the purported author. It would be a big effort, he said; anybody with the resources to do that could also crack the encryption scheme used for attestation.
Ryabitsev responded, though, that minisign is "real cryptography" and not easy to crack; there are far easier ways to get bad code into the kernel than breaking the encryption. The hard part with this scheme, instead, is with identity tracking. GnuPG, like PGP before it, is based on the "web of trust" idea, but the web of trust has proved to be mostly unworkable over the years and people are giving up on it. Newer schemes tend to be based, like SSH, on a "trust on first use" (or TOFU) model, where a new key is trusted (and remembered) when it is first encountered, but changes in keys require close scrutiny. He suggested using a TOFU approach in an attestation mechanism for Git as well.
Rafael Wysocki was also skeptical, asserting that this scheme does not solve the problem; it only moves it elsewhere. An attacker could create an identity and build trust over time before submitting something malicious; the proposed scheme adds complexity but doesn't really fix anything, he said. Ryabitsev disagreed, though; building trust requires time and resources, but an attacker could spoof a trusted developer now.
Frank Rowand asked whether maintainers would be expected to strip signatures before committing patches. The signature, Ryabitsev answered, would go below the "---" line in the changelog, so it would be automatically stripped at commit time. But the key used would also be noted in a local database and verified the next time a patch shows up from the same developer. Rostedt pointed out that one-time submitters would not have a key in this database; Ryabitsev replied that, since those developers are not coming back, it doesn't really matter. This scheme is about trusting ongoing developers.
He would like minisign-based attestation to become part of Git; tools like git format-patch would just add it automatically. Rowand pointed out that a lot of developers use relatively old versions of Git, so it would take years to roll this capability out to everybody. He said that GnuPG should be used instead; developers have it and the kernel's web of trust already exists. But Ryabitsev said that GnuPG is a poor tool for signing patches; the attached signature is often larger than the patch itself, and list archival mechanisms tend to strip it out. To be truly useful, signatures on patches need to be unobtrusive.
Like much of what was discussed in this meeting, signature use would be opt-in, at least initially. Ryabitsev is thinking about writing a bot that would watch the mailing lists and gently suggest to developers who are not using signatures that they might want to start. He asked the group whether this scheme as a whole was a good idea and got almost universal agreement (Rowand being the exception). So he intends to try to get the needed support added to Git.
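Minisign itself is usable today, even if the Git integration is not; generating a key pair and signing a patch file (the file name here is just an example) looks like this:

    $ minisign -G                         # create a key pair
    $ minisign -Sm 0001-fix-widget.patch  # writes 0001-fix-widget.patch.minisig

The resulting signature is only a few lines of text, which is what would make embedding it below the "---" line of an emailed patch unobtrusive.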
Base-tree information
A common question asked of patch submitters is: "which tree was this made against?". That information is often needed to successfully apply a patch, and CI systems need it to be able to do automatic testing. But that "base-tree information" is not included with patches currently; fixing that is high on many developers' wish lists. Dmitry Vyukov asked whether it would be better to add this feature to Git and wait for it to be adopted, or to create a wrapper script that developers could use now. It turns out, though, that the --base option works in Git now; it's just a matter of getting submitters to use it. Vyukov agreed that this is the hardest part; he suggested creating a wrapper that would supply this option automatically.
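Concretely, a submitter need only add the option when generating the series:

    $ git format-patch --base=auto -3    # or --base=<commit-hash>

The first message in the series then carries a base-commit: trailer naming the commit the patches were built on; humans and CI systems alike can use it to apply the series reliably. (With --base=auto, Git computes the base from the branch's upstream.)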
There was a bit of a side discussion on whether Torvalds would complain about the base-tree information, as he does when tags like Change-id show up in patches. The problem, though, is not really the extra tag, it's the perceived uselessness of the information. If the base-tree information is useful, there should not be complaints.
It was pointed out that the base-tree information might not always be helpful to others; that base could be in a private tree, for example. At other times, though, it could be useful indeed. Rostedt pointed out that the "tip" tree used for x86 (and beyond) maintenance has a dozen or so branches in it; knowing which branch a patch applies to would be helpful. Everybody seemed to agree that this information should be there, and that the checkpatch.pl script should be made to check for it. There may eventually be a bot to nag developers who omit this information from their patches, but care would have to be taken to prevent it from generating too much noise.
Beyond email
For a number of reasons, requiring all kernel patches to be sent by email looks like a policy with a limited future. Switching to a "forge" service, along the lines of GitHub or GitLab, is an idea without universal appeal, though, especially in the short term. But there is desire for a solution that could let some developers move beyond email while maintaining the current workflow overall. The first step in that direction is likely to be some sort of Git-to-email bridge. Ryabitsev pointed out, though, that there is no consensus on what such a bridge might look like.
One option could be a special Git repository that developers could push to; any patch series pushed there would be turned into a series of emails and sent to the appropriate addresses. Ryabitsev does not like that idea, though; any such system would be a central point of failure that could go down at inopportune times. Another option would be some sort of web service that could be pointed at a public repository; once again, it would generate an email series and submit it. This solution falls down in another way, though: it is unable to support attestation. A third alternative is to create a command-line tool that can turn a pull request into an emailed series.
There are a number of hard problems to be solved here, he said, with many tradeoffs to be considered. But the easiest solution appears to be the command-line tool, perhaps integrated with a tool like GitGitGadget. There is also a tool under development at sourcehut that is worth a look. He might support such a tool by exposing an SMTP service specifically for mailing patches to kernel.org addresses.
That led to the concept of "feeds" — services that provide access to patches and more. The lore.kernel.org service has been running for a while now; it has quickly become an indispensable part of the kernel development process. Ryabitsev would, though, like to create something with similar functionality that does not need a mailing list behind it. Developers could use it to create their own patch feeds; CI systems could also export feeds describing the tests they have run and the results. Then it would be possible to, for example, automatically annotate patches with data on how they have been tested and by whom. Bots could use this information to figure out which tests they should run, avoiding those that have already been run elsewhere. Feeds would be archived and mirrored so they could be checked years in the future. Feeds would be able to support attestation, record Acked-by tags, and more.
But that still leaves the problem of actually creating all of this tooling and making it easy to use. Nobody is going to want all of these feeds in their inbox, so it will have to be possible to establish filters. Size also matters: lore.kernel.org currently requires about 200GB of disk space, which is a bit unwieldy to download to one's laptop. But lore contains a lot of ancient history that developers will not normally need, so the database could be much smaller.
Ryabitsev is currently working with the maintainer of public-inbox on the development of some of these tools. There is, he said, some development time that is available at the LF over the next six months; what should he aim to achieve in that time? Building something with Docker would be convenient for many, but the "old-school developers" don't want to deal with Docker. Should it be a command-line or web-based tool? Fans of command-line tools tend to be more vocal, but that does not mean that they are a majority.
Perhaps, he said, the way to start would be to make it easy to set up a local Patchwork instance. There was a wandering discussion on how subsystems with group maintainership could be supported, but that is not a problem that can be solved in the next six months, he said. Further discussion on how the tools should be developed was deferred to the kernel workflows mailing list.
As time ran out there was some quick discussion of CI systems, including GitLab, Gerrit, and more. The kernel clearly needs more CI testing, so Ryabitsev wants to be sure that it is all integrated into any new tooling. He would like to be able to provide a feed describing what each of these systems is doing. These forge systems mostly provide an API for event data now; what is needed is a set of translator bots that could pull those events together into a public-inbox feed for anybody who is interested. CI systems would be able to consume this data, and others could follow it without having to have an account on each CI system.
The emails sent by CI systems now are just noise to many recipients, he said; as more of these systems come online that problem will get worse. Creating a feed solves the problem by putting CI reports where only the people who want them have to see them. It is a hard thing to do well, he said, and he is not sure how his solution will work, but he wants to try. Email is a terrible way to integrate with systems that need structured data, so he's looking to replace the email message bus with a more structured, feed-based system.
The session broke up with a statement that, if the community asks for this kind of tooling, there is a real possibility that the LF will find a way to fund its development.
See also: Han-Wen Nienhuys's notes from the meeting.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to the event.]
Identifying buggy patches with machine learning
The stable kernel releases are meant to contain as many important fixes as possible; to that end, the stable maintainers have been making use of a machine-learning system to identify patches that should be considered for a stable update. This exercise has had some success but, at the 2019 Open Source Summit Europe, Sasha Levin asked whether this process could be improved further. Might it be possible for a machine-learning system to identify patches that create bugs and intercept them, so that the fixes never become necessary?

Any kernel patch that fixes a bug, Levin began, should include a tag marking it for the stable updates. Relying on that tag turns out to miss a lot of important fixes, though. About 3-4% of the mainline patch stream was being marked, but the number of patches that should be put into the stable releases is closer to 20% of the total. Rather than try to get developers to mark more patches, he developed his machine-learning system to identify fixes in the mainline patch stream automatically and queue them for manual review.
This system uses a number of heuristics, he said. If the changelog contains language like "fixes" or "causes a panic", it's likely to be an important fix. Shorter patches tend to be candidates. Another indicator is the addition of code like:
    if (x == NULL)
        return -ESOMETHING;
In the end, it does turn out to be possible to automatically identify a number of fixes. But if that can be done, could it be possible to use a similar system to find bugs? That turns out to be a harder problem. Levin complained that nobody includes text like "this commit has a bug" or "this will crash your server" in their changelogs — a complaint that Andrew Morton beat him to by many years. Just looking at code constructs can only catch the simplest bugs, and there are already static-analysis systems using that technique. So he needed to look for something else.
That "something else" turns out to be review and testing — or the lack
thereof. A lot can be learned by looking at the reviews that patches get.
Are there a lot of details in the review? Is there an indication that the
reviewer actually tried the code? Does it go beyond typographic errors?
Sentiment analysis can also be used to get a sense for how the reviewer
felt about the patch in general.
Not all reviewers are equal, so this system needs to qualify each reviewer. Over time, it is possible to conclude how likely it is that a patch reviewed by a given developer contains a bug. The role of the reviewer also matters; if the reviewer is a maintainer of — or frequent contributor to — the relevant subsystem, their review should carry more weight.
A system can look at how long any given patch has been present in linux-next, how many iterations it has been through, and what the "quality" of the conversation around it was. Output from automated testing systems has a place, but only to an extent; KernelCI is a more reliable tester for ARM patches, but the 0day system is better for x86 patches. Manual testing tends to be a good indicator of patch quality; if a patch indicates that it has been tested on thousands of machines in somebody's data center for months, it is relatively unlikely to contain a bug.
Then, one can also try to look at code quality, but that is hard to quantify. Looking at the number of problems found in the original posting of a patch might offer some useful information. But Levin is unsure about how much can be achieved in this area.
Once the data of interest has been identified, it is necessary to create a training set for the system. That is made up of a big pile of patches, of course, along with a flag saying whether each contains a bug or not. The Fixes tags in patches can help here, but not all bugs really matter for the purposes of this system; spelling fixes or theoretical races are not the sort of problem he is looking for. In the end, he took a simple approach, training the system on patches that were later reverted or which have a Fixes tag pointing to them.
That led to some interesting information about where and when bugs are introduced. He had thought that bugs would generally be added during the merge window, then fixed in the later -rc releases, but that turned out to be wrong. On a lines-of-code basis, a patch merged for one of the late -rc releases is three times more likely to introduce a bug than a merge-window patch.
Patches queued for the merge window, it seems, are relatively well tested. Those added late in the cycle, instead, are there to fix some other problem and generally get much less testing — or none at all. Levin said that things shouldn't be this way. There is no reason to rush fixes late in the development cycle; nobody runs mainline kernels in production anyway, so it is better to give those patches more testing, then push them into the stable updates when they are really ready. Developers should, he said, trust the system more and engage in less "late-rc wild-west stuff".
Levin complained to Linus Torvalds about this dynamic; Torvalds agreed with the explanation but said that the system was designed that way. Late-cycle problems tend to be more complex, so the fixes will also be more complex and more likely to create a new bug. Levin agreed that this is the case, but disagreed with the conclusion; he thinks that the community should be more strict with late-cycle patches.
Back to the machine-learning system, he said that he is currently using it to flag patches that need more careful review; that has enabled him to find a number of bugs in fixes that were destined for stable updates. Parts of this system are also being used to qualify patches for the stable releases. The goal of detecting buggy patches in general still seems rather distant, though.
Levin concluded with some thoughts on improving the kernel development process. The late-rc problem needs to be addressed; we know there is a problem there, he said, so we should do something about it. Testing of kernel patches needs to be improved across the board; the testing capability we have now is rather limited. More testing needs to happen on actual hardware to be truly useful. He would also like to see some standardization in the policy for the acceptance of patches, including how long they should be in linux-next, the signoffs and reviews needed, etc. These policies currently vary widely from one subsystem to the next, and some maintainers seem to not care much at all. That, he said, is not ideal and needs to be improved.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to the event.]
Generalizing address-space isolation
Linux systems have traditionally run with a single address space that is shared by user and kernel space. That changed with the advent of the Meltdown vulnerability, which forced the merging of kernel page-table isolation (KPTI) at the end of 2017. But, Mike Rapoport said during his 2019 Open Source Summit Europe talk, that may not be the end of the story for address-space isolation. There is a good case to be made for increasing the separation of address spaces, but implementing that may require some fundamental changes in how kernel memory management works.

Currently, Linux systems still use a single address space, at least when they are running in kernel mode. It is efficient and convenient to have everything visible, but there are security benefits to be had from splitting the address space apart. Memory that is not actually mapped is a lot harder for an attacker to get at. The first step in that direction was KPTI. It has performance costs, especially around transitions between user and kernel space, but there was no other option that would address the Meltdown problem. For many, that's all the address-space isolation they would like to see, but that hasn't stopped Rapoport from working to expand its use.
Beyond KPTI
Recently, he tried to extend this idea with system-call address-space isolation, which implemented a restricted address space for system calls. When a system call is invoked, most of the code and data space within the kernel is initially unmapped and inaccessible; any access to that space will generate a page fault. The kernel can then check to determine whether it thinks the access is safe; if so, the address in question is mapped, otherwise the calling process will be killed.
There are potentially a few use cases for this kind of protection, but the immediate objective was to defend against return-oriented programming (ROP) attacks. If the target of a jump matches a known symbol in the kernel, the jump can be considered safe and allowed to proceed; the page containing that address will be mapped for the duration of the call. ROP attacks work by jumping into code that is not associated with a kernel symbol, so most of them would be blocked by this mechanism. Mapping the occasional page for safe references will make some code available to ROP attacks again, but it will still be a fraction of the entire kernel text (which is available in current kernels).
These patches have not been merged, though, for a number of reasons. One is that nobody really knows how to check data accesses for safety; the known-symbol test used for text is not applicable to data. A system call with invalid parameters can still result in mapping a fair amount of code, making ROP gadgets available to an attacker. This patch also slowed execution considerably, which always makes acceptance harder.
The L1TF and MDS speculative-execution vulnerabilities bring some challenges of their own. In particular, they allow a host system to be attacked from guests, and are thus frightening to cloud providers. The defense, for now, is to disable hyperthreading, but that can have a significant performance cost. A better solution, Rapoport said, might be another form of address-space isolation; in this case, it would be a special kernel mapping used whenever control passes into the kernel from a guest system. This "KVM isolation" mechanism was posted by Alexandre Chartre in May, but has not been merged.
Other address-space isolation ideas are also circulating. One of these would be to map specific kernel data only for the process that needs to access it. That would be done by setting up a private range in the kernel page tables. Kernel code could allocate memory in this space with new functions like kmalloc_proclocal(). For extra isolation, memory allocated in this way would be removed from the kernel's "direct map", which is a linear mapping of all of the system's physical memory. Taking pages out of the direct map has its own performance issues, though, since it requires breaking up huge pages into small pages.
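The function name comes from the talk; its signature and everything around it in this sketch are invented for illustration:

    /* Sketch only: kmalloc_proclocal() is from the proposed patches and
     * its exact signature is an assumption; struct session_secret is a
     * made-up example. */
    struct session_secret *s = kmalloc_proclocal(sizeof(*s), GFP_KERNEL);

    if (!s)
        return -ENOMEM;
    /* s now lives in a process-private kernel mapping that has been
     * removed from the direct map; kernel code running on behalf of
     * other processes cannot dereference it. */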
Then, there are user-exclusive mappings — user-space mappings that are only visible to the owning process. These could be used to store secrets (cryptographic keys, for example) in memory where they could not be (easily) accessed from outside. Once again, this memory would be removed from the direct map; it would also not be swappable. The MAP_EXCLUSIVE patch series implementing this behavior was posted recently.
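From user space, the posted patches add a new mmap() flag; its use would presumably look like this sketch (MAP_EXCLUSIVE appears only in the patch series, not in any released UAPI header):

    #include <sys/mman.h>

    /* Sketch: request a mapping for key material that is visible only to
     * this process and is dropped from the kernel's direct map. */
    void *secret = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_EXCLUSIVE,
                        -1, 0);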
Finally, Rapoport also mentioned namespace-based isolation: kernel memory that is tied to a specific namespace and which is only visible to processes running within that namespace. This turns out to get tricky when dealing with network namespaces, though. The sk_buff structures used to represent packets would be obvious candidates for isolation, but they also often cross namespace boundaries.
Generalizing address-space isolation
While each of the address-space isolation mechanisms described above is different, there are some common factors between all of them. They are all concerned with creating a restricted address space from existing memory, then making this space available when entering the proper execution context. So Rapoport is working on an in-kernel API to support address-space isolation mechanisms in general. That is going to require some interesting changes, though.
The kernel's memory-management code currently assumes that the mm_struct structure attached to a process is equivalent to that process's page tables, but that connection will need to be broken. A new pg_table structure will need to be created to represent page tables; there will also be an associated API to manipulate these page tables. A particular challenge will be the creation of a mechanism that can safely free kernel page tables.
Creating the restricted contexts is, instead, relatively easy. Some, like KPTI, are set up at boot time; others will need to be established at the right time: process creation, association with a namespace, etc. The context-switch code will need to be able to switch between restricted address spaces; again, switching the kernel's page tables is likely to be tricky. There will need to be code to free these restricted address spaces as well, with appropriate care taken to avoid the inconvenience that would result from, say, freeing the main kernel page tables.
Once the infrastructure is in place, the kernel will need to gain support for private memory allocations. Functions like alloc_page() and kmalloc() will need to gain awareness of the context into which memory is being allocated; there will be a new __GFP_EXCLUSIVE flag to request an allocation into a restricted context. Once again, pages so allocated will need to be removed from the kernel's direct mapping (and return once they are freed). Extra care will need to be taken with objects that need to cross context boundaries.
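A subsystem would thus opt in at allocation time; a single hypothetical line conveys the idea (the flag is proposed, not merged):

    /* Allocate into the current restricted context rather than the
     * shared kernel address space (proposed __GFP_EXCLUSIVE flag). */
    struct page *page = alloc_page(GFP_KERNEL | __GFP_EXCLUSIVE);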
Finally, the slab caches will also need to be enhanced to support this behavior. Some of the necessary mechanism is already there in the form of the caching used by the memory controller. Slab memory is often freed from a context other than the one in which it was allocated, though, leading to a number of potential traps.
Rapoport concluded by stating that address-space isolation needs to be investigated; it offers a way of significantly reducing the kernel's attack surface, even in the presence of hardware bugs. Whether the security gained this way justifies the extra complexity required to implement it is something that will have to be evaluated as the patches take shape. Expect to see some interesting patches on the mailing lists in the near future as this work is developed.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to the event.]
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: RHEL 8.1; Git 2.24; Python 12-month releases; Quotes; ...
- Announcements: Newsletters; conferences; security updates; kernel patches; ...