
LWN.net Weekly Edition for April 8, 2021

Welcome to the LWN.net Weekly Edition for April 8, 2021

This edition contains the following feature content:

  • Resurrecting DWF: a reboot of the Distributed Weakness Filing project aims to assign vulnerability IDs without the "yoke of legacy CVE".
  • Scanning for secrets: how GitHub and GitLab look for credentials that have been mistakenly committed to public repositories.
  • The multi-generational LRU: a patch set that rethinks how the kernel decides which pages to reclaim.
  • The future of GCC plugins in the kernel: a discussion of whether the kernel's hardening plugins should eventually be replaced.
  • Killing off /dev/kmem: a venerable, and dangerous, kernel interface nears removal.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.


Resurrecting DWF

By Jake Edge
April 7, 2021

Five years ago, we looked at an effort to assist in the assignment of Common Vulnerabilities and Exposures (CVE) IDs, especially for open-source projects. Developers in the free-software world have often found it difficult to obtain CVE IDs for the vulnerabilities that they find. The Distributed Weakness Filing (DWF) project was meant to reduce the friction in the CVE-assignment process, but it never really got off the ground. In a blog post, Josh Bressers said that DWF was hampered by trying to follow the rules for CVEs. That has led to a plan to restart DWF, but this time without the "yoke of legacy CVE".

A bit of history

The CVE system was started in 1999 as a way to track and provide identification numbers for vulnerabilities. It is far easier and more robust to talk about CVE-xxxx-yyyy, rather than "that buffer overflow in the xyzzy project", especially when "that buffer overflow" might refer to any of several (or more) vulnerabilities. CVE IDs provide a specificity that was largely missing prior to their introduction; these days, of course, the highest profile bugs also have catchy names, web sites, logos, stickers, and branded shot glasses.

By the time DWF was introduced in 2016, there were lots of complaints about problems with the CVE system, though there was no real consensus on what should be done to fix it. There were a number of suggestions for ways to improve the CVE system; one relatively concrete approach was DWF, which was announced by Kurt Seifried:

I have increasingly noticed problems with Mitre's handling of the CVE database. This has come to a head now that I have multiple, confirmed, public reports of security researchers being unable to get CVE numbers assigned to them in a timely manner, if at all. As such the solution is simple:

We need a distributed, scale out method for assigning vulnerability identifiers that is as compatible with the existing CVE system as possible. Not just in terms of format but in terms of process and usage. As such I took on the task, creating the DWF system and getting a number of other people involved (Larry Cashdollar, Zachary Wikholm, Josh Bressers, etc.). My goal is to create a simple system for assigning vulnerability identifiers that relies on the community and not a single entity or organization. Additionally I want to reduce the time and effort needed to get identifiers, something best achieved by pushing assigning out to as close to the vulnerability discover/handling as possible.

The complaints eventually led the CVE maintainers to change the process for requesting (and receiving) a CVE ID. Before the change announcement, projects could request a CVE by posting to the oss-security mailing list; a CVE-request web form was set up for projects to use instead.

Even today, it is not hard to find examples of CVE requests going unanswered for two weeks or more. While that may be irritating and is definitely sub-optimal, Bressers points to a bigger problem in his post: CVEs are not keeping pace with the number of vulnerabilities. Based on data from his cve-analysis tools, the number of CVEs assigned was fairly flat for a few years until 2017, which is when the CVE-request web form was added:

We can see a noticeable uptick in 2017. For most of us paying attention this was what we were waiting for, the CVE project was finally going to start growing!

And then it didn't.

Since that spike in 2017, CVE assignments have been flat or even declining slightly, but no one would seriously argue that this reflects the number of actual vulnerabilities. For part of the explanation of why, Bressers pointed to a 2020 article that looked at the role played by CVE Numbering Authorities (CNAs), which are companies and other organizations that get blocks of CVE IDs that they can issue for vulnerabilities in their projects and products. The number of CNAs has expanded over the years, but slowly; by 2016, there were only 22 CNAs, while today there are 159.

The point in that article is that the CNA system lets a CNA decide what should get an ID. This is sort of how security worked in 1999 when security researchers were often treated as adversaries. This is no longer the case, yet researchers have no easy way to request and use their own CVE IDs. If you report a vulnerability to an organization, and they decide it doesn't get a CVE ID, it's very difficult to get one yourself. So many researchers just don't bother.

A DWF reboot

So, after a while, it became clear that the CVE situation was not getting any better, and that DWF was not solving those problems either. Bressers, Seifried, and others rethought things, which is what led to a new version of DWF:

The idea was to make CVE data work more like open source. We can call that DWF version 1. DWF version 1 didn't work out. There are many reasons for that, but the single biggest is that DWF tried to follow the legacy CVE rules which ended up strangling the project to death. Open source doesn't work when buried under artificial constraints. Open source communities need to be able to move around and [breathe] so they can thrive and grow.

The intent is for DWF to get to "100% automated CVE IDs", which has not quite happened, though you should "be able to get a CVE ID in less than ten seconds". There is a form at https://iwantacve.org, which requires signing in with a GitHub account; after that:

You enter a few details into a web form, and you get a candidate, or CAN ID. A human then double checks things, approves it, then the bot flips it to a DWF CVE ID assuming it looks good. Things generally look good because the form makes it easy to do the right thing. And this is just version one of the form! It will keep getting better. If you submit IDs a few times you will get added to the allowlist and just get a DWF CVE ID right away skipping the CAN step. Long term there won't be any humans involved because humans are slow, need to sleep, and get burnt out.

The project has carved off a piece of the CVE-ID namespace to issue its IDs, though that piece is far removed from the five-digit CVE IDs that the CNAs issue. The first identifier issued is CVE-2021-1000000, which was tracked via this issue in the DWF GitHub repository. There is some amount of controversy over the use of the normal CVE identifier format, but that is by design; the project thinks that the term "CVE" has become a generic term that simply means "vulnerability". Meanwhile, DWF believes that it makes sense to have all of the IDs for vulnerabilities live in the same namespace.

The CVE project seems to feel differently, as it put out a tweet disavowing CVE IDs that do not come from the CNAs. CVE board member Tod Beardsley also filed a pull request to change the identifiers to DWF-xxxx-yyyyy, which "will disambiguate vulnerability identifiers sourced from the DWF project from those produced by the federation of CVE Numbering Authorities, and avoid any confusion in downstream users of these identifiers". Seifried closed the request, quoting Bressers's comment on a separate suggestion for a different ID name:

CVE now means "vulnerability" in the same way a tissue is a kleenex. If we create a "new" naming scheme we end up with https://xkcd.com/927/ But if we reuse an existing naming scheme, there isn't an increase in identifiers names.

Bressers also noted that by choosing a range in the ID space starting at one million, there should be no logistical problems with numbering collisions: "possibly ever, but at least for decades".

The DWF project is broken up into three separate pieces, each with its own repository. The dwflist repository holds the JSON version of each CVE that has been issued. At the time of this writing, there are eight entries from several different reporters, though no new CVEs have been assigned since March 18.

The dwf-workflow repository is the place "where conversations are meant to be held". It is currently a GitHub repository, but that could change depending on where the community wants to take things. It has the FAQ for the project, along with some other documentation. In addition, some conversations do seem to be getting started in the issues. Bressers described the workflow repository this way:

Everything is meant to happen in public. This is the best place to start out if you want to help. Feel free to just jump in and create issues to ask questions or make suggestions.

Finally, there is the inevitable dwf-request code repository, which holds the Node.js code for the web form and a Python bot for actually assigning IDs based on the GitHub issues that get created from the form. "Neither is spectacular code. It's not meant to be, it's an open source project and will get better with time. It's good enough."

The intent of DWF is clear: to have a community-driven process for assigning IDs to vulnerabilities, rather than the largely corporate-driven process that exists today. The community being targeted explicitly includes security researchers, who may not be well-represented in the CVE project. It would also include those interested in ensuring that vulnerability tags are not being haphazardly applied based on the commercial interests of those doing the tagging—community Linux distributions, for example. Whether there is enough "push" from those groups to sustain an approach like DWF remains to be seen, however.


Scanning for secrets

By Jake Edge
April 7, 2021

Projects, even of the open-source variety, sometimes have secrets that need to be maintained. They can range from things like signing keys, which are (or should be) securely stored away from the project's code, to credentials and tokens for access to various web-based services, such as cloud-hosting services or the Python Package Index (PyPI). These credentials are sometimes needed by instances of the running code, and others benefit from being stored "near" the code, but they are not meant to be distributed outside of the project. They can sometimes mistakenly be added to a public repository, however, which is a slip that attackers are most definitely on the lookout for. The big repository-hosting services like GitHub and GitLab are well-placed to scan for these kinds of secrets being committed to project repositories—and they do.

Source-code repositories represent something of an attractive nuisance for storing this kind of information; project developers need the information close to hand and, obviously, the Git repository qualifies. But there are a few problems with that, of course. Those secrets are only meant to be used by the project itself, so publicizing them may violate the terms of service for a web service (e.g. Twitter or Google Maps) or, far worse, allow using the project's cloud infrastructure to mine cryptocurrency or allow anyone to publish code as if it came from the project itself. Also, once secrets get committed and pushed to the public repository, they become part of the immutable history of the repository. Undoing that is difficult and doesn't actually put the toothpaste back in the tube; anyone who cloned or pulled from the repository before it gets scrubbed still has the secret information.

Once a project recognizes that it has inadvertently released a secret via its source-code repository, it needs to have the issuer revoke the credential and, presumably, issue a new one. But there may be a lengthy window of time before the mistake is noticed; even if it is noticed quickly, it may take some time to get the issuer to revoke the secret. All of that is best avoided, if possible.

Over the years, there have been various problems that stemmed from credentials being committed to Git repositories and published on GitHub. An article from five years ago talks about a data breach at Uber using Amazon Web Services (AWS) credentials that were mistakenly committed to GitHub; a much larger, later breach used stolen credentials to access a private GitHub repository that had additional AWS tokens. The article also points to a Detectify blog post describing how the company found Slack tokens by scanning GitHub repositories; these kinds of problems go further back than that, of course. A 2019 paper [PDF] shows that the problem has not really abated, which is no real surprise.

GitHub has been scanning for secrets since 2015; it began by looking for its own OAuth tokens. In 2018, the company expanded its scanning to look for other types of tokens and credentials:

Since April, we’ve worked with cloud service providers in private beta to scan all changes to public repositories and public Gists for credentials (GitHub doesn’t scan private code). Each candidate credential is sent to the provider, including some basic metadata such as the repository name and the commit that introduced the credential. The provider can then validate the credential and decide if the credential should be revoked depending on the associated risks to the user or provider. Either way, the provider typically contacts the owner of the credential, letting them know what occurred and what action was taken.

These days, GitHub scans for a long list of secret types, which are enumerated in its secret-scanning documentation. When it finds matches in new commits (or in the history of newly added repositories), it contacts the credential issuer via an automated HTTP POST to an issuer-supplied URL; the issuer can check the validity of the secret and determine what actions to take. Those could include revocation, notification of the owner of the secret, and possibly issuing a replacement secret.

GitHub actively solicits service providers to join the program. In order to do so, they need to set up an endpoint to receive the HTTP POST and provide GitHub with a regular expression to be used to look for matches. To keep attackers from misusing the scanning-message URL, the messages sent to the endpoint are signed with a GitHub key that can (and should) be verified before processing the "secret revealed" POST.
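
As a rough illustration of the provider's side of this exchange, here is a minimal Python sketch of the verification step. The header names, payload fields, and key-distribution URL are assumptions based on GitHub's partner documentation rather than anything confirmed here, and the revocation and notification helpers are hypothetical stand-ins for provider-specific logic.

    # Sketch of a service provider's "secret revealed" endpoint logic.
    # ASSUMPTIONS: the header names, payload fields, and key-distribution URL
    # below follow GitHub's partner documentation as understood here; check the
    # current documentation before relying on any of them.
    import base64
    import json

    import requests                       # pip install requests cryptography
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import ec

    KEYS_URL = "https://api.github.com/meta/public_keys/secret_scanning"

    def verify_and_parse(raw_body: bytes, key_id: str, signature_b64: str) -> list:
        """Confirm the POST really came from GitHub, then parse the findings."""
        keys = requests.get(KEYS_URL, timeout=10).json()["public_keys"]
        pem = next(k["key"] for k in keys if k["key_identifier"] == key_id)
        public_key = serialization.load_pem_public_key(pem.encode())
        # Raises cryptography.exceptions.InvalidSignature for a forged message.
        public_key.verify(base64.b64decode(signature_b64), raw_body,
                          ec.ECDSA(hashes.SHA256()))
        return json.loads(raw_body)

    def revoke_token(token: str) -> None:             # hypothetical provider logic
        print("revoking " + token[:8] + "...")

    def notify_owner(token: str, url: str) -> None:   # hypothetical provider logic
        print("credential " + token[:8] + "... was seen at " + url)

    def handle_alert(raw_body: bytes, headers: dict) -> None:
        findings = verify_and_parse(raw_body,
                                    headers["Github-Public-Key-Identifier"],
                                    headers["Github-Public-Key-Signature"])
        for finding in findings:
            revoke_token(finding["token"])
            notify_owner(finding["token"], finding["url"])

The point of the signature check is that the provider never has to trust the payload blindly; a forged POST fails verification before any revocation or notification happens.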

A recent note on the GitHub blog announced the addition of PyPI tokens to the secret-scanning feature. GitHub and PyPI teamed up to help protect projects from these kinds of mistakes:

From today, GitHub will scan every commit to a public repository for exposed PyPI API tokens. We will forward any tokens we find to PyPI, who will automatically disable them and notify their owners. The end-to-end process takes just a few seconds.

GitHub is not alone; GitLab also added secret scanning as part of its 11.9 release in 2019. GitLab is an "open core" project, so much of the code that underlies it is open source, unlike the proprietary GitHub; the code for the "secret detection" feature is available in a GitLab repository. So far, at least, the fully open-source repository-hosting service SourceHut does not appear to have implemented a similar feature.

The GitLab scanner is based on the Gitleaks tool, which scans a Git repository for matches of the regular expressions stored in a TOML configuration file. It is written in Go and can be run in a number of different ways, including on local files that have not yet been committed. Doing so regularly could prevent secrets from ever being committed at all, of course.

The GitLab scanning documentation has a list of what kinds of secrets it looks for, which is shorter than GitHub's list, but does include some different types of secrets. GitLab's scanning looks for things like SSH and PGP private keys, passwords in URLs, and US Social Security numbers. The gitleaks.toml file shows a few more scanning targets that have not yet made it onto the list, including PyPI upload tokens.

It is better, of course, if secrets never actually make it into a repository at all. Second best is to catch them on the committer's system before the changes have been pushed to the central repository; it may be somewhat painful to do, but the offending commit(s) can be completely removed from the history at that point via a rebase operation. Either of those requires some kind of local scanning (perhaps with Gitleaks) that gets run as part of the development process. Having a backstop at the repository-hosting service, though, undoubtedly helps give projects some peace of mind.
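
Such a local check does not have to be elaborate. The following sketch, written in the spirit of Gitleaks but not derived from its rule set, examines only the lines being added by a staged commit; the regular expressions are illustrative assumptions, not an authoritative list.

    #!/usr/bin/env python3
    # Minimal pre-commit secret scan (sketch): look only at staged additions.
    # The patterns below are illustrative, not a complete rule set.
    import re
    import subprocess
    import sys

    PATTERNS = {
        "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
        "PyPI upload token": re.compile(r"pypi-[A-Za-z0-9_-]{20,}"),
        "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    }

    def staged_additions():
        """Return the added lines of the diff that is about to be committed."""
        diff = subprocess.run(["git", "diff", "--cached", "--unified=0"],
                              capture_output=True, text=True, check=True).stdout
        return [line[1:] for line in diff.splitlines()
                if line.startswith("+") and not line.startswith("+++")]

    def main():
        hits = [(name, line) for line in staged_additions()
                for name, regex in PATTERNS.items() if regex.search(line)]
        for name, line in hits:
            print("possible " + name + ": " + line.strip()[:72], file=sys.stderr)
        return 1 if hits else 0        # a non-zero exit aborts the commit

    if __name__ == "__main__":
        sys.exit(main())

Installed as .git/hooks/pre-commit and marked executable, something like this blocks the commit before a secret ever reaches a public repository, leaving the hosting service's scanning as a second line of defense rather than the only one.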


The multi-generational LRU

By Jonathan Corbet
April 2, 2021
One of the key tasks assigned to the memory-management subsystem is to optimize the system's use of the available memory; that means pushing out pages containing unused data so that they can be put to better use elsewhere. Predicting which pages will be accessed in the near future is a tricky task, and the kernel has evolved a number of mechanisms designed to improve its chances of guessing right. But the kernel not only often gets it wrong, it also can expend a lot of CPU time to make the incorrect choice. The multi-generational LRU patch set posted by Yu Zhao is an attempt to improve that situation.

In general, the kernel cannot know which pages will be accessed in the near future, so it must rely on the next-best indicator: the set of pages that have been used recently. Chances are that pages that have been accessed in the recent past will be useful again in the future, but there are exceptions. Consider, for example, an application that is reading sequentially through a file. Each page of the file will be put into the page cache as it is read, but the application will never need it again; in this case, recent access is not a sign that the page will be used again soon.

The kernel tracks pages using a pair of least-recently-used (LRU) lists. Pages that have been recently accessed are kept on the "active" list, with just-accessed pages put at the head of the list. Pages are taken off the tail of the list if they have not been accessed recently and placed at the head of the "inactive" list. That list is a sort of purgatory; if some process accesses a page on the inactive list, it will be promoted back to the active list. Some pages, like those from the sequentially read file described above, start life on the inactive list, meaning they will be reclaimed relatively quickly if there is no further need for them.
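
The following toy model (plain Python, not kernel code) captures the shape of that two-list scheme: pages start on the inactive list, a reference there promotes them to the active list, and reclaim takes whichever inactive page has been cold the longest.

    # Toy model of the current two-list scheme; a simplification for
    # illustration, not a transcription of the kernel's code.
    from collections import OrderedDict

    class TwoListLRU:
        def __init__(self):
            self.active = OrderedDict()      # most recently used at the end
            self.inactive = OrderedDict()

        def add_page(self, page):
            self.inactive[page] = True       # e.g. pages from a sequential read

        def touch(self, page):
            if page in self.inactive:        # referenced while in purgatory:
                del self.inactive[page]      # promote back to the active list
                self.active[page] = True
            elif page in self.active:
                self.active.move_to_end(page)

        def age(self):
            # Demote the coldest active page toward eventual reclaim.
            if self.active:
                page, _ = self.active.popitem(last=False)
                self.inactive[page] = True

        def reclaim(self):
            # Evict the inactive page that has gone longest without a reference.
            return self.inactive.popitem(last=False)[0] if self.inactive else None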

There are more details, of course. It's worth noting that there are actually two pairs of lists, one for anonymous pages and one for file-backed pages. If memory control groups are in use, there is a whole set of LRU lists for each active group.

Zhao's patch set identifies a number of problems with the current state of affairs. The active/inactive sorting is too coarse for accurate decision making, and pages often end up on the wrong lists anyway. The use of independent lists in control groups makes it hard for the kernel to compare the relative age of pages across groups. The kernel has a longstanding bias toward evicting file-backed pages for a number of reasons, which can cause useful file-backed pages to be tossed while idle anonymous pages remain in memory. This problem has gotten worse in cloud-computing environments, where clients have relatively little local storage and, thus, relatively few file-backed pages in the first place. Meanwhile, the scanning of anonymous pages is expensive, partly because it uses a complex reverse-mapping mechanism that does not perform well when a lot of scanning must be done.

Closing the generation gap

The multi-generational LRU patches try to address these problems with two fundamental changes:

  • Add more LRU lists to cover a range of page ages between the current active and inactive lists; these lists are called "generations".
  • Change the way page scanning is done to reduce its overhead.

Newly activated pages are assigned to the youngest generation (though there are some exceptions described below). Over time, the memory-management subsystem will scan over a process's pages to determine whether each has been used since the last scan; any that have remained idle are moved to the next older generation. Pages of any generation that show activity are moved back to the youngest generation.

The result of this work is a spectrum of page ages, from those quite recently accessed to those that have not been used in some time. The number of generations can be configured into the kernel; that number seems to range from as few as four for phones to several times that for cloud-based servers.
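
The same kind of toy model, extended to generations, shows the intended behavior; it is a sketch of the idea only, since the real patches track per-PTE accessed bits, keep separate generations for anonymous and file-backed pages, and record when each generation was created.

    # Toy model of multi-generational aging: a periodic scan moves idle pages
    # one generation older, referenced pages return to the youngest generation,
    # and reclaim only ever looks at the oldest generation.
    NR_GENS = 4                              # as few as four on phones

    class MultiGenLRU:
        def __init__(self):
            self.generations = [set() for _ in range(NR_GENS)]   # 0 = youngest
            self.referenced = set()          # stand-in for the PTE accessed bits

        def fault_in(self, page):
            self.generations[0].add(page)    # new pages join the youngest generation

        def touch(self, page):
            self.referenced.add(page)        # set by the hardware in real life

        def scan(self):
            # Walk from oldest to youngest so a page ages at most one step per scan.
            for gen in range(NR_GENS - 1, -1, -1):
                for page in list(self.generations[gen]):
                    if page in self.referenced:
                        target = 0                     # active: back to youngest
                    elif gen < NR_GENS - 1:
                        target = gen + 1               # idle: one generation older
                    else:
                        continue                       # already oldest; stays put
                    self.generations[gen].discard(page)
                    self.generations[target].add(page)
            self.referenced.clear()

        def reclaim(self, nr_pages):
            # Only the oldest generation is considered for eviction.
            oldest = self.generations[-1]
            return [oldest.pop() for _ in range(min(nr_pages, len(oldest)))]

Because every page carries a generation number in this model, comparing the relative age of any two pages is a simple subtraction, which is exactly the kind of comparison that, as described below, is lost between control groups in current kernels.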

When the time comes to reclaim pages, only the oldest generation need be considered. The "oldest generation" can be different for anonymous and file-backed pages; anonymous pages can be harder to reclaim in general (they must always be written to swap) and the new code retains some of the bias toward reclaiming file-backed pages more aggressively. So file-backed pages may not escape reclaim for as many generations as anonymous pages do. The current patch only allows reclaim of file-backed pages to get one generation ahead of that for anonymous pages, though.

The multi-generational mechanism, it is claimed, is more accurate than the current two-list approach; by the time a page makes it to the oldest generation, its chances of being unneeded are rather higher than they are for pages on the inactive list. That, in turn, means that these pages can be reclaimed more aggressively, making more memory available for tasks that will actually make use of it. This mechanism allows for ready comparison of the ages of anonymous and file-backed pages, and, by tracking the creation time of each generation, of the ages of pages in different control groups; this information is lost in current kernels. That, in turn, makes it easier to identify and reclaim idle anonymous pages.

The other claimed advantage is in the change to how pages are scanned. Pages are accessed via the page-table entries (PTEs) in every process that has them mapped; the "recently accessed" bit lives in those page-table entries. Current kernels, though, scan through the pages themselves, and must use reverse-mapping to find and check the associated PTEs; that is expensive. The multi-generational LRU code, instead, scans over PTEs directly, an approach with better locality. A hook in the scheduler helps to track processes that have actually run since the last scan, so idle processes can be skipped.

The multi-generational LRU also benefits from skipping many of the heuristics that are used in current kernels to decide which pages should be reclaimed. There are still a few, though. For example, when a page is first established, its generation is picked with these rules:

  • Pages that are being faulted in are assigned to the youngest generation, as one would expect.
  • Pages that are activated while unmapped (pages resident in memory but with no PTEs pointing to them; these can include pages chosen for reclaim but not actually reclaimed before being referenced again) are added to the second-youngest generation. This is seemingly done to avoid making the youngest generation look too big, which might delay further page scanning until the next generation can be created.
  • Pages that are being reclaimed, but which must persist while their contents are written to backing store, are added to the second-oldest generation. That prevents another attempt to reclaim them while the writeback is underway.
  • Pages that are being deactivated go into the oldest generation. That is also the fate of pages that were brought in by the readahead mechanism; reading those pages is a speculative act on the kernel's part in the first place, with no guarantee that they will ever be useful.

There are a few knobs exported to user space to control this mechanism, including the ability to turn the multi-generational code off entirely; see this documentation patch for more information.

Generational change

The end result of all this work, it is claimed, is that page reclaim is much more efficient and better targeted than before. Systems like Android, when using this code, record fewer low-memory kills (when an app process is killed due to memory pressure), Chrome OS shows fewer out-of-memory kills, and server systems are better able to use available memory. It looks like an improvement all around.

Given that, one might wonder why the multi-generational algorithm is kept separate from the rest of the memory-management code and is made optional. It is, in essence, an independent approach to page aging and reclaim that exists alongside the current LRU lists. The answer, presumably, is that there are a lot of workloads out there, and some of them may not benefit from the multi-generational approach. There will need to be a lot more testing done to learn where the multi-generational LRU falls down and what might need to be done to keep that from happening.

The multi-generational LRU might eventually win over the memory-management developers, most of whom have not yet commented on this patch set. It does seem likely, though, that it will need to demonstrate better performance (or at least a lack of performance regressions) across the bulk of the workloads out there, to the point that it could be considered as a replacement for the current LRU rather than an addition to it. The idea of maintaining two separate LRU schemes is going to be a hard sell in the kernel community; it would be far better to just switch over completely to the multi-generational LRU if it is truly better.

Answering that question is certain to be a long process. Even relatively small memory-management changes can take a while to merge; it is just too easy to introduce performance penalties for some users. This change is not "relatively small", so the bar for inclusion will be high. But if the multi-generational LRU lives up to its claims, it may just be able to clear that bar — eventually.


The future of GCC plugins in the kernel

April 1, 2021

This article was contributed by Marta Rybczyńska

The process of hardening the kernel can benefit in a number of ways from support by the compiler. In recent years, the Kernel Self Protection Project has brought this support from the grsecurity/PaX patch set into the kernel in the form of GCC plugins; LWN looked into that process back in 2017. A recent discussion has highlighted the fact that the use of GCC plugins brings disadvantages as well, and some developers would prefer to see those plugins replaced.

The discussion started when Josh Poimboeuf reported an issue he encountered when building out-of-tree modules with GCC plugins enabled. In his case, the compilation would fail when the GCC version used to compile the module was even slightly different from the one used to build the kernel. He included a patch to change the error he received into a warning and disable the affected plugin. Later in the thread, Justin Forbes explained how the problematic configuration came about; it happens within the Fedora continuous-integration system, which starts by building a current toolchain snapshot. Other jobs then compile out-of-tree modules with the new toolchain, without recompiling the kernel itself. Since GCC plugins were enabled, all jobs with out-of-tree modules have been failing.

The idea of changing the error into a warning was met with a negative response from the kernel build-system maintainer, Masahiro Yamada, who stated: "We are based on the assumption that we use the same compiler for in-tree and out-of-tree". Poimboeuf responded that what he sees in real-world configurations doesn't match that assumption. Other kernel developers agreed with Yamada, though; Greg Kroah-Hartman wrote:

Have you not noticed include/linux/compiler.h and all of the different changes/workarounds we do for different versions of gcc/clang/intel compilers? We have never guaranteed that a kernel module would work that was built with a different compiler than the main kernel, and I doubt we can start now.

In addition, Yamada pointed out that the use of the same compiler version for both the kernel and its modules has been accepted as an assumption in previous discussions. With clear disapproval from the kernel developers, the discussion seemed closed at that point.

The dislike for GCC plugins

It restarted, however, when Poimboeuf came back a few days later with another idea for solving his problem: recompiling all plugins when the GCC version changes. This was refused by Yamada, who noted that Ubuntu does not have the GCC mismatch problem, so the problem seemed to be specific to Fedora. Linus Torvalds also disagreed with the proposal, but for another reason. For him, there is no technical reason to recompile everything when the GCC version changes, but he expressed his concern about the usage and design of the GCC plugins in general. In a followup message he explained his reasoning in strong words:

The kernel gcc plugins _will_ go away eventually. They are an unmitigated disaster. They always have been. I'm sorry I ever merged that support. It's not only a maintenance nightmare, it's just a horrible thing and interface in the first place. It's literally BAD TECHNOLOGY.

For Torvalds, the right way to implement such plugins is at the intermediate representation (IR) level, but GCC plugins were designed differently for non-technical reasons (out of fear of non-free plugins, which LWN covered back in 2008). People who are interested in plugins should use Clang, as it has a clean IR and easily allows adding similar checks at the IR level, he said.

GCC plugins and their Clang equivalents

However, the removal of the kernel's GCC plugins does not seem likely in the near future. Kees Cook commented on the current status of the GCC plugins, their Clang equivalents, and why there is a user community for at least some of them. A number of the capabilities provided by the GCC plugins are not yet available with Clang — which many distributors are not using to build the kernel anyway.

Currently the kernel supports the following plugins (located in scripts/gcc-plugins/):

  • cyc_complexity computes the cyclomatic complexity of a function; it is one of the two initial example plugins, and likely has no users.
  • latent_entropy adds entropy from CPU execution. Cook sees no uses of it, especially since the addition of the jitter entropy mechanism. There is no Clang support planned.
  • The per-task stack protector for arm32 provides stack protection for 32-bit ARM platforms; no Clang equivalent exists today even for 64-bit systems, Cook said.
  • randstruct randomly changes the order of fields in kernel data structures that contain only function pointers, or are explicitly marked with __randomize_layout. There are two versions of this plugin: one complete and one restricted. The restricted version only changes the order of elements contained within the same cache line, which reduces the performance cost, but also the protection level. A Clang version was submitted, but is stalled. Cook noted that security-conscious end users tend to enable this plugin, but distributors do not.
  • sancov (which Cook didn't mention) helps fuzzing coverage by inserting a call to __sanitizer_cov_trace_pc() at the start of each basic block; it is used to determine which code blocks are being exercised.
  • stackleak traces the kernel's stack depth so that it can overwrite the used stack with a pattern when returning to user space. There is no Clang support planned for now.
  • structleak initializes structures that could be passed to user space. Clang has it implemented as the -ftrivial-auto-var-init=zero option; GCC is likely to gain support for that option as well at some point.

The end result is that there is probably a reason to keep these plugins around for a while yet.

Meanwhile, there were a couple of positive outcomes from the discussion. Along the way, it was realized that the plugins, which are highly sensitive to the GCC version they were built for, were not being rebuilt when that version changes. That had evidently been the case since the plugins were first added; that problem was fixed by Yamada, despite his rejection of this idea earlier in the discussion. As a solution for Poimboeuf's original problem, the developers finally agreed to show a warning when there is a GCC version mismatch between the kernel and modules. It will be up to the user to decide if the difference is minor and safe, or if it is necessary to recompile the kernel.

The problem of the GCC version mismatch was not the only one noticed by Poimboeuf; he also pointed out the plugin build-system's dependency on the (optional) gcc-plugin-devel package. Even if the user has the same GCC version that was used for the kernel compilation, the plugins will be silently disabled if this package is not installed; the kernel compilation will still succeed without any warning. This problem has not been addressed further.

Conclusions

The discussion covered a number of problems with the GCC plugins; it likely means that developers should be careful when enabling them. Poimboeuf's original problem got a solution of sorts in the form of a warning, which might start showing up on some systems; users may be able to ignore it if the two GCC versions are close. When enabling plugins, developers should also be careful to install gcc-plugin-devel first; otherwise, their modules may be compiled in an unexpected way.

The future of GCC plugins in the kernel is not set in stone yet. Clang seems to be a preferred option for the hardening work, and this direction has been encouraged by Torvalds, but the existing GCC plugins (with one exception) do not have Clang equivalents. It seems that they will stay for at least some time.


Killing off /dev/kmem

By Jonathan Corbet
April 5, 2021
The recent proposal from David Hildenbrand to remove support for the /dev/kmem special file has not sparked a lot of discussion. Perhaps that is because today's youngsters, lacking an understanding of history, may be wondering what that file is in the first place and, thus, be unclear on why it may matter. Chances are that /dev/kmem will not be missed, but in passing it takes away a venerable part of the Unix kernel interface.

/dev/kmem provides access to the kernel's address space; it can be read from or written to like an ordinary file, or mapped into a process's address space. Needless to say, there are some mild security implications arising from providing that sort of access; even read access to this file is generally enough to expose credentials and allow an attacker to take over a system. As a result, protections on /dev/kmem have always tended to be restrictive, but it remains the sort of open back door into the kernel that makes anybody who worries about security worry even more.

It is a rare Linux system that enables /dev/kmem now. As of the 2.6.26 kernel release in July 2008, the kernel only implements this special file if the CONFIG_DEVKMEM configuration option is enabled. One will have to look long and hard for a distributor that enables this option in 2021; most of them disabled it many years ago. So its disappearance from the kernel is unlikely to create much discomfort.
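
Checking whether a particular kernel was built with that option is easy enough; here is a small sketch, with the caveat that the configuration-file locations it uses are common conventions rather than guarantees.

    # Report whether the running kernel appears to have CONFIG_DEVKMEM enabled.
    # /boot/config-<release> and /proc/config.gz are common, but not universal,
    # places to find the kernel's build configuration.
    import gzip
    import os
    import platform

    def devkmem_enabled():
        candidates = ["/boot/config-" + platform.release(), "/proc/config.gz"]
        for path in candidates:
            if not os.path.exists(path):
                continue
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as config:
                return any(line.strip() == "CONFIG_DEVKMEM=y" for line in config)
        return None                          # no kernel configuration found

    print(devkmem_enabled())                 # False (or None) on most systems today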

It's worth noting that Linux systems still support /dev/mem (without the "k"), which once provided similar access to all of the memory in the system. It has long been restricted to I/O memory; system RAM is off limits. The occasional user-space device driver still needs /dev/mem to function, but it's otherwise unused.

One may well wonder why a dangerous interface like /dev/kmem existed in the first place. The kernel goes out of its way to hide its memory from the rest of the system; creating a special file to circumvent that hiding seems like a step in the wrong direction. The answer, in short, is that once there was no other way to get many types of information out of the kernel.

As an example, consider the "load average" numbers printed by tools like top, uptime, or w; they indicate the average length of the CPU run queues over periods of one, five, and 15 minutes. In the distant past, when computers were scarce and it was common to run many tasks on the same machine, jobs that were not time-critical would often consult the load average and defer their work if it was too high. It was the sort of courtesy that was much appreciated by the other users of the machine, of which there may have been dozens.

But how does one determine the current load average? Unix kernels have maintained those statistics for decades, but they originally kept that information to themselves. User-space code that wanted to know this number would have to do the following:

  • Read the symbol table from the executable image of the current kernel to determine the location of the avenrun array.
  • Open /dev/kmem and seek to that location.
  • Read the avenrun array into a user-space buffer.

Code from that era can be hard to find, but the truly masochistic can wade through what must be one of the deeper circles of #ifdef hell to find an implementation toward the bottom of this version of getloadavg() from an early GNU make release. In a current Linux system, instead, all that is needed is to read a line from /proc/loadavg.
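
For comparison, this is everything the modern interface requires; the first three whitespace-separated fields of /proc/loadavg are the one-, five-, and 15-minute averages, and Python's standard library wraps the same information as os.getloadavg().

    # Read the load averages the modern way: one line from a well-defined /proc
    # file, with no kernel symbol tables or /dev/kmem involved.
    def load_averages():
        with open("/proc/loadavg") as loadavg:
            one, five, fifteen = loadavg.read().split()[:3]
        return float(one), float(five), float(fifteen)

    print(load_averages())                   # e.g. (0.42, 0.31, 0.27)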

This kind of grubbing around in kernel memory was not limited to the load-average array. Tools with more complex information requirements also had to dig around in /dev/kmem; see, for example, the 2.9BSD implementation of ps. That was just how things were done in those days.

Rooting through the kernel's memory for information about the system has a number of problems beyond the need to implement /dev/kmem. Changes to the kernel could break user space in surprising ways. Multiple reads were often needed to get a complete picture, but that picture could change while the reads were taking place, leading to more surprises. The move away from /dev/kmem and toward well-defined kernel interfaces, such as /proc, sysfs, and various system calls, has cleaned this situation up — and made it possible to disable /dev/kmem.

Now, it seems that /dev/kmem will go away entirely. Linus Torvalds said that he would "happily do this for the next merge window", but he wanted confirmation that distributors are, indeed, not enabling it now. There have been a few responses for specific distributions, but nobody has said that /dev/kmem is still in use anywhere. If there are users of this interface out there, they will want to make their existence known in the near future. Failing that, this back door into kernel memory will soon be removed entirely — but, then, your editor once predicted that it would be removed for 2.6.14, so one never knows.


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: Security things in 5.9; LineageOS 18.1; Django 3.2; Xinuos sues IBM; US Supreme Court rules for Google; Quotes; ...
  • Announcements: Newsletters; conferences; security updates; kernel patches; ...

Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds