Leading items
Welcome to the LWN.net Weekly Edition for September 27, 2018
This edition contains the following feature content:
- The kernel's code of conduct, one week later: some of the questions around the new code of conduct are coming into focus; answers may take longer.
- Archiving web sites: a survey of tools for making copies of complex sites.
- Progress on Zinc (thus WireGuard): getting an important WireGuard prerequisite merged.
- Time namespaces: a new type of namespace to control a container's view of time.
- Software-tag-based KASAN: the KASAN debugging tool may see an interesting enhancement on ARM64 systems.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
The kernel's code of conduct, one week later
The dust has begun to settle after the abrupt decisions by Linus Torvalds to take a break from kernel maintainership and to adopt a code of conduct for the community as a whole. Unsurprisingly, the development community, most of which was not consulted prior to the adoption of this code, has a lot of questions about it and a number of concerns. While many of the answers to those questions will be a while in coming, a few things are beginning to come into focus.
It is worth starting with one important point that last week's article failed to mention: the new code of conduct is not actually new to the community as a whole. In particular, the DRM (graphics) subsystem adopted the freedesktop.org code of conduct in April 2017. This code, like the code for the kernel as a whole, is derived from the Contributor Covenant text. There have not been any problems of note arising from the use of this code in that subsystem to date. Your editor has been told that the DRM community's successful use of this code was a direct contributor to Torvalds's choice of this particular code as a starting point for the kernel.
One area of concern in the public discussion has been over the prohibition of the posting of "private information", explicitly including email addresses. Some maintainers read that as a prohibition against including tags in patches, most of which contain such addresses. Some, like Signed-off-by or Reviewed-by, are provided by the person involved and should be relatively uncontroversial. Others, like Cc or Reported-by, are likely to be added by a developer or maintainer. Unsurprisingly, maintainers do not like the idea of unwittingly violating the code simply by doing their jobs. Mauro Carvalho Chehab has argued this point, saying: "We should solve this quickly, as otherwise maintainers may need to postpone asking for pulling from any branches on trees that contain patches with such tags".
Those who have looked at the issue seem to be under the impression that, by posting an email to a public list, one has "published" one's own email address and it is no longer private information. This appears to be true even under some of the stricter privacy legislation found around the world. So the use of these addresses in patch tags would not appear to be problematic. Still, the prohibition as expressed in the code appears to be better suited to projects using online hosting sites rather than email for their patch flow. At some point, it would probably make sense to amend the code to clarify the intended rules.
Another area of concern is that the code places the responsibility for enforcement on all maintainers. There are now numerous kernel maintainers who never asked for this responsibility, and who lack a clear idea of what they are actually responsible for. As Shuah Khan put it: "I have to not only worry about the quality of code and technical aspects, but also be responsible for behavior of individuals I might not have any control or sway over". Khan is not alone in wondering what this requirement actually means for maintainership going forward.
Those questions will be harder to answer. In a community as large as the kernel, the responsibility for enforcing the rules must necessarily be distributed across the maintainers. It is not up to one person (or some sort of elected board) to ensure that patches live up to the coding style rules or have proper signoffs. The same will have to be true for the kernel's conduct standards. But if the responsibility for calling out abusive behavior lies solely on the shoulders of the maintainers, it is easy to predict that maintainer turnover will increase in the future. If the community truly wants to be a more welcoming place, it will have to be up to all members to encourage each other toward better behavior when it is needed.
In truth, that encouragement is all that is needed a great deal of the time, and anybody can do it. If the more repressive measures envisioned in the code of conduct are invoked with any sort of frequency, something will have gone badly wrong. With any luck at all, most maintainers will never see anything requiring more than an admonition and a request for more attention in the future. For those who do run into something worse, the project as a whole will need to provide resources and support. Consulting with the Technical Advisory Board is one such resource, but it is likely to be insufficient.
There have been some suggestions for changes to the code of conduct already, and more are sure to come. The code is almost certain to evolve to better fit the kernel community, but the process by which that evolution can happen has not been worked out — or even thought about much. Future changes will require discussion and widespread acceptance; they cannot just be applied like the current code was.
In summary, there are a lot of questions, and many of the answers have yet to be worked out. Getting there will take some time. It seems likely that there will be significant discussion of the topic at the Maintainers Summit in October, but there may not be many hard answers coming from there. After 27 years, we still haven't finished bashing the kernel into shape; the code of conduct and its associated processes should come together rather more quickly than that, but some patience may still be required.
Finally, it is worth being aware of the fear, uncertainty, and doubt (FUD) attacks against the kernel community that the code has brought about. Some developers feel better about the code than others at this point, but their concerns are expressed in a rather different manner than the various trolling messages that have appeared on linux-kernel, and which have seemingly been taken seriously by the mainstream press. We are not in that much trouble, and we do not, for example, have actual developers asserting a hypothetical right to revoke the GPL licensing from their contributions.
Reading some of those emails (not to mention some of the unpleasant stuff found on the wider Internet), it's hard not to feel that our community is under attack from the outside. Hopefully those people will soon get bored and go back to trying to stir up trouble elsewhere. The more we can encourage their departure by not responding to obvious trolling emails, the better off we will be. As a community, we are far stronger than those who would seek to tear us apart.
Archiving web sites
I recently took a deep dive into web site archival for friends who were worried about losing control over the hosting of their work online in the face of poor system administration or hostile removal. This makes web site archival an essential instrument in the toolbox of any system administrator. As it turns out, some sites are much harder to archive than others. This article goes through the process of archiving traditional web sites and shows how it falls short when confronted with the latest fashions in the single-page applications that are bloating the modern web.
Converting simple sites
The days of handcrafted HTML web sites are long gone. Now web sites are dynamic and built on the fly using the latest JavaScript, PHP, or Python framework. As a result, the sites are more fragile: a database crash, spurious upgrade, or unpatched vulnerability might lose data. In my previous life as a web developer, I had to come to terms with the idea that customers expect web sites to basically work forever. This expectation matches poorly with the "move fast and break things" attitude of web development. Working with the Drupal content-management system (CMS) was particularly challenging in that regard, as major upgrades deliberately break compatibility with third-party modules, which implies a costly upgrade process that clients could seldom afford. The solution was to archive those sites: take a living, dynamic web site and turn it into plain HTML files that any web server can serve forever. This process is useful for your own dynamic sites, but also for third-party sites that are outside of your control and that you might want to safeguard.
For simple or static sites, the venerable Wget program works well. The incantation to mirror a full web site, however, is byzantine:
$ nice wget --mirror --execute robots=off --no-verbose --convert-links \
       --backup-converted --page-requisites --adjust-extension \
       --base=./ --directory-prefix=./ --span-hosts \
       --domains=www.example.com,example.com http://www.example.com/
The above downloads the content of the web page, but also crawls everything within the specified domains. Before you run this against your favorite site, consider the impact such a crawl might have on the site. The above command line deliberately ignores robots.txt rules, as is now common practice for archivists, and hammers the web site as fast as it can. Most crawlers have options to pause between hits and limit bandwidth usage to avoid overwhelming the target site.
The above command will also fetch "page requisites" like style sheets (CSS), images, and scripts. The downloaded page contents are modified so that links point to the local copy as well. Any web server can host the resulting file set, which results in a static copy of the original web site.
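On the politeness front mentioned above, Wget has the relevant options built in; a gentler version of the incantation might add something like the following (the two-second delay and the rate cap are arbitrary values chosen for illustration, not recommendations):
$ wget --mirror --execute robots=off --page-requisites \
       --wait=2 --random-wait --limit-rate=200k https://example.com/
The --wait and --random-wait options space out successive requests, while --limit-rate caps the download bandwidth.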
That is, when things go well. Anyone who has ever worked with a computer
knows that things seldom go according to plan; all sorts of
things can make the procedure derail in interesting ways. For example,
it was trendy for a while to have calendar blocks in web sites. A CMS
would generate those on the fly and make crawlers go into an infinite
loop trying to retrieve all of the pages. Crafty archivers can resort to regular expressions
(e.g. Wget has a --reject-regex
option) to ignore problematic
resources. Another option, if the administration interface for the
web site is accessible, is to disable calendars, login forms, comment
forms, and other dynamic areas. Once the site becomes static, those
will stop working anyway, so it makes sense to remove such clutter
from the original site as well.
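As an example of the regular-expression approach, a single --reject-regex pattern can keep Wget away from whole classes of URLs; the pattern below is purely illustrative and would need to be adapted to the paths actually used by the site being archived:
$ wget --mirror --execute robots=off \
       --reject-regex '(calendar|login|comment)' \
       https://example.com/
Wget applies the expression to each URL it is about to retrieve, so anything whose address mentions calendars, logins, or comments is skipped.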
JavaScript doom
Unfortunately, some web sites are built with much more than pure HTML. In single-page sites, for example, the web browser builds the content itself by executing a small JavaScript program. A simple user agent like Wget will struggle to reconstruct a meaningful static copy of those sites, as it does not support JavaScript at all. In theory, web sites should be using progressive enhancement to have content and functionality available without JavaScript, but those guidelines are rarely followed, as anyone using plugins like NoScript or uMatrix will confirm.
Traditional archival methods sometimes fail in the dumbest way. When
trying to build an offsite backup of a local newspaper
(pamplemousse.ca), I found that
WordPress adds query strings (e.g. ?ver=1.12.4) at the end of JavaScript includes. This confuses content-type detection in the web servers that serve the archive, which rely on the file extension to send the right Content-Type header. When such an archive is loaded in a web browser, it fails to load scripts, which breaks dynamic web sites.
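One quick way to see the problem, assuming the archive is being served locally on port 8080 and contains such a WordPress script (the port and path here are hypothetical), is to ask the archive's web server what type it advertises for the file:
$ curl --silent --head \
       'http://localhost:8080/wp-includes/js/jquery/jquery.js?ver=1.12.4' \
       | grep --ignore-case '^content-type'
A generic answer such as text/plain or application/octet-stream, rather than a JavaScript type, is the tell-tale sign that the extension-based detection has been defeated; that is what breaks the scripts when the archive is loaded in a browser.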
As the web moves toward using the browser as a virtual machine to run arbitrary code, archival methods relying on pure HTML parsing need to adapt. The solution to such problems is to record (and replay) the HTTP headers delivered by the server during the crawl; indeed, professional archivists use just such an approach.
Creating and displaying WARC files
At the Internet Archive, Brewster
Kahle and Mike Burner designed
the ARC (for "ARChive") file format in 1996 to provide a way to
aggregate the millions of small files produced by their archival
efforts. The format was eventually standardized as the WARC ("Web
ARChive") specification that
was released as an ISO standard in 2009 and
revised in 2017. The standardization effort was led by the International Internet
Preservation Consortium (IIPC), which is an "international
organization of libraries and other organizations established to
coordinate efforts to preserve internet content for the future",
according to Wikipedia; it includes members such as the US Library of
Congress and the Internet Archive. The latter uses the WARC format
internally in its Java-based Heritrix
crawler.
A WARC file aggregates multiple resources like HTTP headers, file
contents, and other metadata in a single compressed
archive. Conveniently, Wget actually supports the file format with the --warc-file parameter. Unfortunately, web browsers cannot render WARC
files directly, so a viewer or some conversion is necessary to access
the archive. The simplest such viewer I have found is pywb, a
Python package that runs a simple webserver to offer a
Wayback-Machine-like interface to browse the contents of WARC
files. The following set of commands will render a WARC file on
http://localhost:8080/:
$ pip install pywb
$ wb-manager init example
$ wb-manager add example crawl.warc.gz
$ wayback
This tool was, incidentally, built by the folks behind the Webrecorder service, which can use a web browser to save dynamic page contents.
Unfortunately, pywb has trouble loading WARC files generated by Wget because it followed an inconsistency in the 1.0 specification, which was fixed in the 1.1 specification. Until Wget or pywb fixes those problems, WARC files produced by Wget are not reliable enough for my uses, so I have looked at other alternatives. A crawler that caught my attention is simply called crawl. Here is how it is invoked:
$ crawl https://example.com/
(It does say "very simple" in the README.) The program does support
some command-line options, but most of its defaults are sane: it will fetch page requisites from other domains (unless the -exclude-related
flag is used), but does not recurse out of the domain. By default, it
fires up ten parallel connections to the remote site, a setting that
can be changed with the -c
flag. But, best of all, the resulting WARC
files load perfectly in pywb.
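Putting the flags mentioned above together, a more restrained crawl might look like the following; the connection count of two is an arbitrary choice, and the tool's help output remains the authoritative reference for its options:
$ crawl -c 2 -exclude-related https://example.com/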
Future work and alternatives
There are plenty more resources
for using WARC files. In
particular, there's a Wget drop-in replacement called Wpull that is
specifically designed for archiving web sites. It has experimental
support for PhantomJS and youtube-dl integration that
should allow downloading more complex JavaScript sites and streaming
multimedia, respectively. The software is the basis for an elaborate
archival tool called ArchiveBot,
which is used by the "loose collective of rogue archivists, programmers, writers and loudmouths" at ArchiveTeam in its struggle to "save the history before it's lost forever". It seems that PhantomJS integration does not work as well as
the team wants, so ArchiveTeam also uses a rag-tag bunch of other
tools to mirror more complex sites. For example, snscrape will
crawl a social media profile to generate a list of pages to send into
ArchiveBot. Another tool the team employs is crocoite, which uses
the Chrome browser in headless mode to archive JavaScript-heavy sites.
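Since Wpull aims to be a drop-in replacement for Wget, a WARC-producing crawl can be sketched with the familiar flags; this is only a minimal sketch that assumes Wpull accepts the same long options as Wget, so check wpull --help before relying on it:
$ wpull --recursive --page-requisites --warc-file example \
       https://example.com/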
This article would also not be complete without a nod to the HTTrack project, the "website copier". Working similarly to Wget, HTTrack creates local copies of remote web sites but unfortunately does not support WARC output. Its interactive aspects might be of more interest to novice users unfamiliar with the command line.
In the same vein, during my research I found a full rewrite of Wget called Wget2 that has support for multi-threaded operation, which might make it faster than its predecessor. It is, however, missing some Wget features, most notably reject patterns, WARC output, and FTP support, but it adds RSS, DNS caching, and improved TLS support.
Finally, my personal dream for these kinds of tools would be to have them integrated with my existing bookmark system. I currently keep interesting links in Wallabag, a self-hosted "read it later" service designed as a free-software alternative to Pocket (now owned by Mozilla). But Wallabag, by design, creates only a "readable" version of the article instead of a full copy. In some cases, the "readable version" is actually unreadable and Wallabag sometimes fails to parse the article. Instead, other tools like bookmark-archiver or reminiscence save a screenshot of the page along with full HTML but, unfortunately, no WARC file that would allow an even more faithful replay.
The sad truth of my experiences with mirrors and archival is that data dies. Fortunately, amateur archivists have tools at their disposal to keep interesting content alive online. For those who do not want to go through that trouble, the Internet Archive seems to be here to stay and Archive Team is obviously working on a backup of the Internet Archive itself.
Progress on Zinc (thus WireGuard)
When last we looked at the WireGuard VPN code and its progress toward mainline inclusion, said progress was impeded by disagreements about the new "Zinc" cryptographic library that is added by the WireGuard patches. Since that August look, several more versions of WireGuard and Zinc have been posted; it would seem that Zinc is getting closer to being accepted. Once that happens, the networking developers are poised to review that portion of the code, which likely will lead to WireGuard in the kernel some time in the next development cycle or two.
Jason Donenfeld posted Zinc v3 as part of an updated WireGuard posting on September 10. Of the versions he has posted since our article (up to v6 as of this writing), v3 has gotten most of the comments. One of the main complaints about Zinc is that it creates a new crypto API in the kernel without really addressing why the existing one would not work for WireGuard. As Ard Biesheuvel put it:
As I pointed out in a previous version, I don't think we need a separate crypto API/library in the kernel, and I don't think you have convinced anyone else yet either.
Perhaps you can devote /your/ rare talent and energy to improving what we already have for everybody's sake, rather than providing a completely separate crypto stack that only benefits WireGuard (unless you yourself port the existing crypto API software algorithms to this crypto stack first and present *that* work as a convincing case in itself)
But Greg Kroah-Hartman said
that the current crypto API is too hard to use for many parts of the kernel, which leads to
simpler, private implementations of crypto primitives sprinkled all over
the kernel tree. He suggested that the existing crypto API be switched
over time to use the Zinc primitives where that is possible. But Eric
Biggers was concerned
that no conversions of that sort have been done, which means there could be
undiscovered problems in the Zinc API that will make it difficult to do so:
"I don't think it makes sense to
merge all this stuff without doing the conversions, or at the very least
demonstrating how they will be done
".
Donenfeld said
that he is willing to do those conversions, but wants to get the series
merged first. "I'd really prefer to land this
series through net-next, and then after we can turn our attention to
integrating this into the existing crypto API
". But, as Andrew Lunn
pointed
out, that may be putting the cart ahead of the horse. He noted that
the networking developers have not had a serious look at the WireGuard
patches and won't "until the controversial part
of the code, Zinc, has been sorted out
". He also predicted that
networking maintainer David Miller would not take the code into his tree
without an Acked-by from the crypto maintainers.
Miller confirmed
that assessment and clarified
that even though he is listed as one of the two crypto maintainers, he
would be looking for an ack from the other maintainer, Herbert Xu, as "I haven't done a serious review of crypto code in ages". Xu has been quiet, so far, on the Zinc patches, with one exception.
Donenfeld feels that Biesheuvel is not pleased with the existence of Zinc. When Biesheuvel listed the additional items that he thinks need to be addressed in Zinc, Donenfeld's response was prefaced by a number of worries about Biesheuvel: that he is "generally hostile to this whole initiative", is trying to "stall it indefinitely", and perhaps will just keep bikeshedding "until Zinc copies lots of the same design decisions from the present crypto API". Donenfeld also said that he hoped these were all just fears and did not truly reflect what was happening. But that was when Xu stepped in to make it clear that he values the review that Biesheuvel has been doing:
Please keep in mind that this is a large project that has to support multiple users on one hand (not just WireGuard) and complex hardware acceleration drivers on the other. Ard has been one of the most prolific contributors to the crypto code and his review should be taken seriously.
For his part, Biesheuvel tried to make his intentions clear to Donenfeld in another part of the thread.
But the main technical objections that Biggers and Biesheuvel have raised were still being hashed out in the thread. Andy Lutomirski suggested that Donenfeld add a conversion of one of the algorithms in the existing crypto API to use Zinc as part of the patch set. After a bit of resistance, Donenfeld agreed. Conversions of the Poly1305 hash and ChaCha20 cipher (which are what WireGuard uses) in the existing crypto subsystem were part of the WireGuard v4 patch set.
Along the way, there have also been discussions about the OpenSSL
implementations of some of the primitives (i.e. CRYPTOGAMS) that were
incorporated into
Zinc. These are written in assembly, but are actually generated from Perl
scripts. Donenfeld has modified the assembly output in order to make it
comply with kernel coding standards but has also made some other adjustments.
That makes it difficult to keep the implementation in sync with any changes
that OpenSSL might make, Biesheuvel noted. "Dumping 10,000s of lines of generated assembler in the kernel tree like that is really unacceptable IMO." Donenfeld said that he disagrees with the characterization of the code, but that getting his changes into the OpenSSL upstream is desirable.
Beyond that, there were some minor licensing concerns (and the resulting SPDX identifiers) with some files, which have seemingly been resolved. Similarly, some performance problems were noted and addressed. In short, Zinc is starting to look like something that could be merged. Donenfeld posted v6 of the WireGuard patch set on September 25.
Zinc is still awaiting an ack from Xu, though it is not clear how much he has scrutinized the code at this point. Once that happens, though, the networking side of the patch set can be reviewed by Miller and other networking developers. If all goes well, it will end up in the mainline before too long—but that still means at least four, or more likely seven, months from now. Whenever it comes, it is clear that WireGuard is eagerly anticipated by many.
Time namespaces
The kernel's namespace abstraction allows different groups of processes to have different views of the system. This feature is most often used with containers; it allows each container to have its own view of the set of running processes, the network environment, the filesystem hierarchy, and more. One aspect of the system that remains universal, though, is the concept of the system time. The recently posted time namespace patch set (from Dmitry Safonov with a lot of work by Andrei Vagin) seeks to change that.
Creating a virtualized view of the system time is not a new concept; Jeff Dike posted an implementation back in 2006 to support his user-mode Linux project. Those patches were not merged at the time but, since then, the use of containers has taken off and the interest has increased. One might view time as a universal concept, but there are use cases for a per-container notion of time; they can be as simple as testing software at different points in time. The driving force behind this patch set, though, is likely to be problems associated with the checkpointing of processes and migrating them between physical hosts. When a process is restarted, it should have a consistent view of time, and that may require applying some adjustments at restart time.
The implementation is straightforward enough. Each time namespace contains a set of offsets to be added to the system's notion of the current time. The kernel maintains a number of clocks with different characteristics (documented here), each of which can have a different offset. Some of these clocks, such as CLOCK_MONOTONIC, have an undefined start point that will vary from one running system to the next, so they will need their own offsets to maintain consistent behavior for a container that has been migrated. System calls that adjust the system time will, when called outside of the root time namespace, adjust the namespace-specific offsets instead.
There is one small complication, in that some of the time-related system calls are implemented as virtual system calls on some architectures for performance reasons. Querying the current time can be a frequent operation, so it can be worth the trouble to answer such queries without actually entering the kernel. Making the virtual system calls aware of time namespaces requires making the clock offsets available to user space; the good news is that there is a small piece of the address space called the "VVAR page" (even though it is larger than one page) meant to hold just this kind of data. The time namespace work adds another page to this VVAR region to hold the time offsets, allowing calls like gettimeofday() to continue to work without entering the kernel.
Namespace maintainer Eric Biederman has expressed support for time namespaces, but he has also suggested some changes. His observation is that the timekeeper structure used within the kernel to implement the various clocks already contains a set of offsets relating those clocks to the hardware's idea of the current time. Rather than adding a second layer of offsets, he suggested, each namespace could be given its own timekeeper structure and the offsets found there could be tweaked instead. That might add to the complexity of the implementation, but this approach would have some advantages. Most of the kernel's current timekeeping code would just work with namespaces, allowing better testing overall with fewer special cases. Integrating namespaces at this level would also allow each container to run its own NTP process, and different containers could, for example, use different leap-second policies.
Biederman raised the possibility of security issues if time namespaces can be used to manipulate dates on files in filesystems, though he was not sure if that actually mattered. He also suggested that access to the realtime clock (the hardware clock that, in the end, drives the system's timekeeping) should perhaps be left out of the time namespace until it is clear that there are actual use cases for it. If that use case does arise, he said, some thought will have to be given to how the realtime clock, which is a global resource, should be presented to non-root namespaces.
There are, in other words, a few details remaining to be worked out regarding how time namespaces will work. There do not, however, appear to be any real obstacles to a solution, so chances are good that the kernel's collection of namespaces will be enhanced by time namespaces sometime in the not-too-distant future. Given how long the idea has been around, one might say it's about time.
Software-tag-based KASAN
The kernel address sanitizer (KASAN) is a kernel debugging tool meant to catch incorrect use of kernel pointers. It is an effective tool, if the number of KASAN-based bug reports showing up on the mailing lists is any indication. The downside of KASAN is a significant increase in the amount of memory used by a running system. The software-tag-based mode proposed by Andrey Konovalov has the potential to address that problem, but it brings some limitations of its own.
KASAN works by allocating a shadow memory map to describe the addressability of the kernel's virtual address space. Each byte in the shadow map corresponds to eight bytes of address space and indicates how many of those eight bytes (if any) are currently accessible to the kernel. When the kernel allocates or frees a range of memory, the shadow map is updated accordingly. Using some instrumentation inserted by the compiler, KASAN checks each kernel pointer dereference against the shadow map to ensure that the kernel is meant to be accessing the pointed-to memory. If the shadow map indicates a problem, an error is raised.
It is an effective technique and, thanks to the support from the compiler, the run-time CPU overhead is tolerable in many settings. But the shadow map requires a great deal of memory, and that does affect the usability of KASAN in the real world, especially when it is used on memory-constrained systems. This overhead is particularly painful for users who would like to run KASAN on production systems as an additional security measure.
The new mode uses a different approach that takes advantage of an ARM64 feature called top-byte ignore (TBI). A 64-bit pointer allows for a large address space, rather larger than is actually needed on current systems, even if a web browser is running. When TBI is enabled, the system's memory-management unit will ignore the top byte of any address, allowing that byte to be used to store eight bits of arbitrary information. One possible use for that byte is to ensure that pointers into memory are pointing where they were intended to.
In the software-tag-based mode, KASAN still allocates the memory map, but with some changes. Each byte in the map now corresponds to 16 bytes of real memory rather than eight, cutting the size of the map in half. Whenever the kernel allocates memory, a random, eight-bit tag value will be chosen. The pointer to the allocated object (which is aligned to a 16-byte boundary) will have that tag value set in the top byte; the tag value is also stored into the shadow memory map at the location(s) corresponding to that object. Whenever the returned pointer is dereferenced, its embedded tag value will be compared (using instrumentation from the compiler again) against the tag stored in the shadow memory map; if they do not match, an error will be logged.
There are some clear advantages to this mode, starting with the halving of the amount of memory required for the shadow map. Current KASAN can only catch references to memory that the kernel is not meant to access at all; the new mode can catch the use of pointers that have strayed into the wrong part of kernel memory. On the other hand, the new mode will fail to catch a reference just beyond an allocated object if it falls within the 16-byte resolution of the map. There is a small possibility that an errant pointer will hit another region of memory that happened to get the same tag; such an access would not be detected. This mode will also only work on ARM64 processors, and it requires at least version 7 of the Clang compiler.
There is another potential issue with the use of the software-tag-based mode. Address translation will ignore the top byte of a pointer when TBI is turned on, but other operations, such as pointer arithmetic and pointer comparisons, will not. Subtracting one pointer from another is a common operation in C programs; if those two pointers have different tag values, though, the result is unlikely to be what the developer intended. An erroneous subtraction is likely to make itself known quickly, but a comparison for equality that fails because two otherwise equal pointers have different tags could lead to rather more subtle problems. One can argue that pointers with different tags will have originated from different allocations and should not be compared anyway, but worries about the possibility of breaking things have led to some long discussions after previous postings of this work.
In an attempt to address these concerns, Konovalov ran some extensive tests to try to find potential problems.
These tests turned up a number of places where such operations were taking place, but none of them turned out to be situations where the pointer tags changed the kernel's behavior; see the patch posting linked above for the full discussion.
There is a small set of benchmark results included in the patch as well; it shows that software-tag-based KASAN performs similarly to regular KASAN in terms of CPU usage, though network bandwidth does drop somewhat. The new mode does use quite a bit less memory, though, as expected. KASAN remains far from free in either mode, though, tripling the time required for the test system to boot and reducing the networking performance to less than half of what is otherwise possible. So it is still going to be hard to use KASAN in production systems most of the time.
Upcoming technologies, such as Arm's memory tagging, promise to support much of this functionality in hardware, which may change the equation somewhat. For the time being, though, KASAN must be implemented in software. It has found a number of bugs in the kernel, and would certainly find more if it were able to run in more contexts. The software-tag-based mode should make it possible to use KASAN on systems where its memory overhead is currently prohibitive, and that seems like a good thing.
Page editor: Jonathan Corbet