
Leading items

Welcome to the LWN.net Weekly Edition for July 23, 2020

This edition contains the following feature content:

  • Maintaining stable stability: how the stable kernel trees are kept free of regressions.
  • Emulating Windows system calls, take 2: a reworked mechanism for trapping direct Windows system calls.
  • Memory protection keys for the kernel: the proposed PKS feature for protecting kernel memory from stray writes.
  • Ubuntu invests in Google's Flutter and Dart: Canonical and Google team up to bring Flutter to the Linux desktop.
  • Open-source contact tracing, part 2: a closer look at several contact-tracing applications.
  • The sad, slow-motion death of Do Not Track: why the DNT header never achieved its goal.
  • New features in gnuplot 5.4: an overview of the latest release of the graphing program.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.


Maintaining stable stability

By Jake Edge
July 22, 2020

OSSNA

The stable kernel trees are quite active, often seeing several releases in a week's time. But they are also meant to be ... well ... stable, so a lot of effort goes into trying to ensure that they do not introduce new bugs or regress the kernel's functionality. One of the stable maintainers, Sasha Levin, gave a talk at the virtual Open Source Summit North America that described the process of ensuring that these trees are carefully managed so that they can provide a stable base for their users.

Background

The goals of the stable tree are somewhat in competition with each other, Levin said. The maintainers do not want to introduce regressions into the tree, but they also want to try to ensure that they do not miss any fixes that should be in the tree. It is "very tricky" to balance those two goals. The talk would follow the path of patches that fix bugs, from the time they are written until they get released in a stable tree, showing the mechanisms in place to try to ensure that only real, non-regressing fixes make it all the way to the end.

The first stage is the rules for the kinds of patches that get accepted into the stable tree. They have to be small, straightforward fixes that are already upstream in Linus Torvalds's tree. No complex new mechanisms or new features are welcome in the stable tree. The patches have "passed the minimal bar" to get accepted into the mainline, but it is sometimes necessary for the maintainers (or patch submitters) to backport them. That is something the maintainers try hard to avoid, so that the testing of the mainline is effectively also testing everything in stable, but backports cannot always be avoided. If there are large, intrusive patches that must be backported—for, say, mitigations for speculative-execution processor flaws—the stable maintainers require a lot more testing, subsystem-maintainer signoffs, and more to try to ensure that the backport is reasonable.

Stable patch process

The stable process starts when someone is working on a patch. If they submit it upstream tagged for the stable tree, reviewers and maintainers will generally pay more attention to it. Because the patch will likely end up in users' hands quickly, it is important to ensure that the patch is correct. If a patch is submitted that fixes a problem, but is not tagged for stable, the subsystem or stable-tree maintainers may ask that it get tagged for stable and, perhaps, get a "Fixes:" tag to help with backports.

[Sasha Levin]

There is a bot that grabs patches tagged for stable from mailing lists and tries to apply them to different stable trees based on the stable versions indicated or by using the Fixes: tag. If the patch does not apply correctly, the bot will alert the author to the problem and possibly offer suggestions for other patches that may need to be applied before the fix will apply. It is important to do this when the patch is still being worked on, Levin said, as developers are generally more responsive when the fix is still fresh in their minds.

"There's never a bad time to send upstream." Unlike other types of patches, fixes can be sent to the mailing list at any point in the development cycle. There is no need to wait for a merge window or to target a particular release candidate. "You should never sit on a fix, if you think it is good to go", he said.

Once a patch has been reviewed and gets accepted into a maintainer's tree, it will usually also end up in the linux-next tree, which means that it will get hit with a bunch of tests, mostly from bots of various sorts. Also, the KernelCI continuous-integration project will run its tests on various maintainer trees. These provide a "good first bar" for stable patches to clear.

Another testing tree, stable-next, is created by pulling the patches in the linux-next tree that are tagged for stable but have not (yet) been merged into Torvalds's tree. The idea is that test failures in this tree are likely to be caused by patches that are making their way into the stable trees, so it raises the visibility of those kinds of problems. "We don't want those failures to be swallowed up by other failures in linux-next", Levin said. Doing so also helps find patches that are being fixed by later patches, which are not yet upstream; if a stable-tagged fix is actually buggy, then it can be held out of the stable tree until its fix gets committed to the mainline.

Into the mainline

Once the patch is merged into the mainline, it will be exposed to even more tests, many of which are being run by kernel developers rather than bots. There is more exposure to different workloads and hardware at that point. The stable maintainers are appreciative of all of the work that people are doing testing the mainline as it helps make the stable trees better, he said.

If a patch in the mainline does not have a stable tag but it looks like it may be a fix, it might get submitted to the AUTOSEL bot, which is a neural network for finding fixes. AUTOSEL looks at various parts of a patch—its author, commit message, signoffs, files changed, and certain code constructs—in order to determine if the patch is a probable fix that is missing a stable tag.

When a patch reaches the mainline, that is when the work of the stable maintainers truly begins, he said. Each patch that is taken into the stable trees is reviewed by one of the maintainers. They look at each patch manually to try to ensure that it is correct and that it is appropriate for the stable kernels. The AUTOSEL patches get an additional week of review time in order to hopefully weed out problem patches that were identified in that process. When a patch is queued for stable, an email is generated to the author informing them that it has been done so that they can object if they think it is inappropriate or being applied to the wrong tree.

The stable maintainers try to avoid modifying patches that do not cleanly apply to a particular tree. Levin pointed to his dependency-chain Git repository as a way to find a set of patches that will allow the unmodified fix to be merged. "We would rather take a few more patches to make a certain patch apply and build and test cleanly rather than modifying the patch", he said. Looking at the dependency chain will also often help find other fixes that did not get tagged for stable.

Once a patch makes it into the queue for a stable release, it will get tested again by many of the same bots that test linux-next. These trees are generated frequently with new patches added, so any testing failures will often point to the latest patches that were added. Compared to linux-next, the stable queue trees have far fewer patches, so it is easier to sanity check them for missing patches, regressions, and so on.

Once or twice a week, release candidates are tagged. That will generate "yet another mail" informing developers that their patches have made it into a stable release candidate. It gives developers another chance to object or comment on the patch with respect to the stable trees. The release candidates are also tested with real workloads, Levin said; users of the stable trees test them in their data centers and in their test farms. The tests are more comprehensive than those typically done for mainline release candidates, since they involve the "actual end users" of the trees, he said.

Anyone who is concerned about regressions from the stable kernels is encouraged to get involved in this process by testing their own workloads with these kernels. The stable maintainers do not want to release kernels with regressions, so they want to hear about any problems from users. The process is fairly "free form", he said, so that companies who do not want to talk about their workloads publicly can still report problems they encounter privately to the stable maintainers. They will make every effort to address any problems found before any release so that regressions are minimized or eliminated.

Aftermath

Once the stable kernel is released, there is still more that the maintainers do. They monitor the mailing lists for fixes and bug reports that may impact patches added to the stable kernels. When those are found, the maintainers move quickly to get them into the next stable release in order to reduce the amount of time any regressions or bugs stay in the stable kernels.

One of the goals of the stable maintainers is to improve the testing and validation strategy for the kernel as a whole, Levin said. There is a belief that the stable kernels should only get a small number of changes "because it's stable". But if that were the case, it would mean that the tree is missing lots of important fixes. The way to address the problem is not by taking fewer patches, but "by beefing up the kernel's testing story". The maintainers work with projects like KernelCI and the 0-Day testing service to help ensure that they are working well and have the resources that they need.

Monitoring the downstream trees, like the kernel trees for Ubuntu and Fedora, is also something that the stable maintainers do. If a patch is in a distribution's kernel but is not in the upstream kernel, maybe it should be, especially if it fixes something. Similarly, they monitor the bug trackers of various vendors and distributions to spot fixes that may need to be added to the stable kernels. Recently, they have been looking at the older stable kernels to see if there are fixes that have been missed for them along the way; when those are found, they get added into those older stable trees.

The stable kernel process has a lot of safeguards in place to try to ensure that regressions are not introduced into those kernels. The stable kernels are "way better tested" than the mainline because they are seeing actual real workloads, rather than "just developers trying it on their laptops". The rate of regressions is low, especially when compared with the mainline, he said. So people should feel confident to take each new stable kernel as it is released; in addition, there will never be fixes in older stable kernels that are not also in the newer stable kernels, so upgrading to a newer stable series will not introduce regressions—from the stable process, at least.

[Sasha Levin and conference UI]

At the end of the talk, a somewhat differently dressed Levin appeared to answer questions that were submitted through the chat-like interface in the conference system. One asked about cooperation between the stable maintainers and projects like the Civil Infrastructure Platform, which are maintaining kernels for longer time frames. Levin said there are patches flowing in both directions between the groups and that there is a lot of cooperation around KernelCI and other testing initiatives. In answer to another question, Levin said that he hoped that failure reproducers from syzbot fuzz testing could be added as part of testing for the stable tree at some point. He also acknowledged that a "not for stable" tag might be needed in the future, though currently that is handled by a note in the commit message to that effect—hopefully along with the reason why.

While the talk was interesting, it was still vaguely unsatisfying—virtual conferences unsurprisingly do not live up to their in-person counterparts. But that is the way of things for a while, at least, and perhaps beyond even the end of the pandemic. The carbon footprint of such gatherings is certainly of some concern. In any case, the stable kernel process seems to be in good shape these days, with attentive maintainers, lots of testing, and plenty of fixes to get into the hands of users. Levin's talk was definitely a welcome look inside.


Emulating Windows system calls, take 2

By Jonathan Corbet
July 17, 2020
Back in June, LWN covered a patch set adding a mechanism intended to help systems like Wine emulate Windows system calls on a Linux system. That patch set got a lot of attention and comments, with the result that its form has changed considerably. Gabriel Krisman Bertazi has now posted a new patch set that takes a different approach to solving the same problem.

As a reminder, the intent of this work is to enable the running of Windows binaries that call directly into the Windows kernel without going through the Windows API. Those system calls must somehow be trapped and emulated for the program to run correctly; this must be done without modifying the Windows program itself, lest Wine run afoul of the cheat-detection mechanisms built into many of those programs. The previous attempt added a new mmap() flag that would mark regions of the program's address space as unable to make direct system calls. That was coupled with a new seccomp() mode that would trap system calls made from the marked range(s). There were a number of concerns raised about this approach, starting with the fact that using seccomp() might cause some developers to think that it could be used as a security mechanism, which is not the case.

The new patch set has thus moved away from seccomp() and uses prctl() instead, following one of the suggestions that was made in response to the first attempt. Specifically, a program wanting to enable system-call trapping and emulation would make a call to:

    prctl(PR_SET_SYSCALL_USER_DISPATCH, operation, start, end, selector);

Where operation is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF. In the former case, system-call trapping will be enabled; in the latter it is turned off. The start and end addresses indicate a range of memory from which system calls will not be trapped, even when it is enabled; that is the place to put the code that performs the system-call emulation. The selector argument is a pointer to a byte in memory that provides another mechanism to toggle system-call trapping.

When this feature is enabled with PR_SYS_DISPATCH_ON, the kernel sets a flag (_TIF_SYSCALL_USER_DISPATCH) in the process's task structure. This flag is tested whenever the process makes a system call. If it is set, a further check is made to the memory pointed to by the provided selector; if the value stored there is PR_SYS_DISPATCH_OFF, the system call will be executed normally. If, instead, that location holds PR_SYS_DISPATCH_ON, the kernel will deliver a SIGSYS signal to the process.

The signal handler can then examine the register set at the time of the trap; that will indicate which system call was being made and its arguments. That call can be emulated in the handler; once the handler returns, the process will resume after the trapped system call. This handler must be placed in the special system-calls-allowed region of memory or things will not work well, even in the unlikely case that the handler makes no system calls of its own. In Linux, returning from a signal handler involves the special sigreturn() system call, which must be able to execute without trapping (recursively) into the handler.

The special selector variable allows trapping to be quickly enabled and disabled without the need to call into the kernel every time. For a system like Wine, which moves frequently between Windows and Linux-native code, that should result in a measurable performance improvement.
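As a rough illustration of how an emulator might wire this up, here is a minimal user-space sketch based on the interface as described above. The PR_* constants and argument order are taken from the patch set and may not match what eventually lands in the kernel (the fallback numeric values are illustrative); the dummy memory range and the empty handler are placeholders, not working emulation code.

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_SYSCALL_USER_DISPATCH
    #define PR_SET_SYSCALL_USER_DISPATCH 59   /* illustrative value */
    #define PR_SYS_DISPATCH_OFF 0
    #define PR_SYS_DISPATCH_ON  1
    #endif

    /* Byte consulted by the kernel on every system call; toggling it
     * switches trapping on and off without another prctl() call. */
    static volatile char selector = PR_SYS_DISPATCH_OFF;

    static void sigsys_handler(int sig, siginfo_t *info, void *ucontext)
    {
        /* The register state in ucontext identifies the trapped system
         * call and its arguments; a real emulator would decode and
         * emulate it here, from code living in the exempted region. */
        (void)sig; (void)info; (void)ucontext;
    }

    int main(void)
    {
        struct sigaction sa;
        unsigned long start = 0, end = 0;   /* dummy exempted region */

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigsys_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSYS, &sa, NULL);

        if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
                  start, end, &selector))
            perror("prctl");

        selector = PR_SYS_DISPATCH_ON;   /* "Windows" code: direct calls trap */
        /* ... emulated code would run here ... */
        selector = PR_SYS_DISPATCH_OFF;  /* back to ordinary Linux calls */
        return 0;
    }

In real use, the handler and emulation code would have to live within the [start, end) range passed to prctl(), since, as noted above, returning from the handler requires sigreturn() to execute without trapping.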

The initial implementation of this mechanism received a lot of comments on the mailing list. This time, comments are limited to one from Kees Cook, who said that it "looks great". So the way would seem to be clear for this feature to get into the mainline in the relatively near future.


Memory protection keys for the kernel

By Jonathan Corbet
July 21, 2020
The memory protection keys feature was added to the 4.6 kernel in 2016; it allows user space to group pages into "protection domains" that can have their access restricted independently of the normal page protections. There is no equivalent feature for kernel space; access to memory in the kernel's portion of the address space is controlled exclusively by the page protections. That situation may be about to change, though, as a result of the protection keys supervisor (PKS) patch set posted by Ira Weiny (with many patches written by Fenghua Yu).

Virtual-memory systems maintain a set of protection bits in their page tables; those bits specify the types of accesses (read, write, or execute) that are allowed for a given processor mode. These protections are implemented by the hardware, and even the kernel cannot get around them without changing them first. On the face of it, the normal page protections would appear to be sufficient for the task of keeping the kernel away from pages that, for whatever reason, it should not be accessing. Those protections do indeed do the job in a number of places; for example, page protections prevent the kernel from writing to its own code.

Page protections work less well, though, in situations where the kernel should be kept away from some memory most of the time, but where occasional access must be allowed. Changing page protections is a relatively expensive operation involving tasks like translation lookaside buffer invalidations; doing so frequently would hurt the performance of the kernel. Given that protecting memory from the kernel is usually done as a way of protecting against kernel bugs that, one hopes, do not normally exist anyway, that performance hit is one that few users are willing to pay.

If memory could be protected from the kernel in a way that efficiently allows occasional access, though, there is likely to be more interest. That is the purpose of the PKS feature, which will be supported in future Intel CPUs. PKS associates a four-bit protection key with each page in the kernel's address space, thus allowing each of those pages to be independently assigned to one of sixteen zones. Each of those zones can be set to disallow write access from the kernel, or to disallow all access altogether. Changing those restrictions can be done much more quickly than changing the protections on those pages.

The patch set adds a few new functions for management of protection keys, starting with the allocation and deallocation routines:

    int pks_key_alloc(const char * const pkey_user);
    void pks_key_free(int pkey);

A protection key is allocated with pks_key_alloc(); the pkey_user string only appears in an associated debugfs file. The return value will either be the key that has been allocated, or a negative error code if the allocation fails. A previously allocated key can be freed with pks_key_free().

Code using protection keys must be prepared for pks_key_alloc() to fail. This feature will not be available at all on most running systems for some time, so there may be no keys to allocate. Even when the hardware is available, there are only fifteen keys available for the entire kernel to use (since key zero is reserved as the default for all pages). If there is contention for keys, not every subsystem will succeed in allocating one. The good news is that failure to allocate a key just leaves the affected subsystem in the situation it's in today; everything still works, but the additional protection will not be available.

Putting a specific page under the control of a given key is done by setting its (regular) protections in the usual ways and using the PAGE_KERNEL_PKEY() macro to set the appropriate bits in the protection field. Once the key has been set, there is no further need to modify the page's protections. When a key is first allocated, it will be set to disallow all access to any pages associated with that key. Changing the restrictions is done with:

    void pks_update_protection(int pkey, unsigned long protection);

Where protection is zero (to enable all access allowed by the ordinary page protections), PKEY_DISABLE_WRITE, or PKEY_DISABLE_ACCESS. This operation is relatively fast; one reason for that is that it only applies to the current thread. One thread running in the kernel can thus enable access to a specific zone without opening it up to kernel code running in other threads.
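For the curious, here is a hedged sketch of how a driver might use this interface. pks_key_alloc(), pks_update_protection(), PAGE_KERNEL_PKEY(), and PKEY_DISABLE_ACCESS come from the patch set (and so do not exist in current mainline kernels); the surrounding driver code is invented purely for illustration.

    #include <linux/string.h>
    #include <linux/types.h>

    static int my_pkey = -1;

    static int my_driver_init(void)
    {
        my_pkey = pks_key_alloc("my_driver");
        if (my_pkey < 0)
            my_pkey = -1;   /* no key available: run without extra protection */

        /* The protected memory would then be mapped with page protections
         * built using PAGE_KERNEL_PKEY(my_pkey). */
        return 0;
    }

    static void my_driver_write(void *dst, const void *src, size_t len)
    {
        if (my_pkey >= 0)
            pks_update_protection(my_pkey, 0);   /* open access for this thread */
        memcpy(dst, src, len);
        if (my_pkey >= 0)
            pks_update_protection(my_pkey, PKEY_DISABLE_ACCESS);   /* relock */
    }

Note that, as described above, the relaxed protection applies only to the calling thread, so other kernel threads remain unable to write to the region while the copy is in progress.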

One can think of a number of areas where this feature might be useful within the kernel. Protecting memory containing cryptographic keys from all access will raise the bar for any attacker trying to get at those keys. The initial focus for this patch set, though, is protecting device memory from stray writes. The kernel accesses memory found on peripheral devices by mapping that memory into its own address space; that makes access quick, but it also opens up a whole new range of potential problems should the kernel accidentally write to the wrong place.

Kernel developers famously experienced this eventuality back in 2008, when writes to the wrong place destroyed numerous e1000e network interface cards before being tracked down. That problem was at least highly evident; users tend to notice when they can no longer connect to the net. The advent of persistent memory, though, has raised the stakes on this kind of problem. This memory holds important user data; a stray write will corrupt that data in ways that may not be discovered for some time. Persistent memory can occupy terabytes of address space — something that a network interface is unlikely to do — so the target for misdirected writes is significantly larger. It would be undesirable for Linux to gain a reputation as the sort of system that occasionally trashes data in persistent memory, so an additional level of protection seems like a useful thing to have.

The PKS patch set provides this protection by allocating a single protection key for all device memory. Device drivers wishing to opt into the protection provided by this key can do so by setting a new flag in the dev_pagemap structure associated with the memory in question. This memory will be set up with writes disabled (but reads allowed); whenever the kernel needs to write to that memory, it will need to adjust the restrictions first. That can be done with a couple of helper functions:

    void dev_access_enable();
    void dev_access_disable();

Most of the time, though, those calls are not necessary. It is already a bug to access device memory without having first called kmap() (or one of its variants); the patch set enhances those functions to enable write access whenever a mapping obtained from kmap() exists for a protected region. Naturally, that means that, if a driver marks memory as being protected, calls kmap() on the memory, then keeps the mapping around forever, the extra access protection for all device memory using this mechanism will vanish. Hopefully, all users are calling kmap_atomic(), which requires mappings to be short-lived and is more efficient as well.
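The access pattern that the patch set builds on looks roughly like the following; kmap_atomic() and kunmap_atomic() are the existing kernel interfaces, which the PKS patches hook to lift the write restriction only for the duration of the mapping. The page, offset, and value here are placeholders.

    #include <linux/highmem.h>
    #include <linux/types.h>

    static void write_word(struct page *page, unsigned int offset, u32 val)
    {
        /* With PKS in place, write access to the protected device memory
         * is enabled here, for this thread only... */
        void *addr = kmap_atomic(page);

        *(u32 *)(addr + offset) = val;

        /* ...and dropped again as soon as the mapping goes away. */
        kunmap_atomic(addr);
    }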

While there seems to be a consensus that this feature is worth supporting in the kernel, there is still some ongoing discussion about various details of how it is implemented. It thus seems unlikely to be ready to be merged when the 5.9 merge window opens next month. PKS may well find its way into the kernel in a subsequent development cycle, though, making the kernel that much less likely to overwrite a persistent-memory device by mistake.


Ubuntu invests in Google's Flutter and Dart

By John Coggeshall
July 16, 2020

Flutter is Google's open-source toolkit for building cross-device (and cross-platform) applications. Based on the Dart programming language released by the company in 2013, Flutter promises developers the ability to write and maintain a single application that runs on all of a user's devices. Flutter applications can be deployed on Android, iOS, the web (via JavaScript), and macOS; now Canonical and Google have teamed up to support Flutter applications on Linux as well. Promises of native speed, rapid development, and a growing community make it an interesting technology to take a look at.

Flutter focuses on the consistency and quality of the user experience it provides. Google has devoted considerable resources over the years in service of understanding how to build high-quality user experiences. These efforts have led to projects like Material Design, with those principles being translated into Flutter's components and overall development philosophy. For developers who prefer an iOS-style interface, Flutter provides components for that as well.

Flutter itself is billed by Google as a "UI Toolkit", and both Flutter and Dart are licensed under a permissive BSD 3-Clause license. Google declared Flutter "production ready" in 2018, and the company now claims over two million developers use the Flutter toolkit for application development. Since its release, Flutter has also built a significant open-source community of contributors and applications.

Originally, Flutter was a toolkit focused on mobile application development targeting only the Android and iOS platforms. With the version 1.0 release, Google also started experimenting with using Flutter on traditional desktops. In the year and a half since then, Flutter has grown to provide what the project describes as "alpha-quality features" for both macOS and Linux desktop environments. On Linux desktops, Flutter is implemented as a wrapper around GTK+; according to the project, support for the Windows platform is still under development.

Google's new partnership with Canonical, aimed at growing past the "alpha" quality of Flutter's desktop experience, is an interesting development. For Canonical, the partnership offers an opportunity to expand the number of applications with polished user experiences available to install on Ubuntu. Interested readers can explore a sampling of various proprietary Flutter-based applications here, and a sampling of open-source applications here. In exchange for this new channel of applications, Canonical is "dedicating a team of developers to work alongside Google's developers to bring the best Flutter experience to the majority of Linux distributions." Exactly how this team of developers will contribute to Flutter and Dart development was not disclosed in detail in the announcement.

Certainly, what Canonical has in mind with the partnership involves both its Linux distribution and its Snap application-distribution channel. That said, there is nothing standing in the way of other distributions taking advantage of the work being done to improve Flutter. While the announcement does promote Snap as the means of using and developing with Flutter, it is important to note that Snap is not a dependency for using, developing, or distributing Flutter-based applications. In fact, it would be quite reasonable to deploy a Flutter application on Linux using existing deployment mechanisms like APT.

Making Flutter applications first-class citizens on the Linux desktop opens the door for the estimated 80,000 applications currently written in Flutter to be ported with relative ease. To get there, Flutter (and to some extent Dart itself) has had to undergo "extensive refactoring" by Google to support desktop environments and other Linux-specific items. To understand where this effort has been focused, the general outline of the Dart and Flutter ecosystem must be understood first.

Because Flutter is written in Dart, the full array of open-source Dart library packages (found at pub.dev) available to developers can, in theory, be used when building Flutter applications. This makes sense — most packages are simply implementations of things like OAuth and are written in pure Dart code. However, some packages, distinguished in Dart as "plugins", are platform- or architecture-specific. If a Flutter application depends on these plugins, it will not run on desktop Linux due to that dependency. To that point, one of the investments Google and Canonical have made is building Linux support into key Dart plugins. For example, the url_launcher plugin, which enables a Dart application to open a URL in the user's preferred browser, has been extended to work in Linux desktop environments. Similarly, the shared_preferences plugin, which provides persistent user-preference storage across devices, has also been extended to support Linux desktops.

The work done on plugins will likely lower the bar for porting existing Flutter applications to Linux, but it won't be enough for developers looking to use Flutter as a better way to build desktop-specific applications. For that, Flutter will need access to the countless tools available as shared libraries for everything from authentication to graphics. To that end, the Dart project has worked to provide a foreign function interface (FFI) for the language.

FFI enables a Dart code base, such as a Flutter application, to take advantage of the normal C-based libraries available on the desktop. While the feature is currently in "beta" status, once the rough edges have been smoothed it will be a critical piece of technology in the quest to make applications based on Dart (and consequently Flutter) viable on the Linux desktop. Obviously, however, Flutter applications taking advantage of the FFI would not function on mobile platforms like Android, thereby undercutting the cross-device premise that justified Flutter in the first place. It appears that Google and Canonical hope that Flutter can play a role in Linux desktop-specific applications, even if portability must be sacrificed in the process.

In future articles, we will look closer at both the Dart and Flutter projects — both seem like under-appreciated tools for application development. Canonical's investment in the technology certainly indicates it hopes to change that perception (at least for the desktop), and ultimately ease the task of writing open-source Linux desktop applications. It will be interesting to see if this move translates universally across distributions that offer a different desktop experience than Ubuntu, or if every distribution will have to make its own investment to take advantage of this technology to the degree it will be available to Ubuntu. Based on the approach currently being taken, it seems likely that all Linux desktops will indeed benefit from the work. That's great news, as the Linux community has struggled to provide a coherent approach to desktop applications that meet the expectations of the general public. The general public is becoming more of a concern for companies like Canonical, especially as hardware manufacturers begin offering Linux-based machines to the masses. Flutter offers a possible path forward to provide applications with the polish consumers-at-large have come to expect from their devices, in addition to being a compelling open-source tool for developers.


Open-source contact tracing, part 2

July 20, 2020

This article was contributed by Marta Rybczyńska

Contact tracing is a way to help prevent the spread of a disease, such as COVID-19, by identifying an infected person's contacts so that they can be informed of the infection risk. In the first part of this series, we introduced open-source contact-tracing applications developed in response to the current pandemic, and described how they work. In this part, we look into the details of some of them, of both centralized and decentralized design. These application projects have all released their source code, but they differ in the implementation details, licenses used, and whether they accept user requests or patches. We conclude with the controversies around the tracing applications and the responses to them.

TraceTogether/OpenTrace (Singapore)

In March 2020, the first contact-tracing app was released: TraceTogether in Singapore. As of early July 2020, it had been downloaded over 2.1 million times for a Singaporean population of around 5.8 million. The app uses a protocol called BlueTrace. A reference implementation of the protocol was released under the name OpenTrace; it includes Android and iOS apps and the server piece. All of those elements are released under GPLv3.

The Git repository seems quiet after the initial release, counting, for example, only five commits to the Android app. It seems likely, then, that the public and private source trees diverged at some point. This looks to be confirmed when we look into the binary TraceTogether app analysis by Frank Liauw, and compare his results with the OpenTrace source code. OpenTrace includes, for example, the same database structure, but does not contain the updates made in TraceTogether. This means that the installed app does not correspond with the released source code, which could mean that some of the privacy characteristics of the app have changed.

Beyond just the source code, the design paper [PDF] describes the main ideas and details of the protocol. Users are identified by their phone numbers; both global and temporary IDs are generated by the centralized server. The apps may download batches of temporary IDs in advance in order to continue working offline. The proximity tracing is done by Bluetooth and the BlueTrace protocol includes sending the phone model, for distance calibration purposes, along with the temporary ID.

The app logs all encounters (temporary IDs), without filtering them by distance. They are stored in the user's phone. When the person is infected, the health authorities contact them to upload the data. The health authorities then process the contact log to find out the people who might be at risk, and contact them by phone. Each user can ask to remove all their data stored on the server (including the global ID and the phone number), but they need to request it by email.

As TraceTogether was the first app available, the project faced a number of challenges. One of the main ones was the distance estimation using Bluetooth, as different phones have different signal strengths, so calibration is needed to obtain the actual distance based on the measured value. The project team released a detailed explanation of its experiments and the data it obtained.

In early June 2020, Singaporean minister Vivian Balakrishnan announced a standalone token device called the TraceTogether Token. It has no Internet connectivity but can interoperate with the TraceTogether phone app. The aim of the token is to help people without smartphones, as well as iPhone users (because the iPhone app had to be kept in the foreground to operate — something that was fixed in TraceTogether version 2.1). Token distribution began among older citizens in the last week of June 2020.

The token broadcasts its temporary ID and records the IDs it encounters. If a user gets infected, they are asked to hand the device over to the health authorities in order to dump the data. The device itself was presented to a group of experts before the release. Among those experts was Andrew "bunnie" Huang, who gave his impressions of the token; because of privacy concerns, he is in favor of using tokens, rather than smartphones and apps, for contact tracing. Also present was Sean "xobs" Cross, who described the technical details from examining the device and observing the teardown during the presentation; from that, he concluded that the device is unlikely to be hiding any unpleasant surprises in terms of hardware.

The BlueTrace protocol was changed so that the token generates temporary IDs on its own, using a hash of the global ID and the time, instead of downloading them from the central server. Other changes targeted lowering the power consumption; the token broadcasts the temporary ID instead of making two-way connections with peers. Neither the source code of the firmware nor the changes in the protocol are available as of early July 2020. Cross and Huang noticed that the token design is fairly similar to their Simmel hardware token for contact tracing; that design is available on GitHub, including both software and hardware.

COVIDSafe (Australia)

COVIDSafe is an Australian app that is based on OpenTrace; the Wikipedia page has additional information on COVIDSafe. It uses a system similar to BlueTrace for proximity tracing. The source code for the Android and iOS apps was released, but not the server code. Since its release, the code has been updated; there have been a small number of commits that look to be dumps of development going on elsewhere, which span multiple versions of the app.

The differences between COVIDSafe and OpenTrace concentrate on the users' privacy. For example, users can remove data stored about them from within the app, which is not the case in the Singaporean app.

The licensing of COVIDSafe has conditions that are rather different than those for the typical open-source project. It uses a custom license, which is not a free-software license since it includes the following points:

4. I agree to stop all access and use of the App Code if requested by the DTA [Digital Transformation Authority of the Australian Government].

5. I will not use the App Code for any product development purposes.

COVIDSafe lacks a public bug tracker. But, since it is based on the GPL-licensed OpenTrace, a question has been asked in that project's bug tracker about whether a proper relicensing has taken place. It is possible that the Singapore government relicensed the OpenTrace code, but there has been no answer to the question as of yet.

The Australian government has been looking into changing the framework used by COVIDSafe to the Exposure Notification framework, which was described in part 1, but at the time of this writing, a decision has not been made.

StopCovid (France)

StopCovid [French] is a French app that was released (with its source code) in early June 2020. During the preparation of the app, numerous professional bodies gave recommendations; it is interesting to note that at least two of them (CNIL, which is a data-privacy organization, and ANSSI, which is a computer-security organization) suggested publication of the app's source code as open source.

The app was the subject of debates in France, especially related to privacy concerns. The main issues are the possibility of finding out the identity of people based on the data collected, such as the identity of infected people or those who may have infected them. Another concern is that this kind of surveillance could become permanent.

The design is a centralized one, with the central server generating the global IDs. The choice was to develop the protocol independently, without the use of the Exposure Notification framework.

The source code is available for the Android and iOS apps, along with some elements of the server software. The Android and iOS apps are released under the Mozilla Public License (MPL), which is a change from an ad hoc restrictive license [French] as is visible in the sources. The server part of the system is released partially using the MPL and partially using a custom restrictive license [French] allowing access for one year but only for testing and scientific evaluation. External contributions are not yet allowed, but they are expected to be accepted in the future.

The developers of StopCovid use pseudonyms. In a discussion about transparency with regard to the pseudonyms, a developer revealed that one of the reasons that pseudonyms are used is because people working on the app have been harassed.

The bug tracker shows activity from the start, including bug reports and fixes by the development team. For example, an issue was reported that, in the case of infection, the app sends the complete contact list, while it should be filtered by the phone to include only those with a higher risk of infection (at least 15 minutes of contact with a distance of less than one meter). The problem was fixed in the following version.

When the user installs the app, they are shown explanations and links to a privacy-related FAQ. They can activate or deactivate the app at any time. There is also an option to remove their data from both the phone and the server directly from the app, but there is no option to show what data is actually stored.

When a user is diagnosed positive, their test result will include a code (QR and alphanumeric) to be used in the app. The user may decide to enter the code in the app — or not. Entering the code will cause transmission of the temporary IDs for all of the close contacts in recent days to the central server. Those other users will be notified by the app and are expected to consult their doctor. They will not know the identity of the infected person. The QR codes are pre-generated and distributed to laboratories, so they also do not link to a specific user.

Another interesting aspect of the StopCovid app is the launch of a bug bounty program [French] to help find more bugs.

The first results from the deployment are lower than expected. The number of activations in the first three weeks was around 1.8 million [French] for a population of around 67 million. During that same time, 68 people signaled themselves as diagnosed positive (and 14 were notified of their risk from exposure), which is a small percentage compared to the daily count of a few hundred positive test results in France during the same period.

Corona-Warn (Germany)

The official German app, Corona-Warn was released in mid-June; the government was criticized for launching it that late. Privacy issues were also raised; there were questions about how many people would actually download it. Since the launch, it has been downloaded [German] around 14.4 million times (as of July 2) for a population of around 83 million.

The difference from the Singaporean and French apps is the use of the Exposure Notification framework and a decentralized approach. The IDs are generated by the user's phone and uploaded only when they get infected. In turn, the central server stores the database of IDs of infected people, which is periodically downloaded by all apps. The app performs a check to see if there is a match and shows a notification to the user if there is one.
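The matching step is conceptually simple; the following toy sketch (not taken from any of the apps, with the ID length and the brute-force comparison chosen purely for illustration) shows the idea of comparing the downloaded IDs of infected users against the locally recorded contacts:

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define ID_LEN 16   /* rolling IDs are short, opaque byte strings */

    /* Returns true if any downloaded infected ID matches a locally
     * recorded contact; a real app would also weigh contact duration
     * and estimated distance before notifying the user. */
    static bool exposure_match(const unsigned char infected[][ID_LEN], size_t n_infected,
                               const unsigned char contacts[][ID_LEN], size_t n_contacts)
    {
        for (size_t i = 0; i < n_infected; i++)
            for (size_t j = 0; j < n_contacts; j++)
                if (memcmp(infected[i], contacts[j], ID_LEN) == 0)
                    return true;
        return false;
    }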

All of the source code was released under the Apache-2.0 license on GitHub, and the project site has a subtitle of "Open Source Project". Corona-Warn accepts external contributions; it has a contributing guide and a code of conduct. Development is fairly active; looking at the recent commits, there have been nearly 400 for the Android app as of July 3, for example. It seems that part of the development is done in a different source tree and then included in a small number of big commits.

The bug trackers are busy, with some issues resulting in long discussions, such as offering the app only to German Google Play accounts. This now-solved problem was preventing the use of the app by people who moved from other countries, or are working and living in different EU states.

Controversies

COVID-19 tracing apps have faced multiple controversies. One big question is their impact on user privacy. Other technical challenges include the percentage of the population that needs to be using an app to make it work well, the risk of harmful "similar" apps in the app stores, and the accuracy of the distance measurement.

Probably the biggest concern remains the impact on privacy and the risk of the location data being used for other surveillance; there is also the risk of the data leaking and being used maliciously by third parties. There are numerous papers covering the security analysis of the apps, including the DP3T White Paper [PDF] and the ROBERT protocol specification [PDF] (which is used by the French app). We are only going to present the main concepts in this article, along with the measures taken by the apps presented here. Interested readers are invited to check the details in the research papers.

None of the systems broadcast the global IDs; only the temporary ones, which change every ten to twenty minutes, are sent. The risk of tracing non-infected, not-at-risk users (those who have not had contact with anybody infected) should therefore be minimal. There is also no direct location data, as all of the apps use Bluetooth simply to measure distance.

The situation is different for the logs of infected people. In both centralized and decentralized systems, the whole log of an infected user is uploaded to a single server, so there is a possibility of identifying them based on that data. In centralized systems, the server may also have additional information to identify the user (such as a phone number), thus the central server must be trustworthy. There is also the question of whether the database will be deleted after the pandemic. In decentralized systems, the server has no additional information, but the whole database of the IDs of infected people is available on all of the devices.

However, another source of identity leakage may be the proximity protocol itself. Certain protocols (for example, OpenTrace) broadcast phone-model information, which might serve identification purposes, especially in the case of less-popular models. Identification of users may also be possible if location data is available. Typically, the protocols do not upload that data; an attacker could, however, sniff IDs and store them along with the location. This attack is limited by changing the IDs periodically, though, which is done by all of the apps mentioned here.

The risk of harmful apps in the stores is somewhat limited by the Exposure Notification framework, which only allows one app per country that gets a special notification in the app description. For the apps not using the API, the user needs to be aware of the name of the official app and the institution releasing it. The app stores have made some attempts to reduce confusion, but users must be careful to only install the official app for their region.

Another important challenge is the effectiveness of the apps. The currently available numbers are quite low, and the apps typically cover only a small percentage of the population.

Conclusions

Contact-tracing apps have made a lot of progress in just a few months. Some of them are still in development. We do not yet have the data on how effective they are in practice and how many cases will be detected during the pandemic. That data will only be available some time down the road.

From an open-source point of view, it is interesting to see that many of those apps are indeed released under this model; doing so has been recommended as a best practice from a security perspective. The activity of the projects is not always particularly visible, however, and their governance is centralized. The projects with active bug trackers (Corona-Warn, StopCovid) do count on the participation of users; we might wish that they were developed more in the open, though.

The apps are currently not compatible with each other. The development teams have learned from experiences of the earlier deployments, but the development efforts are still duplicated. Some unification efforts are in progress in Europe, but the results are not visible yet. The Exposure Notification API is the standard framework for recently released apps.

The final impact of contact-tracing apps is far from being known. Fortunately, most of them release source code, and independent experts may analyze them. In the end, it is up to users to decide if they want to use the apps or not, by taking into account the possible risks and benefits.


The sad, slow-motion death of Do Not Track

July 22, 2020

This article was contributed by Ben Hoyt

"Do Not Track" (DNT) is a simple HTTP header that a browser can send to signal to a web site that the user does not want to be tracked. The DNT header had a promising start and the support of major browsers almost a decade ago. Most web browsers still support sending it, but in 2020 it is almost useless because the vast majority of web sites ignore it. Advertising companies, in particular, argued that its legal status was unclear, and that it was difficult to determine how to interpret the header. There have been some relatively recent attempts at legislation to enforce honoring the DNT header, but those efforts do not appear to be going anywhere. In comparison, the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) attempt to solve some of the same problems as DNT but are legally enforceable.

In 2007, the US Federal Trade Commission was asked [PDF] to create a "Do Not Track" list, similar to the popular "Do Not Call" list. This would have been a list of advertiser domain names that tracked consumer behavior online, and would allow browsers to prevent requests to those sites if the user opted in. However, that approach never got off the ground, and DNT first appeared as a header in 2009, when security researchers Christopher Soghoian, Sid Stamm, and Dan Kaminsky got together to create a prototype. In his 2011 article on the history of DNT, Soghoian wrote:

In July of 2009, I decided to try and solve this problem. My friend and research collaborator Sid Stamm helped me to put together a prototype Firefox add-on that added two headers to outgoing HTTP requests:

X-Behavioral-Ad-Opt-Out: 1
X-Do-Not-Track: 1

The reason I opted for two headers was that many advertising firms' opt outs only stop their use of behavioral data to customize advertising. That is, even after you opt out, they continue to track you.

At some point, Soghoian said, "the Behavioral Advertising Opt Out header seems to have been discarded, and instead, focus has shifted to a single header to communicate a user's preference to not be tracked". The final format of the header is literally "DNT: 1".

Even back when Soghoian wrote that article, it was clear that getting advertisers to respect the header wasn't going to be easy:

The technology behind implementing the Do Not Track header is trivially easy - it took Sid Stamm just a few minutes to whip up the first prototype. The far more complex problem relates to the policy questions of what advertising networks do when they receive the header. This is something that is very much still up in the air (particularly since no ad network has agreed to look for or respect the header).

Part of the problem was defining what "tracking" means in this context. The Electronic Frontier Foundation (EFF), which has been involved in DNT efforts from the beginning, defines it as "the retention of information that can be used to connect records of a person's actions or reading habits across space, cyberspace, or time". The EFF's article also lists certain exceptions that are not considered tracking, which notably allows for "analytics providers". The article is also careful to distinguish between tracking by a first-party ("the website you can see in your browser's address bar"), which is allowed, and tracking by a third-party (other domains), which is not.

Starting with Mozilla Firefox in January 2011, browsers began to implement the "trivially easy" part, allowing users to opt into sending the new header. Microsoft followed soon after, adding DNT support to Internet Explorer 9 in March 2011. Apple followed suit with Safari in April 2011. Google was a little late to the game, but added support to Chrome in November 2012.

In September 2011 a W3C "Tracking Protection Working Group" was formed "to improve user privacy and user control by defining mechanisms for expressing user preferences around Web tracking and for blocking or allowing Web tracking elements". During its eight active years, the group published a specification of the DNT header as well as a set of practices about what compliance for DNT means. Unfortunately, in January 2019 the working group was closed with this notice:

Since its last publication as a Candidate Recommendation, there has not been sufficient deployment of these extensions (as defined) to justify further advancement, nor have there been indications of planned support among user agents, third parties, and the ecosystem at large. The working group has therefore decided to conclude its work and republish the final product as this Note, with any future addendums to be published separately.

As early as 2012, LWN wrote about how it wasn't looking good for DNT: advertising groups were pushing back (unsurprisingly), and there was no legal definition of how the header should be interpreted. In addition, Microsoft's decision in May 2012 to enable the header by default in Internet Explorer 10 backfired, as DNT had always been intended to indicate a deliberate choice made by the consumer. Roy Fielding even committed a change to unset the DNT header in the Apache web server if the request was coming from Internet Explorer 10 — possibly setting a record for the number of comments on a GitHub commit. Even though Microsoft finally removed this default in April 2015, it's likely that this well-intentioned move muddied the DNT waters.

A few high-profile web sites did honor Do Not Track, including Reddit, Twitter, Medium, and Pinterest. Tellingly, however, as of today two of those sites now ignore the header: Reddit's privacy policy now states that "there is no accepted standard for how a website should respond to this signal, and we do not take any action in response to this signal", and Twitter notes that it discontinued support (as of May 2017) because "an industry-standard approach to Do Not Track did not materialize". At present, Medium and Pinterest still act on the header.

Apple's Safari was the first major browser to lose support for "the expired Do Not Track standard" — it was removed from Safari in March 2019. Ironically, Apple's stated reason for removing it was to "prevent potential use as a fingerprinting variable". Tracking systems often use a fingerprint of a user's HTTP headers to help track them across different websites, and the DNT: 1 header — given its low use — adds uniqueness to the user's headers that may actually make them easier to track.

Since then, Apple has been steadily rolling out what it calls "Intelligent Tracking Prevention", an approach that prevents the use of third-party cookies after a certain time window and helps avoid tracking via query-string parameters ("link decoration"). Mozilla added similar protections against third-party cookies to Firefox in September 2019. Microsoft included tracking prevention in the new Chromium-based version of its Edge browser, released in January 2020. Even Google, which derives much of its revenue from advertising (and, indirectly, tracking), announced its own plans to phase out support for third-party cookies in Chrome over the next two years.

In May 2014, LWN wrote about Privacy Badger, "a browser add-on that stops advertisers and other third-party trackers from secretly tracking where you go and what pages you look at on the web". Privacy Badger enables the DNT header and blocks requests to third-party sites that it believes are likely to track a user (which, not surprisingly, happens to block most ads). One of the goals of Privacy Badger is to goad advertising companies to actually respect the header. If Privacy Badger sees that a domain respects DNT by publishing the DNT compliance policy to company-domain.com/.well-known/dnt-policy.txt, it will stop blocking that domain. This sounds like a great idea for users, but it just doesn't seem to have taken off with advertisers.

One recent attempt to revitalize the DNT header is by DuckDuckGo, which is a company that builds privacy-oriented internet tools (including a search engine that "doesn't track you"). It found (in November 2018) that, despite web sites mostly ignoring the header, DNT was enabled by approximately 23% of adults in the US. In May 2019 DuckDuckGo published draft legislation titled "The Do-Not-Track Act of 2019 [PDF]" which it hopes will "put teeth behind this widely used browser setting by making a law that would align with current consumer expectations and empower people to more easily regain control of their online privacy". The company's proposal would require web sites to honor the DNT header by preventing third-party tracking and only using first-party tracking in ways "the user expects". For example, a site could show a user the local weather forecast, but not sell or share the user's location data to third parties.

Unfortunately, in the year since DuckDuckGo published the proposal, nothing further seems to have come of it. However, around the same time, US senator Josh Hawley, supported by senators Dianne Feinstein and Mark Warner, introduced a similar Do Not Track Act that was "referred to the Committee on Commerce, Science, and Transportation". There has not been any activity on this bill in the last year, so it seems there is little chance of it going further.

In June 2018, the W3C working group published an article comparing DNT with the GDPR. The GDPR requires a web site to get a user's consent before tracking them and, unlike DNT, that is enforceable by law. Similarly, the recent CCPA legislation is enforceable, but it only applies to businesses operating in the state of California, and only to the "sale" of personal information. As law firm Davis Wright Tremaine LLP noted, the CCPA waters are almost as muddy as those of DNT: "we do not yet have clarity under the CCPA, however, regarding which tracking activities (e.g., tracking for analytics, tracking to serve targeted ads, etc.) would be considered 'sales'". One possible way forward is to generalize efforts like the GDPR and CCPA rather than trying to give DNT a new lease on life.

It looks as though, after a decade-long ride with a lot of bumps, the Do Not Track header never quite got enough traction with the right people to reach its destination. It is still possible that one of the political efforts will go somewhere, but it seems less and less likely. Similar to how most of us deal with email spam, we may have to rely on technological solutions to filter out tracking requests, such as Privacy Badger and DuckDuckGo's browser extensions or the various browsers' "intelligent tracking prevention" schemes.

Comments (70 posted)

New features in gnuplot 5.4

July 22, 2020

This article was contributed by Lee Phillips

Gnuplot 5.4 has been released, three years after the last major release of the free-software graphing program. In this article we will take a look at five major new capabilities in gnuplot. First, we briefly visit voxel plotting, for visualizing 3D data. Since this is a big subject and the most significant addition to the program, we'll save the details for a subsequent article. Next, we learn about plotting polygons in 3D, another completely new gnuplot feature. After that, we'll get caught up briefly in spider plots, using them to display some recent COVID-19 infection data. Then we'll see an example of how to use pixmaps, a new feature allowing for the embedding of pictures alongside curves or surfaces. Finally, we'll look at some more COVID-19 data using the new 3D bar chart.

A full accounting of all of the improvements and bug fixes in 5.4 can be found in the release notes. More gnuplot history can be found in our May 2017 article on the soon-to-be-released gnuplot version 5.2, which described its new features, some of which have been expanded in 5.4.

Gnuplot's staying power

Gnuplot is an early free-software success story. It was the first widely used open-source graphing utility, and became the tool of choice for people performing Fortran simulations of oceans and atomic bombs, for example. As a standalone, compiled C program, it endures as a workhorse of technical graphics in large part due to its speed and stability when confronted with enormous data sets.

Its usual means of control is through an interactive prompt at the terminal, or by executing scripts written in its scripting language. Gnuplot's primary niches are creating visualizations for scientific and other technical publications, and displaying live graphs of streamed data from sensors, simulations, or server statistics.

Gnuplot is language-independent, in contrast to packages implemented as libraries for a specific programming language. Some see the necessity of learning gnuplot's scripting language as a hindrance but, in my experience, it is no more onerous than learning the de facto domain-specific language defined by the interface to a plotting library. Gnuplot can be driven by writing commands to its standard input through a pipe, and it can read data from a pipe or FIFO; this makes it simple to use from any programming language. To make this even more convenient, most popular languages have gnuplot interfaces; for example, Gaston for Julia or Gnuplot.py for Python.
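A hedged sketch of the pipe approach: gnuplot's "< command" syntax treats the output of a shell command as if it were a data file, so any program that writes numbers to its standard output can feed a plot directly. The seq command here is just a stand-in for a real data producer:

    # "< cmd" runs cmd and plots its output as data; here, column 1 holds
    # the integers 1..20 and the expression plots their squares.
    plot "< seq 1 20" using 1:($1*$1) with linespoints title "squares"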

A major factor in gnuplot's popularity with scientists is its interoperability with TeX and LaTeX, automating the creation of sophisticated documents with plots and text that share typefaces. Gnuplot is extensively customizable, and makes simple things easy while making complex things manageable, such as creating composite illustrations by aligning a set of graphs, or doing such arcane things as embedding a vector field in a surface (but you still can't make a pie chart—unless you get slightly creative). Finally, gnuplot can produce this output on a wide variety of output devices, or in a Jupyter notebook using the gnuplot kernel.
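To give a flavor of the TeX interoperability, here is a minimal sketch using the cairolatex terminal, which writes the plot as a LaTeX fragment (plus PDF graphics) that can be pulled into a document with \input, so that labels are typeset in the document's own font; the output file name is hypothetical:

    # Emit a LaTeX fragment plus accompanying PDF graphics.
    set terminal cairolatex pdf size 9cm,6cm
    set output 'decay.tex'          # hypothetical file name
    plot exp(-x**2) title '$e^{-x^2}$' with lines
    unset output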

Voxel plots

The standout new feature arriving in gnuplot 5.4 is voxel plotting. The idea of an image, such as a photograph, being composed of a 2D array of pixels is familiar. Each pixel has an x and a y coordinate, and a color. The extension of the concept of the pixel to the third dimension is the voxel; in fact, the word is short for "volume pixel".

But what do voxels have to do with the plotting and visualization of functions and data?

Up to now, all of the plot types in gnuplot have handled functions of one or two variables. In the case of two variables, we had a choice of surface, contour, or image plots (sometimes called "heat maps"). The introduction of voxels allows us, for the first time, to visualize functions of three coordinates: x, y, and z. Of course, until gnuplot gets a holographic output terminal, we are still confined to the surface of screens or paper, so the final result will be a perspective rendering of some aspect of the 3D data or function.

Voxel data sets are familiar in medical imaging, where they are used to display the results of MRIs or CAT scans, and in engineering or physics, where they are helpful in understanding such things as the 3D flow pattern around a propeller. Gnuplot provides an assortment of techniques for plotting voxel data.
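As a taste of what the workflow looks like (this is a bare-bones sketch assembled from the 5.4 documentation, not the dipole script discussed below), one allocates a voxel grid, deposits values into it with the vfill command, and then hands the grid to splot:

    # Allocate a 40x40x40 voxel grid and set its extent.
    set vgrid $helix size 40
    set vxrange [-2:2]
    set vyrange [-2:2]
    set vzrange [0:11]
    # Add the value 1 to every voxel within radius 0.3 of each sampled
    # point along a helix.
    vfill sample [t=0:10:0.05] '+' using (cos(2*pi*t)):(sin(2*pi*t)):(t):(0.3):(1)
    # Draw only the voxels whose accumulated value exceeds 0.5.
    splot $helix with dots above 0.5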

In a future article, we'll go into these techniques in depth, with a variety of examples. For now, we'll display one sample of a voxel plot, motivated by an example from physics: the potential field ("voltage") due to a dipole, which is two opposite charges fixed in place. This is a model for such things as a water molecule, when you are reasonably far away from it.

If the two charges are placed on the vertical (z) axis, they give rise to the 3D potential field visualized in the following figure.

[A dipole field]

In our follow-up article, we'll see the various other ways in which we can visualize this same data. The complete script that created this plot can be found in the Appendix.

Polygons

The splot command gets a new ability in gnuplot 5.4: it can now plot sets of closed 2D polygons positioned in a 3D space. It can do so in two distinct ways, to produce different effects.

In the first way, a list of vertex positions, defining a set of polygons, is plotted with a new command:

    splot <$vertices> with polygons

This colors all of the polygons identically, so either transparency or lighting is required to create a usable rendering.
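The manual describes the input for this style as blocks of vertex coordinates, one vertex per line, with blank lines separating the individual polygons. A small hypothetical datablock, assuming that format, might look like this:

    # Two triangles defined inline as a datablock; the format (one vertex
    # per line, blank line between polygons) is assumed from the manual.
    $vertices << EOD
    0 0 0
    1 0 0
    0.5 1 0

    0 0 1
    1 0 1
    0.5 1 1
    EOD
    set style fill transparent solid 0.5
    splot $vertices with polygons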

The second way is more versatile, because it allows different colors and opacities to be applied to each polygon. This method uses gnuplot's new polygon object type. Instead of a block of coordinates, we define a list of objects. These six polygon objects are arranged to form a box with one side tilted open:

    set style fill transparent solid 0.8
    set obj 1 polygon from 0,0,0 to 1,0,0 to 1,1,0 to 0,1,0 to 0,0,0\
	depthorder fillcolor "blue"
    set obj 2 polygon from 0,0,0 to 0,0,1 to 1,0,1 to 1,0,0 to 0,0,0\
	depthorder fillcolor "#AAAA00"
    set obj 3 polygon from 1,0,0 to 1,1,0 to 1,1,1 to 1,0,1 to 1,0,0\
	depthorder fillcolor "#33AAAA"
    set obj 4 polygon from 0,0,0 to 0,1,0 to 0,1,1 to 0,0,1 to 0,0,0\
	depthorder fillcolor "#CC0066"
    set obj 5 polygon from 0,1,0 to 0,1,1 to 1,1,1 to 1,1,0 to 0,1,0\
	depthorder fillcolor "#33FF66"
    set obj 6 polygon from 0,0,1 to 0,1,1 to 1,1,1.5 to 1,0,1.5 to 0,0,1\
	depthorder fillcolor "#AAAAAA"

The first line tells gnuplot to fill objects with a solid color of opacity 0.8, which is slightly transparent. In the subsequent set object commands, depthorder ensures that the polygons farther from the "eye" are drawn before the closer ones, so that the rendering looks right. Each command also defines the four vertices of one face of the box and the color used to fill it.

Once these objects are defined, they will be drawn alongside any splot command until they are undefined. One of gnuplot's quirks is that there is no way to plot polygons, or any other objects, directly; they are meant to accompany a curve or surface. So in a case like this, where we want only the polygons and have no surface to plot, we have to humor the program by issuing a splot command that does not actually draw a visible surface. One way to do that is to plot a surface lying entirely outside the range of the axes:

    splot -1

That will produce the picture below.

[A transparent box]

Spider plots

A spider plot (also known by several other names including radar chart) is another new graph type in gnuplot. Spider plots are intended to visualize multivariate data, so they have a handful of axes, each one representing a different variable. In this sense they are similar to parallel axis plots, which first became available in gnuplot 5.2, and were described in our previous article. The difference is that, in a spider plot, the axes all intersect at a common point, rather than being parallel. Although their use is controversial in some quarters, they can create interesting illustrations, and are an easy way to construct certain types of diagrams.

Those who have used parallel axis plots in gnuplot should be aware that this version changes the syntax; existing scripts will break, which is regrettable.

To illustrate the kind of thing that spider plots can be applied to, I downloaded some recent COVID-19 data and extracted the numbers for the confirmed case rates for six countries on June 12 and July 12, arranging the data like this:

    Italy, United States, Honduras, France, Canada, Switzerland
    3905.638, 6112.782, 774.286, 2383.218, 2583.822, 3577.396
    4016.203, 9811.656, 2784.865, 2615.946, 2843.902, 3779.832

Those numbers are the positive cases per one million people on the two different dates. Here is the complete script for making the plot below (line numbers have been added to facilitate the explanation):

    (1) set title "COVID cases per million, 12Jun and 12Jul 2020\n" font "Times,16"
    (2) set spiderplot
    (3) set datafile separator comma
    (4) set for [p=1:6] paxis p range [0:10000]
    (5) set for [p=1:6] paxis p tics format ""
    (6) set paxis 4 tics 2000 font ",8" format "%g"
    (7) set style spiderplot fillstyle transparent solid 0.3 border\
            linewidth 1 pointtype 6 pointsize 1.2
    (8) set grid spider linetype -1 linecolor "grey" lw 1
    (9) plot for [i=1:6] "spidey.dat" using i title columnhead

Line (2) is required to make a spiderplot. Line (3) says that the data is separated by commas. The next two lines set up six axes, with their ranges and tics. We only want numbers on one of the axes, which we turn on with line (6). Line (7) causes the fillstyle of the polygons formed by the data to have a solid, transparent color, bounds them with a border, and asks for open circles (pointtype 6) with a pointsize of 1.2. Line (8) draws the grid, which is the set of grey lines that are somewhat evocative of a spiderweb; linetype -1 is a solid line. The last line draws the plot.

The looping construct (for [i=1:6]) is now required for parallel axis or spider plots. The file is stored on disk with the name "spidey.dat", and the loop combined with using i just means to loop through each of the six numbers on each line, creating a new polygon from each line. If we wanted to skip some numbers, we could alter the loop here. The words "title columnhead" say to take the first line of the file and use the words there for axis labels.

[spiderplot]

In the plot, the purplish area shows the data for June 12, and the greenish area shows the data a month later. One can immediately see that four of the countries experienced very little growth in the number of cases, whereas the US and Honduras saw much faster growth. Finally, the plot makes it clear that the confirmed case rate is much higher in the US than in the other countries shown.

Pixmaps

A new concept in gnuplot 5.4, the pixmap, is an image object that is placed at a fixed location in a 2D or 3D plotting space. Pixmaps can be used for logos, backgrounds, or as illustrative labels attached to specific points on curves or surfaces.

Suppose a planetary scientist has a model of some property of planets in the solar system, described by the function f(), depending on two variables. A plot of f() would be a surface. It might be effective to label various locations on this surface with pictures of the planets that those locations correspond to. This is possible using pixmaps. If there is an icon of Saturn on disk, and if Saturn has the properties -7 and -9 for the two variables in the model, then this command will establish a pixmap object placing the Saturn image on the surface at the location for those two variables:

    set pixmap 2 "saturn.png" at -7, -9, f(-7, -9) width screen 0.06

The clause set pixmap 2 gives the index 2 to the object; this is used to unset or redefine it later if required. The words "width screen 0.06" set the width of the pixmap to 0.06 of the width of the entire canvas (that is what "screen" coordinates refer to); some trial and error was needed to find a good size here.

After all the desired pixmaps are defined using similar commands, using splot on the function draws the surface along with all of the planets. The complete script that produces the following graph is in the Appendix.
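The model function and the coordinates below are hypothetical stand-ins, invented just to show the shape of the commands that combine pixmaps with a surface (the real script is in the Appendix):

    # f() and the planet positions are made up for illustration.
    f(x, y) = sin(x/3.0) * cos(y/3.0)
    set xrange [-10:10]
    set yrange [-10:10]
    set pixmap 1 "earth.png"  at  3,  4, f(3, 4)   width screen 0.06
    set pixmap 2 "saturn.png" at -7, -9, f(-7, -9) width screen 0.06
    splot f(x, y) with lines notitle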

[Planetary pixmap]

3D bar charts

Gnuplot has had bar charts for a while; the new release takes them into the third dimension. Now you can set the boxdepth as well as the boxwidth of your columns of data. The commands for 3D bar charts, drawn with what gnuplot calls the boxes style, are a straightforward extension of the 2D version. To plot a 2D histogram or bar chart, the plot command takes a series of horizontal coordinates and values, along with the clause with boxes. To make a 3D bar chart, we use the 3D version of plot, splot (which originally meant "surface plot"), supplying x, y, and a value for each bar.
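A minimal sketch of that pattern, with a hypothetical data file whose columns are the x position, y position, and height of each bar:

    # bars.dat is a made-up file: each line holds x, y, and a value.
    set boxwidth 0.4               # extent of each box along x
    set boxdepth 0.4               # new in 5.4: extent along y
    set style fill solid 1.0
    splot 'bars.dat' using 1:2:3 with boxes notitle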

This can be a good visualization of discrete data that depends on two variables. As an example, we'll take the COVID-19 data that we used for our spider plot and add a few more months, to look at everything from near the beginning of the pandemic in March until close to the present. Here's the data, extracted from the same source as before:

    Italy, United States, Honduras, France, Canada, Switzerland
    2.729, 34.945, 0.202, 206.114, 74.18, 3.964, "3-12"
    617.373, 1436.877, 39.679, 2518.465, 2867.833, 1601.048, "4-12"
    1854.187, 2137.452, 202.532, 3635.583, 3496.515, 4072.221, "5-12"
    3905.638, 6112.782, 774.286, 2383.218, 2583.822, 3577.396, "6-12"
    4016.203, 9811.656, 2784.865, 2615.946, 2843.902, 3779.832, "7-12"

Now each row has a label for the date appended, which we can use to create tic labels. The complete script that produces the following plot is in the Appendix. We have arranged things so that each country gets its own color, and the heights of the bars indicate the confirmed case rate for each country and month.

[3D bar chart]

Many improvements

Here we've only covered the major new features, but there are many additional improvements in 5.4, described in the release notes and the official manual [2.1MB PDF]. The extensive interactive help available within gnuplot has also been thoroughly revised to cover all the new features. One of the enhancements is the automatic use of 64-bit integer arithmetic on systems that support it. Most of gnuplot's internal calculations are done in floating point but, with 32-bit integers, operations such as factorials or exponentiation would return floats even when supplied with integer arguments, if the results would overflow a 32-bit integer; those functions now return integer results when given integer arguments. Other new features include the ability to project graphs onto any of the coordinate planes, more ways to render surfaces and contour lines, improved array syntax, support for the LaTeX pict2e picture environment, and more Bessel functions, which is always nice.

Gnuplot is not everyone's cup of tea. But because of its combination of power, scriptability, and flexibility, it occupies a unique niche in the world of technical and scientific graphics. Those who become conversant with its sometimes quirky ways are regularly rewarded by a relentless process of improvement that stretches back to the 1980s and shows no signs of slowing down.

Comments (7 posted)

Page editor: Jonathan Corbet


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds