
LWN.net Weekly Edition for November 5, 2020

Welcome to the LWN.net Weekly Edition for November 5, 2020

This edition contains the following feature content:

  • A Matrix overview: a report from Open Source Summit Europe on the decentralized, secure communication network.
  • Packaging Kubernetes for Debian: the Debian technical committee is asked to rule on vendored dependencies.
  • Relief for insomniac tracepoints: groundwork for attaching sleepable BPF programs to tracepoints.
  • Kernel support for processor undervolting: a removed warning leads toward a proper interface for lowering CPU voltage.
  • An introduction to Pluto: a reactive computational notebook for the Julia language.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

A Matrix overview

By Jake Edge
November 4, 2020

OSS EU

At this year's (virtual) Open Source Summit Europe, Oleg Fiksel gave an overview talk on the Matrix decentralized, secure communication network project. Matrix has been seeing increasing adoption recently, he said, including by governments (beyond France, which we already reported on in an article on a FOSDEM 2019 talk) and other organizations. It also aims to bridge all of the different chat mechanisms that people are using in order to provide a unified interface for all of them.

Fiksel is a former security consultant and a longtime member of the Matrix project. His opening slide (slides [PDF]) was an altered version of the xkcd "Why SMS refuses to die" Venn diagram, with "Matrix!" placed at the intersection of the three sets and "How we view the future :)" as the caption. It ably captured one of the main thrusts of his talk and can be seen in the screen shot below.

[Oleg Fiksel]

He began by talking about all of the different chat networks that exist, which, for the most part, are isolated from each other. One idea behind Matrix is to unite all of these networks, but that does not mean that it is just a giant bridge. Matrix is a network in its own right as well. The "difference is that Matrix is friendly to other networks", he said.

He put up a sentence that describes Matrix well: "Matrix is an open network for secure, decentralised real-time communication." It is open in the sense of having an open specification and open implementation. "Secure" means that it has end-to-end encryption by default; it is decentralized because "anyone can run their own Matrix server". These days, real-time communication generally refers to chat rooms, he said, but Matrix can be used for other types of communication, such as for the Internet of Things (IoT).

Key features

Fiksel then listed some of the key features of Matrix from his perspective. The open specification for its protocols along with an open reference implementation means that anyone can write their own server or client—and many have done so. In addition, Matrix has a distributed, federated architecture, which means that messages are not stored in a single central location or owned by a single party. The messages are spread throughout the network and each server owner can decide which other servers to connect with and federate their data with.

From an architectural point of view, it is important that Matrix was built with end-to-end encryption from the ground up, rather than bolting it on later. That design also allows Matrix to support encrypted voice and video calls using WebRTC.

The "most unique feature of Matrix" for him is its ability to provide bridges that break the boundaries between networks. That means users do not have to install lots of different clients in order to connect with people on all of the disparate networks. There are also integration widgets to combine Matrix with other tools (e.g. Etherpad).

Last, but not least, he said, the project has a "healthy and friendly community", which is "very important" to him.

Encryption details

In his opinion, any encryption that is not implemented as open-source software is questionable. With closed-source encryption, you have to trust the word of a company or project that they are encrypting the data, but with Matrix you can study the code that implements it.

Encryption is such an important foundation for Matrix that the project took a lot of time to build that support in. It took "much more time" to build the user interface for the encryption, he said; the project looked at all of the different use cases so that it could make the encryption easy to use. Those who are working on encryption should take note that building the crypto code is one thing, but "making it usable for people and easy to use is a bit of a different job".

The Matrix end-to-end encryption implementation has been used for around three years at this point. There are two libraries, Olm, which is based on the cryptographic "double ratchet" specified by the Signal secure-messaging application project, and Megolm, which is a different ratchet used for group messaging. A review of the cryptography in Olm and Megolm was done in 2016. A few problems were found and fixed at that time; Fiksel encouraged security experts to review the code and provide feedback as well.

Architecture

He then described the architecture for Matrix. It starts with users that have clients that talk to a home server using a client-server API. These users will have accounts on one or more home servers. The home servers talk with other home servers to exchange messages using a server-server API. There are also identity servers that can either be centralized (e.g. at matrix.org) or run by various groups to, say, federate with a company's existing identity server. Discovery of home and identity servers is done using DNS records; home servers can also be discovered using a well-known URI scheme.
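To make the discovery mechanism concrete, the well-known document is just a bit of JSON served over HTTPS at a fixed path. Here is a minimal sketch in Julia using the HTTP.jl and JSON.jl packages; the domain is a placeholder:

    using HTTP, JSON

    # Fetch the client-discovery document for a hypothetical domain; the
    # response names the home server base URL that clients should use.
    resp = HTTP.get("https://example.org/.well-known/matrix/client")
    info = JSON.parse(String(resp.body))
    println(info["m.homeserver"]["base_url"])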

Application servers are another entity in Matrix; they connect to home servers using an application service API. These servers have more privileges than the clients have. They are mostly used for bridges or integration managers.

Home servers use the server-server API to synchronize messages and room state between them. The API can also be used to request older messages so that a server that has been offline for a day or two can catch up on ongoing conversations. Servers can query for the profiles and status of users on other servers as well.

Multiple home server implementations are available, starting with Synapse, which is the reference implementation written in Python using the Twisted networking framework. Synapse is stable and is used on matrix.org. Dendrite is a Go implementation that is now in beta; it is aimed at being highly performant and scalable. Conduit is written in Rust; it is in an alpha state and lacks federation support, but it is "blazingly fast". A C++ server, Construct, is also performance-oriented. There are others, which are listed in the Matrix directory of tools.

The client-server API was developed to be "as user-friendly as possible", Fiksel said, which makes it easy to create a client. For example, you can use curl to send a message to a room from the command line. There are also "fancy clients", including Element, which is the reference client. Element implements all of the different features that Matrix has and is available as a web application, a desktop application using Electron, and for mobile devices using Android or iOS. Other clients for desktops and mobile devices exist, with different feature sets and maturity levels, which are listed in the directory as well. Terminal-based clients are available for those who prefer that kind of interaction.
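To give a flavor of that simplicity, here is a sketch of the same kind of message-sending call written in Julia with HTTP.jl and JSON.jl rather than curl; the home server, room ID, transaction ID, and access token are all placeholder values:

    using HTTP, JSON

    server = "https://matrix.example.org"   # placeholder home server
    room   = "!abcdef:example.org"          # placeholder room ID
    token  = "ACCESS_TOKEN"                 # issued by the server at login

    # PUT /_matrix/client/r0/rooms/{roomId}/send/m.room.message/{txnId}
    url = string(server, "/_matrix/client/r0/rooms/",
                 HTTP.escapeuri(room),
                 "/send/m.room.message/txn1?access_token=", token)
    body = JSON.json(Dict("msgtype" => "m.text",
                          "body"    => "Hello from the client-server API"))
    HTTP.request("PUT", url, ["Content-Type" => "application/json"], body)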

An application service has privileged access to its home server, which it needs to transport messages from one network to another if it is performing a bridging function, for example. It can subscribe to server traffic to get access to the events that are occurring in the chat rooms. It also needs to be able to act as a virtual user that will make users on other networks seamlessly appear in a Matrix chat. Users on, say, Telegram or WhatsApp, will appear just like any other Matrix user, he said.

Application services can be used for integrations with other tools. For example, an Etherpad can be added to a chat room by using an integration; users in the room will all be able to see and interact with the Etherpad. There are bots of various kinds that are implemented as integrations; for example, a bot could monitor an RSS feed and post new messages to the room. Another example would be a room for a group that is monitoring a set of servers; it could have Grafana graphs displayed for all participants to discuss. Integration with Jitsi video-conferencing is also possible. Custom integrations can be written as well, of course.

Bridges are application services that sit between a home server and another chat network. On one side they use whatever API is required for the other network, while they use the application services and client-server APIs to the home server. There are two types of bridges: puppeted and bridgebot. A puppeted bridge will present its Matrix messages as native messages in the other network and vice versa. This provides a seamless experience on both sides, so that Telegram users, for example, cannot distinguish Matrix users from other, actual Telegram users.

For various reasons it may not be possible to have a puppeted bridge for a particular network. For example, the network may not support it or the user may not have an account on the other network. In those cases, the system can fall back to a bridgebot type of bridge, which means that a single bot user relays the traffic. To a user of the other network, the messages will all come from a single bridge user, though individual messages in the chat will be separated and prefaced with the Matrix user's name. It would seem to be a kind of clunky way to work around restrictions in some "walled garden" chat networks.

Adoption of Matrix has been growing rapidly over the last few years, he said. Beyond the adoption by the French government, there are three other governments that are running pilot studies. Multiple open-source communities have adopted it, with Mozilla being the most prominent. Lots of universities and virtual conferences are using it; there are also many official Matrix channels for various projects that are available to clients from matrix.org.

Getting started

Fiksel presented some steps to get up and running on Matrix. The first is to choose a home server. There are various lists of public home servers or you can use Element with an account at matrix.org. Using matrix.org should probably be a last resort, he said, as it is fairly overloaded at this point. You can pay to host a home server at a specialized Matrix provider or, of course, host a server yourself; there are guides, an Ansible playbook, as well as other efforts to automate the process using Kubernetes Helm charts.

Next up is to choose a client from a list of Matrix clients and connect it to the home server. As he did several times in the talk, he strongly encouraged attendees to get involved with the project. He also noted that Matrix and the Gitter chat network have announced that they are joining forces, which further expands the Matrix community. As mentioned, that community is welcoming and friendly, so interested folks should get involved.

Fiksel spent the last part of his talk on an extended pre-recorded demonstration of some of the bridging features in Matrix. He used the Ansible playbook to deploy a home server, then connected to it using Element and the terminal-based gomuks client. He showed bridging Matrix to both Telegram and IRC, with messages flowing in both directions. For those who find themselves with too many different chat clients, Matrix seems like it could be a welcome alternative. In addition, a decentralized end-to-end encrypted communication system that is not under the thumb of some corporation or other organization has some attractions of its own.

Comments (32 posted)

Packaging Kubernetes for Debian

By Jonathan Corbet
October 30, 2020
Linux distributors are in the business of integrating software from multiple sources, packaging the result, and making it available to their users. It has long been true that some projects are easier to package than others. The Debian technical committee (TC) is currently being asked to make a decision in a dispute over how an especially hard-to-package project — Kubernetes — should be handled. Regardless of the eventual outcome, this disagreement clearly shows how the packaging model used by Linux distributors is increasingly mismatched to how software is often developed in the 2020s; what should replace that model is rather less clear, though.

A longstanding rule followed by most distributors is that there should be only one copy of any given library (or other dependency) in the system, and that said copy should usually be in its own package. To do otherwise would bloat the system and complicate the task of keeping things secure. As an extreme example, consider what would happen if every program carried its own copy of the C library in its package. Those thousands of copies would consume vast amounts of both storage space and memory. If a security vulnerability were found in that library, thousands of packages would have to be updated to fix it everywhere. A single library package shared by all users, instead, is more efficient and far easier to maintain.

This rule is thus contrary to the practice of stuffing dependent libraries into the package of a program that needs them — a practice often called "vendoring". Living up to this rule can be challenging, though, with many modern projects, which also often engage in a fair amount of vendoring. Projects written in certain languages appear to be especially prone to this sort of behavior; the Go language, for example, seems to encourage vendoring.

Kubernetes is written in Go, and it carries a long list of dependencies with it. It was maintained in Debian for a while by Dmitry Smirnov, but he orphaned Kubernetes in 2018, stating that packaging it is "a full time job, probably for more than one person". The Kubernetes package was eventually picked up by Janos Lenart, who has been supplying updated versions to the Debian Testing repository.

Kubernetes vendoring considered harmful

Back in March, though, Smirnov made it clear that he was far from happy with how Lenart has approached the task of packaging Kubernetes. Rather than work to build Kubernetes with independently packaged libraries in the Debian repository, Lenart has chosen to vendor those libraries into the Kubernetes package directly. The Kubernetes 1.19.3 package contains over 200 of these libraries; the directory of applicable licenses alone contains 3MB of text. A README file added by Lenart notes that this approach may not suit everybody:

However, I kindly ask purist aspirations that effectively halted Kubernetes' release and updates in Debian for YEARS to be kept at bay. I wholeheartedly agree that in an ideal world all the 200+ Go packages in vendor/ would be packaged separately in Debian, all of them following the excellent semantic versioning perfectly. It would also be awesome if there was a robust and meaningful(!) way to link Go binaries dynamically. That being said, I feel that the most important step at the moment is to have Kubernetes available in Debian instead of postponing until that perfect world arrives.

Smirnov denied being a purist, but was clearly upset about what had been done to the package he once maintained. It is, in his mind, a violation of Debian's policies. What, he asked, can be done in a situation like this?

The resulting discussion was lengthy and often heated, as one might expect. This being Debian, the developers devoted a long subthread to the question of whether Debian developers really have to verify the licenses for every vendored dependency (there was no definitive answer to that question). The reasons behind Debian's policies and the degree to which they make sense when applied to a project like Kubernetes were explored, also without any real conclusions.

Lenart posted exactly one message to the thread, defending the changes to how Kubernetes is packaged. There are other packages in Debian with vendored dependencies, though none, he acknowledged, have anywhere near the 200 found in his Kubernetes package. Independently packaging hundreds of dependencies is not feasible, he said; Smirnov's attempts to do so have a lot to do with why most Kubernetes releases never made it into Debian. Even if that effort were to succeed, Debian's package would not use the versions of the libraries tested by the Kubernetes developers and would thus essentially be a fork that "no sane cluster admin would dare to use". With that many separate libraries, it would never be possible to get security updates out in a timely manner. Go binaries are statically linked, so the resource-consumption benefits of shared libraries are not available in any case. And so on.

Smirnov, unsurprisingly, was not impressed with this list of justifications, and put some effort into casting Lenart as being too inexperienced to manage a package like Kubernetes. Many others argued for or against specific points until the conversation eventually wound down with nobody seemingly having budged from their initial positions.

To the technical committee

The topic then went quiet — on the public lists, at least — until the beginning of October, when Smirnov took the issue to the TC for resolution. The Debian TC exists to make decisions on technical disputes that Debian developers are unable to resolve on their own; it was this committee, for example, that finally answered the question of whether Debian would move to systemd or not. Now the TC is being asked to decide whether the level of vendoring seen in the Kubernetes package is acceptable.

There has been little public discussion since this request was filed, but a couple of interesting things have come out anyway. One was this message from Shengjing Zhu noting that Kubernetes, too, is a library that is depended upon by other packages. But Kubernetes is not packaged in a way that allows others to use it; doing so, Zhu said, would require decoupling all of its own vendored dependencies. Without that, every package that needs the Kubernetes library must vendor its own copy of Kubernetes, which does not seem like a rational path.

As part of the TC's deliberation, Sean Whitton asked the Debian security team about the security implications of that level of vendoring. Since security is one of the primary arguments against vendoring, one might expect the security team to dislike the idea; the actual response from Moritz Mühlenhoff was somewhat more nuanced than that. Supporting Kubernetes in a stable release is difficult in the best of situations, he said, because upstream only supports specific releases for one year, "and it would be presumptuous to pretend that we can seriously commit to fix security issues in k8s for longer than upstream". Given that, there are two options that Debian could consider for this package.

The first of those options would be to just not ship Kubernetes in a Debian stable release at all. Debian users would then obtain it either from the Testing repository (which does not receive security support) or from outside of Debian entirely. The alternative is to just update Kubernetes wholesale whenever a security problem is disclosed and upstream is no longer supporting the version shipped by Debian. That is an unusual practice for Debian, he allowed, but Kubernetes users are already used to it.

Crucially, he said that if Debian ships Kubernetes in a stable release (and thus goes with the second option above), vendoring the dependencies as is being done currently is the only realistic option. Otherwise, the chances of a newer Kubernetes release working with the older versions of its dependencies shipped by Debian are small at best. Rather than impeding the security effort in this case, vendored dependencies appear to be the only way that the Debian security team could support Kubernetes at all.

In the end, the options listed by Mühlenhoff are probably the only ones available to the TC. The committee could try to mandate that the Kubernetes package be managed like others, with few (if any) vendored dependencies, but it has no authority to order any developer to actually do the work to make that happen. So such a mandate is highly likely to be equivalent to saying that Debian does not ship Kubernetes at all.

Not just Kubernetes

The TC has not given any indication of when it will make a decision on this issue. Regardless of the outcome, though, this issue is one that is likely to come up again. There is a small but growing set of free-software projects that are simply too unwieldy for most distributors to handle on their own. Beyond Kubernetes, web browsers clearly fall into this category. Distributors have generally given up on trying to backport patches to older browser releases; they just move their users forward to new releases when they happen. The resources to do things any other way just do not exist.

The kernel might in some ways be the original example of this kind of package, but with some interesting differences. The kernel, too, is a huge and fast-moving project; most distributors have no hope of trying to maintain an older release on their own. The distributors that do maintain such versions — in "enterprise" distributions usually — dedicate massive resources to keeping those kernels working and secure. Others depend heavily on the fact that the kernel project itself is now maintaining releases for several years; the 4.4 kernel has received 241 updates (at last count) with 16,422 patches. Debian is an interesting exception in that it does maintain old kernels for a long time, but that support, too, benefits from the kernel's long-term support work. In the absence of that support, most distributors would have to choose between not even pretending to keep their kernels maintained (a favorite choice of embedded vendors) or upgrading users to current releases.

The kernel, at least, is self-contained; most projects of any size accumulate dependencies quickly, and many current programming environments encourage tying dependencies to specific versions of libraries — through a relative lack of concern about ABI compatibility if nothing else. Such applications will be painful to package; Kristoffer Grönlund's 2017 linux.conf.au talk on the subject is still highly relevant.

In other words, the Linux distribution model that was first worked out in the 1990s is increasingly unsuited to the way software is developed in the 2020s. Distributors understand that and are investigating ways to stay relevant, including new package-management techniques, immutable distributions, and more. Preserving the best of what distributions have to offer while taking advantage of the best of what the software-development community has to offer will prove challenging for some time. It is, as some might put it, a high-quality problem to have, but that doesn't make it easy to solve.

Comments (85 posted)

Relief for insomniac tracepoints

By Jonathan Corbet
October 29, 2020
The kernel's tracing infrastructure is designed to be fast and to interfere as little as possible with the normal operation of the system. One consequence of this requirement is that the code that runs when a tracepoint is hit cannot sleep; otherwise execution of the tracepoint could add an arbitrary delay to the execution of the real work the kernel should be doing. There are times, though, that the ability to sleep within a tracepoint would be handy, delays notwithstanding. The sleepable tracepoints patch set from Michael Jeanson sets the stage to make it possible for (some) tracepoint handlers to take a nap while performing their tasks — but stops short of completing the job for now.

Within the kernel, the tracing machinery has no need to sleep; its task is normally to package up the data associated with a given tracepoint and place the result into a ring buffer for transport to user space. This work can be accomplished without the need to wait for any outside events. The use cases driving the push for sleepable tracepoints thus must come from elsewhere — from BPF programs attached to tracepoints by user space, in particular. These programs are currently limited to accessing data in kernel space, which can always be done without the need to sleep. There would be value, though, in the ability to look at user-space data in a tracepoint handler as well. This data is not guaranteed to be resident in RAM when the handler tries to access it; should it not be present, a page fault will result. Handling page faults can take an arbitrary amount of time, during which the faulting process must be put to sleep.

In current kernels, this possibility prevents access to user-space data from tracepoint handlers. Specifically, it means that tracers cannot dereference pointers passed from user space. Thus, for example, a tracepoint running on entry to the openat2() system call can see the pointer to the open_how structure passed by user space, but is unable to examine the contents of the structure itself.

There is nothing about tracepoints that inherently makes sleeping impossible — at least, for those tracepoints that are executed when the kernel is not running in atomic context. But the BPF subsystem has long had its own rule that BPF programs could not sleep. That will change in the 5.10 kernel, though, thanks to the addition of sleepable BPF programs, which no longer have this constraint. Only certain types of BPF programs are allowed to block; in 5.10, tracing programs are on that list. There will be no users of this ability in the 5.10 release, though.

Jeanson's patch set lays the groundwork for the addition of such a user, establishing the infrastructure to support the attachment of sleepable BPF programs to specific tracepoints. This ability must be supported with care since, as noted above, the kernel is often running in a context where sleeping is a bad idea. Specifically, a sleepable BPF program can only be attached to a tracepoint located in a region of code where sleeping is allowed in general.

There is no way to know automatically whether a given tracepoint can safely sleep or not, so existing tracepoints will not allow the attachment of sleepable BPF programs without explicit modification to that effect. Tracepoints are added to kernel code with the TRACE_EVENT() macro, along with a few variants; the brave of heart can see the horrifying macro-magic details in include/linux/tracepoint.h. Jeanson's patch set adds a new macro called TRACE_EVENT_FN_MAYSLEEP() as a variant of TRACE_EVENT_FN(), which defines a tracepoint that has associated registration and unregistration functions. Switching an existing tracepoint to the new macro indicates that it is safe to attach sleepable programs there.

The most significant change within those macros is that, if a tracepoint is marked as accepting sleepable programs, the tracers called when that tracepoint is hit will be run with preemption enabled. That is a necessary precondition to being able to handle page faults, but it also changes the expectations under which all of those tracers were written. The tracers themselves will need modification to run safely with preemption enabled — work that has not yet been posted. The patch set handles that situation, for now, by modifying the ftrace, perf, and BPF tracers to explicitly disable preemption internally, thus avoiding any unfortunate surprises.

As noted above, the use case that is driving this work is following pointers passed to system calls from user space. So it is not surprising that the first user of this capability will be system-call tracing. Jeanson's patch set changes the system-call entry and exit tracepoints to use TRACE_EVENT_FN_MAYSLEEP(), thus setting the stage for the attachment of sleepable programs that could rummage around in user-space memory in response to system calls.

There is only one piece that is missing at this point: actually fixing up the tracers and using the new infrastructure to attach and run sleepable BPF programs. As the cover letter to the patch set notes:

This series only implements the tracepoint infrastructure required to allow tracers to handle page faults. Modifying each tracer to handle those page faults would be a next step after we all agree on this piece of instrumentation infrastructure.

This may seem like a strange place to stop, just before making everything actually work, but changes at this point could have significant effects on the subsequent patches.

Based on the discussion so far, it doesn't appear that there is any need for big changes at this level of the code; most of the comments relate to details around the edges. If that situation holds, we should expect to see patches in the near future that finish the job and enable the attachment of sleepable tracepoint programs. That may well lead to another increase in the capability of the tracing infrastructure for Linux.

Comments (8 posted)

Kernel support for processor undervolting

November 2, 2020

This article was contributed by Marta Rybczyńska

Overclocking the processor — running it above its specified maximum frequency to increase performance — is a familiar operation for many readers. Sometimes, however, it is necessary to go the other direction and decrease a processor's operating power point by lowering its voltage to avoid overheating. Recently, Jason Donenfeld submitted a short patch removing a warning emitted by the kernel when user space accesses special processor registers that allow this "undervolting" on x86 processors. It caused a long discussion that might result in a kernel interface to allow users to safely control their processor's voltage.

Voltage, frequency, and undervolting

Current processors can run with any of a number of combinations of frequency and voltage, which can change dynamically in a process called dynamic frequency scaling. Different combinations of frequency and voltage will naturally vary in terms of both the number of instructions executed per second and power consumption. It is possible to place a CPU into a configuration outside of its specified operational envelope; when this is done, the processor may malfunction in a number of ways, from occasional false results from some instructions to a complete crash.

For some users, lowering the operating voltage is a necessity. Their processors, especially recent Intel laptop models, can overheat while running under high load, for example when compiling a kernel. One solution is to undervolt the processor, making it run at a lower voltage to decrease power consumption (and thus heat generation). As the frequency does not change, the performance of the system stays about the same. Fortunately for those users, tools like intel-undervolt exist to help them in this task. However, they face two difficulties: the values to use are undocumented and vary from one processor to the next, and the kernel prints a worrisome warning every time the tool changes the configuration.

In the case of Intel chips, the voltage settings are controlled by Model Specific Registers (MSRs), which do not just serve to change the voltage, as MSRs are an interface to many processor settings. On Linux, access to the MSRs from user space is possible using /dev/cpu/CPUID/msr special files. Write access can be disabled, however, via the msr.allow_writes boot-time option or if the kernel is running in lockdown mode. Within the kernel, MSR access requires specific processor instructions and is handled by the msr platform-specific driver. This driver emits a warning when an attempt is made to write to a MSR that is not explicitly listed as being safe to change; it still allows the write to happen, however, if writes are enabled in general.

Donenfeld's patch silences that warning by adding an entry to the list of safe MSRs. That entry, named MSR_IA32_OC_MAILBOX by the patch, allows changing the processor voltage; it is the register used by intel-undervolt and other similar tools. Interested readers can refer to a background paper on how those registers are configured. Apparently, this work is based on partial documentation and a significant amount of reverse engineering with trial and error.
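For the curious, reading one of these registers from user space requires nothing more than seeking to the register's number in the msr device file. Here is a minimal, read-only sketch in Julia, assuming root privileges and a loaded msr module; 0x150 is the register the patch names MSR_IA32_OC_MAILBOX. Writing to this register is what the undervolting tools do, and a bad value can crash the machine:

    # Read MSR 0x150 on CPU 0; the file offset selects the register.
    open("/dev/cpu/0/msr", "r") do io
        seek(io, 0x150)                  # MSR_IA32_OC_MAILBOX
        value = read(io, UInt64)         # MSRs are 64 bits wide
        println("MSR 0x150 = 0x", string(value, base = 16))
    end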

Undervolting as an essential feature

Donenfeld's patch sparked a discussion about why direct access to MSRs from user space is necessary. Borislav Petkov suggested that it would be better to provide controlled access to specific registers via sysfs and remove the ability to write directly to registers. He later went further, suggesting disabling user-space access to MSRs altogether by default. That provoked a number of reactions from users who feel that this capability is essential. Donenfeld explained that his system requires undervolting to remain usable and there are many other users in the same situation:

Well that's not cool. And it's sure to really upset the fairly sizable crowd of people who rely on undervolting and related things to make their laptops remotely usable, especially in light of the crazy thermal designs for late-era 14nm intel cpus. [...] I know that my laptop, at least, would suffer.

Another example came from Sultan Alsawaf, who described his experiences with a number of laptop processors. Undervolting is necessary on all of them when performing tasks like compiling the kernel; it results in a 22-30% power use reduction and improved performance. "I'd like to point out that on Intel's recent 14nm parts, undervolting is not so much for squeezing every last drop of performance out of the SoC as it is for necessity", he said. Petkov acknowledged this use case, saying that it should be better supported: "Sounds to me that this undervolting functionality should be part of the kernel and happen automatically". Donenfeld noted that doing it automatically could be hard, though, since the correct value varies from one chip to the next depending on the "silicon lottery".

If this functionality is to be properly supported by the kernel, there are some other questions to answer as well. Donenfeld asked where the right place to do such operations is: whether it belongs in the kernel or user space. Petkov then responded strongly in favor of the creation of "a proper interface" in the kernel. He also mentioned the in-tree x86_energy_perf_policy tool that uses a different MSR; that MSR too, he said, can be taken off the allowlist once a real kernel interface to that functionality exists. Donenfeld agreed with this goal, but said it might be hard to achieve in practice because the MSRs are not all publicly documented and differ in their semantics.

Srinivas Pandruvada, maintainer of Intel power-related drivers, responded that overclocking (along with undervolting, presumably) is not an architectural interface. There is also no public documentation of the commands to be passed to this specific MSR. He promised to look for that documentation internally. A proper sysfs interface, he said, would have to perform checks of the passed values to prevent users from crashing their systems.

Toward a solution

At that point, Andy Lutomirski, maintainer of many x86-related subsystems, commented that MSR access and undervolting are two separate topics. According to him, MSR access should be allowed (with warnings emitted) only if restrictions are off, but the undervolting feature should be supported by the kernel. He did point out a potential problem with lockdown, though, noting that this feature could destabilize the system and perhaps enable privilege escalation. He proposed a separate lockdown bit for this feature. Matthew Garrett pointed out the Plundervolt [PDF] attack, which allows the corruption of Software Guard Extensions (SGX) enclaves using undervolting. He also noted that a sysfs interface would allow adding an SELinux or AppArmor rule and thus protect the interface if needed.

About then, Pandruvada returned with the answers from Intel. It turns out that the correct values come from experimentation and Intel's guide warns about possible stability issues. There is kernel code that uses the MSR in question now (the intel_turbo_max_3 driver), so the operation of that MSR is public, but there is no way to validate the commands written to it, he said.

The discussion about where to put the functionality continued for some time until Dave Hansen proposed that Intel developers look into the documentation of the MSRs of as many models as possible and create a separate driver, perhaps for only one model at first. Petkov agreed, and the discussion stopped at that point.

Kernel developers have thus come to an agreement that the undervolting feature is essential for some users, who require it to keep their CPU in reasonable thermal conditions. The path toward providing this feature has also been laid out. One blocking point may be the lack of official documentation, but it looks like there is a will from Intel to solve this problem. The work still needs to be done, but we can hope that the new interface will appear soon.

Comments (19 posted)

An introduction to Pluto

November 4, 2020

This article was contributed by Lee Phillips

Pluto is a new computational notebook for the Julia programming language. Computational notebooks are a way to program inside of a web browser, storing code, annotations, and output, including graphics, in a single place. They became popular with the advent of the Jupyter notebook, which originally targeted Julia, Python, and R—the names got mashed together to make the word "Jupyter".

Pluto is similar in many ways to Jupyter, which I wrote about earlier. It uses the same basic mode of interaction, based on input and output cells; both notebook formats are well-suited to exploration, sharing calculations, and education. In an earlier article, I reviewed progress in Julia since v. 1.0 was released two years ago; that article also went into some detail about Julia's special attractions in the area of scientific computing. Readers who are unfamiliar with Julia may want to review some of those earlier articles; here, I concentrate entirely on Pluto.

Like Julia, Pluto has an MIT license. It was created by Fons van der Plas and Mikołaj Bochenski, and has been developed on GitHub since March 2020. Despite its recent vintage, it's already mature enough for serious use; in fact, it's being used in an ongoing open MIT course. But users should keep in mind that the version number, as of this writing, is only 0.12.4; the program's behavior is certainly not set in stone. Pluto advises the user, upon startup, if a fresher version is available in the repository.

Readers who would like to try out Pluto right away and don't have Julia installed can use the Binder service to run a notebook with nothing but a web browser. When a user visits this page, it spins up a Julia instance on a server and opens a notebook interface to it that is ready for experimentation. Whether opened through Binder or running locally, the notebook offers the user an initial page with the choices to open a sample, existing, or new notebook. The series of sample notebooks offers an excellent hands-on introduction to the use of Pluto.

Why another notebook?

Jupyter has become successful because it offers a style of interaction and modes of sharing that are not available through other means. Although it is used most often with Python, the list of kernels—language back-ends that it works with—is astounding in both its size and in the obscurity of some of the languages that it includes.

Given all this, the reader can be forgiven for wondering why we need a completely new notebook project. One answer is that Jupyter presents two problems that work against its intended uses. Briefly, these are its global hidden state and its incompatibility with version control. I'll expand on these two issues below. Although they are widely seen as problems, there is, naturally, disagreement about how serious they are. This YouTube video is an entertaining demonstration of the issues that arise when using Jupyter, presented in the lion's den, the 2018 JupyterCon, by data scientist Joel Grus. Jeremy Howard responded [YouTube] at the 2020 JupyterCon, showing in detail some of the ways to overcome these problems.

[Jupyter]

At right is a screenshot from my laptop, showing Python 3 running in a Jupyter notebook; it illustrates the hidden state problem. The entire notebook is shown. Notice that a is defined in two contradictory ways, followed by a cell where we ask for the value of this variable. The response from the Python kernel is yet a third value, from an assignment that is nowhere to be found in the notebook because I deleted it.

The problem here is that the visible state of the notebook at any particular time may be inconsistent, because it depends on the order in which the cells were executed and possibly on cells that no longer exist. If the notebook is saved in an inconsistent state and shared, the recipient will open it and see calculations that may not be reproducible from the inputs as shown, or may depend on the cells being executed in some undocumented order.

But what if the notebook worked in a different way, one that made these issues moot? What if it were always consistent, and it didn't matter in what order the cells appeared?

Pluto is different

Pluto appears, at first glance, quite like Jupyter and similar notebooks, but the way it works is rather different.

In Jupyter, when a cell is executed, the result appears, and nothing else happens. Any global variables defined in the cell are available for subsequent execution of other cells. The variables can be redefined freely. The upshot is that the output of any particular cell execution depends on a hidden global state: which cells were previously executed and in what order.

Pluto, instead, analyzes the code in all of the cells and constructs a dependency graph, so that it knows the order in which the cells must be executed; this is based on which cells use which variables. Cells can be grabbed with the mouse and arranged in any order and this has no effect on the results. When the code in a cell is changed, the cell is run; all of the cells that depend on it, and only those cells, are also run, in dependency order. Therefore one is not allowed to define a global variable in more than one cell; an attempt to do so results in an error message.
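As an example of the resulting behavior, entering the two contradictory assignments from the Jupyter screenshot above into two Pluto cells produces an immediate error rather than silently shadowing one value with the other (the exact wording of Pluto's message may differ):

    # Cell 1
    a = 1

    # Cell 2: Pluto refuses to run either cell, complaining that
    # `a` is defined in more than one place.
    a = 2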

The Pluto GitHub page makes this promise: "At any instant, the program state is completely described by the code you see." There is no hidden state, and nothing depends on the order in which cells were run.

Execution, as with Jupyter, can be performed with the shift-return key combination or by clicking a button. One difference from Jupyter that jumps out immediately is that results are shown above their input cells, a convention that seems counterintuitive to me. While a cell is executing, its left-hand border turns into an in-progress indicator. One can observe the status of a calculation through the dependency graph by watching as these indicators jump from cell to cell. Below each cell is printed, in unobtrusive grey type, the execution time taken when calculating that cell, which is a nice touch. The following figure illustrates these elements of the interface:

[Calculating e]

The Pluto creators describe their project as a "reactive notebook", borrowing this term from its emergence as a description of JavaScript libraries, such as React, that allow the creation of web apps where the state of the page automatically reacts to a change made by the user.

In Pluto this means that, when the user changes the value of any global variable, the dependency cascade is triggered, and everything that depends on that variable is automatically updated.

An example calculation

To provide a feel for the use of Pluto notebooks, this section will go through a short but meaningful calculation. Our recent article introduced the use of the DifferentialEquations and Plots packages; here we will use these again.

To begin working with Pluto, one must start the Julia read-eval-print loop (REPL), add the Pluto package if it's not already installed, and then execute using Pluto followed by Pluto.run(port=1234), substituting a different port if required. Current versions will then automatically open a browser window pointing to localhost at this port. The machine running the Julia process and the one running the browser need not be the same; in that case an extra host argument to Pluto.run will allow it to access a remote instance.
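Spelled out, the whole startup sequence looks like this; the port number is arbitrary, and the host keyword is only needed when the browser runs on a different machine:

    using Pkg
    Pkg.add("Pluto")        # only needed the first time
    using Pluto
    Pluto.run(port=1234)    # then browse to http://localhost:1234

    # For access from another machine (an example host value):
    # Pluto.run(host="0.0.0.0", port=1234)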

Our calculation is the numerical solution of the differential equation for a damped, nonlinear oscillator.

[Preliminaries]

The figure above shows the top of the notebook page and the first four cells. The first two perform the necessary imports. The next cell is a comment describing the equations to be solved, and the fourth cell defines ndo!(), the function that translates this math problem into a form that DifferentialEquations can digest. This function modifies its first argument, du, which holds the derivatives of x and v; the "!" sigil in the name is a Julia convention to indicate argument-modifying functions. Example 2 on this page of the package documentation explains what is being done here.
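The actual code is in the figure rather than the text, but based on the description — a linear restoring force with constant k, a nonlinear second term with coefficient α, and damping d, with u holding x and v — the function plausibly looks something like the following. This is a reconstruction under those assumptions, not necessarily the cell as written:

    using DifferentialEquations

    # u = [x, v] and p = [k, α, d]; ndo! fills du with the derivatives.
    function ndo!(du, u, p, t)
        k, α, d = p
        x, v = u
        du[1] = v                       # x' = v
        du[2] = -k*x - α*x^3 - d*v      # v' = -(kx + αx³) - dv
    end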

The crossed-out eye icon to the left of the third cell hides the input, which in this case is a bit of Markdown. It makes sense to keep the input of Markdown cells hidden, unless editing them, to make the notebook easier to read. Here is what this cell looks like with the input exposed:

[Markdown]

As this shows, in addition to standard Markdown, Pluto can render LaTeX math. The Markdown can also interpolate the values of variables and calculations by using a dollar sign (e.g. $(2*var)). As with code cells, the Markdown gets automatically updated if any of the variables that it uses happen to change.
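A hypothetical Markdown cell in this style, mixing LaTeX (written between double backticks in Julia's Markdown) with an interpolated value, might look like this; p is the parameter array defined below, and in a plain script one would first do using Markdown:

    md"""
    The restoring force is ``k x + \alpha x^3`` and the damping
    constant is ``d``; α is currently set to $(p[2]).
    """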

This system has three parameters that influence the nature of the solutions: k is the restoring force, or the "spring constant"; α is the nonlinearity, a second term in the restoring force; and d is the strength of the damping, or the amount of friction. The next three cells, as shown in the following figure, contain the time span for the solution (tspan), the initial conditions (u0, which consists of the position x and the velocity v), and an array (p) holding the three parameters.

[Calculations]

The next cell defines an iterator that goes from one to 50 in steps of one; as its name (αs) suggests, I plan to let α take on this list of values.

The final cell in this figure calculates the 50 solutions, one for each value of the nonlinearity parameter α, and plots them all together in a waterfall plot. I use the variable i to offset the plots; since it is initialized outside of the loop, but incremented in the loop, it must be declared global because all blocks in Julia create lexical scopes. The whole cell needs to be wrapped in a beginend block because, otherwise, Pluto will complain that i is being defined twice. This is the only difference between the Pluto code and the way this would be written in an ordinary Julia file, where this block would be unnecessary.

The other functions in this cell are similar to those we used in the previous article to solve the logistic equation. The exclamation mark on the plot! command is so we can "mutate" the plot by adding a new line to it with each call. Inside the plot! function, the notation sol[1,:] is a convenience interface defined by the package that extracts the first of the two components of the solution, which in this case are x and v.
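Assembling the pieces (reusing the hypothetical ndo!() sketch from above, with placeholder values for the time span, initial conditions, and parameters), the waterfall cell could be written roughly as follows:

    using Plots

    tspan = (0.0, 100.0)      # placeholder time span
    u0    = [1.0, 0.0]        # placeholder initial x and v
    p     = [1.0, 1.0, 0.1]   # placeholder k, α, d
    αs    = 1:50              # one to 50 in steps of one

    begin
        i = 0
        plt = plot(legend = false)
        for α in αs
            sol = solve(ODEProblem(ndo!, u0, tspan, [p[1], α, p[3]]))
            plot!(plt, sol.t, sol[1, :] .+ i)   # offset each curve by i
            global i += 1
        end
        plt
    end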

Here is the result:

[Results]

I think the plot is rather interesting, as I've not seen the solution space of this equation displayed this way before. It shows how the wavelength varies with time and with the amount of nonlinearity. Julia and Pluto make it easy to play with things and experiment, which can lead to new insights.

This last figure also shows the bottom of the window, with a box for sending instant feedback to the developers, which is a nice idea. At the bottom left is a link that allows the user to opt in to sending anonymous statistics about the use of Pluto to the developers to help them improve the product. There is also the button for the "Live Docs" popup window, which is described below.

Integration

The reactive nature of Pluto notebooks can be taken further by using HTML "bindings". Using a simple macro, any variable can be bound to the output of any HTML input type. The following figure shows a screenshot from my laptop where I've bound three variables to three types of widgets.

[HTML inputs]

The illustration on the right shows user interaction with the color picker.

[Color picker]

In Jupyter, similar functionality can be had by importing a widget library. In Pluto, an alternative to writing HTML directly is offered by a Julia widget library. Using this library, one can, for example, replace the HTML code for a slider:

    @bind x html"<input type=range min=1 max=5>"

with:

    @bind x Slider(1:5)

Those who want to go beyond standard HTML inputs can create any widget they wish using JavaScript because the output of a JavaScript function can be bound to a Julia variable.

Pluto notebooks integrate well into a more general coding environment. Code is not locked into the notebook, and code that exists outside of a notebook can be brought in. One can import (almost) any package into a notebook, just as in normal Julia code. The known exceptions are things that don't make sense in a browser-based notebook, such as REPL-specific packages, or some packages that use macros to radically redefine the language's syntax. Some parallel-processing packages also will not work, specifically those that use Distributed.

It is also possible to include local files, but a notebook that does this will no longer be portable, because those files may not exist on others' machines. So a truly portable notebook should only import from public repositories.

Pluto notebooks are stored as plain text files on disk. Jupyter's native ipynb notebook format is also text, but there is a crucial difference: Jupyter files include not just the input cells and code, but also whatever outputs happened to be displayed when the notebook was saved. This has three consequences. First, meaningful diffs between versions are usually impossible to get, which severely limits the usefulness of version control. Second, the person opening a shared notebook is likely to see whatever partially executed state the saved (or auto-saved) file was in before it was shared. Third, the files are not themselves legal code because of the mix of code and output data; execution can only take place through the notebook interface.

In contrast, Pluto notebooks include only the input cells; the order in which cells appear in the file is based on the dependency graph, not the visual order of the cells in the notebook. The visual order is instead preserved through comments in the file. Because of this design choice, Pluto notebooks can be usefully managed using version control. Notebooks can also be shared, and the recipient receives a predictable record of the code (and explanatory Markdown cells). Beyond that, notebook files are legal Julia code, so they can be executed directly by the Julia compiler/interpreter. They can be imported or run as standalone programs, just as any file of Julia code, so they use the same .jl suffix.
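Schematically, a saved notebook looks something like this; the cell identifiers (abridged here) are UUIDs, and the exact header varies with the Pluto version:

    ### A Pluto.jl notebook ###
    # v0.12.4

    # Cells are stored in dependency order, so the file runs top to bottom:
    # ╔═╡ 1f9a3c10-...
    r = 2.0

    # ╔═╡ 8e2c4b70-...
    area = π * r^2

    # The on-screen order is kept separately, in comments at the end:
    # ╔═╡ Cell order:
    # ╠═8e2c4b70-...
    # ╠═1f9a3c10-...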

This means that it's possible to freely go back and forth between editing code in an editor and in the notebook, creating files that can be imported as any other file in a project, without having to worry about stray bits of output left over in the notebook. One drawback of this design is that if, for example, a notebook contains some graphs or other output that take a long time to generate, it will need to be regenerated each time the notebook is freshly opened; and with Julia's notorious precompilation delays, this can mean waiting for several minutes before getting to work if a new package or a new version needs to be compiled. With Jupyter, in contrast, all of the output, including graphics, is embedded, so there is no delay.

Notebooks can be exported into static PDF or HTML forms with a click on the mysterious triangle-and-circle icon near the top of the page. The order in which the cells are arranged in the notebook is preserved in the exported versions of the notebook, so the visual representation is preserved, regardless of the dependency graph.

Much of Pluto's interactivity and visualization is based on Observable, an interactive notebook for JavaScript, which the Pluto authors credit for being their inspiration. Users of Observable notebooks will immediately recognize the controls for creating and manipulating cells in Pluto. Its reactivity is based on Preact, a JavaScript library that is a lightweight alternative to React. Math rendering is provided by MathJax, and code editing in cells uses CodeMirror.

Rough edges

Despite its polish, Pluto still has a few rough edges, which is not surprising in such a young project.

The current version as of this writing does not work with my daily browser, qutebrowser, but it did work for me in Chrome. This is disappointing, as some previous versions did not have this limitation. The project does say, however, that Chrome or Firefox is required, but it is less than ideal for programs to require specific browsers.

The print statement seems to do nothing—but the output is actually being printed in the REPL where Pluto was started. This will probably confuse some users. It certainly surprised me, and I had already read the instructions where this is explained—but I forgot. Some might find this useful, however, as a way to emit logging information and keep it out of the notebook.

A new notebook in Pluto opens in ~/.julia/pluto_notebooks/, with an arbitrary name, instead of the directory where the REPL is running. This seems like an odd behavior, but is an improvement over previous versions that opened new notebooks in /tmp, which could cause the user to lose work if they didn't notice what was going on. Neither of these facts is noted in the online guides to the program. It would seem to be a good idea to enter the desired location of the notebook in the dialog at the top before beginning work.

Certain invocations of parallel processing, such as attempts to run on a multicore graphics processor, which work fine from files or the REPL, will fail from the notebook. This is a problem that is being worked on, but for now there are some simple workarounds.

There is not a huge amount of documentation available, but what exists may be sufficient, because the use of Pluto is fairly intuitive, and the built-in example notebooks are well done.

The GitHub home page serves as an introductory tutorial; I especially recommend the short animation on the page. There is also a FAQ and a good introductory video from this year's JuliaCon.

A help window will pop up when "?" is typed in a cell, similar to the help mode in the REPL. The built-in documentation popup window can be made to persist by clicking on "Live Docs" at the bottom of the window. In this case it will continuously offer definitions for anything that it recognizes as the user types into a cell. Also, the notebook has tab completion which works well, as does the syntax-aware styling of code in cells.

Future plans

The developers are interested in being able to export notebooks that can be used in their fully interactive form without requiring the user to have Julia installed. This would seem to be an ambitious aspiration; the Jupyter developers have been talking about the equivalent for a while, but it still does not exist. However, as noted, the Binder service will host anyone's Pluto notebook, which is helpful for those in school or corporate environments where it may not be possible to install software. The developers plan to include a single button for deployment to Binder in the near future.

The concept of notebooks within notebooks is on the table, as is a modification of the notebook for teacher-oriented features like grading. There may be improved display, debugging, and logging on the way, such as the display of intermediate values during a calculation. There is also interest in making the front-end more responsive, as well as allowing user theming.

One interesting question is whether Pluto, like Jupyter, will ever allow the use of languages other than Julia. As there is nothing online mentioning this, I asked van der Plas about it. His answer is that they have made a deliberate choice to keep Pluto Julia-only:

Julia-only allows us to do lots of exciting things.[…] I think that the traditional programming experience from the 70s - terminal input, terminal output - is something that we urgently need to leave behind us. […] to get more meaningful interactions with a language, I believe that the polyglot environment is something that you need to leave behind. While Pluto will not become a multiple language environment, we have been working on closer integration with JS, and a lot of powerful stuff is possible already. Cells can output HTML, and that HTML can contain script tags, and Pluto's frontend will execute those. We have a small API layer around vanilla JS to allow you to send values back to Julia (reactively, this is how @bind works), output to the DOM, persist DOM state between cell updates and even change Pluto's UI.

Above we made the case for Pluto by showing how it solves the hidden global state and version-control incompatibility problems that occur with Jupyter (and similar notebooks). But there is a third reason for developing a new notebook project: by confining itself to one language, Pluto can focus on being the best possible interactive environment for that language, without making any compromises to allow for a variety of kernels.

Although the dominant method for programming is likely to remain in a text editor like Vim or Emacs, or an integrated development environment, computational notebooks offer an intriguing alternative. They not only offer a unique style of interactive exploration, but can be a good platform for creating documents as well. People have written books using these notebooks, and they may be a way to realize Donald Knuth's concept of literate programming. The great advantage of computational notebooks in writing about programming is that the code can be run and checked before the notebook is exported into the final HTML or PDF form; the author can guarantee that the code runs and does what the book or article says it does.

The computational notebook format began in the 1980s with the proprietary Mathematica system. It was an innovative feat of engineering, but never achieved the widespread adoption of the free, open-source, and language-independent Jupyter. However, the Mathematica notebook idea has by now been incarnated in several guises in addition to Jupyter. For those using Julia and interested in a notebook interface, Pluto, the latest of these options, is an obvious choice: it offers all of the features of Jupyter, but is a clear advance over the older system. It will be interesting to see how this young project evolves in the coming years.

Comments (34 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: More Arm32 boot; Signed kernel pushes; Panfrost status; Mourning Dan Kohn; Quotes; ...
  • Announcements: Newsletters; conferences; security updates; kernel patches; ...

Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds