Leading items
Welcome to the LWN.net Weekly Edition for November 1, 2018
This edition contains the following feature content:
- Init system support in Debian: System V init in Debian is in trouble, but there may be help from an unlikely source.
- Solid: a new way to handle data on the web: Tim Berners-Lee's initiative to fix the web.
- 4.20/5.0 Merge window part 1: what's coming in the next major kernel release.
- Improving the handling of embargoed hardware-security bugs: a Maintainer Summit discussion on how to better handle the next Meltdown-type vulnerability.
- Removing support for old hardware from the kernel: old hardware is a challenge to support; when can that support be removed?
- The proper use of EXPORT_SYMBOL_GPL(): when are GPL-only symbol exports appropriate?
- Compartmentalized computing with CLIP OS: a different approach to a hardened distribution.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Init system support in Debian
The "systemd question" has roiled Debian multiple times over the years, but things had mostly been quiet on that front of late. The Devuan distribution is a Debian derivative that has removed systemd; many of the vocal anti-systemd Debian developers have switched, which helps reduce the friction on the Debian mailing lists. But that seems to have led to support for init system alternatives (and System V init in particular) to bitrot in Debian. There are signs that a bit of reconciliation between Debian and Devuan will help fix that problem.
The Devuan split was acrimonious, much like the systemd "debate" that preceded it. Many bits were expended in describing the new distribution as a waste of time (or worse), while the loudest Devuan proponents declared that systemd would cause the end of Debian and Linux as a whole. Over time, that acrimony has mostly been reduced to random potshots (on both sides); there is clearly no love lost between the pro and anti sides (whether those apply to systemd, Devuan, or both). Some recent developments have shown that perhaps a bit of thawing in relations is underway—that can only be a good thing for both sides and the community as a whole.
Holger Levsen alerted the debian-devel mailing list that the Debian "Buster" (i.e. Debian 10) release was in danger of shipping with only partial support for running without systemd. The problem is that two packages needed for running with System V init (sysvinit-core and systemd-shim) are not really being maintained. The shim is completely unmaintained and sysvinit-core has been languishing even though it has two maintainers listed.
While Thorsten Glaser at first downplayed the problems, he changed his tune somewhat after Ben Hutchings pointed out that the sysvinit package has "many open bugs with patches that have not been applied or answered". The problem goes even deeper than that, according to Andreas Henriksson.
And, as Petter Reinholdtsen explained, the sysvinit "team" is really just him at this point. He is "lacking both the required spare time and interest to do [a] good job, but still try to fix the gravest problems while waiting for someone with time and interest to [adopt] the packages". He too is concerned that the packages will be removed before long. So Jonathan Dowland suggested looking to Devuan for help.
That set off a fairly predictable round of Devuan bashing, along with concerns that inviting Devuan developers to work in Debian would bring on a return of the mailing list battles of old. But it also resulted in a reply from Enzo Nicosia (also known as "KatolaZ"), who is one of the Devuan "Caretakers" team:
In the last four years there has been hatred from both camps (Debian vs Devuan), and there is no doubt that most of that could/should have been avoided on both parts. Grepping through email archives and [resurrecting] posts from 3 or 4 years ago won't help to move on, though.
I am not interested in chit-chat or flames, because those don't get packages released. The only reason I am here is that sysvinit is effectively getting kicked off Debian, and I think I can help avoiding that.
Several Debian developers were ready to let bygones be bygones and welcomed any effort toward keeping sysvinit alive in Debian. Ian Jackson invited Nicosia to a new debian-init-diversity mailing list. That mailing list has been expressly set up to avoid the hostility on (some) Debian mailing lists, Jackson said. It would appear to be the place where non-systemd init systems will be discussed and developed moving forward.
Back in debian-devel, there were other sysvinit proponents (such as Benda Xu) who did not see any real problems with the sysvinit package. But Reinholdtsen was quick to point out that this attitude is helping to push sysvinit out of Debian. Beyond that, as Martin Pitt noted, the problems are far more widespread than simply being confined to the sysvinit package:
SysV init leaves all the really hard problems to these, as it cannot really do much by itself. That's a fact that people that keep yelling "but SysV init was so easy!" keep finessing.
So "how many RC bugs does sysvinit have" is a completely useless metric IMHO.
Bernd Zeimetz brought up a related issue: "the typical package maintainer won't test initscripts". Many of them won't even have access to a system that runs sysvinit, he said, so they can't test them well. In another part of the long thread, though, Philipp Kern asked: "Could someone reiterate about what the current state of init diversity is supposed to be?" Russ Allbery had some thoughts on that:
It is quite easy to write a sysvinit init script for most daemons that will basically work. I don't think the maintainer is obligated to do much more than that (for instance, I don't think you need to try to duplicate systemd hardening configuration or other features that are quite challenging to do under sysvinit without more tool support, although some of that may be coming in start-stop-daemon).
He suggested that maintainers could test the init scripts by moving the systemd unit file aside and having systemd use the init script. "For nearly all daemons that don't involve tight system integration, this will be more than adequate." In another message, he explained that providing an init script is, at least partly, a gesture of goodwill within the Debian community.
Overall, the dialog was relatively positive and the outcome may well lead to better maintenance of sysvinit for both Debian and Devuan moving forward. Given a little more time (and water under the bridge), Devuan's users could provide the testbed for the init scripts, which will obviously help Devuan, but will also be a boon for any Debian users of sysvinit. There is still likely to be something of a chasm between the two distributions, but any rapprochement should be welcome news to most. In truth, what separates the two is pretty trivial in the grand scheme of things—the flames and acrimony notwithstanding.
Solid: a new way to handle data on the web
The development of the web was a huge "sea change" in the history of the internet. The web is what brought the masses to this huge worldwide network—for good or ill. It is unlikely that Tim Berners-Lee foresaw all of that when he came up with HTTP and HTML as part of his work at CERN, but he has been in a prime spot to watch the web unfold since 1989. His latest project, Solid, is meant to allow users to claim authority over the personal data that they provide to various internet giants.
Berners-Lee announced Solid in a post on Medium in late September. In it, he noted that despite "all the good we've achieved, the web has evolved into an engine of inequity and division; swayed by powerful forces who use it for their own agendas". Part of what he is decrying is enabled by the position of power held by companies that essentially use the data they gather in ways that run directly counter to the interests of those they gather it from. "Solid is how we evolve the web in order to restore balance — by giving every one of us complete control over data, personal or not, in a revolutionary way."
Users' data will be stored in a Solid "pod" (sometimes "personal online data store" or POD) that can reside anywhere on the internet. Since Solid deliberately sets out to build on the existing web, it should not be a surprise that URLs, along with Uniform Resource Identifiers (URIs), are used to identify pods and specific objects within them. Pods also provide one place for businesses, including Inrupt, which was co-founded by Berners-Lee, to provide services for Solid. As he noted in his post, people are willing to pay companies like Dropbox for storage; hosting Solid pods would be a similar opportunity for Inrupt and others.
The vision is that users will be able to grant applications and other users read or read-write access to selected data in their pod. That pod can be hosted "in the cloud" or locally; users can install the Solid Server to host pods on their own system or get a pod from a hosting provider. Applications will access data that is provided to them by following "typed" links; this is called "Linked Data" in the Solid documentation. The example given is that a comment made by one person on another's photo could be represented as:
    <https://mypod.solid/comments/36756>
        <http://www.w3.org/ns/oa#hasTarget>
        <https://yourpod.solid/photos/beach> .
The link type (on line 2) uses the Web Annotation Ontology to specify what kind of link is being made. The Linked Data is described using the Resource Description Framework (RDF) Turtle notation, which consists of three items: a subject, a predicate (or link type), and an object. Each element of this triple is a URI and a Turtle statement is terminated with a ".".
Solid is meant to provide ways to control access to a user's data, which implies a need for identities and authentication. The Solid specification GitHub repository provides links to all of the relevant pieces that make up Solid. Identities in Solid are provided using WebID URIs, while authentication is done using WebID-TLS. Alternative authentication methods will be supported as well; WebID-OIDC support is currently under development and other options are being explored. Web Access Control (WAC) will be used for access-control lists (ACLs) on the data in Solid pods.
One thing that is notably missing from any of the Solid marketing and documentation is any mention of encryption. If users are to turn over all of their content to a pod provider, they will likely want to know that the data is protected from both attackers and the provider itself. That is a difficult problem to solve, however, since different entities (applications and users) will have access to different parts of the pod. That implies that the data is either not encrypted, is decrypted by the server, or that each entity will get a key of some sort to decrypt the data it gets. There is a GitHub issue asking about encryption but, other than that, a seemingly important feature is not even discussed.
The Solid server is Node.js-based. Its installation instructions start with the always worrisome "curl | sudo" pattern. It can be run either directly from the command line or in a container using Docker. It implements many of the features envisioned for Solid and is presumably the server being used by the two existing pod providers.
There is something of a chicken-and-egg problem for Solid, though. In order for it to be adopted widely, it is going to need lots of applications that use the Solid model. Getting people to write those applications (or to add Solid support to existing applications) may be difficult without a fairly sizable user base. The ability to break that logjam will be a major factor in determining Solid's level of success.
The Solid web site provides a "Make a Solid app on your lunch break" tutorial. It uses jQuery to create a web page that handles authentication and shows information from the logged-in person's WebID Profile, including their name and the names of their friends. The friends' names are loaded from the WebID Profile on each of the friends' pods; clicking on a friend's name will load the friend's profile into the page, which is meant to show how Linked Data makes it easy to find and display data from multiple pods. The tutorial uses jQuery for simplicity, but the documentation describes creating Solid applications using the more full-featured AngularJS framework; support for other frameworks is planned.
The "semantic web" has been a longtime dream of the World Wide Web Consortium (W3C) and Berners-Lee in particular, though it has always seemed to suffer from a low adoption rate. Solid is trying to take the semantic web one step further by placing the handling of the actual content directly under the control of users. Whether that level of control is compelling enough to get over the hurdles that both the semantic web and Solid impose remains to be seen—there is certainly reason to be skeptical.
The vision that is promoted by Solid and its backers is attractive, but most consumers have shown a marked disinterest in what happens to their personal data, especially if there is any kind of cost—not just monetary, but time or inconvenience as well. Storing information like photos, contacts, videos, and so on, in ways that allow others to interact with them in various ways, sounds great—in theory. But the cost is that users will need to be cognizant of the kinds of permissions they grant and dodgy applications (and "friends") will undoubtedly try to tempt them into going astray, which leaves them in the same situation they are already in.
The incentives are too high, at least for now, for companies and others to not find ways to route around this kind of access control. Solid seems like a ... well ... solid idea, though it may be a bit overhyped (and an encryption story is needed); it is a little hard to see it gaining much traction, however. It would be nice to be wrong about that.
4.20/5.0 Merge window part 1
Linus Torvalds has returned as the keeper of the mainline kernel repository, and the merge window for the next release, which could be called either 4.20 or 5.0 depending on his mood, is well underway. As of this writing, 5,735 non-merge changesets have been pulled for this release; experience suggests that we are thus at roughly the halfway point.
Some of the more significant changes merged so far are:
Architecture-specific
- The arm64 architecture can make use of the new hardware-provided SSBS state bit to defend against Spectre variant 4 attacks.
- RISC-V now supports the futex() system call and associated operations (a general usage sketch appears below this list).
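For those unfamiliar with the interface, the sketch below shows what futex() usage looks like from user space in general; it is not RISC-V-specific code. There is no glibc wrapper, so the call goes through syscall():

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    /* Sleep until woken, provided *addr still holds the expected value. */
    static long futex_wait(uint32_t *addr, uint32_t expected)
    {
        return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    }

    /* Wake up to nwaiters threads currently blocked on addr. */
    static long futex_wake(uint32_t *addr, int nwaiters)
    {
        return syscall(SYS_futex, addr, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
    }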
Core kernel
- There are two new types of BPF maps for implementing queues and stacks. Documentation is missing, but an example of their use can be found in the selftest code; a user-space sketch also appears below this list.
- On systems with asymmetric CPUs (big.LITTLE systems, for example), the CPU scheduler can now detect "misfit" processes that need the resources of a fast CPU but which are stuck on a slow one. When load balancing is performed, the scheduler will try to move misfits to a more appropriate processor.
- Signal handling within the kernel has been extensively reworked; the result should be simpler and more robust handling. There is a slight change in structure sizes that is visible to user space, but patch author Eric Biederman couldn't find any programs that would be affected by it. There's also one other visible change that is hinted at: "Testing also revealed bad things can happen if a negative signal number is passed into the system calls."
Filesystems and block I/O
- Numerous block drivers have been converted to the multiqueue API. Current plans call for the legacy API to be removed in the next development cycle.
Hardware support
- Audio: Texas Instruments PCM3060 codecs, Amlogic AXG PDM input ports, Allwinner sun50i codec analog controls, and Nuvoton NAU88C22 codecs.
- Miscellaneous: STMicroelectronics STPMIC1 PMIC regulators, Cirrus Logic Lochnagar regulators, UniPhier SD/eMMC Host controllers, Spreadtrum SDIO host controllers, SIOX GPIO controllers, Panasonic AN30259A LED controllers, BigBen Interactive gamepads, Spreadtrum SC2731 charger controllers, Freescale eDMA engines, and Mylex DAC960/DAC1100 PCI RAID controllers.
- Network: DEC FDDIcontroller 700/700-C network interfaces (hardware designed in 1990; it is not clear why anybody wants this now) and Intel Ethernet Controller I225-LM/I225-V adapters.
- Pin control: Nuvoton BMC NPCM750/730/715/705 pinmux and GPIO controllers, Meson g12a SoC pin controllers, Mediatek MT6765, MT7623 and MT8183 pin controllers, Qualcomm SDM660 and QCS404 pin controllers, Broadcom Northstar pin controllers, and Renesas RZ/N1, r8a774a1 and r8a774c0 pin controllers.
- SPI: Spreadtrum SC9860 SPI controllers, MediaTek SPI slave devices, Qualcomm QuadSPI controllers, Qualcomm GENI-based SPI controllers, STMicroelectronics STM32 QUAD SPI controllers, and Atmel USART SPI controllers.
- Additionally, the "LED pattern driver" can be used to drive an LED given a brightness pattern from user space; see this commit for more information.
Networking
- The TCP stack has moved to an "earliest departure time" model for the pacing of outgoing traffic. This model, inspired by a talk by Van Jacobson [PDF] at the 2018 Netdev conference, aims to address scalability problems by replacing outgoing packet queues with a timer wheel describing the earliest time that each packet can be sent. The result is meant to be better pacing and more accurate round-trip-time calculations to drive that pacing.
- Network flow dissectors can now be loaded as BPF programs, which should provide both better hardening and better performance.
- The new "taprio" traffic scheduler allows the control of packet scheduling according to a pre-generated time sequence. Documentation is naturally scarce; a little can be found in this commit.
- The rtnetlink protocol has been enhanced with a "strict checking" option that allows user space to be sure it is getting the actual information it asked for.
Security-related
- The kernel now makes more aggressive use of barriers when switching between unrelated processes in an attempt to provide stronger protection against Spectre variant-2 attacks.
- The controversial Speck crypto algorithm has been removed from the kernel.
- There is a new mechanism for obtaining statistics from the cryptographic subsystem. Naturally, it is thoroughly undocumented, but there is an example program showing its use.
Internal kernel changes
- The read-copy-update (RCU) subsystem has seen a lot of refactoring, ending in the removal of many of the "flavors" of RCU. There are now two primary flavors, one of which is adapted to preemptible kernels and one for non-preemptible kernels.
- The PCI subsystem can now support peer-to-peer DMA operations between peripherals.
If the usual schedule is followed, this merge window will end on November 4, with the final release happening just before the end of the year. Stay tuned for the followup article, which will cover the changes pulled in the second half of the 4.20 (or 5.0) merge window.
Improving the handling of embargoed hardware-security bugs
Jiri Kosina kicked off a session on hardware vulnerabilities at the 2018 Kernel Maintainers Summit by noting that there are few complaints about how the kernel community deals with security issues in general. That does not hold for Meltdown and Spectre which, he said, had been "completely mishandled". The subsequent handling of the L1TF vulnerability suggests that some lessons have been learned, but there is still plenty of room for improvement in how hardware vulnerabilities are handled in general.
There are a number of reasons why the handling of Meltdown and Spectre went bad, he said, starting with the fact that the hardware vendors simply did not know how to do it right. They didn't think that the normal security contact (security@kernel.org) could be used, since there was no non-disclosure agreement (NDA) in place there. Perhaps what is needed is the creation of such an agreement or, as was discussed in September, a "gentleman's agreement" that would serve the same role.
James Bottomley asserted that not even the gentleman's agreement would be needed if the community were to publish a comprehensive document on how it will handle reports of hardware security issues, but others said that the problems go beyond the initial agreement. Linus Torvalds complained that he has been unable to get either emails or PDF documents describing known vulnerabilities; all that has been on offer is the ability to get an account on an Intel server where documents can be read. Thomas Gleixner said that there has been some progress in that area, though, and that he is now able to get documents in a GPG-encrypted tarball.
Greg Kroah-Hartman said that the wording of the documentation on how security issues are handled is not perfect for this case, but work is being done to fix it. Gleixner said that we need to create a single point of contact for hardware vulnerabilities; the vendors will then understand the rules that we play by and that we will not leak information. Intel, he said, has learned a lot and knows who to talk to. Mauro Carvalho Chehab complained that Intel is just one vendor, though, and that the next vendor with a vulnerability will be different. Torvalds replied that the most important vendors are coming around; Gleixner added that this is another reason to have clear documentation on how we have handled these problems in the past.
Ted Ts'o said that the community's policy is to hold on to fixes for kernel bugs for up to five working days while distributors work out their response. That time period is clearly not appropriate for hardware bugs, but what would the right time be? Gleixner responded that it is "quite long". Vendors can come up with a proof-of-concept microcode update for a single product fairly quickly, but that is just the beginning; vendors like Intel have hundreds of products, each of which must be evaluated and fixed independently. So the response time tends to drag out; the kernel community has to acknowledge that hardware vendors need time to handle things properly.
Kees Cook asked how long that would be, but it seems that the answer varies considerably depending on the nature of the vulnerability. The L1TF fixes were ready three weeks before the disclosure, helped by the fact that Intel had informed the community even before it knew how many processors were affected. Torvalds complained, though, that many of the embargo periods are still controlled by "the old corrupt security rules"; the L1TF disclosure date was determined by the date of a security-conference talk rather than any technical considerations. That is not a game we want to play, he said.
Cook persisted, asking whether the community could somehow set a maximum embargo time. Gleixner said that would be difficult. We can't create our own patches before any microcode fixes are done, for example. There are also delays associated with the interaction with other operating-system vendors, some of whom are slower than the Linux community to prepare patches. Those vendors, Kosina said, have venues where they are able to collaborate on issues like these, but the kernel is not represented there. Gleixner said that the community needs a contact point that can participate in these discussions.
Torvalds said that the hardware vendors worry a lot that problems will not be kept under wraps until the appointed disclosure day; they need to have personal connections with the community to get over those fears. Gleixner agreed, saying that a new contact point should be set up for hardware issues; it would be a smaller group than security@kernel.org. The vendors would have to trust that group, though, and would have to allow domain experts to be brought in from outside the group for specific problems. Extending that trust or not is their decision in the end, he said; if they won't play, then Linux will simply wait until the issues become public to start work on fixing them.
Will Deacon said that, if one vendor has a specific type of problem, others probably do as well; there's only so much novelty in the area of microprocessor design. But hardware vendors don't have a way to coordinate around this kind of vulnerability; indeed, they tend to do the opposite. If a group of developers is talking to one hardware vendor, the other vendors will stay away from that group. That implies, he said, that the point of contact for each processor type needs to be the associated architecture maintainer. Kroah-Hartman agreed, saying that the cross-vendor collaboration problems are not amenable to solution by the Linux kernel community.
Arnd Bergmann asked about the problem of older, unmaintained processors. A number of old MIPS processors are affected, for example, but nobody is doing anything about them. Kroah-Hartman said there is little for the community to do about abandoned hardware; that is an issue for governments to deal with. Until the ability to update hardware and ongoing security support are mandated, the problem will persist.
As the session concluded, Grant Likely said that the community needs to develop a documented process for hardware vulnerabilities — before the next one hits. But who would write this document? After an awkward silence, offers of help were received from Deacon, Gleixner, Kosina, Kroah-Hartman, and Likely, with your editor being instructed by the rest to ensure that all of the names were written down and published.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the Maintainers Summit.]
Removing support for old hardware from the kernel
The kernel supports a wide range of hardware. Or, at least, the kernel contains drivers for a lot of hardware, but the hardware for which many of those drivers were written is old and, perhaps, no longer in actual use. Some of those drivers would certainly no longer work even if the hardware could be found. These drivers provide no value, but they are still an ongoing maintenance burden; it would be better to simply remove them from the kernel. But identifying which drivers can go is not as easy as one might think. Arnd Bergmann led an inconclusive session on this topic at the 2018 Kernel Maintainers Summit.
Bergmann started by noting (to applause) that he recently removed support for eight processor architectures from the kernel. It was, he said, a lot of work to track down the right people to talk to before removing that code. In almost every case, the outgoing architectures were replaced — by their creators — by Arm-based systems. There probably are not any more architectures that can go anytime soon; Thomas Gleixner's suggestion that x86 should be next failed to win the support of the group.
James Bottomley said that he's seeing more bugs on 32-bit architectures slipping through; there are almost no developers left doing 32-bit builds anymore. He asked: can we deprecate 32-bit support? That idea didn't get far either; Christoph Hellwig said that he routinely deals with people running 32-bit systems. Intel still sells 32-bit-only cores, and customers are still using them. It was agreed that setting up more automatic testing of 32-bit architectures would be a good idea.
Bergmann moved on to another class of old architectures, which includes m68k and PA-RISC, that is being kept alive by a small set of developers who continue to fix bugs and are "glad to be among the last ten people using it". Yet another set comprises old architectures that have been embedded into products that are still shipping; there are ARM9 cores being installed in air-conditioning systems now, he said. Some of these run new kernels, which is what we want. But it raises interesting support questions when current hardware is based on a 20-year-old chip, and that hardware is expected to run for another 20 years.
Those two groups of users of ancient hardware are mostly distinct, Bergmann said. At some point, the community needs to think about whether it should drop support for either or both of them. The recent removal of eight architectures made a lot of things easier; removing architectures like PA-RISC, Alpha, or Itanium would make life easier yet. For example, the m68k architecture uses a number of internal APIs that no other architecture needs at this point; removing that architecture would enable removing the APIs as well. That architecture includes the oldest machine supported by the kernel: a Sun-3 workstation built in 1985.
Ted Ts'o suggested that an ultimatum could be made: either the m68k architecture stops using the old, deprecated timer API (for example) within one year or it is removed from the kernel. This kind of approach has worked well in the Debian community, he said; without it, things can drag on forever. Another possibility, suggested by Olof Johansson, would be to remove an architecture when toolchain support disappears. The problem there, according to Bergmann, is that the GCC developers tend to wait until kernel support is removed for an architecture before deleting it themselves.
Linus Torvalds said that there is more to the problem than architectures, though. The 3c503 network interface driver is still in the kernel, but nobody is running that hardware at the moment. Hellwig said that there was a recent purge of old network cards, but it didn't go far enough. Bergmann agreed that architectures are not the biggest problem at the moment; he mentioned the ISDN subsystem as one that should probably go.
Torvalds thought that perhaps it could be time to remove PCMCIA support, but he feared that some embedded systems still use it. That turns out to be the case, according to Bergmann. The oldest Arm processor currently supported is the StrongARM 11. Some Russian company recently bought a big pile of PDAs to use as tourist guides; these devices use that processor — and have PCMCIA slots. Hellwig also noted that there was a PCMCIA pull request for 4.20 on linux-kernel. Sometimes, though, hardware support can be removed even when users exist; Torvalds noted with chagrin that some dive computers use the recently removed IrDA infrared driver subsystem.
Bergmann said that he will often find a specific type of bug that turns out to be present in over 100 drivers; 70 of those, he said, are unlikely to be used by anybody. It would be good to find a way to remove those drivers. Torvalds said that, much of the time, it may turn out to be easier to just fix them. Bergmann answered that it's often obvious that the drivers are broken, but that doesn't mean they are entirely unused. Gleixner observed that the computing industry has trained users to not even blink an eye if their systems crash after a day of use; they just reboot and move on.
The session came to a close with some suggestions to add more old hardware to the various testing farms out there. But there was a distinct shortage of actionable solutions to the old-hardware problem, so that code is likely to be with us for some time yet.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the Maintainers Summit.]
The proper use of EXPORT_SYMBOL_GPL()
The kernel, in theory, puts strict limits on which functions and data structures are available to loadable kernel modules; only those that have been explicitly exported with EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() are accessible. In the case of EXPORT_SYMBOL_GPL(), only modules that declare a GPL-compatible license will be able to see the symbol. There have been questions about when EXPORT_SYMBOL_GPL() should be used for almost as long as it has existed. The latest attempt to answer those questions was a session run by Greg Kroah-Hartman at the 2018 Kernel Maintainers Summit; that session offered little in the way of general guidance, but it did address one specific case.
The kernel has had EXPORT_SYMBOL_GPL() for fifteen years now, Kroah-Hartman said; its use is not mandatory. It is generally meant to apply to core functions that cannot be used without the user being a derived work of the kernel. But whether that is the case for specific functions is not always obvious.
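For reference, here is a minimal sketch of the mechanism under discussion; the function names are hypothetical, but the export macros and the MODULE_LICENSE() declaration are the kernel's real ones:

    #include <linux/module.h>

    static int secret;

    int frob_get(void)
    {
        return secret;
    }
    /* Visible to any loadable module, whatever license it declares. */
    EXPORT_SYMBOL(frob_get);

    void frob_set(int val)
    {
        secret = val;
    }
    /* Resolvable only by modules whose MODULE_LICENSE() is GPL-compatible. */
    EXPORT_SYMBOL_GPL(frob_set);

    MODULE_LICENSE("GPL");

A module declaring MODULE_LICENSE("Proprietary") would load with references to frob_get(), but would fail at load time if it referenced frob_set().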
Andrew Morton was quick to raise the case that has been concerning him, relating to symbols exported for the heterogeneous memory management (HMM) subsystem. In particular, it makes some low-level memory-management functionality available to all modules, rather than just those with a GPL-compatible license. This export, Morton said, is "a big gift to NVIDIA", which needs it to use the HMM functionality in its closed-source modules. This export has upset a number of people, including Dan Williams, who has been posting patches to change that export to EXPORT_SYMBOL_GPL().
Morton said that he didn't really want to get into the politics of the situation, but he needed to decide whether to apply Williams's patches, and that means deciding whether a GPL-only export would be more appropriate in this case. Christoph Hellwig was quick to argue that any users of the functionality in question can only be a derived work of the kernel. Linus Torvalds said that the initial point was to let hardware with its own memory-management unit handle its own page-table management, but that is not how the usage has actually turned out.
Hellwig said that there is other NVIDIA-specific code in the kernel that should probably be removed as well; support for NVLink was mentioned in particular. Arnd Bergmann said that there is a smaller pile of patents around AI applications (where NVLink is generally used) than around graphics, so there might be a better chance of getting that code opened eventually. Graphics drivers remain a problem, though.
Returning to the HMM issue, Morton summarized the feeling in the room as being in favor of merging Williams's patches. So, he said as the session (and the summit as a whole) came to a close, that is what he will do.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the Maintainers Summit.]
Compartmentalized computing with CLIP OS
People searching for a hardened Linux distribution have a wide range to choose from: they can use one of the security-focused offerings, or they can, with sufficient expertise, simply apply hardening patches and build everything to their taste. Such systems, of which Qubes OS is a good example, usually concentrate on the user's privacy. Recently, the French cybersecurity agency (ANSSI) released the source code for CLIP OS, its hardened operating system based on Linux. CLIP OS has been in development for more than ten years and, while sharing many elements with other hardened Linux distributions, this one is targeted to different needs: the focus is on providing maximum isolation between confidentiality levels and different users of the same system. As an illustration: the administrator is not able to access other users' data.
History: CLIP OS 4 and 5
According to the general description document [PDF, in French], CLIP OS started in 2005 as a research project aiming to provide a two-level client system that allowed simultaneous access to two networks from two "containers", one with sensitive data and the other without. This first prototype was built using FreeBSD 4.8. In early 2006, ANSSI decided to continue this development, this time based on Linux 2.6, implementing the same two-level, two-network configuration while adding new functionality like updates, audit, and administration. The result was CLIP version 3, which was deployed on more than 100 systems in a dedicated network. In addition to the desktop version (CLIP-RM), there was also a gateway variant (CLIP-GTW) for IPSec gateways.
Based on the experience of CLIP version 3, ANSSI began developing version 4 in 2009. It added modular kernel configuration, new network connection modes (DHCP, WiFi, 3G networks), new hardware support (sound cards, printers), the ability to transfer files between the two levels of the system, and a graphical interface for administrative tasks. A number of standard Linux applications were supported, including KDE 4, Mozilla Thunderbird, and LibreOffice. One visible difference from other Linux distributions was an additional bar (the "confidentiality bar") on the desktop, shown in a screen shot in the user documentation [PDF, in French]. The bar allowed the user to set the confidentiality level and check the state of the desktop. In addition, CLIP OS displayed its own security notifications. There were additional security measures in details outside the desktop; for example, the set of kernel modules available in the system was configured at installation time and could not be changed later.
The source for version 4 was released, but as a "non-working reference source archive" that is not expected to build correctly. The security-related projects for this version are listed explicitly for those who would like to take a look.
ANSSI has now released the source code for an early stage of the next version, CLIP OS version 5. While it is missing most of the features from version 4, it is built from the start to be compiled by external people. The documentation is written in English this time, and there are detailed build instructions — a big change from version 4, which provided nearly all of its documentation in French. The project is hosted on GitHub, bug tracker included, with some activity involving both the ANSSI team and the public.
Design and main features
The design of CLIP OS 5 includes three elements: a bootloader, a core system, and the cages. The system uses secure boot with signed binaries. Only the x86 architecture was supported in the previous versions, and there are no other architectures in the plan for now. The core system is based on Hardened Gentoo. Finally, the cages provide user sessions, with applications and documents.
Processes running in separate cages cannot communicate directly. Instead, they must pass messages using special services on the core system; these services are unprivileged and confined in the cage system, but privileged in the core. These communication paths are shown in this architecture diagram from the documentation. Cages are also isolated from the core system itself — all interactions (system calls, for example) are checked and go through mediation services. The isolation between applications will be implemented using containers, and the team plans to use the Flatpak format. The details of the CLIP OS 5 implementation are not available yet, as this feature is planned for the stable release.
A specific Linux security module (LSM) inspired by Linux-VServer will be used to provide additional isolation between the cages, and between the cages and the core system. Linux-VServer is a virtual private server implementation designed for web hosting. It partitions a computer system's CPU time, memory, filesystem, and network addressing into security contexts. Starting and stopping a new virtual server corresponds to setting up and tearing down a security context.
Another central design point of CLIP OS is that the system is multi-level, meaning that it is designed to have users working with documents at different confidentiality levels. Users may work with documents at different levels at the same time, but each cage can manipulate documents of only one level. Passing documents between cages (and thus confidentiality levels) requires a specific action from the user. In CLIP OS 4 this was completely integrated into the graphical user interface; all that was required was to choose the transfer option from the context menu (as documented for version 4 [PDF, in French]).
The CLIP OS 5 documentation refers to multiple security requirements and considerations at different levels. At the hardware level, UEFI secure boot and a trusted platform module (TPM) are needed. In addition to that, CLIP OS assumes that hardware-assisted virtualization is supported. All software will be built from source with hardened compilation options, with the exception of the firmware for certain devices. The sources themselves should be cryptographically signed by the maintainers or other "trusted sources" (for third party software). In future releases, builds will be binary reproducible.
Rich documentation (mostly in French) exists for the security features of CLIP OS 4. This includes, for example, patches for the "write XOR execute" policy that ensures that a program is able to execute only code provided by the system. The goal is to limit arbitrary code execution from user code. This is already possible for binary code in ELF files, but the CLIP OS team has modified the interpreters of popular scripting languages to support this feature as well: an additional O_MAYEXEC flag for open() lets an interpreter check whether the mount options for the underlying filesystem allow execution.
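As a rough illustration of the approach (not code from CLIP OS itself), a patched interpreter might open its scripts along these lines; the flag value and the exact error returned are assumptions, since O_MAYEXEC is an out-of-tree extension:

    #include <stdio.h>
    #include <errno.h>
    #include <fcntl.h>

    #ifndef O_MAYEXEC
    #define O_MAYEXEC 040000000  /* placeholder; the real value comes from the patched kernel headers */
    #endif

    /* Open a script for interpretation, refusing files whose underlying
       filesystem is not mounted with execution allowed. */
    int open_script(const char *path)
    {
        int fd = open(path, O_RDONLY | O_MAYEXEC);

        if (fd < 0 && errno == EACCES)
            fprintf(stderr, "%s: not on an executable mount\n", path);
        return fd;
    }

On an unpatched kernel, the unknown flag bit is simply ignored by open(), which is why the interpreter changes and the kernel patch have to travel together.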
Current state of CLIP OS 5 and next steps
The source code that has been released corresponds to an alpha release, which does not include most of the security features. It contains the build system, the bootloader, and a basic system that allows kernel tuning (with a guide explaining the various options) and, notably, has a root partition mounted read-only. Further development on CLIP OS will be done in the open, without direct pushes from ANSSI. The project aims for public code review in Gerrit (currently being set up) and contribution of the modifications upstream, including ebuilds to Gentoo and kernel patches. The plan is also to upstream the changes that were already made in the previous versions.
A detailed roadmap shows that the security options will be added in the next beta release (which is in development). The plan includes an update system (with fallback, similar to what is available in Android), partition management with integrity and encryption, confined system services using features of systemd, new services, confined administration roles (admin and audit, both non-root), and support for various hardware platforms.
Conclusions
CLIP OS has a different use case than most other security-related Linux distributions. It concentrates on strong isolation, with a threat model that includes untrusted users, administrators, and a local attacker with full access to the hardware. The features that will be developed may be interesting for other distributions and system builders. There is also a good chance that the patches may be generic enough to be integrated into the corresponding upstream projects. On the other hand, the question remains of how much of the source code from previous versions will be reused, or whether the new version is a complete rewrite.
Transitioning from a closed, government project to open source brings a number of difficulties and risks, too. It remains to be seen how the team will handle upstreaming and interactions with the wider community. If ANSSI commits to making its security developments open source, for the benefit of a wider audience, this may result in a useful contribution to the free-software world. Those who can read French can review the detailed security documentation of the older versions to get insight into what is likely to come (security patches are already available for download and may be reused). It will be interesting to watch how the project evolves and whether it lives up to its promise.
[See the slides from a recent Kernel Recipes talk [PDF] and the video of the talk for details of the security patches developed for CLIP OS.]
Page editor: Jonathan Corbet