Leading items
Welcome to the LWN.net Weekly Edition for November 8, 2018
This edition contains the following feature content:
- Protecting the open-source license commons: Richard Fontana talks about the risks inherent in shared licenses and how we can protect our licensing.
- A "joke" in the glibc manual: a new attempt to remove an old joke.
- Zinc: a new kernel cryptography API: a Kernel Recipes talk on the new cryptographic subsystem underlying WireGuard.
- 4.20 Merge window part 2: the rest of what was merged for this development cycle.
- Limiting the power of package installation in Debian: package installation can corrupt a system in many ways; what can be done to reduce the risk?
- SpamAssassin is back: after a slow period, SpamAssassin is back up to speed and adding new features.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Protecting the open-source license commons
Richard Fontana has a long history working with open-source licenses in commercial environments. He came to the 2018 Open Source Summit Europe with a talk that, he said, had never before been presented outside of "secret assemblies of lawyers"; it gave an interesting view of licenses as resources that are shared within the community and the risks that this shared nature may present. While our licenses have many good properties, including a de facto standardization role, those properties come with some unique and increasing risks when it comes to litigation.
Open-source licenses still matter, he said, even though many people have been downplaying their significance recently. Interest in the community has shifted to other kinds of governance issues, codes of conduct, for example. It is said that today's youth cares little about licenses and has less interest in the surrounding ideology, though he doesn't believe that. There is an increasing level of concern about the sustainability of many communities, and a sense that licenses are not a useful way to define modern open source.
Even so, licenses are still highly relevant for corporate users of open-source software, he said. They are the basic tools that make the whole thing possible. But licenses only matter if they are followed, which is why we are seeing increasing efforts to bring about voluntary compliance, and some increases in enforcement efforts as well.
Enforcement, especially involving version 2 of the GPL, has always been a part of the open-source landscape. It only reached the point of actual litigation in the early 2000s, where we saw enforcement efforts showing up in three broad classes. Community enforcement came directly from the developers, either individually or through organizations like the Software Freedom Conservancy (SFC). Commercial entities have done some enforcement, usually in support of an associated proprietary licensing model. And "non-community developers", such as Patrick McHardy, have been pursuing extortionate actions in search of commercial gain. These are the so-called copyright trolls, though he does not like that term. There has been an increase in all three types of enforcement in the last few years; one outcome has been the SFC enforcement principles that try to distinguish the first two types of enforcement from the last, he said.
A lot of thought has gone into enforcement at his employer Red Hat; Fontana said that enforcement activities should be judged by whether they promote collaboration or not. Enforcement that promotes certainty, predictability, and a level playing field will do that, while commercially motivated enforcement will reduce the incentive to collaborate. So he believes, like many others, that enforcement should not be done for commercial gain. Beyond that, there needs to be transparency around the funding of litigation and the selection of targets. Proceedings should be open; the secrecy built into the German legal system (where much enforcement activity to date has taken place) has not helped here. And, overall, litigation is a poor way to achieve license compliance.
The license commons
Software is a shared resource, a commons that we all benefit from and maintain; this is well understood in the development community. Outsiders do not fully understand that; they often only really learn about it when a disaster strikes, as when an underfunded project is hit by a severe security issue.
Fontana asserted that legal texts are a shared resource as well, even if that may be less obvious. Lawyers share and reuse legal language all the time with no concerns about licensing; that text is just assumed to be in the public domain. Proprietary licenses tend to reuse shared text; end-user license agreements tend not to. But, even with reused text, there is no standard proprietary license; each is unique. So a legal decision may have implications for similar licenses, but the lack of standardization puts limits on those implications. A bad ruling around one product's proprietary license does not necessarily affect other proprietary products.
Open-source licenses are different; they are truly shared licenses, of which there is only a small set. License proliferation has been heavily discouraged over the years, so there is almost no customization of licenses by individual projects. Licenses are shared between communities that may have different policy objectives. There are a lot of benefits to this sharing, including increased certainty and predictability, and the fact that interpretation discussions are not project-specific. But there are risks too, especially when it comes to litigation.
One might think that litigation would increase predictability by creating a body of case law around a license; this view is especially popular among lawyers who lack actual litigation experience. But each case is unique, and cases can have unusual or extreme facts. License interpretations in court will be fact-specific and the resulting decisions will be shaped by the arguments of the litigants — and by judges who are not familiar with open-source licenses. There is little opportunity for the community to influence decisions; all told, there is significant potential for any given case to yield bad results. And, given the standardization of licenses in the community, those results can affect a broad group of projects.
There is, he said, the potential for a lot of litigation to happen, because there are a lot of copyright holders out there. Communities may be stuck with bad decisions as a result. There is no easy solution at hand when one of those decisions comes down. There is, for example, often no license steward who could produce a new version of a license in response to a bad decision, so no license updates are possible. And even when an update is possible, there is a lot of pressure to avoid license revisions, and a difficult path to get a project to accept a new version of a license.
Protecting our licenses
So how can we protect our shared license resources? Fontana said that there can be value to litigation, but he is skeptical of it in general. We should, he said, be advocates for our licenses and look for ways to reduce both the likelihood and the impact of bad legal decisions. Among other things, that implies promoting community enforcement norms. We need to document our license interpretations, refute nonstandard interpretations, and promote modern interpretations that make compliance easier. McHardy, he said, has been trading on some strange interpretations of the GPL that should be refuted. New licenses should be drafted in public and updated more often.
One effort toward some of those goals is the GPL Cooperation Commitment (GPLCC), which seeks to promote community norms for license enforcement. It is based on the idea that licensees with good intentions should not be penalized for mistakes. One concrete step in that direction is extending the GPLv3 termination conditions to GPLv2, since the GPLv2 default is "harsh". This effort started with an enforcement statement put together by the kernel community, but it has since spread well beyond that. Quite a few companies have signed onto it, and more are on the way; it has also picked up signatures from around 200 developers. Efforts are being made to get all GPLv2 or LGPLv2 projects to adopt it; Red Hat now requires it for new GPL-licensed projects.
There have been some criticisms of the GPLCC, he acknowledged. Bruce Perens has said that the new commitment is hollow, since those companies won't enforce the GPL anyway and communities have always given violators more time to come back into compliance. Fontana's response is that companies are normally less forgiving than the community, so the GPLCC represents a change, and McHardy's enforcement was definitely counter to this promise. Bradley Kuhn has complained that the GPLCC has taken only one part of the SFC's enforcement principles, which were really designed to be adopted as a whole. And, according to Kuhn, even the savviest of companies need more than the 30 days given to come back into compliance. Fontana's answer here is that the whole thing is an experiment in establishing a norm that is worth pursuing.
Concluding with a look toward the future, Fontana said that just how license interpretations should be documented is still an open question. The GPLCC group will be looking at other aspects of the interpretation of the GPL with that in mind, and in the hope of preventing future McHardy-like incidents.
Q&A
After the talk, Fontana was asked about the community's work to avoid license proliferation and whether that was, in retrospect, a mistake. He replied that he always thought that proliferation was an overblown concern, and that the community was standardizing on a few licenses anyway. He has not been seeing many new licenses in recent years, though he did acknowledge that companies like MongoDB are trying to change that. The current tendency, though, is to play with the details of standardized licenses — an effort that is driven by the merits of those licenses. Standardization is good, he said, but it does carry a few risks.
Another audience member asked whether the community's interpretation of licenses really influences courts; he replied that, while there is no real evidence of it yet, there has always been an assumption that the courts would pay attention to the community's thoughts. But courts aren't really set up to take outside interpretations into account. The US has a mechanism for amicus briefs, but there are limits to what they can do and it may be harder to express community opinions to courts in other countries.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]
A "joke" in the glibc manual
A "joke" in the glibc manual—targeting a topic that is, at best,
sensitive—has come up for discussion on the glibc-alpha mailing list
again. When we looked at the controversy
in May, Richard Stallman had put his foot down and a patch removing the
joke—though opinions of its amusement value vary—was reverted. Shortly
after that article was published, a "cool down period
" was
requested
(and honored), but that time has expired. Other developments in
the GNU project have given some reason to believe that the time is ripe to
finally purge the joke, but that may not work out any better than the last
attempt.
The joke in question refers to a US government "censorship rule" from over two decades ago regarding sharing of information about abortion. It is attached to documentation of the abort() call in glibc and the text of it can be seen in the patch to remove it. One might think that an age-old US-centric joke would be a good candidate for removal regardless of its subject matter. That it touches on a topic that is emotionally fraught for many might also make it unwelcoming—thus unwelcome in documentation. But, according to Stallman, that's not so clear cut.
The GNU project recently adopted the "GNU Kind Communications Guidelines", authored by Stallman, that seek to help maintain a welcoming tone in the project's communications. With that in mind, Matthew Garrett re-proposed removing the joke.
Carlos O'Donell, who is one of the glibc maintainers and who called for the cool-down period, was supportive of the patch (as he was back in May). He praised the new guidelines and said that he expected them to "cover all forms of communication including the manual, website, and social media, and not just email". But he studiously avoided talking about the content of the joke as a reason for removing it; instead he noted the confusion that it has caused along the way and that it "does not support the present intent of the manual, which is to provide accurate technical information for the GNU C Library".
O'Donell said that he wanted to hear from Alexandre Oliva, who had reverted the change back in May, to see if he still had objections. Oliva replied that he did not think the guidelines should cover manuals, just interactive discussion forums such as email, IRC, and social media. But he did concede that he may have misunderstood the intent of the guidelines and wanted to hear what Stallman had to say on that.
For his part, Stallman seems to agree with Oliva:
These guidelines as such do not apply to manuals. Kindness as a general principle surely does apply to manuals, but precisely how remains to be decided.
He noted that he had recently added a statement into the GNU maintainer guide that "humor is welcome _in general_" and that the project rejects "the idea of 'professionalism' which calls for deleting humor because it is humor" (though that does not yet appear in the guide at the time of this writing). In order to even consider the question of the abort() joke, there are several "broader issues" that need to be resolved first, he said.
According to Stallman, the joke "opposes censorship", which is also a position of the GNU project, so the joke is "not an unrelated political issue". However, the oblique reference to a gag rule on abortion information, which has been imposed on organizations receiving US aid off and on since 1984, may not really come through in the joke. Even many US-based glibc users might be hard-pressed to link it to the Mexico City policy that it is targeting. Even if they did, a joke buried in a manual for an unrelated C library is not likely to have any real impact on the rule (which has been rescinded by Democratic presidents and reinstated by Republican presidents since it was first enacted).
When O'Donell pressed for more information about what these larger issues are, Stallman counseled patience. He did not offer any more information than that; perhaps the discussion has moved to a private mailing list or the like.
For many, including me, it is a little hard to understand why there is any opposition to removing the joke at all. It is clearly out of place, not particularly funny, and doesn't really push the GNU anti-censorship philosophy forward in any real way even if you grant that anti-censorship is a goal of the project (which some do not). There are, of course, those who oppose removing it because they are opposed to "political correctness" and do not see how it could be "unwelcoming", but even they might concede that it is an oddity that is poked into a back corner of an entirely unrelated document. And it is not hard for many to see that tying the topic of abortion to a C function might be upsetting to some; why waste a bunch of project time defending it when it has effectively no impact in the direction that Stallman wants, while putting off some (possibly small) percentage of glibc manual readers?
As was noted in the article back in May, the GNU project is run by a (hopefully benevolent) dictator in Stallman. Ultimately, he gets to decide what goes into project communications and can dictate the tone for its community (thus the guidelines). It is a bit weird to claim that all project communications except the manuals need to be "kind"; Stallman hasn't exactly said that, but that is kind of how it comes across. Digging in his heels, for unclear reasons, on this particular issue just seems like something a benevolent dictator might find a way to avoid.
Zinc: a new kernel cryptography API
We looked at the WireGuard virtual private network (VPN) back in August and noted that it is built on top of a new cryptographic API being developed for the kernel, which is called Zinc. There has been some controversy about Zinc and why a brand new API was needed when the kernel already has an extensive crypto API. A recent talk by lead WireGuard developer Jason Donenfeld at Kernel Recipes 2018 would appear to be a serious attempt to reach out, engage with that question, and explain the what, how, and why of Zinc.
WireGuard itself is small and, according to Linus Torvalds, a work of art. Two of its stated objectives are maximal simplicity and high auditability. Donenfeld did initially try to implement WireGuard using the existing kernel cryptography API, but he found it impossible to do so in any sane way. That led him to question whether it was even possible to meet those objectives using the existing API.
![Tux and ECB mode](https://static.lwn.net/images/2018/kr-donenfeld-tux.jpg)
By way of a case study, he considered big_key.c. This is kernel code that is designed to take a key, store it encrypted on disk, and then return the key to someone asking for it if they are allowed to have access to it. Donenfeld had taken a look at it, and found that the crypto was totally broken. For a start, it used ciphers in Electronic Codebook (ECB) mode, which is known to leave gross structure in ciphertext — the encrypted image of Tux on the left may still contain data perceptible to your eye — and so is not recommended for any serious cryptographic use. Furthermore, according to Donenfeld, it was missing authentication tags (allowing ciphertext to be undetectably modified), it didn't zero keys out of memory after use, and it didn't use its sources of randomness correctly; there were many CVEs associated with it. So he set out to rewrite it using the crypto API, hoping to better learn the API with a view to using it for WireGuard.
The first step with the existing API is to allocate an instance of a cipher "object". The syntax for doing so is arguably confusing — for example, you pass the argument CRYPTO_ALG_ASYNC to indicate that you don't want the instance to be asynchronous. When you've got it set up and want to encrypt something, you can't simply pass data by address. You must use scatter/gather to pass it, which in turn means that data in the vmalloc() area or on the stack can't just be encrypted with this API. The key you're using ends up attached not to the object you just allocated, but to the global instance of the algorithm in question, so if you want to set the key you must take a mutex lock before doing so, in order to be sure that someone else isn't changing the key underneath you at the same time. This complexity has an associated resource cost: the memory requirements for a single key can approach a megabyte, and some platforms just can't spare that much. Normally one would use kvmalloc() to get around this, but the crypto API doesn't permit it. Although this was eventually addressed, the fix was not trivial.
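To give a sense of the boilerplate being described, here is a minimal sketch (not taken from the talk or from big_key.c) of a single synchronous encryption using the existing API; the choice of "chacha20" as the cipher and the details of error handling are assumptions made for illustration:

```c
/*
 * A minimal, hypothetical sketch of one-off encryption with the
 * existing kernel crypto API; the cipher choice and details are
 * illustrative, not taken from big_key.c or WireGuard.
 */
#include <crypto/skcipher.h>
#include <linux/scatterlist.h>

static int encrypt_buffer(u8 *buf, unsigned int len,
			  const u8 *key, unsigned int keylen, u8 *iv)
{
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	int ret;

	/* CRYPTO_ALG_ASYNC in the mask means 'synchronous implementations only' */
	tfm = crypto_alloc_skcipher("chacha20", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	ret = crypto_skcipher_setkey(tfm, key, keylen);
	if (ret)
		goto out_tfm;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		ret = -ENOMEM;
		goto out_tfm;
	}

	/* Data must go through a scatterlist: no stack or vmalloc() buffers */
	sg_init_one(&sg, buf, len);
	skcipher_request_set_callback(req, 0, crypto_req_done, &wait);
	skcipher_request_set_crypt(req, &sg, &sg, len, iv);

	ret = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
out_tfm:
	crypto_free_skcipher(tfm);
	return ret;
}
```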
![Jason Donenfeld](https://static.lwn.net/images/2018/kr-donenfeld-sm.jpg)
So Donenfeld's experiences left him convinced that although the current crypto API has "definitely been developed by some smart people who can really code [...] who can really push the limits of what we're used to in C", it is a big, fancy enterprise API that is hard to use, which means that people often use it wrong. It's also, he said, a museum of ciphers. Primitives and their multiple implementations lie stacked about the place, some getting quite dusty — MD4 is still in there, for example. It's hard to tell who wrote any given implementation, or whether it has been formally verified or how widely used it is, which makes it hard to know what's reliable. He has a strong preference for formally verified code. Failing that, he prefers code that is in widespread use and has received a lot of scrutiny, noting that these are often the fastest implementations as well. And failing that, he prefers code based on the reference implementations.
So Zinc's approach, and this is where feathers started to get ruffled, is not to be an API at all; it's just functions. Donenfeld argues that functions are well-understood in C, and people know about and are comfortable with them. Zinc is aiming for high-speed and high-assurance; these are easier goals to achieve without a big API, as is formal verification. Moreover, "tons of code" has already leaked out of the kernel and into lib/; it's clear that programmers want functions and Zinc is prepared to provide them in a non-haphazard way.
As for formal verification, there are apparently several teams working on that for crypto code, including MIT's fiat-crypto project and INRIA's HACL*. The latter project takes the approach of modeling the algorithm in F* and proving the model correct, a task that F* is designed to support. Then — in a term of art which never fails to make me think of Arnold Schwarzenegger's Terminator descending into a bath of molten metal — the model is "lowered into" C (or in some cases, all the way into assembly language). According to Donenfeld, this produces C which, though slightly non-idiomatic, is surprisingly readable, and much more likely to be bug-free than human-written code. It also produces some of the fastest C implementations that exist, which he suspects is because the formal verification process removes certain things that are not obviously removable when you're working the mathematics out by hand. In addition to using formal verification, all Zinc code has been, and will continue to be, heavily fuzzed.
Donenfeld has been working with the INRIA team to get as much as possible of their work into Zinc, and is trying generally to improve relations between the kernel community and academic cryptographers. He feels that the people who design cryptographic primitives, and their hordes of capable graduate students, generally don't come anywhere near kernel development, and that it's our loss.
Cryptographic primitives in Zinc are organized differently than the current API. Code is organized by the name of the cipher; for example, the ChaCha20 cipher lives under lib/zinc/chacha20/, where you can find the generic C implementation chacha20.c as well as architecture-specific assembly versions including chacha20-arm.S and chacha20-x86_64.S. Donenfeld feels this invites contribution in an approachable and manageable way. It also allows architecture implementation selection not via function pointers but by compiler inlining, "which makes things super fast" — and the absence of function pointers means no retpoline-induced slowdowns.
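A generic sketch may help show both the "just functions" style and the inlining-based dispatch; this is not Zinc's actual code, and the function names, the key and nonce parameters, and the CONFIG_X86_64 guard are all stand-ins chosen for illustration:

```c
/*
 * Generic sketch of inlined architecture dispatch; not Zinc's actual
 * code. chacha20_x86_64() and chacha20_generic() are stand-in names.
 */
#include <linux/types.h>

bool chacha20_x86_64(u8 *dst, const u8 *src, size_t len,
		     const u8 *key, const u8 *nonce);
void chacha20_generic(u8 *dst, const u8 *src, size_t len,
		      const u8 *key, const u8 *nonce);

static inline bool chacha20_arch(u8 *dst, const u8 *src, size_t len,
				 const u8 *key, const u8 *nonce)
{
#ifdef CONFIG_X86_64
	/* Returns true if the assembly path handled the whole request */
	return chacha20_x86_64(dst, src, len, key, nonce);
#else
	return false;
#endif
}

/* Callers simply call a function; no function pointers are involved */
void chacha20(u8 *dst, const u8 *src, size_t len,
	      const u8 *key, const u8 *nonce)
{
	if (!chacha20_arch(dst, src, len, key, nonce))
		chacha20_generic(dst, src, len, key, nonce);
}
```

Because the dispatch is resolved at compile time, the compiler can inline it entirely, which is where the claimed speed (and retpoline immunity) comes from.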
Zinc currently implements the ChaCha20 stream cipher, the Poly1305 message authentication code (MAC), the BLAKE2s hash function, and Curve25519 for elliptic-curve cryptography. This is a long way from a complete replacement of the current API; it's essentially just what WireGuard uses. But Donenfeld and Samuel Neves have started the work of refactoring the existing crypto API to use Zinc functions under the hood. He also tried re-implementing big_key.c in Zinc, which ended up removing over 200 lines of a 500-line file and replacing them with 28 lines of new code.
Some of the current API will not be so easily refactored. One question from the floor asked about handling asynchronous callbacks for hardware crypto accelerators, which are important in lower-power CPU environments such as some ARM chips. Donenfeld's response was that he didn't like the idea of "polluting" the Zinc API by evolving it to handle asynchronous callbacks. He said that it would be better to build a layer on top of Zinc that either invoked Zinc for in-CPU crypto or handed the request out to the external hardware, which essentially dodges the question. It does appear that he has some clear views about how people should use crypto; uses that don't fit into those views aren't really his top priority.
Two easier questions related to naming and to learning opportunities. As to the former, Zinc apparently now stands for "Zinc Is Nice Cryptography"; the suggestion it might stand for "Zinc Is Not a CryptoAPI" elicited some laughter. The name also fits with the elemental naming scheme used in crypto projects like Sodium and libchloride. The other question was motivated by the questioner finding it increasingly difficult to know how to respond to requests for a good place for beginners to get started with kernel work. Donenfeld felt that Zinc may not be a great place to start contributing, but that the nature of the code and the tasks it performs — it's pure C, it takes in data, modifies it, and writes it out — makes it a great place to start reading and understanding.
Although his frustration with Zinc's reception by the kernel community occasionally leaked through, it's clear that Donenfeld has learned from the response to his initial 24,000-line patch. Talks like this are part of the outreach. But the process is made easier by the legitimate criticisms he makes of the current API and by the fact that, even though Zinc doesn't do everything the crypto API currently does, what it does do, it does really well. I suspect that, in some form not yet determined, we will all benefit from this work for some time to come.
[We would like to thank LWN's travel sponsor, The Linux Foundation, for assistance with travel funding for Kernel Recipes.]
4.20 Merge window part 2
At the end of the 4.20 merge window, 12,125 non-merge changesets had been pulled into the mainline kernel repository; 6,390 came in since last week's summary was written. As is often the case, the latter part of the merge window contained a larger portion of cleanups and fixes, but there were a number of new features in the mix as well.
Architecture-specific
- The MIPS architecture has gained support for kexec on many sub-architectures.
- Support for the C-SKY processor architecture has been added to the kernel.
Core kernel
- The pressure-stall information patch set has been merged. It creates a new set of kernel interfaces giving better information on just what is slowing the system down.
- The new "udmabuf" pseudo-device allows user-space code to convert a memfd region into a dma-buf structure; it is intended for use in QEMU.
- The syntax for accessing data from kprobes has been extended to allow easier access to arrays and function arguments. This merge commit gives an overall picture of the changes.
Filesystems and block layer
- There are two new ioctl() commands for working with zoned devices: BLKGETZONESZ to get the zone size, and BLKGETNRZONES to get the number of zones. Both will return zero for normal (non-zoned) block devices; a brief user-space sketch of their use appears after this list.
- The fanotify_mark() system call has gained a new FAN_MARK_FILESYSTEM mark type; it can be used to watch all events happening within a filesystem.
- Server-side support for the NFS 4.2 asynchronous copy protocol has been added.
- The UBIFS filesystem has a new authentication feature meant to prevent attacks via corrupted data structures; see this document for details.
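As promised above, here is a minimal user-space sketch of the two zoned-device ioctl() commands; it assumes kernel headers recent enough to define them, and the device path is just a placeholder:

```c
/* Query zone size and zone count for a block device (4.20 ioctls). */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
	unsigned int zone_sectors = 0, nr_zones = 0;
	int fd = open(argc > 1 ? argv[1] : "/dev/sda", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Both ioctls report zero for ordinary, non-zoned block devices. */
	if (ioctl(fd, BLKGETZONESZ, &zone_sectors) ||
	    ioctl(fd, BLKGETNRZONES, &nr_zones)) {
		perror("ioctl");
		close(fd);
		return 1;
	}
	printf("zone size: %u sectors, zones: %u\n", zone_sectors, nr_zones);
	close(fd);
	return 0;
}
```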
Hardware support
- Clock: Qualcomm SDM845 camera clock controllers, Qualcomm SDM660 and QCS404 global clock controllers, and Ingenic JZ4725B clocks.
- Graphics: Rockchip RGB output controllers.
- Industrial I/O: Qualcomm SPMI PMIC5 analog-to-digital converters (ADCs), Analog Devices ADXL372 3-axis accelerometers, Microchip Technology MCP3911 ADCs, Linear Technology LTC1660/LTC1665 digital-to-analog converters (DACs), and STMicroelectronics VL53L0X time-of-flight ranging sensors.
- Miscellaneous: STMicroelectronics STM32 thermal sensors, Marvell Armada 37xx watchdog timers, Toshiba TC358764 DSI/LVDS bridges, NXP i.MX pixel pipelines, Sony IMX319 and IMX355 sensors, Xilinx ZynqMP Ultrascale+ clock controllers, Qualcomm ADSP peripheral image loaders, and Allwinner sunXi video decoders.
- USB: Cadence MHDP DisplayPort PHYs, Marvell PXA USB PHYs, UniPhier USB2 and USB3 PHYs, and Rockchip INNO HDMI PHYs.
- The media subsystem has a new experimental "request API" meant to support frame-to-frame parameter changes in devices with that capability. See this commit for documentation on the user-space API for this feature.
Security
- After a number of ups and downs, the "STACKLEAK" GCC plugin has finally been merged into the mainline. This plugin works to keep information from leaking out of the kernel via uninitialized on-stack variables.
Internal kernel changes
- The XArray data structure, a reworking of the radix tree structure, has been merged at last and the page cache has been converted to use it.
- Kernel builds now use -Wvla to warn about the use of variable-length arrays. That has become possible because the task of removing VLAs has finally reached its conclusion (or something close to it).
- The new list_bulk_move_tail() helper moves a contiguous section of list entries to the tail of a given list; a short usage sketch appears after this list.
- Two file_operations methods — clone_file_range() and dedupe_file_range() — have been combined into the new remap_file_range() method, since there was a fair amount of overlap between them. All in-kernel filesystems have been updated.
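Here is a quick illustration of list_bulk_move_tail(), based only on the description above; the item structure and the requeue_run() helper are hypothetical:

```c
/*
 * Hypothetical example of list_bulk_move_tail(): move the run of
 * entries from @first through @last to the end of @dst.
 */
#include <linux/list.h>

struct item {
	struct list_head node;
	int id;
};

static void requeue_run(struct list_head *dst,
			struct item *first, struct item *last)
{
	/* first..last must be a contiguous span of the same list */
	list_bulk_move_tail(dst, &first->node, &last->node);
}
```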
One feature that didn't quite get in was the new filesystem mounting API, which was sent to Linus but then ran into some opposition. It is likely to be restructured so that the internal virtual filesystem changes go in first, with the user-visible API changes happening later. It is possible, though perhaps unlikely, that the internal changes could still be pulled in the near future.
Now it's a matter of stabilizing all of that new code for the final release which, if the usual schedule holds, can be expected just before the end of the year.
Limiting the power of package installation in Debian
There is always at least a small risk when installing a package for a distribution. By its very nature, package installation is an invasive process; some packages require the ability to make radical changes to the system—changes that users surely would not want other packages to take advantage of. Packages that are made available by distributions are vetted for problems of this sort, though, of course, mistakes can be made. Third-party packages are an even bigger potential problem because they lack this vetting, as was discussed in early October on the debian-devel mailing list. Solutions in this area are not particularly easy, however.
Lars Wirzenius brought up the problem: "when a .deb package is installed, upgraded, or removed, the maintainer scripts are run as root and can thus do anything." Maintainer scripts are included in a .deb file to be run before and after installation or removal. As he noted, maintainer scripts for third-party packages (e.g. Skype, Chrome) sometimes add entries to the lists of package sources and signing keys; they do so in order to get security updates to their packages safely, but it may still be surprising or unwanted. Even Debian-released packages might contain unwelcome surprises of various sorts resulting from simple mistakes.
He suggested that there could be a set of "profiles" that describe the kinds of changes that might be made by a package installation. He gave a few examples, such as a "default" profile that only allowed file installation in /usr, a "kernel" profile that could also install into /boot and trigger rebuilds of the initramfs, or a "core" profile that could do anything. Packages would then declare which profile they required, and the dpkg command could arrange that a package's installation scripts only make the kinds of changes allowed by its profile. Mostly, he wanted to spark some discussion.
As Paul Wise pointed out, though, maintainer scripts are not the only problem. There are lots of ways ("setuid binaries, cron jobs, systemd units, apt keyring information, sudoers files and so on") that a malicious package could compromise the system. He pointed to a wiki page created by Antoine Beaupré to document the problem of untrusted .deb files along with some possible solutions. Wise suggested that Flatpak might also provide part of the solution.
Beaupré agreed that there are numerous problem areas in the current installation mechanism. He would like to see Debian look at fixing some of them, even if it doesn't lead to fixing them all:
For example, there's no reason why a package like Chromium should be able to run stuff as root. The vast majority of third-party repositories out there mostly ship this one binary that does not require special privileges other than installing stuff in /usr, without suid or any special permissions.
He suggested some low-hanging fruit, like changing maintainer scripts to use a declarative syntax rather than be open-ended shell scripts. Switching Debian to use Flatpak, which wasn't quite what Wise meant, "would be a rather controversial change", but the Flatpak project is working to address many of the problems under discussion, so it may make sense to look at it in more detail, Beaupré said.
Protecting against malicious .deb files is too high of a hurdle to realistically clear, according to several in the thread, including W. Martin Borgert.
Ralf Treinen concurred with that, but noted that it is difficult to know if a script is doing something that it shouldn't. "Having a declaration of what the maintainer thinks are the possible effects of a script would certainly help us." But Guillem Jover rejected the whole idea of ever securely installing untrusted .deb files: "If you do not trust the .deb, then you should not be even installing it using dpkg, less running it unconfined, as its paradigm is based on fully trusting the .debs, on using shared resources, and on system integration." He went on to list multiple areas that would need attention, concluding that if all of those things were fixed, it wouldn't be much like today's packages "except for the container format. :)".
While protecting against malicious packages may not really be in the cards for .deb files, Flatpak (or other, similar ideas such as Ubuntu's .snap packages) may eventually provide some of that kind of protection. That is not really something that Debian can control; users have shown that they want various third-party tools, and the makers of those tools are going to do what they think they need to do in terms of installation. If Debian gets in the way of that, it risks becoming less relevant.
The way that packages are built and maintained for Debian is already complicated enough, Tomas Pospisek said: "I think Linux systems per se, Debian as a runtime, the (social) processes required from DDs/DMs, the whole technical Debian packaging ecosystem are each plenty complex enough already." Adding to that will just make it "less fun", leading to fewer Debian Developers and Maintainers (DDs/DMs), less software being packaged, and, ultimately, fewer users.
There is appeal to protecting packagers and users from silly errors; as Paride Legovini put it: "I know I won't screw up anybody's system with a font package as I restricted it to /usr/share and /etc/fonts." But the problem with things like Chrome and Skype that Wirzenius started out with is not really amenable to the (not fully baked) solution he described. Profiles might help Debian package maintainers, but are not likely to help with third-party packages. Legovini continued along those lines.
Some kind of declarative system for package installation is one solution that could help. Packages could only perform the actions allowed by the package manager—not arbitrary shell commands. Simon Richter described what that might look like in some detail. Wise also pointed out that the dpkg maintainers do have declarative packaging "on their radar".
No grand conclusions came from the discussion, but that is partly because there were several different aspects under consideration. Preventing Debian packagers from inadvertently causing installation errors is a much more tractable problem than trying to prevent malicious .deb files from wreaking havoc. And non-malicious third-party packages, though possibly buggy or taking undue liberties, are yet again a somewhat different problem. The problem is not limited to Debian either, of course; package installation on any distribution is likely to suffer from most of the same problems. It is good to see discussion in this area, if only to keep it in mind as packaging evolves down the road.
SpamAssassin is back
The SpamAssassin 3.4.2 release was the first from that project in well over three years. At the 2018 Open Source Summit Europe, Giovanni Bechis talked about that release and those that will be coming in the near future. It would seem that, after an extended period of quiet, the SpamAssassin project is back and has rededicated itself to the task of keeping junk out of our inboxes.
Bechis started by noting that spam filtering is hard because everybody's spam is different. It varies depending on which languages you speak, what your personal interests are, which social networks you use, and so on. People vary, so results vary; he knows a lot of Gmail users who say that its spam filtering works well, but his Gmail account is full of spam. Since Google knows little about him, it is unable to train itself to properly filter his mail.
Just like Gmail, SpamAssassin isn't the perfect filter for everybody right out of the box; it's really a framework that can be used to create that filter. Getting the best out of it can involve spending some time to write rules, for example. Most of the current rule base is aimed at English-language spam, which isn't helpful for people whose spam comes in other languages. Another useful thing to do is to participate in the MassCheck project, which can quickly evaluate the effectiveness of new rules on a large body of spam.
In particular, MassCheck performs a nightly run to check the hit rate of rules to determine how those rules are performing in real installations. It can also check for overlap; if two rules always trigger on the same messages, there isn't really a need for both of them. This information feeds into the RuleQA database to give a picture of how the rules are working overall.
SpamAssassin is not just for email filtering, Bechis said; some sites are using it to detect spam submitted in web forms, for example.
So what is new in SpamAssassin? There has been a lot of work by the project's system administration team, he said, to update the infrastructure. That has resulted in the rebuilding of the MassCheck implementation from scratch. The 3.4.2 release contained fixes for four security bugs, and also an important workaround for a Perl bug that was only triggered on Red-Hat-based distributions. Startup time has been improved, and SSLv3 support has been removed. The "freemail antiforge" mechanism, which seeks to detect forged Gmail messages, has been improved. The geo-aware scoring system can adjust scores based on which continent the mail came from. The URILocalBL plugin, which can blacklist URLs based on information like where they are hosted, has seen a number of improvements.
The 3.4.2 release also saw the addition of the HashBL plugin, which can be used to block email addresses from domains that cannot be blocked wholesale. There is a new anti-phishing plugin that can filter on URLs commonly found in phishing emails. The new ResourceLimits plugin can put limits on the amount of CPU and memory used by SpamAssassin. And the FromNameSpoof plugin tries to detect attempts to confuse users about the source of an email using the full-name field.
Some future plugins include a couple that are aimed at detecting Microsoft Office attachments containing macros. There is one for checking URLs from URL-shortening services; it will filter based on the final destination of those URLs. The KAM.cf ruleset is an unofficial addition that can allow sites to respond more quickly to new spam campaigns, but at a cost of more false positive results. Also coming is a set of international channels that will carry signed rulesets designed for different parts of the planet.
The SpamAssassin 4.0 release can be expected around January, Bechis said. It will include full UTF-8 support that has been completely rewritten, with better detection of east-Asian languages. The TxRep plugin, which applies scores to messages depending on the reputation of the sender, is being improved and will be able to use PostgreSQL 10. The Office macro and URL shortener plugins will be in this release, but another new plugin to check for suspicious URLs inside attachments will have to wait until 4.1.
Further in the future, the project plans to update its approach to machine learning. The current code is getting old, and there is interest in applying deep-learning techniques to the spam-detection problem. There was a Google Summer of Code project that attempted to make progress in that area but it didn't succeed, so more work is needed.
When asked about whether the SpamAssassin project had really slowed down as much as its release history suggests, Bechis conceded that it had. A number of people had left the project, and there were infrastructure problems that blocked the rule-generation process. But the situation has since improved, he said. The project has picked up a new set of developers and is moving forward again. Certainly the world can only benefit from better spam filtering.
The slides from this talk [PDF] are available.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]
Page editor: Jonathan Corbet