Leading items
Welcome to the LWN.net Weekly Edition for January 10, 2019
This edition contains the following feature content:
- What should be in the Python standard library?: how many batteries does the Python core development community want to include?
- A new free-software forge: sr.ht: a look at an email-friendly software forge.
- The rest of the 5.0 merge window: some significant changes at the end of this merge window, including a user-visible change to mincore() to address a security issue.
- A setback for fs-verity: deep disagreements over how this feature should be implemented and presented to users.
- Pressure stall monitors: how to make pressure stall information useful for handsets.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
What should be in the Python standard library?
Python has always touted itself as a "batteries included" language; its standard library contains lots of useful modules, often more than enough to solve many types of problems quickly. From time to time, though, members of the community have rethought that philosophy, proposing to reduce or restructure the standard library for a variety of reasons. A discussion at the end of November on the python-dev mailing list revived that debate to some extent.
Jonathan Underwood raised the issue, likely unknowingly, when he asked about possibly adding some LZ4 compression library bindings to the standard library. As the project page indicates, it fits in well with the other compression modules already in the standard library. Responses were generally favorable or neutral, though some, like Brett Cannon, wondered if it made sense to broaden the scope a bit to create something similar to hashlib but for compression algorithms. Gregory P. Smith had a different take, however:
If anything, it'd be nice to standardize on some stdlib namespaces that others could plug their modules into. Create a compress in the stdlib with zlib and bz2 in it, and a way for extension modules to add themselves in a managed manner instead of requiring a top level name? Opening up a designated namespace to third party modules is not something we've done as a project in the past though. It requires care. I haven't thought that through.
Steven D'Aprano objected to Smith's assertion about the Python Package Index (PyPI): "PyPI makes getting more algorithms easy for *SOME* people." He noted that in many environments (e.g. schools, companies) users cannot install additional software on the computers they are using, so PyPI is not the panacea it is sometimes characterized as.
That led Cannon to suggest discussing the standard library and its role: "We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.)."
Paul Moore wasn't sure that discussing the matter would really resolve anything, though.
A larger standard library would help those without access to PyPI, Antoine Pitrou argued, while a smaller one does not provide huge benefits: "Python doesn't become magically faster or more powerful by including less in its standard distribution: the best it does is make the distribution slightly smaller." But there are definite downsides to having a large standard library, Benjamin Peterson said:
- The [development] of stdlib modules slows to the rate of the Python release schedule.
- stdlib modules become a permanent maintenance burden to CPython core developers.
- The blessed status of stdlib modules means that users might use a substandard stdlib module when a better third-party alternative exists.
Steve Dower would rather see a smaller standard library with some kind of "standard distribution" of PyPI modules that is curated by the core developers. Later in the thread, he listed numerous different Python distributions as examples of what he meant, but that just highlighted another problem, Moore said: which of those should he recommend to his users? Right now, the standard library provides the base that a Python script can rely on.
Moore acknowledged that maintaining modules in the standard library has a "significant cost" but wondered if moving to the distribution model was simply shifting those costs to users—without users gaining much from it. Nathaniel Smith looked at the list of distributions and came to a different conclusion: the "single-box-of-batteries" model is not really solving the problems it needs to solve.
It's really hard to tell whether specific packages would be good or bad additions to the stdlib, when we don't even know what the stdlib is supposed to be.
But Moore found that to be overstated somewhat. For him (and presumably others), the standard library is what you can expect to find when you have Python installed. That means that various things like StackOverflow answers, tutorials, books, and so on can rely upon those pieces being present, "much like you'd expect every Linux distribution to include grep". In addition, the "batteries included" attribute is likely to have been part of what helped Python grow into one of the most popular languages, D'Aprano said. "The current model for the stdlib seems to be working well, and we mess with it at our peril."
Nathaniel Smith sees some advantages to the "standard distribution" model, though he is not sure that it would really be the best option. "But what I like about it is that it could potentially reduce the conflict between what our different user groups need, instead of playing zero-sum tug-of-war every time this comes up." Others don't see it that way, though; "not every need can be solved by the stdlib", as Pitrou put it.
Moore concurred: "In exploring alternatives, let's not lose sight of the fact that the stdlib has been a huge success, so we know we *can* deliver an extremely successful distribution based on that model, no matter how much it might trigger regular debates :-)"
In any case, as he pointed out, a more concrete proposal (in the form of a PEP) is going to be needed before any real progress can be made. Dower floated some ideas about what a distribution might look like along the way, but, without something like a PEP to discuss, participants are often talking past each other based on their assumptions.
The topic has come up before on the Python mailing lists and at Python Language Summits. In 2015, there was a discussion at the summit on adding the popular Requests module to the standard library. Participants recognized that there were significant barriers—development pace, certificate handling, no asyncio support—to moving it into the standard library. In the end, it made sense for Requests to stay out. At the 2018 summit, Christian Heimes brought up a number of batteries that should perhaps be removed from the set, though the effort to create a PEP listing them seems to have stalled.
No firm conclusions were drawn in the discussion, but part of the underlying problem seems to be a lack of clarity on what the purpose of the standard library is. At the 2015 summit, Cannon suggested an informational PEP be drafted to solidify that; until that happens, there will be wildly differing views on what role the standard library serves. At the moment, though, there is no process to accept or reject a PEP even if one were on offer; that will have to await the new Python Steering Council, which will be elected in early February. One of the first orders of business for that group will likely be to address the PEP process.
As far as adding LZ4 goes, the overall feeling from the thread is that it would be useful to have it in the standard library—at least for those not looking to change the standard library model. Adding LZ4 also requires a PEP, however, so that process may be stalled by the governance change, as well.
A new free-software forge: sr.ht
Many projects have adopted the "GitHub style" of development over the last few years, though, of course, there are some high-profile exceptions that still use patches and mailing lists. Many projects are leery of putting all of their project metadata into a proprietary service, with limited means of usefully retrieving it should that be necessary, which is why GitLab (which is at least "open core") has been gaining some traction. A recently announced effort looks to bridge that gap; Drew DeVault's sr.ht ("the hacker's forge") combines elements of both styles of development in a "100% free and open source software forge". It looks to be an ambitious project, but it may also suffer from a lack of "social network" effects, which is part of what sustains GitHub as the forge of choice today, it seems.
The announcement blog post is replete with superlatives about sr.ht, which is "pronounced 'sir hat', or any other way you want", but it is a bit unclear whether the project quite lives up to all of that. It combines many of the features seen at sites like GitHub and GitLab—Git hosting, bug tracking, continuous integration (CI), mailing list management, wikis—but does so in a way that "embraces and improves upon the email-based workflow favored by git itself, along with many of the more hacker-oriented projects around the net". The intent is that each of the separate services integrates well both with the rest of sr.ht and with the external ecosystem, so that projects can use it piecemeal.
There are two sides to the sr.ht coin at this point; interested users can either host their own instance or use the hosted version. For now, the hosted version is free to use, since it is still "alpha", but eventually users will need to sign up for one of the plans, which range from $2 to $10 per month, to stay on the hosted service. There are instructions for getting sr.ht to run on other servers; it uses nginx, PostgreSQL, Redis, and Python 3 along with a mail server and a cron daemon.
While, overall, the documentation is rather terse and a bit scattered, it is not difficult to get started using the service by following the tutorial. Logging in allows one to create a Git repository; adding an SSH public key to the account then allows pushing an existing repository up to the system. From there, it can be browsed, as shown in the core sr.ht repository, cloned by others, and so on.
![sr.ht-dev mailing list](https://static.lwn.net/images/2019/sr.ht-list-sm.png)
As mentioned, sr.ht has not taken the approach of being yet another GitHub clone. Instead, it is geared toward a mailing-list-centric approach, possibly using the sr.ht mailing list component. The sr.ht-dev mailing list (seen at right) provides an example of the user interface for that component. Unlike some other forges or mailing-list replacements, it is not JavaScript-heavy—in fact, sr.ht uses no JavaScript at all, so pages are small (less than 10KB on average) and load quickly.
There is a guide to the preferred development and collaboration style for sr.ht. It is based around git send-email to a mailing list with copies to potential reviewers, much like Linux kernel development is done. Instead of forking a repository on the server, as is done for GitHub and others, a local clone is made, changes are made and committed, then posted for review. Once a patch is ready for merging, maintainers can apply it using git am. As can be seen, this is quite different from the web-centric "pull request" model used by GitHub and similar forges.
Wikis for sr.ht can be created using the man component. Wikis are simply a Git repository of Markdown files that get converted to HTML and served when they are visited. In addition, any sr.ht Git repository can have a top-level README.md, which will be shown when the repository is browsed and could provide a link to a project-specific wiki.
The build and CI component, builds.sr.ht, is what DeVault calls "the flagship product from sr.ht". His announcement notes that he has been working with both Linux and non-Linux (e.g. BSD, Hurd) distributions to have them start using it because "it's the only platform which can scale to the automation needs of an entire Linux distribution". He also says that smaller users are switching away from Travis CI and Jenkins to builds.sr.ht.
The build manifests specify more than just how to build the project; "test" tasks can be specified as well. The manifests also specify the platform (e.g. Alpine Linux, FreeBSD) that should be used for the build and test tasks. Build manifests can be placed in particular locations (.build.yml, .builds/*.yml) in a Git repository in order to run them automatically when new code is pushed to the repository. More information about builds.sr.ht can be found in the tutorial, manual, and API reference.
There is also a bug/issue tracking component called "todo". Its user manual is particularly brief as of this writing ("TODO: write these docs"). There are other places one will run into missing documentation pages, perhaps most critically for the code review page that is referred to in the lists.sr.ht documentation for those new to mailing lists. One would guess those holes will be filled in before too long.
The project is written in Python 3 and licensed under the Affero GPLv3. As noted, it is an ambitious project, but one has to wonder whether the mailing-list-centric workflow will survive long term. The instructions on how to set up mail clients and descriptions of proper mailing-list etiquette may not sit well with newer developers. Email is painful to set up and use these days—many are finding alternatives far more attractive.
Ultimately, what a project like sr.ht needs in order to fill out its feature base, grow, and thrive is new projects. There is an existing stable of projects that are run in a way that is compatible with sr.ht, but not very many new projects are going that route—for good or ill. In addition, the social effects of GitHub (and, to a lesser extent, GitLab, at least in the free-software world) are a major piece of what makes that model so successful; it is hard to see sr.ht replicating that to any significant degree. It is an interesting project, though, and one that deserves well-wishes; for compatible projects looking for a home, it is certainly worth a look.
The rest of the 5.0 merge window
Linus Torvalds released 5.0-rc1 on January 6, closing the merge window for this development cycle and confirming that the next release will indeed be called "5.0". At that point, 10,843 non-merge change sets had been pulled into the mainline, about 2,100 since last week's summary was written. Those 2,100 patches included a number of significant changes, though, including some new system-call semantics that may yet prove to create problems for existing user-space code.

The most significant changes merged in the last week include:
Architecture-specific
- The C-SKY architecture has gained support for CPU hotplugging, ftrace, and perf.
Core kernel
- There is a new "dynamic events" interface to the tracing subsystem. It unifies the three distinct interfaces (for kprobes, uprobes, and synthetic events) into a single control file. See this patch posting for a brief overview of how this interface works.
Hardware support
- Miscellaneous: NVIDIA Tegra20 external memory controllers, Qualcomm PM8916 watchdog timers, TQ-Systems TQMX86 watchdog timers, MediaTek Command-Queue DMA controllers, UniPhier MIO DMA controllers, Raspberry Pi touchscreens, Amlogic Meson PCIe host controllers, and Socionext UniPhier PCIe controllers.
- Pin control: NXP IMX8QXP pin controllers, Mediatek MT6797 and MT7629 pin controllers, Actions Semi S700 pin controllers, and Renesas RZ/A2 GPIO and pin controllers.
- Support for high-resolution mouse scroll wheels has been significantly improved.
Security
- A small piece of the secure-boot lockdown patch set has landed in the form of additional control over the kexec_load_file() system call. There is a new keyring (called .platform) for keys provided by the platform; it cannot be updated by a running system. Keys in this ring can be used to control which images may be run via kexec_load_file(). It has also become possible for security modules to prevent calls to kexec_load(), which cannot be verified in the same manner.
- The secure computing (seccomp) mechanism can now defer policy decisions to user space. See this new documentation for details on the final version of the API.
- The fscrypt filesystem encryption subsystem has gained support for the Adiantum encryption mode (which was added earlier in the merge window).
- The semantics of the mincore() system call have changed; see below for details.
Internal kernel
- The venerable access_ok() function, which verifies that an address lies within the user-space region, has lost its first argument. This argument was either VERIFY_READ or VERIFY_WRITE depending on the type of access, but no implementation of access_ok() actually used that information. The new prototype is:
int access_ok(void *address, int len);
The patch implementing this change ended up modifying over 600 files. There have also been several follow-up patches fixing various issues created by this change.
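As an illustration, here is a minimal sketch of what a caller looks like after the change; the driver function itself is hypothetical, but the access_ok() call follows the new two-argument form:

```c
#include <linux/fs.h>
#include <linux/uaccess.h>

/* Hypothetical driver read handler; the access_ok() call is the point
 * here.  Before 5.0, this check would have been written as
 * access_ok(VERIFY_WRITE, buf, len). */
static ssize_t demo_read(struct file *file, char __user *buf,
			 size_t len, loff_t *ppos)
{
	if (!access_ok(buf, len))
		return -EFAULT;
	/* ... a raw __copy_to_user() or similar would follow ... */
	return len;
}
```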
Changing mincore()
The mincore() system call is used to determine which pages in a virtual address-space range are currently resident in the page cache; the idea is to allow an application to learn which of its pages can be accessed without incurring page faults. As Torvalds notes in this commit, the intended semantics of this call have always been "somewhat unclear", but its behavior all along has been to indicate which pages are resident in the cache, regardless of whether the calling process has ever tried to access those pages. In other words, mincore() would reveal the presence of pages faulted in by other processes running in the system.
Naturally, it turns out that if you can observe aspects of the system state that result from other processes' activity, you can use those observations to extract information that should be hidden. Daniel Gruss et al. have recently released a paper [PDF] showing how mincore() can be exploited in just this manner. In response, Jiri Kosina posted a patch allowing system administrators to turn mincore() into a privileged system call by way of a sysctl knob, but Torvalds wasn't pleased with that approach. He responded with a patch restricting the information returned by mincore() to anonymous pages and a small subset of file pages.
After Jann Horn pointed out that restricting the query to the calling process's page tables reduces the attack surface considerably, though, Torvalds decided to change his approach. As a result, the patch that was committed adds no new knobs, but does unconditionally restrict mincore() to pages that are actually mapped by the calling process — pages that said process has accessed at some point. That makes it much harder to use mincore() to observe what other processes are doing, though, as Torvalds pointed out, such observation remains theoretically possible.
So the easy attack is closed, but that additional security may come at the cost of creating problems for user space. As Torvalds noted in the changelog:
I'm hoping that nobody actually has any workflow that cares, and the info leak is real.
If the change breaks code in the wild, it may have to be reverted and some other solution found; for this reason, this patch has not been marked for inclusion into the stable kernels. For those out there who have code that uses mincore(), now would be a good time to test the new semantics to ensure that things still work as expected.
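A small, self-contained program along these lines (a minimal sketch with abbreviated error handling) can be used for that sort of testing; it reports how many pages of a mapped file mincore() considers resident:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	long page = sysconf(_SC_PAGESIZE);
	struct stat st;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) {
		perror(argv[1]);
		return 1;
	}

	size_t npages = (st.st_size + page - 1) / page;
	void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	unsigned char *vec = malloc(npages);

	if (map == MAP_FAILED || vec == NULL ||
	    mincore(map, st.st_size, vec) < 0) {
		perror("mincore");
		return 1;
	}

	size_t resident = 0;
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1;

	/* Under the new semantics, pages that are in the page cache but
	 * have never been faulted in through this process's page tables
	 * should no longer be reported as resident. */
	printf("%zu of %zu pages reported resident\n", resident, npages);
	return 0;
}
```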
A couple of significant things were not merged before the merge window closed, including the controversial fs-verity patch set. Also missing again is the new filesystem mounting API, though some of the precursor patches did go in toward the end of the merge window. Unless something surprising happens, the feature set for this cycle is complete and the 5.0 kernel is now in the stabilization phase, with a final release expected in late February.
A setback for fs-verity
The fs-verity mechanism, created to protect files on Android devices from hostile modification by attackers, seemed to be on track for inclusion into the mainline kernel during the current merge window when the patch set was posted at the beginning of November. Indeed, it wasn't until mid-December that some other developers started to raise objections. The resulting conversation has revealed a deep difference of opinion regarding what makes a good filesystem-related API and may have implications for how similar features are implemented in the future.

The core idea behind fs-verity is the use of a Merkle tree to record a hash value associated with every block in a file. Whenever data from a protected file is read, the kernel first verifies the relevant block(s) against the hashes, and only allows the operation to proceed if there is a match. An attacker may find a way to change a critical file, but there is no way to change the Merkle tree after its creation, so any changes made would be immediately detected. In this way, it is hoped, Android systems can be protected against certain kinds of persistent malware attacks.
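The verification step can be illustrated with a short conceptual sketch. This is not the kernel's implementation: hash_block() is a placeholder for whatever hash algorithm the tree uses, and the caller is assumed to supply the tree block containing the relevant hash entry at each level:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define HASH_SIZE  32	/* e.g. a SHA-256 digest */

/* Placeholder: compute the digest of one BLOCK_SIZE block. */
void hash_block(const unsigned char *block, unsigned char *digest);

/*
 * Verify one data block against a Merkle tree.  path[l] is the tree
 * block at level l that holds the hash entry on this block's path;
 * root_hash is the trusted, pinned root.
 */
bool verify_block(const unsigned char *data_block, size_t index,
		  const unsigned char *const *path, size_t nlevels,
		  const unsigned char *root_hash)
{
	const size_t per_block = BLOCK_SIZE / HASH_SIZE;
	unsigned char digest[HASH_SIZE];

	hash_block(data_block, digest);
	for (size_t l = 0; l < nlevels; l++) {
		/* The digest computed so far must match the entry
		 * recorded in the parent tree block. */
		if (memcmp(path[l] + (index % per_block) * HASH_SIZE,
			   digest, HASH_SIZE) != 0)
			return false;
		/* Hash the parent block and move up one level. */
		hash_block(path[l], digest);
		index /= per_block;
	}
	/* The topmost tree block must hash to the trusted root. */
	return memcmp(digest, root_hash, HASH_SIZE) == 0;
}
```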
There is no opposition to the idea of adding functionality to the kernel to detect hostile modifications to files. It turns out, though, that there is indeed some opposition to how this functionality has been implemented in the current patch set. See the above-linked article and this documentation patch for details of how fs-verity is meant to work. In short, user space is responsible for the creation of the Merkle tree, which must be surrounded by header structures and carefully placed at the beginning of a block after the end of the file data. An ioctl() call tells the kernel that fs-verity is to be invoked on the file; after that, the location of the end of the file (from a user-space point of view) is changed to hide the Merkle tree from user space, and the file itself becomes read-only.
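In user space, the proposed flow would look something like the following sketch; the FS_IOC_ENABLE_VERITY name comes from the patch set, while the header path and the preparatory tooling are assumptions, since none of this has been merged:

```c
/* Sketch only: a companion user-space tool is assumed to have already
 * appended the Merkle tree and descriptor after the file data; the
 * ioctl then asks the kernel to start enforcing it. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>	/* from the patch set; not yet in mainline */

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return 1;
	}
	int fd = open(argv[1], O_RDONLY);
	if (fd < 0 || ioctl(fd, FS_IOC_ENABLE_VERITY, NULL) != 0) {
		perror("FS_IOC_ENABLE_VERITY");
		return 1;
	}
	/* From here on, the file is read-only and every read is checked
	 * against the (now-hidden) Merkle tree. */
	return 0;
}
```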
Christoph Hellwig was the first to oppose the work, less than two weeks before the opening of the merge window. The storage of the Merkle tree inline was, he said, "simply not acceptable" and the interface should not require a specific way of storing this data. He later suggested that the hash data should be passed separately to the ioctl() call, rather than being placed after the file data. Darrick Wong suggested a similar interface, noting that it would give the filesystem a lot of flexibility in terms of how the hash data would be stored.
Dave Chinner complained that storing the Merkle tree after the end of the file was incompatible with how some filesystems (XFS in particular) use that space. He described the approach as being "gross", arguing that it "bleeds implementation details all over the API" and creates problems far beyond the filesystems that actually implement fs-verity.
Chinner, too, argued that the Merkle-tree data should be provided separately to the kernel, rather than being stored in the file itself using a specific format. Filesystem implementations could still put the data after the end of the existing data, but that is a detail that should, according to Chinner, be hidden from user space.
Eric Biggers, the developer of fs-verity, responded that, while the API requires user space to place the Merkle tree after the end of user data, there is no actual need for filesystems to keep it there.
He also said that passing the Merkle tree in as a memory buffer is problematic, since it could be too large to fit into memory on a small system. (The size of this data also prevents it from being stored as an extended attribute as some have suggested.) Generating the hash data in the kernel was also considered, Biggers said, but it was concluded that this task was better handled in user space.
Ted Ts'o claimed repeatedly that there would be no value to be had by changing the API for creating protected files; he described the complaints as "really more of a philosophical objection than anything else". The requested API, he said, could be added later (in addition to the proposed API, which would have to be maintained indefinitely) if it turned out to be necessary.
After the discussion continued for a while, he escalated the matter to Linus Torvalds, asking for a decision.
What came back might well have failed to please anybody in the discussion, though. It turns out that Torvalds has no real objection to the model of storing the hash data at the end of the file itself:
So that part I like. I think the people who argue for "let's have a separate interface that writes the merkle tree data" are completely wrong.
From there, though, he made it clear that he was not happy with the current implementation. This model, he said, should be independent of any specific filesystem, so it should be entirely implemented in the virtual filesystem layer. At that point, filesystems like XFS would never even see the fs-verity layer, so its implementation could not be a problem for them. A generic implementation would require no filesystem-specific code and would just work universally. He also disliked the trick that hides the Merkle tree after the fs-verity mode has been set; the validation data for the file should just be a part of the file itself, he said.
As Ts'o pointed out, keeping the hash data visible in the file would create confusion for higher-level software that has its own ideas about the format of any given file. He also provided some reasons for why he thinks filesystems need to be aware of fs-verity; they include ensuring that the right thing happens if a filesystem containing protected files is mounted by an older version of the filesystem code. Making fs-verity fully generic would, he said, have forced low-level API changes that would have affected "dozens of filesystems", a cost that he doesn't think is justified by the benefits.
The last message from Ts'o was sent on December 22; Torvalds has not responded to it. There has not, however, been a pull request for fs-verity sent, and it is getting late in the merge window for such a thing to show up. [Correction: a pull request was sent, copied only to the linux-fscrypt mailing list; it has not received a response as of this writing.] It seems likely that fs-verity is going to have to skip this development cycle while the patches are reworked to address some of the objections that have been raised — those from Torvalds, at least. Even then, the work might be controversial; it is rare for the kernel to interpret the contents of files, rather than just serving as a container for them, and some developers are likely to dislike an implementation that depends on that sort of interpretation. But if Torvalds remains in favor of such an approach, it is likely to find its way into the kernel eventually.
Pressure stall monitors
One of the useful features added during the 4.20 development cycle was the availability of pressure-stall information, which provides visibility into how resource-constrained the system is. Interest in using this information has spread beyond the data-center environment where it was first implemented, but it turns out that there are some shortcomings in the current interface that affect other use cases. Suren Baghdasaryan has posted a patch set aimed at making pressure-stall information more useful for the Android use case — and, most likely, for many other use cases as well.

As a reminder, the idea behind the pressure-stall mechanism is to track the amount of time that processes are unable to execute because they are waiting for resources (for CPU time, memory, and I/O bandwidth in particular). For example, reading /proc/pressure/memory will yield output like:
some avg10=70.24 avg60=68.52 avg300=69.91 total=3559632828
full avg10=57.59 avg60=58.06 avg300=60.38 total=3300487258
This output says that at least one process has been blocked waiting for memory 70.24% of the time over the last ten seconds, or 68.52% of the time over the last minute. In the last ten seconds, all processes have been stalled 57.59% of the time, indicating a system that is seriously short of memory. An orchestration system monitoring this system would see that over half the CPU time is going to waste because the demands on memory are too high; corrective action is probably indicated.
The Android runtime system also tries to manage the set of running processes to make the best use of the hardware while providing acceptable response times to the user. When memory gets tight, for example, background processes may be killed to ensure that the application the user is engaging with at the moment has the resources it needs to run quickly. The pressure-stall information has some obvious utility when it comes to this kind of automated resource management: it provides exactly the kind of information needed to determine whether the system's response time is being affected by a shortage of memory.
The problem, from the Android point of view, is that the information provided is too little and too late. The highest-resolution information available is aggregated over ten seconds; that is entirely adequate for most data-center settings, but it's far too slow for a device that is interacting directly with users. If it takes ten seconds to learn that the device is getting sluggish, the user is likely to be getting grumpy by the time any corrective action is taken. Such users might well conclude that they are better off not staring into their phone all day, and that would clearly be bad for the industry as a whole.
The answer to this problem is to extend the pressure-stall mechanism to allow for high-frequency monitoring of stall data. With the patch set applied, an interested application can open /proc/pressure/memory for write access, then write a line containing three pieces of information:
type stall-trigger time-window
The type value is either some (indicating that information about any stalled process is wanted) or full (limiting the information to full-system stalls where no process can run). stall-trigger indicates (in microseconds) the stall time that will trigger an event, and time-window is the time period over which that stall time happens. So, for example, writing:
full 100000 1000000
will cause the monitor to trigger when the system stalls for a minimum of 100ms over any 1s period. The minimum time-window is 500ms, while the maximum is 10s. The stall-trigger can also be expressed as a percentage value; "10%" asks for a stall time that is 10% of the given time window.
Having requested a stall notification, the application can then pass the file descriptor to poll(). An exceptional condition (POLLPRI) event will be returned whenever a notification is generated. A monitoring system can thus be notified within a half-second of the system starting to become unresponsive and act to address the situation. There can be multiple processes monitoring the same stall information with different triggers and time windows. As is the case with the current pressure stall information, the new mechanism is aware of control groups; opening the relevant files within a memory control-group hierarchy will provide information on the members of that group only.
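Put together, a monitor built on the proposed interface might look like this sketch (using the trigger values from the example above; error handling is kept minimal):

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Trigger on 100ms of full stall within any 1s window. */
	const char trigger[] = "full 100000 1000000";
	struct pollfd pfd;
	int fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);

	if (fd < 0 || write(fd, trigger, strlen(trigger) + 1) < 0) {
		perror("/proc/pressure/memory");
		return 1;
	}

	pfd.fd = fd;
	pfd.events = POLLPRI;
	for (;;) {
		if (poll(&pfd, 1, -1) < 0) {
			perror("poll");
			break;
		}
		if (pfd.revents & POLLERR)
			break;	/* the monitor has gone away */
		if (pfd.revents & POLLPRI)
			puts("memory pressure threshold exceeded");
	}
	close(fd);
	return 0;
}
```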
The actual tracking of stall times has been kept simple to avoid adding to the load on the system. For each monitor, the accumulated stall time is checked ten times for each time window. If the current window is 50% past, the calculated stall value will be the time accumulated so far in this window, plus 50% of the total from the previous window. This mechanism assumes that the situation will not change hugely from one window to the next; the benefit is that it only has to store a single past value for each monitor. The monitoring is turned off entirely if no stall events are occurring, so its overhead should be zero on a lightly loaded system.
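That estimate amounts to the simple weighted sum shown in this illustrative (non-kernel) helper:

```c
/* Illustrative only: estimate total stall time over a sliding window by
 * combining the stall accumulated so far in the current window with a
 * proportional share of the previous window's total.  With the window
 * half elapsed, that is the current stall plus 50% of the old total. */
static unsigned long long
window_estimate(unsigned long long prev_stall,	/* previous window's total */
		unsigned long long curr_stall,	/* accumulated so far */
		unsigned long long elapsed,	/* time into current window */
		unsigned long long window)	/* window length */
{
	return curr_stall + prev_stall * (window - elapsed) / window;
}
```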
The end result, Baghdasaryan says, is good.
The functionality provided by this patch set seems clearly worthwhile, but the code itself is going to need a bit of work yet. The biggest complaint came from Peter Zijlstra, who doesn't like the elimination of the "idle mode" that stops data collection entirely when little is going on. Keeping the collection running will prevent the system from going into its deepest idle states, which will not be good for power consumption. Some sort of solution to that problem will need to be found before this code can go upstream.
There were also some comments on the string-parsing code added by the patch set; it may be simplified by eliminating the percentage option described above. Beyond that, it seems clear that this is a welcome addition to the system's load-monitoring functionality. Chances are it will find its way upstream before too long. How long it will be stalled before finding its way into production handsets is rather less clear, of course.
Page editor: Jonathan Corbet