Leading items
Welcome to the LWN.net Weekly Edition for November 15, 2018
This edition contains the following feature content:
- C library system-call wrappers, or the lack thereof: a collaboration problem between the kernel and the GNU C Library.
- Debian, Rust, and librsvg: using Rust in librsvg makes life hard for some Debian ports.
- ktask: optimizing CPU-intensive kernel work: a proposed kernel subsystem for parallel processing.
- Device-tree schemas: a new way to define and validate device-tree bindings.
- A report from the Automated Testing Summit: progress in automatic kernel testing.
- iwd: simplifying WiFi management: an easier alternative to wpa_supplicant.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Note that, in observance of the (US) Thanksgiving holiday, there will be no LWN.net Weekly Edition next week. We will return, well fed, with the November 29 edition. We plan to continue to post articles during the Thanksgiving week, so stay tuned.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
C library system-call wrappers, or the lack thereof
User-space developers may be accustomed to thinking of system calls as direct calls into the kernel. Indeed, the first edition of The C Programming Language described read() and write() as "a direct entry into the operating system". In truth, user-level "system calls" are just functions in the C library like any other. But what happens when the developers of the C library refuse to provide access to system calls they don't like? The result is an ongoing conflict that has recently flared up again; it shows some of the difficulties that can arise when the system as a whole has no ultimate designer and the developers are not talking to each other.
Calling into the kernel is not like calling a normal function; a special trap into the kernel must be triggered with the system-call arguments placed as the kernel expects. At a minimum, the system-call "wrapper" provided by the C library must set up this trap. In many cases, more work than that is required; the functionality provided by the kernel does not always exactly match what the application (or the relevant standards) will expect. Features like POSIX threads further complicate the situation. The end result is that a lot of work can be happening between the application and the kernel when a system call is made. Doing that work is, in most cases, delegated to the C library.
System calls in glibc
Many Linux systems use the GNU C Library (glibc) in this role; glibc is often thought of as the Linux C library. When the kernel developers add a new system call, it is thus natural to expect that a corresponding wrapper will show up in glibc, but there is no guarantee that this will ever happen. The addition of wrappers to glibc is often slow and, in some cases, the glibc developers have refused to add the wrappers at all. In such cases, user-space developers must fall back on syscall() to access that functionality, an approach that is both non-portable and error-prone.
Recently, frustration with this situation led Daniel Colascione to ask:
It is worth noting, as Michael Kerrisk did, that it's not really true that glibc is no longer adding wrappers; quite a few have found their way into recent releases. But there are some notable exceptions, the most glaring of which is probably gettid(), which has been under discussion for over a decade with no real resolution in sight. Kerrisk suggested that, in most cases, the problem was simply a lack of developers on the glibc side and said that kernel developers should take more responsibility for the creation of glibc wrappers for new system calls:
Glibc developer Florian Weimer stated clearly that it's not just a matter of developer time, though: "It's not a matter of resources or lack thereof". In another message he explained why many system calls lack glibc wrappers, with a number of specific examples: "A lot of the new system calls lack clear specifications or are just somewhat misdesigned". In other cases, new system calls — such as renameat2() — use names that glibc had already used for other functions. Reasons vary, but the end result is the same for a number of system calls: no glibc wrappers to go along with them.
According to Colascione (and others like Willy Tarreau), the proper answer is to provide low-level system-call wrappers with the kernel itself:
In this view, glibc would retain all of the higher-level C-library functions, ceding only the system-call wrappers to this new library. But, according to Weimer, it's not so simple: circumventing glibc for system calls would break features, many associated with threading. Or, as Zack Weinberg put it: "The trouble is that 'raw system call wrappers and arcane kernel-userland glue' turns out to be a lot more code, with a lot more tentacles in both directions, than you might think". It's not just a matter of breaking things out into a separate library.
Overcoming the impasse
Arguably, what this whole discussion is really showing is that there need to be better lines of communication between kernel and C-library developers. It takes developers from both groups to actually make a feature available to user space after all; it would make sense for kernel developers — who are not always known for the best API designs — to talk more with the library developers who must actually support an API for application developers. Those communications are better now than they were some years ago, but one could argue that this is a low bar that has not been surmounted by much.
One complication there is that glibc is not the only C library that runs over the Linux kernel; it's not even the most popular one if one looks at the number of installed copies — that title surely belongs to the bionic library used by Android. The Linux community would be well served by a forum where developers from all C libraries could interact with kernel developers to address API problems before they are set into stone. The linux-api mailing list ostensibly serves that purpose now, but it is underused even before considering the absence of C-library developers there.
Once upon a time, all operating systems had an overall architect who would be responsible for ensuring coordination between the various layers of the system, but Linux lacks such a person. So developers have to find ways to coordinate on their own. Arguably, one place where this should be happening is the Linux Plumbers Conference, which starts November 13 in Vancouver. There is indeed a relevant session on the agenda, but it's not clear how many of the necessary developers will be there.
Free-software projects tend to value their independence; their developers have little time for others who would tell them what to do. But few projects truly stand alone. Whenever developers decide to cooperate more fully with related projects, the result tends to be better software for the community as a whole. The design and delivery of system calls would appear to be one of those places where a higher level of communication and cooperation would be a healthy thing. That, rather than trying to absorb low-level wrappers into the kernel project, seems like the proper long-term solution to this problem.
Debian, Rust, and librsvg
Debian supports many architectures and, even for those it does not officially support, there are Debian ports that try to fill in the gap. For most user applications, it is mostly a matter of getting GCC up and running for the architecture in question, then building all of the different packages that Debian provides. But for packages that need to be built with LLVM—applications or libraries that use Rust, for example—that simple recipe becomes more complicated. How much the lack of Rust support for an unofficial architecture should hold back the rest of the distribution was the subject of a somewhat acrimonious discussion recently.
The issue came up on the debian-devel mailing list when John Paul Adrian Glaubitz complained about the upload of a new version of librsvg to unstable. Librsvg is used to render Scalable Vector Graphics (SVG) images; the project has recently been switching some of its code from C to Rust, presumably for the memory safety offered by Rust. Glaubitz said that the new "Rust-ified" library had been uploaded with no warning when the package maintainer "knows very well that this particular package has a huge number of reverse dependencies and would cause a lot of problems with non-Rust targets now". The reverse dependencies are the packages that rely on librsvg in this case.
Glaubitz noted that he has put a lot of effort into getting Rust working for various unofficial architectures. In fact, on November 2, he had announced that four new architectures now had Rust support, which meant that all of those that are officially released by Debian are covered. That brought the number of Debian architectures with Rust support to 14. His goal, it would seem, is to get Rust running on all of the ports architectures eventually.

So he was unhappy that the new upload has made his work on Debian ports "harder with all the manual work required for maintaining librsvg on the non-Rust targets now". He concluded with a final shot: "Is that seriously the way we want to work together?"
Though never mentioned by name, apparently Jeremy Bicha was the target of Glaubitz's ire. Bicha responded by noting that there was some coordination on the upload, as documented in a Debian bug report. That coordination was aimed at the supported architectures, however, and was "with the Release Team and not with whoever maintains ports". Part of the problem, evidently, is that Bicha is not clear on how one might even coordinate with the ports maintainers; ports is not exactly in the Debian mainstream, he said.

Adam Borowski's call for a revert of the upload was not well-received. Ben Hutchings said that Debian bases its releases and their contents on the architectures that will be included: "We do not allow Debian ports to hold back changes in unstable." Bicha pointed out that the reversion would have a rather unreasonable outcome: "It sounds to me like you're saying that to fix librsvg being out of date on 11 arches, we need to make it out of date on every architecture."
Bicha continued that he did not mean to upset Glaubitz with the upload: "It honestly didn't occur to me that I ought to talk to ports maintainers before uploading packages that won't build on ports." Beyond that, the new version of librsvg spent months in the experimental repository without complaint. He also noted that the announcement did not come with a warning.
The announcement of Rust support on more architectures and, importantly, all of the release architectures is what allowed Bicha to upload the new version of librsvg, however. So Glaubitz feels like his work to make that happen has been used against him. He rejected Bicha's suggestion:
A suggestion made by Samuel Thibault may provide a temporary way to move forward. He suggested that the older version of librsvg be given a different source package name (others suggested "librsvg-c") but build the same binaries as librsvg for architectures that lack Rust support. That will work, but will leave those architectures with an outdated (and unsupported) version of librsvg. Bicha asked for a volunteer to step up to maintain librsvg-c, though Manuel A. Fernandez Montecelo gave a long explanation of why he thought the Debian GNOME team (of which Bicha is a member) should maintain librsvg-c alongside librsvg.
But the aggressive tone of Glaubitz's messages (including this followup) was not particularly helpful. He seems to have a rather one-sided view of respect and communication between Debian participants but also feels that his work in getting Rust on more architectures was not really appreciated and acknowledged by the rest of the project. Various project members tried to rectify that in a sub-thread, while also noting that his messages were unnecessarily harsh and off-putting.
Josh Triplett said that he didn't really see more coordination leading to a different technical outcome; other libraries and applications will be using Rust over time, so architectures that don't support it are going to get left further behind. Librsvg is simply the first to go down this path for Debian (though recent versions of both Firefox and Thunderbird also use Rust):
He asked what kind of help was needed in order to get LLVM and Rust support onto the remaining architectures. Glaubitz replied with a sizable number of complaints about Rust and its upstream; he is also skeptical of the security claims that are made for the language. He did point to three reviews needed for LLVM and to a bug in Rust that needed to be fixed, but he couldn't resist complaining about the stability of the Rust language as well as its six-week release cycle. According to Triplett, though, most of the Rust stability problems that are being complained about boil down to running Rust on unsupported and, critically, untested architectures. He pointed to the Rust platform-support page and suggested that platforms not in tier 2 are not going to be well supported.
In yet another lengthy reply, Glaubitz continued to complain about Rust. In fact, he said: "[...] I think it's a bad idea to use Rust code in a core component like librsvg". As Triplett pointed out, though, Debian cannot control what languages are used by upstream projects. Beyond that, he is not seeing any real actionable call for assistance by Glaubitz or others working on Debian ports.
In the end, it is not entirely clear what Glaubitz wants—at least what he wants that is within the realm of possibility. Rust may irritate him, but he can't stand in the way of projects that want to use it, even if it means that there are some less popular architectures that cannot support them. His tone seems to discourage exactly what he might like to see happen (Rust on the rest of the ports architectures). It is understandable that he is unhappy that his work on getting Rust working for more architectures enabled librsvg to move forward for most of Debian—and all of the official parts of the distribution. The ports that do not yet support Rust are going to need to make that happen or be left behind, it would seem.
ktask: optimizing CPU-intensive kernel work
As a general rule, the kernel is supposed to use the least amount of CPU time possible; any time taken by the kernel is not available for the applications the user actually wants to run. As a result, not a lot of thought has gone into optimizing the execution of kernel-side work requiring large amounts of CPU. But the kernel does occasionally have to take on CPU-intensive tasks, such as the initialization of the large amounts of memory found on current systems. The ktask subsystem posted by Daniel Jordan is an attempt to improve how the kernel handles such jobs.

If one is going to try to optimize CPU-intensive work in the kernel, there are a number of constraints that must be met. Obviously, that work should be done as quickly and efficiently as possible; that means parallelizing it across the multiple CPUs found in most current systems. But this work needs to not interfere with the rest of the system, and it should not thwart efforts to reduce power consumption. The current patch set tries to meet those goals, though some parts of the problem have been deferred until later.
Basic usage
To use the ktask subsystem, kernel code must provide two fundamental pieces: a structure describing the work to be done and a "thread function" that can be called to do one sub-portion of the total job. The control structure looks like this:
struct ktask_ctl {
    ktask_thread_func  kc_thread_func;
    ktask_undo_func    kc_undo_func;
    void              *kc_func_arg;
    size_t             kc_min_chunk_size;

    /* Optional, can set with ktask_ctl_set_*.  Defaults on the right. */
    ktask_iter_func    kc_iter_func;    /* ktask_iter_range */
    size_t             kc_max_threads;  /* 0 (uses internal limit) */
};
The first member (kc_thread_func()) is the function to do a part of the work, while kc_func_arg is private data to be passed to that function. kc_min_chunk_size defines the smallest piece of the job that can be passed to a call to kc_thread_func(); if the job were to clear a large number of pages of memory, for example, the minimum size might be set to the size of a single page. The other fields will be described below.
In the usual kernel style, this structure can be initialized in either of two ways (using macros):
DEFINE_KTASK_CTL(name, thread_func, func_arg, min_size);
struct ktask_ctl ctl = KTASK_CTL_INITIALIZER(thread_func, func_arg, min_size);
With that in place, the job can be run with:
int ktask_run(void *start, size_t task_size, struct ktask_ctl *ctl);
Here, start describes the starting point of the job to be done in a way that the thread function understands. It is mostly opaque to ktask itself, though, for the purpose of splitting the job into pieces, ktask treats start as a char * pointer by default. The size of the task (in whatever units make sense to the thread function) is given by task_size, and ctl is the control structure.
This call will break down the given task into units of at least the specified minimum size and pass pieces of it to the thread function, which has this prototype:
typedef int (*ktask_thread_func)(void *start, void *end, void *arg);
The portion of the job to be done is described by start and end, while arg is the kc_func_arg value from the control structure. It should return KTASK_RETURN_SUCCESS (which happens to be zero) if all went well, or an error code otherwise.
The call to ktask_run() will not return until either the entire job is done or a call to the thread function returns an error. Multiple calls may be run in parallel on different CPUs though, by default, ktask_run() will limit itself to CPUs on the current NUMA node. The final return value is, again, either KTASK_RETURN_SUCCESS or an error code.
While running on the local NUMA node is a good default, it will often make sense to spread the work out across multiple nodes. To return to the memory-initialization example, the optimal arrangement would be to have each node initialize the memory that is local to it. If a ktask user needs explicit control over the node that a specific piece of the job should be run on, it starts by creating an array of one or more ktask_node structures:
struct ktask_node {
    void   *kn_start;
    size_t  kn_task_size;
    int     kn_nid;
};
The kn_start and kn_task_size members describe the job in the same way as the start and task_size arguments to ktask_run(). The node to run the job on is stored in kn_nid; that value can also be NUMA_NO_NODE to allow the job to run on any node in the system. The job is then run with:
int ktask_run_numa(struct ktask_node *nodes, size_t nr_nodes,
                   struct ktask_ctl *ctl);
This call will act like ktask_run(), except that it will split it across NUMA nodes as directed by the structures in the nodes array.
Advanced details
In the default mode, ktask_run() and ktask_run_numa() will simply stop if the thread function returns an error. But in some cases there can be cleanup to do if things fail partway through; that has to be managed by ktask, since it holds the knowledge of what part of the job had been completed before the error happened. If the caller provides an "undo function" (as kc_undo_func in the control structure), that function will be called on the chunks of the job that had been successfully executed before the error happened. The undo function is not allowed to fail.
By default, the start and end values used to define a portion of the job are treated as char pointers, and the calculation of chunk sizes is done with simple pointer arithmetic. Callers that need a different interpretation can have it, though, by setting a new "iter function" in the control structure. The default function is:
void *ktask_iter_range(void *position, size_t size)
{
    return (char *)position + size;
}
Users can replace it by defining a new function and storing it into the control structure with:
void ktask_ctl_set_iter_func(struct ktask_ctl *ctl, ktask_iter_func iter_func);
The normal usage of this function would be to use a different pointer type for the position and scale the size accordingly.
By default, ktask will run parallel calls to the thread function in a maximum of four threads. That value can be changed with a call to:
void ktask_ctl_set_max_threads(struct ktask_ctl *ctl, size_t max_threads);
The number of threads actually used may fall short of the given max_threads depending on the nature of the job and the system it's running on.
Performance
Ktask can clearly get a job done more quickly if it is able to spread that job out across multiple idle CPUs in the system. If that work then prevents those CPUs from doing anything else, though, the end result may not look like a net win to the user. To avoid interference with real work, ktask runs its worker threads at the lowest priority available (though still above SCHED_BATCH). That, naturally, leads to another problem: what if the system is overloaded and the thread functions never get to run? To avoid that problem, ktask will raise one thread at a time to its priority to allow things to continue at the single-threaded pace, at least.
The documentation claims that ktask will disable itself if the system is running in a power-saving mode. That same documentation also says: "TODO: Implement this". While it is agreed that ktask should not drive up power consumption on systems where power is at a premium, there is not yet agreement on how that policy should be implemented. Control-group awareness is another detail that has not yet been worked out.
Perhaps the simplest example use of ktask can be found in this patch, which converts clear_gigantic_page() (tasked with zeroing a 1GB huge page) to use ktask. According to the changelog, if ktask is allowed to use eight threads for this job, it will speed it up by a factor of just over eight — a benefit of being able to use more memory bandwidth overall by spreading the job across the system. Other tasks converted in the patch set (almost all associated with memory initialization in one way or another) show similar improvements.
This patch set has been simmering on the mailing lists for some time; the first version was posted in July 2017. The current revision is the fourth, and it appears to be getting closer to being ready to go upstream. There are still some outstanding issues, though, including the loose ends described above, so it seems likely that at least one more posting will be required. There is also the question of how ktask relates to padata; Jordan had not heard of it before being asked, but thinks that its requirements are significantly different from those of ktask. All told, ktask may be fast, but its path into the kernel is a bit less so.
Device-tree schemas
Device trees have become ubiquitous in recent years as a way of describing the hardware layout of non-discoverable systems, such as many ARM-based devices. The device-tree bindings define how a particular piece of hardware is described in a device tree. Drivers then implement those bindings. The device-tree documentation shows how to use the bindings to describe systems: which properties are available and which values they may have. In theory, the bindings, drivers and documentation should be consistent with each other. In practice, they are often not consistent and, even when they are, using those bindings correctly in actual device trees is not a trivial task. As a result, developers have been considering formal validation for device-tree files for years. Recently, Rob Herring proposed a move to a more structured documentation format for device-tree bindings using JSON Schema to allow automated validation.
Device-tree documentation today is free-form text with some defined structure and optional examples (like the generic GPIO clock multiplexer documentation in gpio-mux-clock.txt). For new bindings, the review process is entirely manual and depends on the reviewers to find typos and errors that an automated system might be expected to catch. No tool exists to check whether any given device-tree file conforms to the binding documentation. In addition, the bindings documentation files sometimes either duplicate information that is also contained elsewhere or are missing information that is necessary to validate a device-tree file.
Numerous proposals have been made in the past to address the validation of device trees. One went as far as using YAML as a source format for device-tree files. Herring does not go that far; instead he proposes to convert only the documentation files, using JSON Schema for the schema vocabulary, while leaving the device-tree format itself unchanged. He explained the choice in the submission cover letter: "the language has a defined specification, maps well to DT data, and there are numerous existing tools which can be leveraged". He prefers to use a YAML subset because it is generally considered more human-readable and allows certain additions, including comments. This solution also takes advantage of existing technology and libraries and avoids inventing a new language. The goal was to allow validating device-tree files at build time and verifying the correctness of the documentation. In addition, error and warning messages can be made more meaningful.
Documentation format and validation process
The device-tree documentation in the new format becomes a structured file, a schema. It is written in YAML using a JSON-compatible format. A schema should include the information necessary to validate a device-tree file. The schema file has a number of sections (mandatory and optional), including:
- $id — gives a unique schema identifier
- $schema — gives the identifier of the meta-schema this schema follows
- title — provides the schema title
- description — includes a multi-line description of the binding
- maintainers — a list of email addresses of all maintainers
- properties — the sections with the dictionary of property descriptions (this is a big part, of course)
- required — lists the mandatory properties
- examples — optional examples using the DTS language
The patch-set documentation includes an annotated example. A simple file could look like:
$id: "http://devicetree.org/schemas/bindings/vendor/someexample.yaml#"
$schema: "http://devicetree.org/meta-schemas/core.yaml#"

title: Documentation example

maintainers:
  - Our Maintainer <example@example.com>

description: |
  Multi-line description is to be added here.

properties:
  # Here we define the compatible property with one possible string
  compatible:
    items:
      - const: vendor,my-clk

  reg:
    maxItems: 1

required:
  - compatible
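A device-tree node conforming to this example schema might look like the following sketch (the node name, unit address, and register range are invented for illustration):

```
my-clk@40000000 {
    compatible = "vendor,my-clk";
    reg = <0x40000000 0x100>;
};
```

The validation tools would check that the compatible string matches the schema's const value and that exactly one reg entry is present.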
The schema files must follow a number of rules and can be validated using so-called meta-schemas, which are also written in YAML. The meta-schemas are provided with the tools; documentation writers are not expected to modify them.
Current status and further plans
The submitted series adds build support, documentation, and converts some bindings in the ARM tree. An additional tool, doc2yaml, exists in Herring's tree, but has not been submitted; the script can be used for a preliminary conversion of a device-tree binding file in the current text format to the YAML one.
The meta-schemas, schemas, and validation tools are hosted in a separate repository for now. The repository includes a number of scripts (implemented in Python 3 with ruamel.yaml and JSON Schema). There are three that are expected to be used the most. dt-doc-validate can help validate a schema. dt-mk-schema creates a single schema file from the provided schema files and generic schemas from the repository; using such a processed schema is expected to speed up device-tree validation. Finally, dt-validate takes the YAML device trees and validates them against the schema.
With the patch set, the kernel build system gains two new targets: dt_binding_check to check the device-tree binding schema documents, and dtbs_check, which checks device-tree files against the binding schema. They are using the new functionality of the device-tree compiler (dtc) present in 4.20: YAML output. The exact format is intended for validation purposes only and may change in the future.
During the conversion effort, Herring refactored some of the documentation. In the process he moved the bindings to have one binding per file and moved miscellaneous bindings (that are used for multiple SoCs) to separate files. The plan is to merge the core changes directly and then use specific trees for conversions of the bindings documentation. Future plans include conversion of more documentation files, validation of the examples against the schema, allowing validation of selected targets, and more control over which schemas are used for validation (so that it takes less time).
The patch set has received some comments, some with direct approval, others asking for clarifications. Many questions concerned the details of the schema file syntax. The overall push to more structured documentation of device-tree bindings seems to be uncontested. It looks likely that the tools to support the new format will be merged soon. The conversion itself might require more work, however. A simple attempt to convert random documentation files from the mainline kernel shows that many properties will require manual description. Setting this new format as a rule for new submissions would certainly help to make the format widespread in the months to come.
More information is available in the slides from a couple of recent Linaro Connect talks on this subject; they give examples of the device-tree schema and their usage: presentation [PDF] and BoF slides [PDF].
A report from the Automated Testing Summit
In the first session of the Testing & Fuzzing microconference at the 2018 Linux Plumbers Conference (LPC), Kevin Hilman gave a report on the recently held Automated Testing Summit (ATS). Since the summit was an invitation-only gathering of 35 people, there were many at LPC who were not at ATS but had a keen interest in what was discussed. The summit came out of a realization that there is a lot of kernel testing going on in various places, but not a lot of collaboration between those efforts, Hilman said.
The genesis of ATS was a discussion in a birds-of-a-feather (BoF) gathering at the 2017 Embedded Linux Conference Europe (ELCE) on various embedded board farms that were being used for testing. A wiki page and mailing list were created shortly after that BoF. The summit, which was organized by Tim Bird and Hilman, was meant to further the work going on in those forums and was set for October 25 in Edinburgh, Scotland during ELCE 2018. There were 22 separate testing projects represented at ATS, he said.
The main idea was to try to find common ground between all of the different projects and to figure out what was needed to collaborate more. A lot of effort is being put into all of these projects, but there is no place to share resources and development ideas. There are lots of different test suites and new ones are being developed, but there is no real coordination on what tests to run and how they should be run. Beyond that, there is no common definition of test plans or strategies and how to report results. In a way, running tests is easy, Hilman said, doing something with the results is the hard part.
He wanted to summarize what was discussed at the summit, but also pointed to some detailed minutes that were kept. One of the questions was why there hasn't been more collaboration along the way. Some companies think of their test suites and infrastructure as a kind of "secret sauce" but, as is often found in the open-source world, it actually turns out that most are doing much the same things. But some tests are written specifically for a particular lab setup, device, or test framework, so are not all that useful to share. Some companies also fear sharing their tests because that might show how little testing they are actually doing, he said, especially if they are saying they do a lot more testing than, in reality, they are.
Some time at ATS was spent bikeshedding over terminology, but there is a need to collect terms and definitions into a glossary. The discussion "took a fair amount of time", he said, but was useful because a lot of time was being wasted when people used unfamiliar terms or defined the same terms differently. Most of the participants come from the embedded world, so there are terms and concepts that may not have been familiar to those from the server and desktop spaces. For example, "DUT", which means "device under test", was confusing some attendees before it was clarified and added to the glossary.
There are multiple areas where collaboration is not happening, but the ATS participants needed to figure out which areas to start with. They decided to try to work out some common test definitions. That would include where to get the source for test suites, how to build them, how to run them, and how to determine the pass/fail criteria. Attendees also spent a lot of time discussing output formats, he said. There is a wide variety in use, including XML, plain text, JSON, and a bare zero-or-one exit status. That diversity makes it hard to collect results from multiple test suites and coherently combine them for consumption by humans.
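The aggregation problem described above can be sketched briefly. The record layout and field names below are purely illustrative, not any project's actual schema; the point is only that each suite's native format (exit status, suite-specific JSON, TAP-style text) has to be mapped onto one common pass/fail record before results can be combined:

```python
import json

# A minimal common record: every result reduces to a test name and a verdict.

def from_exit_status(name, status):
    """Suites that report only an exit code: zero means pass."""
    return {"test": name, "result": "pass" if status == 0 else "fail"}

def from_json(blob):
    """Suites that emit their own JSON, with their own field names."""
    data = json.loads(blob)
    return {"test": data["case"], "result": data["outcome"]}

def from_tap_line(line):
    """Suites that emit TAP-style text: 'ok 1 - name' / 'not ok 2 - name'."""
    passed = not line.startswith("not ok")
    name = line.split(" - ", 1)[1].strip()
    return {"test": name, "result": "pass" if passed else "fail"}

# Three suites, three formats, one combined report.
results = [
    from_exit_status("boot-smoke", 0),
    from_json('{"case": "net-ping", "outcome": "fail"}'),
    from_tap_line("ok 3 - fs-mount"),
]
print(results)
```

Even this toy version shows why a survey of existing formats comes first: each new suite needs its own adapter until the projects agree on a common output definition.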
Pass/fail conditions are influenced by what is being tested and by whom. For example, the Linux Test Project has lots of tests, but many of those are never going to pass on some embedded targets. That means there are lists of specific tests to skip in different frameworks, but those often do not give any reason why a test appears on the list. It might be that there are hardware-specific reasons that the test cannot pass, but it might be that some developer was just lazy, didn't figure out why it didn't pass, and simply added it to the skip list. So, even the same test suite may be run differently on a particular test framework or hardware device.
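One way to address the undocumented-skip-list problem is simply to require a reason for every entry. The format below is a hypothetical sketch, not one used by any of the frameworks discussed at the summit; it shows how attaching a rationale to each skipped test distinguishes genuine hardware limitations from untriaged failures:

```python
# Hypothetical skip-list format: instead of a bare list of test names,
# each entry records why the test is skipped on this particular board.
SKIP_LIST = {
    "msync01": "board has no swap; test assumes swappable pages",
    "hugemmap05": "unknown failure, never triaged",  # the "lazy" case
}

def should_run(test_name):
    """Return True if the test should run; log the reason when skipping."""
    reason = SKIP_LIST.get(test_name)
    if reason is not None:
        print(f"skipping {test_name}: {reason}")
        return False
    return True

for test in ("msync01", "madvise01"):
    if should_run(test):
        print(f"running {test}")
```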
The test definition collaboration will start with a survey to determine what various testing efforts are already using and how. Results from that will help guide further work. He doubts that there will be a single test framework or test specification mechanism that will come out of this work, but it should help narrow things down a ways. A survey was already done to gather information about the testing infrastructure in use; that survey and its results are available on the wiki.
ATS attendees also spent some time trying to understand what others are doing, particularly in terms of lab and infrastructure configuration. The gathering had an embedded bias to some extent, because of its roots, so there was a lot of discussion about power-distribution units (PDUs), handling serial consoles, and so on. There was also some discussion of the test frameworks from a high level; they tried "not to get too much down into the weeds" of specific frameworks, Hilman said.
The action items from the meeting include cleaning up the glossary. They will also be creating a survey to start the collaborative effort on test definitions. It turns out that most test labs have their own "scripts and hacks" for working with PDUs, but there was no place to collect them for sharing. PDUDaemon is an existing project that handles PDUs, so another action item is to have people contribute their scripts and such to that project.
There is a plan to meet again at next year's ELCE, which is in October in Lyon, France. There was a question about having a gathering with a wider scope and Hilman said that had been discussed. Perhaps also having a gathering at Plumbers or another conference would make sense. Another question concerned how many testing efforts oriented toward servers and desktops were represented. Hilman said that it was heavily skewed toward embedded testing, but that around half a dozen of the projects were for server/desktop testing frameworks and efforts.
Another question concerned comparative testing, such as comparing performance release to release. Much of that data is being collected in one form or another, but in order to be useful, the underlying testing regime, configuration, hardware, and so on must be reliably captured to allow apples-to-apples comparisons. Similarly, lab-to-lab differences, including hardware such as storage and networking, also play a role, so comparisons between different labs' data may be difficult.
He concluded his session by suggesting that interested people look at the wiki and participate on the mailing list.
[I would like to thank LWN's travel sponsor, The Linux Foundation, for assistance in traveling to Vancouver for LPC.]
iwd: simplifying WiFi management
It has been nearly 13 years since Jeff Garzik proclaimed that Linux was "proving its superiority in the area of crappy wireless (WiFi) support". Happily, the situation has improved somewhat since then, but that doesn't mean that things can't get better yet. During the Embedded Linux Conference portion of the 2018 Open Source Summit Europe, Marcel Holtmann described the work being done to create iwd, a new system for configuring and managing WiFi connections. If this project has its way, future users will have little room for complaint about how WiFi works on Linux systems.
At the moment, Holtmann said, WiFi on Linux is far too complicated for users to deal with; it asks them for far too much information. Users have to contend with complicated configuration dialogs to provide details that, much of the time, the system should be able to figure out for itself. The situation is bad even on basic open networks, but it gets worse on corporate networks, where it is often not possible to create dialogs that can work right in all settings.
The problem comes down to the old wpa_supplicant daemon, which is a complex "Swiss army knife". It is "awesome work", he said, but it has two big problems. One is that the wpa_supplicant project doesn't make releases, with the result that nobody picks up the bug fixes that are made. And there are no usable APIs for controlling it, so users have to figure out everything themselves. It is a great tool, but few people have any idea of how to use it.
For some time, Holtmann and his collaborators thought that they could improve wpa_supplicant and turn it into a proper management daemon. But the upstream project does not want things to go that way; they have, he said, a lot more interest in producing a toolbox that can be used for tasks like testing new protocol specifications. This focus and the lack of proper releases mean that everybody ships their own version of wpa_supplicant, with unpleasant consequences. There is, for example, no Linux distribution support for the WPA3 protocol, even though wpa_supplicant has that support.
Developing iwd
Thus the decision was made to develop iwd instead. Its task is to manage all WiFi network connections used by a machine with a goal of only asking for information that it cannot figure out for itself — often only the network to connect to and the passphrase. It "remembers everything" so users do not need to repeat themselves. Iwd will be the only program that performs WiFi scanning on systems where it is running; that differs from systems using wpa_supplicant, where higher-level software must also scan for networks. With iwd, that work has all been pushed to a single level where good decisions can be made. Iwd is meant to support fast and reliable roaming; it can ask an access point for information about its neighbors and use that to maintain connectivity as the system moves.
An explicit decision has been made to not care about running on anything except Linux. The D-Bus API is used for communication, and the kernel's cryptographic interfaces are used for encryption functionality. Iwd, Holtmann claimed, has readable source code — another thing that wpa_supplicant lacks. There is a large battery of unit tests to go along with it. Version 0.1 was released in February, and 0.10 came out on October 20.
Iwd has been designed to deal with multiple clients from the outset. With wpa_supplicant, instead, only one client at a time can be handled, so systems have had to put a wrapper around it to make things work. On an iwd system, tools like ConnMan, iwctl, and user agents can all talk to the daemon directly, at the same time.
Support for iwd has been in ConnMan since version 1.34, and in NetworkManager since version 1.12. Integration with NetworkManager took some work, since it had to be convinced to not worry about many WiFi details and let iwd handle them. Iwd works well now for personal networks; support for enterprise networks is under development but not really ready yet. A number of distributions have packaged it, and there is "rough" support for ChromeOS as well.
Developing iwd required fixing a number of things in the kernel along the way. There had only been one user of the WiFi-control API so far, so a lot of things didn't work right, Holtmann said. Those problems have been fixed, and most of the patches are already in the mainline kernel.
Iwd offers a long list of features, starting with support for the station, ad hoc, and access-point modes. Modern features like SSID grouping and scanning with hidden SSIDs are supported; the four-way handshake for pairwise transient keys is coming soon. A lot of enterprise methods are being added, including support for the extensible authentication protocol (EAP). Configuration for EAP is not done through a graphical interface, though; it is too complex for users to be able to realistically deal with it, so it will instead use a configuration file provided by the network administrator. Making this work requires asymmetric_key keyring support in the kernel, which is set to be part of the 4.20 release. For extra security, this key can be stored in the Trusted Platform Module (TPM).
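As an illustration of what such an administrator-provided file might look like, here is a sketch of an enterprise network profile in iwd's INI-style format. The specific key names and values below are assumptions for illustration; the authoritative reference is iwd's own network-configuration documentation:

```ini
# Hypothetical sketch of an enterprise profile, e.g. installed by the
# network administrator as a .8021x file in iwd's state directory.
[Security]
EAP-Method=PEAP
EAP-Identity=user@example.com
EAP-PEAP-CACert=/etc/ssl/certs/corp-ca.pem
EAP-PEAP-Phase2-Method=MSCHAPV2
EAP-PEAP-Phase2-Identity=user@example.com
```

The point of the design is visible even in this sketch: none of these settings need be typed into a dialog; the user only selects the network, and iwd finds the rest in the file.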
Toward 1.0
There are a number of features missing, but they are not expected to stop the 1.0 release from happening around the end of the year. These include pairwise master key caching and opportunistic wireless encryption (OWE), a mode that prevents eavesdropping but doesn't bother with authenticating the other end. Support for the device provisioning protocol is in the works, as is WiFi P2P support, which is almost done. Work that can be expected in the 1.0 release includes an API review of the Embedded Linux Library and D-Bus interfaces.
Holtmann concluded by saying that iwd is not just about WiFi; it is also expected to work with Ethernet connections. Corporations are increasingly locking down their Ethernet ports, requiring authentication before they can be used. To work with such systems, iwd is gaining an Ethernet authentication daemon that can handle this kind of port-based authentication; initial support is already present.
The audience asked whether iwd would ever work without D-Bus; Holtmann replied in the negative. It would be better to optimize any inefficiencies out of D-Bus, he said, than to try to get iwd to work without it. When asked about supporting 802.11s mesh networks, he said that it was something that the project needed to do, but it will not happen right away; contributions are welcome, he added. The last question was about integration with systemd-networkd; he replied that things mostly just work now, but there are still some details to take care of.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]
Page editor: Jonathan Corbet