Leading items
Welcome to the LWN.net Weekly Edition for November 9, 2017
This edition contains the following feature content:
- The rise and fall of Limux: The history of Munich's Limux project and some questions the community should ask itself going forward.
- A report from the Realtime Summit: A report on several talks from this year's Realtime Summit.
- USBGuard: authorization for USB: A framework for rule-based policies governing USB devices.
- A kernel self-testing update: The kernel self-tests are growing, but there is more to do there.
- Kernel regression tracking, part 2: A continuation of the regression tracking discussion, this time at the Maintainers Summit.
- Bash the kernel maintainers: A discussion about feedback from the community about working with kernel maintainers.
- An update on the Android problem: The Android ecosystem is full of out-of-tree code, but it would appear that things are getting better.
- The state of Linus: The traditional session where Linus Torvalds gives his view of the state of the development community.
- Maintainers Summit: SPDX, cross-subsystem development, and conclusion: A few short topics to close out the discussion.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
The rise and fall of Limux
The LiMux (or Limux) initiative in Munich has been heralded as an example of both the good and bad in moving a public administration away from proprietary systems. Free Software Foundation Europe (FSFE) President Matthias Kirschner reviewed the history of the initiative—and its recent apparent downfall—in a talk at Open Source Summit Europe in Prague. He also looked at the broader implications of the project and asked some questions that free-software advocates should consider going forward.
History
He began by revisiting the year 2000; we had just survived the Y2K scare and public administrations (cities and other governmental organizations) were realizing that Windows NT 4.0 was about to reach its end of life. So some of them started evaluating the possibility of moving to Linux. One of those administrations was the city of Munich. There was a lot of media attention focused on the idea that Munich might move away from proprietary software. Microsoft's CEO at the time, Steve Ballmer, famously left his ski vacation to talk to the mayor, but that didn't dissuade the city from starting a project to move to Linux.
![Matthias Kirschner](https://static.lwn.net/images/2017/osseu-kirschner-sm.jpg)
A few weeks later, though, the project did stop because the city was worried about patents. That problem was studied and the city came to the conclusion that the patent risk was no worse for free software than it was for proprietary software. Still, over the years, there were repeatedly rumors about the demise of the project and that Limux, Munich's Linux distribution, would be dropped.
One of the questions that Munich wanted to answer was if switching would save it money. An IT committee estimated it would save €20 million by using Linux clients for the desktop. There were other studies, including a Microsoft-funded one by HP that said it would cost €43 million more; that study was not published, though its conclusions were featured in the news.
The rumors that Munich would drop Limux often cited cost as a reason for doing so but, even now, it is difficult to estimate what the cost of switching was. There was more to the switch than simply desktop clients as the city also centralized much of its IT infrastructure at the same time. It is hard to separate the organizational and technical costs to really determine the bottom line, Kirschner said.
Along the way, it was reported that 20% of the users of Limux were not happy or satisfied with it; other reports had the number at 40%. In most cases, it was not at all clear what they were unhappy with—was it the client or something else? Those reports never said anything about how happy the city workers in "Hamburg or Paris or wherever" were. He noted that one of the changes moved the support staff to a centralized facility, rather than it "being the guy in the next room". He wondered if that may have impacted users' happiness with their desktops.
Another thing that was often reported was that it was difficult to exchange documents with other administrations in Germany. There was a German policy that documents were supposed to be delivered in an open format, but Munich regularly got documents in proprietary formats.
Despite all of the reports of the imminent downfall of Limux, by 2013, 15,000 computers had been migrated. In addition, 18,000 LibreOffice templates had been created for documents. Previously, each office had its own templates, but the new ones were shared across the city administration. The mayor who had started the project was "always supporting it", Kirschner said. He continuously backed the team behind Limux.
That all ended in 2014. The old mayor did not run for reelection, so a new mayor, Dieter Reiter, from the same party was elected. Reiter did not like Limux and was quoted in some articles as being a Microsoft fan. He ran partly on the idea of switching away from Limux.
The cause of all evil
From then on, Kirschner said, "Limux was the cause of all evil in Munich". For example, iPhones did not work with the city's infrastructure, which was blamed on Limux though it had nothing to do with the desktop client. A mail server outage was also unfairly blamed on Limux.
All of that led people around Europe to believe that Munich had already switched away from Limux, but that was not the case. The new mayor was making a lot of noise about it, which made things hard for the IT staff. Effectively, the boss was not supporting their work.
The city government paid for a study to look at the IT problems that the city was having. It was done by Accenture, which is a Microsoft partner, so the FSFE and others expected the worst. It turned out not to be what they expected, he said. The study identified several problems, one of which was about an old version of Windows that was still in use, but the biggest problems were organizational rather than technical.
It turned out that there were fifteen different operating system versions in use throughout the city administration. Upgrades could be blocked by departments if they didn't like the update or didn't have time to do them. That meant there were users who were dealing with bugs that had been long fixed in LibreOffice (or OpenOffice before it). The study recommended that those problems be fixed.
The Munich city council decided to do a reorganization of the IT department, which was similar to some of the study's recommendations, but not the same. A city council meeting was held with a late, surprising, addition to the agenda: to vote on moving to an integrated (proprietary) client. There were no costs or justifications associated with that agenda item, it was just an attempt to have a decision made about that question.
The FSFE wrote to all of the city council members (and the press) to ask about the effects, costs, which services would not be available after a switch, and so on. That led to multiple press inquiries to the city council and a television crew showing up at the meeting. Many of the questions the FSFE had asked were brought up in the meeting and the council wanted "real answers"; they had never gotten so many requests from the public about any other issue, Kirschner said.
In the meeting, the mayor said that the agenda item was not actually about making a decision, but was instead about examining options. It was agreed that, before a decision could be reached, clarity on costs, service disruptions, and the like would be needed. A decision would be made by the council at some later date.
In that meeting, though, it became quite clear that a lot of parties had already made up their mind, Kirschner said. There would be a move to a unified desktop client over the next few years. In fact, without waiting for a decision from the city council, some services were stopped and email started to be migrated to Microsoft Exchange. The "pattern is quite clear", but the public is being told that the city is still examining options. That is "harming not only free software, but also democracy", he said.
Moving forward
The "lighthouse we had seen before will not be there anymore". Limux will be replaced with Microsoft clients. It doesn't make sense, he said, because the city already had a strategy to move away from desktops to "bring your own device" and other desktop alternatives. He wondered if this is all really Munich's fault or whether the free-software community also unwittingly helped Limux fail. It is something we need to understand and learn from for other migrations that may happen in the future.
To that end, Kirschner had a few different questions that he thought the community should think about. "Do we suck at the desktop?" We are dominant in everything from supercomputers to embedded devices, but have never made any real gains in the desktop space. Many in the community use other operating systems as their main desktop. Is our desktop client bad or is it applications that are needed, especially for public administrations?
Was there too much focus on the cost savings? People in the community promised that Munich would save money and he is confident that in the long run that is true, but a switch always has costs. If the budget is tight, switching to save money may well not be the right plan. He also wondered if the community should do more to support companies and individuals who charge for free software.
"Do we sometimes harm these migrations by volunteering?" Migrations to free software are generally driven by individuals, either inside a public administration or by a parent for a school. Those individuals start bringing free software in and do lots of work (for free) to make it all work. Problems arise and there is no budget to bring in others to help out; people burn out and then everything fails. Instead of coming to the conclusion that not having a budget led to free software failing, the organizations often decide to "get a budget and do it right". He thought it might make more sense to try to get the budget for the free-software project, instead of volunteering.
It may be better to focus on applications, rather than on the operating system. Public administrations have applications for all sorts of different tasks, such as passport programs or marriage license applications. Those need to work right away as part of any migration. Maybe a path forward is to make the argument that those applications should be free software, so that they aren't so closely tied to a particular platform.
There was a tendency in the community to point to Munich whenever the topic of free software in public administrations would come up. That may have been too much focus on one migration. As seen in Munich, decisions about migrations are not always made for technical reasons, but since that migration was always touted, it means that Munich failing equates to free software failing in some minds. There are other examples of migrations in public administrations, he said; we should research those and point out multiple different migrations instead of concentrating on one.
Public money, public code
The FSFE has started a new campaign, called Public Money, Public Code, that seeks to have all code developed for the public released as free software. He quoted the Director-General of the European Commission's IT directorate, who said: "Sharing and reuse should become the default approach in the public sector". He showed a short video [YouTube] from the campaign web site that highlighted the absurdity of proprietary restrictions on software by analogy to public infrastructure like roads and buildings. If the owners of public buildings could force a complete replacement in order to upgrade them, or could restrict the kinds of votes that could be taken in a legislative building, it would obviously be completely unacceptable, but that is often exactly the situation with our public code infrastructure.
There is an open letter that Europeans can sign to demand that lawmakers "implement legislation requiring that publicly financed software developed for the public sector be made publicly available under a Free and Open Source Software licence". Organizations can also join the campaign and donations are accepted to further the work. He concluded with a quote: "Many small people, in many small places, do many small things, that can alter the face of the world."
In the Q&A, Kirschner noted that, unlike companies, which often don't want to share with their competitors, public administrations are not in competition with each other. It should be easy for them to understand the advantages of sharing and reusing the software they procure. He also said that part of the campaign's work is in trying to convince lawmakers to support the effort. For the German elections, FSFE contacted all of the candidates and asked for their support; the same will be done for elections in other countries and for the EU Parliament.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for supporting my travel to Prague for OSS Europe.]
A report from the Realtime Summit
The 2017 Realtime Summit (RT-Summit) was hosted by the Czech Technical University on Saturday, October 21 in Prague, just before the Embedded Linux Conference. It was attended by more than 50 individuals with backgrounds ranging from academic to industrial, and some local students daring enough to spend a day with that group. What follows is a summary of some of the presentations held at the event.
Beyond what is covered here, Thomas Gleixner, who is the lead for the Real-Time Linux project, gave an update on that project's status in a session that was covered in a separate article.
Realtime trouble, lessons learned, and open questions
Gratian Crisan started his presentation on problems using the realtime patch set that his group has run into by underlining that the work presented was done by multiple people and that some of it is admittedly hackish. He then listed the problems his group recently encountered and how they were addressed.
The first problem originates from a sequence of write operations to a memory-mapped input/output (MMIO) region that is followed by a read operation. The I/O devices (the e1000e and tpm_tis drivers were presented as examples) will usually be connected to a bus running at a lower frequency and with a different bit width than the CPU's. Buffering and arbitration are required in the I/O fabric, which causes write operations to be queued up along the way. When a long sequence of writes to an MMIO region is followed by a read operation from the same region, ordering guarantees mandate that the read operation wait until all writes are flushed before it can complete. This stalls the CPU in the middle of the MMIO read instruction, preventing the servicing of timer interrupts. Realtime-priority threads running on the CPU will then wake up late because the timer interrupt was delivered late.
To address the situation, long stretches of MMIO operations can be broken up by introducing delays, allowing time for the writes to be committed to the device and for high-priority realtime threads to preempt the driver code so that they can execute. Another way around this is to follow each MMIO write with an MMIO read when a kernel is configured with the PREEMPT_RT_FULL option. That way, the amount of time a CPU is stalled is bounded since only one store operation has to complete. Crisan then asked if the same problem has been seen elsewhere, something that Gleixner confirmed. There is no known solution other than education via the realtime wiki site and testing.
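As an illustration of the read-back technique (this is not code from the presentation; the register layout and helper function are hypothetical), a driver's register-update loop might look like this:

```c
#include <linux/io.h>

/* Hypothetical helper: program a table of values into a device's
 * MMIO register block without letting posted writes pile up. */
static void dev_load_table(void __iomem *regs, const u32 *vals, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		writel(vals[i], regs + i * 4);
		/*
		 * Read the register back so that at most one posted write
		 * is queued at a time; a later read then cannot stall the
		 * CPU behind a long train of buffered writes.
		 */
		readl(regs + i * 4);
	}
}
```

The extra reads cost bus bandwidth, so one could imagine making them conditional on IS_ENABLED(CONFIG_PREEMPT_RT_FULL), in line with the suggestion from the talk.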
The second problem relates to the aggregation of high-resolution timers (hrtimers) armed by SCHED_OTHER threads and the large amount of latency that processing their wakeups induces on a system. Crisan provided a small code snippet that reproduces the same pattern as observed in the real use case. A patch that moves all hrtimer wakeup processing for non-realtime tasks to the softirq thread has been submitted to fix the problem.
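The snippet shown in the talk is not reproduced here; the following is a rough userspace sketch (an assumption about the general shape of the pattern, not Crisan's code) in which many SCHED_OTHER threads keep arming short high-resolution timers whose expirations then cluster together:

```c
#include <pthread.h>
#include <time.h>

#define NTHREADS 64

/* Each ordinary (SCHED_OTHER) thread sleeps for 100us in a loop,
 * arming a fresh hrtimer on every iteration; with many such threads
 * the wakeup processing for their timers piles up. */
static void *timer_spinner(void *arg)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 };

	for (;;)
		clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
	return NULL;
}

int main(void)
{
	pthread_t tids[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, timer_spinner, NULL);
	pthread_join(tids[0], NULL);	/* the workers never exit */
	return 0;
}
```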
The third problem stems from the lack of priority-inheritance support in the standard glibc pthread library; only the pthread_mutex_*() functions are tailored to handle priority inheritance. A short discussion among the audience revealed that not much has happened on the topic since last year's realtime summit, and a comment in the bugzilla entry notes that there is no solution in sight.
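For reference, priority inheritance does work for plain mutexes through the mutex-attribute protocol setting; the gap is that condition variables, reader-writer locks, and barriers have no equivalent. A minimal example of the part that glibc does support:

```c
#define _GNU_SOURCE
#include <pthread.h>

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;

	pthread_mutexattr_init(&attr);
	/* A low-priority holder of this mutex is boosted to the priority
	 * of the highest-priority waiter; only pthread_mutex_*() offers
	 * this, which is the limitation discussed in the session. */
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &attr);

	pthread_mutex_lock(&lock);
	/* ... critical section ... */
	pthread_mutex_unlock(&lock);

	pthread_mutex_destroy(&lock);
	pthread_mutexattr_destroy(&attr);
	return 0;
}
```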
The fourth and last problem revolves around the management of interrupt thread priorities and the risk of priority inversion if the interrupt handler needs to do a bus transfer that involves another interrupt thread. It is hard to associate an interrupt number with its corresponding kernel interrupt thread process ID so that its priority can be configured properly. A related problem pertains to the configuration of priorities for threads that are not created at boot time. For example, some ethernet drivers create an interrupt thread dynamically when a cable is plugged in. A patch that adds the ability to call poll() on /proc/interrupts has been created to address the issue. From there a daemon or service like rtctl can react to changes and assign the right priority to the newly created interrupt thread. A sysfs interface has been proposed as a better alternative.
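Setting the priority itself is the easy part once the interrupt thread's PID is known; the discussion was about discovering that PID reliably as threads come and go. A hedged sketch of what an rtctl-like tool ultimately does (here the PID is simply passed in rather than discovered):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Give a kernel interrupt thread (named "irq/<nr>-<name>") a SCHED_FIFO
 * priority. The priority value is illustrative only. */
int main(int argc, char **argv)
{
	struct sched_param sp = { .sched_priority = 50 };
	pid_t pid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <irq-thread-pid>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);

	if (sched_setscheduler(pid, SCHED_FIFO, &sp)) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}
```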
The presentation ended with advice for people when working with realtime systems. First, always check the kernel configuration after a kernel upgrade as some options may have changed that introduce unwanted latency. Second, verify that clock sources are properly configured so that realtime latency is kept to a minimum. Using a trusted clock source when instrumenting the kernel is also important to keep in mind. That way one can trust that real latency problems are being investigated rather than a side effect of the tracing code behaving differently after an upgrade. Last, but not least, never underestimate the value of running reboot tests. They have proven to unearth interesting conditions that often lead to malfunctions.
Using Coccinelle to detect and fix nested execution context violations
![Julia Cartwright](https://static.lwn.net/images/2017/rts-cartwright-sm.jpg)
In her presentation Julia Cartwright focused on problems related to operations executed in a context where they are not valid and how she used Coccinelle to find them.
The first context violation of interest to Cartwright is code executing within an interrupts-disabled region that calls spin_lock(). When the realtime patch set is applied, spinlocks are turned into sleeping locks and will eventually call schedule(), which is something that triggers a "sleeping while in atomic context" complaint from the kernel lock debugging mechanism.
The second context violation also involves calls to spin_lock(), but this time from interrupt dispatch routines that are running in hardirq context. Once again, implicitly or explicitly calling schedule() will result in a warning from the kernel. From Cartwright's point of view, occurrences of the above context violations in the kernel come from developers not understanding when to use raw spinlocks and how to properly use preempt_disable().
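A hedged kernel-code sketch of the distinction (all names here are made up): in a context that genuinely cannot sleep under PREEMPT_RT, a raw spinlock must be used, because spinlock_t turns into a sleeping lock there.

```c
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_RAW_SPINLOCK(hw_lock);	/* remains a spinning lock on RT */
static DEFINE_SPINLOCK(data_lock);	/* becomes a sleeping lock on RT */

/* Hypothetical handler that really runs in hardirq context (for example,
 * one that cannot be threaded): it may only take raw spinlocks. Taking
 * data_lock here is exactly the kind of violation the scripts flag. */
static irqreturn_t demo_irq_handler(int irq, void *dev_id)
{
	raw_spin_lock(&hw_lock);
	/* ... poke device registers ... */
	raw_spin_unlock(&hw_lock);
	return IRQ_HANDLED;
}

/* Process context: free to take the (possibly sleeping) spinlock. */
static void demo_update(void)
{
	spin_lock(&data_lock);
	/* ... update shared data ... */
	spin_unlock(&data_lock);
}
```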
Finding even obvious context violations in the kernel is arduous and calls for help from runtime and static-analysis tools, which leads us to Coccinelle. Cartwright then went on to give a short introduction to Coccinelle and present the scripts she used to address various code violation scenarios. To date, 38 context violations have been identified in the mainline kernel (as of v4.14-rc5) and 22 in the realtime patch set. Now that context violations have been located, the hard work of addressing each situation remains, since each case demands careful assessment and a tailored solution.
SCHED_DEADLINE: what's next?
The goal of this talk was to present the latest developments in the area of deadline scheduling, along with ongoing work and topics being considered for future enhancement.
Claudio Scordino started with the greedy reclaiming of unused bandwidth (GRUB) algorithm that was merged in the 4.13 kernel. The motivation is to handle scenarios where deadline tasks need more bandwidth (CPU time) than they reserved when entering SCHED_DEADLINE; with GRUB, such tasks may use bandwidth from other deadline tasks that haven't fully used their own reservations. The main requirement is obviously to do so without breaking any reservation guarantees. Scordino went on to give a summary of how GRUB works and presented graphs showing the positive effect of the feature.
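For context, a task opts into GRUB by setting the SCHED_FLAG_RECLAIM flag when it establishes its SCHED_DEADLINE reservation. A minimal, illustrative sketch (the reservation parameters are made up; sched_setattr() historically has no glibc wrapper, so the structure and constants are spelled out by hand and the call is made via syscall()):

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

#define SCHED_DEADLINE		6
#define SCHED_FLAG_RECLAIM	0x02	/* opt in to GRUB reclaiming */

int main(void)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_DEADLINE,
		.sched_flags	= SCHED_FLAG_RECLAIM,
		.sched_runtime	=  10 * 1000 * 1000,	/* 10ms of CPU time... */
		.sched_deadline	= 100 * 1000 * 1000,	/* ...before each 100ms deadline... */
		.sched_period	= 100 * 1000 * 1000,	/* ...in every 100ms period */
	};

	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}
	/* ... periodic realtime work runs here ... */
	return 0;
}
```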
![Claudio Scordino](https://static.lwn.net/images/2017/rts-scordino-sm.jpg)
The presentation continued with work that is currently under development, more specifically the integration with the schedutil CPU-frequency governor so that deadline tasks may run at frequencies below the maximum operating point. The idea is to use the reclaimed-bandwidth metrics provided by GRUB to lower the frequency of the system and to scale the runtime reservation according to the current operating point. Graphs comparing experimental results from GRUB-PA (power-aware) and a mainline kernel were presented, highlighting the possibility of honoring bandwidth reservations even at lower frequencies. A new version of the patch set is expected to appear on the mailing lists in the coming weeks.
Next up was the hierarchical group scheduling feature, first published in March. Scordino asked the audience for guidance on the expected behavior and if the sysfs interface to configure the feature should be modified. That was followed by a discussion between the scheduler maintainers and attendees from Scuola Sant'Anna University on how best to implement the admission test on a hierarchical scheduler in order to improve the current, conservative, worst-case scenario implementation. The Scuola Sant'Anna University attendees had some ideas and plan to follow up with a proposal.
The semi-partitioning scheduling feature was presented by Daniel Bristot de Oliveira. It concentrates on use cases where a task would fail the current deadline acceptance test while bandwidth is still available in the system. The proposal is to use the theoretical approach for static task partitioning and try to achieve the same results for dynamically scheduled tasks. It works by splitting a task between CPUs at runtime, but keeping the reservation proportion as it was when first accepted by the system.
Scordino then discussed areas of future development. One of those is the idea of bandwidth reclaiming by demotion, where a deadline task is demoted to the SCHED_OTHER class rather than being throttled at the end of its time budget. Another is throttling signalling, where user-level signals are sent to throttled tasks. Scordino concluded by expressing the desire to see a closer relationship between deadline-scheduler developers and the community. The goal is to come up with development priorities so that time isn't wasted on features that aren't important to users.
Future of tracing
Steven Rostedt said that he wanted to use his time for a conversation on the future of tracing rather than for a classic presentation. In order to give context to that discussion, though, he started by giving an overview of the tracing infrastructure and the various features it supports.
Rostedt then moved on to cover some of the features currently under development, such as more advanced histogram support with full customization, synthetic events, IRQ/preempt disable events, and the storage of variable events. Another area being worked on relates to the tracing of functions called by kernel modules that aren't already loaded so that tracing would start when the modules get loaded.
On his wish list, he would like the ability to have zero overhead on lock events as well as more interaction between eBPF and Ftrace. Also on the list is the capability to trace function parameters and convert the trace.dat file (from trace-cmd) to the common trace format (CTF). There is work on KernelShark to get rid of GTK2 and convert it to use Qt and there are plans to add plugins for customized views and flame graphs.
Rostedt asked for input from the audience, something that led to a request for adding support to feed the output of an Ftrace dump to KernelShark for further analysis.
In closing
This event has once again proven to be helpful to the realtime Linux community. A good number of presentations triggered conversations between audience members on the common problems they face, the way to address them, and some of the remaining challenges to overcome.
[I would like to thank the speakers for their time reviewing this article and the valuable input they have provided.]
USBGuard: authorization for USB
USBGuard is a security framework for the authorization of USB devices that can be plugged into a Linux system. For users who want to protect a system from malicious USB devices or unauthorized use of USB ports on a machine, this program gives a number of fine-grained policy options for specifying how USB devices can interact with a host system. It is a tool similar to usbauth, which also provides an interface to create access-control policies for the USB ports. Although kernel authorization for USB devices already exists, programs like USBGuard make it easy to craft policies using those mechanisms.
Malicious USB attacks
Before we look at USBGuard, let's first take a look at the problem it tries to solve: unauthorized use of the USB ports. Beyond the usual data theft that can occur with USB storage devices on sensitive machines, there is also the problem of malicious USB devices programmed specifically for sophisticated attacks. The recent spate of vulnerabilities in the USB subsystem that can be exploited via specially crafted USB devices is worrying. However, even in the absence of software vulnerabilities, it is possible to mount a USB-based attack against host systems. The BadUSB attack is a proof-of-concept malware that reprograms a USB device to surreptitiously attack a computer that it is plugged into—with no privileges required beyond physical access to the machine in question.
USB devices have a tiny embedded microcontroller that is used to facilitate communications between the device and the host computer. The microcontroller runs embedded firmware that lets it perform this task; for BadUSB, this firmware was overwritten with customized malicious code that can wreak havoc on a computer in a number of ways. A malicious USB device attached to a host that's booting can infect the system if USB devices are in the boot order. USB devices interact with the host via interfaces, which describe the type of device that is presented to the host. A malicious USB storage device, for example, could stealthily create a secondary interface such as a faux keyboard or spoofed network device that sends malicious input or redirects network traffic.
It is also possible to steal credentials from a screen-locked machine simply with a rigged USB drive which, when plugged in, pretends to be a NIC, hijacks the network and DNS settings, and collects credentials from the all-too-trusting operating system. The information can be collected by physically retrieving the drive, or the device can use its own rogue wireless connection to transmit the sensitive information to the attacker.
USBGuard
It is possible to thwart unauthorized or malicious USB port usage by controlling the types of devices that are allowed to connect to them or to take certain actions based on USB device events. The idea behind USBGuard is that there is a set of approved devices that a user can specify to be allowed on their system and USBGuard enforces those restrictions. A whitelist or blacklist can be generated for a set of devices, which also specifies the allowed behaviors from approved USB devices.
When constructing a whitelist (or blacklist) of devices, a question that needs to be answered is: "How can I tell if this device is mine or not?" The identification system needs to work even on devices that can't be written to, such as scanners and mice. USBGuard attempts to do this by creating a hash of each device a user owns that they want to whitelist, based on its name, manufacturer and model, serial number when available, interface type, and which port it is connected to.
USBGuard works using the underlying kernel USB authorization mechanism to authorize or de-authorize USB devices connecting to the system. A daemon waits on USB events to implement whatever policy the user has specified. A shared library, libusbguard, exports an API for third-party applications that wish to use USBGuard's functionality. While USBGuard can't do anything about boot-time USB attacks, it can protect the ports once the operating system is up and running.
Identifying USB devices
USBGuard collects information about the devices the user owns to identify them, but it is not possible to generate a unique hash for every single USB device in existence. To generate a hash, USBGuard primarily relies on information inside the USB device descriptor, which is a data structure sent from the device to the host when it connects to a port on the computer. The device descriptor includes the manufacturer and model ID of the USB device. The descriptor also contains a field for serial number, but the USB spec makes it an optional field, except for USB storage devices. Therefore, there is no guarantee that the serial number will be present.
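For reference, those fields are part of the standard USB device descriptor; the layout below follows the USB 2.0 specification (it also matches the kernel's struct usb_device_descriptor in <linux/usb/ch9.h>) and is shown only to indicate what a tool like USBGuard has to work with:

```c
#include <stdint.h>

/* Standard USB device descriptor (USB 2.0 specification, section 9.6.1). */
struct usb_device_descriptor {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	uint16_t bcdUSB;
	uint8_t  bDeviceClass;
	uint8_t  bDeviceSubClass;
	uint8_t  bDeviceProtocol;
	uint8_t  bMaxPacketSize0;
	uint16_t idVendor;		/* manufacturer ID, e.g. 0x1050 for Yubico */
	uint16_t idProduct;		/* model/product ID */
	uint16_t bcdDevice;
	uint8_t  iManufacturer;		/* index of manufacturer string */
	uint8_t  iProduct;		/* index of product string */
	uint8_t  iSerialNumber;		/* index of serial-number string; 0 if none */
	uint8_t  bNumConfigurations;
} __attribute__((packed));
```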
It is better to generate a hash with the serial number information if it is present, but if it isn't, then the default behavior for USBGuard is to enable port-specific rules for the connected device. The documentation notes that this is a security measure to ensure the policy is harder to bypass for devices without serial numbers, with the assumption that said device is always physically connected to a specific port.
Rule-based configuration
Configuring USBGuard policy is done using a rule-based language that lets a user specify whether to allow, block, or reject a device based on particular attributes. A device that is blocked will be listed by the operating system as being connected, but no communication is allowed for it. A device that is rejected will be completely ignored after it is inserted into the port.
When a device is inserted into the host, the rules list is scanned from beginning to end until a matching rule is found; if the scan reaches the end without a match, a default rule is applied. An initial policy can be created for the system using the usbguard command-line tool with the generate-policy sub-command, which will add the currently connected USB devices to the list of allowed devices and create hashes to identify each device.
The following is an example of a rule, extracted from the documentation:
allow with-interface equals { 08:*:* }
The colon-separated triad of numbers refers to the USB interface class, subclass, and protocol, which in turn identifies the type of USB device. For example, the hexadecimal number 08 in the class segment of the interface identifier signifies a USB storage device. The example will let any USB storage device (the wildcards indicate any subclass or protocol) work on the system. In the absence of any other rule, the default is to block everything else.
The following example is a rule allowing a specific device, also from the documentation:
allow 1050:0011 name "Yubico Yubikey II" serial "0001234567" \
      via-port "1-2" hash "044b5e168d40ee0245478416caf3d998"
reject via-port "1-2"
The identifier 1050:0011 indicates that the device's manufacturer ID is 1050 (Yubico) and the product ID is 0011 (the Yubikey II). These rules will accept only a Yubikey II device with the given serial number and hash on port 1-2, and will reject any other device plugged into that port. The hash is a value that was pre-generated when the user created a whitelist of devices to allow.
Development status
USBGuard is included with Red Hat Enterprise Linux 7.4 and later, and has found its way into other distributions too. On Ubuntu and Arch Linux, it is a community-supported optional install. The code is also available from its GitHub repository. The documentation in the repository warns that the 0.x releases are not production ready; since the latest release is numbered at 0.7.0, there is no production release of the software yet.
Conclusion
USBGuard gives users a way to lock down the USB ports of their machine while also enabling specific permissions for specific devices. It is particularly useful for machines that are accessible to the public or in an environment with many users, such as an office. USBGuard provides a reasonable defense against malicious USB devices, although such things are not widespread, at least yet. However, the attribute-based identification mechanism is only "good enough", as it is not possible to confirm the ID of a plugged-in device with 100% certainty, especially for non-storage USB devices without a serial number. It is also possible to spoof the information in the USB descriptor, thus if the policy rules are revealed to an attacker, the whitelist can be bypassed. Despite this, it is better than having no guards against USB attacks at all.
A kernel self-testing update
Shuah Khan is the maintainer of the kernel's self-test subsystem. At the 2017 Kernel Summit, she presented an update on the recent developments in kernel testing and led a related discussion. Much work has happened around self-testing in the kernel, but there remains a lot to be done.
The kernel has contained a set of tests for a long time, at least since 2005, Khan said. It was only three years ago, though, that a formal effort to create a self-test "subsystem" was started. The idea was to add a lot more tests and create a regression-test suite for kernel developers. Various organizations, such as Linaro, run the suite; it is also part of the KernelCI.org and 0-day testing services.
Running the tests is a simple matter:
make --silent kselftest
The --silent option reduces the clutter in the output. Better results will be had if the tests are run as root, but one should beware of the more disruptive tests, which can force a reboot.
It is possible to add a TARGETS= option to restrict the tests to one or more subsystems; in this case, it only runs the non-disruptive tests. Recent work has enabled the O= option to use a separate build directory, but it still uses the .config file from the source directory. This detail is deemed to be worth fixing, but it is of relatively low importance because the tests don't make much use of the kernel configuration. Ted Ts'o observed, though, that the build architecture is defined in .config, as are options that configure various subsystems out of the kernel entirely.
Another option for testing a specific subsystem is to run:
make -C tools/testing/selftests/<subsystem> run_tests
The output from this command will be a summary of the results; the details can be found in a file named after the specific test in /tmp. Ben Hutchings requested that fixed names in /tmp not be used (presumably to avoid creating yet another vector for symbolic-link attacks), so this behavior may eventually change. There was some discussion on the details of the formatting of the results that didn't lead to any significant conclusions.
Khan concluded by saying that, starting with the 4.12 kernel, test results are reported in the TAP13 format. This is a simple text format that makes it easy to spot differences between test runs; it also supports machine parsing of the results.
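As a rough illustration of what TAP13 looks like (this is the general shape of the format, not actual kselftest output), a run of three tests might be reported as:

```
TAP version 13
1..3
ok 1 - first_test
not ok 2 - second_test
# second_test: unexpected return code 1
ok 3 - third_test # SKIP feature not available
```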
Arnd Bergmann said that he has been running the latest self tests on (older) stable kernels. That work has resulted in patches fixing problems with the tests; most maintainers like those patches, but the networking subsystem, in particular, will not accept patches aimed at making current tests run on older kernels. Bergmann said that there is a strong desire to make adding new tests as easy as possible; requiring that those tests behave properly on older kernels raises the barrier.
It is also not clear what should happen with tests for features that are not present on older kernels; there are differences of opinion over whether they should report failure or that the test has been skipped. Ts'o suggested that "skip" is the proper result, and that it would be good to know why some developers object to it. Perhaps, he said, the real problem is that some features are failing to present themselves properly. In that case, a "skip" result could mask a deeper problem. Mathieu Desnoyers suggested that the default should be to signal failure for skipped tests, with an option to simply pass over them instead.
At the end of the session, Matthew Wilcox raised a question regarding the tests for the radix-tree implementation. Perhaps, he said, those tests should be run at every kernel build, with the build as a whole failing if the tests do not pass. If the radix-tree tests don't pass, he said, booting the resulting kernel is not advisable. This idea ran into opposition, though. Beyond slowing down the build, their effectiveness is reduced by the fact that the build environment is often different from the run environment. ARM kernels tend to be built on x86 systems, for example. The discussion, overall, suggested that running self-tests during ordinary kernel builds will be a hard sell.
[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to the event.]
Kernel regression tracking, part 2
The tracking of kernel regressions was discussed at the 2017 Kernel Summit; the topic made a second appearance at the first-ever Maintainers Summit two days later. This session was partly a repeat of what came before for the benefit of those (including Linus Torvalds) who weren't at the first discussion, but some new ground was covered as well.
Thorsten Leemhuis started with a reprise of the Kernel Summit discussion, noting that he has been doing regression tracking for the last year and has found it to be rather harder than he had expected. The core of the problem, he said, is that nobody tells him anything about outstanding regressions or the progress that has been made in fixing them, forcing him to dig through the lists to discover that information on his own. He had, though, come to a few conclusions on how he wants to proceed.
First, he will try again to establish the use of special tags to identify regressions. His first attempt had failed to gain traction, but he agreed that he perhaps had not tried hard enough to publicize the scheme and get developers to use it. He will be looking into using the kernel Bugzilla again, even though it still seems like unpleasant work to him. He'll try to improve the documentation of how regressions should be tracked and handled. There is a plan to create a new mailing list on vger.kernel.org, with the idea that regression reports would be copied there. He will put more effort into poking maintainers about open regressions.
The discussion quickly turned to the problem (as seen by some) of the many kernel subsystems that do not use the kernel.org Bugzilla instance for tracking bugs. Peter Anvin said that many developers don't see much value in that system. Reported bugs tend to say something like "my laptop doesn't boot" with no further information; that tends not to be useful for the identification of any actual bugs. Beyond that, many bugs reported against the core kernel or x86 architecture turn out to be driver bugs in the end.
Users, it was suggested, should be explicitly directed to the mailing lists when reporting bugs for the subsystems that do not use Bugzilla. Laura Abbott said that this would be just a beginning; the kernel is lacking more general guidance on where and how to report bugs. Ted Ts'o, though, suggested that many kernel developers like the current system, which tends to filter out reports from relatively non-technical users who are unable to create useful reports. It could be seen as a feature, he said; perhaps such users would be better directed to distributor bug trackers.
One ongoing problem is that many of the less technical users are unable to build their own kernel to test a patch intended to fix their problem. Ben Herrenschmidt said there might be value in a facility that would automatically generate a package containing a distribution kernel with a patch added. But Greg Kroah-Hartman said that this discussion (as a whole) had come up many times before. Before creating elaborate systems, it might be best to create a better landing page on the kernel Bugzilla to help users report their bugs. Arnd Bergmann said that this landing page could perhaps be a wiki so that maintainers could easily add information on how to report bugs in their own subsystems. Takashi Iwai suggested that more subsystems should use Bugzilla; it can host useful materials like screenshots that are not really suitable for the mailing lists, but Kroah-Hartman said that those subsystems have managed without Bugzilla so far.
As in the previous session on this topic, it was noted that the linux-kernel mailing list is a black hole; reports sent there without copies to the relevant maintainers are likely to go unread. Once again, it was suggested that a bot should be set up to reply to such postings with suggestions on how to reach an actual human. Another echo from that session was the notion that some subsystem maintainers are resistant to having their bugs called "regressions", since it requires them to respond to them more quickly. Linus Torvalds said that he should be told about any such maintainers, who would then be able to expect a strongly worded message from him. Chris Mason, instead, said that some maintainers like to tag problems as regressions, since regressions are a ticket to get patches into a late -rc kernel release.
Torvalds said that the problem that led to Rafael Wysocki ceasing his regression-tracking work was that he was the only one doing it. Leemhuis, too, is doing this job on his own, and it's a grind. There's only so much help that can be had from more scripting and documentation; what really needs to happen is for more people to be involved in tracking regressions. In that regard, the Bugzilla is useful because it helps people to work together, even though it is "hell" in general.
The session wound down with a couple of brief side discussions. Many regressions in the kernel are related to specific hardware, which makes them hard to write tests for. Dan Williams said that, using mocking, unit tests can be created to enable at least partial testing for drivers. Finally, it was noted that it would be useful to know which subsystems, in particular, have been prone to regressions; that could help identify parts of the kernel that could use some refactoring, better self tests, or changes in maintenance style. At the moment, nobody really knows which subsystems those are.
[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to this event].
Bash the kernel maintainers
Laurent Pinchart ran a session at the 2017 Embedded Linux Conference Europe entitled "Bash the kernel maintainers"; the idea was to get feedback from developers on their experience working with the kernel community. A few days later, the Maintainers Summit held a free-flowing discussion on the issues that were brought up in that session. Some changes may result from this discussion, but it also showed how hard it can be to change how kernel subsystem maintainers work.
The first complaint was that there is no consistency in how maintainers respond to patches. Some will acknowledge them right away, others take their time, and others will sometimes ignore patches altogether. James Bottomley defended the last group, saying that it is simply not possible to respond to all of the patches that show up on the mailing lists. The discussions can get nasty or, with some posters, a maintainer can end up stuck in a never-ending circle of questions and reposts. Arnd Bergmann suggested that maintainers could adopt a standard no-reply reply for such situations.
Laura Abbott said that the low-quality patches tend to be cleanup work, but there are also plenty of developers doing feature work who would appreciate more consistent maintainer responses. Bottomley said that such developers already know how to work with the community, but Bergmann disagreed. Bottomley went on to say that the best way to get a response to a feature patch is to recruit other users to say that they, too, are interested in that feature. He is not interested in merging features that have a single user. Finding these users should not be hard, he said; developers working for companies already have a user base to draw from. When he was working at Parallels, he said, he was able to go out and find users to push for patches that had been languishing.
Shuah Khan said that there is a lot of inconsistency around when submitters should ping maintainers for a response. Bottomley replied that his "rule number one" for SCSI patches is that submitters must find a developer to review their work. Bergmann said that he knows that the SCSI subsystem works that way, but noted that other subsystems have different rules. In each case, the specific rules in force make sense to the maintainers involved. For the core ARM tree, for example, patches have to be added to the maintainer's patch tracker. In the virtual filesystem (VFS) area, patches have to be "really good", at which point they will silently appear in a mainline pull request. Linus Torvalds added that, for VFS patches, the maintainer tends to be more responsive if Torvalds is copied on patches. Patches for networking code always get a timely response. Ben Herrenschmidt said that he often gets responses rejecting patches for "stupid stuff" (trivial issues like coding style) and that he finds it frustrating.
Torvalds said that it will never be possible for all of the kernel's subsystems to be consistent in their handling of patches. Different subsystems have different models of development that have grown over time. The networking subsystem is good at taking patches, he said, because that's simply how the maintainer (Dave Miller) works; the patchwork system also works well for him. "But nobody else should try to be Dave". A better approach, he said, might be to create better documentation of each subsystem's rules.
Herrenschmidt said that the community likes to complain about companies that are shipping a lot of out-of-tree code. We would like them to get that code into the mainline. But that task is frustrating and demotivating now, so many of these companies decide that it's not worth the effort; the community needs to make it easier. He repeated that some maintainers are overly obnoxious about trivial issues, making it hard to get large work upstream. The fact that the rules vary across subsystems and are not written down makes it worse.
The group concluded that an effort should be made to document each subsystem's rules for patch acceptance. The information might be added to the submitting-patches document, though that document is already far too long. Perhaps, eventually, the get_maintainer.pl script could be enhanced with a better understanding of subsystem-specific rules. The documentation is hopefully forthcoming in the near future but, at the session, the documentation maintainer warned that his rules for patch acceptance are especially nasty.
Moving on, Pinchart noted that the community could make an effort to be nicer to new developers, preferably in a way that doesn't frustrate existing developers. He noted, for example, that the automated emails from the 0-day testing service can seem harsh — the first response to a first-time patch is essentially an automated criticism. Perhaps, he said, the messages could explain that they originate from a robot and should not be taken personally. Dan Williams said that he has arranged for separate 0-day testing just to avoid this sort of situation.
There was a suggestion to set up a new mailing list for the purpose of running a specific patch through the testing service. One potential problem is that the robot currently generates no reply if it finds no problems with the patch; that would have to change in this setting. Silence in response to patches, Kees Cook noted, is demotivating. Torvalds suggested that 0-day replies could start with "I love you but..."; Fengguang Wu, who was at the Summit, seems to have already made some changes in that direction. Pinchart said that he always starts off a review by thanking the submitter for the patch and that he often offers a Reviewed-by tag if specific changes are made. That gives both a positive message and a hint of light at the end of the tunnel that can help new submitters.
Patch submitters could benefit from better feedback on how to fix problems, especially in cases where reviewers give conflicting advice. In such cases, it was said, the maintainer needs to take a stand to resolve the situation, but Bottomley said that, if reviewers cannot agree on a patch, he doesn't want it at all. It is up to the submitter, he said, to bring the reviewers to some sort of agreement. Herrenschmidt complained that, sometimes, patches are simply bikeshedded to death; Ted Ts'o suggested escalating to Torvalds when that happens. Torvalds added that he will indeed route around obstructive maintainers when he has to.
There was a brief discussion regarding a current patch logjam in the SCSI target subsystem. The maintainers agreed that they would go ahead and take the more straightforward changes in an attempt to move things forward.
Sometimes an interesting patch becomes abandoned, often because the developer has moved on to other tasks. Perhaps reviving such patches would be a good project for an Outreachy intern. Bottomley said that one useful signal about the wisdom of accepting a specific patch is whether it is being pushed consistently; that suggests that the developer will be around to deal with problems after it is merged. Torvalds agreed that the abandonment of a patch says something, and that such patches probably should not be applied.
The problem of developers who have limited time to get a patch merged remains, though. Ts'o said that he will make an effort to respond quickly to developers who are known to be going away soon — interns, for example. Herrenschmidt said that he has seen some maintainers giving rude responses to developers who are working under deadlines. Bottomley responded that such patches are often created by contractors; they tend to be bad and are best left out. But such patches then just languish in some Android vendor tree, which is not a good outcome either. Ts'o said that the community's normal expectation is that the developer of a patch will stay around to maintain it after merging; Herrenschmidt agreed that contractors need to have some sort of convincing maintenance story for their work.
The topic of aggressive behavior on the mailing lists came up; it was agreed that all developers should call out such behavior when they see it. Bottomley wondered what the problem was, since the mailing lists have been steadily calming down for years. Apparently one remaining problem is perceived aggression in commit messages, where the changelog for a fix will say unflattering things about the patch that introduced the problem in the first place. There was relatively little sympathy in this case, though.
There is evidently at least one company that will not assign female developers to specific subsystems because of problems that have been experienced in the past. Everybody agreed that such situations should be taken immediately to the Linux Foundation Technical Advisory Board for resolution.
At the end of the session, Pinchart said that he was thinking about starting some sort of maintainer survey, patterned on the teacher evaluations used at many universities. This effort is likely to proceed, initially as an opt-in mechanism for maintainers who are interested. The feedback provided would be anonymized and would not be made public.
[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to this event].
An update on the Android problem
Android has been a great boon to the kernel community, having brought a great deal of growth in both the user and the development communities. But Android has also been a problem in that devices running it ship with kernels containing large amounts (often millions of lines) of out-of-tree code. That fragments the development community and makes it impossible to run mainline kernels on this hardware. The problematic side of Android was discussed at the 2017 Maintainer Summit; the picture that resulted is surprisingly optimistic.
Greg Kroah-Hartman started by saying that he has been working for some time with the system-on-chip (SoC) vendors to try to resolve this problem, which he blames primarily on Qualcomm for having decided not to work upstream. Qualcomm has since concluded that this decision was a mistake and is determined to fix it, but the process of doing so will take years. The other SoC vendors are also committed to closing the gap between the kernels they provide and the mainline but, again, getting there will take a while.
Google's new rules requiring the use of long-term support kernels with Android, and requiring that vendors keep up with the updates to those kernels, should also help. If vendors do not follow those rules, he said, he will eventually stop maintaining the LTS releases. For now, though, he is running an experiment where he will support the 4.4.x kernels for a period of six years. Vendors are coming around to using those updates, he said, but there is a new problem in the form of carriers who are proving unwilling to ship those updates. For now, he is trying to get carriers to ship an update every six months.
Rom Lemarchand, Google's Android kernel manager, said that newer devices are shipping with 4.4 kernels now. The SoC market cycle is such that these chips will always run a two-year-old kernel. The two-year support lifetime for LTS kernels thus didn't work well for SoC vendors; just about the time that they ship something, the support goes away. Hopefully the six-year support period will work better. Updates are still a problem, though; vendors still are working under the mentality that they only need to take patches that have CVE numbers attached to them, which is not the case. Kroah-Hartman added that they weren't even taking all of the patches with CVE numbers. Kees Cook said that none of the vendors have decent testing for their kernels and don't want to merge any changes at all. They don't, he said, want to admit that they are bringing in LTS patches.
Along the lines of testing, there was some discussion of the Linux Test Project (LTP). This project has tended to be viewed dismissively by kernel developers, but it is evidently the recipient of more resources and has been getting better. There may eventually be value in integrating LTP into the kernel self tests. Linus Torvalds said that even an improved LTP is not that interesting compared to real workloads, though, so he would much rather see Android running on mainline kernels. This is evidently being worked on, but is not there yet. Lemarchand said that the HiKey boards are staying close to mainline and can boot a 4.9 kernel, but Arnd Bergmann pointed out that the HiKey boards are no longer being produced.
Somebody asked: has any Android phone ever done a major kernel upgrade after it has been shipped? That is evidently a difficult proposition, since there are a number of regulatory certifications that must be redone. But the Galaxy Nexus and Galaxy S phones both saw major kernel upgrades, so it is possible. Torvalds noted that there are a lot of Android devices that are not phones (tablets, for example) that might prove to be better development devices. It would be nice if mainline developers could run their own kernels on real devices. Bergmann said that the gap is shrinking on some devices, and Kroah-Hartman repeated that he is working toward this goal with the SoC vendors, but the process should be expected to take about six years.
Cook said that applying the larger updates involved in following the LTS kernels completely should eventually make vendors more comfortable with larger kernel changes in general. Sean Paul said that running mainline kernels on Android devices may well become possible soon, but phones still probably will not jump to new major releases. Even that would be good, though, Bergmann said; the current out-of-tree code problem defeats the goal of building a single ARM kernel for all devices. Fixing that would enable third-party distributors to ship systems for multiple phones. Torvalds said that, even if vendors don't upgrade their devices, the ability to do so would enable some useful regression testing. James Bottomley said that the whole situation is a repeat of the enterprise Linux problem from many years ago.
Ted Ts'o asked if there were any ARM Chromebooks that could be used as development machines; Paul answered that the ones based on Rockchip SoCs were close. Torvalds asked about the status of the Mali GPU driver; Bergmann responded that there had been one person working on reverse-engineering that device, but he didn't work well with other developers. Now somebody else is making progress with the older GPUs, but nobody is working on current-generation devices. It was said that everybody within ARM is in favor of solving the problem by open-sourcing ARM's driver — except for one recalcitrant high-level manager.
Torvalds said that, if the Mali problem could be solved, the community as a whole would be in good shape. Bergmann said that there are currently four ARM GPUs with good free-software support, but they are all older. Going forward, Mali seems to be the GPU of choice for Android devices, so that is the problem that needs to be solved. Lemarchand said that pressure is being applied from the Android side as well.
The final conclusion of this session was that, while the Android problem has not gone away, the situation is far better than it was one year ago.
[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to this event].
The state of Linus
A traditional Kernel-Summit agenda item was a slot where Linus Torvalds had the opportunity to discuss the aspects of the development community that he was (or, more often, was not) happy with. In 2017, this discussion moved to the smaller Maintainers Summit. Torvalds is mostly content with the state of the community, it seems, but the group still found plenty of process-related things to talk about.
The kernel development process is going well, with one big exception, Torvalds said: developers still seem unable to distinguish the merge window from the period after -rc1 is released and he doesn't understand why. An extreme example was the MIPS subsystem which, as a result, was not merged at all for two release cycles. Most of the issues are not so extreme, but the problem is ongoing.
It is not, he said, that he will not pull new stuff into a -rc release; that actually happens fairly often. But any maintainer who sends a pull request outside of the merge window containing work that is not an obvious fix should include a long explanation. It should say why the pull request is happening now rather than during the next merge window, and it needs to include a description of the code that is being pushed. That description is often missing, he said, but it's important. During the merge window, he tends not to look at the code much, but that changes during the rest of the development cycle. Ben Herrenschmidt then joked that anybody wanting to sneak code in past Torvalds should be sure to send it during the merge window.
Torvalds went on to say that he does, in fact, look more closely at the code in some subsystems, even during the merge window. He is currently unhappy with the security subsystem, for example, so it gets extra scrutiny. What developers want is for him to be so happy with them that he never feels the need to look that closely at the code they send. "Then you can do anything".
When asked whether he would rather see a pull-request explanation in the emailed request or in the message stored with the tag in Git, he replied that, while the tag message is a little easier for him to deal with, it doesn't matter that much. What's more important, though, is that he gets a "human explanation"; he doesn't like explanations that just list the patches that are included. Maintainers who have other maintainers below them should also require good explanations and feed them upward. Torvalds wants the kernel to have good commit messages in general, and that applies to merge messages as well. If, for example, a developer is using bisection to find a bug and ends up at a merge commit, they will want to know what changes the merge brought with it.
On the whole, though, things are working well, he said; there are no huge problems. When asked about group maintainership at the top level, he said that he is open to the idea, but doesn't think that there is a lot of need for it. He manages to be responsive, even when he's off diving in some remote part of the world.
When asked about the use of signed tags for pull requests, he replied that he requires them for requests from trees that reside outside of kernel.org. But even for kernel.org trees, he doesn't necessarily trust the DNS server that points him there, so signed tags are still useful. That was your editor's cue to mention this article showing where signed tags are (and are not) in use. Kees Cook asked for permission to occasionally break his signature to test the system; Torvalds said that would be OK, but that those tests already happen routinely since James Bottomley uses keys with a three-month expiration. He did point out that he does not normally fetch keys, though, and so will not notice a key revocation. If a maintainer has to revoke a key, Torvalds needs to be told about it directly. Ted Ts'o suggested that any developer who generated a key on a Yubico device should check that key to be sure that it is secure.
Referring back to the LWN article, Ts'o noted that the map of pull requests is quite flat, and that most requests go straight to Torvalds. Might that lead to bandwidth problems at the top of the hierarchy? Torvalds replied that he didn't see a problem there; he would rather get five pull requests than one, as that makes it easier for him to verify things. Pulls generally take almost no time unless there are conflicts. He does do a build after every pull, which can slow things down a bit.
Stephen Rothwell, the maintainer of linux-next, asked if he should be verifying tags on trees pulled there. Torvalds replied that, while linux-next is wonderful for build coverage, almost nobody actually runs kernels from there, so checking the tags is less important. The value of linux-next is that it encourages maintainers to keep their code out in public and shows that the contents of a pull request aren't just something that somebody came up with the night before. It would be nice if there were more testing of linux-next, though. Almost every release, his wireless networking and graphics break, and he's not happy with that. Chris Mason said that he does get bug reports from people running linux-next, but it only happens a few times each year.
Rothwell let it be known that he gets irritated by trees that are rebased after -rc3 or so. Some trees, he said, seem to do that automatically, but it's not something that should happen without a good reason. Arnd Bergmann said that it's common for trees to rebase on top of a late -rc to pull in their own fixes that went straight to the mainline. But the better solution there would be to merge the fixes branch directly.
Rothwell said that he has recently been doing closer checking of the Signed-off-by tags on patches and has found a few problems where the tag doesn't match the committer of the patch. There are also cases where there is no signoff at all.
At the end of the session, Torvalds said that there was one other small problem he would like to see fixed. Many subsystems work in topic branches, but do no work in the repository's trunk branch. They merge those branches into the trunk and send the result up in a pull request. That means that the trunk branch doesn't have any work in it, and that tends to confuse him. It would be better, he said, to just merge the development branches together and send the result, leaving out the unused branch.
[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to this event].
Maintainers Summit: SPDX, cross-subsystem development, and conclusion
The 2017 Maintainers Summit, the first event of its type, managed to cover a wide range of topics in a single half-day. This article picks up a few relatively short topics that were discussed toward the end of the session. These include a new initiative to add SPDX license tags to the kernel, the perils of cross-subsystem development, and an evaluation of the summit itself.
SPDX tags
Greg Kroah-Hartman told the group that, of the approximately 60,000 files found in the kernel repository, some 11,000 have no license text at all. That can be a bit problematic, since the Developer Certificate of Origin that covers contributions to the kernel refers to "the open source license indicated in the file". To fix this problem, Kroah-Hartman has put together a series of patches adding one-line SPDX tags. He asked whether there would be any objection to a patch adding those tags to 11,000 files; none was raised. The first set of SPDX patches was subsequently pulled for 4.14-rc8.
Linus Torvalds said that he would eventually like to see SPDX tags on all 60,000 files in the kernel. There are people who want to do automatic license tracking who would benefit from those tags. He's happy to add a single line to files with no license text at all to start with. For the files that do have license information, there are about 700 variants of the GPL text in the kernel, mostly varying in trivial ways (white space, or which address was used for the FSF office). SPDX offers a way to bring some uniformity to those license declarations. Adding tags to those files is a bigger job, though. While no-license-text files are implicitly GPLv2, files that contain license text must get a tag that matches what the text says. If the file is dual-licensed, for example, the SPDX tag must reflect that.
Might it be possible to get rid of all that license boilerplate and rely completely on the SPDX tags? That would be nice, Torvalds said, but that is not something that can be done in any sort of automatic way. Removing copyright information from files is fraught with all kinds of hazards and must be done carefully. So, for the short term, adding the tags to those 11,000 files with no text will have to do.
Kees Cook asked whether all new files added to the kernel should have SPDX lines; Torvalds answered in the affirmative. One remaining glitch is the files that define the user-space API; they will need to be annotated with a tag that includes the user-space GPL exception.
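As a rough illustration (the identifiers below are examples of common cases, not quotes from any particular patch in the series), an SPDX tag is a single comment line at the top of a file naming its license in a standardized syntax:

```c
/* SPDX-License-Identifier: GPL-2.0 */
/* An ordinary kernel source file under the default GPLv2 license. */

/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
/* A dual-licensed file; the tag must express both licenses. */

/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* A user-space API header carrying the system-call exception. */
```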
Cross-subsystem development
The kernel's maintainer model is normally quite effective at avoiding conflicts between developers; almost all work fits within a single subsystem. But, occasionally, a developer must make changes that affect a large set of subsystems; these changes can be hard to merge without creating a lot of conflicts. The best way to handle these changes has been a Kernel-Summit topic before, and it came up again here.
Cook, who has been pushing a wide-ranging set of timer API conversions, started off by saying that he often doesn't know how to direct such patches. It depends partly on whether they are API conversions or new features. The former are often best merged by him directly, while the latter often have prerequisites that complicate the picture and usually have to be merged through the relevant subsystem tree. He has tried to mark the two different types of patches in various ways, and still isn't clear on the best way to proceed. Among other things, that leads to ambiguity regarding whether he expects another maintainer to merge a specific patch or whether he wants an acknowledgment from that maintainer before merging the patch himself.
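For readers unfamiliar with that work, the conversions are mechanical changes of roughly the following shape (a sketch based on the kernel's timer_setup() and from_timer() interfaces, not taken from any particular patch in the series):

```c
#include <linux/timer.h>

struct foo {
	struct timer_list timer;
	/* ... other driver state ... */
};

/*
 * Old style (being removed): the callback received an opaque cookie.
 *
 *   static void foo_timeout(unsigned long data);
 *   setup_timer(&foo->timer, foo_timeout, (unsigned long)foo);
 */

/* New style: the callback receives the timer itself and recovers the
 * containing structure with from_timer(). */
static void foo_timeout(struct timer_list *t)
{
	struct foo *foo = from_timer(foo, t, timer);

	/* handle the expired timer using foo */
}

static void foo_init(struct foo *foo)
{
	timer_setup(&foo->timer, foo_timeout, 0);
}
```

Changes of this kind touch nearly every driver that uses a timer, which is why the question of how such patches should be routed matters.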
Developers who are generating large cross-subsystem patches should ensure that the relevant maintainers get a copy of the "0/N" message explaining the series as a whole. That often doesn't happen (git send-email tends to be the culprit here), leaving maintainers without some important context.
Ted Ts'o mentioned the RichACL patches, which have been circulating for years and are now up to version 27. Much of the work in this series applies to the virtual filesystem (VFS) layer, so he doesn't think it's appropriate for him to review it; meanwhile, the parts that are specific to ext4 are irrelevant until the VFS piece has been reviewed. Arnd Bergmann said that he, too, has VFS patches (year-2038 fixes) that need review. Torvalds agreed that VFS can be a problem area when it comes to patches like this. There has been talk of adding a second VFS maintainer in the past, but that has gone nowhere. He would still like to solve that problem, but the community first needs to find a candidate to do the work.
Somebody asked about the status of the trivial tree, which traditionally has handled tiny patches and can be suitable for some cross-subsystem work. Jiri Kosina replied that he had suspended the maintenance of that tree due to lack of time, but it is back in operation now.
Cook noted that the path taken by patches to the mainline isn't always clear, and asked whether the maintainer hierarchy should be represented in the MAINTAINERS file. Torvalds replied that the information should already be there, but Cook said it's far from clear when the maintainer relationships don't match the kernel's directory hierarchy. Bergmann said that the arm-soc tree, which sits between system-on-chip maintainers and Torvalds, is deliberately omitted from the MAINTAINERS file. The arm-soc maintainers don't want to receive random email, and the maintainers who need to feed patches through arm-soc know where the maintainers are. But, he said, there would still be value in documenting this relationship somewhere.
Torvalds said that he gets annoyed by cleanup patches that have been split up excessively. He would rather see a single commit that just gets the job done; that makes merging easier.
The discussion concluded by going back to the timer changes. Cook said that perhaps he should have just sent the mechanical API changes after -rc1 came out. It was asserted that nobody would complain about such a merge — except that, perhaps, the maintainers of the graphics tree (none of whom were in the room) would.
Evaluating the Maintainers Summit
As lunch called, the participants at the summit briefly discussed the event itself. Torvalds said that he liked it, and that the size of the group (around 30 maintainers) was about right. Bottomley said that there were a number of Kernel-Summit sessions that would have benefited from Torvalds's presence; without him, they were unable to bring various discussions to a conclusion. The tracepoint ABI question was one such issue. Torvalds reported that he had talked with Steve Rostedt and come to a conclusion on how to proceed there; the details can be found in this article.
Next year, the Kernel and Maintainers Summits will be held alongside the Linux Plumbers Conference. There was some discussion about whether the Maintainers Summit should come first, or whether it should be held at the end, as it was in 2017. The conclusion was that holding the Maintainers Summit on the last day gives an opportunity to revisit issues that couldn't be resolved in the other sessions, so that is how things will be in 2018.
[Your editor would like to thank the Linux Foundation, LWN's travel sponsor, for supporting his travel to this event].