
LWN.net Weekly Edition for April 16, 2020

Welcome to the LWN.net Weekly Edition for April 16, 2020

This edition contains the following feature content:

  • An uproar over the Fedora Git forge decision
  • Video conferencing with BigBlueButton
  • 5.7 Merge window part 2
  • Concurrency bugs should fear the big bad data-race detector (part 2)

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

An uproar over the Fedora Git forge decision

By Jake Edge
April 15, 2020

We last looked in on the question of a Git forge for Fedora at the end of January—which seems like nearly a lifetime ago, but is, in truth, only around two-and-a-half months back. At that time, requirements were being gathered for an open decision-making process that would seemingly play out with lots of community participation. That is not at all what transpired, however, and much of the Fedora community feels that its needs have not been taken into consideration. There are a number of lessons that can be learned from all of this.

Announcement woes

After a lengthy requirements-gathering thread on the fedora-devel mailing list back in January, things went rather quiet until the March 28 posting of "CPE Weekly", which is a newsletter that covers the activities of the Red Hat Community Platform Engineering (CPE) team. That is the organization behind the Git forge effort; tucked into the end of the newsletter was the "announcement" that the team had chosen GitLab as the forge for Fedora and CentOS, while still continuing to run Fedora Pagure with community assistance for projects that want to use it as their Git host.

There was, it seems, a plan to announce the decision on the Fedora Community Blog (and on Blog.CentOS.org). But, as noted by CPE manager Leigh Griffin, that did not happen due to "unavailability and illness" of a volunteer who was going to do it, which meant the first mention of the decision ended up in the already scheduled newsletter. The net result, as Neal Gompa pointed out, was that "the delivery of this decision sucked".

Beyond that, though, Gompa went through the user stories that had been gathered as part of the decision-making process at great length; he said that many of them could not be satisfied with any open-source solution, so in some sense he is not surprised that CPE looked beyond Pagure. But many of the requirements identified also make it clear that the open-source GitLab Community Edition (CE) would not fulfill the needs listed, so he thinks that CPE is really aiming for the proprietary Ultimate/Gold edition.

As might be guessed, Griffin largely disagreed with much of Gompa's point-by-point analysis; he also said that no decision had been made on which of GitLab's offerings would be used. The requirements were gathered from multiple stakeholders within Red Hat, including Fedora, CentOS, Enterprise Linux (RHEL), and CPE itself, but were generally not really evaluated, just collected: "It was not our place to question valid use cases or requirements from our stakeholders."

Julen Landa Alustiza was surprised that the decision was made without following some of the steps that were promised. He pointed to the original blog post that announced the effort, which mentioned things like "a live video call and associated IRC meetings will be held and advertised in advance to discuss the requirements, talk about concerns and address any questions". None of those things happened, he said. Griffin agreed:

We felt that the discussions came to a natural conclusion for each stakeholder group. I should have returned to that thread and updated folks that we had taken all the requirements in, that's on me, you have my apologies.

Others were surprised by how the process played out as well. David Kaufman wondered about the role of the Fedora Engineering Steering Committee (FESCo), Fedora Council, and the CentOS Governing Board in the decision. As Griffin pointed out, though, those organizations do not run the infrastructure in question—CPE does. As Miro Hrončok put it:

FESCo cannot actually make anyone do anything. So even if FESCo says: "The dist-git will run on Pagure", CPE can say: "Whatever, but we are not maintaining it."

Not sure what would be the result of such situation, but I am pretty sure it would not be very nice.

Fedora project leader Matthew Miller, who is also a member of the council, described the council's involvement:

We were asked to help collect the user stories and requirements. We also gave the strong feedback that the community wasn't interested in GitHub for a variety of reasons, which isn't a user-story per se but obviously important.

He quoted Hrončok's caution, noting that the council is in the same position as FESCo: advisory-only. But he is optimistic that things will work out well:

[...] Ultimately, the CPE team is going to put huge amounts of time, effort, and money into running this. So, I feel very comfortable in saying that this really is that team's decision to make.

At the same time, having a dist-git and a git-forge at all is totally meaningless if the Fedora contributor community's needs aren't served. I see the Council's role here in making sure they are.

And, while obviously communication could have been better and maybe a longer process would have helped, ultimately, I have a lot of trust for the people who work on CPE. Let's help them figure out how to best help all of us.

While that conversation was ongoing, Griffin posted the (delayed) announcement of the decision, which set off another enormous thread. Many of the same complaints (and defenses) were heard, but the overarching unhappiness from multiple community members was perhaps best summed up by Hrončok:

[...] I am not [criticizing] the decision per se, I am just very sad with the communication around it.

Most importantly the following:

  • 1) At the beginning, it appeared that Fedora will be in the loop when the decision will be made, but it wasn't. After collecting the requirements, there was no Fedora involvement.
  • 2) The use cases Fedora collected were (IMHO artificially) merged to "more general" requirements.
  • 3) The "I respect that there is disagreement but Gitlab is the choice we are making" attitude.

Requirements lost

But it turns out that some of the requirements that the Fedora community thought it was providing to the process got watered down along the way. In particular, the first two items on the proposed requirements list, which were that the solution be self-hosting and also be free and open-source software (FOSS), were never quite communicated to CPE as requirements. Along with others, Kevin Kofler raised the point, noting that considering the GitLab-hosted solutions, which are proprietary, "should be an absolute no go". Michael Catanzaro agreed:

Some failure of process or communication must have occurred somewhere along the lines, because open source should have been the first and most important requirement. A proprietary software solution is incompatible with the ethos and purpose of the Fedora project. I ask CPE to revise its requirements list to include open source as the first and most important requirement from the Fedora community. If that's incompatible with CentOS's need for merge request approvals or whatever else, then we need to accept that sharing the same forge is simply not going to work.

If open source is really not going to be a requirement, then we can ask Council to politely reject CPE's offer to continue hosting our forge, and instead round up some budget to keep Pagure running without help from CPE.

That communication failure was teased out in a sub-thread that discussed the staffing levels of the CPE team. Quite a ways down in a lengthy message from Adam Williamson, he pointed to the difference between the proposed list and the final summarized list that was used to make the decision:

However, the top three points on the Fedora list relate to F/OSS and self-hosting principles. These points are entirely missing from the "summarized" list. They have not been "merged" with "duplications" or "similar in theme requirements". The "spirit" of them has not been "captured" in a "User Story". They are just *not there*.

Griffin noted that the list submitted to CPE is not the same as the one proposed (even though earlier Griffin had mistakenly said that it was). After a request from Williamson, Fedora Program Manager Ben Cotton posted the final list, which did not, in fact, contain the self-hosting and FOSS requirements. Williamson was understandably dismayed:

Uh...wow.

So, Leigh was correct, and the F/OSS and self-hosting requirements are entirely removed from this list. Not adjusted or de-emphasized or given nuance, but simply removed entirely.

[...] The end result of this is that we (Fedora) have somehow indicated to CPE that we have no preference whatsoever for F/OSS tooling. I do not believe that should have been the case.

[...] One thing I'll note here: this is *exactly* the kind of thing that would have come to light very quickly if the open process which was committed to at the start had actually been followed through on.

Cotton said that he "included a reminder that FOSS is always our strong preference where viable" when he sent the list, but he apologized for not making it into a "user story" as part of the requirements. In the previous message, he had also apologized for not posting the final list at the same time it was sent, which might have caught the problem earlier. As is perhaps not surprising at this point, Griffin acknowledged that CPE simply treated the FOSS and self-hosting items as a "strong preference" and not a requirement, but: "It will certainly influence our decision to opt for Gitlab CE for the Fedora preference to stay open source."

There was some additional unhappiness expressed when the next "CPE Weekly" was posted on April 4. It contained an apology for the communication breakdown and a request to continue engaging with the team:

Firstly, let us apologize again to the communities for our drop in communication between the requirement collecting phase and the decision making phase. As we have said before, it was in no way, shape or form an intentional lapse of communications. However we do recognize that it was still nonetheless a decision that was not made in public, and for that we can only now offer our apologies for this mistake and learn a hard lesson from it.

[...] While the discussions on the lists are deeply emotional, they are still incredibly valuable to us to truly comprehend the importance of our next steps in ensuring we make the right choices in the coming months.

Randy Barlow said that it was not the only thing that could have been done. "You can hold off on making a decision, and follow an open process instead." In addition, he was skeptical that the community could really benefit from engaging with CPE "when it has demonstrated that it does not meaningfully engage with the community". Chris Murphy agreed:

Flaws have been discovered in the process used to arrive at the decision, and therefore it's not clear whether the decision is valid. The mistake has been admitted, and treating it as moot suggests in fact nothing has been learned from the mistake.

Griffin, however, clearly feels that the decision is the right one, no matter what flaws there were in the process. "Our choice of Forge that CPE will support is valid from the requirements we received". Part of the disconnect may be that Griffin and CPE see the council as the "engagement point" with Fedora, while the community has always been rather more freewheeling than that. In addition, FESCo has typically gotten involved in changes of this nature, as Hrončok, who is a FESCo member, noted in a comment on a council issue ticket:

A question of dist-git is an Engineering question (with strong FOSS and community aspects). I see Council as a governance body for the Fedora project, but things that impact the Fedora distro on technical level should IMHO be reviewed at FESCo.

CPE management decided to go [through] Council, and sadly nobody from the Council looped FESCo into the decision process here (neither the Fedora Council Engineering Rep nor the Fedora Project Leader nor Fedora Program Manager nor anybody else).

That council issue resulted in an April 15 statement from the council about the Git forge decision. It expresses support for the decision made, while accepting blame for the communication woes. It also notes that the version of GitLab to be used has not been determined yet, but that "we are looking at the possibility of using GitLab CE for our dist-git specifically" no matter what version is used for other purposes.

The cynical among us can perhaps be forgiven for noting that all of the mistakes made fell pretty much perfectly to ensure that GitLab was chosen, while providing cover for CPE in making that decision. Even those who are not overly cynical may have some suspicions that the process was, to a certain extent, designed to arrive at that conclusion. As people suspected back when the process was announced in February, perhaps it was simply an exercise to justify dropping Pagure. For example, Williamson reflected on what he saw as the Kafka-esque nature of the process:

At the outset of this whole mess, quite a lot of people said "well this is obviously just all a pretext for dropping Pagure and going to hosted Gitlab". Much offence seemed to be taken at this, and much was made of this absolutely not being the case at all, and Pagure being definitely a contender, and - as was pointed out upthread - how there would be public meetings and feedback sessions and a whole three-ring circus before a final decision was made. Which very definitely hadn't already been made, or anything.

And now three days into this thread, you're saying "well, CPE doesn't have the resources to maintain Pagure". So, what, people were right in the first place, and this was really just the Dump Pagure Project all along?

If so what was the point of all this half-baked kabuki nonsense? Why not just say so up-front? If CPE never thought it had the resources to maintain Pagure and Pagure was never really a contender, and Github was as clearly a non-starter as Leigh says it was, why didn't we just say "yeah no we're going to Gitlab" four months (or whatever it was) ago and save all of this silliness? We agree that this process wasn't actually very open at all, but *even if it had been*, if the result was preordained, what would have been the point?

Once again, the pre-ordination of the decision was denied by Griffin, but even in that message there are hints that the Pagure option was quickly eliminated. It would seem that it might have been far easier and created much less rancor to simply state up front that Pagure was not viable under the requirements that were likely pretty well-known long before the gathering phase was started. Committing to a formalized process and then dropping the ball in various ways throughout that process has clearly left a bad taste in many mouths. Even unpopular decisions can be made directly and openly in fractious communities if the communication is handled well. When the communication is botched, however, all bets are off—even relatively popular decisions can go badly awry.

Comments (13 posted)

Video conferencing with BigBlueButton

By Jonathan Corbet
April 10, 2020
While social distancing often comes naturally to free-software developers, there are still times when we wish to talk to each other. In the absence of community conferences, the next-best alternative is often video conferencing. While video conferences tend to be held using centralized, proprietary systems, there are free alternatives as well. LWN recently looked at Jitsi but this effort did not stop there; next on the list is BigBlueButton, a system that is oriented toward the needs of online educators but is applicable beyond that use case.

BigBlueButton is not a new project; it has been under development since 2007. That history shows in a number of ways; for example, the actual conferencing component was originally implemented in Flash and has only recently been supplemented by an HTML5/WebRTC-based solution. The code is licensed under the Lesser GPL; the web site doesn't say which version, but comments in the code say version 3 or later. The code itself is a massive collection of Java, Scala, and JavaScript (at least) code — almost 1,800 directories worth.

Installing BigBlueButton

The heavyweight nature of BigBlueButton shows in the installation instructions, which go on at some length. There is also a script to do all of that work, along with a suggestion to use the time-honored "wget|bash" execution method. Your editor used the script, but only after eyeballing it (thus guaranteeing that it is secure) and making a change as described below.

The system's server requirements are quite specific; in particular, only the Ubuntu 16.04 release is supported. BigBlueButton wants at least 8GB of memory ("with swap enabled") and four CPUs. The installation script will refuse to run on any other distribution or if the system has less than 4GB of installed memory. Your editor capitulated on Ubuntu 16.04 but, being of a generally parsimonious nature, tweaked the script to make it accept the low-end 2GB virtual server used to test the system.

The Ubuntu 16.04 requirement is justified in the name of stability, and one can easily imagine that the developers wouldn't want to support this towering stack of software on multiple platforms. But that is an ancient distribution at this point, and it is losing support in one year; there is going to have to be a mass migration of BigBlueButton systems over the next twelve months.

The installation script takes about a half hour to get its job done. This work includes setting up a number of third-party repositories for needed components; the installation instructions correctly note that "the default version of ffmpeg in Ubuntu 16.04 is old", for example, so the script obtains a more current version from elsewhere. By the end of the process, not only was the software installed, but the script had helpfully obtained a TLS certificate for the server from Let's Encrypt. Those trying this at home should note that BigBlueButton can take several minutes to actually get started after being launched; they should not assume quickly that the installation has failed.

While BigBlueButton handles the conferencing tasks, it does not concern itself with front-end tasks like managing "rooms" or authenticating users. There is, instead, a reasonably well-documented API that is intended to be used by the front end. Given its roots in the educational community, it is not entirely surprising that applications like Moodle use this API to integrate with BigBlueButton. For those wanting to run a standalone system, there is a front end called Greenlight that can optionally be installed with the BigBlueButton installation script.

Greenlight, as it turns out, is a Ruby-on-Rails application, adding nicely to the menagerie of languages running on the server. And, while the installation script loads BigBlueButton directly onto the server, Greenlight gets installed as a Docker container. That makes management a bit interesting; after some digging, it turned out that the way to create an initial administrative account on the server is a command like this:

    docker exec greenlight-v2 bundle exec rake \
           user:create["root","bbb-admin@lwn.net","password","admin"]

Running this command promptly threw the server into an existential out-of-memory crisis, making it clear that the project was serious when it said 4GB is necessary — even before a single video conference is established. Once this little obstacle was overcome, the system was up and running.

Then what?

Connecting to the web server, once it gets going, yields a generic "about Greenlight" page and the opportunity to create an account. And the inevitable "yes I agree to let you use cookies" box to check, of course. The default configuration will happily let users connect without encryption, with no redirection, despite the fact that HTTPS works just fine. Account creation is straightforward; by default, anybody can create an account on the site.

Once logged in, the user is placed in the "home room". A big, blue (naturally) "Start" button is there to launch a meeting in that room. Users may create other rooms at will. Each room has its own name, but that name is not useful for others wanting to join a session in that room; instead, there is a special "invite URL" that must be handed to potential participants.

Hitting the big, blue button will launch a meeting in the selected room. The first question all participants get is whether they only want to listen to what is going on or send audio as well; clearly the latter is needed for any sort of interactive experience. What follows is the "echo test", where the microphone is fed through to the speakers (generating some nice feedback if a headset is not in use) and the user is asked to confirm that it is working as expected. As a one-time feature this is nice, but the system requires this step every time one enters a meeting room; given that the user has already agreed to cookies, it would seem possible to set a cookie saying "audio worked the last 20 times, we can probably skip the test now", but that doesn't happen.

A separate step is required to send video. At this time, the user can select which camera to use and the resolution of the stream to be sent; there is no easy way to change that resolution afterward. It's worth noting that BigBlueButton defaults to relatively low (640x480) resolution, unlike Jitsi, which tries to cram as much video data as it can down the pipe. The lower resolution seems entirely adequate, and will certainly reduce the bandwidth requirements overall.

The conference screen comes up with a presentation window by default; video windows are squeezed in as participants make them available. Video and audio quality seem generally good. If all you want to do is to have a conversation among a group of people, the result is like a lot of other video-conferencing systems, especially once one sends the presentation window away to make more space. BigBlueButton uses a tiled view showing all participants with a video feed. One can "focus" on a specific participant (making their window bigger) or make one person take up the whole screen. There does not appear to be a mode where the current speaker is shown, which is arguably not a huge loss.

There is a lot more to BigBlueButton than just video chat, though. The presentation window really is set up for running through presentations; it is also relatively straightforward to upload a presentation in (for example) LibreOffice format. Only the participant designated as the "presenter" can do this, though; the room moderator can move the "presenter" token around at will. The presenter can also use a rudimentary set of tools to draw on the slides if desired.


There is a "multi-user whiteboard" mode that the presenter can turn on; it makes the drawing tools available to all participants in the meeting. Everybody can have fun writing graffiti on the presentation slides at that point. There does not appear to be a way to have a shared whiteboard that is independent of the presentation slides; if all one wants is the whiteboard, the answer appears to be to upload a "presentation" consisting of a single blank page.

On the left is a column of icons for the participants; those indicate whether they are sending audio among other things. Each participant can set a "status" in their icon, indicating sadness or confusion or, by "raising their hand", a desire to speak. The icons are small and the status is not easy to see, though; if the number of participants is large, picking out the one with their hand raised seems like a challenge.

The room moderator has some extra options attached to the icons, including the ability to turn any participant into the presenter. Participants can be muted, either individually or all at once. There is also an option to promote other participants to moderator status, but that feature did not appear to actually work.

There is a chat feature, naturally, for out-of-band text conversations; those can be either shared or private. The moderator has the ability to lock down private chats, though, in an attempt to focus the attention of students who surely have no other way to send messages to each other. There is a separate area for shared notes. BigBlueButton also supports closed captioning — in multiple languages simultaneously — but users have to provide the person actually entering the caption text.

The moderator can click a button at the top of the screen to record the session; all participants will be notified when this happens. Once the meeting ends, the server will launch a long-running FFmpeg process to convert the recording to an exportable format; the result will eventually appear on the room owner's screen. Recordings are private by default but can be made public. It's worth noting that the recordings are not just a video file; they are a sort of web application that includes the session notes, chat text, slide thumbnails, and more. Getting a plain video file that can, for example, be uploaded elsewhere does not appear to be directly supported.

There is a "breakout room" feature available to the moderator, who selects a number of rooms to create and how long they will exist. The moderator can also assign participants to specific rooms or hit a button to scatter them randomly. Once the breakout rooms are launched, participants are effectively jailed within them until the selected time period expires; there does not appear to be a way to end the exercise ahead of schedule. Recording does not appear to work within the breakout rooms.

The system has a simple "polling feature" that the presenter can invoke; each participant will see a set of buttons and can pick the one that corresponds to their answer to whatever question is being asked. It's worth noting that while participants cannot see how others vote, the presenter can. The presenter can "publish" the results, ending the poll and placing the vote counts (but not individual votes) on top of the presentation window.

Unlike Jitsi, BigBlueButton does not have an app for mobile devices; instead, the web interface is responsive and scales itself appropriately. For the most part it works well, with the occasional glitch. Some options, for example, cover the discussion with a dialog; Android users will naturally hit the "back" button to make that go away, and will be disappointed when it backs them out of the meeting entirely. In general, though, the developers seem to have done a good job of mobile support.

All told, BigBlueButton provides a reasonably full-featured and solid experience, whether the intent is giving presentations or having a group meeting.

The management side

One of your editor's complaints about Jitsi was the lack of overall management tools; there are few ways for a server administrator to even know what is happening on the system, much less apply controls to it. BigBlueButton seems to be rather more developed in that regard.

By default, anybody can create an account, anybody with an account can create rooms, and anybody with the URL for a room can join into it. All of those can be changed, though. Account creation can be made invitation-only, or administrator approval can be required. Specific privileges (the ability to create rooms, manage rooms created by others, tweak system settings, etc.) can be given to users as desired by way of a "roles" dialog. Server administrators can see all of the existing rooms and their status — and step into any of them if desired.

The owner of a room can set its access policy. An access code can be set on the room; that is a six-digit number that is chosen (seemingly randomly) by the system, rather than the owner. Participation can be limited to users who have authenticated with the site. It's also possible to require moderator approval for anybody wanting to join a room. Moderators can also, of course, boot participants out of a room if needed.

Like Jitsi, BigBlueButton claims to be able to interface with a SIP server to implement dial-in access to meetings; this is not a feature that your editor was able to try out.

Finally, it's worth returning to resource usage. Running BigBlueButton on that 2GB server worked in the end, and a number of meetings (up to about ten streams) have been hosted on it. The system swapped rather heavily but was able to get the job done. The run-time resources required to add another participant to a meeting don't seem qualitatively worse than Jitsi's; BigBlueButton requires a bigger base to run on, but seems to scale similarly. Its bandwidth requirements are lower, presumably due to the lower-resolution video settings. (It should be noted that some scaling issues only turn up with large meetings; your editor doesn't have enough friends to run one of those.)

To summarize, while both Jitsi and BigBlueButton are capable video-conferencing systems, BigBlueButton appears to be a more solid and feature-complete offering. In particular, the administration features and shared whiteboard capability stand out. The free-software community often struggles to create top-quality graphical applications; here we have (at least) two of them. It would appear that there is no real need to be using proprietary solutions in this area.

Comments (17 posted)

5.7 Merge window part 2

By Jonathan Corbet
April 13, 2020
By the end of the 5.7 merge window, 11,998 non-merge changesets had been pulled into the mainline repository for this development cycle. That is 1,218 more than were seen during the 5.6 merge window; it would appear that current world events have not succeeded in slowing down the kernel community — at least, not yet. The latter half of the merge window tends to see more fixes and fewer new features, but there are still a number of interesting things that showed up after the first-half summary was written.

Architecture-specific

  • The ability for 32-bit Arm systems to host KVM guests has been dropped.
  • The s390 "fake NUMA" implementation has been removed; there are evidently no scenarios where it can provide performance benefits for s390 systems.
  • The RISC-V architecture has gained support for CPU hotplugging.

Core kernel

  • The control-group memory controller now implements "recursive memory.low protection". The memory.low value indicates a minimum amount of memory that the members of the group should be able to share but, in current kernels, it is inflexible and must be configured at every level of the control-group hierarchy. If that hierarchy is mounted with the memory_recursiveprot option, though, a memory.low value set in a high-level node automatically applies to all children unless explicitly overridden, allowing more flexible run-time allocation of memory within the hierarchy. See the above-linked changelog for more details on how it works.
  • It is now possible to spawn a process directly into a control group using clone3(); a brief sketch follows this list.
  • The cgroupfs filesystem used to manage control groups now supports extended attributes; the use case appears to be to allow notes to be left for user-space management daemons.
  • The userfaultfd() mechanism now has the ability to write-protect pages in the target process. A small amount of documentation can be found in this commit.
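
As a rough user-space sketch of the clone3() item above (error handling is omitted, and the function and variable names are purely illustrative), a child can be spawned directly into a target control group by passing an open file descriptor for that cgroup's directory:

	#define _GNU_SOURCE
	#include <linux/sched.h>	/* struct clone_args, CLONE_INTO_CGROUP */
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <signal.h>
	#include <string.h>

	static pid_t spawn_into_cgroup(int cgroup_fd)
	{
		struct clone_args args;

		memset(&args, 0, sizeof(args));
		args.flags = CLONE_INTO_CGROUP;
		args.exit_signal = SIGCHLD;
		args.cgroup = cgroup_fd;	/* fd of the target cgroup directory */

		/* Returns zero in the child and the child's PID in the parent. */
		return syscall(__NR_clone3, &args, sizeof(args));
	}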

Filesystems and block I/O

  • As expected, there is a new implementation of the exFAT filesystem; this one is provided by Samsung with Microsoft's blessing.
  • The F2FS filesystem now supports compression with zstd.
  • The Ceph filesystem can perform file create and unlink operations locally without waiting for the server to respond, speeding tasks (such as an rsync operation) that do a lot of that kind of work.

Hardware support

  • Clock: MediaTek MT2712 realtime clocks, Qualcomm SM8250 global clock controllers, Qualcomm SC7180 modem clock controllers, Spreadtrum SC9863A clocks, and Ricoh RC5T619 realtime clocks.
  • Miscellaneous: UniPhier XDMAC external DMA controllers, Ingenic JZ4780 EFUSE memory, devices connected via the Modem Host Interface (MHI) bus, Qualcomm SC7180 and OSM L3 interconnect buses, CoreSight cross trigger interfaces, Meson AXG MIPI + PCIE analog PHYs, Freescale Layerscape PCIe Gen4 controllers, Amlogic Meson secure power domains controllers, SGI Octane front-panel LEDs, Azoteq IQS620A/621/622/624/625 multi-function sensors, Ricoh RN5T618/RC5T619 power-management ICs, Spreadtrum thermal sensors, Freescale i.MX8MM temperature sensors, ChromeOS embedded controller type-C connectors, and Texas Instruments K3 RTI watchdogs.
  • Pin control and GPIO: Qualcomm IPQ6018 pin controllers, Dialog Semiconductor DA9062 PMIC pin and GPIO controllers, and Mellanox BlueField 2 SoC GPIO controllers.
  • Sound: Amlogic AIU audio output subsystems, Amlogic T9015 digital-to-analog converters, Texas Instruments TLV320ADCX140 codecs, Realtek RT5682 codecs, Broadcom BCM63XX I2S modules, and Maxim MAX98360A amplifiers.
  • vDPA: the kernel now supports vDPA devices which, according to this commit, have a data path compliant with the virtio specification. These devices can be virtual themselves, but they can also implement virtio in the hardware. Two Intel devices are the first to use this support.

Miscellaneous

  • The GPIO subsystem provides a new ioctl() command that allows a process to be informed when the properties of any GPIO line change. This commit contains an example utility that uses this feature.

Virtualization and containers

  • There is a new free-page reporting mechanism by which a guest can inform the host that specific pages are no longer in use. The host can then respond by reclaiming the pages. Some documentation can be found in this commit and this commit.
  • KVM has been fixed to address the problems introduced by split-lock detection; out-of-tree, VMX-based hypervisors will still have problems.

Internal kernel changes

  • The development of explicit pinning of user-space pages continues in the hope of finally solving a number of longstanding problems with get_user_pages(). In particular, the tracking of pinned pages has been implemented, but the decisions on how such pages should be handled are yet to be made.
  • The dynamic debugging mechanism can now be used in the absence of the debugfs virtual filesystem via a new control file at /proc/dynamic_debug/control.
  • The new vm_insert_pages() is a batched version of vm_insert_page(); it puts multiple pages into a user-space virtual memory area with reduced locking overhead.
  • The minimum version of binutils required to build the kernel has been raised to 2.23.
  • The new LLVM=1 command-line option causes the kernel to be built using LLVM utilities rather than GCC and binutils. To use the integrated assembler, though, LLVM_IAS=1 must also be supplied; the old AS=clang option no longer works for this purpose.
  • The last patches applied before the merge window closed sorted the MAINTAINERS file (back) into alphabetical order, thus probably creating a bountiful supply of merge conflicts going forward. To help ensure that this supply is truly bountiful, the order of the fields within each entry has also been sorted alphabetically.

The time has come to fix the bugs in all that new code and turn 5.7 into a proper release. If the usual schedule holds, that release can be expected on either May 31 or June 7.

Comments (10 posted)

Concurrency bugs should fear the big bad data-race detector (part 2)

April 14, 2020

(Many contributors)

This article was contributed by Marco Elver, Paul E. McKenney, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Kostya Serebryany, Alan Stern, Andrea Parri, Akira Yokosawa, Peter Zijlstra, Will Deacon, Daniel Lustig, Boqun Feng, Joel Fernandes, Jade Alglave, and Luc Maranget.

In part 1 of this article, we gave an overview of the Kernel Concurrency Sanitizer (KCSAN) and looked at how it can detect data races in the kernel. KCSAN uses the definition of "data race" that is part of the Linux-Kernel Memory Consistency Model (LKMM), but there is more that KCSAN can do. This concluding part of the article describes other ways that the tool can be used to find data races and other kinds of problems in concurrent code. It provides some ideas on strategies and best practices, briefly considers some alternative approaches, and concludes with some known limitations.

Applying KCSAN to different types of code

When dealing with data races, we need to be aware of the code's requirements and purpose. Some code tolerates data races but other code does not. Where that tolerance also results in improved scalability of the code or design in question, the data_race() marking can be applied to any expression where data races are intentional, thus documenting this fact and also telling KCSAN that data races in the expression should be ignored.

The rest of this section discusses how to get the most out of KCSAN for different types of code.

Data-racy reads from shared variables that are used only for diagnostic purposes should typically use data_race(), since it is normally not a problem if the values are off by a little. Examples include the reads used to construct lockdep reports, monitoring and statistics (including /proc and /sys output), the argument to WARN_ON_ONCE() when the return value is ignored, and other situations where the reads are not in any way an integral part of the core concurrency design for the shared variables in question. There are of course exceptions to this rule. For example, if user-space code requires selected /sys output to give a coherent snapshot of in-kernel state, then that output must be considered to be a first-class part of the core concurrency design and must therefore use proper synchronization.
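
A minimal sketch of that advice, using purely hypothetical names, might be a statistics file's show function, where data_race() documents that a stale or torn value is acceptable for output:

	static unsigned long nr_foo_events;	/* updated elsewhere under foo_lock */

	static int foo_stats_show(struct seq_file *m, void *v)
	{
		/* Diagnostic-only read: tell KCSAN that this data race is intentional. */
		seq_printf(m, "foo events: %lu\n", data_race(nr_foo_events));
		return 0;
	}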

Reads whose values are checked might also use data_race(). Examples include:

  • Reads that are being fed only into cmpxchg() and friends (possibly with some computation on the way) such that cmpxchg() is guaranteed to fail if the compiler does anything unexpected with the load. But please keep in mind that a data_race() load feeding into a cmpxchg_relaxed() might still be subject to load fusing on some architectures. Therefore, it is best to capture the return value from the failing cmpxchg() for use in the next iteration of the loop, which provides the compiler much less scope for mischievous optimizations. This approach also saves a memory reference in many cases. (A sketch of this idiom appears just after this list.)
  • Reads within a sequence-locking read-side critical section, whose values are ignored unless the subsequent read_seqretry() returns false. However, data_race() is only needed in sequence-locking read-side critical sections for reads that access variables updated outside of the corresponding write-side critical section. Reads of variables updated only within the write-side critical section are automatically ignored by KCSAN, which understands and automatically recognizes sequence locks.
  • Reads that feed into heuristics, such that occasional errors are compensated for. But again data_race() loads are still subject to load fusing, which can result in consistent errors, which in turn are quite capable of breaking heuristics.
  • If compilers are assumed never to invent stores just prior to a normal store, then concurrent normal loads might be able to make some assumptions about the value loaded. For example, kernel-space addresses typically have at least a few upper bits set because the low-numbered addresses are normally reserved for user space. In such a system, a comparison to a NULL pointer will give the correct answer no matter how the compiler slices and dices the loads and stores.
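
As referenced in the first item above, here is a rough sketch of the cmpxchg() idiom (shared_counter is a hypothetical variable); the value returned by a failing cmpxchg() is fed into the next iteration, so the data_race() load happens only once:

	int old, new, cur;

	old = data_race(shared_counter);	/* tolerated racy read */
	for (;;) {
		new = old + 1;
		cur = cmpxchg(&shared_counter, old, new);
		if (cur == old)
			break;			/* the update succeeded */
		old = cur;			/* reuse the failing cmpxchg()'s return value */
	}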

Reads from and writes to variables that are not supposed to be subject to data races should use plain C-language reads and writes, thus enabling KCSAN to flag bugs involving unintended (and thus likely buggy) data races. Important categories of non-data-racy situations include:

  • Accesses protected by strict locking.
  • Initialization- and cleanup-time accesses. This covers a wide variety of situations, including the uniprocessor phase of system boot, variables to be used by not-yet-spawned kthreads, structures not yet published to reference-counted or RCU-protected data structures, and the cleanup side of each of these situations.
  • Per-CPU variables that are not supposed to be accessed from other CPUs. This case occurs in RCU, for example.
  • Private per-task variables, including on-stack variables, some fields in the task_struct structure, and task-private heap data.

As noted above, data-racy reads for diagnostic accesses to otherwise data-race-free variables should use data_race(). The non-diagnostic accesses should not be marked. After all, accesses to data-race-free variables need to remain unmarked in order to allow KCSAN to detect buggy concurrent accesses.

Note that marking accesses racing with accesses marked data_race() is optional, but for documentation purposes it is still recommended to do so where appropriate.

According to the strict LKMM data-race definition, other use cases should use READ_ONCE() or stronger for intentionally data-racy reads and WRITE_ONCE() or stronger for intentionally data-racy writes. However, there are exceptions depending on taste, and KCSAN can be configured to suit:

  • Plain reads that race with writes that do not change the value of the shared variable can be forgiven by building with CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=y (selected by default). This is useful in cases where a variable is compile-time initialized to (say) zero and then repeatedly set to (say) the value one during execution. In such cases, the compiler would need to be rather demonic to load some value other than one when running concurrently with a write that left the value unchanged. However, some developers might prefer to check the value so as to avoid same-value stores altogether, thus possibly also improving cache locality. (A tiny sketch of that check follows this list.)
  • Unmarked writes (aligned and up to word size) can be treated as if they had used WRITE_ONCE() by building with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y (also selected by default). Experience has shown that compilers are much less likely to destructively optimize in-kernel writes than reads. Some developers might therefore choose to use READ_ONCE() but omit the corresponding WRITE_ONCE(). Other developers might prefer the documentation benefits and long-term peace of mind accruing from explicit use of WRITE_ONCE(), especially those developers whose code stores certain constants.
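
As a tiny sketch of the check mentioned in the first item above (assuming a hypothetical flag that only ever transitions from zero to one), the same-value store can be avoided entirely:

	if (!READ_ONCE(ready))		/* already set?  Then skip the redundant store. */
		WRITE_ONCE(ready, 1);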

As before, accesses that are intended to be non-data-racy should use plain C-language loads and stores, which allows KCSAN to find accesses that violate your synchronization rules.

Although KCSAN cannot directly detect unnecessary use of READ_ONCE() and WRITE_ONCE(), one way of indirectly detecting this is to remove the READ_ONCE() or WRITE_ONCE() from accesses that are suspected to be unnecessary, and then run KCSAN using a benchmark that has high probability of exercising the suspected race conditions. However, some care is required given that the data raciness of a particular access might depend on Kconfig options or kernel boot parameters and sysfs settings.

The list in this section represents best practices at this point in time. Both KCSAN and the best practices surrounding its use can be expected to change as we learn how best to use this data-race detector and also as KCSAN evolves.

Developer/Maintainer data-race strategies

An earlier article described four approaches to applying markings for shared variables, which can be paraphrased as follows: (1) Never, (2) Clear and present bug, (3) Data races, and (4) Always. The following list describes how to use KCSAN for each option, and adds a fifth option that has resulted from experience using KCSAN.

  • Never: In this case, you may add "KCSAN_SANITIZE_file.o := n" to the respective Makefile (or "KCSAN_SANITIZE := n" for all targets in the Makefile). Note that data races between one access having KCSAN enabled, and the other not having it enabled, will result in a "race of unknown origin" report, pointing at the access for which KCSAN was enabled. If you do not want to see any such data races, set CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=n.
  • For any access to any shared variable for which there is a possibility of a data race, and for which it can be clearly shown that specific compiler optimizations could result in bugs: Analyze most KCSAN reports for vulnerability to compiler optimizations. This is not a one-off task because data races that appear "benign" today might not remain benign [PDF] in the future. It is up to the developer to prove that the compiler cannot break the code. When carrying out such proofs, whether formal or informal, please recall that plain C-language accesses are subject to load/store fusing, invented loads/stores, dead-code elimination, load/store tearing, and possibly other manipulations from clever compiler developers. If a given data race is deemed to be benign, you can prevent KCSAN from complaining about it via the data_race() macro. As long as all conflicting accesses are marked (either data_race() or READ_ONCE(), WRITE_ONCE(), etc.), KCSAN will be silent. Alternatively, tell KCSAN to ignore all accesses in a function via the __no_kcsan function attribute.
  • For any access to a shared variable for which there is a possibility of a data race for at least one of those accesses: Enable the most aggressive KCSAN reporting and act on all KCSAN reports. All accesses to any shared variable involved in a KCSAN data race report needs to use any one of the appropriate marked atomic operations (READ_ONCE(), WRITE_ONCE(), etc., but avoid data_race() except in diagnostic code).
  • For all accesses to all shared variables: KCSAN is a bit more forgiving than this, in that it only reports true concurrent access. However, most of the time, if KCSAN reports a data race on a shared variable, you can then apply the appropriate marking to all accesses to that variable, even if they may not actually race with another access.
  • For any access to any shared variable for which there is a possibility of a data race, except for certain cases where a reasonable compiler (as opposed to a theoretical demonic compiler that just barely adheres to the standard) would not break concurrent algorithms: Act on most KCSAN reports, but also configure KCSAN to match your notion of what constitutes a reasonable compiler (but note that the defaults are quite forgiving of data races). If some future compiler becomes unreasonable and thus applies optimizations that break concurrent code, the Linux community would disable those optimizations by specifying compiler flags. However, it is worthwhile to look at other accesses to the variables in KCSAN reports because, similar to lockdep, KCSAN reports only those data races exercised by actual executions.

Patch submissions based on KCSAN reports also need a strategy: As noted in part 1, you should not respond to KCSAN reports by mindlessly adding READ_ONCE(), data_race(), and WRITE_ONCE(). Instead, a patch addressing a KCSAN report must clearly identify the fix's approach and why that approach is appropriate.

Of course, the KCSAN report should be included in the commit log, preferably with file/line numbers; simplify the report, where appropriate, by removing irrelevant portions of the stack traces, for example. Furthermore, the commit log should include a detailed description of the problem that the KCSAN report identified and the reasoning behind the fix. Where possible, this reasoning can reference or excerpt the comments and documentation defining the design rules violated by the old code. In a few cases, this detailed description might obviate the KCSAN report, but in that case, please do at least credit KCSAN.

If the commit's effect is to merely silence the warning with no other algorithmic change, many maintainers will treat the commit with great suspicion—and rightly so. Therefore, such commits should include a clear explanation why silencing the warning is the right thing to do.

Finally, the commit log should include instructions to reproduce the data race, if possible, along with any non-default Kconfig options.

Taking KCSAN beyond LKMM

Data races are often symptoms of logic bugs, which is why KCSAN's ability to locate them is so valuable. Therefore, as noted earlier, reacting to each and every KCSAN report by mindlessly adding a READ_ONCE() or a WRITE_ONCE() is a grave mistake, especially when the code being modified has multiple callers. The value of these reports is instead that they inform the developer of unexpected concurrent accesses, and the correct response might instead be to apply proper synchronization, for example, by acquiring the corresponding lock. In this case, adding the READ_ONCE() would resolve the data race, but would simply obscure the underlying bug without actually helping anything at all. So again please think carefully about KCSAN's reports instead of mindlessly reacting to them.

This is why the previous section recommends leaving accesses unmarked in many cases: leaving them unmarked means that a race-condition bug will result in a data race, permitting KCSAN to find it for you.

Although KCSAN's ability to identify data races is extremely useful, data-race detection is just the tip of its iceberg. For example, suppose that there is a bug in which a pointer is leaked from an RCU read-side critical section, perhaps as follows:

	// Reader:
	p = NULL;
	rcu_read_lock();
	list_for_each_entry_rcu(p, list1_head, list1) {
		if (p->key == mykey)
			break;
	}
	if (p)
		do_something_with(p);
	rcu_read_unlock(); // At this point, *p is no longer protected.
	do_something();
	if (p)
		do_something_else_with(p); //BUG:  Leaked pointer!

Suppose further that the updater splices list1_head onto list2_head using list_splice_init_rcu() as follows:

	// Updater:
	list_del_rcu(&fp->list1);
	list_splice_init_rcu(list1_head, list2_head, synchronize_rcu);
Quick quiz 3: But isn't it rather unlikely that do_something() will consume enough time for the grace period to complete?
Answer

By the time that list_splice_init_rcu() returns, RCU readers might be legitimately traversing all of the elements on list2_head, including those recently on list1_head, thus preventing KCSAN from providing any useful information. However, there is a point within list_splice_init_rcu() where non-leaky RCU readers will not have access to the elements from list1_head. At this point, calls to ASSERT_EXCLUSIVE_ACCESS(), as shown below, will allow KCSAN to detect leaky RCU readers:

	static inline void __list_splice_init_rcu(struct list_head *list,
						  struct list_head *prev,
						  struct list_head *next,
						  void (*sync)(void))
	{
		struct list_head *first = list->next;
		struct list_head *last = list->prev;

		INIT_LIST_HEAD_RCU(list);
		sync();
		ASSERT_EXCLUSIVE_ACCESS(*first); // KCSAN, any off-CPU accesses?
		ASSERT_EXCLUSIVE_ACCESS(*last);
		last->next = next;
		rcu_assign_pointer(list_next_rcu(prev), first);
		first->prev = prev;
		next->prev = last;
	}

	static inline void list_splice_init_rcu(struct list_head *list,
						struct list_head *head,
						void (*sync)(void))
	{
		if (!list_empty(list))
			__list_splice_init_rcu(list, head, head->next, sync);
	}

	KCSAN can also help when memory is freed into an emergency reserve that is used only under out-of-memory conditions. Consider this example:

	if (atomic_dec_and_test(&p->refcount)) {
		ASSERT_EXCLUSIVE_ACCESS(*p);  // KCSAN, any off-CPU accesses?
		do_some_cleanup(p);
		list_add(&p->freelist, &emergency_reserve);
	}
Quick quiz 4: Why not also apply KCSAN in the more common case where the list elements are instead freed via kfree()? After all, KCSAN should be able to directly detect the resulting use-after-free bug, shouldn't it?
Answer

The call to ASSERT_EXCLUSIVE_ACCESS() allows KCSAN to detect buggy concurrent accesses that failed to acquire the needed reference.

There is another set of patterns where concurrent reads are allowed but concurrent writes are forbidden; bugs in the implementation of these patterns may not be data races. Consider the case where there is supposed to be only a single writer but multiple concurrent readers; to avoid data races, all of these accesses must be marked. Marked writes that race with the single writer, however, are bugs. Unfortunately, because they are marked, they are no longer data races. Could KCSAN help us in this case?

For example, a common pattern involves a variable that is updated only when its lock is held, but that can be read locklessly. Given that there are lockless marked reads scattered hither and yon, it is all too easy to add a buggy lockless marked write, and such writes can be quite difficult to find, especially in cases involving locks held across deeply nested function calls. Of course, the lockdep_is_held() function can be extremely useful in these situations, but in the all-too-common case where the variable in question is a non-uniquely named field in a structure, it can be surprisingly hard to place calls to this function everywhere it is needed and nowhere that it is not.

Again, KCSAN can help:

	spin_lock(&my_lock);
	WRITE_ONCE(read_me_locklessly, foo);
	ASSERT_EXCLUSIVE_WRITER(read_me_locklessly); // KCSAN, lockless writes?
	spin_unlock(&my_lock);

Quick quiz 5: So KCSAN can unconditionally locate bugs stemming from unauthorized lockless writes?
Answer

Of course, ASSERT_EXCLUSIVE_WRITER() can also be used to detect off-CPU writes to per-CPU variables that are supposed to be read-only to other CPUs, as well as similar bugs that do not directly result in data races.
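
A minimal sketch of that per-CPU case (all names are hypothetical) might look as follows: the owning CPU is the only legitimate writer, other CPUs read locklessly, and the assertion asks KCSAN to flag any off-CPU writer:

	DEFINE_PER_CPU(unsigned long, my_count);

	static void my_count_inc(void)		/* owning CPU only, preemption disabled */
	{
		unsigned long *p = this_cpu_ptr(&my_count);

		ASSERT_EXCLUSIVE_WRITER(*p);	/* KCSAN, any off-CPU writers? */
		WRITE_ONCE(*p, *p + 1);
	}

	static unsigned long my_count_read(int cpu)	/* may run on any CPU */
	{
		return READ_ONCE(*per_cpu_ptr(&my_count, cpu));
	}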

We are confident that continued experience with KCSAN will result in additional tricks for finding other types of concurrency bugs.

In addition, we expect that KCSAN will continue to gain more capabilities. One late-breaking example is a bit-granular variant of ASSERT_EXCLUSIVE_WRITER() called ASSERT_EXCLUSIVE_BITS(). This supports bitmask use cases where some bits are read-only or are supposed to be modified only under the protection of an exclusive lock, while others can be modified concurrently. Note again that a knee-jerk application of READ_ONCE() to silence KCSAN would be obscuring a real bug, which is most definitely not what we want.
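
For instance (with hypothetical names), if the bits in FOO_TYPE_MASK are set once at initialization and never change, while the other bits of foo->flags are updated concurrently, a reader of the stable bits might look like this:

	/* KCSAN: complain if anything concurrently writes the FOO_TYPE_MASK bits. */
	ASSERT_EXCLUSIVE_BITS(foo->flags, FOO_TYPE_MASK);
	type = foo->flags & FOO_TYPE_MASK;	/* unmarked read of the stable bits */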

Alternative approaches

An alternative data race detection approach for the kernel can be found in the Kernel Thread Sanitizer (KTSAN). KTSAN is a data-race detector that explicitly establishes which memory operations are ordered (this is also known as a "happens-before [PDF]" data-race detector [PDF]). This information can then be used to determine data races as defined by the LKMM.

To build a correct happens-before relation, KTSAN must be aware of all ordering rules of the LKMM and the synchronization primitives. Unfortunately, any omission leads to large numbers of false positives; this is especially detrimental in the context of the kernel, which includes numerous custom synchronization mechanisms. To track the happens-before relation, KTSAN's implementation requires metadata for each memory location (shadow memory), which corresponds to four extra pages of shadow memory for each page of kernel memory; that can translate into overhead of tens of GiB on a large system. In contrast, KCSAN's memory overhead is only a few MiB depending on configuration. Relying on the happens-before relation, however, allows KTSAN to detect more subtle bugs, such as missing memory barriers (implicit or otherwise). KCSAN is currently oblivious to any memory-ordering guarantees and simply assumes that memory barriers are placed correctly (and developers should therefore carefully consider the required memory-ordering requirements that remain unchecked).

Another approach relies on lockdep, which can sometimes point to data races due to failure to acquire the necessary locks. This, however, requires explicit lockdep annotations. KCSAN can detect the resulting data races without any added annotation, but on the other hand, for KCSAN to diagnose the problem, both ends of the data race must execute at about the same time. Therefore, lockdep can detect more bugs with less test time if the required annotations are in place, but KCSAN can detect those bugs without the annotations, albeit requiring more test time. These approaches should thus be used in concert, with lockdep annotations being added to code that KCSAN has shown is prone to being invoked without holding the required locks.
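
As a small sketch of that combined workflow (the names are hypothetical), once KCSAN has shown that some callers reach foo_update() without the lock, a lockdep annotation documents and enforces the rule while the plain write remains visible to KCSAN:

	struct foo {
		spinlock_t lock;
		int value;
	};

	static void foo_update(struct foo *f, int val)
	{
		lockdep_assert_held(&f->lock);	/* lockdep: complain if the lock is not held */
		f->value = val;			/* plain write; KCSAN still watches for data races */
	}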

Quick quiz 6: Is there a place for a user-space CSAN?
Answer

All of these tools use dynamic techniques, which means that none of them will diagnose problems on code paths that are not reached by the test cases. On the other hand, experience to date indicates that dynamic analysis approaches (especially when paired with a coverage-guided fuzzer such as syzkaller) are more effective than static-analysis approaches at locating these classes of bugs in large code bases. For example, the LKMM tooling can more deterministically detect data races and other concurrency issues, but is limited to only a few tens of lines of code, six orders of magnitude short of what is required for the Linux kernel. Nevertheless, the LKMM is extremely useful at design time and for educational purposes.

Summary and known limitations

Clearly, bugs involving data races should fear a data-race detector, but KCSAN's ASSERT_EXCLUSIVE_WRITER() and ASSERT_EXCLUSIVE_ACCESS() macros provide artificial data races that can aid in the detection of other forms of race conditions. KCSAN-detectable concurrency bugs discussed earlier in this article include:

  • Failing to acquire needed locks.
  • Off-CPU accesses to CPU-private per-CPU variables.
  • Use-after-free bugs for non-heap memory.
  • Leaking pointers from RCU read-side critical sections.

These types of bugs can be quite challenging to find by testing or by manual code review. Therefore, KCSAN's automated code review should prove to be helpful.

Of course, KCSAN does have limitations:

  • As a runtime tool, KCSAN detects races only between accesses that execute on different CPUs at about the same time. (But note that a recent KCSAN patch allows checking for data races between accesses at process level and in interrupt handlers within the same CPU.)
  • Although KCSAN provides a number of configuration options, it is currently unable to analyze the compiled code's use of a loaded value. We expect KCSAN to become more capable over time, but features requiring changes to compilers will take some time to reach KCSAN's users.
  • Manual analysis is sometimes required to work out which subsystem caused a given KCSAN report. This is, of course, a limitation shared by all tools that display stack traces.
  • KCSAN does not understand much about your synchronization design, so deep knowledge of the code is required to respond meaningfully to KCSAN reports.

Despite these limitations, KCSAN has proven to be useful, locating a number of subtle and hard-to-spot concurrency bugs in the short time it has been available. We expect that its tireless and thorough automated reviews of concurrent code will be a valuable addition to the Linux kernel's storied 10,000 code-review eyes.

Answers to quick quizzes

Quick quiz 3: But isn't it rather unlikely that do_something() will consume enough time for the grace period to complete?

Answer: True enough. After all, KCSAN cannot help unless the data-racy accesses happen at about the same time. However, there are ways to help KCSAN help you:

  • Test on a CONFIG_PREEMPT=y kernel, so that the end of an un-nested rcu_read_unlock() implies a quiescent state.
  • Add test code to rcu_read_unlock() that infrequently adds random delays when not within another RCU read-side critical section.
  • For testing, pass synchronize_rcu_expedited() to list_splice_init_rcu() instead of synchronize_rcu(), as shown in the sketch below.
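
For instance, the last suggestion might look like this in test-only code (the list names are hypothetical):

	// Note that the synchronization function is passed, not called, here.
	list_splice_init_rcu(&tmp_list, &shared_list, synchronize_rcu_expedited);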

Back to quick quiz 3.

Quick quiz 4: Why not also apply KCSAN in the more common case where the list elements are instead freed via kfree()? After all, KCSAN should be able to directly detect the resulting use-after-free bug, shouldn't it?

Answer: Yes, it might, but KASAN is normally a better choice than KCSAN for use-after-free bugs. That said, finding racy use-after-free bugs with KASAN can be quite challenging, in part because KASAN has no way to know that the bug was in fact due to a race condition rather than a deterministic use-after-free situation. In addition, KASAN cannot help when the memory is not freed, but instead placed into a private pool as in the earlier example. KCSAN can still help in that case if you invoke ASSERT_EXCLUSIVE_ACCESS() just after removing others' access to the memory.
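
A minimal sketch of that technique (with hypothetical names, including the add_to_private_pool() helper) might look like this:

	list_del_rcu(&p->list);
	synchronize_rcu();
	ASSERT_EXCLUSIVE_ACCESS(*p); // KCSAN, anyone else still touching this?
	add_to_private_pool(p);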

In short, we recommend starting with KASAN when tracking use-after-free bugs, but looking to KCSAN in those rare cases where KASAN is flummoxed due to race conditions or private memory pools.

Back to quick quiz 4.

Quick quiz 5: So KCSAN can unconditionally locate bugs stemming from unauthorized lockless writes?

Answer: We are sorry, but, although we believe that KCSAN will prove to be an extremely valuable tool, it is not magic.

KCSAN is a runtime tool, which means that it will not detect bugs in code paths that are not actually executed. Furthermore, the unauthorized-write bug must execute reasonably close to the time that the ASSERT_EXCLUSIVE_WRITER() is executed, and that write must occur on some other CPU. Additionally, KCSAN does impose significant overhead, and that overhead might well turn your bug into a heisenbug.

Again, KCSAN is not magic. If it is magic that you are looking for, we refer you to any number of works of fiction, with "Harry Potter", "A Wizard of Earthsea", and the television series "Grimm" coming immediately to mind. And sometimes escaping into fiction for a while is a good way to free your mind (oops, no, that is "The Matrix") to locate the bug. But in our experience, a good night's sleep usually works even better.

Back to quick quiz 5.

Quick quiz 6: Is there a place for a user-space CSAN?

Answer: User space has different constraints than the kernel. For user-space C and C++ programs, we can generally rely on a limited set of synchronization primitives (pthreads) and adherence to the standard memory models (C11, C++11). As such, the already well-established ThreadSanitizer (TSAN) is a more complete solution for user space; a user-space variant of CSAN would therefore not add much value over TSAN. Note that, while KCSAN as implemented in the Linux kernel provides additional facilities to express properties where bugs won't manifest as data races (the ASSERT_EXCLUSIVE macros), TSAN could fundamentally provide similar facilities for user space. In other contexts where TSAN is not available and the constraints are closer to those of the Linux kernel, such as firmware or specialized operating systems, a CSAN variant would be an attractive alternative.

Back to quick quiz 6.

Acknowledgments

We would like to thank everyone who has given feedback, comments, or otherwise participated in the work discussed in this article. Some notable discussions and feedback resulted from patches to address data races found by KCSAN; in particular, we would like to thank Eric Dumazet and Qian Cai for addressing numerous data races and for their continued feedback, and Linus Torvalds, Ingo Molnar, and Herbert Xu for their helpful and critical feedback. We are very grateful to Blake Matheny for their support of this effort.

Comments (none posted)

A new parser for CPython

By Jake Edge
April 9, 2020

A new parser for the CPython implementation of the Python language has been in the works for a while, but the announcement of a Python Enhancement Proposal (PEP) for it indicates that we may see it fairly soon. The intent is to add the parser and make it the default for Python 3.9, which is due in October. If that plan holds, the current parser will not be going away for another year or so after that. The change should go largely unnoticed within the community; the benefits are mainly for the CPython core developers in the form of easier maintenance.

Parsing expression grammar

In early April on the python-dev mailing list, Guido van Rossum announced that he, Pablo Galindo Salgado, and Lysandros Nikolaou had written PEP 617 ("New PEG parser for CPython"). It proposes replacing the existing LL(1) parser with a new one based on a parsing expression grammar (PEG). The existing parser is not fully LL(1), and the new one would simplify things:

This new parser would allow the elimination of multiple "hacks" that exist in the current grammar to circumvent the LL(1)-limitation. It would substantially reduce the maintenance costs in some areas related to the compiling pipeline such as the grammar, the parser and the AST [abstract syntax tree] generation.

Starting in July 2019, Van Rossum put out a series of blog posts that give a nice practical overview of PEG parsers and how to implement them—in Python, naturally. The job of a parser is to take input in a specific language and turn it into an abstract syntax tree, which is used by later passes of the compiler to generate code. In the introductory post of the series, Van Rossum described some of the shortcomings of the existing CPython parser in the context of how a PEG parser can do better. In part, it comes down to "lookahead"; an LL(1) parser can only look at a single extra token to make its parsing decisions, but that is not sufficient to distinguish all cases in Python. So the existing parser kicks the can down the road by accepting some illegal constructs that are caught as syntax errors in a later pass.

There are some other problems that he points out, which a PEG parser can avoid:

So how does a PEG parser solve these annoyances? By using an infinite lookahead buffer! The typical implementation of a PEG parser uses something called “packrat parsing”, which not only loads the entire program in memory before parsing it, but also allows the parser to backtrack arbitrarily. While the term PEG primarily refers to the grammar notation, the parsers generated from PEG grammars are typically recursive-descent parsers with unlimited backtracking, and packrat parsing makes this efficient by memoizing the rules already matched for each position.

There is, of course, a cost for infinite lookahead, in the form of increased memory usage. But the PEP notes that the performance of the new parser is within 10% of that of the old one, both in terms of speed and memory. While PEG with packrat parsing consumes more memory, the new parser does not create a concrete syntax tree, as the existing parser does; instead, it directly creates the abstract syntax tree, which makes up for most of the memory consumed by the new techniques.

One of the areas where PEG/packrat shines, especially in comparison to LL(1), is in handling left recursion in the language grammar specification. As the PEP puts it:

[...] a limitation of LL(1) grammars is that they cannot allow left-recursion. This makes writing some rules very unnatural and far from how programmers normally think about the program. For instance this construct (a simpler variation of several rules present in the current grammar):
    expr: expr '+' term | term
cannot be parsed by an LL(1) parser. The traditional remedy is to rewrite the grammar to circumvent the problem:
    expr: term ('+' term)*

Beyond just making the language specification less understandable, the lack of left recursion support also loses the associativity that naturally comes from expressing things that way. It flattens the parse tree such that further work needs to be done: "Being forced to write the second rule not only leads to the parse tree not correctly reflecting the desired associativity, but also imposes further pressure on later compilation stages to detect and post-process these cases."

Another difference between an LL(1) and a PEG parser is in the handling of the "|" (or) operator. Since an LL(1) parser can only look ahead one token, rules like the following:

    rule: A | B
may be disallowed as ambiguous if both A and B can start with the same token. Without looking further ahead, the parser cannot decide which to choose. "The rule may be transformed to equivalent LL(1) rules, but then it may be harder for a human reader to grasp its meaning." A PEG parser will only try B if A fails, so it treats the | like a short-circuiting or operator.
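
Ordering the alternatives is therefore significant in a PEG grammar. As a hypothetical illustration (not taken from the Python grammar), a rule such as:

    call: NAME '(' ')' | NAME

matches all of "foo()" because the longer alternative is tried first. If the alternatives were swapped, the bare NAME alternative would succeed after consuming only "foo", the trailing "()" would be left for the enclosing rule to reject, and, because a PEG choice commits to the first alternative that succeeds at a given position, the longer alternative would never be tried.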

Moving quickly

Van Rossum said that the PEP authors are "hoping for a speedy resolution" on it, so that the code can land before Python 3.9-beta1, which is currently scheduled for mid-May. That is the last point where new features can be added to a major release, so if the new parser is to be the default for 3.9, the switch must happen by that point. The intent is that the existing parser will still be available with the -X oldparser command-line option until Python 3.10, presumably coming in late 2021.

The reception overall has been positive, which is not really surprising, since there are apparently few, if any, downsides. Paul Moore asked if there were user-visible changes, but Van Rossum said that "the intention is that the returned AST is identical in each case". He did note that there had been some small changes to the old parser mostly to fix some "bugs related to line/column numbers".

Victor Stinner also wondered about projects that rely on the abstract syntax tree. Galindo Salgado replied that projects that use the ast module in the standard library "will not need to do anything special". But Van Rossum cautioned that this might be overstating things; if the code works with the in-progress Python 3.9 bug fixes to the parser, it should be fine, but it is possible that the tweaks made might mean that some code "could require (small) adjustments". In addition, the error reporting from the new parser is somewhat different, the effects of which have not really been explored yet.

Nathaniel Smith suggested running both parsers and confirming that the same AST is being generated during the alpha and beta periods. Van Rossum liked the idea and filed a bug to track it. He also noted that they have been using the Python Package Index (PyPI) as a source of additional code to test the new parser with.

As described in the PEP, there are actually three parts to the new parser. The lowest layer is a "PEG meta-grammar that automatically generates a Python parser that is used for the parser generator itself". The next layer is that parser generator, which reads a grammar file, written in an extended Backus-Naur form (EBNF) dialect Van Rossum created for Python 30 years ago, and produces a PEG parser (in either C or Python) for the language that grammar describes. And, finally, there is the generated parser itself, which can produce AST objects for both C (for use in CPython) and Python (for the ast module); it will be incorporated into the Python executable. In that way, no hand-written parsers are being used. The meta-grammar is fully specified in the PEP; the idea behind it is well-described in another post from Van Rossum's PEG series.

The migration plan from the PEP makes it clear that the grammar for Python will remain LL(1) until the existing parser is removed in the Python 3.10 time frame. After that, though, the implication is that the grammar could move away from LL(1); that has impacts for other Python implementations (and their parsers). Fabio Zadrozny wondered about those impacts. Van Rossum said that other implementations may need to change if and when the language changes to take advantage of "the freedom PEG gives us", but that does not necessarily mean they will need to switch to a PEG parser. "I'm sure there will be other ways to parse the same language." He did mention a new Python feature that the PEG parser would facilitate:

For example, I've been toying with the idea of introducing a "match" statement similar to Scala's match expression by making "match" a keyword only when followed by an expression and a colon.

There was some discussion of whether context-sensitive keywords made sense for Python. There was a point in time where the community believed that all keywords should be reserved everywhere (so they could not be used as variable names, for example), but Van Rossum thinks that time may have passed. In particular, the addition of async and await could have been done using context-sensitive keywords, but was not, which may have been a mistake:

We went through the same thing with `async` and `await`, and the experience there was worse: a lot of libraries in the very space `async def` was aimed at were using `async` as a parameter name, often in APIs, and they had to scramble to redesign their APIs and get their users to change their programs.

In retrospect I wish we had just kept `async` as a context-sensitive keyword, since it was totally doable.

One would guess the steering council would take up the PEP soon, either to pronounce on it directly or to appoint a delegate to do so. There does not seem to be much in the way of opposition, nor any real reason to delay. Unless problems of that sort arise, we can probably expect the new parser in roughly six months—less for those testing beta versions.

Comments (34 posted)

Page editor: Jonathan Corbet


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds