
LWN.net Weekly Edition for December 2, 2021

Welcome to the LWN.net Weekly Edition for December 2, 2021

This edition contains the following feature content:

  • Fedora revisits the Git-forge debate: where should Fedora's source-git repositories be hosted?
  • Python identifiers, PEP 8, and consistency: a proposal to add PEP-8-compliant aliases to the standard library.
  • What to do in response to a kernel warning: a proposed knob to control how the system responds to kernel warnings.
  • In search of an appropriate RLIMIT_MEMLOCK default: how should the kernel pick defaults for resource limits?
  • A different approach to BPF loops: the proposed bpf_loop() helper makes loops easier for the verifier to handle.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Fedora revisits the Git-forge debate

By Jake Edge
December 1, 2021

A seemingly straightforward question aimed at candidates for the in-progress Fedora elections led to a discussion on the Fedora devel mailing list that branched into a few different directions. The question was related to a struggle that the distribution has had before: whether using non-free Git forges is appropriate. One of the differences this time, though, is that the focus is on where source-git (or src-git) repositories will be hosted, which is a separate question from where the dist-git repository lives.

Background

The dist-git repository is the place where the distribution does most of its work to create and test packages, as described by Tomáš Tomeček back in May 2020. It works reasonably well for that, especially for those who are familiar with using it, but it is quite different than the Git repositories used by the upstream projects. It also has some seriously rough edges:

For some tasks, the workflow is just fine and pretty straightforward. But for the other, it’s very gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.

A blog post by Tomeček that introduces source-git gives a bit more information about dist-git. Each Fedora package has a dist-git repository that provides everything needed to create the binary RPM that gets shipped to users. That includes the RPM spec, source code, Fedora-specific patches, tests, and so on.

The dist-git repository has been (and still is) hosted on the Fedora Pagure code-hosting system, but maintaining Pagure had become something of a headache for the Red Hat Community Platform Engineering (CPE) team that manages the infrastructure for Fedora. Back in January 2020, that led to a gathering of requirements for what was needed in a Git-forge solution that would serve the needs of all of the distributions that Red Hat manages (Fedora, CentOS, and RHEL). The process ended acrimoniously after the botched announcement that CPE had chosen GitLab in what many in the Fedora community saw as a fait accompli. Roughly a year and a half later, a hosted GitLab service for Fedora was announced, which included some of the proprietary "ultimate tier" features, but Pagure would still be sticking around for the time being:

I know some of you are wondering what this means for Pagure.io and dist-git. The Community Platform Engineering (CPE) team will continue to run them in a supported mode as we do now. With the availability of GitLab, CPE and the Source Git SIG will continue to explore the FESCo [Fedora Engineering Steering Committee] feedback on what could make dist-git viable on GitLab in the future. We will also look for ways to integrate Gitlab into more of our tooling to provide options for the community. Right now though, this just adds another option for you to use. If you want to keep your work on Pagure, then by all means do.

The source-git effort (which is part of the Packit project) is a mechanism to make Git repositories for Fedora packages that mirror their upstream counterparts, to the point where the upstream developers will feel completely at home. It turns out that many Fedora packages were already being created that way, but in an ad hoc fashion. So source-git adds tooling to use the distribution-specific patches and build configuration (e.g. RPM spec files) to create a source RPM that can be used in dist-git to create the binary RPM for users. Currently, source-git is integrated with GitLab and GitHub, but not Pagure.

Question

It is against that backdrop that Michael Catanzaro asked the candidates for FESCo and the Fedora Council about their thoughts on hosting source-git repositories: "do you support allowing Fedora src-git repositories to be hosted on gitlab.com, which [is] a proprietary software git forge?" He noted that there are at least two options for an open-source solution, Pagure and the open-source version of GitLab, so he wanted to know where the candidates stood on that question. It turns out that the question is not entirely straightforward because it stems from something of a misunderstanding of the purposes of the source-git project—and how it interoperates with dist-git. It was also relatively easy to misunderstand what Catanzaro was actually asking, so the still-unresolved question of where dist-git will be hosted was re-litigated to a certain extent as well.

For example, FESCo candidate Fabio Valentini said that he was opposed to moving away from Pagure to "a proprietary solution from a vendor with an 'open core' business model", in part because it "sends the wrong message to the FOSS community". But he quickly realized that he had misunderstood the question; he has some concerns about source-git working with less popular forges, but is not completely opposed to hosting those repositories on GitHub and GitLab.

Fedora project leader Matthew Miller, who is not one of the candidates, took issue with the idea that Fedora might be sending the wrong message with its forge choices. There is a need to be pragmatic on certain choices (e.g. binary firmware) so that people can actually use the distribution on their machines, he said. Meanwhile, much of the open-source community has already chosen to use the Git* forges, so it makes sense to meet them where they "live":

More than that, though, if what we do sends a message... the message we are sending right now isn't working. People _aren't_ showing up in droves to help build Pagure. People aren't even showing up in significant numbers to _use_ Pagure outside of the Fedora space. And, again, that's not because Pagure isn't good — it is, with a great fundamental design where it's git all the way down. But our use of it hasn't changed the world, I don't see that improving. It doesn't _actually_ advance our mission and vision at all for us to be symbolically right if it doesn't change people's behavior, and... it doesn't.

Whenever this topic has come up, I've see[n] lots of this, except people don't say the first part out loud: "Of course _I_ use GitHub for most of my stuff, because of all of the advantages — but Fedora, Fedora should never." We can't let ourselves be held by that.

I don't think Gitlab open core is ideal. But I think it's closer than GitHub. And I deeply believe that free software and real open source is _just plain better_ as a model, and I think they'll eventually realize that too. I think we could have a LOT more impact working WITH GitLab to move towards an all-open model than we will continuing on the current path.

In a response to Catanzaro's question, Miller made it clear that he believes that the answer to that "_has_ to be 'yes', and not just for GitLab but also GitHub". If the source-git repositories are going to be close to the upstream repositories, they need to be able to live in the same places that host those projects.

Beyond that, he segued into the question of dist-git as well. He noted the long list of open Pagure issues and said that the project is in an "emergency maintenance-only state". He pointed out that historically Fedora has been unable to set up and run an open-source GitLab instance, so that is not really a viable alternative either. He would much rather there be an open-source option, but does not really see one, though he hopes that GitLab could perhaps become that option some day. It is a matter of resource allocation, he said, and ended his message with a list of multiple other projects he would rather see the project focus on.

Catanzaro was somewhat surprised by that response, in part because he did not quite see how source-git and dist-git would work together.

src-git is exclusively used for downstream Fedora packaging, so I don't expect upstreams to be interested at all, unless upstream developers are also the Fedora packagers, right? I'm also assuming that direct commits to dist-git will be blocked if src-git is enabled -- because otherwise how would we keep them in sync and avoid breaking src-git? -- and the only way to commit to dist-git would be via src-git. Is that right too?

Miller said that there are quite a few upstream developers who are also Fedora packagers, so they will likely be interested in these source-git repositories. In addition, the Packit Service is a component that will keep the two repositories in sync so that contributors can use whichever of the two workflows they are most comfortable with. As David Cantrell put it: "No one is required to use it. If you are comfortable with using dist-git as you do now, that's fine."

Packit developer Hunor Csomortáni cautioned that source-git is still an experimental tool, though there is a goal to add it as an official option for Fedora packages:

All the work that is ongoing in this space is experimentation and custom tooling developed to support select packages. Yes, we (the Packit team) have a goal to propose to the Community to adopt the workflow as an alternative way of doing packaging work and allow source-git repositories to be hosted within Fedora realms, but that proposal has not been written yet.

For now, the question is only about where these experimental repositories can be hosted, but if source-git gets adopted as an option, it would be preferable to have both source-git and dist-git in the same forge, Csomortáni said. Currently there are "mixed signals" about where that would be, however. It would make sense to combine the repositories on a single forge for a few different reasons:

To have a better, less fragmented developer experience, lower maintenance and administration costs, and to ensure Community control over them, I think it would make sense to host official source-git repositories next to dist-git repos, in a sibling namespace, and be served by the same Git-forge.

Catanzaro was unconvinced by Miller's assertion that Fedora could not run its own open-source GitLab instance. "If GNOME and KDE and freedesktop.org and Debian and Purism can all do it, I'm pretty sure Fedora can too." Miller said that he would love to be proved wrong, but Stephen John Smoogen posted a message describing the problems that have cropped up when trying to do so in the past. Those efforts have run aground on the same kinds of problems that plague all projects: "Getting an open source gitlab is not impossible. It just takes a lot of free time, systems and work done somewhere to make it happen."

No real conclusions came out of the discussion, but it would seem that at least the ideas behind source-git and its goals will be better understood. There is still the lurking problem of dist-git moving to the proprietary GitLab instance that CPE has set up; it is likely that the other open-source alternatives (Pagure or open-source GitLab) are not going to make the cut. That is understandably frustrating for many free and open-source software advocates in Fedora (and elsewhere), but also may be the least-bad option at this point. There are other important goals that the distribution has, in the eyes of some anyway, so putting energy into those things may be a better way for the project to advance—time will tell.

Comments (20 posted)

Python identifiers, PEP 8, and consistency

By Jake Edge
November 30, 2021

While there are few rules on the names of variables, classes, functions, and so on (i.e. identifiers) in the Python language, there are some guidelines on how those things should be named. But, of course, those guidelines were not always followed in the standard library, especially in the early years of the project. A suggestion to add aliases to the standard library for identifiers that do not follow the guidelines seems highly unlikely to go anywhere, but it led to an interesting discussion on the python-ideas mailing list.

To a first approximation, a Python identifier can be any sequence of Unicode code points that correspond to characters, but it cannot start with a numeral or be the same as one of the 35 reserved keywords. That leaves a lot of room for expressiveness (and some confusion) in those names. There is, however, PEP 8 ("Style Guide for Python Code"), which has some naming conventions for identifiers, but the PEP contains a caveat: "The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent".

But consistency is just what Matt del Valle was after when he proposed making aliases for identifiers in the standard library that do not conform to the PEP 8 conventions. The idea cropped up after he read the documentation for the threading module in the standard library, which has a note near the top about deprecating the camel-case function names in the module for others that are in keeping with the guidelines in PEP 8. The camel-case names are still present, but were deprecated in Python 3.10 in favor of names that are lower case, sometimes with underscores (e.g. threading.current_thread() instead of threading.currentThread()).

The PEP

PEP 8 suggests that function names "should be lowercase, with words separated by underscores as necessary to improve readability", which is what the changes for threading do. In addition, the PEP says that names for variables, methods, and arguments should follow the function convention, while types and classes should use camel case (as defined by the PEP, which includes an initial capital letter, unlike other camel-case definitions out there). Del Valle calls that form of capitalization "PascalCase" and noted that there are various inconsistencies in capitalization in the standard library:

I realize that large chunks of the stdlib predates pep8 and therefore use various non-uniform conventions. For example, the logging module is fully camelCased, and many core types like `str` and `list` don't use PascalCase as pep8 recommends. The `collections` module is a veritable mosaic of casing conventions, with some types like `deque` and `namedtuple` being fully lowercased while others like `Counter` and `ChainMap` are PascalCased.

Given the precedent in threading, he wondered if it would be feasible to "add aliases across the board for all public-facing stdlib types and functions that don't follow pep8-recommended casing". The "wart" of inconsistent naming conventions in his code bothers him, perhaps more than it should, he said, but he thought others might feel similarly, which could perhaps lead to the problem being solved rather than endured. Beyond that, though, it makes it somewhat more difficult to teach good practices in the language:

I always try to cover pep8 very early to discourage people I'm training from internalizing bad habits, and it means you have to explain that the very standard library itself contains style violations that would get flagged in most modern code reviews, and that they just have to keep in mind that despite the fact that the core language does it, they should not.

Reactions

Overall, the reception was rather chilly, though not universally so. The commenters generally acknowledged that there are some unfortunate inconsistencies, but the pain of making a change like what he proposed is too high for the value it would provide. Eric V. Smith put it this way:

The cost of having two ways to name things for the indefinite future is too high. Not only would you have to maintain it in the various Python implementations, you'd have to explain why code uses "str" or "Str", or both.

Among Del Valle's suggested changes was aliasing the "type functions" to their PascalCase equivalents (e.g. str() to Str()), as Smith mentioned. But that would be a fundamental change with no real upside and a high cost, Smith said. Mike Miller agreed with that, but wondered if there might be some middle ground, noting some common confusion with the datetime module:

One of my biggest peeves is this:
    import datetime # or
    from datetime import datetime
Which is often confusing... is that the datetime module or the class someone chose at random in this module? A minor thorn that… just doesn't go away.

Neil Girdhar also thought that changing str() and friends was "way too ambitious. But some minor cleanup might not be so pernicious?" On the other hand, Jelle Zijlstra brought some first-hand experience with changes of this sort to the discussion. He had worked on explicitly deprecating (i.e. with DeprecationWarning) some of the camel-case identifiers in the threading module; "in retrospect I don't feel like that was a very useful contribution. It just introduces churn to a bunch of codebases and makes it harder to write multiversion code."

Chris Angelico had a number of objections to Del Valle's ideas, but existing code that already reuses the names of some of the identifiers is particularly problematic:

Absolutely no value in adding aliases for everything, especially things that can be shadowed. It's not hugely common, but suppose that you deliberately shadow the name "list" in your project - now the List alias has become disconnected from it, unless you explicitly shadow that one as well. Conversely, a much more common practice is to actually use the capitalized version as a variant:
class List(list):
    ...
This would now be shadowing just one, but not the other, of the built-ins. Confusion would abound.

Angelico, along with others in the thread, pointed to the first section of PEP 8, which is titled "A Foolish Consistency is the Hobgoblin of Little Minds" (from the Ralph Waldo Emerson quote). That section makes it clear that the PEP is meant as a guide; consistency is most important at the function and module level, with project-level consistency being next in line. Any of those is more important than rigidly following the guidelines. As Angelico put it: "When a style guide becomes a boat anchor, it's not doing its job."

Paul Moore had a more fundamental objection to aliasing the type functions, noting that the PEP does not actually offer clear-cut guidance. He quoted from the "Naming Conventions" section and showed how it led to ambiguity:

"""
Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.
"""

To examine some specific cases, lists are a type, but list(...) is a function for constructing lists. The function-style usage is far more common than the use of list as a type name (possibly depending on how much of a static typing advocate you are...). So "list" should be lower case by that logic, and therefore according to PEP 8. And str() is a function for getting the string representation of an object as well as being a type - so should it be "str" or "Str"? That's at best a judgement call (usage is probably more evenly divided in this case), but PEP 8 supports both choices. Or to put it another way, "uniform" casing is a myth, if you read PEP 8 properly.

But there are tools, such as the flake8 linter, that try to rigidly apply the PEP 8 "rules" to a code base; some projects enforce the use of these tools before commits can be made. But linters cannot really determine the intent of the programmer, so they are inflexible and are probably not truly appropriate as an enforcement mechanism. Moore said:

Unfortunately, this usually (in my experience) comes about through a "slippery slope" of people saying that mandating a linter will stop endless debates over style preferences, as we'll just be able to say "did the linter pass?" and move on. This of course ignores the fact that (again, in my experience) far *more* time is wasted complaining about linter rules than was ever lost over arguments about style :-(

Changes

Del Valle acknowledged that "some awkward shadowing edge-cases are the strongest argument against this proposal", but Angelico disagreed. "The strongest argument is churn - lots and lots of changes for zero benefit." Del Valle recognized that the winds were strongly blowing against the sweeping changes he had suggested, but in the hopes of "salvaging *something* out of it" he reduced the scope substantially: "Add pep8-compliant aliases for camelCased public-facing names in the stdlib (such as logging and unittest) in a similar manner as was done with threading".

While Ethan Furman was in favor of such a change, others who had also mentioned the inconsistencies in unittest and logging did not follow suit. Most who replied to Furman recommended switching to pytest instead of unittest for testing, though alternatives to logging were not really on offer.

Guido van Rossum had a succinct response to the idea: "One thought: No." That essentially put the kibosh on it (not formally, of course, but Van Rossum's opinion carries a fair amount of weight), so Del Valle withdrew it entirely. It is clear there was no groundswell of support for it, even in more limited guises, but the discussion touched on various aspects of the language and its history. It seems clear that if Python had been developed in one fell swoop, rather than being added to in a piecemeal fashion over decades, different choices would have been made. More (or even fully) consistent identifiers within the project's code base may well have been part of that.

But, at this point, it is far too late for a retrofit, at least for many; even if everyone agreed on how to change things, the upheaval, code churn, and dual naming would be messy. And the gain, while not zero, is not huge. Beyond that, the day when the inconsistent names could actually be removed is extremely distant—likely never, in truth. So users and teachers of the language will need to keep in mind the semi-strange inconsistencies lurking in the darker corners; such warts exist in all programming (and other) languages. Humans are not consistent beasts, after all.

Comments (12 posted)

What to do in response to a kernel warning

By Jonathan Corbet
November 18, 2021
The kernel provides a number of macros internally to allow code to generate warnings when something goes wrong. It does not, however, provide a lot of guidance regarding what should happen when a warning is issued. Alexander Popov recently posted a patch series adding an option for the system's response to warnings; that series seems unlikely to be applied in anything close to its current form, but it did succeed in provoking a discussion on how warnings should be handled.

Warnings are emitted with macros like WARN() and WARN_ON_ONCE(). By default, the warning text is emitted to the kernel log and execution continues as if the warning had not happened. There is a sysctl knob (kernel/panic_on_warn) that will, instead, cause the system to panic whenever a warning is issued, but there is a lack of options for system administrators between ignoring the problem and bringing the system to a complete halt.
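
For readers who have not seen these macros in action, here is a rough sketch of what a warning site looks like in kernel code; the frob_dev structure and frob_submit() function are made up for illustration, but WARN_ON_ONCE() and WARN() are the real macros, which print a backtrace (and honor panic_on_warn) and evaluate to the condition so that the caller can bail out:

    /* Hypothetical driver code; only the WARN*() usage is the point here. */
    #include <linux/bug.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    struct frob_dev {                   /* made-up device structure */
            int id;
            unsigned int max_len;
            bool dead;
    };

    static int frob_submit(struct frob_dev *dev, unsigned int len)
    {
            /* Log a one-time warning for a "should never happen" state,
             * then refuse the request; WARN_ON_ONCE() returns the value
             * of the condition. */
            if (WARN_ON_ONCE(len > dev->max_len))
                    return -EINVAL;

            /* WARN() takes a condition plus a printf-style message. */
            WARN(dev->dead, "frob device %d used after shutdown\n", dev->id);

            return 0;
    }

Whether the system then carries on, panics, or (as Popov proposes) kills the offending process is exactly the policy question under discussion.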

Popov's patch set adds another option in the form of the kernel/pkill_on_warn knob. If set to a non-zero value, this parameter instructs the kernel to kill all threads of whatever process is running whenever a warning happens. This behavior increases the safety and security of the system over doing nothing, Popov said, while not being as disruptive as killing the system outright. It may kill processes trying to exploit the system and, in general, prevent a process from running in a context where something is known to have gone wrong.

There were a few objections to this option, starting with Linus Torvalds, who pointed out that the process that is running when a warning is issued may not have anything to do with the warning itself. The problem could have happened in an interrupt handler, for example, or in a number of other contexts. "Sending a signal to a random process is just voodoo programming, and as likely to cause other very odd failures as anything else", he said.

Torvalds suggested that a better approach might be to create a new /proc file that will provide information when a system-tainting event (such as a warning) happens. A user-space daemon could poll that file, read the relevant information when a warning is issued, then set about killing processes itself if that seems like the right thing to do. Marco Elver added that there is a tracepoint that could provide the relevant information with just a bit of work. Kees Cook threw together an implementation, but Popov didn't like it; that approach would allow a process to continue executing after the warning happens, he said, and by the time user space gets around to doing something about the situation, it may be too late.

James Bottomley argued that all of the approaches discussed so far were incorrect. If a warning happens, he said, the kernel is no longer in a known state, and anything could happen:

What WARN means is that an unexpected condition occurred which means the kernel itself is in an unknown state. You can't recover from that by killing and restarting random stuff, you have to reinitialize to a known state (i.e. reset the system). Some of the reason we do WARN instead of BUG is that we believe the state contamination is limited and if you're careful the system can continue in a degraded state if the user wants to accept the risk.

Thus, he said, the only rational policies are to continue (accepting the risk that bad things may happen) or kill the system and start over — the options that the kernel provides now.

Popov had suggested that the ELISA project, which is working toward Linux deployments in safety-critical applications, might support the addition of pkill_on_warning. But Lukas Bulwahn, who works on the project (but who was careful to say he doesn't speak for ELISA), disagreed. The right solution, he said, is to kill the system on warnings, but also to ensure that warnings are only issued in situations where things have truly gone off the rails:

Warnings should only be raised when something is not configured as the developers expect it or the kernel is put into a state that generally is _unexpected_ and has been exposed little to the critical thought of the developer, to testing efforts and use in other systems in the wild. Warnings should not be used for something informative, which still allows the kernel to continue running in a proper way in a generally expected environment.

He added that being truly safe also requires ensuring that a call to panic() will really stop the system in all situations — something that is not as easy to demonstrate as one might think. A panic() call might hang trying to acquire a lock, for example.

Christophe Leroy said that warnings should be handled within the kernel so that the system can keep running as well as it can. Given that, he continued, "pkill_on_warning seems dangerous and unrelevant, probably more dangerous than doing nothing, especially as the WARN may trigger for a reason which has nothing to do with the running thread". Popov, however, disagreed with the idea that one can expect all warnings to be handled properly within the kernel:

There is a very strong push against adding BUG*() to the kernel source code. So there are a lot of cases when WARN*() is used for severe problems because kernel developers just don't have other options.

Indeed, his patch would, when the new option is enabled, have warnings behave in almost the same way as BUG() calls, which bring about the immediate end of the running process by default. As he noted, developers run into resistance when they try to add those calls because their effect is seen as being too severe.

It's not clear that adding an option to make warnings more severe as well is the best solution to the problem. A good outcome, in the form of some movement toward a better-defined notion of just what a warning means and what should happen when one is generated, could yet result from this discussion, though. Like many mechanisms in the kernel, the warning macros just sort of grew in place without any sort of overall design. Engaging in a bit of design now that there is a lot of experience with how developers actually use warnings might lead to a more robust kernel overall.

Comments (23 posted)

In search of an appropriate RLIMIT_MEMLOCK default

By Jonathan Corbet
November 19, 2021
One does not normally expect a lot of disagreement over a 13-line patch that effectively tweaks a single line of code. Occasionally, though, such a patch can expose a disagreement over how the behavior of the kernel should be managed. This patch from Drew DeVault, who is evidently taking a break from stirring up the npm community, is a case in point. It brings to light the question of how the kernel community should pick default values for configurable parameters like resource limits.

The kernel implements a set of resource limits applied to each (unprivileged) running process; they regulate how much CPU time a process can use, how many files it can have open, and more. The setrlimit() man page documents the full set. Of interest here is RLIMIT_MEMLOCK, which places a limit on how much memory a process can lock into RAM. Its default value is 64KB; the system administrator can raise it, but unprivileged processes cannot.

Once upon a time, locking memory was a privileged operation. The ability to prevent memory from being swapped out can present resource-management problems for the kernel; if too much memory is locked, there will not be enough left for the rest of the system to function normally. The widespread use of cryptographic utilities like GnuPG eventually led to this feature being made available to all processes, though. By locking memory containing sensitive data (keys and passphrases, for example), GnuPG can prevent that data from being written to swap devices or core-dump files. To enable this extra security, the kernel community opened up the mlock() system call to all users, but set the limit for the number of pages that can be locked to a relatively low value.
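
To see the limit in action, this minimal user-space sketch (plain POSIX calls, nothing specific to the projects discussed here) prints the current RLIMIT_MEMLOCK values and then tries to lock a 1MB buffer; with the default 64KB limit, an unprivileged process will typically see the mlock() call fail with ENOMEM:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    int main(void)
    {
            struct rlimit rl;
            size_t len = 1024 * 1024;   /* 1MB, well above a 64KB limit */
            void *buf;

            if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
                    perror("getrlimit");
                    return EXIT_FAILURE;
            }
            printf("RLIMIT_MEMLOCK: soft %llu, hard %llu bytes\n",
                   (unsigned long long)rl.rlim_cur,
                   (unsigned long long)rl.rlim_max);

            buf = malloc(len);
            if (buf == NULL)
                    return EXIT_FAILURE;

            if (mlock(buf, len) != 0)
                    fprintf(stderr, "mlock(%zu bytes): %s\n",
                            len, strerror(errno));
            else
                    puts("buffer locked into RAM");

            munlock(buf, len);          /* harmless if the lock failed */
            free(buf);
            return EXIT_SUCCESS;
    }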

Uses of memory change over time. GnuPG does not really need more locked memory than it did years ago, but there are now other ways that users can run into the locked-memory limit. BPF programs, for example, are stored in unswappable kernel memory, with the space used being charged against this limit. These programs tend to be relatively small, but 64KB is likely to be constraining for many users. The big new consumer of locked memory, though, is io_uring.

Whenever the kernel sets up a user-space buffer for I/O, that buffer must be locked into memory for the duration of the operation. This locking is a short-lived affair and is not charged against the user's limit. There is, however, quite a bit of work involved in setting up an I/O buffer and locking it in memory; if that buffer is used for frequent I/O operations, the setup and teardown costs can reach a point where they slow the application measurably. As a way of eliminating this cost, the io_uring subsystem allows users to "register" their buffers; that operation sets up the buffers for I/O and leaves them in place where they can be used repeatedly.

I/O buffers can be large, so locking them into memory can consume significant amounts of RAM; it thus makes sense that a limit on how much memory can be locked in this way should be imposed. So, when buffers are registered, the kernel charges them against the same locked-memory limit. This is where the 64KB limit becomes truly constraining; to make the use of io_uring worthwhile, one almost certainly wants to use much larger buffers than will fit in that space. The 64KB default limit, as a result, has the potential to make io_uring unavailable to users unless it is increased by distributors or administrators — and that tends not to happen.
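
Here is what that registration step looks like with the liburing library; the buffer count and sizes are arbitrary, and the exact error can vary by kernel version, but registering a few megabytes of buffers under a 64KB locked-memory limit will fail until the limit is raised (build with -luring):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <liburing.h>

    #define NR_BUFS  4
    #define BUF_SIZE (1024 * 1024)      /* 1MB per buffer */

    int main(void)
    {
            struct io_uring ring;
            struct iovec iov[NR_BUFS];
            int i, ret;

            ret = io_uring_queue_init(8, &ring, 0);
            if (ret < 0) {
                    fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                    return EXIT_FAILURE;
            }

            for (i = 0; i < NR_BUFS; i++) {
                    iov[i].iov_base = malloc(BUF_SIZE);
                    if (iov[i].iov_base == NULL)
                            return EXIT_FAILURE;
                    iov[i].iov_len = BUF_SIZE;
            }

            /* The registered buffers are pinned for the lifetime of the
             * registration and charged against RLIMIT_MEMLOCK. */
            ret = io_uring_register_buffers(&ring, iov, NR_BUFS);
            if (ret < 0)
                    fprintf(stderr, "register_buffers: %s "
                            "(is RLIMIT_MEMLOCK large enough?)\n",
                            strerror(-ret));
            else
                    puts("buffers registered");

            io_uring_queue_exit(&ring);
            return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
    }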

To avoid this problem, DeVault would like to raise that limit to 8MB. Expecting the problem to be addressed elsewhere, he said, is not realistic:

The buck, as it were, stops with the kernel. It's much easier to address it here than it is to bring it to hundreds of distributions, and it can only realistically be relied upon to be high-enough by end-user software if it is more-or-less ubiquitous.

Matthew Wilcox pointed out that there are plenty of other ways for a malicious user to lock down at least 8MB of memory, so he saw no added danger from this change, though he did have a couple of reservations. Perhaps it would be better to somehow scale the limit, he said, so that it would be smaller on machines with small amounts of memory. He also wondered if 8MB was the right value for the new limit, or whether io_uring users would need still more. Jens Axboe, the maintainer of io_uring, replied that "8MB is plenty for most casual use cases", and those are the cases that should "just work" without the need for administrator intervention.

Andrew Morton, though, was not convinced about this value — or any other:

We're never going to get this right, are we? The only person who can decide on a system's appropriate setting is the operator of that system. Haphazardly increasing the limit every few years mainly reduces incentive for people to get this right.

DeVault answered that "perfect is the enemy of good", and that he lacked the time to try to convince all of the distributors to configure a more realistic default. Morton's further suggestion that the limit should have been set to zero from the beginning to force a solution in user space was not received well. And that, more or less, is where the conversation wound down.

One line of thought here seems to be that the kernel community should not try to come up with usable defaults for parameters like RLIMIT_MEMLOCK; that will force downstream distributors to think about what their users need and configure things accordingly. But that seems like a recipe for the status quo, where a useful new feature is, in fact, not useful on most systems. Putting some thought into reasonable default values is something one normally expects from a software project; it's not clear why the kernel would be different in this regard. So this change will, in all likelihood, eventually find its way in, but perhaps not until the emails-to-lines-changed ratio becomes even higher.

Comments (37 posted)

A different approach to BPF loops

By Jonathan Corbet
November 29, 2021
One of the key features of the extended BPF virtual machine is the verifier built into the kernel that ensures that all BPF programs are safe to run. BPF developers often see the verifier as a bit of a mixed blessing, though; while it can catch a lot of problems before they happen, it can also be hard to please. Comparisons with a well-meaning but rule-bound and picky bureaucracy would not be entirely misplaced. The bpf_loop() proposal from Joanne Koong is an attempt to make pleasing the BPF bureaucrats a bit easier for one type of loop construct.

To do its job, the verifier must simulate the execution of each BPF program loaded into the kernel. It makes sure that the program does not reference memory that should not be available to it, that it doesn't leak kernel memory to user space, and many other things — including that the program will actually terminate and not lock the kernel into an infinite loop. Proving that a program will terminate is, as any survivor of an algorithms class can attest, a difficult problem; indeed, it is impossible in the general case. So the BPF verifier has had to find ways to simplify the problem.

Initially, "simplifying the problem" meant forbidding loops altogether; when a program can only execute in a straight-through manner, with no backward jumps, it's clear that the program must terminate in finite time. Needless to say, BPF developers found this rule to be a bit constraining. To an extent, loops can be simulated by manually unrolling them, but that is tiresome for short loops and impractical for longer ones. So work soon began on finding a way to allow BPF programs to contain loops. Various approaches to the loop problem were tried over the years; eventually bounded loop support was added to the 5.3 kernel in 2019.

The problem is thus solved — to an extent. The verifier checks loops by simulating their execution for each combination of initial states and demonstrating that each loop terminates before executing the maximum number of allowed instructions. This verification can take some time and, for some programs, the verifier is simply unable to conclude that the loops will terminate, even though those programs may be correct and safe. There are simply too many possible states and iterations to work through.

The difficulty of verifying loops is complicated by the fact that, by necessity, the verifier works with BPF code, which is a low-level instruction set. The semantics of a loop encoded in a higher-level language are gone by this time. The code may just iterate over the elements of a short array, for example, but the verifier has to piece that together from the BPF code. If there were a way to code a bounded loop in a way that the verifier could see, life would be a lot easier.

That, in short, is the purpose of Koong's patch. It adds a new helper function that can be called from BPF code:

    long bpf_loop(u32 iterations, long (*loop_fn)(u32 index, void *ctx),
                  void *ctx, u64 flags);

A call to bpf_loop() will result in iterations calls to loop_fn(), with the iteration number and the passed-in ctx as parameters. The flags value is currently unused and must be zero. The loop_fn() will normally return zero; a return value of one will end the iteration immediately. No other return values are allowed.
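
As an illustration of how that interface might be used from a BPF program, here is a sketch built on libbpf; the section name, context structure, and iteration count are arbitrary choices, but the callback follows the signature shown above and returns one to stop early:

    /* Sketch of a BPF program using the proposed helper; assumes a libbpf
     * whose bpf_helpers.h declares bpf_loop() with the prototype above. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct loop_ctx {
            __u64 sum;
            __u64 limit;
    };

    /* Called once per iteration; returning 1 ends the loop early. */
    static long add_index(__u32 index, void *data)
    {
            struct loop_ctx *lc = data;

            lc->sum += index;
            return lc->sum > lc->limit ? 1 : 0;
    }

    SEC("tracepoint/syscalls/sys_enter_getpid")
    int sum_indexes(void *ctx)
    {
            struct loop_ctx lc = { .sum = 0, .limit = 1000 };

            /* At most 256 iterations; the looping happens inside the
             * kernel, so the verifier need not simulate each pass. */
            bpf_loop(256, add_index, &lc, 0);

            bpf_printk("sum = %llu", lc.sum);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";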

Essentially, bpf_loop() takes the mechanics of the loop itself out of the BPF code and embeds it within the kernel's BPF implementation instead. It allows the verifier to know immediately that the loop will terminate, since that is outside the control of the BPF program itself. It is also easy to calculate how many instructions may be executed within the loop in the worst case; that and the limit on stack depth will prevent programs that run nearly forever as the result of nested loops.

For BPF programmers, the benefit is that any loop that can be implemented using bpf_loop() becomes much easier to get past the verifier; whole layers of bureaucracy have been shorted out, as it were. Note that loops that, for example, follow a linked list are possible with bpf_loop(); the developer need only supply a maximum possible length as the number of iterations, then terminate early when the desired element has been found or the end of the list has been hit. The form of programs may shift a bit to fit the template, but it should be possible to make that change in many cases.

Another significant advantage is that the time required to verify BPF programs is greatly reduced, since the verifier does not need to actually simulate the execution of all those loops. Some benchmarks show what a difference that can make; one program that takes nearly 30 seconds to verify in current kernels can be verified in 0.15s instead. That significantly increases the practicality of many types of BPF program.

There are many reasons why Fortran remained dominant in numerical applications for so long; one of those is that do loops, by their predictable structure, are relatively easy to vectorize. The purpose of bpf_loop() is different, but it works by the same mechanism: constraining what can be expressed in the language to make it easier for the computer to understand what is really being done. That, in turn, should make it easier for developers to convince the computer that it can safely run their programs.

Comments (77 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: Amazon AL2022; Julia 1.7; PHP 8.1; Vizio lawsuit update; ...
  • Announcements: Newsletters; conferences; security updates; kernel patches; ...

Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds