
Leading items

Welcome to the LWN.net Weekly Edition for May 28, 2020

This edition contains the following feature content:

  • A pandemic-era LWN update: a status report on LWN as a business and news of a new staff member.
  • Some sessions from the Python Language Summit: language specification, HPy, property-based testing, and Python on mobile platforms.
  • Testing in Go: philosophy and tools: a look at the Go language's minimalist approach to testing.
  • The pseudo cpuidle driver: a cpuidle driver that saves no power but helps in evaluating and debugging cpuidle governors.
  • Saving frequency scaling in the data center: a plea to keep CPU-frequency scaling relevant for data-center workloads.
  • The deadline scheduler and CPU idle states: can deadline scheduling and CPU idle states be made to work together?
  • Imbalance detection and fairness in the CPU scheduler: detecting load imbalances and fairly scheduling workloads that cannot be balanced.
  • Hibernation in the cloud: a possible new role for an old power-management feature.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

A pandemic-era LWN update

By Jonathan Corbet
May 27, 2020
We are living through interesting times that present challenges in a number of areas, including running a business. While we think of LWN primarily as a community resource, it is also a business that is not unaffected by the ongoing pandemic. It is, we figure, a good time for a status update, especially since we have some news to share.

Never has our 2002 decision to move to a subscription model looked like a better idea. Revenue from advertising has reached a level that is essentially indistinguishable from zero, with little sign that it will improve anytime soon. But we don't depend on advertising; we work directly for our readers, and as long as you all support us, we will be in good shape.

Subscriptions have definitely fallen off a bit in the last few months, and we've had subscribers dropping off with a note saying that they had lost their job and needed to cut expenses. But the drop-off has not yet reached a point where we are seriously concerned about it; for that, we can only say "thank you!" to all of you for continuing to support us as the world gets weirder. A special thank-you is due to all of you subscribing at the Project Leader or Supporter levels; it really does make a difference.

After much thought, we concluded that we could move ahead with filling the empty staff position we have had for some time. To that end, we are pleased to announce that John Coggeshall is joining the LWN team. John introduces himself this way:

John has been an open-source hacker since 1996 and a core contributor to the PHP project. In his 20+ years in open-source technology he has worked with companies big and small in roles ranging from developer to CTO. As an author, John has published five books on PHP and web development along with hundreds of articles for various publications. Beyond his various contributions to the PHP language itself, John's open-source projects include the PayPal SDK for PHP and the Microsoft Information Card components bundled with Zend Framework. More recently John has taken an interest in IoT development, building open-source IoT libraries for the Arduino platform focusing on the ESP8266 MCU. John currently hails from the Detroit metro area.

LWN readers will have already been introduced to John by way of his article "PHP showing its maturity in release 7.4", which ran in early May. You'll be hearing a lot more from him in the coming months.

Overall, it would appear that the pandemic has not done much to slow down the free-software community, so we are staying as busy as ever. Whether that will continue remains to be seen; there are a lot of unknowns out there at the moment, and it will take time to see how things will play out.

Back in 1997 when work began on what eventually became LWN, we were driven by a strong sense of optimism about the future of Linux and free software. That optimism has been tested by ups and downs over time, but it has largely been borne out; Linux has been more successful than any of us could have imagined, and LWN is still here at the center of it. And we are still optimistic; we have managed to pull together an outstanding community of readers that will continue to support us for as long as we keep doing good work.

That is exactly what we intend to do. We look forward to seeing you on the net and, someday, at in-person events again. Thanks again to all LWN readers, and may you all stay in the best of health.

Comments (62 posted)

Some sessions from the Python Language Summit

By Jake Edge
May 27, 2020

The Python Language Summit is an annual gathering for the developers of various Python implementations, though, this year, the gathering actually happened via videoconference—as with so many other conferences due to the pandemic. The invite-only gathering typically has numerous interesting sessions, as can be seen in the LWN coverage of the summit from 2015 to 2018, as well as in the 2019 summit coverage on the Python Software Foundation (PSF) blog. Those writeups were penned by A. Jesse Jiryu Davis, who reprised his role for this year's summit. In this article, I will summarize some of the sessions that caught my eye.

Language specification

Mark Shannon shared his thoughts on a more formal definition of the Python language. It would not only help developers of alternative implementations understand the nuances and corner cases of the language, it would also help developers of the CPython reference implementation fully understand that code base. He noted that Java has a language specification and he thinks that Python could benefit from having one as well.

Shannon proposed splitting the specification up into three parts: code loading (parsing, importing, and so on), execution, and the C API. For his presentation, he looked in more detail at the execution specification. For example, he broke down a function call into a series of steps: create a stack frame, move the function arguments from the current frame to the new one, save the instruction pointer, and push the frame onto the stack.
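
As a rough illustration of the kind of decomposition Shannon has in mind, here is a plain-Python sketch that models those four steps; the Frame class and function names are invented for this example and are not taken from his formalism.

    # Illustrative only: a toy model of the call steps listed above.
    # Nothing here comes from Shannon's formal-semantics repository.

    class Frame:
        def __init__(self, function, args):
            self.function = function     # the callee
            self.locals = dict(args)     # arguments become the new frame's locals
            self.return_ip = None        # saved instruction pointer of the caller

    def call_function(stack, function, args, caller_ip):
        frame = Frame(function, args)    # 1. create a stack frame
                                         # 2. move the arguments into the new frame
                                         #    (done in Frame.__init__ above)
        frame.return_ip = caller_ip      # 3. save the instruction pointer
        stack.append(frame)              # 4. push the frame onto the stack
        return frame

    # Simulate calling f(x=1) from bytecode offset 10:
    stack = []
    call_function(stack, "f", {"x": 1}, caller_ip=10)
    assert stack[-1].locals == {"x": 1} and stack[-1].return_ip == 10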

Breaking things down that way will allow developers to rework how certain features are interrelated. The example he gave was that iterators came first, so generators were defined in terms of iterators, even though generators are the lower-level concept. If you were starting from scratch, it would make more sense to specify iterators as being built on generators. In his nascent formal semantics repository, Shannon made a start on defining iterators in terms of generators.
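
To make the distinction concrete, here is a small, made-up example of an iterator being expressed in terms of a generator in today's Python; the Countdown class is not taken from Shannon's repository.

    class Countdown:
        def __init__(self, start):
            self.start = start

        def __iter__(self):
            # The type's iterator is written directly as a generator.
            n = self.start
            while n > 0:
                yield n
                n -= 1

    print(list(Countdown(3)))   # prints [3, 2, 1]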

The goal of the work would be to assist in language development, but it will take a fair amount of work to get there.

He concluded that a semi-formal spec of Python would help alternative implementations match CPython, would make PEPs less ambiguous, and would clarify whether any existing "odd behavior is a feature or a bug." It would be possible to reason about the correctness of optimizations. However, writing the spec is work, and it could deter good PEPs in the future if authors are daunted by writing their proposals in terms of the spec.

The audience reaction appears to have been positive, overall, with some amount of confusion as to how to get there—and how exactly the specification would be used once it exists. It remains to be seen if Shannon (or someone else) wants to put in a fairly large chunk of work for an unclear amount of benefit.

HPy

One of the problems that CPython has struggled with over the years is its C API for extensions. On one hand, it has led to a number of high-powered, popular extensions like NumPy, but on the other, the API is too closely tied to CPython internals. The too-close tie not only hampers alternative implementations (e.g. PyPy), but also stands in the way of changes that CPython developers might like to make—efforts to remove the Global Interpreter Lock (GIL), in particular.

That is the backdrop against which Antonio Cuni presented HPy (the "H" is for "handle"), which is a new API that might provide a way forward. HPy came about from a conversation between CPython, PyPy, and Cython developers at last year's EuroPython; it could replace the existing C API with one that is based on handles, rather than direct pointers to CPython objects.

Currently, Python objects in C extensions are PyObject pointers that have their lifetimes managed through reference counts. HPy would instead turn those into HPy types that effectively wrap the underlying PyObject pointers, which decouples the extension from the reference counts. A mapping between HPy objects and PyObject pointers would need to be maintained, but if, say, PyPy wanted to move objects around as part of its garbage-collection strategy, it would simply need to update the map appropriately. Handles need to be explicitly closed, and only once per handle, which is different than the Py_INCREF()/Py_DECREF()-style of reference-count management used today.
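
The decoupling can be modeled, very loosely, in a few lines of Python; this is purely a conceptual sketch of the handle indirection described above, and none of these names are part of the actual (C-level) HPy API.

    class HandleTable:
        """Toy model of handle indirection; not the real HPy API."""

        def __init__(self):
            self._objects = {}   # handle -> underlying object
            self._next = 1

        def open(self, obj):
            # Hand out a new handle wrapping the object.
            handle = self._next
            self._next += 1
            self._objects[handle] = obj
            return handle

        def deref(self, handle):
            return self._objects[handle]

        def close(self, handle):
            # Each handle must be closed exactly once.
            del self._objects[handle]

        def move(self, handle, new_location):
            # A moving garbage collector only has to update the map;
            # code holding the handle never sees the difference.
            self._objects[handle] = new_location

    table = HandleTable()
    h = table.open("object at address A")
    table.move(h, "same object at address B")
    assert table.deref(h) == "same object at address B"
    table.close(h)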

There is a debug mode that will help catch multiple calls to HPy_Close(), which should help with porting extensions to the API. To a large extent, HPy "basically works", Cuni said, but there are still lots of things that need to be addressed, including support for custom Python types in extensions.

Cuni said the "HPy strategy to conquer the world" is to create a zero-overhead façade that maps HPy to the C API (using compile-time macros), then port third-party C extensions to pure HPy, one function at a time. It must be faster on alternative implementations than their existing C API emulations; early benchmarks show a 3x speedup on PyPy and 2x on GraalPython, a JVM-based Python.

There are plans afoot to port parts of NumPy to HPy; NumPy is something of the "gold standard" in terms of Python C extensions. The PyPy team has done a lot of work to make NumPy work with PyPy; no change to the C API is likely to go far without somehow supporting NumPy. Eventually, Cuni would like to write a Python Enhancement Proposal (PEP) to add HPy to CPython; it would not replace the existing C API, but would coexist with it, at least for a while.

Property-based testing

Handwritten tests do a good job of finding problems and preventing regressions, but they are limited to the types of tests that the developer can think of—bugs that come from unforeseen areas or interactions may well be missed. There are alternatives that try to fill in those gaps, either through exhaustive testing or by fuzzing; using coverage-guided fuzzing can yield even better results. But fuzz testing is generally geared toward finding inputs that cause programs to crash; property-based testing is an alternative for finding logic bugs of various sorts.

Zac Hatfield-Dodds gave a summit presentation on property-based testing; he is one of the leads for the Hypothesis project, which is "a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for". Hatfield-Dodds proposed adding these kinds of tests to the Python standard library.

Instead of looking for a particular mapping from an input to an output, as many unit tests do, a property-based test describes properties that the function should always maintain: commutativity, sorted output, idempotence, and so on. The framework takes those descriptions and generates varying inputs, looking for cases where one of the properties is violated.

Hypothesis searches for bugs by randomizing the input, or trying interesting values that tend to trigger edge cases, or retrying inputs that triggered bugs in previous runs. When Hypothesis finds a bug, it evolves the input, searching for the simplest input that reproduces the same bug.
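
For a flavor of what such a test looks like, here is a minimal sketch using the Hypothesis @given decorator; the properties tested (sorting is idempotent and preserves length) are just examples, not ones proposed at the summit.

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sorted_is_idempotent_and_length_preserving(xs):
        once = sorted(xs)
        # Sorting an already-sorted list changes nothing (idempotence)...
        assert sorted(once) == once
        # ...and sorting never adds or drops elements.
        assert len(once) == len(xs)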

Hatfield-Dodds suggested that property-based tests be created for CPython, its builtins, the standard library, other implementations like PyPy, and more. Those tests could be run as part of the continuous integration (CI) for CPython and shared with other language implementations. They could also be integrated with coverage-guided fuzzing frameworks, such as those used by the OSS-Fuzz project.

Several developers in the audience noted that property-based testing looks useful. Łukasz Langa pointed to an effort to use the techniques in Hypothesis on a Python code formatter, which found a lot of bugs. Paul Ganssle used property-based testing for his reimplementation of date.fromisoformat() in the datetime module; it worked well, but those tests were not merged with the new code. A subsequent bug was introduced that would likely have been caught if those tests had been run, so he was strongly behind the idea of adding that kind of testing. It is not clear where things go from here, but the technique seems like a promising addition to the testing arsenal.

Mobile Python

Russell Keith-Magee returned to the summit to give a presentation on Python for mobile systems. He presented in 2015 and last year on the status of the long-running project to make CPython available on iOS and Android. He began his presentation this year by noting that the BeeWare project has its tools mostly running on Android now, thanks to a grant from the PSF.

CPython has been running on iOS for a while now, but Android has been problematic until recently. The strategy used to be to compile the Python to Java bytecode, then run that on Android, "but Android devices are now fast enough, and the Android kernel permissive enough, to run CPython itself".

Distribution size is an issue for mobile platforms, however. Each app bundles the entire Python runtime, so making that as small as possible is a priority for the project. There have been some ideas on slimming down CPython by removing much or all of the standard library in order to minimize its size. The idea of a "kernel Python" (which was inspired by a presentation from Amber Brown at the 2019 summit) is one that a number of different projects would like to see.

Senthil Kumaran observed, "BeeWare, MicroPython, Embedded Python, Kivy all seem to have a need for a kernel-only Python," and suggested they combine forces to create one.

Currently, Keith-Magee maintains CPython forks to support iOS for Python 3.5 through 3.8; for Android, he has a handful of patches and a list of unit tests that need to be skipped. There is no continuous-integration (CI) testing as he has not found a service that provides phones to run on. If mobile Python is to become a reality, the changes for iOS and Android need to get upstream and some kind of CI system needs to be established.

Mobile Python suffers a chicken-and-egg problem: there is no corporate funding for Python on mobile because Python doesn't support mobile, so there is no one relying on mobile Python who is motivated to fund it.

He wondered if the CPython core developers were interested in changing the situation; it will take both money and work to get there, but there is no point in doing it if there is no appetite for it in the core. It sounds like several audience members were in favor of adding support for mobile Python, including Python founder Guido van Rossum and former release manager Ned Deily. Whether that translates to a renewed push, with some funding from the PSF or elsewhere, remains to be seen.

And more

There were, of course, lots of other sessions, as well as two rounds of lightning talks. Those interested in different facets of Python development will find taking a spin through the reports rewarding. At the end of the videoconference, Victor Stinner said: "Thanks TCP/IP for making this possible." That is a sentiment that will likely be shared widely these days.

Comments (5 posted)

Testing in Go: philosophy and tools

May 26, 2020

This article was contributed by Ben Hoyt

The Go programming language comes with tools for writing and running tests: the standard library's testing package, and the go test command to run test suites. Like the language itself, Go's philosophy for writing tests is minimalist: use the lightweight testing package along with helper functions written in plain Go. The idea is that tests are just code, and since a Go developer already knows how to write Go using its abstractions and types, there's no need to learn a quirky domain-specific language for writing tests.

The Go Programming Language by Alan Donovan and Brian Kernighan summarizes this philosophy. From chapter 11 on testing:

Many newcomers to Go are surprised by the minimalism of Go's testing framework. Other languages' frameworks provide mechanisms for identifying test functions (often using reflection or metadata), hooks for performing "setup" and "teardown" operations before and after the tests run, and libraries of utility functions for asserting common predicates, comparing values, formatting error messages, and aborting a failed test (often using exceptions). Although these mechanisms can make tests very concise, the resulting tests often seem like they are written in a foreign language.

To see this in practice, here's a simple test of the absolute value function Abs() using the testing package and plain Go:

    func TestAbs(t *testing.T) {
        got := Abs(-1)
        if got != 1 {
            t.Errorf("Abs(-1) = %d; want 1", got)
        }
    }

Contrast that with the following version, written using the popular (though I would argue non-idiomatic) Ginkgo library that provides a means to write RSpec-style tests for Go:

    Describe("Abs", func() {
        It("returns correct abs value for -1", func() {
            got := Abs(-1)
            Expect(got).To(Equal(1))
        })
    })

The functions Describe, Expect, etc., make the test "read like English", but mean that there is suddenly a whole new sub-language to learn. The thinking of Go contributors such as Donovan is that there are already tools like == and != built into the language, so why is To(Equal(x)) needed?

That said, Go doesn't stop developers from using such libraries, so developers coming from other languages often find using them more familiar than vanilla testing. One relatively lightweight library is testify/assert, which adds common assertion functions like assert.Equal(), and testify/suite, which adds test-suite utilities like setup and teardown. The "Awesome Go" website provides an extensive list of such third-party packages.

One useful testing tool that's not part of the testing package is reflect.DeepEqual(), which is a standard library function that uses reflection to determine "deep equality", that is, equality after following pointers and recursing into maps, arrays, and so on. This is helpful when tests compare things like JSON objects or structs with pointers in them. Two libraries that build on this are Google's go-cmp package and Daniel Nichter's deep, which are like DeepEqual but produce a human-readable diff of what's not equal rather than just returning a boolean. For example, here's a (deliberately broken) test of a MakeUser() function using go-cmp:

    func TestMakeUser(t *testing.T) {
        got := MakeUser("Bob Smith", "bobby@example.com", 42)
        want := &User{
            Name:  "Bob Smith",
            Email: "bob@example.com",
            Age:   42,
        }
        if diff := cmp.Diff(want, got); diff != "" {
            t.Errorf("MakeUser() mismatch (-want +got):\n%s", diff)
        }
    }

And the human-readable output is:

    user_test.go:16: MakeUser() mismatch (-want +got):
          &main.User{
            Name:  "Bob Smith",
        -   Email: "bob@example.com",
        +   Email: "bobby@example.com",
            Age:   42,
          }

Built-in testing features

The built-in testing package contains various functions to log information and report failures, skip tests at runtime, or only run tests in "short" mode. Short mode provides a way to skip tests that are long running or have a lot of setup, which can be helpful during development. It is enabled using the -test.short command line argument.

Go's test runner executes tests sequentially by default, but there's an opt-in Parallel() function to allow running explicitly-marked tests at the same time across multiple cores.

In Go 1.14, the testing package added a Cleanup() function that registers a function to be called when the test completes. This is a built-in way to simplify teardown, for example to delete database tables after a test finishes:

    func createDatabase(t *testing.T) {
        // ... code to create a test database
        t.Cleanup(func() {
            // ... code to delete the test database
            // runs when the test finishes (success or failure)
        })
    }

    func TestFetchUser(t *testing.T) {
        createDatabase(t) // creates database and registers cleanup
        user, err := FetchUser("bob@example.com")
        if err != nil {
            t.Fatalf("error fetching user: %v", err)
        }
        expected := &User{"Bob Smith", "bob@example.com", 42}
        if !reflect.DeepEqual(user, expected) {
            t.Fatalf("expected user %v, got %v", expected, user)
        }
    }

Go 1.15 is adding a test helper, TempDir(), that creates (and cleans up) a temporary directory for the current test. There's a high bar for adding to the testing package, but Russ Cox on the core Go team gave his approval for this addition: "It seems like temporary directories do come up in a large enough variety of tests to be part of testing proper."

Table-driven tests

A common idiom in Go to avoid repetition when testing various edge cases is called "table-driven tests". This technique iterates over the test cases in a "slice" (Go's term for a view into a resizable array), reporting any failures for each iteration:

    func TestAbs(t *testing.T) {
        tests := []struct {
            input    int
            expected int
        }{
            {1, 1},
            {0, 0},
            {-1, 1},
            {-maxInt, maxInt},
            {maxInt, maxInt},
        }
        for _, test := range tests {
            actual := Abs(test.input)
            if actual != test.expected {
                t.Errorf("Abs(%d) = %d; want %d", test.input, actual, test.expected)
            }
        }
    }

The t.Errorf() calls report the failure but do not stop the execution of the test, so multiple failures can be reported. This style of table-driven test is common throughout the standard library tests (for example, the fmt tests). Subtests, a feature introduced in Go 1.7, give the ability to run individual sub-tests from the command line, as well as better control over failures and parallelism.

Mocks and interfaces

One of Go's well-known language features is its structurally-typed interfaces, sometimes referred to as "compile-time duck typing". "Interfaces in Go provide a way to specify the behavior of an object: if something can do this, then it can be used here." Interfaces are important whenever there is a need to vary behavior at runtime, which of course includes testing. For example, as Go core contributor Andrew Gerrand said in the slides for his 2014 "Testing Techniques" talk, a file-format parser should not have a concrete file type passed in like this:

    func Parse(f *os.File) error { ... }

Instead, Parse() should simply take a small interface that only implements the functionality needed. In cases like this, the ubiquitous io.Reader is a good choice:

    func Parse(r io.Reader) error { ... }

That way, the parser can be fed anything that implements io.Reader, which includes files, string buffers, and network connections. It also makes it much easier to test (probably using a strings.Reader).

If the tests only use a small part of a large interface, for example one method from a multi-method API, a new struct type can be created that embeds the interface to fulfill the API contract, and only overrides the method being called. A full example of this technique is shown in this Go Playground code.

There are various third party tools, such as GoMock and mockery, that autogenerate mock code from interface definitions. However, Gerrand prefers hand-written fakes:

[mocking libraries like gomock] are fine, but I find that on balance the hand-written fakes tend be easier to reason about and clearer to see what's going on, but I'm not an enterprise Go programmer so maybe people do need that so I don't know, but that's my advice.

Testable examples

Go's package documentation is generated from comments in the source code. Unlike Javadoc or C#'s documentation system, which make heavy use of markup in code comments, Go's approach is that comments in source code should still be readable in the source, and not sprinkled with markup.

It takes a similar approach with documentation examples: these are runnable code snippets that are automatically executed when the tests are run, and then included in the generated documentation. Much like Python's doctests, testable examples write to standard output, and the output is compared against the expected output, to avoid regressions in the documented examples. Here's a testable example of an Abs() function:

    func ExampleAbs() {
        fmt.Println(Abs(5))
        fmt.Println(Abs(-42))
        // Output:
        // 5
        // 42
    }

Example functions need to be in a *_test.go file and prefixed with Example. When the test runner executes, the Output: comment is parsed and compared against the actual output, giving a test failure if they differ. These examples are included in the generated documentation as runnable Go Playground snippets, as shown in the strings package, for example.

Benchmarking

In addition to tests, the testing package allows you to run timed benchmarks. These are used heavily throughout the standard library to ensure there are no regressions in execution speed. Benchmarks can be run automatically using go test with the -bench= option. Popular Go author Dave Cheney has a good summary in his article "How to write benchmarks in Go".

As an example, here's the standard library's benchmark for the strings.TrimSpace() function (note the table-driven approach and the use of b.Run() to create sub-benchmarks):

    func BenchmarkTrimSpace(b *testing.B) {
        tests := []struct{ name, input string }{
            {"NoTrim", "typical"},
            {"ASCII", "  foo bar  "},
            {"SomeNonASCII", "    \u2000\t\r\n x\t\t\r\r\ny\n \u3000    "},
            {"JustNonASCII", "\u2000\u2000\u2000☺☺☺☺\u3000\u3000\u3000"},
        }
        for _, test := range tests {
            b.Run(test.name, func(b *testing.B) {
                for i := 0; i < b.N; i++ {
                    TrimSpace(test.input)
                }
            })
        }
    }

The go test tool will report the numbers; a program like benchstat can be used to compare the before and after timings. Output from benchstat is commonly included in Go's commit messages showing the performance improvement. For example, from change 152917:

    name                      old time/op  new time/op  delta
    TrimSpace/NoTrim-8        18.6ns ± 0%   3.8ns ± 0%  -79.53%  (p=0.000 n=5+4)
    TrimSpace/ASCII-8         33.5ns ± 2%   6.0ns ± 3%  -82.05%  (p=0.008 n=5+5)
    TrimSpace/SomeNonASCII-8  97.1ns ± 1%  88.6ns ± 1%   -8.68%  (p=0.008 n=5+5)
    TrimSpace/JustNonASCII-8   144ns ± 0%   143ns ± 0%     ~     (p=0.079 n=4+5)

This shows that the ASCII fast path for TrimSpace made ASCII-only inputs about five times as fast; even the "SomeNonASCII" sub-test improved by about 9%.

To diagnose where something is running slowly, the built-in profiling tools can be used, such as the -cpuprofile option when running tests. The built-in go tool pprof displays profile output in a variety of formats, including flame graphs.

The go test command

Go is opinionated about where tests should reside (in files named *_test.go) and how test functions are named (they must be prefixed with Test). The advantage of being opinionated, however, is that the go test tool knows exactly where to look and how to run the tests. There's no need for a makefile or metadata describing where the tests live — if files and functions are named in the standard way, Go already knows where to look.

The go test command is simple on the surface, but it has a number of options for running and filtering tests and benchmarks. Here are some examples:

    go test             # run tests in current directory
    go test package     # run tests for given package
    go test ./...       # run tests for current dir and all sub-packages
    go test -run=foo    # run tests matching regex "foo"
    go test -cover      # run tests and output code coverage
    go test -bench=.    # also run benchmarks
    go test -bench=. -cpuprofile cpu.out
                        # run benchmarks, record profiling info

Go test's -cover mode produces code coverage profiles that can be viewed as HTML using go tool cover -html=coverage.out. When explaining how Go's code coverage tool works, Go co-creator Rob Pike said:

For the new test coverage tool for Go, we took a different approach [than instrumenting the binary] that avoids dynamic debugging. The idea is simple: Rewrite the package's source code before compilation to add instrumentation, compile and run the modified source, and dump the statistics. The rewriting is easy to arrange because the go command controls the flow from source to test to execution.

Summing up

Go's testing library is simple but extendable, and the go test runner is a good complement with its test execution, benchmarking, profiling, and code-coverage reporting. You can go a long way with the vanilla testing package — I find Go's minimalist approach to be a forcing function to think differently about testing and to get the most out of native language features, such as interfaces and struct composition. But if you need to pull in third party libraries, they're only a go get away.

Comments (14 posted)

The pseudo cpuidle driver

By Jonathan Corbet
May 21, 2020

OSPM
The purpose of a cpuidle governor is to decide which idle state a CPU should go into when it has no useful work to do; the cpuidle driver then actually puts the CPU into that state. But, at the 2020 Power Management and Scheduling in the Linux Kernel summit (OSPM), Abhishek Goel presented a new cpuidle driver that doesn't actually change the processor's power state at all. Such a driver will clearly save no power, but it can be quite useful as a tool for evaluating and debugging cpuidle policies.

Goel began by saying that this work was motivated by a performance problem encountered with high-latency idle states — deep CPU sleep states that take a long time to restart from. A GPU-oriented workload that generated lots of interrupts was involved; the time between those interrupts was just enough to cause the governor to choose a deep idle state. That created latency which added up over time as the workload progressed. The temporary workaround was to increase the target latency (the expected sleep time) for those idle states by a factor of three to five, biasing the idle-state choice toward the shallower states. It solved the problem, but is "not elegant"; others will undoubtedly find workloads that go wrong in other ways.

Rafael Wysocki interjected to suggest using the pm_qos mechanism instead; its purpose is to address just this sort of latency issue, and he was curious to know why it didn't work. Part of the problem, evidently, is that pm_qos will disable the deeper idle states entirely, but there can be value in having them remain available for the truly long idle periods. Parth Shah added that, on the Power architecture, this is even more true; without getting to those deeper idle states little energy will be saved.

Goel tried providing a debugfs interface for the cpuidle core that would allow the residency attributes of the various idle states to be tweaked at run time. It is useful for validating idle states, he said, but synchronization of the changes within the kernel is painful. Changing these attributes can also lead to strange behavior due to distortions of the CPU's idle history. He wanted a better solution.

The result was the pseudo cpuidle driver. It is a loadable module that allows the user to create customized idle states and tweak the attributes as they go. Doing things this way avoids both the synchronization and history-distortion problems. The module is loaded with a set of parameters describing the number of states, along with the target residency and exit latency of each. The actual "idle states" are implemented by putting the CPU into a busy-wait loop, spinning until the next wakeup event happens; the driver then spins a bit longer to simulate the exit latency time.

This behavior clearly does a poor job of saving power, but it is useful to evaluate how specific policies affect system performance. It can be used for tuning a governor, or to compare the effects of different governors entirely. It also turns out to be useful for CPU design, he said; designers can try out various idle states and see how they will actually perform.

Future work, he concluded, could include simulating idle states at the core and chip levels as well as basic CPU states. He is also planning to add some tracing capabilities to the driver.

Wysocki led off the discussion by pointing out one potential problem. In a real system, the response to a hardware interrupt will be delayed by the exit latency of the processor. The only way to simulate that delay would be to do the busy-waiting with interrupts disabled, but then the interrupt (which would normally wake the system) will be missed entirely. That particular aspect of hardware behavior, it seems, cannot be emulated in this way. That said, he agreed that the driver looks useful for studying cpuidle governors, and seems worth having.

At the conclusion of the session, Juri Lelli asked if there were any sort of performance numbers comparing this driver with real hardware; Goel answered that he didn't have those yet.

Comments (none posted)

Saving frequency scaling in the data center

By Jonathan Corbet
May 21, 2020

OSPM
Frequency scaling — adjusting a CPU's operating frequency to save power when the workload demands are low — is common practice across systems supported by Linux. It is, however, viewed with some suspicion in data-center settings, where power consumption is less of a concern and there is a strong emphasis on getting the most performance out of the hardware. At the 2020 Power Management and Scheduling in the Linux Kernel summit (OSPM), Giovanni Gherdovich worried that frequency scaling may be about to go extinct in data centers; he made a plea for improving its behavior for such workloads while there is still time.

He started with a quote from a car-rally driver: "if in doubt, go flat out". This may not actually be the best advice for drivers of motor vehicles, he said, but it may be the right approach for frequency scaling. Users in data centers often switch to the "performance" CPU-frequency governor, which effectively amounts to not doing frequency scaling at all. This governor, which simply runs the CPU at full speed all the time, reflects one outcome of the tension between energy efficiency and raw performance. Program managers tend to be interested in performance first, and performance is the first thing that customers see. The cost of power usage is only discovered later, resulting in attempts to hack efficiency into a data-center deployment as an afterthought. It would be better to have that efficiency there from the outset, he said.

He asked the audience a question: assume you are a regional bank running an on-premises data center. Which CPU-frequency governor would you choose? The intel_pstate powersave governor would be the smart choice for now. But the intel_pstate performance governor beckons. The "schedutil" governor is the upcoming next generation. Would you pick one of those, or just go with whatever default the distribution picked? The choice is not obvious. Frequency scaling looks like a risk for a data-center user. Can the distribution vendor be trusted to have made the right choice? For distributors, the bank is a customer who must be catered to. Which governor would you set as the default?

Mobile-first considered harmful

He would like to see frequency scaling be the obvious choice for data-center users and make the performance governor obsolete. Eventually he would like to see the schedutil governor win universally; it's "too cool not to win". But that is hampered by the (in his view) mobile-first viewpoint taken by the developers working with frequency scaling. The object is to save every last bit of energy with the idea that the performance governor exists for users who don't share that goal. That results in frequency scaling stagnating on the x86 architecture, which is relatively rare in power-sensitive settings.

(Your editor, who has been watching for a long time, was amused by this. For many years the complaint was that "big iron" dominated kernel-development priorities; that situation would appear to have changed.)

So what happens if distributors default to the performance governor for x86 systems? One advantage would be that the task of getting the powersave governor into shape could be dropped, along with the complexity that governor brings. On the other hand, he said, the x86 community will lose its grip on technology that it will certainly need someday. Avoiding the powersave governor on server systems will simply paper over bugs that, in the long run, need to be fixed. The last time this topic came up at SUSE (where Gherdovich works) the powersave contingent won, but the issue will come up again.

It seems, though, that the performance governor isn't an obvious choice even now. Dhaval Giani said that it can prevent a CPU from going into "turbo" mode, causing some benchmarks to regress. Rafael Wysocki pointed out that frequency scaling is increasingly done in the processor itself, which can cause strange results when the performance governor is selected.

Gherdovich answered that there appears to be some tension here. The schedutil governor is getting smarter, but the "hardware-managed p-states" feature (called HWP) is pushing things the other way and taking the kernel out of the decision loop. It's not clear how things will play out, and whether frequency scaling will ultimately be controlled in the operating system or the firmware. Wysocki said that the two approaches are not mutually exclusive, though; the operating system works at different time scales than HWP does. It is possible to bring the two into agreement, but there aren't many ways to provide feedback between them now. He has a patch that tries to improve the situation; he will attempt to post it soon.

Continuing, Gherdovich said that defaults set by distributors are critically important; they are the first thing that users see. A distribution will be evaluated with its default settings; if the results are bad, users will move on without ever getting to the point where they try to tune things. So distributors tend to emphasize their default settings when running tests, resulting in far fewer bug reports for non-default CPU-frequency governors. If performance is the default, powersave will get little attention. Additionally, regressions are not something that can be tolerated; if frequency scaling is ever turned off, it will be almost impossible to turn it back on. The chances of creating performance regressions would just be too high.

That, he said, leads to a downward spiral for non-performance CPU-frequency governors. The algorithms in those governors will increasingly be tuned for settings outside of the data center, causing data-center users to lose confidence in them entirely. Distributors will just default to performance, there will be no bug reports, bugs won't get fixed, and frequency scaling will just get worse.

Compromise needed

How do we avoid this dark future? Frequency scaling needs to compromise a bit in the direction of performance, he said, if it wants to win the data center. Often the correct choice for the CPU frequency is obvious, and the governor should go with it. But if there is no information available or the indications conflict with each other, that is the time to favor performance. Any other algorithm will be irrelevant on servers.

For now, he said, the process that is making frequency scaling unsuitable for data centers has not advanced far, but he worries that the priority for upstream developers seems to strongly favor saving power, and he would like to change that somewhat. Wysocki said that anybody who sends patches to the kernel has an agenda — it's why they wrote the patch in the first place. What data-center advocates need to do is to respond to patches that show an agenda falling too heavily on the battery side.

Gherdovich was seemingly ready for that; he countered by bringing up this patch merged by Wysocki in 2019. The "I/O boost" heuristic in the powersave governor assumes that, if a task has been waiting for I/O, it will have work to do once that I/O completes, so the governor increases the processor's operating frequency to get that work done quickly. Prior to the patch, it increased the frequency all the way to the maximum; afterward, the frequency ramps up more slowly. This patch regresses the dbench benchmark, Gherdovich said. Wysocki responded that the purpose of the patch was to avoid starving the integrated GPU of power, and to match an equivalent change made to the schedutil governor.

There may be good reasons for the change, Gherdovich said, but that patch is currently reverted in SUSE kernels, which is clearly a stopgap solution. There are a couple of other out-of-tree patches in those kernels as well, as it turns out. The "idle boost" patch works like I/O boost; it temporarily increases the frequency when a processor exits the idle state. The "ramp up faster" change is an old patch that nobody likes; it causes the frequency to ramp up more quickly when utilization hits a threshold. These patches are expensive to maintain, and SUSE would much rather stick with the mainline.

Patrick Bellasi asked whether any attempt had been made to use uclamp to get the desired performance results; that has not been done. Mel Gorman added that uclamp is disabled in SUSE kernels since it imposes a significant (3-4%) overhead even when it is not used. Bellasi (the author of the uclamp work) was evidently surprised by this and asked for further information, so that problem, at least, may eventually be fixed.

Gherdovich concluded by putting up some numbers. Reverting the I/O-boost patch increases dbench performance by 10%, he said. The performance per watt of power used drops by 23%, which is not a big problem on a server system; users typically do not want to lose that 10% of throughput even if it's costly in energy terms. The full "spicy-powersave" patch set — the I/O boost revert plus "idle boost" and "ramp up faster" — improves kernel build times by 10% with no power cost at all.

At the end, Wysocki asked how much performance data-center users were willing to lose to save some power; Gherdovich didn't have a precise answer but did say that 10% is too much. Wysocki expressed a wish that the CPU-frequency governor work would, in the end, converge on a single solution for everybody, probably in the form of the schedutil governor.

[See Gherdovich's slides [PDF] for details and all the performance results.]

Comments (16 posted)

The deadline scheduler and CPU idle states

By Jonathan Corbet
May 22, 2020

OSPM
As Rafael Wysocki conceded at the beginning of a session at the 2020 Power Management and Scheduling in the Linux Kernel summit (OSPM), the combination of the deadline scheduling class with CPU idle states might seem a little strange. Deadline scheduling is used in realtime settings, where introducing latency by idling the CPU tends to be frowned upon. But there are reasons to think that these two technologies might just be made to work together.

Why would one even bother to try to combine the deadline scheduler and CPU idle states? One should never miss opportunities to save energy, he said. Plus, on some systems, avoiding idle states is not really an option; without them, the CPU will overheat and thermal throttling will kick in. Meanwhile, the combination seems viable to him. In theory, at least, all of the information needed to select idle states is present; the scheduler has work estimates and deadlines for all tasks, and it has the idle-state properties for the CPU. It's just a matter of using that information correctly.

His idea is to maintain a global latency quality-of-service request that depends on all deadline tasks in the system. That will show that, sometimes, there is no room for idle states; if enough deadline tasks have been admitted to use all of the available CPU time, the CPU clearly cannot go idle. But other times there will be some room. He proposed two rules to govern transitions into idle states:

  • The latency limit cannot exceed the difference between the next deadline for any task and its runtime. If a task has 2ms worth of work to do by a deadline 5ms from now, nothing can impose a latency greater than 3ms.
  • That limit, when multiplied by the number of deadline tasks, cannot exceed the amount of run time available after all deadline run-time reservations have been subtracted. In other words, the system cannot lose more CPU time to exit latency than it would have left over if all deadline tasks use their full reservation.

Juri Lelli said that the basic idea makes sense. Daniel Bristot de Oliveira, instead, said that while the first rule makes sense, the second is too pessimistic. Not all wakeups will happen on an idle CPU, so the exit-latency penalty will not always have to be paid. With the SCHED_FIFO realtime class, you know about the maximum latency for any given task, but that is not true for deadline tasks, which have no latency guarantees. Some delay for a deadline task at wakeup time is acceptable as long as it still makes its deadline.

One complication, Wysocki said, is that the processor may have to go into the C1 idle state every so often, regardless of what the operating system would want to have happen. That led to some discussion about how the forced C1 time could maybe be modeled; Tommaso Cucinotta suggested that it could be set up as a special deadline task of its own, at which point the scheduler's admission control policy could account for it. Wysocki thought it was an interesting idea, but he would still like to address the possibility of opportunistically going idle for additional time if the workload allows it.

Lelli pointed out that the scheduler reserves some time for non-realtime tasks now to be sure that they can run at least a little bit. Perhaps something similar could be done to reserve time for the idle thread? Cucinotta said that this idling would have to happen at the right time. Lelli said that it may be necessary to synchronize idle times across CPUs as well, but Wysocki said he is not thinking about deeper idle states or idling entire packages at this time.

Lelli asked if there were patches to look at now; Wysocki said that he hasn't done any real work yet. That is a good thing, since he learned things in this discussion that will influence what he eventually comes up with.

At this point Wysocki was finished, but the conversation continued. Dietmar Eggemann noted that, while admission control for deadline tasks is done globally, the actual scheduling of deadline tasks is done on a per-CPU basis. At which level, he asked, would idle time be taken into consideration? Bristot said that this division is an artifact of the difference between deadline-scheduling theory and the practice of an actual implementation. Cucinotta said that it's always possible to partition the system to move the admission-control decisions downward.

From there the discussion went deeply into deadline-scheduling theory; see the recording, once it's available, for the details.

Comments (none posted)

Imbalance detection and fairness in the CPU scheduler

By Jonathan Corbet
May 22, 2020

OSPM
The kernel's CPU scheduler is good at distributing tasks across a multiprocessor system, but does it do so fairly? If some tasks get a lot more CPU time than others, the result is likely to be unhappy users. Vincent Guittot ran a session at the 2020 Power Management and Scheduling in the Linux Kernel summit (OSPM) looking into this issue, with a focus on detecting load imbalances between CPUs and what to do with a workload that cannot be balanced.

Imbalance detection

In the 5.7 kernel, he began, the runnable_load_avg signal has been removed in favor of runnable_avg, which is the sum of the "runnable" time of every scheduler entity (either an individual task or a control group containing tasks) in a run queue. The runnable time is defined as the time a task actually spends running, but also the time it spends waiting to run. This change addresses a problem that had been seen in capacity tracking when a task is migrated from one CPU to another.

Specifically, moving a task off of a CPU moves that task's utilization with it, causing that CPU to appear to suddenly have a lot of spare capacity. But if other tasks on the CPU were spending a lot of time waiting to run, that capacity doesn't really exist; the utilization of those tasks was artificially reduced by the fact that they couldn't run as much as they needed to. Including the waiting time prevents that false capacity from appearing when one task moves away, giving the remaining tasks time to expand their actual utilization. The calculation of when a CPU (or set of CPUs) is overloaded now looks at runnable_avg, which must exceed the CPU capacity by a threshold before the scheduler will try to move tasks away.

NUMA balancing is still not using this metric, though, so there is currently a mismatch between normal load balancing and NUMA balancing. That can lead to conflicting decisions at times. It might make sense to change the balancing at the NUMA level, but NUMA nodes can contain a lot of CPUs, and he worries about the impact of summing that many runnable_avg values. He has not started working on this problem, but it's at the top of his list.

Peter Zijlstra noted that developers are still "chasing the fallout" from the changes that have been made so far. Guittot acknowledged that, saying he's not sure if the NUMA issues play into that or not.

Another issue relates to the threshold used with runnable_avg; it is currently a fixed value. But runnable_avg is dependent on the number of runnable tasks, since more tasks will lead to more waiting time. That makes it easier to cross the threshold as the number of tasks increases.

He presented an example to show how the calculations can vary. Imagine N tasks, all with the same utilization. If they all wake up simultaneously and land in the run queue together, the resulting runnable_avg will be proportional to N²: each task's runnable time includes the time spent waiting behind the tasks queued ahead of it, so the total grows quadratically. If, instead, each wakes up just as the previous one is completing its work, runnable_avg will be directly proportional to N. As N grows, the difference between the two scenarios becomes large.

To fix this, he is playing with scaling the threshold by the number of running tasks. That delays the crossing of the threshold and subsequent determination that the CPUs are overloaded. Benchmarking is ongoing, with no significant results to report so far. He's still looking for a benchmark that demonstrates the problem in a useful way.

Fairness with difficult workloads

Guittot then moved on to a fairness problem: how do you balance a case that simply cannot be balanced? Sometimes the granularity of the load on the system just doesn't match the CPU topology. Three tasks running on two CPUs is one example of such a situation. If two tasks are kept on one CPU, they will get half of the running time that the third task gets, which is unfair. This problem only comes about when the system is overloaded.

Going to a more complex example, he described nine tasks running on an eight-CPU system. This load cannot be balanced, but it should be fair. He ran some tests using the rt-app benchmark, comparing the amount of work each task was able to complete. The average unfairness he found was about 20%, with one test reaching 40%. Given that the unfairness cannot go above 50%, that is a pretty bad result.

There are a couple of rules that control when the scheduler will try to move a task to balance the system. The first is that it will look for a task whose utilization is less than twice the calculated imbalance value. In the scenario described here, this rule will almost never find a task to move, causing load balancing to fail. So the second rule kicks in: it moves a task when the number of load-balancing failures exceeds a threshold. At that point, the scheduler is rather less selective when it comes to picking a task. That leads to unfair scheduling.

Looking at the problem, he found that some CPUs never manage to pull tasks from others; that causes the tasks that are running on those CPUs to get more than their fair amount of time. This seems to be a result of the fact that load balancing happens nearly simultaneously on all CPUs. This also happens at the CPU group level; load balancing at that level also happens at about the same time. But the balancing running within the group will run more quickly, since it has fewer CPUs to consider. That leads to tasks moving around within a group, but rarely between them.

Another problem is that the same task is often chosen to migrate; it will get less CPU time as a result. There is an unexpected synchronous pattern between the scheduling period and the load balancer that causes the same task to often be waiting to execute when balancing happens. There is a simple fix for both problems: tweak the load-balancing intervals at the various levels so that they don't run simultaneously and don't line up with the scheduling period.

Another fix is to reduce the active load balancing that happens when normal load-balancing attempts fail. Active load balancing can move tasks that should not necessarily be moved, so it should only be done when it's clear that it makes sense.

He has also been looking at the min_vruntime value on each CPU; this value can be seen as a proxy for how much the least-scheduled task on that CPU has been able to run. If min_vruntime is not increasing equally across CPUs, that is a sign of unfair scheduling. This approach does not scale well, since min_vruntime only applies to CPU-level run queues rather than tasks or group-level queues. Still, by taking a cue from min_vruntime, he was able to reduce the average unfairness on the rt-app test to about 15% — better, but not a complete solution. The maximum unfairness fell to 18%, which is a significant improvement.

So he decided to try a different metric: the ratio of the load average and the utilization average. That gives a good fairness metric, but is not ideal either. There is a big mismatch between the period over which the utilization average is calculated and the load-balancing period; the utilization average is also capped at the max capacity of the CPU. So instead he is looking at "sched_avg", the sum of the average utilization of all the run queues. This helps reduce the cases where load balancing bounces tasks quickly between groups.

This change reduced average unfairness to 12% with a maximum of 16%. The "always moving the same task" problem is not fully solved, though. This could be mitigated by considering each task's utilization average before moving it; a task that has been discriminated against recently will have lower utilization and should not be moved again soon. At this point, he said, fairness appears to be limited by the imbalance_pct threshold, which keeps load balancing from happening when the imbalance appears to be too small; this is something to look at next.

Questions and comments

After Guittot concluded, Zijlstra said that he had a number of remarks, but that he would save them for email. The alternative, he said, would be to confuse everybody, including himself. There is another possibility that he thinks might be interesting.

Qais Youssef asked if the fairness issue was specific to long-running tasks. The periods where contention happens might be small, so might not appear with short-running tasks. That suggests that moving tasks around should not happen right away. Guittot agreed that the problem is easier to see with long-running tasks.

Zijlstra said he has seen fairness problems in high-performance computing workloads, where it is common to spawn a whole set of jobs, then wait for them all to complete. People running these workloads would like the jobs to complete at about the same time; they hate scheduling jitter. If one CPU is running some other, random task (an SSH server daemon, for example) that will slow its job over time. Users in this scenario would like to see these tasks spread across CPUs to maximize the fairness and increase throughput. Making this problem visible, he said, would require introducing some interference when running benchmarks.

Valentin Schneider noted that this discussion related to symmetric multiprocessing systems. But what about a big.LITTLE system? If you have a machine with four large CPUs and four small ones running eight CPU-hog tasks, four of those tasks will be stuck on the little CPUs. Should the scheduler rotate tasks around to increase fairness? That is a hard one, Guittot said, because there will be no perceived imbalance in the system. Youssef said that a "race to idle" approach might work better than complete fairness in such situations; the right solution is not always entirely clear.

At that point, the questions were done and the session came to a close.

Comments (1 posted)

Hibernation in the cloud

By Jonathan Corbet
May 25, 2020

OSPM
Hibernation is normally thought of as a laptop feature — and an old and obsolete laptop feature at that. One does not normally consider it to be relevant in cloud settings. But, at the 2020 Power Management and Scheduling in the Linux Kernel summit (OSPM), Andrea Righi argued that there may actually be a place for hibernation on cloud-based systems if it can be made to work reliably.

The core idea behind hibernation is powering the system down entirely, but restoring it to its previous state, including any running processes, when the system is powered up. To do that, the contents of memory, which will not survive loss of power, must be written to persistent storage before turning things off. The advantage of hibernation is that the system can retain its state indefinitely without power; the cost is a great deal of I/O at both hibernation and resume times.

Hibernation was a hot topic back in 2004, when it was usually known as "software suspend"; see the LWN kernel index entry for software suspend to understand just how hot. Work in this area slowed around 2008, though, when suspend-to-RAM functionality (often just called "suspend") became widely available. Hibernation support was disabled by default in Ubuntu 12.04. The Fedora 29 release included an experiment with suspend-then-hibernate functionality, but that "didn't go well" and was dropped. Hibernation mostly seems like a dead topic, he said.

So it is interesting that Amazon added support for hibernating EC2 instances at the end of 2018. Hibernation has suddenly arrived in the cloud, which is a rather different use case than has been seen before. The value there is the ability to pause a workload to save money. For example, Amazon's "spot instances" run at low priority when there are spare resources available; they can be shut down with two minutes' notice at any time. That is "not nice", but "you get what you pay for". This is a setting where hibernation can help; rather than simply losing its state when the instance is shut down, the workload can hibernate and resume when resources are again available.

How it works

Hibernation works by writing memory contents to a "hibernation image" on disk; that image is somewhat smaller than the RAM in the system. Data can be compressed on its way to the image, and recoverable pages (primarily clean, file-backed data in the page cache) can simply be dropped. Rafael Wysocki added that hibernation was designed with the assumption that the bulk of user data will be swapped out at any given time; what is left will amount to less than 50% of RAM. On the next boot, Righi continued, the kernel will look at the specified resume device; if a valid image signature is found, the image will be restored into memory. Then some tricky, architecture-specific code jumps back into the old kernel state and the system resumes where it left off.
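
From user space, on a system with a resume device configured (typically via the resume= kernel command-line parameter), hibernation is requested by writing "disk" to /sys/power/state. A minimal C equivalent of that write, with error handling pared down for brevity, looks something like this:

    /* Minimal example: request hibernation by writing to /sys/power/state.
     * Requires root and a properly configured resume device. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("/sys/power/state", "w");

        if (!f) {
            perror("/sys/power/state");
            return EXIT_FAILURE;
        }
        if (fputs("disk\n", f) == EOF || fclose(f) == EOF) {
            perror("hibernate request");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }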

The biggest issue with hibernation, he said, is whether it is reliable. That can't always be counted on; any device in the system can prevent hibernation if something isn't in the right state. That is not a huge problem for hibernation itself, since no data is lost, though it can be an issue if you are relying on it working. But any device can mess up the resume process as well, and that is a much bigger problem. It is also possible for the kernel to run out of memory while hibernating or resuming, which will kill the whole thing.

Beyond that, there are still bugs present in this code, despite its long history; Righi mentioned one that was fixed in late 2019. There are security implications as well, since the hibernation image holds sensitive data in persistent storage. Memory and disk speed can be a problem; he dealt with one customer who reported that hibernation was timing out. It turned out that the customer was running on an instance with slower storage, and that the timeout period had not been wisely chosen. It is also possible that the memory needing to be saved won't fit into the hibernation image.

Debugging hibernation problems is a special challenge in any setting, he said, and it can be worse in cloud settings, especially if you do not have access to the hypervisor.

Improving the reliability of hibernation depends a lot on better hardware support. Here cloud settings may have an advantage, because the "hardware" tends to be uniform regardless of where an instance is running. It can be helpful to reduce memory usage when the image is being stored, which argues against the use of stacked block devices; kernel code should avoid large allocations in the hibernation and resume paths.

Performance (as measured in hibernation time) can be improved by decreasing the size of the hibernation image; that can be done by tweaking /sys/power/image_size. A smaller size will cause more recoverable memory to be dropped, cutting down on the amount of I/O required at the cost of colder caches on resume. A larger image size has the opposite effect; hibernation takes longer, but the system will run faster after it resumes.
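
As a small example of that tuning (the 512MB target below is arbitrary; writing zero asks for the smallest image the kernel can manage), the knob takes a byte count:

    /* Example: ask for a hibernation image of at most 512MB.
     * /sys/power/image_size is expressed in bytes; requires root. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("/sys/power/image_size", "w");

        if (!f) {
            perror("/sys/power/image_size");
            return EXIT_FAILURE;
        }
        fprintf(f, "%llu\n", 512ULL * 1024 * 1024);    /* bytes */
        if (fclose(f) == EOF) {
            perror("/sys/power/image_size");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }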

Then, there is the trick of running swapoff after resuming the system as a way of forcing all data from the swap area back into RAM. It can reduce the time required for the system to stop paging and get back to full speed. But using swapoff turns out to be slow because the swap code does not properly use readahead when bringing the data back into RAM. There is a fix for this problem in linux-next now. Wysocki said that the kernel could just do this swapping-in at resume time automatically, which would be a nicer solution to the problem.
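
A minimal sketch of that trick, assuming root privileges and a hypothetical swap device at /dev/vda2, might look like this:

    /* Sketch of the post-resume swapoff/swapon cycle described above. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/swap.h>

    #define SWAP_DEV "/dev/vda2"    /* hypothetical swap device */

    int main(void)
    {
        if (swapoff(SWAP_DEV) != 0) {   /* pulls swapped-out pages back in */
            perror("swapoff");
            return EXIT_FAILURE;
        }
        if (swapon(SWAP_DEV, 0) != 0) { /* make swap available again */
            perror("swapon");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }

Running something along those lines from a resume hook trades a burst of I/O right after resume for a system that gets back to full speed sooner.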

For the future, Righi said, the kernel could perform opportunistic swapping during idle times; that would put more data into persistent storage and speed up hibernation. He has tried some hacks in this area; they work, but he would like a better solution. In conclusion, he said, hibernation can bring some real benefits for cloud-based systems, but the reliability issues need to be addressed first.

Rafael responds

Once Righi finished, Wysocki essentially took over with a talk of his own, saying that he wanted to respond to a few of Righi's points. He agreed that hibernation is not used much currently, but he uses it himself for desktop systems that don't have a battery. He has not seen a single failure since 2016. That said, the whole system was designed around the assumption that hibernation and resume would happen on the same machine; it's surprising that it works for cloud instances at all.

He acknowledged that there were plenty of nasty bugs in x86 hibernation support, but most of those were fixed in 2016. Support in the architecture code is solid, but there are still problems with some drivers. Most drivers support suspend these days, though, and the hibernation support generally derives from that, so device-level support for hibernation is broad. Most laptops he has tried work out of the box without problems, though he admitted he hasn't done huge amounts of testing.

He repeated that there could be problems where the resume happens on a different machine from where hibernation took place. Given the hardware emulation provided by the hypervisor, the system should be essentially the same, but he stressed that there are "no warranties".

The real problem with hibernation support is that it places huge stresses on the memory-management subsystem. It forces data out to swap by allocating all of the memory in the system, with the out-of-memory killer disabled. Strange things can happen when you do that but, from the point of view of the memory-management developers, it's an obscure corner case and they never find the time to improve it.

With regard to security, he said, if a cloud provider makes hibernation available, it's up to them to take care of encrypting the hibernation image and such. Attempts to add encryption support to the kernel have run afoul of "security people", who didn't like the duplication of functionality. Somehow, the key used to encrypt the image has to be passed to the resume kernel, which is not easy. So there are some challenges to face there.

On that note, the session wound down, and the 2020 edition of OSPM came to a close.

Comments (24 posted)

Page editor: Jonathan Corbet
Next page: Brief items>>


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds