Quality in open source: testing CRIU

July 20, 2016

This article was contributed by Sergey Bronnikov

Checkpoint/Restore In Userspace, or CRIU, is a software tool for Linux that allows freezing a running application (or part of it) and checkpointing it to disk as a collection of files. The files can then be used to restore and run the application from the point where it was frozen. The distinctive feature of the CRIU project is that it is mainly implemented in user space.

Back in 2012, when Andrew Morton accepted the first checkpoint/restore (C/R) patches to the Linux kernel, the idea of saving and restoring running processes from user space seemed kind of crazy. Yet, four years later, not only is CRIU working, it is also attracting more and more attention. Before CRIU, there had been other attempts to implement checkpoint/restore in Linux (DMTCP, BLCR, OpenVZ, CKPT, and others), but none were merged into the mainline. Meanwhile CRIU survived, which attests to its viability. Some time ago, I added support for the Test Anything Protocol format to the CRIU test runner; creating that patch allowed me to better understand the nature of the CRIU testing process. Now I want to share this knowledge with LWN readers.

Things were simple in the beginning: three developers and a small feature set. As the project evolved, more developers joined and more features were added. The project's growth posed a number of testing problems that needed to be solved:

Running tests had to be easy enough that any developer could test their own changes.
The number of combinations of features and configurations grew exponentially, so running tests manually took too much time; test automation was in order.
To save the precious time of both developers and users, the tests had to cover as much CRIU functionality as possible and catch regressions in new versions.
Testing had to be transparent, with the test results public.
Code review alone became insufficient for accepting new changes; CRIU's maintainer wanted more details on patches before adding them, so proposed changes had to be tested.

The development of CRIU doesn't differ much from that of the Linux kernel. All patches are sent to the criu@openvz.org mailing list and get reviewed by CRIU developers to weed out bugs at the earliest stage. Review used to be the only criterion for accepting patches, but that is no longer the case: patches now also go through compilation checks, automated test runs, code-coverage measurements, and static code analysis. All of that is performed with freely available tools, so the entire testing process is available to the community.

Patches are transferred from the mailing list to Patchwork, which automatically builds CRIU on all supported platforms (x86_64, ARM, AArch64, and PPC64le) to make sure the changes do not break the build. For this, CRIU uses Travis CI for x86_64 and qemu-user-static in a Docker container for the other architectures.

Good, working tests are vital for any project, no matter how complex it is. They let developers be sure their changes don't break anything, they give the maintainer a sense for how good the code is and how well it works, and they let users be sure that their use cases or configurations won't be broken in the next release. The more complex a project is, though, the higher the demand for testing is.

For functional regression tests, CRIU developers use the ZDTM (zero down-time migration) test suite that has been used to test the in-kernel implementation of C/R in OpenVZ. Each test from the suite is run separately and goes through three stages: environment preparation, daemonization and waiting for a signal to check the test's state, and a result check.
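The three stages can be sketched in a few lines of Python. This is only an illustration of the lifecycle described above, not the ZDTM API: the class, the variable name ZDTM_DEMO_VAR, the choice of SIGUSR1, and the fake harness are all assumptions, and the sketch stays in the foreground instead of daemonizing.

```python
import os
import signal
import time

DEMO_VAR = "ZDTM_DEMO_VAR"  # hypothetical variable name, for illustration only


class StaticTestSketch:
    """Hypothetical three-stage test; not the real ZDTM API."""

    def __init__(self):
        self.signalled = False
        self.expected = None

    def prepare(self):
        # Stage 1: set up the state that must survive checkpoint/restore.
        self.expected = "demo-value"
        os.environ[DEMO_VAR] = self.expected

    def wait_for_signal(self, harness):
        # Stage 2: a real test would daemonize here; this sketch stays in the
        # foreground and lets `harness` stand in for the external C/R step.
        previous = signal.signal(signal.SIGUSR1, self._on_signal)
        try:
            harness(os.getpid())
            while not self.signalled:
                time.sleep(0.01)
        finally:
            # Put back whatever handler was installed before the test.
            restore_to = previous if previous is not None else signal.SIG_DFL
            signal.signal(signal.SIGUSR1, restore_to)

    def _on_signal(self, signum, frame):
        self.signalled = True

    def check(self):
        # Stage 3: compare the current state with the prepared one.
        return "PASS" if os.environ.get(DEMO_VAR) == self.expected else "FAIL"


def run(test, harness):
    test.prepare()
    test.wait_for_signal(harness)
    return test.check()


def fake_harness(pid):
    # Where a real harness would checkpoint and restore `pid`, this one
    # only delivers the wake-up signal (SIGUSR1 is an assumption).
    os.kill(pid, signal.SIGUSR1)


if __name__ == "__main__":
    print(run(StaticTestSketch(), fake_harness))
```

Keeping the C/R step outside the test itself is the point of the design: the test only knows how to build a state and how to verify it, so the same test can be reused under every configuration the harness supports.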

The tests are conventionally divided into two groups. The first group is static tests that prepare a certain static environment or state and wait for a signal. The second group is dynamic tests that constantly change their environment and/or state (e.g., transmit data via TCP). Back in 2012, CRIU's test suite included about 70 separate tests. These days, there are about 200. Functional tests are run on a schedule on a public Jenkins CI instance for each change added to the repository. The benefit is obvious as, according to the statistics gathered by the project, 10% of the changes break something.

Running the tests is as simple as running make and then make test, so anyone can test CRIU. However, the number of combinations of features and configurations is too large to do so manually. Besides, developers can sometimes be lazy when it comes to running tests regularly and might skip them even if the tests take only a minute.

The primary testing configuration is to launch the entire test suite on the host. After launch, each test puts itself into a particular state, its process is checkpointed, restored, and then checked for any changes in the state. Another important piece is to check that the process remains in working condition after it has been checkpointed. For this, each test needs to be run with checkpointing alone and the state must, once again, remain unchanged.
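The two modes can be illustrated with a toy model. In the sketch below, a JSON round trip stands in for a real dump and restore; the function names and the dictionary "state" are assumptions made for the example and have nothing to do with CRIU's image format.

```python
import copy
import json


def checkpoint(state):
    # Stand-in for a dump: serialize the state to an "image".
    return json.dumps(state, sort_keys=True)


def restore(image):
    # Stand-in for a restore: rebuild the state from the image.
    return json.loads(image)


def run_dump_and_restore(state):
    """Checkpoint, restore from the image, and compare with the original."""
    before = copy.deepcopy(state)
    restored = restore(checkpoint(state))
    return restored == before


def run_dump_only(state):
    """Checkpoint but keep the original running; it must be unchanged."""
    before = copy.deepcopy(state)
    checkpoint(state)
    return state == before


if __name__ == "__main__":
    demo = {"env": {"DEMO_VAR": "demo-value"}, "counter": 3}
    print(run_dump_and_restore(demo), run_dump_only(demo))
```

The second function matters because a checkpoint that disturbs the process it inspects would go unnoticed if every test run ended with a restore.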

To make sure that the state remains unchanged after the restore, each test has a set of checks. For example, the test env00 checks that an environment variable has not changed. Sometimes the state of the restored process appears to remain unchanged and will pass the ZDTM tests, but it is unsuitable for another C/R. This gives us another testing configuration, repeated C/R, which will detect these kinds of problems. Then additional types of tests are run:

C/R with snapshots: CRIU saves a series of application states (all but the first are incremental) and later reverts to them. One example when snapshots might be useful is for debugging.
C/R in namespaces: C/R of applications running in namespaces (network, user, PID, IPC, mount, UTS)
Checkpoint with regular user privileges: Originally, CRIU required root privileges to perform a checkpoint operation; in CRIU 2.0, however, the ability to checkpoint as a regular user was added. This configuration checks for regressions in that mode.
C/R with backward compatibility: In this configuration, the test saves the current head, rolls back to the specified commit, compiles the CRIU binary, then executes a ZDTM test, and dumps its processes. Then the test checks out the current head, compiles the CRIU binary again, restores the tested processes, and checks the result.
Restore on BTRFS and NFS: Additional configurations were added because of the peculiarities of these filesystems.
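Why the repeated C/R configuration mentioned before this list is needed can be shown with a toy model. Everything in the sketch is hypothetical: the "process" is a dictionary, the "fds" field stands for any bookkeeping a dump relies on, and lossy_restore is a deliberately buggy restore written for the example.

```python
import json

ENV_KEY = "DEMO_VAR"  # hypothetical variable name


def dump(proc):
    # Stand-in for a checkpoint; it needs the bookkeeping field "fds".
    return json.dumps({"env": proc["env"], "fds": proc["fds"]})


def good_restore(image):
    return json.loads(image)


def lossy_restore(image):
    # Deliberately buggy: the visible state comes back, the bookkeeping
    # that the next dump depends on does not.
    data = json.loads(image)
    return {"env": data["env"]}


def check(proc, expected):
    # Like an environment-variable test, this only looks at the visible state.
    return proc["env"].get(ENV_KEY) == expected


def repeated_cr(proc, restore, iterations):
    expected = proc["env"][ENV_KEY]
    for i in range(iterations):
        try:
            proc = restore(dump(proc))
        except KeyError:
            return f"dump failed on iteration {i + 1}"
        if not check(proc, expected):
            return f"check failed on iteration {i + 1}"
    return "PASS"


if __name__ == "__main__":
    proc = {"env": {ENV_KEY: "x"}, "fds": [0, 1, 2]}
    print(repeated_cr(proc, lossy_restore, 1))  # PASS
    print(repeated_cr(proc, lossy_restore, 2))  # dump failed on iteration 2
```

A single cycle passes with the buggy restore because the check cannot see what was lost; only the second dump exposes it.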

And these are only the single-process tests. For group C/R you can also test the checkpointing of process groups where all processes are in the same state as prepared by a ZDTM test or where each has its own state.

But wait, there's more. CRIU currently supports several hardware architectures and also needs to be tested on several kernels: the latest vanilla kernel, the RHEL7 kernel based on 3.10, and the linux-next branch. Each test takes just 5 to 300 seconds, but considering all combinations of possible scenarios and configurations, the total time is quite impressive. Let's try to calculate it (approximately):

Currently there are 260 tests in ZDTM, each test has at least 100 run variants, each run taking 5 seconds on average.
The total number of configurations is 26,000 (260 x 100), and each takes five seconds to run, so it would take about 36 hours (130,000 seconds) to run all variants of all tests.
The additional configurations, like snapshots, backward compatibility, namespaces, and so on, add 2-3 hours.
There is also group C/R, where every process in the group has its own state and the test performs C/R for the entire group. That adds about 2^200-1 more combinations.
Add to this different Linux kernels and hardware architectures ...

... and the total test time increases to infinity. Obviously, the project must then choose the highest priority configurations and tests to use for regular daily testing. Lower priority testing is done as time is available.
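The arithmetic is easy to reproduce. The sketch below plugs in the figures quoted in the list (260 tests, at least 100 variants each, five seconds per run) and, for group C/R, counts the non-empty subsets of 200 distinct states; the function names are made up for the example.

```python
def sequential_run(tests, variants_per_test, seconds_per_run):
    """Return (number of runs, total hours) for running everything in sequence."""
    runs = tests * variants_per_test
    return runs, runs * seconds_per_run / 3600


def group_combinations(states):
    """Number of non-empty subsets that can be formed from `states` states."""
    return 2 ** states - 1


if __name__ == "__main__":
    runs, hours = sequential_run(260, 100, 5)
    print(runs, round(hours, 1))               # 26000 36.1
    print(len(str(group_combinations(200))))   # 61 (decimal digits)
```

Even before kernels and architectures are multiplied in, the group C/R term alone is a 61-digit number, which is why exhaustive testing is off the table.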

Kernels from the linux-next branch help discover and report changes that break the project before they make it into the mainline. In the course of developing CRIU, the developers have found roughly 20 bugs by testing with linux-next. Each test of linux-next must be run in a clean environment, so developers use a cloud service provider's API to create a virtual machine, install the kernel, and run tests. That ensures there won't be anything left over from previous tests.

Even though functional testing guarantees that features that have worked before will continue to do so, it doesn't help find new bugs. For this reason, fuzz tests were added. There are not as many fuzz tests as would be preferred, but it is a start. For example, the maps007 test creates random mappings and "touches" those memory regions. The mmap() system call uses four modes and 20 flags to create a new mapping in the virtual address space. The test creates mappings with random parameters and makes sure that CRIU successfully performs C/R with these mappings.
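The mapping half of such a test can be sketched with Python's mmap module on Linux. This is not the maps007 source: it draws from only two protection modes and two flags, so that every region can be touched safely, and it performs no C/R at all.

```python
import mmap
import random


def random_mapping(rng):
    """Create one anonymous mapping with randomly chosen parameters."""
    pages = rng.randint(1, 8)
    prot = rng.choice([mmap.PROT_READ, mmap.PROT_READ | mmap.PROT_WRITE])
    visibility = rng.choice([mmap.MAP_PRIVATE, mmap.MAP_SHARED])
    region = mmap.mmap(-1, pages * mmap.PAGESIZE,
                       flags=visibility | mmap.MAP_ANONYMOUS, prot=prot)
    return region, prot


def touch(region, prot):
    """Access one byte per page so that every page of the mapping is used."""
    touched = 0
    for offset in range(0, len(region), mmap.PAGESIZE):
        if prot & mmap.PROT_WRITE:
            region[offset] = 0xA5
        else:
            _ = region[offset]
        touched += 1
    return touched


def fuzz_mappings(count, seed):
    """Create `count` random mappings, touch them, and return pages touched."""
    rng = random.Random(seed)
    regions = [random_mapping(rng) for _ in range(count)]
    total = sum(touch(region, prot) for region, prot in regions)
    # A real fuzz test would checkpoint and restore the process here and
    # then verify that the mappings and their contents survived.
    for region, _ in regions:
        region.close()
    return total


if __name__ == "__main__":
    print(fuzz_mappings(count=5, seed=1))
```

Seeding the random generator keeps a failing run reproducible, which matters more for a fuzz test than for a fixed one.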

Error-handling code paths are among the least covered by tests, so developers test the most critical of these paths with fault injection. The CRIU team couldn't find a suitable solution for such tests and had to write its own in the CRIU code. A number of CRIU tests are regularly run in the fault-injection mode.
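The general idea of in-code fault injection can be sketched as follows. This is not CRIU's implementation: the environment variable DEMO_FAULT, the fault-point name, and the dictionary used as storage are all hypothetical.

```python
import os

FAULT_ENV = "DEMO_FAULT"  # hypothetical variable name, not CRIU's


def fault_injected(point):
    """True when the environment requests a failure at `point`."""
    return os.environ.get(FAULT_ENV) == point


def write_image(storage, name, payload):
    # Simulate a write that has only partly completed.
    storage[name] = payload[: len(payload) // 2]
    if fault_injected("write-image"):
        # Error path that normal runs never reach: remove the partial image.
        del storage[name]
        raise OSError("injected failure at write-image")
    storage[name] = payload


def dump_with_cleanup(storage, payload):
    """Return True on success; on failure nothing may be left in storage."""
    try:
        write_image(storage, "core.img", payload)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    store = {}
    os.environ[FAULT_ENV] = "write-image"
    print(dump_with_cleanup(store, b"12345678"), store)
    del os.environ[FAULT_ENV]
    print(dump_with_cleanup(store, b"12345678"), store)
```

Selecting the fault point from outside the program means the ordinary tests can be rerun unchanged in fault-injection mode.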

Andrew Vagin, one of the CRIU developers, decided to try static code analysis along the way. He started with clang-analyzer and then moved on to Coverity, which is proprietary, but free to use for open-source projects. He expected static code analysis reports to have lots of false positives. However, it was just the opposite: the analyzers found bugs not discovered by the tests. Now, checking project code in Coverity is a must for each release.

Code coverage is typically measured to find parts of the code that are never tested and to understand how to test them—or at least why they are never reached by tests. For CRIU, developers did stumble upon parts of code that were never covered by tests, even though there were tests meant to exercise them (those discoveries were not pleasant at all). To measure code coverage for CRIU, developers use the standard gcov and lcov tools and also upload results to Coveralls to find out exactly which lines of code are covered.
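Finding the lines that tests never reach is mostly a matter of reading the coverage data. As a rough sketch, the function below scans the text of an lcov tracefile, where SF: names a source file and DA: records a line number with its execution count, and lists the lines that were never executed. The sample data and the file name in it are made up.

```python
def uncovered_lines(tracefile_text):
    """Map each source file to the line numbers with a zero execution count."""
    result = {}
    current = None
    for raw in tracefile_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
            result.setdefault(current, [])
        elif line.startswith("DA:") and current is not None:
            fields = line[3:].split(",")
            lineno, count = int(fields[0]), int(fields[1])
            if count == 0:
                result[current].append(lineno)
        elif line == "end_of_record":
            current = None
    return result


# Hypothetical tracefile contents, for illustration only.
SAMPLE = """\
TN:
SF:demo/file.c
DA:10,4
DA:11,0
DA:12,0
end_of_record
"""

if __name__ == "__main__":
    print(uncovered_lines(SAMPLE))  # {'demo/file.c': [11, 12]}
```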

Conclusion

The CRIU tests are quite easy to use and available to everyone. Moreover, the CRIU team has a continuous-integration system that consists of Patchwork and Jenkins, which run the required test configurations per-patch and per-commit. Patchwork also allows the team to track the status of patch sets to make the maintainer's work easier. The developers always keep an eye on regressions. If a commit breaks the tree, the patches in question will not be accepted.

The testing regime targets finding bugs in CRIU as early in the process as possible. That leads to happier users, developers, and maintainers—and, of course, more stable code.
