
How many kernel test frameworks?

By Jake Edge
June 5, 2019

The kernel self-test framework (kselftest) has been a part of the kernel for some time now; a relatively recent proposal for a kernel unit-testing framework, called KUnit, has left some wondering why both exist. In a lengthy discussion thread about KUnit, the justification for adding another testing framework to the kernel was debated. While there are different use cases for kselftest and KUnit, there was concern about fragmenting the kernel-testing landscape.

In early May, Brendan Higgins posted v2 of the KUnit patch set with an eye toward getting it into Linux 5.2. That was deemed a bit of an overaggressive schedule by Greg Kroah-Hartman and Shuah Khan given that the merge window would be opening a week or so later. But Khan did agree that the patches could come in via her kselftest tree. There were some technical objections to some of the patches, which is no surprise, but overall the patches were met with approval—and some Reviewed-by tags.

There were some sticking points, however. Several, including Kroah-Hartman and Logan Gunthorpe, complained about the reliance on user-mode Linux (UML) to run the tests. Higgins said that he had "mostly fixed that". The KUnit tests will now run on any architecture, though the Python wrapper scripts still expect to run the tests in UML. He said that he should probably document that, which is something that he has subsequently done.

A more overarching concern was raised by Frank Rowand. From his understanding, using UML is meant to "avoid booting a kernel on real hardware or in a virtual machine", he said, but he does not really see that as anything other than "a matter of semantics"; running Linux via UML is simply a different form of virtualization. Furthermore:

It seems to me that KUnit is just another piece of infrastructure that I am going to have to be familiar with as a kernel developer. More overhead, more information to stuff into my tiny little brain.

I would guess that some developers will focus on just one of the two test environments (and some will focus on both), splitting the development resources instead of pooling them on a common infrastructure.

Khan replied that she sees kselftest and KUnit as complementary. Kselftest is "a collection of user-space tests with a few kernel test modules back-ending the tests in some cases", while KUnit provides a framework for in-kernel testing. Rowand was not particularly swayed by that argument, however. He sees that there is (or could be) an almost complete overlap between the two.

Unlike some other developers, Ted Ts'o actually finds the use of UML to be beneficial. He described some unit tests that are under development for ext4; they will test certain features of ext4 in isolation from any other part of the kernel, which is where he sees the value in KUnit. The framework provided with kselftest targets running tests from user space, which requires booting a real kernel, while KUnit is simpler and faster to use:

So this is why it's largely irrelevant to me that KUinit uses UML. In fact, it's a feature. We're not testing device drivers, or the scheduler, or anything else architecture-specific. UML is not about virtualization. What it's about in this context is allowing us to start running test code as quickly as possible. Booting KVM takes about 3-4 seconds, and this includes initializing virtio_scsi and other device drivers. If by using UML we can hold the amount of unnecessary kernel subsystem initialization down to the absolute minimum, and if it means that we can [communicate] to the test framework via a userspace "printf" from UML/KUnit code, as opposed to via a virtual serial port to KVM's virtual console, it all makes for lighter weight testing.
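
For readers who have not seen one, a KUnit test case is just a small C function that receives a struct kunit pointer and reports results through the framework's expectation and assertion macros. The sketch below follows the interface from the patch set; the "example" suite and test names are invented for illustration and are not taken from the ext4 work:

    /*
     * Minimal KUnit test sketch; all names here are hypothetical.
     * Expectations log a failure and keep going; assertions abort the
     * test case on failure.
     */
    #include <kunit/test.h>

    static void example_add_test(struct kunit *test)
    {
        KUNIT_EXPECT_EQ(test, 4, 2 + 2);
        KUNIT_ASSERT_NE(test, 0, 1);
    }

    static struct kunit_case example_test_cases[] = {
        KUNIT_CASE(example_add_test),
        {}
    };

    static struct kunit_suite example_test_suite = {
        .name = "example",
        .test_cases = example_test_cases,
    };
    kunit_test_suite(example_test_suite);

Because all of that runs in kernel context, the wrapper scripts can build it into a UML kernel, boot it, and collect the results without needing a root filesystem or an init process.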

Frameworks

Part of the difference of opinion may hinge on the definition of "framework" to a certain extent. Ts'o stridently argued that kselftest is not providing an in-kernel testing framework, but Rowand just as vehemently disagreed with that. Rowand pointed to the use of kernel modules in kselftest and noted that those modules can be built into a UML kernel. Ts'o did not think that added up to a framework since "each of the in-kernel code has to create their own in-kernel test infrastructure". Rowand sees that differently: "The kselftest in-kernel tests follow a common pattern. As such, there is a framework." To Ts'o, that doesn't really equate to a framework, though perhaps the situation could change down the road:

So we may have different definitions of "framework". In my book, code reuse by "cut and paste" does not make a framework. Could they be rewritten to *use* a framework, whether it be KTF [Kernel Test Framework] or KUnit? Sure! But they are not using a framework *today*.

In addition, Ts'o said that kselftest expects to have a working user-space environment:

One major difference: kselftest requires a userspace environment; it starts systemd, requires a root file system from which you can load modules, etc. Kunit doesn't require a root file system; doesn't require that you start systemd; doesn't allow you to run arbitrary perl, python, bash, etc. scripts.

Rowand disagreed:

Kselftest in-kernel tests (which is the context here) can be configured as built in instead of as a module, and built in a UML kernel. The UML kernel can boot, running the in-kernel tests before UML attempts to invoke the init process.

No userspace environment needed. So exactly the same overhead as KUnit when invoked in that manner.

Ts'o is not convinced by that. He noted that the kselftest documentation is missing any mention of this kind of test. There are tests that run before init is started, but they aren't part of the kselftest framework:

There exists test modules in the kernel that run before the init scripts run --- but that's not strictly speaking part of kselftests, and do not have any kind of infrastructure. As noted, the kselftests_harness header file fundamentally assumes that you are running test code in userspace.
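
The pre-init test modules Ts'o is referring to typically look something like the sketch below: each module hand-rolls its own pass/fail accounting in its init function and reports via printk(), with nothing shared between them. The names here are hypothetical rather than lifted from an actual lib/test_*.c file:

    /*
     * Sketch of the ad-hoc lib/test_*.c pattern: no shared framework,
     * just checks run from module_init() with printk() reporting.
     * When built in (=y), this runs during boot, before init starts.
     */
    #include <linux/module.h>
    #include <linux/printk.h>

    static int failures;

    static void check(bool cond, const char *what)
    {
        if (!cond) {
            pr_err("example_test: FAIL: %s\n", what);
            failures++;
        }
    }

    static int __init example_test_init(void)
    {
        check(2 + 2 == 4, "basic arithmetic");
        pr_info("example_test: %s, %d failure(s)\n",
                failures ? "FAILED" : "passed", failures);
        return failures ? -EINVAL : 0;
    }
    module_init(example_test_init);

    static void __exit example_test_exit(void)
    {
    }
    module_exit(example_test_exit);

    MODULE_LICENSE("GPL");

It is exactly that per-module reinvention that Ts'o declines to call a framework, and that Rowand reads as a common pattern.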

Overlaps

There may be overlaps in the functionality of KUnit and kselftest, however. Knut Omang, who is part of the Kernel Test Framework project—another unit-testing project for the kernel that is not upstream—pointed out that there are two types of tests that are being conflated a bit in the discussion. One is an isolated test of a particular subsystem that is meant to be run rapidly and repeatedly by developers of that subsystem. The other is meant to test interactions between more than one subsystem and might be run as part of a regression test suite or in a continuous-integration effort, though it would be used by developers as well. The unit tests being developed for ext4 would fall into the first category, while xfstests would fall into the latter.

Omang said that the two could potentially be combined into a single tool, with common configuration files, test reporting, and so on. That is what KTF is trying to do, he said. But Ts'o is skeptical that a single test framework is the way forward. There are already multiple frameworks out there, he said, including xfstests, blktests, kselftest, and so on. Omang also suggested that UML was still muddying the waters in terms of single-subsystem unit tests:

[...] the problem with using UML is that you still have to relate to the complexity of a kernel run time system, while what you really want for these types of tests is just to compile a couple of kernel source files in a normal user land context, to allow the use of Valgrind and other user space tools on the code. The challenge is to get the code compiled in such an environment as it usually relies on subtle kernel macros and definitions, which is why UML seems like such an attractive solution.

But Ts'o sees things differently:

"Just compiling a couple of kernel source files in a normal user land" is much harder than you think. It requires writing vast numbers of mocking functions --- for a file system I would have to simulate the block device layer, large portions of the VFS layer, the scheduler and the locking layer if I want to test locking bugs, etc., etc. In practice, UML itself is serving as [the] mocking layer, by its mere existence. So when Frank says that KUnit doesn't provide any mocking functions, I don't at all agree. Using KUnit and UML makes testing internal interfaces *far* simpler, especially if the comparison is "just compile some kernel source files as part of a userspace test program".

Gunthorpe saw some potential overlap as well. He made a distinction in test styles that was somewhat similar to Omang's. He noted that there are not many users of the kselftest_harness.h interface at this point, so it might make sense to look at unifying the areas that overlap sooner rather than later.

The second item, arguably, does have significant overlap with kselftest. Whether you are running short tests in a light weight UML environment or higher level tests in a heavier VM the two could be using the same framework for writing or defining in-kernel tests. It *may* also be valuable for some people to be able to run all the UML tests in the heavy VM environment along side other higher level tests.

Looking at the selftests tree in the repo, we already have similar items to what Kunit is adding as I described in point (2) above. kselftest_harness.h contains macros like EXPECT_* and ASSERT_* with very similar intentions to the new KUNIT_EXPECT_* and KUNIT_ASSERT_* macros.
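
The similarity Gunthorpe points to is easy to see on the user-space side; a minimal kselftest_harness.h test (an ordinary user-space program, with an invented test name and an include path that depends on where the test lives under tools/testing/selftests/) looks like this:

    /*
     * User-space test using the kselftest_harness.h macros. The KUnit
     * equivalents shown earlier differ mainly in taking the struct
     * kunit pointer as their first argument.
     */
    #include "kselftest_harness.h"

    TEST(math_sanity)
    {
        ASSERT_EQ(4, 2 + 2);
        EXPECT_NE(0, 1);
    }

    TEST_HARNESS_MAIN

In both APIs, the EXPECT variants record a failure and continue, while the ASSERT variants bail out of the test, which is part of why folding one set of macros into the other looked plausible.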

Ts'o is not opposed to unifying the tests in whatever way makes sense, but said that kselftest_harness.h needs to be reworked before in-kernel tests can use it. Gunthorpe seemed to change his mind some when he replied that perhaps the amount of work to unify the two use cases was not worth it:

Using kunit for in-kernel tests and kselftest_harness for userspace tests seems like a sensible line to draw to me. Trying to unify kernel and userspace here sounds like it could be difficult so it's probably not worth forcing the issue unless someone wants to do some really fancy work to get it done.

Ultimately, what Rowand seems to be after is a better justification for KUnit and why it is, and needs to be, different from kselftest, in the patch series itself. "I was looking for a fuller, better explanation than was given in patch 0 of how KUnit provides something that is different than what kselftest provides for creating unit tests for kernel code." Higgins asked for specific suggestions on where the documentation of KUnit was lacking. Rowand replied that in-patch justification is what he, as a code reviewer, was looking for:

One thing that has become very apparent in the discussion of this patch series is that some people do not understand that kselftest includes in-kernel tests, not just userspace tests. As such, KUnit is an additional implementation of "the same feature". (One can debate exactly which in-kernel test features kselftest and KUnit provide, and how much overlap exists or does not exist. So don't take "the same feature" as my final opinion of how much overlap exists.) So that is a key element to be noted and explained.

But Gunthorpe did not agree; "in my opinion, Brendan has provided over and above the information required to justify Kunit's inclusion". The difference of opinion about whether kselftest provides any kind of in-kernel framework appears to be the crux of the standoff. Gunthorpe, who was strongly in favor of merging KUnit, believes that the in-kernel kselftest code should probably be changed to use it once that happens.

As the discussion was trailing off, Higgins posted v3 of the patch set on May 13, followed closely by an update to v4 a day later. Both addressed the technical comments on the v2 code and also added the documentation about running on architectures other than UML. There have been relatively few comments and no major complaints about those postings. One might guess that KUnit is on its way into the mainline, probably for 5.3.


Index entries for this article
Kernel: Development tools/Testing



How many kernel test frameworks?

Posted Jun 5, 2019 20:41 UTC (Wed) by logang (subscriber, #127618) [Link]

To add some context to my email that was originally quoted:

I am in favour of using UML, however when I tried to use KUnit I ran into a bunch of problems being able to compile my tests at all, seeing as the tree I wrote a test for wouldn't compile without PCI being selected, and that could not be done in UML. I managed to work around it but I suspect there's going to be a lot of these problems in the future [1].

I think the consensus at the time was roughly that we'd need to add more mocking to UML to allow these subsystems to use it, not to stop using UML entirely.

Furthermore, my position regarding kselftests changed during the course of the discussion because it wasn't clear what kselftests actually provides or where the in-kernel tests were (they are in lib/test*). There's very little documentation for kselftests and they seem to cover a bunch of different cases. In contrast, documentation is one of the things KUnit has done very well.

Logan

[1] https://lore.kernel.org/lkml/6d9b3b21-1179-3a45-7545-30aa...

How many kernel test frameworks?

Posted Jun 6, 2019 7:56 UTC (Thu) by diconico07 (guest, #117416) [Link] (2 responses)

From my point of view as a kselftest user, the difference is in the purpose of the tools.
kselftest is meant to detect any API break and make sure not to break userspace; KUnit, on the other hand, is meant for testing specific parts of the kernel, possibly including parts that are not exposed to userspace, or at least not directly.

Roughly speaking, for me it is the difference between unit tests and functional tests, and in most userspace-centric projects these two use different frameworks as they don't have the same needs; the only common thing is usually the output format.

And here again, in a project as big as the kernel the limit can be blurry as you might want to functionally test an entire subsystem that is not directly exposed to userspace.
And for this point there might be a need for a third framework to keep things clear and avoid ending up with a bloated framework or unreadable/unmaintainable tests.

Something like:
- kselftest for userspace interface functional testing
- KUnit for kernel features unit testing
- ???? for in-kernel features functional testing

Having the three share the same output format, and the functional tests share the same way of writing scenarios, seems like the most sane way to go.
With a well-defined structure you can make unit tests mandatory for every patch set and functional tests mandatory for inclusion in the "main" tree. A set of tests like this is needed to build more trust in stable kernels.

How many kernel test frameworks?

Posted Jun 8, 2019 5:31 UTC (Sat) by marcH (subscriber, #57642) [Link]

> And here again, in a project as big as the kernel the limit can be blurry as ...

I rarely ever saw such a clear limit - in any project. Even with the best and clearest definitions there are always grey areas and overlaps somewhere in the middle. Not an exact science.

How many kernel test frameworks?

Posted Jun 9, 2019 19:19 UTC (Sun) by k3ninho (subscriber, #50375) [Link]

> it is the difference between unit tests and functional tests
Conventionally, unit tests *are* functional tests. Harnessing program logic in its own scope is unit testing; the tests themselves measure* the functionality. You're also mistaking the API conformity suite for being unit tests: they're integration tests simply because they ask "will these components play nicely together?"

> And for this point there might be need for a third framework to keep things clear and avoid getting a bloated framework or unreadable/unmaintainable tests.
Convention holds that the phrases you want to use are 'separation of concerns' for having tests appropriate to the layer of production functionality you want to measure* and 'single responsibility principle' for having production and test code do only one thing (hopefully well) -- and that single responsibility for the test code is to measure* the outcome of a single change in one layer of the system.

*: I've starred 'measure' each time I used it because I talk about testing in terms of taking measurements aimed to accept or reject a falsifiable hypothesis about the system. We talk about preparing the system, then making a single change, and measuring the impact. And we also talk about the layers of these tests: the 'testing pyramid' I prefer to use has a base of super-quick and super-numerous tests, whose output you trust when assessing whether the components will integrate properly as their interfaces work to explicit interface contracts; then external interfaces (user and programmatic) becoming more expensive because they require more setup and more levels of the stack to be representative of real-world use (which is balanced by the harness being lightweight because you're building on the trust of the lower levels of your testing); and finally, the smoke tests of "did we deploy it right?"

K3n.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds