A kernel unit-testing framework
Posted Mar 2, 2019 8:26 UTC (Sat) by gps (subscriber, #45638). Parent article: A kernel unit-testing framework
Last decade I preached to kernel teams the need for user-space unit testing without booting a system. Too much community fear. No takers. I moved on.
Accept this concept. Get it in. Reject all future proposed kernel patches that lack small test coverage for their logic. Build meaningful frameworks within it for mock devices, mock and fuzzed event sequences, and the like. Bug fixes could come with regression tests! *Gasp*
Leave the basement. Become a real-world software project. Be proud of your work instead of lying to everybody that you just don't write bugs, despite repeated evidence to the contrary documented in CVEs. Write tests, you'll earn respect!
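For reference, a minimal test case in the proposed KUnit style looks roughly like the sketch below; the function under test and the suite name are invented for illustration.

```c
#include <kunit/test.h>
#include <linux/kernel.h>

/* Stand-in for real kernel logic under test. */
static int add_clamped(int a, int b)
{
	long long sum = (long long)a + b;

	return sum > INT_MAX ? INT_MAX : (int)sum;
}

/* Each test case gets a struct kunit context for its assertions. */
static void add_clamped_test(struct kunit *test)
{
	KUNIT_EXPECT_EQ(test, 3, add_clamped(1, 2));
	KUNIT_EXPECT_EQ(test, INT_MAX, add_clamped(INT_MAX, 1));
}

static struct kunit_case example_test_cases[] = {
	KUNIT_CASE(add_clamped_test),
	{}
};

static struct kunit_suite example_test_suite = {
	.name = "example",
	.test_cases = example_test_cases,
};
kunit_test_suite(example_test_suite);
```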
Posted Mar 4, 2019 1:12 UTC (Mon) by shemminger (subscriber, #5739)
Posted Mar 4, 2019 2:51 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
This is admittedly less useful than testing with physical devices, but it is still useful.
Posted Mar 4, 2019 4:51 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Posted Mar 4, 2019 12:46 UTC (Mon) by geert (subscriber, #98403)
Perhaps we need a kernel command-line option to enable that at the generic dmaengine level?
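Such a knob could be expressed as a module parameter, which also becomes a command-line option (something like "dmaengine.fail_percent=5") when the code is built in. The sketch below is purely hypothetical; the dmaengine core has no such parameter today, and the names are invented.

```c
#include <linux/module.h>
#include <linux/random.h>

/* Hypothetical knob: percentage of DMA transfers to fail artificially. */
static unsigned int fail_percent;
module_param(fail_percent, uint, 0644);
MODULE_PARM_DESC(fail_percent, "percentage of DMA transfers to fail artificially (0 = off)");

/*
 * Would be called from a (hypothetical) hook in the generic completion
 * path to decide whether this transfer should report an error.
 */
static bool dma_should_inject_failure(void)
{
	return fail_percent && (prandom_u32() % 100) < fail_percent;
}
```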
Posted Mar 4, 2019 20:11 UTC (Mon) by dezgeg (subscriber, #92243)
I once had a similar idea for injecting failures into USB transmissions (inspired by a kernel crash in the USB hub code which would occur if the device was unplugged at a precise moment), but sadly I didn't implement it.
Posted Mar 4, 2019 17:04 UTC (Mon) by gps (subscriber, #45638)
Posted Mar 4, 2019 19:48 UTC (Mon) by roc (subscriber, #30627)
Posted Mar 5, 2019 14:14 UTC (Tue) by pm215 (subscriber, #98099)
Posted Mar 5, 2019 19:08 UTC (Tue) by roc (subscriber, #30627)
The "right conditions" include the software living long enough for tests written today to pay off in the future, and bugs in deployed releases being costly because you have a lot of users or your software does important things or bugs found in the field are difficult to debug remotely.
Ironically the work I'm doing on improving debugging makes writing good tests slightly less important!
Tests are never perfect. I can see that device models diverging from hardware would be a problem. But it also seems to me that you could engineer around some of the problems, e.g. have a testing framework that *by default* tests for multiple instances of each hardware element, hotplugging of each hardware element, randomization of interrupt delays, etc.
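A rough sketch of that kind of default, in KUnit terms: spin up several instances of a mock device, randomize its interrupt latency, and hot-unplug one instance mid-test. The mock_dev type and helpers here are invented scaffolding for illustration, not an existing kernel API.

```c
#include <kunit/test.h>
#include <linux/kernel.h>
#include <linux/random.h>
#include <linux/slab.h>

/* Invented mock device: stands in for whatever the driver under test binds to. */
struct mock_dev {
	unsigned int irq_delay_us;	/* simulated interrupt latency */
	bool plugged;
};

static struct mock_dev *mock_dev_plug(struct kunit *test)
{
	struct mock_dev *d = kunit_kzalloc(test, sizeof(*d), GFP_KERNEL);

	if (d)
		d->plugged = true;
	return d;
}

static void mock_dev_unplug(struct mock_dev *d)
{
	d->plugged = false;
}

/* Default "stress" case: multiple instances, random IRQ delay, mid-test hot-unplug. */
static void multi_instance_hotplug_test(struct kunit *test)
{
	struct mock_dev *devs[4];
	int i;

	for (i = 0; i < ARRAY_SIZE(devs); i++) {
		devs[i] = mock_dev_plug(test);
		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, devs[i]);
		devs[i]->irq_delay_us = prandom_u32() % 1000;
	}

	/* Unplug one instance while the others stay live ... */
	mock_dev_unplug(devs[1]);

	/* ... then check that only that instance is gone. */
	for (i = 0; i < ARRAY_SIZE(devs); i++) {
		if (i == 1)
			KUNIT_EXPECT_FALSE(test, devs[i]->plugged);
		else
			KUNIT_EXPECT_TRUE(test, devs[i]->plugged);
	}
}

static struct kunit_case mock_hotplug_cases[] = {
	KUNIT_CASE(multi_instance_hotplug_test),
	{}
};

static struct kunit_suite mock_hotplug_suite = {
	.name = "mock-hotplug-defaults",
	.test_cases = mock_hotplug_cases,
};
kunit_test_suite(mock_hotplug_suite);
```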
Posted Mar 9, 2019 1:14 UTC (Sat) by nix (subscriber, #2304)
Linux would probably have terrible threading, despite NPTL, if Uli hadn't written a massive heap of tests for NPTL at the same time to make sure that the damn thing actually worked and did not regress. More than one bug I've looked at in the past, which came down to a single missed assembler instruction that triggered problems only in ludicrously obscure slowpath cases, was tickled by one or more of those tests...
Posted Mar 4, 2019 19:23 UTC (Mon) by pbonzini (subscriber, #60935)
Posted Mar 4, 2019 19:32 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
However lots of other devices most definitely can be simulated.
Simulation also doesn't have to be perfect to be useful. Just something good enough will suffice.
Posted Mar 4, 2019 21:13 UTC (Mon) by pizza (subscriber, #46)
"All models are wrong. Some are useful."
But the "useful" threshold varies widely depending on what you're trying to accomplish.
The majority of my headaches with respect to device drivers were due to the hardware not working the way it was documented: spurious interrupts, clock phasing, interactions/collisions with shared resources, subtle sleep/wake sequencing issues, sensitivity to environmental conditions (e.g. clocks drifting between -40C and 85C), DMA peripheral quirks, workarounds for things fixed in later revisions... and so forth.
Granted, I also had plenty of headaches due to the upper stacks not behaving as specified either. Or perhaps I should say significantly underspecified -- while the stack was actually well-tested (unit and system-level), the tests were written from the same incorrect/incomplete set of spherical-cow specifications/assumptions.
Posted Mar 4, 2019 22:51 UTC (Mon) by pbonzini (subscriber, #60935)
USB would only be practical if you also emulated all the broken devices around to ensure they don't regress. Same for the "generic" drivers like AHCI or SDHCI.
Some mock devices are there, for example scsi_debug, but the tests are in user space rather than in the kernel.
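For instance, a minimal user-space smoke test against a scsi_debug-backed disk might be as simple as the program below; it assumes the module has been loaded (e.g. "modprobe scsi_debug dev_size_mb=64") and takes the resulting disk's device node on the command line.

```c
/* Write a pattern to the first sector of the given disk and read it back. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char wbuf[512], rbuf[512];
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <scsi_debug block device>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(wbuf, 0xa5, sizeof(wbuf));
	if (pwrite(fd, wbuf, sizeof(wbuf), 0) != (ssize_t)sizeof(wbuf)) {
		perror("pwrite");
		return 1;
	}
	if (pread(fd, rbuf, sizeof(rbuf), 0) != (ssize_t)sizeof(rbuf)) {
		perror("pread");
		return 1;
	}

	if (memcmp(wbuf, rbuf, sizeof(wbuf)) != 0) {
		fprintf(stderr, "readback mismatch\n");
		return 1;
	}

	puts("ok");
	close(fd);
	return 0;
}
```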
Posted Mar 14, 2019 3:22 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
Posted Mar 5, 2019 15:49 UTC (Tue) by shemminger (subscriber, #5739)
Posted Mar 5, 2019 8:58 UTC (Tue) by knuto (subscriber, #96401)
We're in the process of making a patch set of KTF suitable for inclusion in the kernel, while keeping the features we have in place to let one test code base be applied across multiple kernel versions, an important need for anyone trying to maintain stable kernels for production use in addition to the bleeding edge.
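One common way to keep a single test code base building against multiple kernel versions is a small compatibility header keyed off LINUX_VERSION_CODE. The sketch below is a generic illustration with an invented helper rename; it is not taken from KTF itself.

```c
/* test_compat.h: paper over API differences between kernel versions. */
#ifndef TEST_COMPAT_H
#define TEST_COMPAT_H

#include <linux/version.h>

#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
/*
 * Invented example: pretend a helper used by the tests was renamed in
 * 4.15, and map the new name onto the old one for older kernels.
 */
#define new_helper_name(arg)	old_helper_name(arg)
#endif

#endif /* TEST_COMPAT_H */
```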
Posted Mar 8, 2019 22:17 UTC (Fri) by flussence (guest, #85566)
Filesystems don't need specific hardware either. Wouldn't it be nice if Btrfs had RAID5/6 regression tests?
Posted Mar 8, 2019 23:25 UTC (Fri) by dezgeg (subscriber, #92243)
Spending double the development effort to have reasonable (not perfect) automated tests isn't outrageous. It's in the right ballpark for projects I've worked on like Firefox and rr. Under the right conditions that spend pays for itself pretty easily.
In glibc, which is very much following the 'everything should have tests dammit' policy (and long has), the tradeoff is sometimes much higher: it can easily take five times longer to write a decent test for some bugfixes than to fix the bug, even (sometimes especially!) when the bug is a real monster to find.
