Injecting faults into the kernel

Posted Nov 16, 2006 8:21 UTC (Thu) by simlo (guest, #10866)
Parent article: Injecting faults into the kernel

I dislike this kind of random testing. You may hit a rare error, but it can be very hard to debug because the trigger is random and coupled to timing issues that are difficult to reproduce. I also dislike it because it changes the kernel that is actually running, so it is not a black-box test.

The solution I prefer is unit testing. You take your subsystem, isolate it by stubbing out all the calls it makes, and write a test suite that runs in user space. The test program should explicitly exercise all the border cases. You can use a coverage tool to check that your tests hit a high fraction of the code. And it should do all of this deterministically.
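As a rough illustration of the approach described above, here is a minimal user-space sketch. Everything in it is hypothetical and invented for the example: `stub_alloc` stands in for whatever allocator the subsystem would really call, `pair_buf_init` is a toy piece of code under test, and `fail_on_call` is the knob that makes a chosen allocation fail. It is not taken from the kernel or from any existing test suite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stub replacing the subsystem's allocator (hypothetical name). */
static int alloc_calls;        /* number of calls since the last reset */
static int fail_on_call = -1;  /* which call returns NULL; -1 means never */

static void *stub_alloc(size_t size)
{
    if (++alloc_calls == fail_on_call)
        return NULL;           /* deterministic, repeatable failure */
    return malloc(size);
}

/* Toy code under test (hypothetical): needs two buffers or none. */
struct pair_buf {
    char *a;
    char *b;
};

/* Returns 0 on success, -1 on allocation failure; must not leak. */
static int pair_buf_init(struct pair_buf *p, size_t size)
{
    p->a = stub_alloc(size);
    if (!p->a)
        return -1;
    p->b = stub_alloc(size);
    if (!p->b) {
        free(p->a);            /* error path: undo the first allocation */
        p->a = NULL;
        return -1;
    }
    return 0;
}

static void pair_buf_destroy(struct pair_buf *p)
{
    free(p->a);
    free(p->b);
    p->a = p->b = NULL;
}

/* Test helper: reset the stub, then make the n-th allocation fail. */
static int init_with_failure_at(int n, struct pair_buf *p)
{
    alloc_calls = 0;
    fail_on_call = n;
    p->a = p->b = NULL;
    return pair_buf_init(p, 64);
}

/* Runs every case and returns how many cases were checked. */
int run_stub_tests(void)
{
    struct pair_buf p;
    int cases = 0;

    /* Normal path: no allocation fails. */
    assert(init_with_failure_at(-1, &p) == 0 && p.a && p.b);
    pair_buf_destroy(&p);
    cases++;

    /* Border case: the first allocation fails. */
    assert(init_with_failure_at(1, &p) == -1 && !p.a && !p.b);
    cases++;

    /* Border case: the second allocation fails; the first is released. */
    assert(init_with_failure_at(2, &p) == -1 && !p.a && !p.b);
    cases++;

    return cases;
}
```

Calling `run_stub_tests()` from a small `main` gives a test program that exercises each error path on every run, with no dependence on timing, and that can be stepped through in gdb.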

This gives some huge benefits to development:
You get a much faster development cycle because you don't have to recompile and reboot the kernel for each change, only rebuild your subsystem and test program. You can easily debug it in gdb.
To do this you have to have loosely coupled subsystems. Therefore, when you start forcing yourself to work this way, you automatically get a better architecture.
You feel safe about changing a system because you know a lot of the bugs you might introduce will be caught by the test suite. Thus you avoid "coding in fear", which always produces bad code.

So here is my suggestion:
Make a test directory in the kernel source and put all kinds of unit test suites in there. All kernel patches should pass all the tests. A kernel patch would also contain changes to the tests, since they are developed along with the kernel code.

I made such a "TestRTMutex" for coding on the rt-mutex. It worked really well. I could do at least some SMP coding without having actual SMP hardware. Unfortunately, it has not been maintained along with the kernel code and is thus not usable now.

At work I decided to use unit testing on a project. My boss was a bit worried about why it took so long to write the code, but when I merged it into our application there were almost no errors, because almost every line of code had been tested in detail.

If unit tests were to become established within the kernel, you would see the number of "oops, that was a mistake" releases get much, much smaller, because you would have tests for many error paths in the code. You still need integration tests, of course, but there is no need to inject faults into a running kernel. That is much better done in unit tests in user space.

Injecting faults into the kernel

Posted Nov 16, 2006 10:51 UTC (Thu) by mokki (subscriber, #33200)

I do agree that unit tests would be the way to go in an ideal world.

But in the real world there will always be code in the kernel that is not fully covered by unit tests (and even 100% coverage does not guarantee anything).

What this fault injection provides is a way for third parties to test whole-system or partial-system failures independently. I think such a feature can only be helpful, and it does not in any way prevent applying other testing methods.

Injecting faults into the kernel vs unit testing

Posted Nov 17, 2006 23:19 UTC (Fri) by giraffedata (subscriber, #1954)

I think the main value of this fault injection over unit testing is cost. It takes a significant amount of time and boredom to write scaffolding for a module of the kernel, but a complete scaffold already exists -- the rest of the real kernel. All it lacks is the controls to manipulate all the inputs and outputs to get a full test, so fault injection adds a faint whisper of those.

I too agree that unit testing (and modular programming in general) gives a better result. But I understand why people find it not worth the cost.

Injecting faults into the kernel

Posted Dec 14, 2006 16:52 UTC (Thu) by PaulMcKenney (subscriber, #9624)

The really cool thing about fault injection is that it can make errors happen more quickly. In one example some years back, a race-condition bug that was taking about 24 hours to reproduce under heavy load was flushed out in under 10 minutes using fault-injection code. Think about this a bit. How long would you have to test the original system to be 99.99% confident that you had in fact fixed the bug? Now, how about the fault-injected system?
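A rough way to put numbers on that question, under assumptions that are not stated above: suppose the bug, if still present, triggers as a memoryless process with mean time to failure T, and read "99.99% confident" as meaning that a clean run of length t would have had at most a 0.01% chance of happening with the bug still there. Then:

```latex
P(\text{no failure in } t \mid \text{bug still present}) = e^{-t/T} \le 10^{-4}
\quad\Longrightarrow\quad
t \ge T \ln 10^{4} \approx 9.21\, T
% T = 24 hours:   t \approx 221 hours, a little over 9 days
% T = 10 minutes: t \approx 92 minutes
```

Under this simple model, the original system needs a bit more than nine days of failure-free heavy load, while the fault-injected system needs about an hour and a half.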

Let's just say that your users will likely be a lot happier with you if you are using fault injection.

That said, I really like unit tests as well, kernel/rcutorture.c being a case in point.

The fact is that we need both fault injection and unit tests.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds