
Kernel quality control, or the lack thereof

Posted Dec 10, 2018 14:43 UTC (Mon) by xorbe (guest, #3165)
Parent article: Kernel quality control, or the lack thereof

They need code coverage metrics, not just "billions of operations over a period of days."



Kernel quality control, or the lack thereof

Posted Dec 10, 2018 18:42 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (20 responses)

Code coverage is table stakes in this game, if that. Even 100% code coverage won't guarantee that you exercised the relevant race conditions, patterns of data on mass storage, combinations of kernel-boot parameters and Kconfig options, and so forth. For but one example, 100% code coverage in a CONFIG_PREEMPT=n kernel would not find a bug due to inopportune preemption that could happen in a CONFIG_PREEMPT=y kernel.

Don't get me wrong, code coverage is fine as far as it goes, and it seems likely that the Linux kernel community would do well to do more of it, but it is not a panacea. In particular, beyond a certain point, it is probably not the best place to put your testing effort.
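As a toy userspace illustration of this point (the function and values here are invented, not from the thread): two tests can execute every line, and both directions of every branch, of a function while still never exercising the one path combination that matters. Races, Kconfig combinations, and input patterns multiply paths in exactly the same way.

```c
#include <assert.h>

/* Hypothetical function with two independent branches. */
int combine(int a, int b)
{
	int x = 0;

	if (a)
		x += 1;
	if (b)
		x += 2;
	return x;	/* the (a && b) path is the only one yielding 3 */
}

/*
 * combine(1, 0) and combine(0, 1) together execute every line and
 * both directions of each branch -- 100% line and branch coverage --
 * yet the path where both conditions hold never runs.
 */
```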

Kernel quality control, or the lack thereof

Posted Dec 10, 2018 19:59 UTC (Mon) by shemminger (subscriber, #5739) [Link] (13 responses)

My experience with code coverage metrics has been mostly negative.
Managers look at the percentage numbers, which pushes developers to rearrange code into a single return just to maximize those numbers.

Kernel quality control, or the lack thereof

Posted Dec 10, 2018 20:21 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

I certainly have seen this. And it can be even worse, due to penalizing good assertions and discouraging debugging code. So oddly enough, one reason why coverage is not a panacea is exactly because some people believe that it is in fact a panacea! :-)

Kernel quality control, or the lack thereof

Posted Dec 11, 2018 5:51 UTC (Tue) by JdGordy (subscriber, #70103) [Link] (6 responses)

Meanwhile, MISRA coding standards require single returns... :/

Kernel quality control, or the lack thereof

Posted Dec 18, 2018 11:46 UTC (Tue) by error27 (subscriber, #8346) [Link] (5 responses)

Single return coding style introduces "forgot to set the error code" bugs. A "goto out;" might do a ton of things or it might not do anything so it is a mystery, but a "return -EINVAL;" is unambiguous.
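A minimal userspace sketch of that failure mode (the function names and the missing-assignment bug are invented for illustration):

```c
#include <errno.h>

/* Single-return style: the error code travels in 'ret', and one
 * forgotten assignment silently turns a failure into success. */
int setup_single_return(int arg_ok)
{
	int ret = 0;

	if (!arg_ok)
		goto out;	/* bug: should have set ret = -EINVAL */
out:
	return ret;		/* failure path returns 0 ("success") */
}

/* Direct-return style: each failure names its own error code
 * unambiguously at the point where it is detected. */
int setup_direct_return(int arg_ok)
{
	if (!arg_ok)
		return -EINVAL;
	return 0;
}
```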

Kernel quality control, or the lack thereof

Posted Dec 18, 2018 14:05 UTC (Tue) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

If you didn't know better, you might think that there is no magic bullet to take out all bugs. :-)

Kernel quality control, or the lack thereof

Posted Dec 19, 2018 13:43 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (3 responses)

I prefer `rm` for that job. Works really well. ;)

Kernel quality control, or the lack thereof

Posted Dec 19, 2018 16:04 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

In the old days, I would have agreed. But these days, "rm" can often be undone using "git checkout", or, in more extreme cases, "git clone". ;-)

Kernel quality control, or the lack thereof

Posted Dec 20, 2018 3:45 UTC (Thu) by neilbrown (subscriber, #359) [Link] (1 response)

Isn't it a simple matter of writing a virus (or a worm .. or a worm with a virus) which hunts out all bugs on the Internet and removes them. I remember Doctor Who did that once to get rid of all the photos of himself, so it can't be too hard.

Kernel quality control, or the lack thereof

Posted Dec 20, 2018 4:37 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Dr. Who has to do that every few years as the actors change?

Kernel quality control, or the lack thereof

Posted Dec 11, 2018 7:37 UTC (Tue) by marcH (subscriber, #57642) [Link] (4 responses)

> Managers look at the % numbers and it causes developers to rearrange code to have a single return

Interesting, could you share a simplified example?

> My experience with code coverage metrics has been mostly negative.

While error-handling code, corner cases and... backup configurations are notoriously untested, I agree there are diminishing returns and better trade-offs past some point. Curious what experts' guesstimate is of where that percentage typically lies.

Kernel quality control, or the lack thereof

Posted Dec 11, 2018 14:11 UTC (Tue) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (3 responses)

I have seen 80% used with some decent results. But it really depends on the code and its user base. 100% of your commonly executed code really does need to be covered. But of course the more users your code has, the larger the fraction of your code is commonly executed.

If only (say) 30% of your code is tested, you very likely need to substantially increase your coverage. If (say) 90% of your code is tested, there is a good chance that there is some better use of your time than getting to 91%. But for any rule of thumb like these, there will be a great many exceptions, for example, the safety-critical code mentioned earlier.

Hey, you asked! :-)

Kernel quality control, or the lack thereof

Posted Dec 11, 2018 20:53 UTC (Tue) by marcH (subscriber, #57642) [Link]

> Hey, you asked! :-)

Sincere thanks!

Kernel quality control, or the lack thereof

Posted Jan 5, 2019 18:06 UTC (Sat) by joseph.h.garvin (guest, #64486) [Link] (1 responses)

I think you have things backwards. If there is a bug in commonly executed code, it's going to be exposed even if there isn't a test. It's the infrequently executed code that tends to contain bugs (e.g. handling error conditions). Testing the frequently executed code still has value in that it can prevent problems from reaching customers, but bugs in frequently executed code will tend to be discovered very quickly. In a sense, the entire point of tests is to make some code paths execute more frequently.

Kernel quality control, or the lack thereof

Posted Jan 5, 2019 22:29 UTC (Sat) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Especially in my part of the Linux kernel, there is great value in preventing problems from reaching the -tip tree, let alone Linus's tree, let alone distributions, let alone customers. This great value stems from the fact that RCU bugs tend to be a bit difficult to reproduce and track down. It is therefore quite important to test the common cases.

Nevertheless, your last sentence is spot on. It is precisely because rcutorture forces rare code paths and rare race conditions to execute more frequently that the number of RCU bugs reaching customers is kept down to a dull roar.

Kernel quality control, or the lack thereof

Posted Dec 10, 2018 21:18 UTC (Mon) by NAR (subscriber, #1313) [Link] (5 responses)

But anything less than 100% coverage guarantees that some part of the code is not tested...

Kernel quality control, or the lack thereof

Posted Dec 10, 2018 21:50 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

And anything less than 100% race coverage similarly guarantees a hole in your testing. As does anything less than 100% configuration-combination coverage. As does anything less than 100% input coverage. As does anything less than 100% hardware-configuration testing. As does ...
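The arithmetic behind that multiplication is stark. A tiny sketch (the option counts below are made up for illustration):

```c
#include <stdint.h>

/* Distinct kernel configurations given n independent boolean
 * Kconfig options: each additional option doubles the space. */
uint64_t config_combinations(unsigned int n)
{
	return (uint64_t)1 << n;
}
```

Even a modest 30 boolean options yields over a billion distinct configurations, before race timings, input sequences, and hardware variations are multiplied in.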

For most types of software, at some point it becomes more important to test more races, more configurations, more input sequences, and more hardware configurations than to provide an epsilon increase in coverage by triggering that next assertion. After all, testing and coverage are about reducing risk given the time and resources at hand. Therefore, over-emphasizing one form of testing (such as coverage) will actually increase overall risk due to the consequent neglect of some other form of testing.

Of course, there are some types of software where 100% coverage is reasonable, for example, certain types of safety-critical software. But in this case, you will be living under extremely strict coding standards so as to (among a great many other things) make 100% coverage affordable.

Kernel quality control, or the lack thereof

Posted Dec 24, 2018 20:42 UTC (Mon) by anton (subscriber, #25547) [Link] (3 responses)

I would expect that defensive coding practices that lead to unreachable code (and thus <100% coverage) are particularly widespread in safety-critical software. I.e., you cannot trigger this particular safety-net code, and you are pretty sure that it cannot be triggered, but not absolutely sure; or even if you are absolutely sure, you foresee that the safety net might become triggerable after maintenance. Will you remove the safety net to increase your coverage metric?

OTOH, how do you test your safety net? Remember that Ariane 5 was exploded by a safety net that was supposed (and proven) to never trigger.

Kernel quality control, or the lack thereof

Posted Dec 25, 2018 0:05 UTC (Tue) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

The safety-critical software that I know of (admittedly an obsolete and vanishingly small fraction of the total) limited "if" statements for exactly this reason.

But yes, Murphy will always be with us. So even in safety critical code, at the end of the day it is about reducing risk rather than completely eliminating it.

And to your point about Ariane 5's failed proof of correctness... Same issue as the classic failed proof of correctness for the binary search algorithm! Sadly, a proof of correctness cannot prove the assumptions on which it is based. So Murphy will always find a way, but it is nevertheless our job to thwart him. :-)

Kernel quality control, or the lack thereof

Posted Dec 30, 2018 11:22 UTC (Sun) by GoodMirek (guest, #101902) [Link] (1 responses)

There are safety nets triggered only in cases impossible to reach via software means, but possible to happen in case of HW failure.
I saw that multiple times while working on embedded systems.

E.g.:

    explosiveness = 255;
    if (explosiveness != 255)
        assert();

In theory, it should never assert. In reality, it is desirable to minimize the risk that the 'explosiveness' variable is stored in a failed memory cell, before that cell is used to indicate explosiveness of any kind.

Or this case:

    if (green)
        explosiveness = 0;
    else
        explosiveness = 255;
    if (explosiveness != 0 && explosiveness != 255)
        assert();

It is very rare to trigger and almost impossible to test such assertions, but when I saw them triggered in reality, even once in a lifetime, I appreciated their merit.

Kernel quality control, or the lack thereof

Posted Dec 30, 2018 15:44 UTC (Sun) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Assuming that timing considerations permitted it, one approach would be to run the code in a simulator that provided fault-injection for memory. That said, to your point, you would have to inject the fault rather carefully to trigger that particular set of assertions.

But if the point was in fact to warn about unreliable memory, mightn't this sort of fault injection nevertheless be quite useful?
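A toy sketch of that kind of memory fault injection (all names here are invented; a real setup would inject the fault from a simulator rather than from the program under test):

```c
#include <stdint.h>

static uint8_t cell;		/* stands in for the monitored memory cell */
static int inject_fault;	/* test hook: emulate a failed cell */

static void store_explosiveness(uint8_t v)
{
	cell = v;
	if (inject_fault)
		cell ^= 0x04;	/* flip one bit, as a dying cell might */
}

/* The safety net from the example above: unreachable by software
 * means alone, reachable when the memory itself misbehaves. */
static int safety_net_fires(void)
{
	return cell != 255;
}

/* Returns 1 when the safety net catches the injected fault. */
int fault_injection_demo(int faulty)
{
	inject_fault = faulty;
	store_explosiveness(255);
	return safety_net_fires();
}
```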


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds