Fault injection and unexpected requirement injection
Sripathi Kodi recently posted a patch adding certain types of futex failures to the fault injection framework. Ingo Molnar responded with a potentially surprising request:
This "unacceptably ugly" interface has existed as part of the fault injection framework since 2006, so it is a little surprising to hear, now, that it cannot be used. Ingo is firm about this point, though, and appears unwilling to back down.
Extending perf events for fault injection might be the right long-term solution. But this situation highlights a trap for developers which certainly acts to make participation in the development process harder. In his travels, your editor has heard complaints from developers who set out to accomplish a specific task, only to be told that they must undertake a much larger cleanup to get their code merged. The topic also came up at the 2009 kernel summit; there, the consensus seemed to be that this kind of request can quickly become unreasonable.
In this case, Sripathi has not been asked to fix the remainder of the fault
injection framework code. But adding a new functionality to the perf
events subsystem still likely goes rather beyond the scope of the original
project. Sripathi has not responded to this request, so it's not clear
whether we'll see a futex fault injection mechanism reworked to fit the new
requirements, or whether this code will just fade away.
Index entries for this article | |
---|---|
Kernel | Development model |
Posted Dec 3, 2009 2:15 UTC (Thu)
by jcm (subscriber, #18262)
[Link] (7 responses)
Posted Dec 3, 2009 8:32 UTC (Thu)
by mingo (guest, #31122)
[Link] (6 responses)
This "unacceptably ugly" interface has existed as part of the fault injection framework since 2006, so it is a little surprising to hear, now, that it cannot be used. Ingo is firm about this point, though, and appears unwilling to back down.
Please allow me to explain my position on this in more detail.
Fault-injection certainly looked fine back in 2006 in its then-limited incarnation - i remember having reviewed it, and i liked it, and i even tested it and reported bugs in it.
These particular patches to the futex subsystem look ugly and intrusive though. Futexes are performance sensitive and complex - which both raise the bar to extra instrumentation and extra complexity.
The fact that some debugging infrastructure exists in the kernel does not give automatic acceptance of its use in any arbitrary place in the kernel - quality judgements have to be done (and are being done every day) for specific debugging patches.
For example new tracepoints are not accepted into other subsystems automatically (Andrew Morton has repeatedly rejected new mm tracepoints), without the maintainers actively accepting them - and similar boundaries exists for other existing debugging infrastructure as well.
Another example are debugobjects: each new debugobject conversion has to be separately accepted and acked by the respective maintainer. A debugging facility can be fine for some subsystems - and less suitable for others.
A third example is lockdep coverage: there were cases in the past where patches were rejected by subsystem maintainers, on the grounds of intrusiveness and other cost/benefit arguments.
These kinds of rejections are always case by case and they happened several times for debug infrastructure i authored over the past decade - and i expect them to happen in the future too.
Why should fault-injection to be treated in a preferential way to tracepoints, lockdep, debugobjects and all the other debugging facilities that exist in the upstream kernel today?
Also note that while in the discussion i suggested perf events as a cleaner solution (and i like that solution for obvious reasons and i know it would help other areas of the kernel too) - others might exist as well. One thing is sure though, the cost/benefit ratio of this particular patch-set looked too high for me.
Thanks,
Ingo
Posted Dec 3, 2009 9:39 UTC (Thu)
by johill (subscriber, #25196)
[Link] (2 responses)
These particular patches to the futex subsystem look ugly and intrusive though. Futexes are performance sensitive and complex - which both raise the bar to extra instrumentation and extra complexity.
Umm?
For the benefit of other readers, the impact on the code is essentially this in a number of places:
Beauty may be in the eye of the beholder, but even though you could argue that some of the parameters could be hidden by macros I'd never call it "butt-ugly" like you did.
Also, what should such an injection framework look like? It pretty much needs to have hooks in place to be able to fail at those specific points, and it also needs to impact the flow control in the function to be able to drop locks -- the above was a simple version from a place that can just return.
I'll have to agree with Jon and Peter, you're making an entire unreasonable requirement, and it appears to me (and probably many others) to be because you want perf to rule the world, rather than solve get the actual functionality in. If you really cared only about style issues you could, after all, ask them to create wrapper macros or something so that the code looks neater.
Posted Dec 3, 2009 9:51 UTC (Thu)
by jcm (subscriber, #18262)
[Link]
Posted Dec 3, 2009 10:01 UTC (Thu)
by mingo (guest, #31122)
[Link]
For the benefit of other readers, the impact on the code is essentially this in a number of places:
If you read the patches (i have done that) in reality those hooks are not just added 'in a number of places' (i personally wouldn't argue much about 1-2 call-sites although FYI many other maintainers will argue even about 1-2 hooks), the patches add literally a dozen such fault injection call-sites, sprinkled all around kernel/futex.c.
That is too ugly to me. There has to be a higher pay-off for us to accept that kind of intrusion: comprehensive, well-thought-out instrumentation of futex.c. Not just for the limited purpose of fault injection but also for other instrumentation.
So, as i've outlined it in my original reply as well, i'm not opposed to this kind of instrumentation in principle, it simply has to be done better.
Thanks,
Ingo
Posted Dec 3, 2009 9:54 UTC (Thu)
by jcm (subscriber, #18262)
[Link] (2 responses)
Posted Dec 3, 2009 10:18 UTC (Thu)
by mingo (guest, #31122)
[Link] (1 responses)
btw, in agreeing with Jon, I wasn't attacking you personally Ingo.
Hey i didn't feel being attacked personally (or being attacked at all) :)
I was just happy to see someone bring up the issue of people posting patches being told they should go do some other major overhaul :)
Been there, done that ;-)
I don't have enough fingers (nor enough toes) to count the number of times Linus or some other maintainer rejected some patch of mine in the past, because it was not complete/clean enough yet. (Quite a few of those rejections were of the "not ever!" non-negotiable sort - which hurt far more than the kind of "not this way" rejection we are talking about here.)
Think about the positive side: a large majority of 'major overhauls' that were ever done in Linux happened due to someone saying 'no' first.
It wouldn't be so easy to add new hacks today if we weren't standing on the shoulders of giants who did all the past series of major overhauls and kept Linux so pleasantly hackable.
(In fact i bravely claim that we probably wouldn't have major overhauls at all if people weren't be saying 'no'. It's so much easier to go for the fast hack than to fix things for real.)
You are arguing for people to not insist on clean approaches and major overhauls when a new patch justifies such an overhaul - and i hope i'm not misrepresenting your point by paraphrasing it like that.
If that is your argument, what other mechanism should drive the overhauls in your opinion, if we eliminate/weaken the current scheme of using people's desire to implement new features? Or do you suggest we should let the kernel's general code quality degrade over time, in favour of getting more features done more quickly?
Posted Dec 3, 2009 21:41 UTC (Thu)
by dmag (guest, #17775)
[Link]
Patch refusal from an obstinate kernel developer has lead to a better overall architecture in tons of cases (i.e. TALPA, Linus rejecting a USB stack patch causing a full rewrite, PID virtualization discussion, etc.)
Even if this method "holds up" development of a feature occasionally, I think that's fine. Linux is in more danger of "flying apart a the seams" than "stalling due to lack of interest". And I'll bet a box of doughnuts that this rejection spurs someone (maybe not Sripathi) to "get it done" sooner rather than later.
Keep up the good work, Ingo!
Posted Dec 3, 2009 17:12 UTC (Thu)
by dvhart (guest, #19636)
[Link]
Fault injection and unexpected requirement injection
Does a debugging subsystem have the right to spread arbitrarily?
Does a debugging subsystem have the right to spread arbitrarily?
+ if (futex_should_fail(&fail_futex_efault, 1))
+ return -EFAULT;
Does a debugging subsystem have the right to spread arbitrarily?
Does a debugging subsystem have the right to spread arbitrarily?
Does a debugging subsystem have the right to spread arbitrarily?
Does a debugging subsystem have the right to spread arbitrarily?
I for one welcome our stringent kernel overlords.
Fault injection and unexpected requirement injection