The part I found most impressive was when he identified that something was crashing in the first iteration of a loop by looking at the register contents and assembly decode from the oops report. Where other people would have needed to look at memory in a debugger, he could just look at registers and infer where the bad value was coming from.
Posted Apr 14, 2010 4:11 UTC (Wed) by jzbiciak (✭ supporter ✭, #5246)
Actually, wasn't it Boris who did the initial assembly dump interpretation?
It's actually not so hard once you've done it a couple times. I've had to chase down bugs in our C compiler at work (ah, the joys of running internal alpha builds!).
The case of the overly anonymous anon_vma
Posted Apr 14, 2010 4:44 UTC (Wed) by iabervon (subscriber, #722)
The message I'm thinking of is <alpine.LFD.2.00.1004061220270.3487@i5.linux-foundation.org> from Apr 6 at 15:35; Linus looks at Steinar Gunderson's disassembly, where %rax is a kernel pointer and %rbx is not %rax+20 as it would be after running the loop, implying that "anon_vma->head.next" is NULL, not some other anon_vma_chain entry. Boris had previously worked out that there was some problem in the list, but Linus identified that the pointer from the anon_vma itself was bad, rather than the list further down being corrupt. This turned out to be important to the actual bug, which had to do with the anon_vma associated with the page being gone (and its memory reused), rather than some other anon_vma in the vma's chain being gone or something messing up the list. Boris picked up the stuff you'd get from a debugger that had registers but not core; Linus picked up an important detail that one would normally only get from a debugger by inspecting memory.
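To make the inference a bit more concrete, here is a rough sketch in C of why a NULL "head.next" is visible in a register on the first iteration. Everything in it is an assumption for illustration: struct chain_entry is a hypothetical stand-in for anon_vma_chain, the field layout and offsets do not match the real kernel (so the +20 from the oops is not reproduced), and first_entry_addr() and looks_like_null_next() are made-up helpers, not kernel functions. The idea is only that a list walk computes the entry address as "node address minus the offset of the embedded list node", so a NULL node yields a small negative, non-kernel-looking value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for anon_vma_chain; the layout is illustrative only. */
struct chain_entry {
    void *vma;
    void *anon_vma;
    struct list_head same_vma;
    struct list_head same_anon_vma;
};

/*
 * Address the first loop iteration would treat as the entry: the node
 * address minus the offset of the embedded node, container_of() style.
 * Integer arithmetic is used so that a NULL node is not dereferenced and
 * no pointer arithmetic is done on a null pointer.
 */
static uintptr_t first_entry_addr(const struct list_head *head)
{
    assert(head != NULL);
    return (uintptr_t)head->next
           - offsetof(struct chain_entry, same_anon_vma);
}

/*
 * True when an entry address is exactly what a NULL node would produce,
 * i.e. zero minus the member offset (wraps to a value near the top of
 * the address space).
 */
static int looks_like_null_next(uintptr_t entry)
{
    return entry == (uintptr_t)0
                    - offsetof(struct chain_entry, same_anon_vma);
}
```

With a head whose next is NULL, first_entry_addr() returns that wrapped "minus offset" value and looks_like_null_next() reports it; with next pointing at a real entry's same_anon_vma member, it returns the address of the entry itself. Seeing the former in a register is what lets you say the bad pointer came straight from the list head rather than from somewhere further down the list.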
The case of the overly anonymous anon_vma
Posted Apr 14, 2010 4:57 UTC (Wed) by jzbiciak (✭ supporter ✭, #5246)
Ah, ok. I hadn't seen that email. Makes sense.
The case of the overly anonymous anon_vma
Posted Apr 15, 2010 20:09 UTC (Thu) by Felix (subscriber, #36445)
(it has a slightly different message id+time but seems to fit)
The case of the overly anonymous anon_vma
Posted Apr 15, 2010 20:17 UTC (Thu) by iabervon (subscriber, #722)
Actually, that's a similar message, but not the one I was thinking of. In that one, he says "So again, I can show that..." I was looking at the first time, and this is the second time. The one I'm thinking of is http://groups.google.com/group/linux.kernel/msg/f9c7ca848... and has a more complete explanation of the middle steps.