The ORCs are coming
The state of the kernel's call stack can be surprisingly hard to interpret. Most of the time, it is made up of ordinary C function calls, but assembly-language code, interrupts, processor traps, and the like tend to confuse the picture. A confusing stack can, naturally, cause the "unwinder" code that tries to derive the current call chain from that stack's contents to do strange things; as a result, the kernel has long eschewed any sort of complicated unwinding code. For the most part, developers who deal with kernel tracebacks have learned to cope with occasional bad data.
The live patching effort, though, depends on accurate call-stack information for its consistency model; in short, it needs to be able to tell which functions appear in the call stack of any thread in the system. Getting there involved implementing the compile-time stack validation mechanism to ensure that all kernel code keeps the stack in reasonable condition at all times. The final step is a proper unwinder that uses this now-reliable stack information.
Last May, an attempt to add such an unwinder based on the DWARF debugging records emitted by the compiler ran into trouble when Linus Torvalds saw it. He noted that, the last time this experiment was tried, the unwinder ran into continual problems from changes to assembly-language code or problems with incorrect DWARF records and, as a result, proved to be unmaintainable. Thus, he said: "I do not ever again want to see fancy unwinders with complex state machine handling used by the oopsing code." So DWARF, which requires that sort of complexity, did not appear to be a good option.
That might have been the end of the story, given that Torvalds was firm in his position, but Josh Poimboeuf mentioned an idea he had been pondering for a bit. The objtool utility that performs stack validation at compile time builds a model of the state of the stack at every point in the built kernel. Perhaps, he thought, objtool could emit the debugging records to make that information available to the unwinder in a format rather simpler than DWARF. The result could be a more reliable unwinder using a more efficient data format that, importantly, is fully under the control of the kernel community and, one would hope, relatively unlikely to break.
Two months or so later, the result is the ORC unwinder. The name ostensibly stands for "oops rewind capability", though it's obviously a play on DWARF (which, in turn, is a play on the ELF executable format). The new ORC format is simple at its core; it is based on this structure:
    struct orc_entry {
        s16 sp_offset;
        s16 bp_offset;
        unsigned sp_reg:4;
        unsigned bp_reg:4;
        unsigned type:2;
    };
The purpose of an orc_entry structure is to tell the unwinder code how to orient itself on the stack. There is one of these structures associated with each executable address in the kernel, along with a simple data structure allowing the unwinder to find the correct entry given a program-counter address.
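The article does not describe the lookup structure in detail, so what follows is only a minimal sketch of one way such a lookup could work. It assumes a sorted array of start addresses kept parallel to an array of orc_entry structures, with each entry covering the addresses from its start up to the next start; the names orc_table and sketch_orc_lookup() are hypothetical, and int16_t stands in for the kernel's s16 type so that the example builds in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the structure shown above (int16_t replaces s16). */
struct orc_entry {
    int16_t  sp_offset;
    int16_t  bp_offset;
    unsigned sp_reg:4;
    unsigned bp_reg:4;
    unsigned type:2;
};

/*
 * Hypothetical lookup table (an assumption, not the kernel's layout):
 * starts[] is sorted in ascending order, and entries[i] covers the
 * address range [starts[i], starts[i + 1]).
 */
struct orc_table {
    const uintptr_t *starts;
    const struct orc_entry *entries;
    size_t count;
};

/* Return the entry covering pc, or NULL if pc precedes the first range. */
static const struct orc_entry *
sketch_orc_lookup(const struct orc_table *t, uintptr_t pc)
{
    size_t lo = 0, hi;

    assert(t != NULL);
    hi = t->count;

    /* Binary search for the first start address greater than pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (t->starts[mid] <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* lo is now the number of start addresses that are <= pc. */
    return lo == 0 ? NULL : &t->entries[lo - 1];
}
```

A binary search keeps the cost of each lookup logarithmic in the number of entries, which matters if the unwinder is called frequently from tracing or profiling code.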
The interpretation of the structure depends on the type field. If it is ORC_TYPE_CALL, the code is running within a normal C-style call frame, and the beginning of that frame can be found by adding the sp_offset value to the value found in the register indicated by sp_reg. If, instead, type is ORC_TYPE_REGS, then that sum points to a pt_regs structure describing the processor (and stack) state prior to a system call. Finally, ORC_TYPE_REGS_IRET says that sp_reg and sp_offset can be used to find a return frame for a hardware interrupt. Those three possibilities appear to be enough to describe any situation that will be encountered, at least on the x86 architecture. (The bp_reg and bp_offset fields don't appear to have much use in the current implementation.)
The resulting mechanism is far simpler than the DWARF mechanism. Among other things, that means it's quite a bit faster — a factor of at least 20x is claimed. Unwinding performance may not matter much when responding to a kernel oops, but it's a big deal for function tracing and profiling. The ORC approach is also claimed to be more reliable than telling the compiler to use frame pointers, and it doesn't suffer from the significant performance hit that frame pointers bring with them. And, as noted above, the ORC format is entirely under the control of the kernel community, so it shouldn't break with new compiler versions and, if it does, kernel developers can fix it.
Of course, it's hard to predict just how creative the compiler developers of the future may be when it comes to breaking call-stack information. Poimboeuf acknowledges that risk in the patch posting, but notes that:
The other disadvantage is that the ORC format takes more space than DWARF, occupying 1MB or so of extra memory. Poimboeuf suggested that the memory use could be reduced if it turns out to be a real problem. "However, it will probably require sacrificing some combination of speed and simplicity."
Torvalds has not yet made his feelings known regarding the ORC patches, though he had in the past indicated that he thought the combination of objtool and a simpler format might work. Ingo Molnar, meanwhile, has applied the patches to the tip tree, indicating that they are likely to show up in a 4.14 pull request. So, barring last-minute problems, the multi-year effort to get a reliable stack unwinder in the kernel may be close to completion.
Posted Jul 20, 2017 19:53 UTC (Thu) by jhoblitt (subscriber, #77733)
Posted Jul 20, 2017 19:57 UTC (Thu) by corbet (editor, #1)
Posted Jul 20, 2017 20:58 UTC (Thu) by Sesse (subscriber, #53779)
Posted Jul 20, 2017 21:06 UTC (Thu) by corbet (editor, #1)
Posted Jul 22, 2017 2:02 UTC (Sat) by alkbyby (subscriber, #61687)
Posted Aug 1, 2017 5:36 UTC (Tue) by alison (subscriber, #63752)
Posted Aug 1, 2017 12:24 UTC (Tue) by corbet (editor, #1)
Posted Aug 1, 2017 13:38 UTC (Tue) by nix (subscriber, #2304)
Posted Aug 3, 2017 9:50 UTC (Thu) by Wol (subscriber, #4433)
Cheers,
Posted Sep 14, 2017 3:54 UTC (Thu) by ajdlinux (subscriber, #82125)
From what I can tell of the initially-x86-only features that do get ported to other architectures, arm/arm64 gets them first, powerpc can be (an often distant) second, and everything else may well be never...
Posted Aug 1, 2017 14:32 UTC (Tue) by vbabka (subscriber, #91706)
Posted Aug 6, 2017 15:08 UTC (Sun) by vineetg (subscriber, #85161)
Posted Sep 7, 2017 19:46 UTC (Thu) by vomlehn (guest, #45588)
Posted Sep 14, 2017 7:09 UTC (Thu) by johill (subscriber, #25196)
What happens if we crash in eBPF JIT'ed code? Clearly there cannot be any ORC annotation for that? I'm not sure the JIT ever emits stack usage though.
ORC will track neither of those things; it just provides the information needed to make sense of the kernel stack. The kallsyms mechanism can associate symbols with addresses, as always.
That's a question that came up in the conversation; I didn't manage to work it into the article, sorry. There is definitely interest in doing that, and it seems possible, but nobody is working on it at the moment.
User space
Is objtool for x86_64 only?
Lots of kernel features show up on x86 first; it doesn't usually take all that long for the interesting ones to spread to the other architectures. I don't know of anybody working on ARMing the ORCs at the moment, but it would not surprise me if it happened fairly soon once the x86 stuff lands.
Wol
ARMing ORCs