September 16, 2009
This article was contributed by Jon Ashburn
Modern processors support hardware breakpoint or watchpoint debugging
functionality, but the Linux kernel does not provide a way for debuggers,
such as kgdb or gdb, to access these breakpoint registers
in a shared manner. Thus, debuggers running concurrently can easily
collide in their use of these registers, causing the debuggers to act in
a strange and confusing manner. For example, continuing execution through a
breakpoint, rather than breaking, would certainly confuse a
programmer.
This issue is being addressed by a proposed kernel API called
hw-breakpoint (alternatively hw_breakpoint). The hw-breakpoint
functionality, developed in a series of patches by K. Prasad, Frederic
Weisbecker, and Alan Stern, aims to provide a consistent, portable, and
robust method for multiple programs to access special hardware debug
registers. These registers are useful for any application that requires
the ability to observe memory data accesses, or trigger the collection of
program information based on data accesses. Such applications include
debugging, tracing, and performance monitoring. While these patches
initially target the x86, they attempt to provide a generic API that can be
supported in an architecture independent manner on various processors.
Although the details are still being ironed out, hw-breakpoint should make
hardware debug resources concurrently available to various users in
a more portable manner.
The most common debugging scenarios that would use the hw-breakpoint
patches are memory corruption bugs. Programming mistakes such as bad
pointers, buffer overruns, and improper memory allocation/deallocation can
lead to memory corruption where valid data is accidentally
overwritten. These bugs can be hard to find; the corruption can occur
anywhere in the program. The error resulting from the corruption often occurs
long after the corruption. These bugs cannot typically
be found by focusing on the local sections of code that explicitly access
the corrupted data. Instead, debugger watchpoints, which are a special type
of breakpoint, are the first choice for debugging memory corruption
problems.
Debugger breakpoints halt program execution at a given address and
transfer control to the debugger. This allows the program state (variables,
memory, and registers) to be examined. When programmers talk of breakpoints
they usually are referring to software breakpoints. For example, in
gdb the break command sets a software breakpoint at the
specified instruction address. The break command replaces the
specified instruction with a trap instruction that, when executed, passes
control to gdb.
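The trap-patching idea can be sketched as a small user-space simulation. This is only an illustration over a byte buffer, not how gdb is actually structured: no live code is patched, and the names sw_breakpoint, sw_bp_insert(), and sw_bp_remove() are hypothetical. The one hard fact used is that 0xCC is the one-byte x86 breakpoint (INT3) opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_INT3 0xCC  /* one-byte x86 trap (breakpoint) opcode */

/* Hypothetical bookkeeping a debugger keeps per software breakpoint. */
struct sw_breakpoint {
    size_t  offset;   /* where in the code buffer the trap was planted */
    uint8_t saved;    /* original byte that the trap replaced */
    int     active;   /* nonzero while the trap is in place */
};

/* Plant a trap at code[offset], remembering the byte it overwrites. */
static void sw_bp_insert(uint8_t *code, size_t len, size_t offset,
                         struct sw_breakpoint *bp)
{
    assert(offset < len);
    bp->offset = offset;
    bp->saved  = code[offset];
    bp->active = 1;
    code[offset] = X86_INT3;
}

/* Restore the original byte so the instruction can execute normally. */
static void sw_bp_remove(uint8_t *code, struct sw_breakpoint *bp)
{
    if (bp->active) {
        code[bp->offset] = bp->saved;
        bp->active = 0;
    }
}
```

Because the instruction stream itself is modified, this technique only catches execution of one address; it cannot observe reads or writes of data.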
In contrast, watchpoints are best implemented using hardware
breakpoints; software implementations of watchpoints are extremely slow.
Hardware breakpoints, however, require special debug registers in the processor.
These debug registers continuously monitor memory addresses generated by
the processor, and a trap handler is invoked if the address in the
register matches the address generated by the processor.
Memory accesses can be for data read, data write, or instruction execute
(fetch), so hardware breakpoints usually support trapping on
not only the address, but also the type of access: read,
write, read/write, or execute. Hardware debug registers may also support
trapping on IO port accesses in addition to memory accesses. In either
case, a watchpoint is a trap on any type of data access rather than just an
instruction execute access. Since memory corruption can happen anywhere in
the program, a watchpoint set to trap on writes to the corrupted
variable/location can be a good way to catch these bugs in the act.
These hardware debug registers are limited resources: Intel x86
processors support up to four hardware breakpoints/watchpoints using the
special purpose DR0 to DR7 registers. Registers DR0 to DR3 can be
programmed with the virtual memory address of the desired hardware
breakpoint or watchpoint. DR4 and DR5 are reserved for processor use. DR6
is a status register that gives information about the last breakpoint hit,
such as the register number of the breakpoint, and DR7 is the breakpoint
control register. DR7 includes controls such as local and global enables,
memory access type, and memory access length. However, as with any limited
hardware resource, multiple software users must contend for access of these
registers.
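As a rough illustration of the DR7 controls described above, the following sketch builds a DR7 value for one breakpoint slot and decodes the hit bits of DR6. The bit positions follow the commonly documented x86 layout (enable bits in the low byte, a 4-bit R/W and LEN field per slot starting at bit 16); the function and enum names are made up for this example, and nothing here touches real registers.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings in DR7 (two bits per breakpoint). */
enum dr7_rw {
    DR7_RW_EXECUTE   = 0x0,  /* break on instruction execution */
    DR7_RW_WRITE     = 0x1,  /* break on data writes */
    DR7_RW_IO        = 0x2,  /* break on I/O access (needs CR4.DE) */
    DR7_RW_READWRITE = 0x3   /* break on data reads or writes */
};

/* LEN field encodings in DR7 (two bits per breakpoint). */
enum dr7_len {
    DR7_LEN_1 = 0x0,
    DR7_LEN_2 = 0x1,
    DR7_LEN_8 = 0x2,  /* only on 64-bit capable processors */
    DR7_LEN_4 = 0x3
};

/* Return `dr7` with breakpoint `slot` (0-3) configured and enabled. */
static uint32_t dr7_set_slot(uint32_t dr7, unsigned slot,
                             enum dr7_rw rw, enum dr7_len len, int global)
{
    assert(slot < 4);
    unsigned enable_bit  = slot * 2 + (global ? 1 : 0);
    unsigned field_shift = 16 + slot * 4;

    dr7 &= ~(0xFu << field_shift);    /* clear old R/W and LEN bits */
    dr7 &= ~(0x3u << (slot * 2));     /* clear old local/global enables */
    dr7 |= (uint32_t)rw  << field_shift;
    dr7 |= (uint32_t)len << (field_shift + 2);
    dr7 |= 1u << enable_bit;
    return dr7;
}

/* Decode DR6: return the lowest slot whose hit bit (B0-B3) is set, or -1. */
static int dr6_hit_slot(uint32_t dr6)
{
    for (int slot = 0; slot < 4; slot++)
        if (dr6 & (1u << slot))
            return slot;
    return -1;
}
```

For example, a locally enabled 4-byte write watchpoint in slot 0 would be requested with dr7_set_slot(0, 0, DR7_RW_WRITE, DR7_LEN_4, 0), with the watched address written separately to DR0.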
Since existing released kernels do not control or arbitrate
access to these registers, software users can unknowingly clash in
their usage, which usually will result in a software error or
crash. Hw-breakpoint solves this problem by arbitrating the access to these
limited hardware registers from both user-space and kernel-space software.
User-space access, such as from gdb, is done via the
ptrace() system call. Kernel-space access includes kgdb
and KVM (only during context switches between host and guests).
Hw-breakpoint arbitration keeps kernel and/or user space debuggers from
stepping on each other's toes.
Additional kernel patches have been developed to take advantage of the
hw-breakpoint API. A plug-in for ftrace (covered in earlier LWN
articles) has been developed to
dynamically trace any kernel global symbol. This functionality, called
ksym_tracer, allows all read and write accesses on a kernel variable to be
displayed in debugfs. Since it uses the hw-breakpoint API, it relies on
underlying hardware breakpoint support. This new feature of ftrace could
be very useful for memory corruption bugs that are difficult to catch with
watchpoints. These difficulties include: 1) an erroneous
write that is lurking beneath a large quantity of valid writes, 2) the
necessity to set up a remote machine to run kgdb, and 3) kernel
bugs which no longer manifest themselves when the machine is halted via
breakpoints. Hw-breakpoint allows the concurrent use of both ksym_tracer
and debugger watchpoints without the risk of hardware debug register
corruption.
In addition to ftrace, perfcounters (covered in earlier LWN articles) can be enhanced through
the generic hw-breakpoint functionality. Specifically, counters can be
updated based on data accesses rather than instruction execution. A patch
to perfcounters has been developed to use kernel-space hardware breakpoints
to monitor performance events associated with data accesses. For example,
spinlock accesses can be counted by monitoring the spinlock flag itself.
Currently this patch is rather limited in supporting the definition and use
of breakpoint counters. However, additional features are planned.
With the addition of the ftrace and perfcounters patches, the hw-breakpoint
API can now potentially be used by several pieces of code: kgdb,
KVM, ptrace, ftrace, and perfcounters. This increased potential
usage has resulted in increased scrutiny of the API by various developers:
hw-breakpoint is no longer solely of concern to debugger developers. This
increased scrutiny has resulted in major changes to the hw-breakpoint code
that are still ongoing. In particular, the coupling of perfcounters to
hw-breakpoint has caused the rethinking of a significant chunk of the
original hw-breakpoint functionality and structure.
The original (pre-perfcounter support) hw-breakpoint functionality was
primarily developed by K. Prasad. It supported global, system-wide
kernel-space breakpoints and per-thread user-space breakpoints. Whereas
user-space breakpoints were only enabled during thread execution, kernel
breakpoints were always present on all CPUs in the system. Additionally,
no reservation policy was implemented. Requests for hardware debug
registers were granted on a first-come, first-served basis. Once all
physical debug registers were used, hw-breakpoint returned an error for
further breakpoint requests.
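The first-come, first-served behavior described above can be modeled in a few lines. This is a user-space sketch, not the code from the patches: the table structure, the function names, and the choice of -ENOSPC as the error value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_DEBUG_REGS 4   /* x86 provides four address registers */

/* Hypothetical table: which owner, if any, holds each debug register. */
struct debug_reg_table {
    const void *owner[NUM_DEBUG_REGS];
};

/* Grant the first free register; fail with -ENOSPC once all are taken. */
static int debug_reg_request(struct debug_reg_table *t, const void *owner)
{
    assert(owner != NULL);
    for (int i = 0; i < NUM_DEBUG_REGS; i++) {
        if (t->owner[i] == NULL) {
            t->owner[i] = owner;
            return i;
        }
    }
    return -ENOSPC;   /* assumed error code; no reservation, no queueing */
}

/* Release a previously granted register so a later request can succeed. */
static void debug_reg_release(struct debug_reg_table *t, int reg)
{
    assert(reg >= 0 && reg < NUM_DEBUG_REGS);
    t->owner[reg] = NULL;
}
```

With no reservation policy, whoever asks first wins: a fifth requester is simply refused until one of the four holders releases its register.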
This original hw-breakpoint implementation is "an
utter mis-match" for supporting perfcounter functionality, for three
reasons, as pointed out
by Peter Zijlstra. First, counters (either user or kernel-space) can be
defined per-cpu or per-task; this conflicts with hw-breakpoint's
system-wide kernel breakpoints. Second, per-task counters are scheduled by
perfcounter to avoid unnecessary context swaps of the underlying hardware
resources. Third, counters can be multiplexed, in
a time-sliced fashion, beyond the resource limit of the underlying hardware PMU
(performance monitoring unit), which for x86 hardware breakpoints is
four. These incongruities between perfcounter and hw-breakpoint led to a
debate about any coupling between hw-breakpoint and perfcounter. However,
a consensus formed that integrating hw-breakpoint into perfcounter's PMU
reservation and scheduling infrastructure would be beneficial given
perfcounter's richer support for scheduling, reservation, and management of
hardware resources. About these benefits Frederic Weisbecker writes:
And in the end we have a pmu (which unifies the control of
this profiling unit through a well established and known object for
perfcounter) controlled by a high level API that could also benefit to
other debugging subsystems.
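To make the time-sliced multiplexing mentioned above concrete, here is a toy model in which more counters than hardware slots take turns on the hardware. The round-robin policy and the function names are assumptions chosen for simplicity; this is not perfcounter's actual scheduling algorithm.

```c
#include <assert.h>

#define HW_SLOTS 4  /* x86 hardware breakpoint registers */

/*
 * Hypothetical round-robin policy: at each tick the window of HW_SLOTS
 * counters that own the hardware advances by one position.
 * Returns 1 if `counter` is on the hardware during `tick`, else 0.
 */
static int mux_is_scheduled(unsigned counter, unsigned ncounters,
                            unsigned tick)
{
    assert(ncounters > 0 && counter < ncounters);
    if (ncounters <= HW_SLOTS)
        return 1;                       /* everything fits, no rotation */

    unsigned start = tick % ncounters;  /* first counter in the window */
    unsigned distance = (counter + ncounters - start) % ncounters;
    return distance < HW_SLOTS;
}

/* Count how many of `ticks` consecutive ticks a counter spends on hardware. */
static unsigned mux_time_on_hw(unsigned counter, unsigned ncounters,
                               unsigned ticks)
{
    unsigned on = 0;
    for (unsigned t = 0; t < ticks; t++)
        on += (unsigned)mux_is_scheduled(counter, ncounters, t);
    return on;
}
```

With six counters sharing four slots under this policy, each counter is live for four out of every six ticks, so a breakpoint counter only observes accesses during its share of the time.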
Newly posted in the last week is Weisbecker's patch to
integrate hw-breakpoint and perfcounter code. Conceptually, this splits
the hw-breakpoint functionality into two halves: 1) the top level API, and
2) the low level debug register control. In between these halves
lies the perfcounter functionality. With this patch each breakpoint is a
specific perfcounter instance called a breakpoint counter. Perfcounter
handles register scheduling, and thread/CPU attachment of these breakpoint
counter instances. The modified hw-breakpoint API still handles requests
from ptrace(), ftrace, and kgdb for breakpoints by
creating a breakpoint counter. Breakpoint counters can also be created
directly from the existing perfcounter system call
(perf_counter_open()). The breakpoint counter layer interacts
with the low-level, architecture specific hw-breakpoint code that handles
reading and writing the processor's debug registers.
Unfortunately, because of the very recent integration into
perfcounters, the hw-breakpoint API has changed and additional changes to
the API are planned. Rather than cover the existing API in detail, since it
appears likely to change, I will give a summary of it. Two function calls
are provided to set a new hardware breakpoint:
int register_user_hw_breakpoint(struct task_struct *tsk, struct hw_breakpoint *bp);
int register_kernel_hw_breakpoint(struct hw_breakpoint *bp, int cpu);
where:
cpu is the cpu number to set the breakpoint on;
*tsk is a pointer to the 'task_struct' of the process to which the address belongs;
*bp is a pointer to the breakpoint property information which includes:
1) a pointer to the handler function to be invoked upon hitting the breakpoint;
2) a pointer to architecture dependent data (struct arch_hw_breakpoint).
The
struct arch_hw_breakpoint provides breakpoint properties such
as the memory address of the breakpoint, type of memory access
(read/write, read, or write), and the length of memory access (byte,
short, word, ...). These parameters are highly dependent upon the
specific support provided by the hardware. For example, while x86
supports virtual memory addresses, other processors support physical
memory addresses. Because the API itself aims for architecture independence, these
hardware-specific details are confined to this architecture-dependent structure.
To avoid having to
register and unregister a breakpoint if it just needs modification, the
following function is provided:
int modify_user_hw_breakpoint(struct task_struct *tsk, struct hw_breakpoint *bp);
Hardware breakpoints are removed by an unregister function:
void unregister_hw_breakpoint(struct hw_breakpoint *bp);
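The register, modify, and unregister life cycle can be walked through with a user-space mock. Everything below other than the three function prototypes is an assumption: the structure layouts and field names (triggered, info, address, type, len), the simplified handler signature, the single-slot bookkeeping, and simulate_write(), which stands in for the hardware. It shows the intended calling pattern only and will not build against any kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/* --- User-space stand-ins for kernel types; layouts are assumptions. --- */
struct task_struct { int pid; };

struct arch_hw_breakpoint {      /* architecture-dependent properties */
    unsigned long address;       /* address to watch */
    int type;                    /* access type, e.g. write */
    int len;                     /* access length in bytes */
};

struct hw_breakpoint {
    void (*triggered)(struct hw_breakpoint *bp);  /* handler run on a hit */
    struct arch_hw_breakpoint *info;              /* arch-dependent data */
};

enum { BP_WRITE = 1, BP_READWRITE = 3 };          /* hypothetical values */

/* --- Stubs with the prototypes from the article; bodies are mock-ups. --- */
static struct hw_breakpoint *installed;           /* one slot, for brevity */

int register_user_hw_breakpoint(struct task_struct *tsk,
                                struct hw_breakpoint *bp)
{
    if (!tsk || !bp || !bp->info || !bp->triggered || installed)
        return -1;
    installed = bp;
    return 0;
}

int modify_user_hw_breakpoint(struct task_struct *tsk,
                              struct hw_breakpoint *bp)
{
    if (!tsk || installed != bp)
        return -1;
    return 0;   /* real code would reprogram the debug register here */
}

void unregister_hw_breakpoint(struct hw_breakpoint *bp)
{
    if (installed == bp)
        installed = NULL;
}

/* Mock of the hardware: report a write and fire the handler on a match. */
static void simulate_write(unsigned long addr)
{
    if (installed && installed->info->address == addr) {
        assert(installed->triggered != NULL);
        installed->triggered(installed);
    }
}

/* Example handler: just count how often the watched address was written. */
static int hits;
static void count_hit(struct hw_breakpoint *bp) { (void)bp; hits++; }
```

A caller would fill in an arch_hw_breakpoint, point a hw_breakpoint at it along with a handler such as count_hit(), register it, later change the address and call modify_user_hw_breakpoint() rather than unregistering and re-registering, and finally unregister it.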
Hw-breakpoint has made its way into the -tip tree, the kernel source
development tree maintained by Ingo Molnar. In June it was tentatively
targeted for merging from -tip into the 2.6.32 kernel. However,
the delayed integration with perfcounters has pushed any merge out past
2.6.32.
Whenever it is released, hw-breakpoint promises to provide a portable
and robust method for debuggers to access hardware breakpoints without
conflict. While the hw-breakpoint functionality started out as a relatively
isolated feature to support debuggers, its existence has spawned new
tracing and performance monitoring features. These new features should
prove useful for various situations where data memory access, rather than
instruction access, provides the appropriate trigger to collect dynamic
information. By leveraging the perfcounter resource scheduling and
reservation functionality, hw-breakpoint has a very generalized method for
managing limited hardware breakpoint registers. The release of
hw-breakpoint promises to enable new ways for Linux users to track down
difficult bugs such as memory corruption, and to enable diverse dynamic
data access techniques (such as gdb watchpoints and ftrace
ksym_tracer) to play well together.