
Announcing a new utility: 'trace'

From:  Thomas Gleixner <tglx-AT-linutronix.de>
To:  LKML <linux-kernel-AT-vger.kernel.org>
Subject:  [ANNOUNCE] New utility: 'trace'
Date:  Tue, 16 Nov 2010 22:04:40 +0100 (CET)
Message-ID:  <alpine.LFD.2.00.1011162103580.2900@localhost6.localdomain6>
Cc:  Linus Torvalds <torvalds-AT-linux-foundation.org>, Andrew Morton <akpm-AT-linux-foundation.org>, Ingo Molnar <mingo-AT-elte.hu>, Peter Zijlstra <peterz-AT-infradead.org>, Steven Rostedt <rostedt-AT-goodmis.org>, Arjan van de Ven <arjan-AT-infradead.org>, Arnaldo Carvalho de Melo <acme-AT-redhat.com>, Frederic Weisbecker <fweisbec-AT-gmail.com>, Masami Hiramatsu <masami.hiramatsu.pt-AT-hitachi.com>, Tom Zanussi <tzanussi-AT-gmail.com>, Mathieu Desnoyers <mathieu.desnoyers-AT-efficios.com>, Li Zefan <lizf-AT-cn.fujitsu.com>, Jason Baron <jbaron-AT-redhat.com>, "David S. Miller" <davem-AT-davemloft.net>, Christoph Hellwig <hch-AT-lst.de>, Pekka Enberg <penberg-AT-kernel.org>, Lai Jiangshan <laijs-AT-cn.fujitsu.com>, Eric Dumazet <eric.dumazet-AT-gmail.com>

We are pleased to announce a new tracing tool called 'trace'.

This tool follows up on Linus's (not so) recent observation that
Linux tracing still does not offer good tracing tools to ordinary
users. There is no easy-to-use workflow, and the tools that exist
are not expressive enough to solve everyday problems.

After years of effort we have not succeeded in meeting (let alone
exceeding) the utility of decades-old user-space tracing tools such as
strace - except for a few good new tools such as PowerTop and
LatencyTop.

'trace' is our shot at improving the situation: it aims to provide a
simple, straightforward tracing tool based on the perf
infrastructure and on the well-known perf profiling workflow:

   trace record firefox          # trace firefox and all child tasks,
                                   put it into trace.data

   trace summary                 # show quick overview of the trace,
                                   based on trace.data

   trace report                  # detailed traces, based on trace.data

The main workflow approach is to trace once, and then to analyze the
trace in an iterative fashion, by drilling down from a high-level
overview to more and more detailed trace events - with numerous
filtering capabilities available to make this progression effective
and intuitive. (Further below there are a number of real examples of
what you can do - but try out the tool to see all the capabilities.)

To try it out, pick up the -tip tree:

   http://people.redhat.com/mingo/tip.git/README

   ( NOTE: you will need to install the audit-libs-dev /
     audit-libs-devel package if you don't have it installed. We'll
     remove this library dependency in a future iteration of the
     tool. )

And do:

   cd tools/perf/
   make -j install

Once you've done that you should have the 'trace' tool available.

Build the -tip kernel and reboot into it to get the most out of the
tool - but even on default kernels it will try to cope. Syscall events
(CONFIG_FTRACE_SYSCALLS=y) are a must for most methods of analysis
though. (With the above -tip tree you'll get it all configured by
default.)

If you've booted the new kernel you can run 'trace check' to
double-check that all events used by the tool are available:

 $ trace check

 Checking whether the kernel has all required events ...
  ... Checking event  raw_syscalls:sys_enter:r      : ok
  ...
  ... Checking event  sched:sched_stat_runtime:r    : ok
 Good: all required event types are supported by this kernel.
 The 'trace' utility will be fully functional.

The combo diffstat of the tool is appended at the end of this mail. The
overwhelming majority of the changes are on the tooling side - the tool
is implemented using existing perf events facilities and features.

Here is a more general description of the 'trace' tool's capabilities:

     - Trace sessions: single tasks or groups of processes/threads
       (auto-follow), system-wide and per-CPU tracing

     - Unintrusive tracing

     - It will record filterable information about the session:

        * syscalls
        * task lifetime events (fork/clone/exit)
        * pagefaults
        * various mm events (mmap)
        * scheduling events

        * (support for more events is being worked on)

     - Iterative analysis of an existing session/trace without having
       to re-run the trace, with different filters and options (trace
       data is saved on disk)

     - Concurrent and multi-user safe trace recording: multiple users
       can trace, or the same user can trace in multiple different
       tracing sessions.

     - Available to plain users - the permission model is not
       completely done yet; it currently needs a temporary workaround:

	echo -1 > /proc/sys/kernel/perf_event_paranoid

Todo list:

     - Improve output formatting
     - Utilize more events: networking, block IO
     - Add more higher level trace points
     - Implement 'trace diff' utility to simplify the comparison of traces
     - Improve the permission model (work in progress)
     - Implement -p NNN (trace existing pid) option

Main differences to strace:

     - Low impact recording

     - Arguments are printed in plain hex representation, except for
       important arguments like filenames, which are resolved after
       the syscall is recorded. 'trace' won't do full decoding like
       strace does.

     - Ability to record pagefaults and resolve them fully back to
       within the corresponding mapping:

        * file : offset
        * [heap] : offset
        * [stack] : offset
        * [anon] : offset
        * [vdso] : offset

     - Ability to repeat analysis with different filters and options
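
The pagefault resolution described above boils down to finding the
mapping that contains the faulting address and subtracting its start.
A minimal sketch, assuming /proc/<pid>/maps-style input (the parser
and sample lines are illustrative, not the tool's actual code):

```python
# Resolve a faulting address to (mapping name, offset), in the style of
# the "[heap] : offset" output described above. Simplified, illustrative
# /proc/<pid>/maps parser - not the tool's real implementation.
def parse_maps(lines):
    maps = []
    for line in lines:
        fields = line.split()
        start, end = (int(x, 16) for x in fields[0].split('-'))
        # Column 6 is the backing file, if any; anonymous otherwise.
        name = fields[5] if len(fields) > 5 else '[anon]'
        maps.append((start, end, name))
    return maps

def resolve(addr, maps):
    for start, end, name in maps:
        if start <= addr < end:
            return name, addr - start
    return None, None

maps = parse_maps([
    "00400000-004c8000 r-xp 00000000 08:01 1234 /bin/bash",
    "7fff5c000000-7fff5c021000 rw-p 00000000 00:00 0 [stack]",
])
print(resolve(0x4011c0, maps))  # ('/bin/bash', 4544)
```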

Here are a few real-life examples of how to use the 'trace' tool.

Firstly, to record a trace, simply prefix whatever command you'd like
to be traced with 'trace record':

  $ trace record firefox
  # trace recorded [2.570 MB] - try 'trace summary' to get an overview

The new trace.data file in the current directory contains the tracing
session, and it can be analyzed after the fact - without having to
re-run the trace.

For example, you can get a summary of all events:

  $ trace summary

    .--------------------------------.
 __(    Summary of tasks traced       )_______________________________

               [ task - pid ]     [ events ] [ ratio ]  [ runtime ]
 _____________________________________________________________________

       run-mozilla.sh - 1740   :     14108   [ 57.2% ]     37.341 ms
              firefox - 1741   :       851   [  3.5% ]      0.708 ms
                uname - 1742   :       842   [  3.4% ]      0.527 ms
              firefox - 1747   :      1793   [  7.3% ]      3.568 ms
                mkdir - 1748   :       713   [  2.9% ]      1.289 ms
              firefox - 1750   :       497   [  2.0% ]      2.050 ms
                :1751 - 1751   :       368   [  1.5% ]      0.000 ms
                  sed - 1752   :      1034   [  4.2% ]      0.479 ms
                :1753 - 1753   :       494   [  2.0% ]      2.059 ms
                :1754 - 1754   :       377   [  1.5% ]      0.331 ms
                  sed - 1755   :      1035   [  4.2% ]      0.378 ms
       run-mozilla.sh - 1756   :       848   [  3.4% ]      0.467 ms
              dirname - 1757   :       845   [  3.4% ]      0.531 ms
       run-mozilla.sh - 1758   :       840   [  3.4% ]      0.501 ms
 ____________________________________________________________________

                   14   tasks  :     24645   [100.0% ]     50.228 ms
  $

( Note that the table is augmented with colors to highlight
  high-overhead entries and other information. Those colors are not
  visible in the cut & paste screens here :-) )
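
The ratio column in the summary is simply each task's event count over
the total. A tiny illustrative sketch of that aggregation (not the
tool's actual code):

```python
# Compute the per-task event ratio shown in the summary table above,
# as a percentage rounded to one decimal. Illustrative sketch only.
def summarize(per_task_events):
    total = sum(per_task_events.values())
    return {task: round(100.0 * n / total, 1)
            for task, n in per_task_events.items()}

print(summarize({'run-mozilla.sh-1740': 3, 'firefox-1741': 1}))
# {'run-mozilla.sh-1740': 75.0, 'firefox-1741': 25.0}
```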

To get to the detailed trace (see further below about subsystem level
filtering and higher level overview):

 $ trace report

 #
 # trace events of 'firefox':
 #
             firefox/28132 ( 0.000 ms):  ... [continued]: execve() => 0x0
             firefox/28132 ( 0.001 ms): brk(brk: 0x0)
=> 0x6e2000
             firefox/28132 ( 0.010 ms): mmap(addr: 0x0, len: 0x1000, prot: 0x3, flags: 0x22, fd:
0xffffffff, off: 0x0) => 0x7fb3f442c000
             firefox/28132 ( 0.006 ms): access(filename: 0x7fb3f422aa60, mode: 0x4) (fpath:
/etc/ld.so.preload)  => 0xfffffffffffffffe
             firefox/28132 ( 0.004 ms): open(filename: 0x7fb3f4227f4b, flags: 0x0, mode: 0x1)
(fpath: /etc/ld.so.cache)  => 0x3
             firefox/28132 ( 0.001 ms): fstat(3:</etc/ld.so.cache>, statbuf: 0x7fffafe4ef80)
=> 0x0

To see the events of the 'mozilla-xremote' task alone, the -t option
can be used:

  $ trace report -t mozilla-xremote

 ( 0.000 ms):  ... [continued]: clone() => 0x0
 ( 0.003 ms): close(255:<...>)                                                       => 0x0
 ( 0.002 ms): rt_sigprocmask(how: 0x2, set: 0x6e1a60, oset: 0x0, sigsetsize: 0x8)    => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x14, act: 0x7fffcc4caa80, oact: 0x7fffcc4ca9e0, sigsetsize: 0x8)
=> 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x15, act: 0x7fffcc4caa80, oact: 0x7fffcc4ca9e0, sigsetsize: 0x8)
=> 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x16, act: 0x7fffcc4caa90, oact: 0x7fffcc4ca9f0, sigsetsize: 0x8)
=> 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x2, act: 0x7fffcc4cabb0, oact: 0x7fffcc4cab10, sigsetsize: 0x8) =>
0x0
 ( 0.001 ms): rt_sigaction(sig: 0x3, act: 0x7fffcc4cabb0, oact: 0x7fffcc4cab10, sigsetsize: 0x8) =>
0x0
 ( 0.001 ms): rt_sigaction(sig: 0x11, act: 0x7fffcc4cabb0, oact: 0x7fffcc4cab10, sigsetsize: 0x8)
=> 0x0
 ( 0.010 ms): open(filename: 0x6e95b0, flags: 0x241, mode: 0x1b6) (fpath: /dev/null)  => 0x3
 ( 0.001 ms): dup2(oldfd: 0x3, newfd: 0x1)                                           => 0x1
 ( 0.001 ms): close(3:</dev/null>)                                                   => 0x0

Note how the comm/pid column has been collapsed - with only a single
task to display, omitting it keeps the output simpler.

But such traces are generally still too large and detailed to be
really suitable for at-a-glance analysis, so 'trace' offers further
filters which allow you to drill down into any traced workload.

For example, subsystem filters output the activities of a given
kernel subsystem.

Current subsystem tags are:

   aio, arch-x86, events, fd, fs, fs-attr, IO, IO-locking,
   IPC-locking, IPC-mm, IPC-net, locking, misc, mm, net, process,
   sched, security, signal, stat, system, task, timer, tty.

This is a first rough categorization of events - more is possible and
more will be done. (Suggestions are welcome!)
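
Conceptually, a subsystem filter just maps each syscall to a category
tag and keeps the requested categories. A hypothetical sketch - the
tag table here is a made-up fragment, not the tool's actual
syscall-attr table:

```python
# Map syscalls to subsystem tags and keep only events in the requested
# categories. Illustrative table fragment, not the tool's real one.
SYSCALL_TAGS = {
    'clone': 'process', 'execve': 'process', 'exit_group': 'process',
    'read': 'IO', 'write': 'IO', 'open': 'fd', 'close': 'fd',
}

def filter_events(events, categories):
    # Multiple categories can be given as 'process,fd'.
    wanted = set(categories.split(','))
    return [e for e in events if SYSCALL_TAGS.get(e) in wanted]

print(filter_events(['clone', 'read', 'open', 'execve'], 'process,fd'))
# ['clone', 'open', 'execve']
```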

For example, to utilize the 'process' event category, simply add it
after 'trace report':

  $ trace report process

 # (restricting syscalls to: process)
             firefox/ 6722 ( 0.000 ms):  ... [continued]: execve() => 0x0
             firefox/ 6722 ( 0.257 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)
=> 0x1a43
            basename/ 6723 ( 0.000 ms):  ... [continued]: clone() => 0x0
            basename/ 6723 ( 0.215 ms): execve(0x6e9c40, 0x6e8d20, 0x6e8b00, 0x7fffcc4ca980, 0x0,
0x0) (fpath: /bin/basename)  => 0x0
            basename/ 6723 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.021 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cad6c,
options: 0x0, ru: 0x0) => 0x1a43
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca8dc,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.267 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)
=> 0x1a44
      run-mozilla.sh/ 6724 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6724 ( 0.166 ms): execve(0x6ea150, 0x6ea400, 0x6ec660, 0x7fffcc4ca830, 0x0,
0x7f1aa2a611e8) (fpath: /bin/uname)  => 0x0
      run-mozilla.sh/ 6724 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.020 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cac1c,
options: 0x0, ru: 0x0) => 0x1a44
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca79c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.246 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)
=> 0x1a45
     mozilla-xremote/ 6725 ( 0.000 ms):  ... [continued]: clone() => 0x0
     mozilla-xremote/ 6725 ( 0.204 ms): execve(0x6ef210, 0x6e95d0, 0x6ecda0, 0x7fffcc4caa30, 0x1,
0x2a) (fpath: /usr/lib64//xulrunner-2/mozilla-xremote-client)  => 0x0
     mozilla-xremote/ 6725 ( 0.000 ms): exit_group(error_code: 0x1)
             firefox/ 6722 ( 3.262 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cac7c,
options: 0x0, ru: 0x0) => 0x1a45
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca7dc,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.247 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)
=> 0x1a46
      run-mozilla.sh/ 6726 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6726 ( 0.150 ms): execve(0x6edca0, 0x6e9ab0, 0x6ecda0, 0x7fffcc4c9ca0, 0x1,
0x7f1aa2a611e8) (fpath: /bin/mkdir)  => 0x0
      run-mozilla.sh/ 6726 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 1.148 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9eec,
options: 0x0, ru: 0x0) => 0x1a46
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9a5c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.243 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)
=> 0x1a47
             firefox/ 6727 ( 0.000 ms):  ... [continued]: clone() => 0x0
             firefox/ 6727 ( 0.289 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a47, 0x1a47)
=> 0x1a48
             firefox/ 6728 ( 0.000 ms):  ... [continued]: clone() => 0x0
             firefox/ 6727 ( 0.259 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a47, 0x1a47)
=> 0x1a49
                 sed/ 6729 ( 0.000 ms):  ... [continued]: clone() => 0x0
             firefox/ 6728 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6727 ( 0.151 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9dfc,
options: 0x0, ru: 0x0) => 0x1a48
                 sed/ 6729 ( 0.161 ms): execve(0x6e9ed0, 0x6f25e0, 0x6ecda0, 0x7fffcc4c9bb0, 0x0,
0x0) (fpath: /bin/sed)  => 0x0
                 sed/ 6729 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6727 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9dfc,
options: 0x0, ru: 0x0) => 0x1a49
             firefox/ 6727 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9c1c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6727 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.041 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca2bc,
options: 0x0, ru: 0x0) => 0x1a47
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9e1c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.264 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)
=> 0x1a4a
      run-mozilla.sh/ 6730 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6730 ( 0.277 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a4a, 0x1a4a)
=> 0x1a4b
               :6731/ 6731 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6730 ( 0.248 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a4a, 0x1a4a)
=> 0x1a4c
                 sed/ 6732 ( 0.000 ms):  ... [continued]: clone() => 0x0
               :6731/ 6731 ( 0.000 ms): exit_group(error_code: 0x0)
      run-mozilla.sh/ 6730 ( 0.214 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9fdc,
options: 0x0, ru: 0x0) => 0x1a4b
                 sed/ 6732 ( 0.229 ms): execve(0x6e9e50, 0x6f25e0, 0x6ecda0, 0x7fffcc4c9d90, 0x0,
0x0) (fpath: /bin/sed)  => 0x0
                 sed/ 6732 ( 0.000 ms): exit_group(error_code: 0x0)
      run-mozilla.sh/ 6730 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9fdc,
options: 0x0, ru: 0x0) => 0x1a4c
      run-mozilla.sh/ 6730 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9ddc,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
      run-mozilla.sh/ 6730 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.043 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca49c,
options: 0x0, ru: 0x0) => 0x1a4a
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca01c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.174 ms): execve(0x6f0ee0, 0x6f0550, 0x6ef580, 0x7fffcc4cae50, 0x0,
0x0) (fpath: /usr/lib64/firefox-4/run-mozilla.sh)  => 0x0
             firefox/ 6722 ( 0.489 ms): clone(0x1200011, 0x0, 0x0, 0x7f2b57e7d9f0, 0x0, 0x1a42)
=> 0x1a4d
            basename/ 6733 ( 0.000 ms):  ... [continued]: clone() => 0x0
            basename/ 6733 ( 0.154 ms): execve(0x6ea830, 0x6eacd0, 0x6e9070, 0x7fffc740ac50, 0x0,
0x7f2b576381e8) (fpath: /bin/basename)  => 0x0
            basename/ 6733 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.020 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740b03c,
options: 0x0, ru: 0x0) => 0x1a4d
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740ab9c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.243 ms): clone(0x1200011, 0x0, 0x0, 0x7f2b57e7d9f0, 0x0, 0x1a42)
=> 0x1a4e
      run-mozilla.sh/ 6734 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6734 ( 0.154 ms): execve(0x6ea790, 0x6eabd0, 0x6e9070, 0x7fffc740ac50, 0x0,
0x7f2b576381e8) (fpath: /usr/bin/dirname)  => 0x0
      run-mozilla.sh/ 6734 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.074 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740b03c,
options: 0x0, ru: 0x0) => 0x1a4e
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740ab9c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.237 ms): clone(0x1200011, 0x0, 0x0, 0x7f2b57e7d9f0, 0x0, 0x1a42)
=> 0x1a4f
               uname/ 6735 ( 0.000 ms):  ... [continued]: clone() => 0x0
               uname/ 6735 ( 0.158 ms): execve(0x6e9780, 0x6e9b00, 0x6e97a0, 0x7fffc740a330, 0x0,
0x7f2b576381e8) (fpath: /bin/uname)  => 0x0
               uname/ 6735 ( 0.000 ms): exit_group(error_code: 0x0)
             firefox/ 6722 ( 0.020 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740a71c,
options: 0x0, ru: 0x0) => 0x1a4f
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740a29c,
options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.194 ms): execve(0x6f8ae0, 0x6f9c00, 0x6f8ed0, 0x7fffc740a8f0, 0x0,
0x40) (fpath: /usr/lib64/firefox-4/firefox)  => 0x0
             firefox/ 6722 ( 0.000 ms): exit_group(error_code: 0x1)

(Multiple categories can be specified too - for example 'process,io'.)

Note that this is the complete, unabridged output - this dump alone
is short enough to give an overview of what happened in this Firefox
session.

Other useful filters are 'trace report io', which shows only IO
events (syscalls for the time being), and 'trace report fs', which
shows FS-related activities.

Another common approach to analyzing traces is to figure out which
portion of a trace took the most time. 'trace' allows such analysis
via a duration filter. For example, to see all events that took longer
than 0.5 milliseconds:

  $ trace report -d 0.5

             firefox/ 6722 ( 1.273 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4caf40, count:
0x80)           => 0x8
             firefox/ 6722 ( 1.213 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4cadf0, count:
0x80)           => 0x7
             firefox/ 6722 ( 3.262 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cac7c,
options: 0x0, ru: 0x0) => 0x1a45
             firefox/ 6722 ( 1.148 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9eec,
options: 0x0, ru: 0x0) => 0x1a46
             firefox/ 6722 ( 2.360 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4ca490, count:
0x80)           => 0x3
             firefox/ 6727 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9dfc,
options: 0x0, ru: 0x0) => 0x1a49
             firefox/ 6722 ( 2.431 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4ca670, count:
0x80)           => 0x6
      run-mozilla.sh/ 6730 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9fdc,
options: 0x0, ru: 0x0) => 0x1a4c
             firefox/ 6722 ( 1.054 ms): read(3:</usr/lib64/firefox-4/run-mozilla.sh>, buf:
0x7fffc740b210, count: 0x80) => 0xf
             firefox/ 6722 ( 1.053 ms): read(3:</usr/lib64/firefox-4/run-mozilla.sh>, buf:
0x7fffc740b210, count: 0x80) => 0x15
             firefox/ 6722 ( 1.157 ms): read(3:</usr/lib64/firefox-4/run-mozilla.sh>, buf:
0x7fffc740a8f0, count: 0x80) => 0x6
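
Conceptually the duration filter is just a threshold over the recorded
per-event durations; an illustrative sketch (not the tool's code):

```python
# Keep only events whose duration (in milliseconds) meets the threshold,
# the way 'trace report -d 0.5' narrows the output above. Illustrative.
def duration_filter(events, min_ms):
    return [(dur, name) for dur, name in events if dur >= min_ms]

print(duration_filter([(1.273, 'read'), (0.004, 'open'), (3.262, 'wait4')], 0.5))
# [(1.273, 'read'), (3.262, 'wait4')]
```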

Page fault events are recorded as well. These can be analyzed via the
-p option:

  $ trace report -p

  [...]
             firefox/ 6722 ( 0.001 ms):      #PF       4793: [                    g_str_hash]: =>
/usr/lib64/libgdk-x11-2.0.so.0.2302.0 offset: 0xa5000 page: 165 (R.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4794: [         gdk_event_handler_set]: =>
/usr/lib64/libgdk-x11-2.0.so.0.2302.0 offset: 0x25bc0 page: 37 (R.U)
             firefox/ 6722 ( 0.001 ms): rt_sigaction(sig: 0xd, act: 0x7fff163c34c0, oact:
0x7fff163c3420, sigsetsize: 0x8) => 0x0
             firefox/ 6722 ( 0.001 ms):      #PF       4795: [                  7f4dddedd860]: =>
/usr/lib64/libgtk-x11-2.0.so.0.2302.0 offset: 0x72860 page: 114 (R.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4796: [                  7f4dde0163a0]: =>
/usr/lib64/libgtk-x11-2.0.so.0.2302.0 offset: 0x1ab3a0 page: 427 (R.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4797: [                  7f4dde0163a4]: =>
[anon 0x7f4dde4ee000] offset: 0x220 page: 0 (R.U)
             firefox/ 6722 ( 0.002 ms):      #PF       4798: [                  7f4dde0163b0]: =>
[anon 0x7f4dde4ee000] offset: 0x220 page: 0 (W.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4799: [                  7f4dde011c50]: =>
/usr/lib64/libgtk-x11-2.0.so.0.2302.0 offset: 0x1a6c50 page: 422 (R.U)
  [...]

Note that both data faults and code faults are traced and interpreted
- and if the fault lies in an executable area (binary or shared
library) then the tool will figure out the function that caused the
fault.

The file name of the faulting mapping is displayed as well, along
with the offset into that file.
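
Resolving a code fault back to a function amounts to looking up the
fault's file offset in a symbol table sorted by offset. A hypothetical
sketch - the symbol names and offsets are illustrative, echoing the
pagefault output above:

```python
import bisect

# Find the function containing a given file offset, given a symbol table
# sorted by starting offset. Illustrative values, not real symbol data.
SYMBOLS = [(0x25bc0, 'gdk_event_handler_set'), (0xa5000, 'g_str_hash')]

def find_symbol(file_offset, symbols):
    offsets = [off for off, _ in symbols]
    # Rightmost symbol starting at or before the faulting offset.
    i = bisect.bisect_right(offsets, file_offset) - 1
    return symbols[i][1] if i >= 0 else None

print(find_symbol(0xa5010, SYMBOLS))  # g_str_hash
```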

Here is an example of the 'duration filter', where only events that
took longer than 20 msecs are displayed, including absolute timestamps
(with syscalls, pagefaults and scheduler events included):

 $ trace report -t -P -s -d 20
#
# trace events of 'firefox':
# (duration filter: 20.000 msecs)
#
    29.205       run-mozilla.sh/11425 (22.723 ms): => blocked [state: 0x02 cputime: 0.044 ms #PF:
0] stat(filename: 0x1734860, statbuf: 0x7fff313370b0) (fpath: /usr/lib64/firefox-3.6/firefox)
    32.347       run-mozilla.sh/11425 (25.864 ms): stat(filename: 0x1734860, statbuf:
0x7fff313370b0) (fpath: /usr/lib64/firefox-3.6/firefox)  => 0x0
    69.612      mozilla-plugin-/11428 (23.321 ms): => blocked [state: 0x02 cputime: 0.041 ms #PF:
0] stat(filename: 0x8f1ba0, statbuf: 0x7fff06a44a40) (fpath: /usr/lib64/nspluginwrapper/plugi
    74.800      mozilla-plugin-/11428 (28.509 ms): stat(filename: 0x8f1ba0, statbuf:
0x7fff06a44a40) (fpath: /usr/lib64/nspluginwrapper/plugin-config)  => 0x0
   307.225      mozilla-plugin-/11428 (216.985 ms): => unblock [PF: 0]
   307.238      mozilla-plugin-/11428 (217.002 ms): wait4(upid: 0xffffffffffffffff, stat_addr:
0x7fff06a44c7c, options: 0x0, ru: 0x0) => 0x2ca6
   307.459       run-mozilla.sh/11425 (274.198 ms): => unblock [PF: 0]
   307.466       run-mozilla.sh/11425 (274.209 ms): wait4(upid: 0xffffffffffffffff, stat_addr:
0x7fff313372ec, options: 0x0, ru: 0x0) => 0x2ca4
   534.117       run-mozilla.sh/11425 (221.398 ms): => unblock [PF: 0]
   534.128       run-mozilla.sh/11425 (221.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr:
0x7fff313370ec, options: 0x0, ru: 0x0) => 0x2ca7
   556.522       run-mozilla.sh/11425 (21.759 ms): execve(0x173c110, 0x173b800, 0x173acc0,
0x7fff31337040, 0x0, 0x0) (fpath: /usr/lib64/firefox-3.6/run-mozilla.sh)  => 0x0
   597.105       run-mozilla.sh/11435 (23.718 ms):      #PF         64: [
__execve]: => /usr/lib64/firefox-3.6/firefox offset: 0x6d8 page: 0 (W.K)
   597.230       run-mozilla.sh/11435 (30.228 ms): execve(0x1a7c860, 0x1a81560, 0x1a88140,
0x7fffd937b050, 0x0, 0x7f30498921e8) (fpath: /usr/lib64/firefox-3.6/firefox)  => 0x0
   628.920       run-mozilla.sh/11435 (20.541 ms): open(filename: 0x7fff35246ec4, flags: 0x0, mode:
0x1b6) (fpath: /etc/gre.d/gre64.conf)  => 0x3
   767.480       run-mozilla.sh/11435 (29.628 ms):      #PF        579: [
memset]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0x1ce7a0 page: 462 (W.U)
  1179.170       run-mozilla.sh/11435 (26.734 ms): => unblock [PF: 1]
  1179.270       run-mozilla.sh/11435 (26.864 ms):      #PF       2944: [
7fb7a42db4e0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U) ... [unfini
  1179.270       run-mozilla.sh/11435 (26.864 ms): => blocked [state: 0x02 cputime: 0.010 ms #PF:
1] 
  1193.124       run-mozilla.sh/11435 (40.718 ms):      #PF       2944: [
7fb7a42db4e0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U)
  1730.468       run-mozilla.sh/11435 (20.649 ms): access(filename: 0x7fb7a5981040, mode: 0x4)
(fpath: /usr/share/X11/locale/en_US.UTF-8/XLC_LOCALE)  => 0x0
  1984.050       run-mozilla.sh/11425 (1417.131 ms): => unblock [PF: 0]
  1984.070       run-mozilla.sh/11425 (1417.155 ms): wait4(upid: 0xffffffffffffffff, stat_addr:
0x7fffd937b32c, options: 0x0, ru: 0x0) => 0x2cab

Here is an example of looking at a specific task (pid 11435) and its
scheduler events, in a timeframe around an interesting event which we
picked from the duration-filtered output above.

 $ trace report -t -P -s -f 11435 -T 1152.4,1196
 #
 # trace events of 'firefox':
 #
  1152.401 ( 0.002 ms):      #PF       2942: [                  7fb7a3005e20]: =>
/usr/lib64/libssl3.so offset: 0x29e20 page: 41 (R.U)
  1152.404 ( 0.001 ms):      #PF       2943: [                  7fb7a39375d0]: =>
/usr/lib64/xulrunner-1.9.2/libxul.so offset: 0x4dd5d0 page: 1245 (R.U)
  1152.436 ( 0.031 ms):      #PF       2944: [                  7fb7a42db4e0]: =>
/usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U) ... [unfinished]
  1152.436 ( 0.031 ms): => blocked [state: 0x02 cputime: 0.216 ms #PF: 1] 
  1179.170 (26.734 ms): => unblock [PF: 1]
  1179.270 (26.864 ms): => blocked [state: 0x02 cputime: 0.010 ms #PF: 1] 
  1193.120 (13.850 ms): => unblock [PF: 1]
  1193.124 (40.718 ms):      #PF       2944: [                  7fb7a42db4e0]: =>
/usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U)
  1193.187 ( 0.061 ms):      #PF       2945: [                  7fb7a42c270c]: =>
/usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe6870c page: 3688 (R.U) ... [unfinished]
  1193.187 ( 0.061 ms): => blocked [state: 0x02 cputime: 0.067 ms #PF: 1] 

At this point we'd like to ask for feedback from other users of tracing
tools: which workflow components would you like to see in addition to
what the 'trace' tool is offering?

Comments, bug reports, suggestions welcome,

Thanks,

	Thomas, Ingo
---
 arch/x86/mm/fault.c                                |   32 +-
 fs/namei.c                                         |    7 +-
 include/trace/events/kmem.h                        |   39 +
 include/trace/events/sched.h                       |   80 +
 include/trace/events/vfs.h                         |   32 +
 init/Kconfig                                       |    3 +
 kernel/perf_event.c                                |    2 +-
 kernel/sched.c                                     |    2 +
 kernel/trace/trace.c                               |   17 +
 .../{perf-trace-perl.txt => perf-script-perl.txt}  |   28 +-
 ...erf-trace-python.txt => perf-script-python.txt} |   88 +-
 tools/perf/Documentation/perf-script.txt           |  111 ++
 tools/perf/Documentation/perf-trace.txt            |   96 +-
 tools/perf/Makefile                                |   14 +-
 tools/perf/builtin-lock.c                          |    6 +-
 tools/perf/builtin-record.c                        |   91 +-
 tools/perf/builtin-sched.c                         |    6 +-
 tools/perf/builtin-script.c                        |  826 +++++++++
 tools/perf/builtin-trace.c                         | 1839 +++++++++++++-------
 tools/perf/builtin.h                               |    3 +-
 tools/perf/command-list.txt                        |    1 +
 tools/perf/perf.c                                  |   13 +-
 .../perf/scripts/python/Perf-Trace-Util/Context.c  |    2 +-
 tools/perf/syscall-attr.h                          |  303 ++++
 tools/perf/util/event.c                            |    6 +-
 tools/perf/util/header.c                           |   76 +-
 tools/perf/util/header.h                           |    1 +
 tools/perf/util/map.c                              |   26 +-
 tools/perf/util/parse-events.c                     |   18 +-
 tools/perf/util/parse-events.h                     |    1 +
 .../perf/util/scripting-engines/trace-event-perl.c |    6 +-
 .../util/scripting-engines/trace-event-python.c    |    4 +-
 tools/perf/util/session.c                          |   28 +-
 tools/perf/util/session.h                          |    5 +
 34 files changed, 2969 insertions(+), 843 deletions(-)





Announcing a new utility: 'trace'

Posted Nov 16, 2010 22:29 UTC (Tue) by nix (subscriber, #2304) [Link] (18 responses)

> - Arguments are printed in plain hex representation, except for important arguments like filenames, which are resolved after the syscall is recorded. 'trace' wont do full decoding like strace does.
If 'wont' meant 'is not implemented yet', this is fine and indeed expected. If 'wont' means 'will never be implemented', then it is nearly useless for virtually every application I can imagine using strace for. Sorry, arguments matter, and not just filenames. strace is already annoying because there are some arguments (notably structure pointers) it can't drill down into: eliminating even more of them just makes the tool next to useless for diagnostics (at least for any of the diagnostics I, as a userspace programmer and sysadmin, carry out day-to-day).

Announcing a new utility: 'trace'

Posted Nov 16, 2010 22:41 UTC (Tue) by mingo (subscriber, #31122) [Link] (17 responses)

Could you list a few everyday examples of strace argument decoding that you find the most useful?

Thanks,

Ingo

Announcing a new utility: 'trace'

Posted Nov 16, 2010 22:50 UTC (Tue) by walters (subscriber, #7396) [Link]

I haven't actually tried trace yet (it sounds awesome though). One thing that comes to mind that I looked at recently with strace is the arguments to poll().

Announcing a new utility: 'trace'

Posted Nov 16, 2010 23:02 UTC (Tue) by foom (subscriber, #14868) [Link] (9 responses)

Let's take some of your example output:
firefox/28132 ( 0.010 ms): mmap(addr: 0x0, len: 0x1000, prot: 0x3, flags: 0x22, fd: 0xffffffff, off: 0x0) => 0x7fb3f442c000
( 0.001 ms): rt_sigaction(sig: 0x14, act: 0x7fffcc4caa80, oact: 0x7fffcc4ca9e0, sigsetsize: 0x8) => 0x0

That tells me almost nothing.

Compare that to strace's:

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3f442c000
rt_sigaction(SIGTSTP, {SIG_IGN}, {SIG_DFL}, 8) = 0

Sorry, strace wins by a landslide. Decoding the flags to symbolic names is hugely useful. Decoding the struct sigaction is even more useful (since I can't even do that manually by grepping include files -- you've just given me a random pointer, completely useless).

Argument names, not so useful: that's basically just noise. I know the API of mmap, but I sure as heck don't know what the flags' raw integer values are!
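[For what it's worth, the symbolic decoding asked for here is mostly a mechanical table lookup over the flag bits. A hypothetical Python sketch, not part of trace or strace; the constants are the usual x86-64 Linux values:]

```python
# Hypothetical sketch: decode raw mmap() prot/flags values into the
# symbolic names strace prints. Constants are the common x86-64 Linux
# values; a real decoder would extract them from the kernel headers.
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE",
            0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}

def decode_bits(value, table, zero="0"):
    """Turn a bitmask into an strace-style OR-ed symbolic string."""
    names = [name for bit, name in table.items() if value & bit]
    return "|".join(names) if names else zero

print(decode_bits(0x3, PROT_BITS))   # -> PROT_READ|PROT_WRITE
print(decode_bits(0x22, MAP_BITS))   # -> MAP_PRIVATE|MAP_ANONYMOUS
```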

Announcing a new utility: 'trace'

Posted Nov 16, 2010 23:22 UTC (Tue) by tglx (subscriber, #31301) [Link] (4 responses)

We know that, and for one thing this can be improved, but OTOH our main objective at the moment was to provide features which are not in strace (and never will be): the analysis of blocking events, page faults, etc.

This is just the rough start of this tool and we are happy to listen to your feedback!

Thanks,

tglx

Announcing a new utility: 'trace'

Posted Nov 16, 2010 23:48 UTC (Tue) by foom (subscriber, #14868) [Link] (2 responses)

If there's no technical or philosophical impediment to their reimplementation in the future: Awesome!

I'm certainly looking forward to all those other features -- they are sorely needed, and it makes total sense to work on the unique features first. Indeed I've been wishing for a tool which would easily tell me what pages I'm faulting in (and in what order) out of a 5GB mmaped file, in order to help debug performance issues with disk seeks when the file is totally paged out.

I just hope that in time, the existing featureset of strace can also be supported, so that we don't end up with two half-useful tools instead of one awesome tool.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 12:17 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Strongly agreed. strace *is* a crufty old goat that would be nice to replace, and anything that can put ptrace() uses to death is a good thing in my eyes. But we do need argument decoding if we're to use it to figure out what on earth a program is up to, which is, y'know, the point of it. AIX has a syscall trace program that doesn't do any argument decoding (even of strings!), and its main use is as a source of frustration (though of course it doesn't do any of the other things trace does either). Kicking up the debugger is often easier, even without debugging information.

(I know that argument decoding is really boring to get right and tends to involve giant tables of syscalls so needs active maintenance, but unfortunately it's also hard to avoid for a useful tool. But, hey, if you don't actually object to it so it might turn up later, that's fine: we know there's nothing in the implementation that makes it harder for trace than it is for strace. At least you don't need to worry too much about putting new syscalls in fast: the new syscalls will be relatively unused for a long time, so you don't need to rush to add them to the syscall table in the tracing tool.)

Announcing a new utility: 'trace'

Posted Nov 26, 2010 2:33 UTC (Fri) by kabloom (guest, #59417) [Link]

Hey! Since `trace' lives in the kernel tree, this means that all of the syscall decoding code will ALSO live in the kernel tree, and that whoever writes a new syscall can be asked to provide the decoder at the same time.

Announcing a new utility: 'trace'

Posted Nov 20, 2010 17:01 UTC (Sat) by kleptog (subscriber, #1183) [Link]

Anything that brings something like strace but better is good. However, if you do decide to do argument decoding, I strongly suggest looking at the source for strace. It contains scripts for dozens of OSes and architectures for extracting syscall, ioctl and signal information, along with code to print it out.

It's probably a good idea to at least glance at it once in a while to make sure the impedance mismatch doesn't get too great, so that when you do want it it won't be much work.

(The PORTING document of strace suggests that you should be able to create a new event loop and you'll have a basic working version pretty quick.)

Announcing a new utility: 'trace'

Posted Nov 16, 2010 23:27 UTC (Tue) by marineam (guest, #28387) [Link]

The argument names could be useful, just as the name/pid (timing) prefix could be useful. Both could be optional. One thing that I really like about strace is that it defaults to being pretty terse. Pids are only included if tracing children is enabled, timing information is a whole set of options depending on what type of timing you need, etc. That way the simple case of just seeing what system calls are being made isn't clouded by extra information. (Am I in the event loop or stuck on a futex?)

Announcing a new utility: 'trace'

Posted Nov 17, 2010 9:45 UTC (Wed) by rvfh (guest, #31018) [Link] (1 responses)

> firefox/28132 ( 0.010 ms): mmap(addr: 0x0, len: 0x1000, prot: 0x3, flags: 0x22, fd: 0xffffffff, off: 0x0) => 0x7fb3f442c000

> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3f442c000
> rt_sigaction(SIGTSTP, {SIG_IGN}, {SIG_DFL}, 8) = 0

Looks better in trace to me!
addr: 0x0 <=> NULL
len: 0x1000 <=> 4096 (which is 0x1000 in hex)
You can decode the protocol/flags given the value (I know it's more effort), and 0xffffffff is -1

Announcing a new utility: 'trace'

Posted Nov 17, 2010 15:01 UTC (Wed) by mpr22 (subscriber, #60784) [Link]

You can decode the protocol/flags given the value (I know it's more effort)

It's not just more effort, it's something the machine can do better than me.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 10:10 UTC (Wed) by renox (guest, #23785) [Link]

I disagree with your comment about the argument names, which are very useful IMHO:
1) if you have an unfamiliar function, the argument names (if well chosen) can help a lot in grasping what the function does.
2) if you're debugging, it helps catch a very common programming mistake: passing the arguments in the wrong order (which is only caught by the compiler when the argument types are incompatible).

Announcing a new utility: 'trace'

Posted Nov 16, 2010 23:06 UTC (Tue) by marineam (guest, #28387) [Link]

In my usage, the important strings other than filenames that I look at most frequently in strace are the arguments to execve() and read()/write(), so I can narrow down my debugging to things like "Ok, I see that it is receiving all of the data correctly but it somehow truncated it before calling xyz".

The vast majority of the strings aren't important, but that one time an important string isn't decoded, it is annoying and slightly time consuming to switch to a more complex tool such as gdb to answer the otherwise 3-second question of "is that data good or bad?". (I can't remember any more which syscall I hit this with in strace, but the data wasn't in a struct and yet it didn't know it could decode it.)

Decoding strings by default does add noise but strace's default of truncating after 32 characters seems to be a reasonable compromise.

Announcing a new utility: 'trace'

Posted Nov 16, 2010 23:27 UTC (Tue) by ajross (guest, #4563) [Link]

Symbolic ioctl names are a big one for me. These are often infuriatingly hard to grep out of headers.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 14:20 UTC (Wed) by madcoder (guest, #30027) [Link]

Well, I will repeat some of what nix said, but here is what is tremendously useful to me in strace that trace doesn't provide:

* flags demangling (see the mmap example);
* pretty-printing of the standard types, see various strace examples:
rt_sigaction(SIGRTMIN, {0x7fe36e983860, [], SA_RESTORER|SA_SIGINFO, 0x7fe36e98cf60}, NULL, 8) = 0
stat("/usr/bin/test", {st_mode=S_IFREG|0755, st_size=30136, ...}) = 0
* see through buffers (strace -s) for reads and writes; it's *VERY* useful to detect off-by-ones in read(v)s/write(v)s/{send,recv}{,msg} and so forth (I've used it numerous times);
* CMSG decoding (for example for the SCM_CREDENTIALS or SCM_RIGHTS stuff for unix sockets).

Argument names are good though, and useful for the reader for less-used functions (or raw kernel syscalls that the libc usually wraps in better-known interfaces).

I know "trace" can do stuff that strace doesn't, like reporting what was slow and filtering the trace in many ways. But it cannot claim to replace strace until it can do that too. strace is (for me) used more for debugging purposes (what is my $KJASD*(*@!#*&!@ program doing with its syscalls, because I'm too dumb to understand what is happening by reading the code) than for performance reasons (even if I did use it once or twice for that, with post-filtering of strace -tt output; clearly painful, and here "trace" is a clear winner).

Right now trace is a *very interesting* performance-oriented tracing tool, on its way to taking some "market share" from strace, but it is far from a drop-in replacement.
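[As a concrete illustration of the CMSG decoding mentioned above, here is a hypothetical Python sketch, not anything trace or strace ships: pass a file descriptor over a unix socketpair via SCM_RIGHTS, then decode the control message symbolically by hand.]

```python
import array
import os
import socket

# Sketch of SCM_RIGHTS decoding: send an fd over a unix socketpair,
# then parse the received ancillary data the way a tracer with CMSG
# support would render it.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
with open("/dev/null") as f:
    fds = array.array("i", [f.fileno()])
    a.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])

data, ancdata, flags, addr = b.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
for level, ctype, cdata in ancdata:
    if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
        received = array.array("i")
        received.frombytes(cdata)
        print(f"SCM_RIGHTS {{fd={list(received)}}}")
        for fd in received:      # close the duplicated descriptors
            os.close(fd)
a.close()
b.close()
```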

Announcing a new utility: 'trace'

Posted Nov 17, 2010 15:22 UTC (Wed) by cventers (guest, #31465) [Link]

I use strace to debug one of my Perl PAR build processes. The PAR archive support included in the Perl Catalyst web framework's make has a nasty habit of shutting down stderr somewhere in the chain, and so build failures often result in a generic make error.

In those cases I pass -f, -s 50000 and -e trace=read,write and I can see what went wrong.

handy structure/array pretty-printing in strace

Posted Nov 17, 2010 23:13 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (1 responses)

select/poll arguments are quite handy, and so are socket addresses so that you can understand which socket is yours and which are created by the resolver. sigaction also comes to mind.

handy structure/array pretty-printing in strace

Posted Nov 19, 2010 17:11 UTC (Fri) by adamgundy (subscriber, #5418) [Link]

seems like a good case for ripping out the strace syscall decoding logic into a separate library that both projects can share...

Announcing a new utility: 'trace'

Posted Nov 17, 2010 0:39 UTC (Wed) by jdahlin (subscriber, #14990) [Link] (5 responses)

Will the trace utility ever do user space tracing? In particular inter-library function calls, which in my opinion (as a user space developer) are much more interesting than the kernel itself.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 1:12 UTC (Wed) by chantecode (subscriber, #54535) [Link] (4 responses)

You mean bringing ltrace features?

Well, that will be possible, and nice to have, once uprobes is merged upstream.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 12:19 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

Note that ltrace uses LD_AUDIT, which works surprisingly well. Maybe using both to fill in the holes in each other might be worthwhile (or then again perhaps not).

Announcing a new utility: 'trace'

Posted Nov 17, 2010 13:24 UTC (Wed) by jdahlin (subscriber, #14990) [Link] (2 responses)

LD_AUDIT looks like a nice interface, haven't heard about it before.

However, I think you're confusing ltrace with latrace, only the latter uses the LD_AUDIT interface.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 18:16 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Oh, er, yes, I was. I'm not sure I ever encountered ltrace. :)

Announcing a new utility: 'trace'

Posted Nov 17, 2010 19:35 UTC (Wed) by oak (guest, #2786) [Link]

Now that the latest latrace version has fixed the process stack handling crashes and the output can be toggled on & off[1], it's more useful than ltrace.

[1] A downside of latrace-based tracing is that you need to start the process under latrace, since it depends on the dynamic linker (the LD_AUDIT feature), whereas ltrace can attach to an already running process.

Downsides of the ptrace()-based ltrace are that it doesn't work for threaded processes[2] and that it cannot trace library -> library calls, only binary -> library calls. Both are pretty huge downsides.

[2] ltrace toggles the breakpoints to let programs continue, which means that it can miss other threads calling the same functions at the same time. Functracer, by contrast, executes the instruction that was under the breakpoint out-of-line (i.e. it needs arch-specific code to emulate code using relative addressing), so the breakpoint doesn't need to be disabled.

One more problem with ltrace & functracer is that Linux ptrace() is somewhat broken for tracing. Utrace is supposed to fix ptrace() and make it robust, but unfortunately it's still not in mainline.

Comparison to other tools?

Posted Nov 17, 2010 4:12 UTC (Wed) by karim (subscriber, #114) [Link]

I haven't been following the various projects for a while (so I won't even venture naming any by name - I'm not even sure which is active and which isn't anymore), but I'd be curious to know how this differs from the other various tracing efforts that have been around - some of which did/do have user-space usage as a central target.

Karim

Live information

Posted Nov 17, 2010 9:39 UTC (Wed) by iq-0 (subscriber, #36655) [Link] (1 responses)

Looks like a very neat tool, it certainly has promise. Perhaps the best part of it is that it doesn't try to contain itself to tracing one kind of information (however useful that may be).

Full tracing inside an application is still dependent on uprobes, but for many use cases I think it would suffice to be able to read profiling (gprof) data and combine it with the kernel level information. This would really make it a powerful profiling solution. And with uprobes one has the best of both worlds: basic profiling of any program and very detailed profiling when using special profiling builds from the same tool.

But in practice the feature I use strace for most would be the live view on an already running process: the "why does this program appear to be stuck" kind, or "why is this program so extremely busy". And part of these questions is answered by the speed at which the output of strace scrolls by (especially when combined with a few filters to filter out some common event-loop "noise").
Seeing possible burst patterns (and immediately looking up their names by suspending the view) is really very different from trying to interpret timestamps and creating a mental picture of the flow. Sure, it's not very scientific, but often it gives enough information about how to proceed. So live information feeds are (for me at least) one of the main reasons for grabbing a tool like strace.
But I guess something similar will be created for 'trace' like what was done for 'perf' and things will all be good. Great work guys! :-D

Live information

Posted Nov 17, 2010 15:23 UTC (Wed) by Yorick (guest, #19241) [Link]

Live stracing is definitely very useful, and not just in the quantitative way ("the speed the output scrolls by"), but really seeing what the process is doing when I'm pressing this button here, and why it's busy when I'm doing nothing at all.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 15:38 UTC (Wed) by Yorick (guest, #19241) [Link] (8 responses)

I often find myself staring at a line in a very long strace output, wondering where that file descriptor came from. It would be handy to have an option adding this as an annotation, like
read(17, "groo", 4) = 4    # 17 = /etc/frotz, O_RDONLY
write(9, "moo", 3) = 3     # 9 = tcp 192.88.15.36 port 1469
This could probably be done by postprocessing (s)trace output, but it would be handy to have it built in.
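[The postprocessing route is indeed straightforward. A rough hypothetical sketch; the regexes and sample lines are made up for illustration and cover only the simplest open()/read() case, where a real tool would also track socket(), dup2(), pipe(), close(), etc.:]

```python
import re

# Hypothetical post-processor for the fd-annotation idea above: track
# fds as open() calls appear in strace output and annotate later reads.
OPEN_RE = re.compile(r'open\("([^"]+)", ([A-Z_|]+)\) = (\d+)')
READ_RE = re.compile(r'read\((\d+),')

def annotate(lines):
    fds = {}                               # fd number -> "path, flags"
    for line in lines:
        m = OPEN_RE.search(line)
        if m:
            fds[int(m.group(3))] = f"{m.group(1)}, {m.group(2)}"
        m = READ_RE.search(line)
        if m and int(m.group(1)) in fds:
            line += f"    # {m.group(1)} = {fds[int(m.group(1))]}"
        yield line

trace = ['open("/etc/frotz", O_RDONLY) = 17',
         'read(17, "groo", 4) = 4']
for line in annotate(trace):
    print(line)
```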

Announcing a new utility: 'trace'

Posted Nov 17, 2010 16:21 UTC (Wed) by dvhart (guest, #19636) [Link]

That's an excellent idea.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 17:00 UTC (Wed) by hp (guest, #5220) [Link]

I could have used this many many times, nice idea.

Announcing a new utility: 'trace'

Posted Nov 17, 2010 18:24 UTC (Wed) by ilmari (guest, #14175) [Link] (2 responses)

OH! YES! PLEASE! I have lost count of the hours I've spent groveling through traces to find where an fd came from (and that only works if you have a trace of the entire lifetime of the process, i.e. not when attaching to an already-running one with -p).

Announcing a new utility: 'trace'

Posted Nov 17, 2010 19:49 UTC (Wed) by oak (guest, #2786) [Link]

Preferably for unix sockets too, not just files. I mean, files are easy to see if the process is still running: you just check /proc/PID/fd/ (or use lsof), but for unix sockets it's nearly impossible to see which unix socket (file) a given FD is related to unless you also catch the connect().

As this thing is tracing the whole system, not just an individual process like strace does, it might be too slow to get all arguments to all functions. The arguments to functions like open(), etc., that return values used by other system calls are very important though. For example, LTT doesn't store file names for access() calls (one can see them with strace if really necessary), but it will store them for open() calls.

As to the formatting of the output, ltrace is quite nice. It aligns the call return values so that they're easier to see, plus it has a separate option for call time and count summaries. Try for example:

  ltrace -S cat
  ^C
  ltrace -c cat
  ^C

Announcing a new utility: 'trace'

Posted Nov 23, 2010 13:58 UTC (Tue) by stevem (subscriber, #1512) [Link]

<aol>
I've done this *so* many times...

Announcing a new utility: 'trace'

Posted Nov 19, 2010 17:13 UTC (Fri) by adamgundy (subscriber, #5418) [Link] (2 responses)

this is the most annoying thing about using strace.. this one feature would be a massive improvement.

Announcing a new utility: 'trace'

Posted Nov 19, 2010 17:15 UTC (Fri) by adamgundy (subscriber, #5418) [Link] (1 responses)

actually, what would be *really* good would be if trace (or strace) could figure out the FD *even if* it didn't see it opened (ie: you attached to the process later on), combining a bit of lsof functionality with the tracer.

Announcing a new utility: 'trace'

Posted Nov 26, 2010 15:37 UTC (Fri) by i3839 (guest, #31386) [Link]

That's fairly easy, because all you need to do is read the /proc/$PID/fd/$FD symlink. That works great for files. For sockets it works less well, because it seems tricky to figure out which socket belongs to which fd.
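[A minimal sketch of that lookup, assuming a Linux /proc and resolving an fd of our own process for the demonstration. For sockets the symlink target is an opaque "socket:[inode]" that would need matching against /proc/net/*:]

```python
import os

# Resolve an already-open fd of a running process via /proc, the
# lsof-style lookup described above.
def resolve_fd(pid, fd):
    return os.readlink(f"/proc/{pid}/fd/{fd}")

# Demonstrate on our own process with a file we open ourselves:
with open("/dev/null") as f:
    print(resolve_fd(os.getpid(), f.fileno()))  # prints /dev/null
```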


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds