Perfcounters added to the mainline
Posted Jul 3, 2009 23:38 UTC (Fri) by
mingo (subscriber, #31122)
In reply to:
Perfcounters added to the mainline by zlynx
Parent article:
Perfcounters added to the mainline
Correct.
Another thing to note is that perf stat has a '--repeat N' parameter. This option directs perf stat to run the measured command N times. It saves the various counter results, and then emits basic (avg, std-dev) statistics about them.
For example, running the 'hackbench' messaging benchmark 10 times gives:
aldebaran:~> perf stat --repeat 10 ./hackbench 10
Time: 0.121
Time: 0.091
Time: 0.114
Time: 0.094
Time: 0.090
Time: 0.095
Time: 0.094
Time: 0.107
Time: 0.094
Time: 0.095
Performance counter stats for './hackbench 10' (10 runs):

    1259.878957  task-clock-msecs   #     10.597 CPUs   ( +- 1.799% )
          51812  context-switches   #      0.041 M/sec  ( +- 5.103% )
           3519  CPU-migrations     #      0.003 M/sec  ( +- 4.915% )
          17870  page-faults        #      0.014 M/sec  ( +- 0.392% )
     3802645216  cycles             #   3018.262 M/sec  ( +- 1.747% )
     1588586719  instructions       #      0.418 IPC    ( +- 0.837% )
       16885948  cache-references   #     13.403 M/sec  ( +- 1.503% )
        7328059  cache-misses       #      5.816 M/sec  ( +- 1.773% )

    0.118889101  seconds time elapsed   ( +- 3.398% )
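The derived values after each '#' can be checked by hand from the raw counts. The sketch below assumes they are simple ratios of the averaged counters (instructions per cycle, task-clock over elapsed time, cycles per second of task-clock); the comment does not say how perf stat computes them internally, so treat this as a plausibility check rather than a description of the tool:

```python
# Averaged raw values copied from the perf stat output above.
task_clock_msecs = 1259.878957
cycles = 3802645216
instructions = 1588586719
elapsed_seconds = 0.118889101

# Assumed derivations (not confirmed by the source):
ipc = instructions / cycles                                    # instructions per cycle
cpus_utilized = task_clock_msecs / (elapsed_seconds * 1000.0)  # CPU time / wall time
cycles_m_per_sec = cycles / (task_clock_msecs / 1000.0) / 1e6  # per second of task-clock

print(f"{ipc:.3f} IPC")
print(f"{cpus_utilized:.3f} CPUs")
print(f"{cycles_m_per_sec:.3f} M/sec")
```

Under these assumptions the three ratios agree with the 0.418 IPC, 10.597 CPUs and 3018.262 M/sec printed above.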
The '+-' column shows us the statistical properties of the counters. If your system is 'noisy', or if the metric is a fundamentally volatile one (cycles, or cache-misses), the noise level will be higher.
Other metrics such as instructions or branches executed are a lot more stable.
But for any of the metrics, 'perf stat --repeat 10' gives you a good estimate of how reliable that metric is on that particular system.
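The same kind of avg/std-dev summary can be computed by hand for the ten 'Time:' values hackbench printed in the run above. This is only an illustrative sketch: those values are hackbench's own timings, not perf's 'seconds time elapsed', and the comment does not say whether the '+-' figure is a standard deviation or a standard error of the mean, so both are shown:

```python
import math
import statistics

# "Time:" values printed by the ten hackbench runs above, in seconds.
times = [0.121, 0.091, 0.114, 0.094, 0.090, 0.095, 0.094, 0.107, 0.094, 0.095]

mean = statistics.mean(times)
stddev = statistics.stdev(times)          # sample standard deviation (n - 1)
stderr = stddev / math.sqrt(len(times))   # standard error of the mean

print(f"mean    = {mean:.4f} s")
print(f"std-dev = {stddev:.4f} s ({100 * stddev / mean:.2f}% of mean)")
print(f"std-err = {stderr:.4f} s ({100 * stderr / mean:.2f}% of mean)")
```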
Somewhat surprisingly, for this particular workload the noisiest metrics are 'context-switches' and 'CPU-migrations', which measure the number of task switches and the number of cross-CPU task migrations. (These are not PMU metrics but perfcounter metrics offered by the kernel.)
(The reason for the noise here is that hackbench starts and stops a lot of tasks in a bursty way, and any noise in the initial conditions gets magnified by the chance placement of tasks. 100 msecs is not a lot of time to run, so depending on when the scheduler's balancing algorithm kicks in, the placement of tasks is randomized to a certain degree (due to the high overload) and the metric gets spread out.)
The conclusion is that noisy metrics are just as useful as stable metrics, as long as you can measure the noise and as long as you know how to reduce the noise to acceptable levels. Modern CPUs with huge caches and complex heuristics are fundamentally random in their characteristics, so deterministic results can rarely be expected.
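One way to reduce the noise is simply to repeat more. As a rough sketch, assuming the runs are independent so that the error of the average shrinks with the square root of the number of runs, the repeat count needed for a target precision can be estimated as below. The function name `runs_needed` and the example percentages are hypothetical, not something perf provides:

```python
import math

def runs_needed(rel_stddev_pct: float, target_pct: float) -> int:
    """Estimate how many independent runs are needed so that the standard
    error of the mean drops to target_pct, given the per-run relative
    standard deviation rel_stddev_pct (both in percent)."""
    if rel_stddev_pct < 0 or target_pct <= 0:
        raise ValueError("rel_stddev_pct must be >= 0 and target_pct > 0")
    # stderr = stddev / sqrt(n)  =>  n = (stddev / target)^2, and at least 1 run
    return max(1, math.ceil((rel_stddev_pct / target_pct) ** 2))

# Hypothetical example: a metric with 10% per-run noise, aiming for 2%.
print(runs_needed(10.0, 2.0))
```

The resulting count could then be passed as the N in 'perf stat --repeat N'.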