Call-graph / call-chain support
Posted Jul 5, 2009 10:28 UTC (Sun) by
mingo (subscriber, #31122)
In reply to:
Perfcounters added to the mainline by mingo
Parent article:
Perfcounters added to the mainline
btw., another thing worth mentioning about perfcounters is turn-key call-graph support and call-graph visualization:
$ perf record -g -f ./pipe-test-1m
[ perf record: Captured and wrote 8.169 MB perf.data (~356901 samples) ]
$ perf report --sort symbol --callchain fractal,5 | cat
#
# (80245 samples)
#
# Overhead Symbol
# ........ ......
#
4.50% [k] pipe_read
|
--99.00%-- do_sync_read
vfs_read
sys_read
system_call_fastpath
__GI___libc_read
__libc_start_main
4.39% [.] main
4.27% [k] __switch_to
|
| |
| |--50.97%-- __GI___libc_write
| |
| --49.06%-- __GI___libc_read
| __libc_start_main
|
--11.19%-- thread_return
|
|--51.44%-- __GI___libc_write
|
--48.83%-- __GI___libc_read
__libc_start_main
3.75% [k] copy_user_generic_string
|
|--52.11%-- do_sync_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI___libc_read
| __libc_start_main
|
--45.39%-- pipe_write
do_sync_write
vfs_write
sys_write
system_call_fastpath
__GI___libc_write
__libc_start_main
3.36% [k] avc_has_perm_noaudit
|
--96.59%-- avc_has_perm
inode_has_perm
file_has_perm
selinux_file_permission
security_file_permission
rw_verify_area
|
|--51.34%-- vfs_read
| sys_read
| system_call_fastpath
| __GI___libc_read
| __libc_start_main
|
--48.66%-- vfs_write
sys_write
system_call_fastpath
|
--99.53%-- __GI___libc_write
__libc_start_main
3.29% [k] schedule
|
|--50.34%-- sysret_careful
| __GI___libc_write
|
--46.55%-- pipe_wait
pipe_read
do_sync_read
vfs_read
sys_read
system_call_fastpath
__GI___libc_read
__libc_start_main
2.89% [k] switch_mm
|
--97.67%-- schedule
|
|--50.75%-- pipe_wait
| pipe_read
| do_sync_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI___libc_read
| __libc_start_main
|
--49.29%-- sysret_careful
__GI___libc_write
2.85% [.] __GI___libc_write
2.70% [.] __GI___libc_read
2.60% [k] file_has_perm
|
|--93.67%-- selinux_file_permission
| security_file_permission
| rw_verify_area
| |
| |--55.02%-- vfs_write
| | sys_write
| | system_call_fastpath
| | |
| | --99.81%-- __GI___libc_write
| | __libc_start_main
| |
| --44.98%-- vfs_read
| sys_read
| system_call_fastpath
| __GI___libc_read
| __libc_start_main
|
--6.33%-- security_file_permission
rw_verify_area
|
|--52.27%-- vfs_write
| sys_write
| system_call_fastpath
| |
| |--97.10%-- __GI___libc_write
| | __libc_start_main
| |
| --4.35%-- __write_nocancel
|
--47.73%-- vfs_read
sys_read
system_call_fastpath
__GI___libc_read
__libc_start_main
2.15% [k] pipe_write
|
--98.31%-- do_sync_write
vfs_write
sys_write
system_call_fastpath
__GI___libc_write
__libc_start_main
2.05% [k] system_call
|
|--50.64%-- __GI___libc_write
| |
| --49.46%-- __libc_start_main
|
--49.24%-- __GI___libc_read
__libc_start_main
Here we record and output full call-chains (down to and including user-space call-chains) and display the overhead in a tree - detailing the call-path that results in that profile entry - and recursively so. (the '5' is a 5% filter - to skip entries below a 5% (relative-)overhead threshold)
For example this portion:
3.75% [k] copy_user_generic_string
|
|--52.11%-- do_sync_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI___libc_read
| __libc_start_main
|
--45.39%-- pipe_write
do_sync_write
vfs_write
sys_write
system_call_fastpath
__GI___libc_write
__libc_start_main
Tells us that in this workload there's a combined overhead of 3.75% from user-copies (copy_user_generic_string()), and that ~52% overhead of that comes from a user-space read() and 45% comes from a user-space write() call.
With traditional 'flat' profiling output we'd only know that there's 3.75% overhead in copy_user_generic_string() - we would not know where it comes from.
(
Log in to post comments)