From: Ingo Molnar <firstname.lastname@example.org>
Subject: [patch] Latency Tracer, voluntary-preempt-2.6.8-rc4-O6
Date: Fri, 13 Aug 2004 01:51:16 +0200
Cc: Lee Revell <email@example.com>, Felipe Alfaro Solana <firstname.lastname@example.org>, Florian Schmidt <email@example.com>
i've uploaded the latest version of the voluntary-preempt patch:

   http://redhat.com/~mingo/voluntary-preempt/voluntary-pree...

during the past couple of weeks there has been a steady trend towards rarer and harder-to-analyze latencies. The preempt-timing patch was a nice starting point, but it only prints limited info about the beginning and the end of a critical section - often leaving us in the dark about what happened within the critical section. Often the trace only contains generic entry/exit points such as do_poll(), which are not very helpful in determining the precise reason for the latency.

so in -O6 i've implemented a 'latency tracer', which automatically records all kernel functions called during a maximum-latency incident. This typically means thousands of functions per critical section. I've combined this tracer with the preempt-timing approach to produce a pretty powerful tool to find & squash latencies.

there's a new /proc/latency_trace file that holds the current latency trace (belonging to the previous high-latency event). Its format is intended to make it as easy as possible for kernel developers to fix any particular latency source. Audio developers and users can generate such traces and send them along to kernel developers as text files.

Sample use of the latency tracer - e.g. the following incident:

 (default.hotplug/1470): 121 us critical section violates 100 us threshold.
 => started at: <kmap_high+0x2b/0x2d0>
 => ended at:   <kmap_high+0x1a9/0x2d0>
  [<c0105a23>] dump_stack+0x23/0x30
  [<c0140d14>] check_preempt_timing+0x184/0x1e0
  [<c0140e84>] sub_preempt_count+0x54/0x5d
  [<c0152959>] kmap_high+0x1a9/0x2d0
  [<c017655a>] copy_strings+0xea/0x230
  [<c01766db>] copy_strings_kernel+0x3b/0x50
  [<c017840d>] do_execve+0x12d/0x1f0
  [<c0103284>] sys_execve+0x44/0x80
  [<c0104b95>] sysenter_past_esp+0x52/0x71

this doesn't tell us too much about why it took 121 usecs to get from one end of kmap_high() to the other end of kmap_high().
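For anyone post-processing a batch of such kernel-log reports, the incident header above can be parsed mechanically. A minimal sketch - the regex and the `parse_report` helper are mine, not part of the patch - assuming the exact report format shown above:

```python
import re

# Format of a preempt-timing report, as printed in the kernel log (assumed fixed):
#   (default.hotplug/1470): 121 us critical section violates 100 us threshold.
REPORT_RE = re.compile(
    r"\((?P<task>[^)]+)\): (?P<latency>\d+) us critical section "
    r"violates (?P<thresh>\d+) us threshold"
)

def parse_report(text):
    """Extract task name, measured latency and threshold from one report line."""
    m = REPORT_RE.search(text)
    if not m:
        return None
    return {
        "task": m.group("task"),
        "latency_us": int(m.group("latency")),
        "threshold_us": int(m.group("thresh")),
    }

report = "(default.hotplug/1470): 121 us critical section violates 100 us threshold."
print(parse_report(report))
```

This makes it easy to, say, sort a dmesg capture by latency before digging into individual traces.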
Looking at /proc/latency_trace tells us the full story:

 preemption latency trace v1.0
 -----------------------------
  latency: 121 us, entries: 1032 (1032)
  process: default.hotplug/1470, uid: 0
  nice: -10, policy: 0, rt_priority: 0
 =======>
 0.000ms (+0.000ms): page_address (kmap_high)
 0.000ms (+0.000ms): page_slot (page_address)
 0.000ms (+0.000ms): flush_all_zero_pkmaps (kmap_high)
 0.000ms (+0.000ms): set_page_address (flush_all_zero_pkmaps)
 [...]
 0.118ms (+0.000ms): page_slot (set_page_address)
 0.118ms (+0.000ms): check_preempt_timing (sub_preempt_count)

it's the rare but possible call to flush_all_zero_pkmaps() that generates this particular latency.

as can be seen in the example above, the trace contains a header portion and a trace line for every kernel function called. Only function entries are recorded (not function returns), so i've added the parent function to each trace line too, for easier identification of the call sequence.

there's a MAX_TRACE define in kernel/latency.c - set to 4000 currently - which is the maximum number of function calls traced per critical section. Feel free to increase/decrease this. The header portion shows the true number of functions called in a critical section, e.g.:

  latency: 1531 us, entries: 4000 (16098)

tells us that there were 16098 trace entries but only the first 4000 were recorded.

-O6 also adds another timing option besides preempt_thresh: if preempt_thresh is set to 0 then the tracer will automatically track the largest latency seen so far (i.e. the system does a search for the absolute maximum latency). The /proc/sys/kernel/preempt_max_latency control can be used to reset this value, to conduct a new search for a new workload without having to reboot the system.

-O6 also does some SMP improvements: the IRQ threads now listen to the /proc/irq/*/smp_affinity mask and bind themselves to the configured CPU. This means that e.g. the irqbalance daemon will work as expected.
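Since traces can run to thousands of lines, a little tooling helps when triaging them. A sketch of a trace summarizer - the `summarize` helper and its regexes are my own illustration, assuming only the header and per-line formats shown above - that reports the latency, how many entries were dropped past MAX_TRACE, and the single largest per-call delta:

```python
import re

# Per-call trace line, e.g.:  0.118ms (+0.000ms): page_slot (set_page_address)
LINE_RE = re.compile(
    r"(?P<ts>[\d.]+)ms \(\+(?P<delta>[\d.]+)ms\): (?P<func>\S+) \((?P<parent>\S+)\)"
)
# Header line, e.g.:  latency: 1531 us, entries: 4000 (16098)
HDR_RE = re.compile(r"latency: (?P<lat>\d+) us, entries: (?P<rec>\d+) \((?P<tot>\d+)\)")

def summarize(trace_text):
    """Return (latency_us, recorded, dropped, slowest) for one /proc/latency_trace dump.

    `slowest` is a (delta_ms, func, parent) tuple for the recorded call with the
    largest +delta; `dropped` counts entries beyond MAX_TRACE that were not kept.
    """
    hdr = HDR_RE.search(trace_text)
    latency = int(hdr.group("lat"))
    recorded = int(hdr.group("rec"))
    dropped = int(hdr.group("tot")) - recorded
    slowest = None
    for m in LINE_RE.finditer(trace_text):
        delta = float(m.group("delta"))
        if slowest is None or delta > slowest[0]:
            slowest = (delta, m.group("func"), m.group("parent"))
    return latency, recorded, dropped, slowest
```

On the 'latency: 1531 us, entries: 4000 (16098)' header quoted above, this would report 12098 dropped entries - a hint to raise MAX_TRACE before re-running the workload.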
-O6 also fixes and cleans up a number of other aspects of the preempt-timing mechanism.

the latency tracer can be turned on/off via CONFIG_LATENCY_TRACE at compile time. An active tracer means considerable runtime overhead; especially code that does a lot of small function calls will see a performance hit. I'm seeing a ~10% overhead on a 2GHz system, but YMMV.

reports, suggestions welcome,

	Ingo