|From:||Alex Shi <email@example.com>|
|To:||firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org|
|Subject:||[patch v6 0/21] sched: power aware scheduling|
|Date:||Sat, 30 Mar 2013 22:34:47 +0800|
|Cc:||email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org|
This patch set implements and completes the rough power aware scheduling
proposal: https://lkml.org/lkml/2012/8/13/139.

The code is also available in this git tree:
https://github.com/alexshi/power-scheduling.git power-scheduling

The patch set defines a new policy, 'powersaving', that tries to pack tasks
at each sched group level. It can save considerable power when the number of
tasks in the system is no more than the number of logical CPUs (LCPUs).

As mentioned in the power aware scheduling proposal, power aware scheduling
rests on 2 assumptions:
1, race to idle is helpful for power saving
2, fewer active sched groups will reduce cpu power consumption

The first assumption makes the performance policy take over scheduling when
any group is busy.
The second assumption makes power aware scheduling try to pack dispersed
tasks into fewer groups.

Compared to the removed power balance, this power balance has the following
advantages:
1, simpler sys interface
	only 2 sysfs interfaces VS 2 interfaces for each LCPU
2, covers all cpu topologies
	effective on all domain levels VS only works on SMT/MC domains
3, less task migration
	mutually exclusive perf/power LB VS power balancing on top of
	balanced performance
4, system load thrashing considered
	yes VS no
5, transitory tasks considered
	yes VS no

BTW, like sched numa, power aware scheduling is also a kind of cpu locality
oriented scheduling.

Thanks for the comments/suggestions from PeterZ, Linus Torvalds, Andrew Morton,
Ingo, Len Brown, Arjan, Borislav Petkov, PJT, Namhyung Kim, Mike Galbraith,
Greg, Preeti, Morten Rasmussen, Rafael etc.
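The packing idea behind the second assumption can be illustrated with a toy model. This is only a sketch and not the code in the patches: the function names, the fixed group size and the "fill the first group that has a free LCPU" rule are all assumptions made for illustration. The real balancer works per sched domain level and looks at utilization, not just task counts.

```python
def place_tasks(n_tasks, n_groups, lcpus_per_group, policy):
    """Toy placement: return the number of tasks put in each group.

    Hypothetical model, not the kernel algorithm:
    - 'powersaving' fills the first group that still has a free LCPU;
    - 'performance' always picks the least loaded group;
    - once every group is full, powersaving falls back to spreading.
    """
    load = [0] * n_groups
    for _ in range(n_tasks):
        target = None
        if policy == "powersaving":
            for group in range(n_groups):
                if load[group] < lcpus_per_group:
                    target = group
                    break
        if target is None:
            # performance policy, or no group has a free LCPU left
            target = min(range(n_groups), key=lambda g: load[g])
        load[target] += 1
    return load


def active_groups(load):
    """Number of groups that have at least one task."""
    return sum(1 for tasks in load if tasks > 0)


if __name__ == "__main__":
    # Assumed layout: 2 groups (sockets) with 16 LCPUs each.
    for policy in ("powersaving", "performance"):
        load = place_tasks(16, 2, 16, policy)
        print(policy, load, "active groups:", active_groups(load))
```

In this model 16 tasks on 2 groups of 16 LCPUs keep one group idle under 'powersaving' and occupy both groups under 'performance', which is the situation the second assumption is about.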
Since the patch set packs tasks into fewer groups well, I only show some
performance/power testing data here:
=========================================
$ for ((i = 0; i < x; i++)) ; do while true; do :; done & done

On my SNB laptop with 4 cores * HT (the data is average Watts):
	powersaving	performance
x = 8	72.9482		72.6702
x = 4	61.2737		66.7649
x = 2	44.8491		59.0679
x = 1	43.225		43.0638

On SNB EP machine with 2 sockets * 8 cores * HT:
	powersaving	performance
x = 32	393.062		395.134
x = 16	277.438		376.152
x = 8	209.33		272.398
x = 4	199		238.309
x = 2	175.245		210.739
x = 1	174.264		173.603

A benchmark whose task number keeps fluctuating, 'make -j <x> vmlinux',
on my SNB EP 2 sockets machine with 8 cores * HT:
	powersaving		performance
x = 2	189.416 /228 23		193.355 /209 24
x = 4	215.728 /132 35		219.69 /122 37
x = 8	244.31 /75 54		252.709 /68 58
x = 16	299.915 /43 77		259.127 /58 66
x = 32	341.221 /35 83		323.418 /38 81

How to read the data, e.g. 189.416 /228 23:
	189.416: average Watts during compilation
	228: seconds (compile time)
	23: scaled performance/watts = 1000000 / seconds / watts

The kbuild performance value is better with 16/32 threads under powersaving.
That is because lazy power balancing reduces context switches and the CPU
gets more chances to boost under powersaving balancing.

Some performance testing results:
---------------------------------

Tested benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
hackbench, fileio-cfq of sysbench, dbench, aiostress, multithreaded
loopback netperf, on my core2, nhm, wsm, snb platforms.

results:
A, no clear performance change found on the 'performance' policy.
B, specjbb2005 drops 5~7% with either openjdk or jrockit on the
   powersaving policy.
C, hackbench drops 40% with the powersaving policy on snb 4 sockets
   platforms.
Others have no clear change.

===
Changelog:
V6 change:
a, remove 'balance' policy.
b, consider RT task effect in balancing
c, use avg_idle as burst wakeup indicator
d, balance on task utilization in fork/exec/wakeup.
e, no power balancing on SMT domain.
V5 change:
a, change sched_policy to sched_balance_policy
b, split fork/exec/wake power balancing into 3 patches and refresh
   commit logs
c, other minor cleanups

V4 change:
a, fix a few bugs and clean up code according to Morten Rasmussen, Mike
   Galbraith and Namhyung Kim. Thanks!
b, take Morten Rasmussen's suggestion to use different criteria for
   different policies in transitory task packing.
c, shorter latency in power aware scheduling.

V3 change:
a, take nr_running and utilisation into account in periodic power balancing.
b, try packing small exec/wake tasks on a running cpu, not an idle cpu.

V2 change:
a, add lazy power scheduling to deal with kbuild-like benchmarks.

--
Thanks
    Alex

[patch v6 01/21] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
[patch v6 02/21] sched: set initial value of runnable avg for new
[patch v6 03/21] sched: only count runnable avg on cfs_rq's
[patch v6 04/21] sched: add sched balance policies in kernel
[patch v6 05/21] sched: add sysfs interface for sched_balance_policy
[patch v6 06/21] sched: log the cpu utilization at rq
[patch v6 07/21] sched: add new sg/sd_lb_stats fields for incoming
[patch v6 08/21] sched: move sg/sd_lb_stats struct ahead
[patch v6 09/21] sched: scale_rt_power rename and meaning change
[patch v6 10/21] sched: get rq potential maximum utilization
[patch v6 11/21] sched: detect wakeup burst with rq->avg_idle
[patch v6 12/21] sched: add power aware scheduling in fork/exec/wake
[patch v6 13/21] sched: using avg_idle to detect bursty wakeup
[patch v6 14/21] sched: packing transitory tasks in wakeup power
[patch v6 15/21] sched: add power/performance balance allow flag
[patch v6 16/21] sched: pull all tasks from source group
[patch v6 17/21] sched: no balance for prefer_sibling in power
[patch v6 18/21] sched: add new members of sd_lb_stats
[patch v6 19/21] sched: power aware load balance
[patch v6 20/21] sched: lazy power balance
[patch v6 21/21] sched: don't do power balance on share cpu power

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to email@example.com
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
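The "scaled performance/watts" column of the kbuild table earlier in this message can be recomputed from the other two columns. The sketch below applies the stated formula, 1000000 / seconds / watts, to the rows of that table. Truncating the result to an integer is an assumption inferred from the reported values (for example 24.7 is reported as 24), and the function and variable names are made up for this example.

```python
# Rows copied from the kbuild table:
# (x, policy, average watts, compile seconds, reported score)
KBUILD_ROWS = [
    (2, "powersaving", 189.416, 228, 23),
    (2, "performance", 193.355, 209, 24),
    (4, "powersaving", 215.728, 132, 35),
    (4, "performance", 219.69, 122, 37),
    (8, "powersaving", 244.31, 75, 54),
    (8, "performance", 252.709, 68, 58),
    (16, "powersaving", 299.915, 43, 77),
    (16, "performance", 259.127, 58, 66),
    (32, "powersaving", 341.221, 35, 83),
    (32, "performance", 323.418, 38, 81),
]


def scaled_perf_per_watt(seconds, watts):
    """Formula from the message: 1000000 / seconds / watts.

    int() truncation is an assumption that matches the reported column.
    """
    return int(1_000_000 / seconds / watts)


def recompute(rows=KBUILD_ROWS):
    """Return (x, policy, computed score, reported score) for each row."""
    return [
        (x, policy, scaled_perf_per_watt(seconds, watts), reported)
        for x, policy, watts, seconds, reported in rows
    ]


if __name__ == "__main__":
    for x, policy, computed, reported in recompute():
        print(f"x = {x:<2} {policy:<11} computed {computed} reported {reported}")
```

A higher score means more work done per unit of energy, since seconds * watts is the energy spent on one build.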
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds