CFS-devel, performance improvements

From:  Ingo Molnar <>
Subject:  [announce] CFS-devel, performance improvements
Date:  Tue, 11 Sep 2007 22:04:59 +0200
Message-ID:  <>
Cc:  Peter Zijlstra <>, Mike Galbraith <>, Roman Zippel <>

fresh back from the Kernel Summit, Peter Zijlstra and I are pleased to 
announce the latest iteration of the CFS scheduler development tree. Our 
main focus has been on simplifications and performance - and as part of 
that we've also picked up some ideas from Roman Zippel's 'Really Fair 
Scheduler' patch and integrated them into CFS. We'd like to ask people 
to give these patches a good workout, especially with an eye on any 
interactivity regressions.

The combo patch against 2.6.23-rc6 can be picked up from:

The sched-devel.git tree can be pulled from:


There are lots of small performance improvements in the form of a 
finegrained 29-patch series. We have removed a number of features and 
metrics from CFS that we thought might be needed but that ended up 
being superfluous - while keeping the things that worked out fine, like 
sleeper fairness. On 32-bit x86 there's a ~16% speedup (over -rc6) in 
lmbench (lat_ctx -s 0 2) results:

                                  (microseconds, lower is better)
        v2.6.22    2.6.23-rc6(CFS)     v2.6.23-rc6-CFS-devel
           0.70          0.75                0.65
           0.62          0.66                0.63
           0.60          0.72                0.69
           0.62          0.74                0.61
           0.69          0.73                0.53
           0.66          0.73                0.63
           0.63          0.69                0.61
           0.63          0.70                0.64
           0.61          0.76                0.61
           0.69          0.74                0.63
      avg: 0.64          0.72 (+12%)         0.62 (-3%)
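
As a quick cross-check of the averages, the arithmetic can be redone 
from the ten samples in the table. This is only a sketch: the array and 
function names are made up for illustration, and the data is copied by 
hand from the rows above.

```c
#include <stddef.h>

/* lat_ctx -s 0 2 samples copied from the table above (microseconds). */
static const double lat_v2622[] =
	{ 0.70, 0.62, 0.60, 0.62, 0.69, 0.66, 0.63, 0.63, 0.61, 0.69 };
static const double lat_rc6_cfs[] =
	{ 0.75, 0.66, 0.72, 0.74, 0.73, 0.73, 0.69, 0.70, 0.76, 0.74 };
static const double lat_cfs_devel[] =
	{ 0.65, 0.63, 0.69, 0.61, 0.53, 0.63, 0.61, 0.64, 0.61, 0.63 };

#define N_SAMPLES (sizeof(lat_v2622) / sizeof(lat_v2622[0]))

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of 'value' versus 'base', in percent. */
static double percent_change(double value, double base)
{
	return (value - base) / base * 100.0;
}
```

With these helpers, percent_change(mean(lat_rc6_cfs, N_SAMPLES), 
mean(lat_v2622, N_SAMPLES)) reproduces the (+12%) column and the same 
call with lat_cfs_devel reproduces the (-3%) column. The ~16% figure 
corresponds to the ratio of the -rc6 mean to the CFS-devel mean.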

there is a similar speedup on 64-bit x86 as well. We are now a bit 
faster than the O(1) scheduler was under v2.6.22 - even on 32-bit. The 
main speedup comes from the avoidance of divisions (or shifts) in the 
wakeup and context-switch fastpaths.
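
To illustrate what avoiding a division in a fastpath can look like, 
here is a minimal sketch. It is not code from the patch series: the 
function name, the NICE_0_WEIGHT constant and its value of 1024 are 
assumptions chosen for the example. The idea shown is that a task with 
the default weight needs no scaling at all, so the common case can 
return early and only non-default weights pay for the 64-bit division.

```c
#include <stdint.h>

/* Assumed load weight of a default (nice 0) task, for illustration. */
#define NICE_0_WEIGHT 1024u

/*
 * Scale a runtime delta (in nanoseconds) by NICE_0_WEIGHT / weight.
 * 'weight' must be non-zero.
 */
static uint64_t scale_delta(uint64_t delta_ns, uint32_t weight)
{
	/* Fast path: default weight, the factor is 1, no division. */
	if (weight == NICE_0_WEIGHT)
		return delta_ns;

	/* Slow path: only reniced tasks pay for the division. */
	return delta_ns * NICE_0_WEIGHT / weight;
}
```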

there's also a visible reduction in code size:

   text    data     bss     dec     hex filename
  13369     228    2036   15633    3d11 sched.o.before  (UP, nodebug)
  11167     224    1988   13379    3443 sched.o.after   (UP, nodebug)

which obviously helps embedded and is good for performance as well. Even 
on 32-bit we are now within 1% of the size of v2.6.22's sched.o, which 
was:

   text    data     bss     dec     hex filename
   9915      24    3344   13283    33e3 sched.o.v2.6.22

and on SMP the new scheduler is now substantially smaller:

   text    data     bss     dec     hex filename
  24972    4149      24   29145    71d9 sched.o-v2.6.22
  24056    2594      16   26666    682a sched.o-CFS-devel

Changes: besides the many micro-optimizations, one of the changes is 
that se->vruntime (virtual runtime) based scheduling has been introduced 
gradually, step by step - while keeping the wait_runtime metric working 
too, so that the two methods can be compared side by side.

The ->vruntime metric is similar to the ->time_norm metric used by 
Roman's patch (and both are loosely related to the already existing 
sum_exec_runtime metric in CFS). In essence it is the sum of CPU time 
executed by a task, in nanoseconds - weighted up or down by its nice 
level (or kept the same on the default nice 0 level). Besides this basic 
metric our implementation and math differ from RFS. The two approaches 
should be conceptually more comparable from now on.
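
A rough sketch of the weighted-runtime idea described above, under 
stated assumptions: 'struct entity', account_runtime() and pick_next() 
are hypothetical names, the weight of 1024 for nice 0 is an assumption, 
and the linear scan stands in for whatever ordering structure the real 
scheduler uses. It only shows that a heavier (lower nice level) task 
accumulates virtual runtime more slowly for the same CPU time, and that 
picking the entity with the smallest value then favours it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed load weight of a default (nice 0) task, for illustration. */
#define NICE_0_WEIGHT 1024u

/* Hypothetical, heavily reduced stand-in for a scheduling entity. */
struct entity {
	uint32_t weight;	/* load weight derived from the nice level */
	uint64_t vruntime;	/* weighted CPU time, in nanoseconds */
};

/* Charge delta_exec_ns of real CPU time, scaled by the weight. */
static void account_runtime(struct entity *se, uint64_t delta_exec_ns)
{
	se->vruntime += delta_exec_ns * NICE_0_WEIGHT / se->weight;
}

/* Return the entity with the smallest vruntime, or NULL if n == 0. */
static struct entity *pick_next(struct entity *ents, size_t n)
{
	struct entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!best || ents[i].vruntime < best->vruntime)
			best = &ents[i];
	return best;
}
```

For example, after charging 1 ms of CPU time to a weight-1024 entity 
and to a weight-2048 entity, the first has a vruntime of 1000000 ns and 
the second 500000 ns, so pick_next() would return the second.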

We have also picked up two cleanups from RFS (the cfs_rq->curr approach 
and an uninlining optimization) and there's also a cleanup patch from 
Matthias Kaehlcke. We welcome and encourage finegrained patches against 
this patchset. As usual, bugreports, fixes and suggestions are welcome,

	Ingo, Peter

Matthias Kaehlcke (1):
      sched: use list_for_each_entry_safe() in __wake_up_common()

Peter Zijlstra (5):
      sched: simplify SCHED_FEAT_* code
      sched: new task placement for vruntime
      sched: simplify adaptive latency
      sched: clean up new task placement
      sched: add tree based averages

Ingo Molnar (23):
      sched: fix new-task method
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sched: clean up calc_weighted()
      sched: introduce se->vruntime
      sched: move sched_feat() definitions
      sched: optimize vruntime based scheduling
      sched: simplify check_preempt() methods
      sched: wakeup granularity fix
      sched: add se->vruntime debugging
      sched: debug: update exec_clock only when SCHED_DEBUG
      sched: remove wait_runtime limit
      sched: remove wait_runtime fields and features
      sched: x86: allow single-depth wchan output

 arch/i386/Kconfig     |   11 
 include/linux/sched.h |   17 -
 kernel/sched.c        |  196 ++++-------------
 kernel/sched_debug.c  |   86 +++----
 kernel/sched_fair.c   |  557 +++++++++++++-------------------------------------
 kernel/sysctl.c       |   22 -
 6 files changed, 243 insertions(+), 646 deletions(-)