I've been watching the internals of the kernel for a while now. In the older 2.2 kernels, the code was smaller and there were a lot of micro-optimizations. Some of these optimizations made it quite a headache to understand what the code was doing. But this was fine because the kernel was small.
Then the kernel reached adulthood, and became more of a prime-time competitor. It now has lots of rich features and runs on everything from small embedded devices to boxes with more CPUs than the cows on my street. This made the kernel larger and more complex. Now adding micro-optimizations that cause more complexity is looked down upon. We are now focusing on maintainability over getting that last 0.5% out of the box.
I've seen several decisions to choose cleaner, more maintainable code over the split-second-faster code that can only be understood by the goblins of Geek Heaven. But this is a Good Thing (TM). It makes the kernel more maintainable. And judging by any of Corbet's kernel talks, with the amount of churn that Linux takes, it needs all the maintainability it can get.
Posted Oct 21, 2009 15:47 UTC (Wed) by pj (subscriber, #4506)
Agreed, maintainability is pretty key. It would be nice, though, if we could teach the compiler about those kinds of micro-optimizations - then we could be both fast and maintainable!
Sometimes performance is not the main factor
Posted Oct 21, 2009 17:03 UTC (Wed) by felixrabe (guest, #50514)
How about this: have two chunks of source code, one non-optimized, the other optimized. #if 0 out the non-optimized version.
Neat idea: put a SHA1 sum of the non-optimized code next to the (by hand) optimized one, and state (in a compiler-readable way) that the optimized version is equivalent to the code with that hash, and let the compiler check that the non-optimized, "commented-out" version still matches the hash - otherwise issue a warning and compile the non-optimized version instead.
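A rough sketch of the two-chunk layout described above, for illustration only. It deviates from the proposal in one way: instead of `#if 0`-ing out the readable version and hashing it, both versions stay compiled so a test can compare them directly. The hash check itself would have to be an external build step and is not shown. The bit-counting example, the function names, and the `USE_REFERENCE` switch are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Readable reference version: count set bits one at a time. */
static unsigned popcount_ref(uint32_t x)
{
    unsigned n = 0;

    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Hand-optimized version: the classic parallel bit-summing trick. */
static unsigned popcount_opt(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* USE_REFERENCE is a hypothetical build switch, not an existing option. */
#ifdef USE_REFERENCE
#define popcount32 popcount_ref
#else
#define popcount32 popcount_opt
#endif

/* Spot-check that both versions agree on a handful of inputs. */
static void popcount_self_check(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xffu, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(popcount_ref(samples[i]) == popcount_opt(samples[i]));
}
```

Calling `popcount_self_check()` from a test catches the case where someone edits one version and forgets the other, which is roughly what the hash was meant to detect. It only samples a few inputs, so it is not a proof of equivalence.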
Sometimes performance is not the main factor
Posted Oct 21, 2009 17:20 UTC (Wed) by nevets (subscriber, #11875)
And this makes things cleaner and maintainable how?
Note, there are also design decisions that may not be the best for performance. Some of the issues are with the compiler. We break large functions up to make them more readable. This raises hard questions about whether or not to inline functions.
You may think inlining a bunch of functions will help in performance, but then you may increase the size of the code and start taking more instruction cache misses, which cost more than a function call. Some archs handle function calls better than others.
Yes, if a design improves the code by 1 or 2 percent, it may be rational to go with the more complex design. But if the more complex design only saves you a quarter of a percent, and it is much more likely to carry bugs (more complex code is always more buggy), then it is not worth it. But as the kernel grows, each of those 1/4 percent performance regressions adds up.
With things like ftrace and perf now in the kernel, we can start looking deeper at problem areas, and hopefully redesign things in a maintainable way to get some of our performance back.
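To make the inlining trade-off above a bit more concrete, here is a minimal sketch, assuming GCC or Clang for the `noinline` attribute. The function names and the `NOINLINE` macro are invented for illustration and are not taken from the kernel. The small helper is a candidate for inlining; the larger, less common path is kept out of line so it does not bloat every call site.

```c
#include <stdint.h>

/* Small helper split out for readability. `static inline` is only a
 * hint; the compiler still decides whether to inline it. */
static inline uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* The noinline attribute is a GCC/Clang extension; elsewhere the macro
 * expands to nothing and the compiler chooses for itself. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Larger body kept out of line so callers stay small in the
 * instruction cache. */
static NOINLINE uint32_t slow_path_checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    uint32_t i;

    for (i = 0; i < len; i++)
        sum = ((sum << 1) | (sum >> 31)) ^ buf[i]; /* rotate left, mix byte */
    return sum;
}

/* Caller: cheap clamp first, real work only if anything is left. */
static uint32_t process(const uint8_t *buf, uint32_t len, uint32_t limit)
{
    len = clamp_u32(len, 0, limit);
    if (len == 0)
        return 0;
    return slow_path_checksum(buf, len);
}
```

Whether this split actually wins depends on the architecture and the workload, which is the point being made above: it has to be measured, not assumed.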
Sometimes performance is not the main factor
Posted Oct 22, 2009 0:17 UTC (Thu) by nix (subscriber, #2304)
That teaches it one peephole optimization, as a solid lump that can't be
split up or scheduled (as that would probably lose whatever property you
were trying to tell it). A likely loss.
(And, er, also, most optimizations can't be expressed usefully in source
code, and those that can are much too complicated to express by handing it
two hunks and saying 'this one is optimized'. The only property this could
usefully impart is trivial code motion optimizations, and those are
*transformations on graphs*, not a straight replacement of one lump of
source code with another.)
Now in some languages you *can* do something like this: start with
something non-optimized and prove to the compiler that it can transform it
into something optimized, and it can do that henceforward to all similar
constructs it encounters. But the 'something' is not going to be a lump of
C. Even Haskell's not really expressive enough for this sort of thing to
work, and doing it is *not* simple.
Sometimes performance is not the main factor
Posted Oct 21, 2009 18:22 UTC (Wed) by ncm (subscriber, #165)
Is the speed of the code declining more slowly than the speed of processors
is increasing? It seems unlikely. GUI coders have succeeded in maintaining
exponential decrease in speed, occasionally overwhelming Moore's law, but
kernel coders are made of less-stern stuff.
Sometimes performance is not the main factor
Posted Oct 22, 2009 0:18 UTC (Thu) by nix (subscriber, #2304)
The speed of processors is increasing?
Their *parallelism* is increasing, but where I live the speed increases
topped out four or five years back.
Sometimes performance is not the main factor
Posted Oct 22, 2009 11:02 UTC (Thu) by djcapelis (subscriber, #53964)
It's important to remember that just because clock speed isn't increasing at a large rate anymore doesn't mean the micro-architectures are standing still, even at a single-core level. Some of the improvements in the core microarchitecture over say... the P4 microarchitecture are welcome and should yield performance improvements.
That said, I would agree in general that single-core performance is indeed not climbing as much as it used to, but to say it's completely standing still probably goes too far.
Even the clock speeds on consumer chips may still have room to improve. IBM was surprised to see everyone stop at 3GHz, which is why they turned that crank one last time to hit 5GHz in POWER. They mostly did it for marketing purposes, but in terms of long-term trends, they seem to feel that that's the place clock speeds top out. Not where we are now.
That said, none of this refutes your point really, but perhaps provides some different color to it.
Sometimes performance is not the main factor
Posted Oct 22, 2009 16:43 UTC (Thu) by dlang (✭ supporter ✭, #313)
When you talk about how the microarchitecture has improved since the P4, you also need to look a little further back, because the P4 was a huge step backwards in terms of microarchitecture efficiency (they went for clock speed instead).
Sometimes performance is not the main factor
Posted Oct 23, 2009 0:11 UTC (Fri) by djcapelis (subscriber, #53964)
Yes, the P4 was a step backwards in terms of microarchitecture. But as I said, now that the GHz wars are over, people are working on improving microarchitectures and that's where performance improvement is focused.
Even if the only thing they were doing was rolling back the P4's braindead design decisions, this would still be true.
That said, this isn't the only thing they're doing, and the new Core microarchitectures are really quite an improvement over anything Intel's ever done before. I'm not thrilled with the original Core, but Core 2 onward has been an improvement.
I'm not saying it's great; clearly Intel's never going to do cutting-edge radical microarchitectures like Tilera, Sun or even IBM (from most radical to least), but you can't possibly claim their newer microarchitectures aren't an improvement that offers tangible benefits to single-core performance.
Sometimes performance is not the main factor
Posted Oct 22, 2009 3:41 UTC (Thu) by daniels (subscriber, #16193)
... how many cows do you have on your street?
Sometimes performance is not the main factor
Posted Oct 22, 2009 4:05 UTC (Thu) by nevets (subscriber, #11875)
I'll give you a hint. I live on a road called "Farm to Market".
Sometimes performance is not the main factor
Posted Oct 22, 2009 23:12 UTC (Thu) by iabervon (subscriber, #722)
The thing is that micro-optimizations don't always make the overall performance of the system better; they may make the function faster at the cost of leaving the cache slightly less useful. And the benchmark that focuses on that one function can see an improvement while it is causing some other concurrent task to get an additional cache miss in its main loop. Furthermore, getting the last 0.5% out of one architecture (or micro-architecture, or the code produced by one compiler version) may make others much less efficient.
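One way to picture the cache point above is a sketch with two versions of the same small function. The bit-reversal example and all names here are illustrative assumptions, not code from the kernel. The table version may look faster in a benchmark that hammers only this function, but its 256 bytes of data compete with everything else for cache space (four lines, assuming 64-byte cache lines).

```c
#include <stdint.h>

/* Computed version: no data footprint, a few ALU operations per call. */
static uint8_t bitrev8_calc(uint8_t b)
{
    b = (uint8_t)(((b & 0xf0u) >> 4) | ((b & 0x0fu) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xccu) >> 2) | ((b & 0x33u) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xaau) >> 1) | ((b & 0x55u) << 1)); /* swap neighbours */
    return b;
}

/* Table version: one load per call, but 256 bytes that have to live in
 * the data cache alongside whatever the rest of the system is using. */
static uint8_t bitrev8_table[256];

/* Must be called once before bitrev8_lookup() is used. */
static void bitrev8_table_init(void)
{
    unsigned i;

    for (i = 0; i < 256; i++)
        bitrev8_table[i] = bitrev8_calc((uint8_t)i);
}

static uint8_t bitrev8_lookup(uint8_t b)
{
    return bitrev8_table[b];
}
```

Both return the same results; which one is better for the whole system depends on what else is running and on the cache sizes of the machine, neither of which a single-function benchmark captures.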
I think the main trade-off is not micro-optimizations, but rather between code that just does what needs to be done and code that maintains data structures that will inform other code as to what state things are in. The old driver code, for example, set up the device, performed whatever operation was requested, and kept a small amount of state about it; it requested an interrupt from the device, and handled the interrupt when it came. The new code tracks the power management state of the device, what operations are pending that would prevent powering it off, what other hardware needs to be kept turned on to access the device, and so forth. It's a lot of bloat in the form of data that has to be stored and updated; it's also necessary to have any chance of having power management work.
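The difference in bookkeeping described above might look roughly like this. It is a simplified sketch with made-up types and function names (`old_dev`, `new_dev`, `dev_get`, `dev_put`), not the kernel's actual power management interface, and it ignores locking entirely.

```c
#include <stdbool.h>

/* Old style: just enough state to run one request at a time. */
struct old_dev {
    bool busy;
};

/* Hypothetical power states for this sketch. */
enum pm_state { PM_OFF, PM_ACTIVE };

/* New style: extra state so other code can tell what the device is doing. */
struct new_dev {
    enum pm_state state;    /* current power state */
    unsigned pending;       /* operations in flight that block power-off */
    struct new_dev *parent; /* hardware that must stay on to reach this device */
};

/* Start an operation: on the first user, power up the parent chain too. */
static void dev_get(struct new_dev *d)
{
    if (d->pending++ == 0) {
        if (d->parent)
            dev_get(d->parent);
        d->state = PM_ACTIVE;
    }
}

/* Finish an operation: the last user powers the device down and then
 * releases its hold on the parent. */
static void dev_put(struct new_dev *d)
{
    if (d->pending == 0)
        return; /* unbalanced put; ignored in this sketch */
    if (--d->pending == 0) {
        d->state = PM_OFF;
        if (d->parent)
            dev_put(d->parent);
    }
}
```

Every operation now pays for a counter update and possibly a walk up the parent chain, which is the "bloat" in question. In exchange, the question "can this device be powered off right now?" has an answer.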
Sometimes performance is not the main factor
Posted Oct 24, 2009 21:10 UTC (Sat) by NAR (subscriber, #1313)
I think what's also important is that maybe one size doesn't fit all: the optimizations that help large databases (or benchmarks simulating large databases) on one architecture might decrease the performance of, e.g., video encoding on another architecture. So some parts of the kernel should be configurable by some knobs.