Dueling performance monitors

Posted Dec 13, 2008 14:19 UTC (Sat) by saffroy (subscriber, #43999)
In reply to: Dueling performance monitors by giraffedata
Parent article: Dueling performance monitors

Sometimes you want to go beyond algorithmic optimization and find out whether, and how, a particular piece of code could run any faster. There can be many reasons why the current code is not yet optimal: it could be causing frequent cache misses or TLB misses, branch prediction might not be working well enough, and so on. But without the hardware telling you exactly what is happening, all you can do is guess.

It's an easier game when the hardware helps you: that's why modern processors can be programmed to count performance-related events such as cache misses, TLB misses, or branch mispredictions. Processors can also be programmed to generate an interrupt when a counter reaches a certain threshold (i.e. when it "overflows"): at that point, the operating system can record exactly which piece of code was running when the event occurred. Over time, you thus accumulate statistics telling you how often your particular piece of code runs into one of the aforementioned performance problems.
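To make the overflow mechanism concrete, here is a toy simulation in Python. It is only a sketch, not how any real kernel or processor does it: the function name sample_on_overflow, the trace format (one code location per event) and the location names "parse" and "lookup" are all made up for illustration.

```python
from collections import Counter


def sample_on_overflow(event_trace, threshold):
    """Simulate overflow-based sampling of a hardware event counter.

    event_trace: iterable of code locations, one entry per event
                 (e.g. one entry per cache miss), in program order.
    threshold:   number of events after which the counter "overflows".

    Returns a Counter mapping each code location to the number of
    samples (simulated overflow interrupts) attributed to it.
    """
    counter = 0
    samples = Counter()
    for location in event_trace:
        counter += 1                 # the hardware counts the event
        if counter == threshold:     # the counter "overflows"
            samples[location] += 1   # the "interrupt handler" records where we were
            counter = 0              # re-arm the counter for the next period
    return samples


# Hypothetical trace: 10 cache misses in parse(), then 90 in lookup().
trace = ["parse"] * 10 + ["lookup"] * 90
samples = sample_on_overflow(trace, threshold=10)

for location, count in samples.most_common():
    print(f"{location}: {count} samples")
```

With a threshold of 10, only every tenth event costs an interrupt, yet the samples still land on each location roughly in proportion to the events it causes. That is the trade-off the threshold controls: a higher value means less overhead but coarser statistics.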

Given these statistics, you can make a more educated guess as to how your code could be improved (e.g. re-arrange some structure to reduce cache misses).

A classic paper from Digital (1997) explains how they implemented it on their Alpha platforms:

The "batches" mentioned in the article relate to the number of performance registers (counters) that can be read in one shot.


