* Not all Intel chips have full hardware counter support under perf, most notably the Pentium Pro/II/III and the Pentium 4. One might argue that these are old and don't matter, but they are supported by perfmon2. Pentium 4 is the troublesome one, because the performance counters for that architecture don't map well at all to the abstraction chosen by the perf developers.
* I find calling perf a "simple" command line tool to be a bit deceptive. It is quite complicated and not very well documented yet.
* There is still some lingering bitterness about how Ingo took over perfcounters, sort of the same way he took over amd64 and the CFS scheduler, mainly because he is re-inventing everything from scratch and making many mistakes that the other implementations already learned from the hard way. Also, the perf implementation does a lot of things, such as abusing ioctl()s, that the perfmon2 developers were told would not be allowed in the kernel. They wasted a lot of time working around these issues, only to find out it didn't matter in the end.
* I will admit the perf developers can be helpful, especially if you bug them enough. Upon prompting, they've reduced the static aggregate count overhead in the perf tool from a few thousand instructions to near zero.
* I personally think the abstraction they chose, having "common" counters hard-wired in the kernel, is a bad one, because as they are already finding out, every chip and chip revision has different counters with different issues. perfmon2 took the saner route of doing this in user space. Once 2.6.31 is released the ABI is frozen and we're going to be stuck with this.
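
To make the last point a bit more concrete, here is a rough sketch of what keeping the "common" event mapping in user space could look like. This is not perfmon2's or perf's actual code: the table layout, the CPU family keys, and the helper names `raw_config` and `resolve` are all made up for illustration. The event codes are the usual cycles/instructions encodings as I remember them from the vendor manuals, so check them before relying on them. Pentium 4 is left empty on purpose, since its counters don't fit a single event-select/unit-mask pair.

```python
# Hypothetical user-space event table: generic name -> (event select, unit mask).
# The keys and layout are assumptions for illustration, not a real library's format.
RAW_EVENTS = {
    "intel_arch": {"cycles": (0x3C, 0x00), "instructions": (0xC0, 0x00)},
    "intel_p6":   {"cycles": (0x79, 0x00), "instructions": (0xC0, 0x00)},
    "amd_k8":     {"cycles": (0x76, 0x00), "instructions": (0xC0, 0x00)},
    # Pentium 4 needs a different encoding scheme, so no simple entries here.
    "intel_p4":   {},
}


def raw_config(event_select, unit_mask):
    """Pack an x86 event select and unit mask into one raw config value."""
    return (unit_mask << 8) | event_select


def resolve(cpu_family, generic_name):
    """Return the raw config for a generic event, or None if this chip has no mapping."""
    table = RAW_EVENTS[cpu_family]  # KeyError for an unknown CPU family
    entry = table.get(generic_name)
    if entry is None:
        return None
    return raw_config(*entry)


if __name__ == "__main__":
    for cpu in RAW_EVENTS:
        print(cpu, resolve(cpu, "cycles"))
```

The point of a layout like this is that a broken or renamed counter on some chip revision becomes a table fix in a library, not a change to a frozen kernel ABI.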