Good article. One subtle thing to note is that "offcore" and "uncore" events, while they might sound related, are very different in practice.
Offcore Response events let you count cache requests that have gone "offcore", that is, requests the core's own caches could not satisfy. There is a separate MSR (outside the usual range of performance-counter MSRs) that you can program to filter these requests based on exactly where the offcore request was handled.
Much of the debate referenced in this article was sparked because this extra MSR was newly exposed via the "config1" field of the perf_event_attr structure in the perf_event.h header file. (The field is still there; it's just that your program will now fail if you try to program it.)
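To make the config/config1 split concrete, here is a minimal sketch of how a program might fill in perf_event_attr for an offcore response event. It assumes the Nehalem/Westmere encoding of OFFCORE_RESPONSE_0 (event select 0xB7, umask 0x01); check the vendor manual for your CPU. The helper names raw_config() and offcore_attr() are made up for illustration, and the sketch only builds the structure without opening the event.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding of OFFCORE_RESPONSE_0 on Nehalem/Westmere.
 * Other CPU models may differ; consult the vendor manual. */
#define OFFCORE_RSP0_EVENT 0xB7u
#define OFFCORE_RSP0_UMASK 0x01u

/* Hypothetical helper: pack event select (bits 0-7) and umask
 * (bits 8-15) into the low bits of a raw config value. */
static uint64_t raw_config(uint8_t event, uint8_t umask)
{
    return (uint64_t)event | ((uint64_t)umask << 8);
}

/* Hypothetical helper: build an attr for a raw offcore response
 * event. rsp_mask is the request/response filter destined for the
 * extra MSR; its bit layout is model-specific. */
static struct perf_event_attr offcore_attr(uint64_t rsp_mask)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = raw_config(OFFCORE_RSP0_EVENT, OFFCORE_RSP0_UMASK);
    attr.config1 = rsp_mask;   /* the field the debate was about */
    attr.disabled = 1;         /* start stopped; enable explicitly later */
    attr.exclude_kernel = 1;   /* count user space only */
    return attr;
}
```

On a kernel that accepts config1 for this event, the resulting structure would be handed to perf_event_open(2), which has no glibc wrapper and so has to be invoked through syscall(). A call such as offcore_attr(0x4001) uses a placeholder mask; the real value has to come from the bit definitions in the vendor manual.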
Uncore events are handled by a separate PMU that provides chip-wide (rather than per-core) counters. This covers things like the memory controller, interconnect logic, and the last-level cache. To support this properly the kernel will have to be patched in much more intrusive ways: it has to support having multiple PMUs active at once, and it also has to sort out some complicated issues. For example, when an uncore PMU counter overflows, which of the cores gets notified? Which cores is it even relevant to? A tricky problem.
As for Ingo's complaint that you have to pass dense hex values to perf... that's a failing of perf. You can use the very nice "libpfm4" library to refer to these events by the names documented in the various vendor manuals. There have been patches to have perf link against libpfm4, but they don't seem to have gone anywhere.