This blog post is mostly so I don't forget this kind of stuff.
http://software.intel.com/sites/default/files/m/a/d/2/2/e/15529-Intel_VTune_Using.pdf mentions "% execution stalled". This is the Core i7 document rather than the Sandy Bridge document, but bear with me.
The formula is:
However, there's no UOPS_EXECUTED.CORE_STALL_CYCLES in the PMC documentation, nor is it in the Intel SDM chapter on performance counters.
But wait! It kind of is there. There /is/ UOPS_EXECUTED.THREAD, which is "Counts the total number of uops to be executed per thread each cycle." In the same block, it says that to count stall cycles, set CMASK=1, INV=1. Ok, so how does one do that with PMC?
# pmcstat -S UOPS_EXECUTED.THREAD,inv,cmask=1 -T -w 5
Now, it seems to be showing me the ACPI wait and MWAIT functions as the ones collecting the most samples - which is odd, as I didn't think this particular PMC counted while the CPU was in C1 and MWAIT states. I'll chase this up.
For Sandy Bridge it's UOPS_DISPATCHED.THREAD - this counts dispatched micro-operations per-thread each cycle. CMASK=1,INV=1 counts the number of stall cycles.
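Why does CMASK=1,INV=1 give stall cycles? As I understand it, a non-zero CMASK turns the counter into a cycle counter that ticks whenever the per-cycle event count is >= CMASK, and INV flips that comparison to < CMASK. So CMASK=1,INV=1 means "cycles where fewer than one uop was executed", i.e. zero. Here's a rough Python model of that. The count_event helper and the trace are made up for illustration; this is a sketch of the semantics, not how the hardware or hwpmc actually implements it.

```python
def count_event(uops_per_cycle, cmask=0, inv=False):
    """Toy model of one counter fed a trace of per-cycle uop counts."""
    if cmask == 0:
        # No threshold: accumulate the raw event count.
        # (This model simply ignores inv in that case.)
        return sum(uops_per_cycle)
    total = 0
    for uops in uops_per_cycle:
        hit = uops >= cmask   # CMASK: threshold compare, once per cycle
        if inv:
            hit = not hit     # INV: count cycles *below* the threshold
        total += 1 if hit else 0
    return total


# Made-up trace: uops executed in each of 8 consecutive cycles.
trace = [4, 0, 0, 2, 1, 0, 3, 0]

total_uops = count_event(trace)                       # plain event count
active_cycles = count_event(trace, cmask=1)           # cycles with >= 1 uop
stall_cycles = count_event(trace, cmask=1, inv=True)  # cycles with 0 uops

print(total_uops, active_cycles, stall_cycles)  # prints "10 4 4"
```

Active plus stalled cycles add up to the length of the trace, which is what you'd expect if every cycle is one or the other.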