Abstract
This session recaps three topics: the Top-down µarch Analysis (TMA) method, which is widely adopted in performance profiling tools; the microarchitectural challenges faced by out-of-order cores; and the abstraction that allowed the method to be supported across CPU vendors (Intel as well as AMD and ARM). Then, we show how the primary TMA metrics (Frontend Bound, Bad Speculation, Core Bound, Memory Bound and Retiring) can be used to classify popular software optimizations and to direct where they should be applied. The session closes with a use case that was deployed in the code generation of modern compilers. The use case demonstrates how to mitigate an instruction-fetch-bandwidth bottleneck by tuning loop unrolling, which speeds up tight loops on recent wide-issue out-of-order cores.