Affiliation:
1. Uppsala University & University of Edinburgh, Uppsala, Sweden
2. University of Edinburgh, Edinburgh, United Kingdom
Abstract
The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction working sets. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between performance and metadata storage costs. This work introduces Shotgun, a BTB-directed front-end prefetcher powered by a new BTB organization that maintains a logical map of an application's instruction footprint, which enables high-efficacy prefetching at low storage cost. To map active code regions, Shotgun precisely tracks an application's global control flow (e.g., function and trap routine entry points) and summarizes local control flow within each code region. Because the local control flow enjoys high spatial locality, with most functions comprised of a handful of instruction cache blocks, it lends itself to a compact region-based encoding. Meanwhile, the global control flow is naturally captured by the application's unconditional branch working set (calls, returns, traps). Based on these insights, Shotgun devotes the bulk of its BTB capacity to branches responsible for the global control flow and a spatial encoding of their target regions. By effectively capturing a map of the application's instruction footprint in the BTB, Shotgun enables highly effective BTB-directed prefetching. Using a storage budget equivalent to a conventional BTB, Shotgun outperforms the state-of-the-art BTB-directed front-end prefetcher by up to 14% on a set of varied commercial workloads.
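The region-based encoding described in the abstract can be illustrated with a small sketch. This is not the authors' design: the class and function names (`RegionEntry`, `FootprintBTB`), the 64-byte block size, the 8-bit footprint width, and the unbounded dictionary standing in for the BTB are all assumptions made for illustration. The sketch only shows the general idea of attaching a spatial bit vector to each unconditional branch's target so that a BTB hit yields a set of cache blocks to prefetch.

```python
from dataclasses import dataclass

BLOCK_BYTES = 64      # assumed instruction cache block size
FOOTPRINT_BITS = 8    # assumed number of blocks tracked from the target block onward


def block_of(addr: int) -> int:
    """Return the cache block index that contains addr."""
    return addr // BLOCK_BYTES


@dataclass
class RegionEntry:
    """Hypothetical BTB entry for one unconditional branch (call, return, trap)."""
    target: int         # entry point of the target code region
    footprint: int = 0  # bit i set => block (target block + i) was fetched


class FootprintBTB:
    """Toy model: maps branch PCs to target regions; no capacity or replacement."""

    def __init__(self) -> None:
        self.entries: dict[int, RegionEntry] = {}

    def record_branch(self, pc: int, target: int) -> RegionEntry:
        """Allocate (or reuse) the entry for an unconditional branch at pc."""
        entry = self.entries.get(pc)
        if entry is None or entry.target != target:
            entry = RegionEntry(target)
            self.entries[pc] = entry
        return entry

    def record_fetch(self, entry: RegionEntry, addr: int) -> None:
        """Mark the block holding addr if it falls inside the tracked region."""
        offset = block_of(addr) - block_of(entry.target)
        if 0 <= offset < FOOTPRINT_BITS:
            entry.footprint |= 1 << offset

    def prefetch_candidates(self, pc: int) -> list[int]:
        """On a hit, return the block addresses worth prefetching."""
        entry = self.entries.get(pc)
        if entry is None:
            return []
        base = block_of(entry.target)
        blocks = [base]  # the target block itself is always needed
        blocks += [base + i for i in range(1, FOOTPRINT_BITS)
                   if (entry.footprint >> i) & 1]
        return [b * BLOCK_BYTES for b in blocks]


# Usage: a call at 0x1000 jumps to 0x4000; the callee touches three blocks nearby.
btb = FootprintBTB()
entry = btb.record_branch(pc=0x1000, target=0x4000)
for fetched in (0x4000, 0x4040, 0x40C0, 0x5000):  # 0x5000 lies outside the region
    btb.record_fetch(entry, fetched)
candidates = btb.prefetch_candidates(0x1000)
print([hex(a) for a in candidates])
```

The point of the sketch is the storage argument: one target address plus a few footprint bits stands in for every block of a small function, instead of one entry per branch inside it.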
Funder
Engineering and Physical Sciences Research Council
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
9 articles.
1. UDP: Utility-Driven Fetch Directed Instruction Prefetching;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29
2. Enhancing Power Efficiency in Branch Target Buffer Design with a Two-Level Prediction Mechanism;Electronics;2024-03-23
3. HHVM Performance Optimization for Large Scale Web Services;Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering;2023-04-15
4. Thermometer;Proceedings of the 49th Annual International Symposium on Computer Architecture;2022-06-11
5. Morrigan: A Composite Instruction TLB Prefetcher;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17