Affiliation:
1. Intel Corporation, Santa Clara, CA
2. The University of Michigan, Ann Arbor, MI
Abstract
With the continuing technological trend of ever cheaper and larger memory, most data sets in database servers will soon be able to reside in main memory. In this configuration, the performance bottleneck is likely to be the gap between the processing speed of the CPU and the memory access latency. Previous work has shown that database applications have large instruction and data footprints and hence do not use processor caches effectively. In this paper, we propose Call Graph Prefetching (CGP), an instruction prefetching technique that analyzes the call graph of a database system and prefetches instructions from the function that is deemed likely to be called next. CGP capitalizes on the highly predictable function call sequences that are typical of database systems. CGP can be implemented either in software or in hardware. The software-based CGP (CGP_S) uses profile information to build a call graph, and uses the predictable call sequences in the call graph to determine which function to prefetch next. The hardware-based CGP (CGP_H) uses a hardware table, called the Call Graph History Cache (CGHC), to dynamically store sequences of functions invoked during program execution, and uses that stored history when choosing which functions to prefetch. We evaluate the performance of CGP on sets of Wisconsin and TPC-H queries, as well as on CPU-2000 benchmarks. Most CPU-2000 applications incur very few instruction cache (I-cache) misses even without any prefetching, obviating the need for CGP. The database workloads, on the other hand, do suffer a significant number of I-cache misses; CGP_S improves their performance by 23% and CGP_H by 26% over a baseline system that has already been highly tuned for efficient I-cache usage with the OM tool. CGP, with or without OM, reduces the I-cache miss stall time by about 50% relative to the O5+OM baseline, taking us about halfway from an already highly tuned system toward perfect I-cache performance.
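To make the CGHC idea concrete, here is a minimal software sketch of the mechanism the abstract describes: for each function, remember the sequence of callees observed on its previous invocation, and use that stored history to predict (and "prefetch") the next function likely to be called. All names here (`enter`, `call`, `ret`, the `scan`/`lock_table` functions) are illustrative assumptions, not the paper's actual interface; the real CGP_H implements this as a hardware table driven by call and return instructions.

```python
class CallGraphHistoryCache:
    """Toy model of a Call Graph History Cache: predicts each function's
    next callee from the callee sequence seen on its previous invocation."""

    def __init__(self):
        self.history = {}      # func -> callee sequence from its last run
        self.recording = {}    # func -> callee sequence being recorded now
        self.index = {}        # func -> position reached in stored history
        self.prefetches = []   # log of functions whose code was "prefetched"

    def enter(self, func):
        # On entry to func, predict its first callee from stored history.
        self.recording[func] = []
        self.index[func] = 0
        hist = self.history.get(func, [])
        if hist:
            self.prefetches.append(hist[0])

    def call(self, caller, callee):
        # Record the observed call so history can be updated on exit.
        self.recording[caller].append(callee)

    def ret(self, callee, caller):
        # When a callee returns, prefetch the caller's next predicted callee.
        self.index[caller] += 1
        hist = self.history.get(caller, [])
        if self.index[caller] < len(hist):
            self.prefetches.append(hist[self.index[caller]])

    def exit(self, func):
        # On exit, the observed callee sequence becomes the new history.
        self.history[func] = self.recording.pop(func)


def run_scan(cghc):
    # Hypothetical query step calling the same helpers in a fixed order,
    # mimicking the repetitive call sequences typical of database code.
    cghc.enter("scan")
    for helper in ["lock_table", "fetch_page", "unlock_table"]:
        cghc.call("scan", helper)
        cghc.ret(helper, "scan")
    cghc.exit("scan")


cghc = CallGraphHistoryCache()
run_scan(cghc)   # first invocation: no history yet, nothing prefetched
run_scan(cghc)   # second invocation: every callee is predicted in order
print(cghc.prefetches)  # → ['lock_table', 'fetch_page', 'unlock_table']
```

The first invocation only trains the cache; on every repeat of the same call sequence, each helper's code can be requested before the call instruction executes, which is the source of CGP's I-cache miss reduction.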
Publisher
Association for Computing Machinery (ACM)
Cited by
18 articles.
1. Protean: Resource-efficient Instruction Prefetching;Proceedings of the International Symposium on Memory Systems;2023-10-02
2. A Storage-Effective BTB Organization for Servers;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02
3. EIRES: Efficient Integration of Remote Data in Event Stream Processing;Proceedings of the 2021 International Conference on Management of Data;2021-06-09
4. Divide and Conquer Frontend Bottleneck;2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA);2020-05
5. Schedtask;Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture;2017-10-14