Affiliation:
1. Computer Systems Laboratory, Stanford University, CA
Abstract
The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture). These machines are attractive as compute servers because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency. Given the large remote access latencies of these architectures, data locality is potentially the most important performance issue. Using realistic workloads, we study the performance improvements provided by OS-supported dynamic page migration and replication. Analyzing our kernel-based implementation, we provide a detailed breakdown of the costs. We show that sampling of cache misses can be used to reduce cost without compromising performance, and that TLB misses may not be a consistent approximation for cache misses. Finally, our experiments show that dynamic page migration and replication can substantially increase application performance, by as much as 30%, and reduce contention for resources in the NUMA memory system.
Publisher
Association for Computing Machinery (ACM)
Cited by
10 articles.
1. Unveiling the Power of Data Structures: Exploring Applications in Diverse Computing Domains;2023 3rd Asian Conference on Innovation in Technology (ASIANCON);2023-08-25
2. Adapt Burstable Containers to Variable CPU Resources;IEEE Transactions on Computers;2023-03-01
3. Improving the efficiency of sparse matrix class processing by using the SPM-CSR parallel algorithm and OpenMP technology;2022 4th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE);2022-03-17
4. Distance-in-time versus distance-in-space;Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation;2021-06-18
5. Data Parallel Implementation of Belief Propagation in Factor Graphs on Multi-core Platforms;International Journal of Parallel Programming;2013-04-24