Affiliation:
1. Pennsylvania State University, State College, PA, USA
2. University of North Texas, Denton, TX, USA
3. Yonsei University, Seoul, South Korea
4. TOBB University of Economics and Technology, Ankara, Turkey
Abstract
In modern computing systems, the cost of moving data between compute elements and storage elements plays a significant role in shaping the overall performance of both single-threaded and multi-threaded applications. Traditional approaches to reducing this cost include code and data layout reorganizations and various hardware enhancements. More recently, an alternative paradigm, called Near Data Computing (NDC) or Near Data Processing (NDP), has been shown to be effective in reducing data movement costs by moving computation to data instead of following the traditional approach of moving data to computation. Unfortunately, existing NDC proposals require significant hardware modifications and have yet to be widely adopted.
In this paper, we present a software-only (compiler-driven) approach to reducing data movement costs in both single-threaded and multi-threaded applications. Our approach, referred to as Computing with Near Data (CND), is built upon a concept called "recomputation," in which a costly data access is replaced by a few less costly data accesses plus some extra computation, provided that the cumulative cost of the latter is less than that of the costly data access. If implemented carefully, CND can successfully trade data accesses for computation, and given the continuously widening latency gap between the two, doing so can significantly reduce the execution latencies of both sequential and parallel application programs.
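The recomputation criterion described above is essentially a cost comparison. The Python sketch below illustrates that comparison under simplifying assumptions: the latency table `ACCESS_COST`, the `RecomputePlan` structure, and the functions `recompute_cost` and `should_recompute` are all hypothetical, and the per-level latencies are illustrative numbers rather than figures from the paper. It is not the CND compiler pass itself and does not model how a compiler would determine where each operand resides.

```python
from dataclasses import dataclass

# Hypothetical per-access latencies (cycles) for each memory-hierarchy level.
# Illustrative assumptions only, not measurements from the paper.
ACCESS_COST = {"L1": 4, "L2": 12, "LLC": 40, "DRAM": 200}


@dataclass
class RecomputePlan:
    operand_levels: list[str]  # where each operand needed to regenerate the value resides
    num_ops: int               # arithmetic operations needed to regenerate the value
    op_cost: int = 1           # assumed cost (cycles) per arithmetic operation


def recompute_cost(plan: RecomputePlan) -> int:
    """Cumulative cost of the (cheaper) operand loads plus the extra computation."""
    loads = sum(ACCESS_COST[level] for level in plan.operand_levels)
    return loads + plan.num_ops * plan.op_cost


def should_recompute(original_level: str, plan: RecomputePlan) -> bool:
    """Recompute only if it is strictly cheaper than the original costly access."""
    return recompute_cost(plan) < ACCESS_COST[original_level]


# Example: b[i] was produced earlier as b[i] = a[i] + c[i], and b[i] now resides in DRAM.
hot = RecomputePlan(operand_levels=["L1", "L1"], num_ops=1)    # a[i], c[i] still in L1
cold = RecomputePlan(operand_levels=["LLC", "DRAM"], num_ops=1)

print(should_recompute("DRAM", hot))   # True  (4 + 4 + 1 = 9 < 200)
print(should_recompute("DRAM", cold))  # False (40 + 200 + 1 = 241 >= 200)
```

In practice, a compiler would need some way to estimate where operands reside at the point of use rather than knowing it exactly; this sketch simply takes those locations as input.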
We i) quantify the intrinsic recomputability of a set of single-threaded and multi-threaded applications, ii) propose a practical, compiler-driven approach that automatically transforms a given application code fragment into a version that employs recomputation, iii) discuss an optimization strategy that increases recomputability, and iv) compare CND, both qualitatively and quantitatively, against NDC. Our experimental analysis of CND reveals that i) the average recomputability across our benchmarks is 51.1%, ii) our compiler-driven strategy is able to exploit 79.3% of the recomputation opportunities presented by our workloads, and iii) our enhancements significantly increase the value of the recomputability metric. As a result, our compiler-driven approach with the proposed enhancements achieves an average execution time improvement of 40.1%.
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Cited by
5 articles.
1. Data Recomputation for Multithreaded Applications. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2023-10-28.
2. Memory Space Recycling. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2022-02-24.
3. Improving Address Translation in Multi-GPUs via Sharing and Spilling aware TLB Design. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021-10-17.
4. Compiler support for near data computing. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021-02-17.
5. Quantifying Data Locality in Dynamic Parallelism in GPUs. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2018-12-21.