SynchroTrace

Published:2018-04-02 Issue:1 Volume:15 Page:1-26
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Sangaiah Karthik¹, Lui Michael¹, Jagtap Radhika², Diestelhorst Stephan², Nilakantan Siddharth³, More Ankit⁴, Taskin Baris¹, Hempstead Mark⁵

Affiliation:

1. Drexel University, Philadelphia, PA

2. ARM Ltd., Cambridge, UK

3. NVIDIA Corporation

4. Intel Corporation

5. Tufts University, Medford, MA

Abstract

Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reduced simulation time and complexity, portability, and scalability. However, trace-based simulation approaches have difficulty capturing and accurately replaying multithreaded traces due to the inherent nondeterminism in the execution of multithreaded programs. In this work, we present SynchroTrace, a scalable, flexible, and accurate trace-based multithreaded simulation methodology. By recording synchronization events relevant to modern threading libraries (e.g., Pthreads and OpenMP) and dependencies in the traces, independent of the host architecture, the methodology is able to accurately model the nondeterminism of multithreaded programs for different hardware platforms and threading paradigms. By capturing high-level instruction categories, the SynchroTrace average CPI trace Replay timing model offers fast and accurate simulation of many-core in-order CMPs. We perform two case studies to validate the SynchroTrace simulation flow against the gem5 full-system simulator: (1) a constraint-based design space exploration with traditional CMP benchmarks and (2) a thread-scalability study with HPC-representative applications. The results from these case studies show that (1) our trace-based approach with trace filtering achieves a peak speedup of 18.7× over gem5 full-system simulation, with an average speedup of 9.6×, (2) SynchroTrace maintains the thread-scaling accuracy of gem5 and can efficiently scale up to 64 threads, and (3) SynchroTrace can trace on one platform and model any platform in early stages of design.

Funder

National Science Foundation, including CAREER

NSF Graduate Research Fellowship

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3158642

Reference 38 articles.

1. C. Bienia. 2011. Benchmarking Modern Multiprocessors. Ph.D. Dissertation. Princeton University, Princeton, NJ.

2. The gem5 simulator

Cited by 9 articles.

1. Āpta: Fault-tolerant object-granular CXL disaggregated memory for accelerating FaaS;2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN);2023-06

2. Distributed Effect Evaluation Algorithm of Computer English Online Platform based on Hibernate Task-based Data Architecture;2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS);2022-05-25

3. Parallel I/O Evaluation Techniques and Emerging HPC Workloads: A Perspective;2021 IEEE International Conference on Cluster Computing (CLUSTER);2021-09

4. Dvé: Improving DRAM Reliability and Performance On-Demand via Coherent Replication;2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA);2021-06

5. Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review;Lecture Notes in Computer Science;2021