Affiliation:
1. Indian Institute of Technology Dharwad, Karnataka, India
2. Indian Institute of Technology Delhi, India
Abstract
Research in computer architecture is commonly done using software simulators. The simulation speed of such simulators is therefore critical to the rate of progress in research. One of the less commonly used ways to increase the simulation speed is to decompose the benchmark’s execution into contiguous chunks of instructions and simulate these chunks in parallel. Two issues arise from this approach. The first is of correctness, as each chunk (other than the first chunk) starts from an incorrect state. The second is of performance: The decomposition must be done in such a way that the simulation of all chunks finishes at nearly the same time, allowing for maximum speedup. In this article, we study these two aspects and compare three different chunking approaches (two of them are novel) and two warmup approaches (one of them is novel). We demonstrate that average speedups of up to 5.39X can be achieved (while employing eight parallel instances), while constraining the error to 0.2% on average.
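The decomposition idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible policy, equal instruction counts per chunk with a fixed-length warmup window before each chunk. This is a naive baseline for illustration, not one of the article's chunking or warmup approaches, and the names `Chunk`, `split_into_chunks`, and `parallel_speedup` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    warmup_start: int  # first instruction simulated only to warm up state
    start: int         # first instruction whose statistics are recorded
    end: int           # one past the last recorded instruction


def split_into_chunks(total_instructions: int, num_chunks: int, warmup: int) -> List[Chunk]:
    """Split [0, total_instructions) into contiguous, near-equal chunks.

    Assumption: equal instruction counts per chunk and a fixed warmup
    length; the first chunk needs no warmup because it starts from the
    true initial state.
    """
    if num_chunks <= 0 or total_instructions < num_chunks or warmup < 0:
        raise ValueError("invalid chunking parameters")
    base, extra = divmod(total_instructions, num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder
        end = start + size
        chunks.append(Chunk(max(0, start - warmup), start, end))
        start = end
    return chunks


def parallel_speedup(sequential_time: float, chunk_times: List[float]) -> float:
    """Speedup when all chunks run concurrently.

    The parallel run finishes only when the slowest chunk finishes,
    which is why balanced chunk simulation times matter.
    """
    return sequential_time / max(chunk_times)
```

Equal instruction counts do not guarantee equal simulation times, since different program phases simulate at different rates. The speedup function makes the consequence visible: a single slow chunk bounds the whole run.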
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, Modeling and Simulation