Affiliation:
1. School of Computer Science & Informatics, Cardiff University, Cardiff, UK
Abstract
Application performance on graphics processing units (GPUs), in terms of execution speed and memory usage, depends on efficient use of the memory hierarchy. Enhancing data locality in molecular dynamics simulations is expected to lower the cost of data movement across the GPU memory hierarchy. This article analyses the spatial data locality and data reuse characteristics of row-major, Hilbert and Morton orderings, and their impact on the performance of molecular dynamics simulations. A simple cache model is presented, and its results are consistent with the timing results for the particle force computation obtained on NVIDIA GeForce GTX960 and Tesla P100 GPUs. Further analysis of the observed memory use, in terms of cache hits and the number of memory transactions, provides a more detailed explanation of execution behaviour for the different orderings. To the best of our knowledge, this is the first study to investigate memory analysis and data locality issues for molecular dynamics simulations of Lennard-Jones fluids on NVIDIA's Maxwell and Tesla architectures.
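As a rough illustration of the orderings compared in the abstract, the following minimal sketch contrasts a row-major index with a Morton (Z-order) index for cells of a 2D grid. It is not the article's implementation: the function names `part1by1`, `morton2d` and `row_major` are hypothetical, and the 2D, 16-bit setting is an assumption chosen for brevity (a molecular dynamics code would typically work in 3D).

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit follows each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(x: int, y: int) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    return part1by1(x) | (part1by1(y) << 1)


def row_major(x: int, y: int, width: int) -> int:
    """Conventional row-major index for a grid with `width` columns."""
    return y * width + x


if __name__ == "__main__":
    width = 4  # hypothetical 4 x 4 grid of cells
    for y in range(width):
        for x in range(width):
            print((x, y), row_major(x, y, width), morton2d(x, y))
```

In this sketch the four cells of the 2 x 2 block at the origin map to Morton indices 0 to 3, whereas their row-major indices are 0, 1, 4 and 5, which is the kind of spatial locality difference the abstract refers to.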
Subject
Hardware and Architecture, Theoretical Computer Science, Software
Cited by
3 articles.
1. Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays;Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering;2024-05-07
2. Large Screen for 3D Data Visualization Based on RFG-SVM Algorithm;Innovative Computing Vol 1 - Emerging Topics in Artificial Intelligence;2023
3. Efficient 3D Hilbert Curve Encoding and Decoding Algorithms;Chinese Journal of Electronics;2022-03