Author:
You Xin, Yang Hailong, Luan Zhongzhi, Liu Yi, Qian Depei
Publisher:
Springer International Publishing
References: 35 articles.
1. Bez, J.L., Bernart, E.E., dos Santos, F.F., Schnorr, L.M., Navaux, P.O.A.: Performance and energy efficiency analysis of HPC physics simulation applications in a cluster of ARM processors. Concurrency Comput.: Pract. Experience 29(22), e4014 (2017)
2. Blackford, L.S., et al.: ScaLAPACK Users’ Guide. SIAM, Philadelphia (1997)
3. Blackmore, C., Ray, O., Eder, K.: Automatically tuning the GCC compiler to optimize the performance of applications running on the ARM Cortex-M3. CoRR (2017)
4. Bock, N., et al.: The basic matrix library (BML) for quantum chemistry. J. Supercomput. 74(11), 6201–6219 (2018)
5. Chen, D., Fang, J., Chen, S., Xu, C., Wang, Z.: Optimizing sparse matrix-vector multiplications on an ARMv8-based many-core architecture. Int. J. Parallel Program. 1–15 (2018)
Cited by
19 articles.
1. Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30
2. GraphCube: Interconnection Hierarchy-aware Graph Processing;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20
3. TH-Allreduce: Optimizing Small Data Allreduce Operation on Tianhe System;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17
4. A heterogeneous parallel model of unstructured mesh finite element method based on CPU+GPU;Highlights in Science, Engineering and Technology;2023-11-29
5. Optimizing Depthwise Convolutions on ARMv8 Architecture;Parallel and Distributed Computing, Applications and Technologies;2023