1. Demystifying parallel and distributed deep learning: An in-depth concurrency analysis;Ben-Nun;ACM Comput. Surv.,2019
2. Hybrid-molecular-dynamics algorithms for the numerical simulation of quantum chromodynamics;Gottlieb;Phys. Rev. D,1987
3. The design and implementation of FFTW3;Frigo;Proc. IEEE,2005
4. A. Sapio, M. Canini, C.-Y. Ho, J. Nelson, P. Kalnis, C. Kim, A. Krishnamurthy, M. Moshref, D.R.K. Ports, P. Richtárik, Scaling Distributed Machine Learning with In-Network Aggregation, in: Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 21, 2021.
5. An in-network architecture for accelerating shared-memory multiprocessor collectives;Klenk,2020