Affiliation:
1. Harvard University, Cambridge, MA
2. Facebook AI
Abstract
State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in such feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve significant training and inference speedups. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.30× and 1.38×, respectively.
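The settings the abstract compares against are framework- and library-level thread configurations. As a minimal sketch of where these knobs live (assuming TensorFlow 2.x on an Intel CPU with the MKL/oneDNN backend; the specific thread counts below are illustrative assumptions, not the article's recommendations):

    import os

    # Illustrative values only: the right numbers depend on the model and on
    # the machine's physical core count.
    os.environ["OMP_NUM_THREADS"] = "16"    # e.g., one thread per physical core
    os.environ["KMP_BLOCKTIME"] = "1"       # ms a worker spins before sleeping
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads to cores

    import tensorflow as tf

    # TensorFlow's framework-level knobs: intra-op sizes the thread pool used
    # inside a single operator; inter-op bounds how many operators run concurrently.
    tf.config.threading.set_intra_op_parallelism_threads(16)
    tf.config.threading.set_inter_op_parallelism_threads(2)

Note that the OpenMP variables are set before importing TensorFlow, since the MKL runtime reads them when the library is loaded.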
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
11 articles.