Abstract
In recent years, deep learning (DL) models have been applied to problems involving large datasets and complex architectures. Such applications call for methods that train models faster, such as distributed deep learning (DDL). This paper proposes an empirical approach to measuring the speedup that DDL achieves when different parallelism strategies are used on the worker nodes. Local parallelism is an important factor in the design of a time-efficient multi-node architecture, because the overall DDL training time depends on the time required by every node. The impact of the computational resources (CPU and GPU) is also discussed, since GPUs are known to accelerate these computations. Experimental results show that local parallelism affects the global speedup of DDL to a degree that depends on the complexity of the neural model and the size of the dataset. Moreover, our approach achieves a better speedup than Horovod.
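For context on the Horovod baseline named above, the sketch below shows one way such a speedup measurement could be set up: a minimal data-parallel training script using Horovod's Keras API, with a wall-clock timing of the fit call, where speedup for N workers is the single-worker time divided by the N-worker time. This is an illustrative assumption only, not the authors' implementation; the model, dataset, batch size, and epoch count are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation): data-parallel training with
# Horovod's Keras API, timed so that speedup can be computed across worker counts.
import time

import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to one local GPU, if any are available.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Placeholder model and dataset (hypothetical; the paper uses its own setups).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Scale the learning rate with the number of workers and wrap the optimizer so
# that gradients are averaged across all workers via all-reduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# Ensure every worker starts from the same initial weights.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

start = time.perf_counter()
model.fit(x_train, y_train, batch_size=128, epochs=3,
          callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
elapsed = time.perf_counter() - start

# Speedup for N workers = (time with 1 worker) / (time with N workers),
# obtained by running this script once per worker count.
if hvd.rank() == 0:
    print(f"workers={hvd.size()} training_time_s={elapsed:.1f}")
```

Launched with, for example, `horovodrun -np 4 -H host1:2,host2:2 python train.py`, the same script can be rerun with different worker counts to obtain a speedup curve against the single-worker time.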
Funder
Fédération Wallonie-Bruxelles
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
References (32 articles)
1. Sergeev, A. Horovod: Fast and easy distributed deep learning in TensorFlow. arXiv, 2018.
2. Demystifying Parallel and Distributed Deep Learning.
3. Hegde. Parallel and Distributed Deep Learning, 2016.
4. Large-Scale Deep Belief Nets With MapReduce.
Cited by
1 article.