A novel framework to enhance the performance of training distributed deep neural networks

Published: 2023-05-18 Issue: 3 Volume: 27 Page: 753-768
ISSN: 1088-467X
Container-title: Intelligent Data Analysis
Short-container-title: IDA

Author:

Phan Trung¹², Do Phuc¹

Affiliation:

1. Faculty of Information Science and Engineering, University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam

2. Faculty of Information Technology, Hoa Sen University, Ho Chi Minh City, Vietnam

Abstract

There have been many attempts to implement distributed training frameworks for deep neural networks (DNNs). In these attempts, the frameworks were built on Apache Spark. Each framework has its own advantages and disadvantages and needs further improvement. While using Apache Spark to implement distributed training systems, we ran into obstacles that significantly affect both system performance and the way applications are programmed. For this reason, we developed our own distributed training framework, the Distributed Deep Learning Framework (DDLF), which is completely independent of Apache Spark. The proposed framework overcomes these obstacles and is highly scalable. DDLF makes it possible to develop applications that train DNNs in a distributed environment (referred to as distributed training) in a simple, natural, and flexible way. In this paper, we analyze the obstacles encountered when implementing a distributed training system on Apache Spark and present the solutions adopted in DDLF. We also present the features of DDLF and show how to implement a distributed DNN training application on it. In addition, we conduct experiments that train a Convolutional Neural Network (CNN) model on the MNIST and CIFAR-10 datasets in an Apache Spark cluster and a DDLF cluster to demonstrate the flexibility and effectiveness of DDLF.

Publisher

IOS Press

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science
