Evaluating the Effects of Modern Storage Devices on the Efficiency of Parallel Machine Learning Algorithms

Published:2020-06 Issue:03n04 Volume:29 Page:2060008
ISSN:0218-2130
Container-title:International Journal on Artificial Intelligence Tools
language:en
Short-container-title:Int. J. Artif. Intell. Tools

Author:

Akritidis Leonidas¹, Fevgas Athanasios¹, Tsompanopoulou Panagiota¹, Bozanis Panayiotis¹

Affiliation:

1. Data Structuring & Engineering Lab, Department of Electrical and Computer Engineering, University of Thessaly, 37 Glavani & 28th October Str, 382 21 Volos, Greece

Abstract

Big Data analytics is currently one of the fastest-emerging areas of research for both organizations and enterprises. The need to deploy efficient machine learning algorithms over huge amounts of data has led to the development of parallelization frameworks and of specialized libraries (such as Mahout and MLlib) that implement the most important of these algorithms. Moreover, recent advances in storage technology have introduced high-performing devices, broadly known as Solid State Drives (SSDs). Compared to traditional Hard Drives (HDDs), SSDs offer considerably higher performance and lower power consumption. Motivated by these appealing features and the growing need for efficient large-scale data processing, we compare the performance of several machine learning algorithms on MapReduce clusters whose nodes are equipped with HDDs, SSDs, and devices that implement the latest 3D XPoint technology. In particular, we evaluate several dataset preprocessing methods, such as vectorization and dimensionality reduction; two supervised classifiers, Naive Bayes and Linear Regression; and the popular k-Means clustering algorithm. We use an experimental cluster equipped with the three aforementioned storage devices under different configurations, and two large datasets, Wikipedia and HIGGS. The experiments show that the benefits of using SSDs depend on the cluster setup and the nature of the applied algorithms.

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218213020600088

Reference: 12 articles.

1. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

2. Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce

3. Large-Scale Deep Belief Nets With MapReduce

4. Efficient implementation of a multi-dimensional index structure over flash memory storage systems

Cited by 1 article.

1. A Median-based Resilient Distributed Optimization Algorithm Against Byzantine Attack; International Journal on Artificial Intelligence Tools; 2022-09