Research on parallel data processing of data mining platform in the background of cloud computing

Published: 2021-01-01  Issue: 1  Volume: 30  Pages: 479–486
ISSN: 2191-026X
Journal: Journal of Intelligent Systems
Language: English

Authors:

Bu Lingrui¹, Zhang Hui¹, Xing Haiyan¹, Wu Lijun¹

Affiliation:

1. Shandong Labor Vocational and Technical College, Jinan, Shandong, China

Abstract

The efficient processing of large-scale data has considerable practical value. In this study, a data mining platform based on the Hadoop Distributed File System (HDFS) was designed, and the K-means algorithm was improved using the idea of max-min distance. The improved algorithm was then parallelized with MapReduce on the Hadoop platform. Finally, its data processing performance was analyzed on the Iris data set. The results showed that the parallel algorithm correctly clustered more samples than the traditional algorithm; in a single-machine environment, the parallel algorithm took longer to run; on large data sets, the traditional algorithm ran out of memory, whereas the parallel algorithm completed the computation; and the speedup ratio of the parallel algorithm increased as the cluster size and data set size grew, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm for big data processing and contribute to further improving the efficiency of data mining.
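To make the abstract's pipeline concrete, the sketch below simulates, in a single Python process, a max-min distance seeding step followed by MapReduce-style K-means iterations. It assumes (the abstract does not say) that the max-min distance idea is used to choose the initial cluster centers, which is a common way to apply it to K-means. The names `max_min_init`, `map_phase`, `reduce_phase`, and `parallel_kmeans` are hypothetical, and the "splits" only mimic how HDFS blocks would be handed to separate mappers; this is not the Hadoop API and not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_init(points: List[Point], k: int) -> List[Point]:
    """Assumed max-min distance seeding: start from the first point, then
    repeatedly add the point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def map_phase(split: List[Point], centers: List[Point]) -> List[Tuple[int, Tuple[Point, int]]]:
    """Hypothetical mapper: emit (index of nearest center, (point, 1)) per point."""
    out = []
    for p in split:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        out.append((idx, (p, 1)))
    return out


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Point, int]]]) -> Dict[int, Point]:
    """Hypothetical reducer: sum coordinates and counts per key, then average."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx not in sums:
            sums[idx] = [0.0] * len(p)
        for d, x in enumerate(p):
            sums[idx][d] += x
        counts[idx] += n
    return {i: tuple(s / counts[i] for s in sums[i]) for i in sums}


def parallel_kmeans(points: List[Point], k: int, n_splits: int = 2,
                    max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Run K-means with max-min seeding; each iteration is one map + reduce round."""
    centers = max_min_init(points, k)
    # Round-robin partition standing in for data blocks assigned to mappers.
    splits = [points[i::n_splits] for i in range(n_splits)]
    for _ in range(max_iter):
        pairs = [kv for s in splits for kv in map_phase(s, centers)]
        new = reduce_phase(pairs)
        # Keep the old center if a cluster received no points.
        updated = [new.get(i, centers[i]) for i in range(k)]
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.1), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
    print(parallel_kmeans(pts, 2))
```

In a real Hadoop job each iteration would be a separate MapReduce job, with the current centers distributed to every mapper. The speedup ratio mentioned in the abstract is conventionally defined as S_p = T_1 / T_p, the runtime on one node divided by the runtime on p nodes.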

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence, Information Systems, Software

Link

https://www.degruyter.com/document/doi/10.1515/jisys-2020-0113/pdf

References (20 articles)

1. K. Siddique, Z. Akhtar, E. J. Yoon, Y. S. Jeong, D. Dasgupta, and Y. Kim, Apache Hama: An Emerging Bulk Synchronous Parallel Computing Framework for Big Data Applications, IEEE Access PP (2016), 1–1.

2. Y. Lu, B. Cao, C. Rego, and F. Glover, A Tabu Search based clustering algorithm and its parallel implementation on Spark, Appl Soft Comput 63 (2017), 97–109.

3. Y. Zhang, Z. Zhu, H. Cui, X. Dong, and H. Chen, Small files storing and computing optimization in Hadoop parallel rendering, Concurr Comp Pract E 29 (2017), 1269–1274.

4. J. Chen, K. Li, Z. Tang, K. Bilal, S. Yu, C. Weng, and K. Li, A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment, IEEE T Parall Distr 28 (2017), 919–933.

5. Y. Wang, J. Li, and H. H. Wang, Cluster and cloud computing framework for scientific metrology in flow control, Cluster Comput 22 (2019), 1–10.

Cited by 1 article.

1. Composition and Optimization of Higher Education Management System Based on Data Mining Technology, Scientific Programming, 2021-11-08.