Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Published: 2021-12-17
Volume: 2021
Pages: 1-9
ISSN: 1875-919X
Container-title: Scientific Programming
Language: en
Short-container-title: Scientific Programming

Author:

Huimin Yao¹

Affiliation:

1. Management School, University of Bath, Bath, UK

Abstract

With the development of cloud computing and distributed cluster technology, the concept of big data has expanded in both capacity and value, and machine learning has received unprecedented attention in recent years. Because traditional machine learning algorithms are difficult to parallelize effectively, a parallelized support vector machine based on the Spark big data platform is proposed. First, the big data platform is designed with the Lambda architecture, which is divided into three layers: the Batch Layer, the Serving Layer, and the Speed Layer. Second, to improve the training efficiency of support vector machines on large-scale data, a cross-validation merging algorithm is proposed: when two support vector machines are merged, it considers not only the support vectors but also the “special points”, that is, the nonsupport vectors in one subset that violate the training result of the other subset. Third, a parallelized support vector machine based on this cross-validation merging is proposed, and its parallelization process is implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. The experimental results show that the proposed parallelized support vector machine achieves strong performance in speed-up ratio, training time, and prediction accuracy.
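The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a "violation" is tested, so the code below makes two loud assumptions: the sub-models are linear, and a point violates a model when its functional margin under that model is below 1. All names (`margin`, `split_by_margin`, `cross_merge`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[Sequence[float], int]    # (features, label in {-1, +1})
Model = Tuple[Sequence[float], float]  # assumed linear model: (weights w, bias b)


def margin(model: Model, point: Point) -> float:
    """Functional margin y * (w . x + b) of one labelled point."""
    (w, b), (x, y) = model, point
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def split_by_margin(model: Model, data: List[Point], tol: float = 1e-9):
    """Split data into (support vectors, non-support vectors) under `model`.

    Assumption: a point is treated as a support vector when margin <= 1.
    """
    svs = [p for p in data if margin(model, p) <= 1 + tol]
    rest = [p for p in data if margin(model, p) > 1 + tol]
    return svs, rest


def cross_merge(data_a: List[Point], model_a: Model,
                data_b: List[Point], model_b: Model,
                tol: float = 1e-9) -> List[Point]:
    """Build the training set for the merged SVM.

    Keeps the support vectors of both subsets plus the "special points":
    non-support vectors of one subset whose margin under the OTHER
    subset's model is below 1 (the assumed violation test).
    """
    sv_a, rest_a = split_by_margin(model_a, data_a, tol)
    sv_b, rest_b = split_by_margin(model_b, data_b, tol)
    special_a = [p for p in rest_a if margin(model_b, p) < 1 - tol]
    special_b = [p for p in rest_b if margin(model_a, p) < 1 - tol]
    return sv_a + special_a + sv_b + special_b


# Toy example with hand-set models (no training is performed here).
model_a = ((1.0, 0.0), 0.0)
data_a = [((1.0, 0.0), 1), ((-1.0, 0.0), -1),     # on the margin of model_a
          ((3.0, 0.5), 1), ((-3.0, -3.0), -1)]    # non-support vectors
model_b = ((0.0, 1.0), 0.0)
data_b = [((0.0, 1.0), 1), ((0.0, -1.0), -1),     # on the margin of model_b
          ((2.0, 3.0), 1), ((4.0, -3.0), -1)]     # non-support vectors

merged = cross_merge(data_a, model_a, data_b, model_b)
print(len(merged))  # 6 of the 8 points are kept
```

In this toy run, `((3.0, 0.5), 1)` and `((4.0, -3.0), -1)` are kept as special points because each is classified comfortably by its own model but violates the other one. A new SVM would then be trained on `merged`; that retraining step, and how it is distributed on Spark, is left out of the sketch.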

Publisher

Hindawi Limited

Subject

Computer Science Applications, Software

Link

http://downloads.hindawi.com/journals/sp/2021/7998417.pdf

References: 28 articles.

1. Information Security in Big Data: Privacy and Data Mining

2. Energy-efficient dynamic traffic offloading and reconfiguration of networked data centers for big data stream mobile computing: review, challenges, and a case study

3. A survey on indexing techniques for big data: taxonomy and performance evaluation

4. Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination

5. Application of network protocol improvement and image content search in mathematical calculus 3D modeling video analysis; X. Wang; AEJ - Alexandria Engineering Journal, 2021

Cited by 4 articles.

1. Mastering Precision in Pivotal Variables Defining Wine Quality via Incremental Analysis of Baseline Accuracy; IEEE Access; 2024

2. Performance Optimization of Machine Learning Algorithms Based on Spark; Applied Mathematics and Nonlinear Sciences; 2024-01-01

3. Construction of a big data platform for infrastructure in the rural revitalisation strategy; Applied Mathematics and Nonlinear Sciences; 2023-11-08

4. A novel design and application of spatial data management platform for natural resources; Journal of Cleaner Production; 2023-07