Affiliation:
1. Department of Business Information Systems, Assumption University, Samut Prakan 10540, Kingdom of Thailand
Abstract
Instance selection aims to decide which instances of a data set should be retained for use during the learning process. It can improve the generalization of the learning model, shorten the learning process, or allow scaling up to large data sources. This paper presents a parallel distance-based instance selection approach for a feed-forward neural network (FFNN), which can use all available processing power to reduce the data set while obtaining levels of classification accuracy similar to those obtained with the original data set. The algorithm identifies the instances at the decision boundary between consecutive classes of data, which are essential for placing hyperplane decision surfaces, and retains these instances in the reduced data set (subset). Each identified instance, called a prototype, is one of the representatives of the decision boundary of its class, and together the prototypes constitute the shape or distribution model of the data set. No feature or dimension is sacrificed in the reduction process. Regarding reduction capability, the algorithm obtains approximately 85% reduction power on non-overlapping two-class synthetic data sets, 70% reduction power on highly overlapping two-class synthetic data sets, and 77% reduction power on multiclass real-world data sets. Regarding generalization, the reduced data sets obtain levels of classification accuracy similar to those of the original data set on both an FFNN and a support vector machine. Regarding execution time, the speedup of the parallel algorithm over the serial algorithm is proportional to the number of threads the processor can run concurrently.
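The idea of keeping only boundary instances can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give; it is a generic "nearest other-class neighbor" rule, assumed here purely for illustration. The function names `select_boundary_prototypes` and `reduction_rate` are hypothetical, and "reduction power" is assumed to mean the fraction of instances removed. The sketch is serial; since each instance's search is independent of the others, the outer loop is the natural place to parallelize.

```python
import math


def select_boundary_prototypes(points, labels):
    """Illustrative rule (an assumption, not the paper's method):
    keep every instance that is the nearest other-class neighbor
    of at least one instance in the data set."""
    keep = set()
    for p, lp in zip(points, labels):
        best_j, best_d = None, math.inf
        for j, (q, lq) in enumerate(zip(points, labels)):
            if lq == lp:
                continue  # only instances of a different class count
            d = math.dist(p, q)  # Euclidean distance over all features
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            keep.add(best_j)
    return sorted(keep)


def reduction_rate(n_original, n_kept):
    """Fraction of instances removed (assumed meaning of 'reduction power')."""
    return 1.0 - n_kept / n_original


# Two well-separated classes on a line; only the facing ends are boundary points.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (6.0, 0.0), (7.0, 0.0), (8.0, 0.0), (9.0, 0.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

kept = select_boundary_prototypes(points, labels)
print(kept, reduction_rate(len(points), len(kept)))
```

On this toy data the rule keeps the two instances that face each other across the gap between the classes, and every feature of each kept instance is preserved, mirroring the abstract's point that no dimension is sacrificed.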
Subject
Artificial Intelligence, Information Systems, Software