Affiliation:
1. Dept. of Business Administration, University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, 26500 Rio, Patras, Greece
2. UPAIRC, 26500 Rio, Patras, Greece
Abstract
Contrary to much of the research in machine learning, which concentrates on problems with relatively small volumes of data, one of the main challenges for today's data mining systems is handling data that is substantially larger than the main memory available on a single processor. In this paper, we present a distributed technique that combines the partial results of different classifiers supplied, in parallel, with different subsets of the data. It is a two-phase process. First, a number of classifiers are trained, each on a different subset of the data. Then, the trained classifiers are used to construct a new training data set that has exactly the same format as the initial one but is substantially smaller. The new data set is used to train the final classifier through an iterative process guided by a threshold on the size of the data set and the achieved increase in accuracy. We present extensive empirical tests, which demonstrate that the proposed technique significantly reduces time complexity, usually at the expense of lower accuracy, compared to a single classifier supplied with all the data.
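The two-phase process described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the base learner, the rule used to construct the smaller data set, or the exact stopping test, so everything concrete here is an assumption made for illustration: the nearest-centroid learner, the "keep examples whose label matches the majority vote of the partial classifiers" selection rule, the held-out validation split, and all function and parameter names (`two_phase_train`, `n_parts`, `max_fraction`, `min_gain`, `batch`). The partial classifiers are trained sequentially for simplicity, whereas the paper trains them in parallel.

```python
import random
from collections import Counter


def train_centroid(rows):
    """Toy base learner (an assumption): one mean vector per class label."""
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], features)))


def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


def two_phase_train(data, n_parts=4, max_fraction=0.5, min_gain=0.01, batch=20, seed=0):
    rng = random.Random(seed)
    rows = list(data)
    rng.shuffle(rows)
    cut = len(rows) // 5                      # hypothetical 20% validation split
    valid, train = rows[:cut], rows[cut:]

    # Phase 1: one classifier per disjoint subset (sequential here, parallel in the paper).
    parts = [train[i::n_parts] for i in range(n_parts)]
    partial = [train_centroid(part) for part in parts]

    def vote(features):
        # Majority label among the partial classifiers.
        return Counter(predict(m, features) for m in partial).most_common(1)[0][0]

    # Assumed selection rule: keep examples whose label matches the majority vote.
    # The rows keep the (features, label) format of the original data.
    pool = [(f, y) for f, y in train if vote(f) == y] or train

    # Phase 2: grow the reduced set batch by batch. Stop at the size threshold
    # or as soon as the validation accuracy gain drops below min_gain.
    limit = max(batch, int(max_fraction * len(train)))
    reduced, model, best_acc = [], None, 0.0
    for start in range(0, min(limit, len(pool)), batch):
        candidate = pool[:min(start + batch, limit)]
        cand_model = train_centroid(candidate)
        acc = accuracy(cand_model, valid)
        if model is not None and acc - best_acc < min_gain:
            break
        reduced, model, best_acc = candidate, cand_model, acc
    return model, reduced, best_acc
```

Calling `two_phase_train(data)` on a list of `(features, label)` pairs returns the final model, the reduced training set, and the validation accuracy reached. Lowering `max_fraction` or raising `min_gain` makes the loop stop earlier, trading accuracy for training time, which mirrors the trade-off the abstract reports.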
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence
Cited by
4 articles.
1. Big Data Clustering with Kernel k-Means: Resources, Time and Performance;International Journal on Artificial Intelligence Tools;2018-06
2. SVR+RVR: A ROBUST SPARSE KERNEL METHOD FOR REGRESSION;International Journal on Artificial Intelligence Tools;2010-10
3. DISTRIBUTED MINING OF ASSOCIATION RULES BASED ON REDUCING THE SUPPORT THRESHOLD;International Journal on Artificial Intelligence Tools;2008-12
4. ON MERGING CLASSIFICATION RULES;International Journal of Information Technology & Decision Making;2008-09