Affiliation:
1. Ahmadu Bello University
Abstract
Classification in data mining focuses on prediction, a task commonly performed with the classical C4.5 decision tree algorithm. On large datasets, however, the algorithm's computational complexity makes it inefficient in terms of computing time, memory utilization and data complexity. Several studies have sought to address these limitations. One such improvement parallelizes the algorithm using the MapReduce model: the large dataset is split into smaller units that are distributed across multiple computers for parallel processing. Because the algorithm is recursive, however, the computational cost remains high, since a large number of calculations are repeated. This research aims to further reduce computation time by using a memoized MapReduce model that stores the results of previous calculations in a cache. When the same calculation recurs, the cached result is returned, eliminating re-computation.
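The caching idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation and it does not involve a MapReduce cluster; it only shows how memoizing the entropy calculation used by C4.5 avoids recomputing results for class distributions that have already been seen. The function names (`entropy`, `counts_key`, `gain_ratio`) and the toy data layout (a list of dictionaries) are assumptions made for illustration.

```python
from collections import Counter
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def entropy(class_counts):
    """Shannon entropy of a distribution given as a tuple of counts.

    The tuple is hashable, so lru_cache can return a stored result
    whenever the same distribution is seen again.
    """
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c)


def counts_key(labels):
    """Canonical cache key (hypothetical helper): sorted tuple of label counts."""
    return tuple(sorted(Counter(labels).values()))


def gain_ratio(rows, attr, target):
    """C4.5 gain ratio of splitting `rows` on `attr` with respect to `target`."""
    base = entropy(counts_key([r[target] for r in rows]))

    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])

    n = len(rows)
    remainder = sum(len(p) / n * entropy(counts_key(p))
                    for p in partitions.values())
    split_info = entropy(tuple(sorted(len(p) for p in partitions.values())))

    gain = base - remainder
    return gain / split_info if split_info else 0.0
```

Evaluating several candidate attributes on the same rows requests the base entropy repeatedly, and many partitions share the same class distribution; in this sketch those requests are served from the cache, which can be inspected with `entropy.cache_info()`. In a distributed setting the cache would have to be shared or replicated across workers, which this sketch does not attempt.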
Publisher
Research Square Platform LLC