Author:
Ali Rahman, Zada Muhammad Sadiq Hassan, Khatak Asad Masood, Hussain Jamil
Abstract
In practical data mining, a wide range of classification algorithms is employed for prediction tasks. However, selecting the best algorithm is a challenging task for machine learning practitioners and experts, primarily because the characteristics of classification problems (datasets) vary widely and the performance of the algorithms on them is hard to predict. Dataset characteristics are quantified in terms of meta-features, while classifier performance is evaluated using various performance metrics. Empirically assessing classifiers across multiple classification datasets, while considering multiple performance metrics, is computationally expensive and time-consuming, which is a major obstacle to selecting the optimal algorithm. Furthermore, the scarcity of sufficient training data, in terms of both the number of datasets and the feature space described by meta-feature perspectives, adds further complexity to algorithm selection with classical machine learning methods. This research paper presents an integrated framework called eML-CBR that combines edge machine learning (edge-ML) and case-based reasoning methodologies to accurately address the algorithm selection problem. It adapts a multi-level, multi-view case-based reasoning methodology that considers data from diverse feature dimensions and algorithms from multiple performance aspects, and distributes computation between cloud edges and centralized nodes. On the edge, the first-level reasoning employs machine learning methods to recommend a family of classification algorithms, while the second level recommends a list of the top-k algorithms within that family. This list is further refined by an algorithm conflict resolver module. The eML-CBR framework offers a suite of contributions, including integrated algorithm selection, multi-view meta-feature extraction, innovative performance criteria, improved algorithm recommendation, data scarcity mitigation through incremental learning, and an open-source CBR module, reshaping research paradigms. The CBR module, trained on 100 datasets and tested on 52 datasets with 9 decision tree algorithms, achieved 94% accuracy for correct classifier recommendations within the top k=3 algorithms, making it highly suitable for practical classification applications.
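To make the two-level recommendation flow described above concrete, the following Python sketch illustrates how case-based retrieval over dataset meta-features could first vote for an algorithm family and then rank the top-k algorithms within that family. This is a minimal, hypothetical illustration; the function names and case structure are assumptions made for exposition and do not reflect the authors' eML-CBR implementation or its conflict resolver.

```python
# Illustrative sketch of a two-level, CBR-style algorithm recommendation flow.
# The case structure (meta_features, best_family, family_performance) is
# hypothetical and not taken from the eML-CBR implementation.
import numpy as np

def retrieve_similar_cases(query_meta_features, case_base, k=5):
    """Retrieve the k most similar past datasets by Euclidean distance
    over their meta-feature vectors (numpy arrays)."""
    distances = [
        (np.linalg.norm(query_meta_features - case["meta_features"]), case)
        for case in case_base
    ]
    distances.sort(key=lambda pair: pair[0])
    return [case for _, case in distances[:k]]

def recommend(query_meta_features, case_base, top_k=3):
    """Level 1: vote for the best-performing algorithm family among similar
    cases. Level 2: rank algorithms within that family by their average
    recorded performance and return the top-k."""
    similar = retrieve_similar_cases(query_meta_features, case_base)

    # Level 1: family recommendation by majority vote over retrieved cases.
    family_votes = {}
    for case in similar:
        family_votes[case["best_family"]] = family_votes.get(case["best_family"], 0) + 1
    best_family = max(family_votes, key=family_votes.get)

    # Level 2: rank algorithms within the recommended family by mean performance.
    scores = {}
    for case in similar:
        for algo, perf in case["family_performance"].get(best_family, {}).items():
            scores.setdefault(algo, []).append(perf)
    ranking = sorted(scores, key=lambda a: -np.mean(scores[a]))
    return best_family, ranking[:top_k]
```

In such a flow, the conflict resolver described in the abstract would then operate on the returned top-k list before a final recommendation is presented.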
Funder
Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Software