Abstract
Nowadays, in prediction problems, reducing computational time has become a challenging issue that attracts as much attention as increasing the accuracy of existing algorithms. Since existing methods may lack sufficient efficiency and accuracy, we combine machine-learning algorithms with statistical methods to address this problem. Furthermore, we reduce the computational time of the test phase by automatically reducing the number of trees with penalized methods and ensembling the remaining trees. We call this efficient combinatorial method the “ensemble of clustered and penalized random forest” (ECAPRAF). The method consists of four parts. In the first part, k-means clustering identifies homogeneous subsets of the data and assigns them to groups. In the second part, a tree-based algorithm is fitted within each cluster as the predictor model; in this work, random forest is chosen. In the third part, penalized methods reduce the number of random-forest trees by removing high-variance trees from the model, which increases accuracy and decreases computational time in the test phase. In the last part, the remaining trees within each cluster are combined. Results from the simulation and two real datasets, evaluated with the WRMSE criterion, show that the ECAPRAF–EN algorithm outperforms the traditional random forest, reducing WRMSE by approximately 12.75%, 11.82%, 12.93%, and 11.68% while selecting 99, 106, 113, and 118 trees, respectively.
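The tree-pruning idea (the third and fourth parts) can be illustrated with a minimal pure-Python sketch. It assumes that the “EN” in ECAPRAF–EN denotes elastic net, that the forest for one cluster has already been trained (for example, scikit-learn's `RandomForestRegressor` exposes its fitted trees through `estimators_`), and that `P[i][t]` holds tree `t`'s prediction for sample `i`. The helper names, the penalty settings, the no-intercept objective, and the plain averaging of surviving trees are hypothetical illustrative choices, not the authors' implementation.

```python
"""Sketch: elastic-net selection of random-forest trees for one cluster.

Assumptions (hypothetical, not from the paper): P[i][t] is tree t's prediction
for sample i, the objective has no intercept, and survivors are averaged.
"""
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Lasso shrinkage operator S(value, threshold)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_weights(P: List[List[float]], y: List[float], alpha: float = 0.1,
                        l1_ratio: float = 0.5, n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent minimizing
    (1/2n)||y - Pw||^2 + alpha * (l1_ratio*||w||_1 + 0.5*(1 - l1_ratio)*||w||^2)."""
    n, n_trees = len(y), len(P[0])
    w = [0.0] * n_trees
    residual = list(y)  # r = y - P w, and w starts at zero
    for _ in range(n_iter):
        for t in range(n_trees):
            col = [row[t] for row in P]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero tree can never enter the model
            # Correlation of tree t with the residual that excludes tree t itself.
            rho = sum(c * (r + c * w[t]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            if new_w != w[t]:
                delta = new_w - w[t]
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[t] = new_w
    return w


def select_trees(weights: List[float], tol: float = 1e-12) -> List[int]:
    """Trees whose penalized coefficient is nonzero survive the pruning."""
    return [t for t, w in enumerate(weights) if abs(w) > tol]


def ensemble_predict(tree_preds: List[float], selected: List[int]) -> float:
    """Combine surviving trees by a plain average (an assumed combination rule)."""
    return sum(tree_preds[t] for t in selected) / len(selected)


if __name__ == "__main__":
    y = [-3.0, -1.0, 1.0, 3.0]
    good = y[:]                      # a tree that tracks the target exactly
    noisy = [1.0, -1.0, -1.0, 1.0]   # a tree uncorrelated with the target
    constant = [2.0, 2.0, 2.0, 2.0]  # a tree predicting a constant
    P = [[g, a, c] for g, a, c in zip(good, noisy, constant)]
    kept = select_trees(elastic_net_weights(P, y))
    print(kept)  # [0]
    print([ensemble_predict(row, kept) for row in P])  # [-3.0, -1.0, 1.0, 3.0]
```

In the full method this selection would be repeated inside every k-means cluster, so each cluster keeps its own subset of trees; only those surviving trees need to be evaluated at test time, which is where the computational saving comes from.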
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science