Abstract
Random Forest is one of the most effective classification techniques. It is an ensemble technique, typically with decision trees as its classifiers. Each tree votes for an outcome when a new instance is being classified, and a majority vote decides the final output. Two main factors play an essential role in Random Forest performance, namely the diversity among trees in the forest and their number. Higher diversity increases prediction accuracy, whereas a lower number of trees results in faster predictions. This paper aims at optimizing these two factors by using cluster analysis of trees in order to prune correlated trees while keeping outlier trees to maintain diversity. We group the trees into clusters and take only a number of representatives from each cluster, while also keeping some or all of the outliers to preserve diversity. The resulting subset of trees constitutes a random forest of reduced size. We use the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm for clustering. DBSCAN is one of the most commonly used clustering techniques and is robust to outliers, which it explicitly labels as noise. We use DBSCAN to (i) group trees into clusters based on their prediction behaviour and (ii) identify outliers. Each of the clustered and outlier trees brings an element of diversity into the pruned random forest, giving our approach its dual diversity aspect. Our approach achieved up to a 99% pruning level while resulting in similar, or even better, accuracy compared to the original forests on 19 public datasets with varying properties. Our source code is publicly available on GitHub.
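The pruning idea described in the abstract can be illustrated with a minimal sketch, assuming a scikit-learn random forest, a held-out validation set on which each tree's prediction vector is computed, and illustrative DBSCAN settings (eps, min_samples, Hamming distance) that would need tuning per dataset. This is not the authors' implementation, only an approximation of the described pipeline.

```python
# Sketch: prune a random forest by clustering trees on prediction behaviour
# with DBSCAN, keeping one representative per cluster plus all outlier trees.
# Dataset, forest size, and DBSCAN parameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import DBSCAN
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Represent each tree by its predictions on the validation set.
tree_preds = np.array([t.predict(X_val) for t in forest.estimators_])

# Hamming distance = fraction of validation instances two trees disagree on;
# trees that behave almost identically fall into the same cluster,
# trees with unusual behaviour are labelled -1 (noise/outliers).
labels = DBSCAN(eps=0.05, min_samples=3, metric="hamming").fit_predict(tree_preds)

# Keep one representative per cluster and every outlier tree.
keep = [np.where(labels == c)[0][0] for c in set(labels) if c != -1]
keep += list(np.where(labels == -1)[0])
pruned = [forest.estimators_[i] for i in keep]

def majority_vote(trees, X):
    # Simple unweighted majority vote over the retained trees.
    votes = np.array([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

print("full forest accuracy :", accuracy_score(y_te, forest.predict(X_te)))
print("pruned forest (%d trees):" % len(pruned),
      accuracy_score(y_te, majority_vote(pruned, X_te)))
```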
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
2 articles.