Abstract
AbstractWe describe a new simulated annealing algorithm to compute near-optimal oblique splits in the context of decision tree induction. The algorithm can be interpreted as a walk on the cells of a hyperplane arrangement defined by the observations in the training set. The cells of this hyperplane arrangement correspond to subsets of oblique splits that divide the feature space in the same manner and the vertices of this arrangement reveal multiple neighboring solutions. We use a pivoting strategy to iterate over the vertices and to explore this neighborhood. Embedding this neighborhood search in a simulated annealing framework allows to escape local minima and increases the probability of finding global optimal solutions. To overcome the problems related to degeneracy, we rely on a lexicographic pivoting scheme. Our experimental results indicate that our approach is well-suited for inducing small and accurate decision trees and capable of outperforming existing univariate and oblique decision tree induction algorithms. Furthermore, oblique decision trees obtained with this method are competitive with other popular prediction models.
Funder
Technische Universität Clausthal
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics,Statistics, Probability and Uncertainty,Statistics and Probability