Author:
Jonathan Bodine, Dorit S. Hochbaum
Abstract
Decision trees are a widely used method for classification, both alone and as the building blocks of multiple ensemble learning methods. The Max Cut decision tree introduced here involves novel modifications to a standard, baseline variant of a classification decision tree, CART Gini. One modification involves an alternative splitting metric, Max Cut, based on maximizing the distance between all pairs of observations that belong to separate classes and separate sides of the threshold value. The other modification, Node Means PCA, selects the decision feature from a linear combination of the input features constructed using an adjustment to principal component analysis (PCA) applied locally at each node. Our experiments show that this node-based, localized PCA with the Max Cut splitting metric can dramatically improve classification accuracy while also significantly decreasing computational time compared to the CART Gini decision tree. These improvements are most significant for higher-dimensional datasets. For the example dataset CIFAR-100, the modifications enabled a 49% improvement in accuracy relative to CART Gini, while providing a $6.8\times$ speedup compared to the Scikit-Learn implementation of CART Gini. These modifications are expected to dramatically advance the capabilities of decision trees for difficult classification tasks.
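For illustration, the sketch below renders one plausible reading of the two modifications described in the abstract: a brute-force Max Cut split score, and a PCA projection computed from the per-class means at a node. The helper names (`max_cut_objective`, `node_means_pca_direction`, `best_max_cut_split`) are hypothetical, and the O(n²) pairwise scoring is for clarity only; the paper's reported speedups rely on a far more efficient evaluation of the splitting objective than this direct enumeration.

```python
import numpy as np


def max_cut_objective(z, y, threshold):
    """Sum |z_i - z_j| over all pairs (i, j) in different classes and on
    opposite sides of the threshold. A direct, O(n^2) reading of the Max Cut
    splitting metric; illustrative only, not the paper's efficient method."""
    left = z <= threshold
    total = 0.0
    for i in np.where(left)[0]:
        for j in np.where(~left)[0]:
            if y[i] != y[j]:
                total += abs(z[i] - z[j])
    return total


def node_means_pca_direction(X, y):
    """First principal direction of the per-class mean vectors at this node,
    one plausible reading of 'Node Means PCA' (hypothetical helper)."""
    means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    centered = means - means.mean(axis=0)
    # First right singular vector = leading PCA direction of the class means.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def best_max_cut_split(X, y):
    """Project the node's data onto the Node Means PCA direction, then scan
    candidate thresholds (midpoints of sorted projections) for the Max Cut
    optimum."""
    direction = node_means_pca_direction(X, y)
    z = X @ direction
    zs = np.sort(z)
    candidates = (zs[:-1] + zs[1:]) / 2.0
    scores = [max_cut_objective(z, y, t) for t in candidates]
    return direction, candidates[int(np.argmax(scores))]


# Tiny usage example on two synthetic Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(2.0, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
direction, threshold = best_max_cut_split(X, y)
print(threshold)
```

In this toy example the leading class-means direction separates the two Gaussian clusters, and the returned threshold splits them along that axis; a full tree would recurse on each side of the split.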
Funder
National Science Foundation
Publisher
Springer Science and Business Media LLC
Cited by
2 articles.