Contribution to Decision Tree Induction with Python: A Review

Published: 2021-01-20
Container-title: Data Mining - Methods, Applications and Systems

Author:

Lamrini Bouchra

Abstract

Among learning algorithms, decision tree induction is one of the most popular and easiest to understand. The popularity of this method rests on three characteristics: interpretability, efficiency, and flexibility. Decision trees can be used for both classification and regression problems. Automatic learning of a decision tree is characterised by the fact that it uses logic and mathematics to generate rules instead of selecting them based on intuition and subjectivity. In this review, we present the essential steps needed to understand the fundamental concepts and mathematics behind decision trees, from training to building. We study the criteria and pruning algorithms that have been proposed to control complexity and optimize decision tree performance. We discuss several works and tools in order to analyze variance reduction techniques, which do not improve or change the representation bias of decision trees. We use the Pima Indians Diabetes dataset to cover the essential questions needed to understand the pruning process. The paper’s original contribution is an up-to-date overview fully focused on implemented algorithms for building and optimizing decision trees. This supports future developments of decision tree induction.

Publisher

IntechOpen

Link

http://www.intechopen.com/download/pdf/72646

References: 36 articles.

1. Morgan J, Sonquist J. Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association. 1963;58(2):415-435

2. Morgan J, Messenger R. THAID-A Sequential Analysis Program for the Analysis of Nominal Scale Dependent Variables. Ann Arbor: Survey Research Center, Institute for Social Research, University of Michigan; 1973

3. Kass G. An exploratory technique for investigating large quantities of categorical data. Applied Statistics. 1980;29(2):119-127

4. Breiman L, Friedman J, Stone C, Olshen R. Classification and Regression Trees. Taylor & Francis; 1984. Available from: https://books.google.fr/books?id=JwQx-WOmSyQC

5. Hunt E, Marin J, Stone P. Experiments in Induction. New York, NY, USA: Academic Press; 1966. Available from: http://www.univ-tebessa.dz/fichiers/mosta/544f77fe0cf29473161c8f87.pdf

Cited by 2 articles.

1. New Classification Method for Independent Data Sources Using Pawlak Conflict Model and Decision Trees;Entropy;2022-11-04

2. State-of-the-art review on advancements of data mining in structural health monitoring;Measurement;2022-04