Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy

Published: 2017-05-25 Issue: 6 Volume: 19 Page: 247
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

Abellán Joaquín, Castellano Javier

Abstract

Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method has a high dependence on the relationships between the variables. The Info-Gain (IG) measure, which is based on general entropy, can be used as a quick variable selection method. This measure ranks the importance of the attribute variables on a variable under study via the information obtained from a dataset. The main drawback is that it is always non-negative and it requires setting the information threshold to select the set of most important variables for each dataset. We introduce here a new quick variable selection method that generalizes the method based on the Info-Gain measure. It uses imprecise probabilities and the maximum entropy measure to select the most informative variables without setting a threshold. This new variable selection method, combined with the Naive Bayes classifier, improves the original method and provides a valuable tool for handling datasets with a very large number of features and a huge amount of data, where more complex methods are not computationally feasible.
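
As a rough illustration of the ranking idea described in the abstract, the sketch below scores discrete variables with Info-Gain and with a threshold-free variant. The variant is an assumption and is not specified in the abstract: it uses an imprecise Dirichlet model with a hypothetical parameter s, takes the maximum-entropy distribution by spreading the extra mass s over the least frequent classes, and weights each value of the variable by its relative frequency. The names entropy, max_entropy_counts, and info_gain are hypothetical. With s = 0 the score reduces to ordinary Info-Gain.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(weights):
    """Shannon entropy (bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w > 0)


def max_entropy_counts(counts, s=1.0):
    """Spread extra mass s over the smallest counts (water-filling).

    Assumption: this is the maximum-entropy choice within an imprecise
    Dirichlet model with parameter s; s=0 returns the counts (sorted).
    """
    c = sorted(counts)
    level, k, remaining = c[0], 1, s
    while k < len(c):
        need = (c[k] - level) * k  # mass needed to lift k lowest counts to c[k]
        if need >= remaining:
            break
        remaining -= need
        level = c[k]
        k += 1
    level += remaining / k
    return [level] * k + c[k:]


def info_gain(xs, ys, s=0.0):
    """Score of variable xs for class ys; s=0 is classical Info-Gain."""
    classes = sorted(set(ys))

    def uncertainty(labels):
        seen = Counter(labels)
        counts = [seen[c] for c in classes]  # keep zero counts for absent classes
        return entropy(max_entropy_counts(counts, s))

    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    # Assumption: each value of the variable is weighted by its relative frequency.
    conditional = sum(len(g) / len(ys) * uncertainty(g) for g in groups.values())
    return uncertainty(ys) - conditional


if __name__ == "__main__":
    # Toy data (made up for illustration): one informative and one noise variable.
    ys = [0, 0, 0, 0, 0, 0, 1, 1]
    variables = {"informative": list(ys), "noise": [0, 0, 0, 1, 1, 1, 0, 1]}
    for name, xs in variables.items():
        print(name, info_gain(xs, ys), info_gain(xs, ys, s=1.0))
```

Under these assumptions, a variable would be kept when its score with s > 0 is positive, so no per-dataset threshold is needed. In the toy data the noise variable scores zero under plain Info-Gain and below zero with s = 1, while the informative variable stays positive under both.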

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/19/6/247/pdf

References: 27 articles.

1. Benchmarking attribute selection techniques for discrete class data mining

2. Induction of decision trees

3. Statistical Reasoning with Imprecise Probabilities;Walley,1991

4. A Mathematical Theory of Communication

5. Uncertainty and Information: Foundations of Generalized Information Theory;Klir,2005

Cited by 32 articles.

1. Performance of Hybrid Stacking and Bagging Methods Based on Machine Learning Algorithms in the Classification of Dengue Fever Incidence Rate;2023 3rd International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA);2023-12-13

2. Closed-Loop Uncertainty: The Evaluation and Calibration of Uncertainty for Human–Machine Teams under Data Drift;Entropy;2023-10-12

3. Comparative Analysis of CNN Models and Bayesian Optimization-Based Machine Learning Algorithms in Leaf Type Classification;Balkan Journal of Electrical and Computer Engineering;2023-01-30

4. Improving the Results in Credit Scoring by Increasing Diversity in Ensembles of Classifiers;IEEE Access;2023

5. Sentiments comparison on Twitter about LGBT;Procedia Computer Science;2023