Author:
Jerzy Grzymala-Busse, Teresa Mroczek
Abstract
As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is the process of converting the numerical values of data records into discrete values associated with intervals defined over the domains of the attributes. In multiple-scanning discretization, the last step is the merging of neighboring intervals in the discretized datasets, a kind of postprocessing. Our objective is to check how such merging affects the error rate, measured by tenfold cross-validation within the C4.5 system. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the largest entropy. Based on the Friedman rank sum test (5% significance level), we concluded that the differences among the three approaches are statistically insignificant; there is no universally best approach. We then repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (1% significance level). In some cases the smaller error rate is associated with no merging, in others with merging based on the smallest entropy. A comparison of no merging with merging based on the largest entropy showed similar results. Our final conclusion is that there are highly significant differences between no merging and merging, depending on the dataset; the best approach should be chosen by trying all three approaches.
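The merging step described in the abstract can be illustrated with a small sketch. The Python code below is not the authors' implementation; it is a minimal illustration, assuming that each interval of a discretized attribute is represented by the multiset of class labels of the records it covers, and that one merge step joins the pair of neighboring intervals whose union yields the smallest (or largest) conditional class entropy. The names merge_once and conditional_entropy are hypothetical, and the sketch omits any stopping criterion or consistency check that the full method would apply.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(interval_blocks):
    # Weighted average class entropy over a partition into intervals.
    total = sum(len(block) for block in interval_blocks)
    return sum(len(block) / total * entropy(block) for block in interval_blocks)

def merge_once(interval_blocks, criterion="smallest"):
    # interval_blocks[i] holds the class labels of records in the i-th
    # (ordered) interval. Merge the pair of neighboring intervals whose
    # union gives the smallest or largest conditional entropy.
    if len(interval_blocks) < 2:
        return interval_blocks
    best_idx, best_value = None, None
    for i in range(len(interval_blocks) - 1):
        candidate = (interval_blocks[:i]
                     + [interval_blocks[i] + interval_blocks[i + 1]]
                     + interval_blocks[i + 2:])
        value = conditional_entropy(candidate)
        better = (best_value is None
                  or (criterion == "smallest" and value < best_value)
                  or (criterion == "biggest" and value > best_value))
        if better:
            best_idx, best_value = i, value
    i = best_idx
    return (interval_blocks[:i]
            + [interval_blocks[i] + interval_blocks[i + 1]]
            + interval_blocks[i + 2:])

# Example: three intervals of a discretized attribute and their class labels.
blocks = [["yes", "yes"], ["yes", "no"], ["no", "no", "no"]]
merged = merge_once(blocks, criterion="smallest")
print(merged)                        # two intervals after one merge step
print(conditional_entropy(merged))   # entropy of the coarser partition

In this toy run, joining the first two intervals gives a lower conditional entropy than joining the last two, so the "smallest entropy" criterion picks that pair; the "biggest" criterion would pick the other.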
Subject
General Physics and Astronomy
Cited by
2 articles.