
Real-World Data Difficulty Estimation with the Use of Entropy

Published: 2021-12-01; Volume: 23; Issue: 12; Page: 1621
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

Juszczuk Przemysław, Kozak Jan, Dziczkowski Grzegorz, Głowania Szymon, Jach Tomasz, Probierz Barbara

Abstract

In the era of the Internet of Things and big data, we are faced with managing a flood of information. The complexity and amount of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly. Thus, selecting the most satisfactory set of solutions is often a struggle. This article investigates the possibility of using the entropy measure as an indicator of data difficulty. To do so, we focus on real-world data covering various fields related to markets (the real estate market and financial markets), sports data, fake news data, and more. The problem is twofold. First, since we deal with unprocessed, inconsistent data, additional preprocessing is necessary. Second, we use the entropy-based measure to capture the nonredundant, noncorrelated core information from the data. The research uses well-known classification algorithms to investigate the quality of solutions derived from the initial preprocessing and from the information indicated by the entropy measure. Finally, the 25% of attributes ranked best by the entropy measure are selected, the whole classification procedure is performed once again, and the results are compared.
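The abstract does not state which entropy-based measure is used, whether "best" means highest or lowest score, or how the 25% cutoff is rounded. As a minimal sketch under those open questions, the code below assumes discrete attributes and uses information gain (class entropy minus class entropy conditioned on the attribute) as a stand-in for the paper's measure, then keeps the top 25% of attributes. The function names `shannon_entropy`, `information_gain`, and `select_top_fraction` are hypothetical, and the toy data are invented for illustration only.

```python
from collections import Counter
from math import ceil, log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """H(labels) - H(labels | column) for one discrete attribute.

    Assumption: information gain stands in for the paper's
    (unspecified) entropy-based attribute measure.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the class entropy inside each attribute value.
    conditional = sum(len(g) / n * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional


def select_top_fraction(rows, labels, fraction=0.25):
    """Return sorted indices of the attributes with the highest gain.

    Assumption: the cutoff is rounded up so at least one attribute is kept.
    """
    n_attrs = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_attrs)]
    scores = [information_gain(col, labels) for col in columns]
    k = max(1, ceil(fraction * n_attrs))
    ranked = sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical toy data: only attribute 1 separates the classes.
    rows = [("a", "x", 1, "p"), ("a", "y", 1, "q"),
            ("b", "x", 0, "q"), ("b", "y", 0, "p")]
    labels = ["yes", "no", "yes", "no"]
    print(select_top_fraction(rows, labels))
```

On this toy data the class entropy is 1 bit, attribute 1 has a gain of 1 bit and the others 0, so with four attributes the 25% cutoff keeps only attribute 1. The classifiers would then be retrained on that reduced attribute set and their results compared with the full set.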

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/23/12/1621/pdf


Cited by 12 articles.

1. Polycentric spatial Structure, digital economy and urban green sustainable development. Journal of Cleaner Production, 2024-08.

2. A Novel Classification Method: Neighborhood-Based Positive Unlabeled Learning Using Decision Tree (NPULUD). Entropy, 2024-05-04.

3. Entropy removal of medical diagnostics. Scientific Reports, 2024-01-12.

4. Stepwise Approach to Automatically Building an Ensemble of Classifiers on Football Data. Communications in Computer and Information Science, 2024.

5. Edge valency-based entropies of tetrahedral sheets of clay minerals. PLOS ONE, 2023-07-21.