
The Impact of Missing Data on Data Mining

Published: 2003 Pages: 174-198
Container-title: Data Mining

Author:

Brown Marvin L.¹, Kros John F.²

Affiliation:

1. Hawaii Pacific University, USA

2. East Carolina University, USA

Abstract

Data mining is based upon searching the concatenation of multiple databases that usually contain some amount of missing data along with a variable percentage of inaccurate data, pollution, outliers, and noise. The actual data-mining process deals significantly with prediction, estimation, classification, pattern recognition, and the development of association rules. Therefore, the significance of the analysis depends heavily on the accuracy of the database and on the chosen sample data to be used for model training and testing. The issue of missing data must be addressed since ignoring this problem can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions.

Publisher

IGI Global

Cited by 4 articles.

1. Optimizing eCommerce Data;Advances in Electronic Commerce;2024-09-27

2. Application of Data Fusion via Canonical Polyadic Decomposition in Risk Assessment of Musculoskeletal Disorders in Construction: Procedure and Stability Evaluation;Journal of Construction Engineering and Management;2021-08

3. Handling of missing data to improve the mining of large feed databases;Journal of Animal Science;2013-01-01

4. Imprecise Data and the Data Mining Process;Encyclopedia of Data Warehousing and Mining, Second Edition;2009