
The problem of disguised missing data

Published: 2006-06  Volume: 8  Issue: 1  Pages: 83-92
ISSN: 1931-0145
Container title: ACM SIGKDD Explorations Newsletter
Language: en
Short container title: SIGKDD Explor. Newsl.

Author:

Pearson Ronald K.¹

Affiliation:

1. ProSanos Corporation, Harrisburg, PA

Abstract

Missing data is a well-recognized problem in large datasets, widely discussed in the statistics and data analysis literature. Many programming environments provide explicit codes for missing data, but these are not standardized and are not always used. This lack of standardization is one of the leading causes of the subtle problem of disguised missing data, in which unknown, inapplicable, or otherwise nonspecified responses are encoded as valid data values. Following a brief overview of the problem of explicitly coded missing data, this paper discusses sources, consequences, and detection of disguised missing data, including two real-world examples. As the first of these examples illustrates, the consequences of disguised missing data can be quite serious. The key to its detection lies, first, in recognizing disguised missing data as a possibility and, second, in finding a sufficiently informative view of the data to reveal its presence.
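One simple "informative view" of a single column is its value-frequency table: a placeholder code such as 99 standing in for an unknown age tends to appear far more often than any genuine value. The sketch below illustrates that idea under stated assumptions and is not the detection procedure from the paper. The function names, the `ratio` threshold of 5, and the sample ages are all hypothetical.

```python
from collections import Counter
from statistics import median


def frequency_view(values):
    """Return (value, count) pairs sorted by descending count."""
    return Counter(values).most_common()


def flag_disguised_candidates(values, ratio=5.0):
    """Flag values that occur far more often than a typical value.

    A value is flagged when its count exceeds `ratio` times the median
    count over all distinct values. The threshold is an illustrative
    assumption, not a recommendation from the paper.
    """
    counts = Counter(values)
    if not counts:
        return set()
    typical = median(counts.values())
    return {value for value, count in counts.items() if count > ratio * typical}


# Hypothetical ages in which 99 is used as an "unknown" code.
ages = [34, 41, 29, 99, 57, 99, 62, 99, 38, 99, 45, 99, 51, 99]
print(frequency_view(ages)[:3])
print(flag_disguised_candidates(ages))  # {99}
```

A flagged value is only a candidate: deciding whether 99 is a real age or a disguised missing code still requires recognizing that possibility and checking it against domain knowledge.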

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1147234.1147247


Cited by 37 articles.

1. Quantum mechanics-based missing value estimation framework for industrial data;Expert Systems with Applications;2024-02

2. A review on the significance of body temperature interpretation for early infectious disease diagnosis;Artificial Intelligence Review;2023-06-19

3. Efficient permutation testing of variable importance measures by the example of random forests;Computational Statistics & Data Analysis;2023-05

4. Spatiotemporal Generative Adversarial Imputation Networks: An Approach to Address Missing Data for Wind Turbines;IEEE Transactions on Instrumentation and Measurement;2023

5. Utilization of real‐world data in assessing treatment effectiveness for diffuse large B‐cell lymphoma;American Journal of Hematology;2022-10-31