Affiliation:
1. Dalian Maritime University, Dalian, China
2. Northeastern University, Shenyang, China
3. The Pennsylvania State University
4. University of Louisiana at Lafayette, Lafayette
Abstract
Missing value (MV) imputation is a critical preprocessing step in data mining. However, existing MV imputation methods are mostly designed for batch processing and are therefore not applicable to streaming data, especially poor-quality streams. In this article, we propose a framework called Real-time and Error-tolerant Missing vAlue ImputatioN (REMAIN) to impute MVs in poor-quality streaming data. Instead of imputing MVs based on all the observed data, REMAIN first initializes the MV imputation model with a-RANSAC, which detects and rejects anomalies efficiently, and then incrementally updates the model parameters as new data arrive to support real-time MV imputation. Because the correlations among attributes of the data may change over time in unforeseeable ways, we devise a deterioration detection mechanism that captures the deterioration of the imputation model and thereby further improves imputation accuracy. Finally, we extensively evaluate the proposed algorithms on real-world and synthetic datasets. Experimental results demonstrate that REMAIN achieves significantly higher imputation accuracy than existing solutions while reducing time cost by up to one order of magnitude.
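The three stages named in the abstract (anomaly-tolerant initialization, incremental updating, and deterioration detection) can be illustrated with a small sketch. The abstract does not specify a-RANSAC, the update rule, or the detection criterion, so everything below is an assumption made for illustration: plain RANSAC stands in for a-RANSAC, the model is a single-predictor linear regression maintained through running sums, and deterioration is flagged when the mean absolute error over a sliding window exceeds a threshold. The names `fit_line`, `ransac_inliers`, `StreamingImputer`, `window`, and `threshold` are hypothetical and do not come from the paper.

```python
import random
from collections import deque


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom else 0.0
    return a, (sy - a * sx) / n


def ransac_inliers(points, iters=50, tol=1.0, seed=0):
    """Plain RANSAC (an assumed stand-in for a-RANSAC): largest inlier set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        a, b = fit_line(rng.sample(points, 2))
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best


class StreamingImputer:
    """Imputes a missing y from x; hypothetical illustration only."""

    def __init__(self, history, window=10, threshold=3.0):
        self.recent = deque(maxlen=window)   # latest complete tuples
        self.errors = deque(maxlen=window)   # latest absolute errors
        self.threshold = threshold
        self.resets = 0                      # number of re-initializations
        self._initialize(history)

    def _initialize(self, points):
        # Rebuild the running sums from the inliers only.
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0
        for x, y in ransac_inliers(points):
            self._update(x, y)
        self.errors.clear()

    def _update(self, x, y):
        # Incremental update: constant work per arriving tuple.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def predict(self, x):
        denom = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / denom if denom else 0.0
        return a * x + (self.sy - a * self.sx) / self.n

    def observe(self, x, y=None):
        """Handle one tuple; y=None marks a missing value to impute."""
        if y is None:
            return self.predict(x)
        self.recent.append((x, y))
        self.errors.append(abs(y - self.predict(x)))
        full = len(self.errors) == self.errors.maxlen
        if full and sum(self.errors) / len(self.errors) > self.threshold:
            # Assumed deterioration rule: refit on the recent window.
            self._initialize(list(self.recent))
            self.resets += 1
        else:
            self._update(x, y)
        return y
```

A caller would construct `StreamingImputer(history)` from an initial batch of complete tuples and then pass each arriving tuple to `observe`, with `y=None` for a missing value. Keeping only running sums makes each update constant-time, which is one simple way to get the incremental behavior the abstract describes; the paper's actual update and detection procedures may differ substantially.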
Funder
China Postdoctoral Science Foundation
National Science Foundation
Liaoning Revitalization Talents Program
National Natural Science Foundation of China
Liaoning Collaborative Fund
Publisher
Association for Computing Machinery (ACM)
Cited by
7 articles.