Affiliation:
1. Tsinghua University, China
2. The Hong Kong University of Science & Technology, Hong Kong
Abstract
Incomplete information often occurs in many database applications, e.g., in data integration, data cleaning, or data exchange. The idea of data imputation is to fill in missing data with the values of neighbors that share the same information. Such neighbors can be identified either with certainty by editing rules or statistically by relational dependency networks. Unfortunately, owing to data sparsity, the number of neighbors (identified w.r.t. value equality) is rather limited, especially in the presence of data values with variances.
In this paper, we argue for extensively enriching the set of neighbors by using similarity rules that tolerate small variations. More fillings can thus be acquired that the aforesaid equality neighbors fail to reveal. To fill more of the missing values, we study the problem of maximizing the missing data imputation. Our major contributions include (1) an NP-hardness analysis of solving and approximating the problem, (2) exact algorithms for tackling the problem, and (3) efficient approximation with performance guarantees. Experiments on real and synthetic data sets demonstrate that the filling accuracy can be improved.
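The contrast between equality neighbors and similarity neighbors can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it uses a simple nearest-donor heuristic rather than the exact or approximation algorithms mentioned above, and the names `impute`, `neighbors`, `tol`, and the attributes `x` and `y` are hypothetical, chosen only for illustration.

```python
def neighbors(rows, target, on, tol):
    """Rows (other than target) whose `on` value lies within `tol` of target's."""
    return [
        r for r in rows
        if r is not target
        and r[on] is not None
        and abs(r[on] - target[on]) <= tol
    ]


def impute(rows, on, attr, tol=0.0):
    """Return copies of rows with missing `attr` filled from the closest neighbor.

    tol=0.0 corresponds to equality neighbors; tol>0 is an assumed,
    simplified stand-in for a similarity rule that tolerates small variations.
    """
    filled = [dict(r) for r in rows]
    for original, out in zip(rows, filled):
        if original[attr] is not None or original[on] is None:
            continue  # nothing to fill, or no basis for finding neighbors
        donors = [r for r in neighbors(rows, original, on, tol)
                  if r[attr] is not None]
        if donors:
            closest = min(donors, key=lambda r: abs(r[on] - original[on]))
            out[attr] = closest[attr]
    return filled


# Illustrative data only; None marks a missing value.
rows = [
    {"x": 10.0, "y": "a"},
    {"x": 10.2, "y": None},   # close to the first row, but not equal
    {"x": 20.0, "y": "b"},
    {"x": 35.0, "y": None},   # far from every other row
]

by_equality = impute(rows, "x", "y", tol=0.0)
by_similarity = impute(rows, "x", "y", tol=0.5)

filled_eq = sum(r["y"] is not None for r in by_equality) - 2
filled_sim = sum(r["y"] is not None for r in by_similarity) - 2
print("filled with equality neighbors:", filled_eq)
print("filled with similarity neighbors:", filled_sim)
```

In this toy data, the second row has no equality neighbor but does have a similarity neighbor, so only the tolerant variant fills it. The last row stays missing in both cases because no other row is within the tolerance.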
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development
Cited by
34 articles.
1. Missing Value Imputation for Radar-Derived Time-Series Tracks of Aerial Targets Based on Improved Self-Attention-Based Network;Computers, Materials & Continua;2024
2. Computing Minimum Subset Repair on Incomplete Data;Lecture Notes in Computer Science;2024
3. MDRAE: An Attention Mechanism-Based Autoencoder for Missing Data Recovery in Smart Grids;2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2);2023-12-15
4. Data Imputation Under Similarity Rule Constraints Using Fuzzy Multi-Objective Programming;2023 International Conference on Computational Science and Computational Intelligence (CSCI);2023-12-13
5. The MICS Project: A Data Science Pipeline for Industry 4.0 Applications;2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE);2023-10-25