Enriching data imputation with extensive similarity neighbors

Published:2015-07 Issue:11 Volume:8 Page:1286-1297
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Song Shaoxu¹,Zhang Aoqian¹,Chen Lei²,Wang Jianmin¹

Affiliation:

1. Tsinghua University, China

2. The Hong Kong University of Science & Technology, Hong Kong

Abstract

Incomplete information often occurs in many database applications, e.g., in data integration, data cleaning, or data exchange. The idea of data imputation is to fill in missing data with the values of neighbors that share the same information. Such neighbors can be identified either with certainty by editing rules or statistically by relational dependency networks. Unfortunately, owing to data sparsity, the number of neighbors (identified w.r.t. value equality) is rather limited, especially in the presence of data values with variances. In this paper, we propose to extensively enrich similarity neighbors by similarity rules with tolerance to small variations. More fillings can thus be acquired that the aforesaid equality neighbors fail to reveal. To fill more of the missing values, we study the problem of maximizing the missing data imputation. Our major contributions include (1) the NP-hardness analysis on solving and approximating the problem, (2) exact algorithms for tackling the problem, and (3) efficient approximation with performance guarantees. Experiments on real and synthetic data sets demonstrate that the filling accuracy can be improved.
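The contrast the abstract draws between equality neighbors and similarity neighbors can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which maximizes the number of fillings under similarity rules; it is a naive majority-vote illustration. The function names (`neighbors`, `impute`), the sample rows, and the single shared tolerance `tol` are all hypothetical and chosen only for this example.

```python
from collections import Counter

def neighbors(rows, target, attrs, tol):
    """Return rows whose values on `attrs` are each within `tol` of the target's.

    tol = 0.0 gives equality neighbors; tol > 0 gives similarity neighbors.
    (Hypothetical helper, not taken from the paper.)
    """
    return [
        r for r in rows
        if r is not target and all(abs(r[a] - target[a]) <= tol for a in attrs)
    ]

def impute(rows, target, missing_attr, attrs, tol):
    """Return the most frequent `missing_attr` value among the neighbors, or None."""
    candidates = [
        r[missing_attr]
        for r in neighbors(rows, target, attrs, tol)
        if r[missing_attr] is not None
    ]
    if not candidates:
        return None  # no neighbor found, so the value stays missing
    return Counter(candidates).most_common(1)[0][0]

# Illustrative data: coordinates vary slightly, so no row matches exactly.
rows = [
    {"lat": 40.01, "lon": 116.32, "city": "Beijing"},
    {"lat": 40.02, "lon": 116.33, "city": "Beijing"},
    {"lat": 22.34, "lon": 114.26, "city": "Hong Kong"},
]
target = {"lat": 40.00, "lon": 116.33, "city": None}
rows.append(target)

equality_fill = impute(rows, target, "city", ["lat", "lon"], tol=0.0)
similarity_fill = impute(rows, target, "city", ["lat", "lon"], tol=0.05)
```

With exact matching, the target has no neighbors and its `city` stays unfilled. Tolerating small variations in the coordinates brings in the two nearby rows, which yields a filling.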

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2809974.2809989

Cited by 34 articles.

1. Missing Value Imputation for Radar-Derived Time-Series Tracks of Aerial Targets Based on Improved Self-Attention-Based Network;Computers, Materials & Continua;2024

2. Computing Minimum Subset Repair on Incomplete Data;Lecture Notes in Computer Science;2024

3. MDRAE: An Attention Mechanism-Based Autoencoder for Missing Data Recovery in Smart Grids;2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2);2023-12-15

4. Data Imputation Under Similarity Rule Constraints Using Fuzzy Multi-Objective Programming;2023 International Conference on Computational Science and Computational Intelligence (CSCI);2023-12-13

5. The MICS Project: A Data Science Pipeline for Industry 4.0 Applications;2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE);2023-10-25