Affiliation:
1. Zhejiang University, Hangzhou, China
2. The Hong Kong University of Science and Technology, Hong Kong, China
Abstract
Data imputation has been extensively explored to solve the missing data problem. The dramatically rising volume of missing data makes the training of imputation models computationally infeasible in real-life scenarios. In this paper, we propose an efficient and effective data imputation system with influence functions, named EDIT, which quickly trains a parametric imputation model with representative samples under imputation accuracy guarantees. EDIT mainly consists of two modules, i.e., an imputation influence evaluation (IIE) module and a representative sample selection (RSS) module. IIE leverages influence functions to estimate the effect of (in)complete samples on the prediction results of parametric imputation models. RSS builds a minimum set of high-effect samples that satisfies a user-specified imputation accuracy. Moreover, we introduce a weighted loss function that drives the parametric imputation model to pay more attention to the high-effect samples. Extensive experiments with ten state-of-the-art imputation methods demonstrate that EDIT uses only about 5% of the samples to speed up model training by 4x on average, with an accuracy gain of more than 11%.
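The abstract does not spell out EDIT's model, influence formula, or selection procedure, so the following is only a toy sketch under stated assumptions, not the authors' method. It assumes a hypothetical 1-D ridge imputation model y ≈ w·x (predicting a missing attribute y from an observed attribute x), scores each training sample with the classic influence-function approximation of Koh and Liang, I_i = -∇L_val · H⁻¹ · ∇L_i, approximates RSS's "minimum set" with a greedy prefix search over samples sorted by |I_i|, and stands in for the weighted loss with a hypothetical weight 1 + |I_i| / max|I|. The names `fit`, `influence_scores`, `select_representatives`, and `tol` are all illustrative.

```python
from typing import List, Optional, Sequence, Tuple

LAMBDA = 0.01  # assumed L2 strength; keeps the Hessian positive (invertible)


def fit(xs: Sequence[float], ys: Sequence[float],
        weights: Optional[Sequence[float]] = None, lam: float = LAMBDA) -> float:
    """Closed-form minimizer of the (hypothetical) weighted ridge risk
    sum_i c_i * 0.5 * (w*x_i - y_i)^2 / sum_i c_i + lam/2 * w^2."""
    cs = [1.0] * len(xs) if weights is None else list(weights)
    total = sum(cs)
    sxx = sum(c * x * x for c, x in zip(cs, xs)) / total
    sxy = sum(c * x * y for c, x, y in zip(cs, xs, ys)) / total
    return sxy / (sxx + lam)


def mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared imputation error of y ≈ w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def influence_scores(xs, ys, val_xs, val_ys, lam: float = LAMBDA) -> List[float]:
    """|I_i| with I_i = -g_val * H^{-1} * g_i (1-D influence-function approximation)."""
    w = fit(xs, ys, lam=lam)
    hessian = sum(x * x for x in xs) / len(xs) + lam  # second derivative of the risk
    # Mean gradient of the per-sample loss 0.5*(w*x - y)^2 on validation data.
    g_val = sum((w * x - y) * x for x, y in zip(val_xs, val_ys)) / len(val_xs)
    return [abs(-g_val / hessian * (w * x - y) * x) for x, y in zip(xs, ys)]


def select_representatives(xs, ys, val_xs, val_ys,
                           tol: float) -> Tuple[List[int], float]:
    """Greedy prefix: add highest-effect samples until validation MSE <= tol."""
    scores = influence_scores(xs, ys, val_xs, val_ys)
    order = sorted(range(len(xs)), key=lambda i: scores[i], reverse=True)
    top = max(scores)
    for k in range(1, len(xs) + 1):
        idx = order[:k]
        # Hypothetical weighted loss: higher-effect samples get weights in [1, 2].
        cs = [1.0 + (scores[i] / top if top > 0 else 0.0) for i in idx]
        w = fit([xs[i] for i in idx], [ys[i] for i in idx], cs)
        if mse(w, val_xs, val_ys) <= tol:
            return idx, w
    return order, w  # tolerance unreachable: all samples, weighted fit


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.0, 8.2, 9.9]
    val_xs, val_ys = [1.5, 2.5, 3.5], [3.0, 5.0, 7.0]
    idx, w = select_representatives(xs, ys, val_xs, val_ys, tol=1e-3)
    print(idx, round(w, 3))  # prints [3, 4] 2.009
```

On this toy data the greedy search stops after two of the five samples. The greedy prefix is a cheap heuristic: finding a truly minimum subset that meets an accuracy target is combinatorial, and EDIT's own selection strategy is not described in the abstract.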
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development
Cited by
17 articles.