Authors:
Yang Fengyu, Fan Yongjian, Xie Lingze, Zhong Yihui
Abstract
With the rapid development of the information age, large amounts of data are used in popular research areas such as data mining. Missing data seriously affects both the process and the results of data mining, so filling missing values accurately and efficiently is important. In this paper, we propose a method for optimally filling missing values in non-time-series data, based on the backpropagation of evaluation functions. Four classical filling methods, namely mean, interpolation, model prediction, and K-nearest neighbor, are considered as candidates, and the choice among them is based on both the target value error and the error of the filled values themselves. Finally, single-model and multi-model weighted filling schemes are compared, and the results show that selecting the filling method with the highest fitness value works best for data with different degrees of missingness across different datasets.
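The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fitness formula `1 / (1 + weighted error)`, the weight of 0.5, the 20% hold-out fraction, and the use of least squares for both the "model prediction" filler and the target error are all assumptions made for illustration. The sketch hides some observed cells, fills them with each candidate method, measures how well the hidden cells are recovered and how well the target is then fitted, and keeps the method with the highest fitness.

```python
import numpy as np

def fill_mean(X):
    # Replace each NaN with the mean of the observed values in its column.
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

def fill_interp(X):
    # Linear interpolation along the row index, one column at a time.
    out, idx = X.copy(), np.arange(X.shape[0])
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        out[miss, j] = np.interp(idx[miss], idx[~miss], X[~miss, j])
    return out

def fill_regression(X):
    # "Model prediction" stand-in (assumption): least-squares regression of
    # each column on the other, mean-filled columns plus an intercept.
    base = fill_mean(X)
    out = base.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        A = np.column_stack([np.delete(base, j, axis=1), np.ones(len(X))])
        coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
        out[miss, j] = A[miss] @ coef
    return out

def fill_knn(X, k=3):
    # Average the k nearest rows that observe the missing feature; distances
    # are computed on the mean-filled remaining features.
    base = fill_mean(X)
    out = base.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        cand = np.where(~np.isnan(X[:, j]))[0]
        others = np.delete(base, j, axis=1)
        dist = np.linalg.norm(others[cand] - others[i], axis=1)
        out[i, j] = X[cand[np.argsort(dist)[:k]], j].mean()
    return out

def fitness(fill, X, y, hide_frac=0.2, weight=0.5, seed=0):
    # Hypothetical fitness: combines the filler's own error on artificially
    # hidden cells with the error of a least-squares fit to the target y.
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))
    n_hide = max(1, int(hide_frac * len(observed)))
    hidden = observed[rng.choice(len(observed), n_hide, replace=False)]
    rows, cols = hidden[:, 0], hidden[:, 1]
    X_masked = X.copy()
    X_masked[rows, cols] = np.nan
    self_err = np.sqrt(np.mean((fill(X_masked)[rows, cols] - X[rows, cols]) ** 2))
    A = np.column_stack([fill(X), np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    target_err = np.sqrt(np.mean((A @ coef - y) ** 2))
    return 1.0 / (1.0 + weight * self_err + (1.0 - weight) * target_err)

METHODS = {"mean": fill_mean, "interpolation": fill_interp,
           "model": fill_regression, "knn": fill_knn}

def select_best(X, y, methods=METHODS):
    # Score every candidate and return the winner, all scores, and the filled data.
    scores = {name: fitness(f, X, y) for name, f in methods.items()}
    best = max(scores, key=scores.get)
    return best, scores, methods[best](X)
```

Calling `select_best(X, y)` on a feature matrix `X` containing `NaN` entries and a target vector `y` returns the name of the chosen method, the fitness of every candidate, and the filled matrix. A multi-model weighted scheme could reuse the same scores as weights, which this sketch does not attempt.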
Subject
Computer Science Applications, History, Education