Abstract
Data in Internet of Things (IoT) applications may be missing because of connectivity errors, environmental extremes, sensor malfunctions, and human error. Although many approaches exist for imputing missing values, achieving high imputation accuracy at acceptable computational cost for long missing sub-sequences in univariate series remains an open problem. This work introduces IMD-MP (Imputation of Missing Data using Matrix Profile), a technique that improves imputation accuracy for big data analysis in IoT applications by exploiting spatial-temporal correlations through a novel distance metric, the Matrix Profile Distance (MPD). To preserve spatial correlation, the method first groups the sensors in the network using a grouping algorithm (GA) and then imputes the missing data of a failed sensor node from its group. Within the group, sensor nodes similar to the failed node are identified using the Node Similarity Algorithm (NSF). From the data of these similar sensors, a given number of sub-sequences that are most similar to the sub-sequence preceding the failed node's missing values are collected, and their heights are optimized to preserve temporal correlation in the imputed data. The optimal imputation sequence is then selected using MPD and similarity scores. Numerical results on sensor data from real-time environmental monitoring and the Intel data sets demonstrate the algorithm's effectiveness compared with benchmark methods.
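As a rough illustration of the sub-sequence search step summarized above, the sketch below slides the sub-sequence preceding a gap over a similar sensor's series, ranks candidate windows by distance, shifts the values that follow the best matches, and averages them. It is not the IMD-MP algorithm: it assumes the standard z-normalized Euclidean distance used by matrix-profile methods in place of MPD, and the helper `impute_gap`, the parameter `k`, and the constant-offset height adjustment are hypothetical stand-ins for the steps the abstract names.

```python
import math


def znorm(x):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def zdist(a, b):
    """Z-normalized Euclidean distance (assumed here in place of MPD)."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))


def impute_gap(query, donor, gap_len, k=3):
    """Hypothetical helper: estimate the gap_len values that follow `query`
    using the k windows of the donor series most similar to `query`."""
    m = len(query)
    candidates = []
    # Each candidate window must be followed by gap_len donor values.
    for i in range(len(donor) - m - gap_len + 1):
        candidates.append((zdist(query, donor[i:i + m]), i))
    candidates.sort()  # closest windows first; ties broken by position
    fills = []
    for _, i in candidates[:k]:
        window = donor[i:i + m]
        follow = donor[i + m:i + m + gap_len]
        # Simple height adjustment (assumption): shift the continuation so the
        # matched window's last value lines up with the query's last value.
        offset = query[-1] - window[-1]
        fills.append([v + offset for v in follow])
    # Average the k adjusted continuations element-wise.
    return [sum(col) / len(col) for col in zip(*fills)]
```

For example, `impute_gap([100, 101, 102], list(range(10)), gap_len=2)` returns `[103.0, 104.0]`, because every window of the linear donor series matches the linear query exactly after z-normalization. The z-normalization makes matching insensitive to offset and scale between sensors, which is why a separate height adjustment is needed before the matched values can be used as imputations.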