Binned Data Provide Better Imputation of Missing Time Series Data from Wearables
Authors:
Chakrabarti Shweta (1), Biswas Nupur (1), Karnani Khushi (2), Padul Vijay (1), Jones Lawrence D. (3), Kesari Santosh (4), Ashili Shashaanka (3)
Affiliations:
1. Rhenix Lifesciences, Hyderabad 500038, India
2. Department of BioSciences and BioEngineering, Indian Institute of Technology, Guwahati 781039, India
3. CureScience, 5820 Oberlin Dr, 202, San Diego, CA 92121, USA
4. Department of Translational Neurosciences, Pacific Neuroscience Institute and Saint John’s Cancer Institute at Providence Saint John’s Health Center, Santa Monica, CA 90404, USA
Abstract
Missing values are a common and well-known problem in time-series datasets. Various statistical and machine learning methods have been developed to fill in the missing values, but their performance varies widely and depends strongly on the type of data and the correlations within it. In this study, we applied several well-known imputation methods (expectation maximization, k-nearest neighbor, iterative imputer, random forest, and simple imputer) to missing data obtained from smart wearable health trackers. We propose the use of data binning for imputation and show that using data binned around the missing time interval provides better imputation than using the whole dataset. Imputation was performed for 15 min and 1 h of continuous missing data. We used bin sizes of 15 min, 30 min, 45 min, and 1 h, and evaluated the results using root mean square error (RMSE) values. The expectation maximization algorithm worked best with binned data, followed by the simple imputer, iterative imputer, and k-nearest neighbor, whereas data binning had no effect on imputation with the random forest method. Moreover, the smallest bin sizes of 15 min and 1 h provided the lowest RMSE values for the majority of the time frames when imputing 15 min and 1 h of missing data, respectively. Although we applied this method to digital health data, we think that it will also find applicability in other domains.
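The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic heart-rate-like signal, the one-sample-per-minute rate, the gap position, and the helper names `mean_impute` and `rmse` are all assumptions made for illustration. In particular, a "bin" is assumed here to mean the samples within a fixed duration before and after the gap, and a plain mean fill stands in for the simple imputer.

```python
import math
import random
from statistics import fmean

def rmse(truth, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(truth, predicted)))

def mean_impute(series, gap_start, gap_len, bin_len=None):
    """Fill a contiguous gap with the mean of observed samples.

    bin_len=None uses every sample outside the gap (whole dataset).
    Otherwise only samples within bin_len positions before and after
    the gap are used (assumed meaning of "binned" data).
    """
    gap_end = gap_start + gap_len
    if bin_len is None:
        lo, hi = 0, len(series)
    else:
        lo = max(0, gap_start - bin_len)
        hi = min(len(series), gap_end + bin_len)
    observed = [series[i] for i in range(lo, hi)
                if not (gap_start <= i < gap_end)]
    return [fmean(observed)] * gap_len

# Hypothetical signal: one sample per minute for 24 h, with a slow
# daily cycle plus Gaussian noise (assumed, not wearable data).
random.seed(0)
series = [70 + 15 * math.sin(2 * math.pi * t / 1440) + random.gauss(0, 1)
          for t in range(1440)]

gap_start, gap_len = 400, 15           # 15 min of continuous missing data
truth = series[gap_start:gap_start + gap_len]

filled_whole = mean_impute(series, gap_start, gap_len)
filled_binned = mean_impute(series, gap_start, gap_len, bin_len=15)

rmse_whole = rmse(truth, filled_whole)
rmse_binned = rmse(truth, filled_binned)
print(f"whole dataset RMSE: {rmse_whole:.2f}")
print(f"15 min bin RMSE:    {rmse_binned:.2f}")
```

On a signal with slow drift like this one, the local mean tracks the level near the gap, so the binned fill yields the lower RMSE. Whether the same holds for real wearable data and for the other imputers depends on the data, which is what the study evaluates.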
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry