Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data

Published:2022-01-13 Issue:1 Volume:17 Page:e0262131
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Mir Adil Aslam, Kearfott Kimberlee Jane, Çelebi Fatih Vehbi, Rafique Muhammad

Abstract

A new methodology, imputation by feature importance (IBFI), is studied that can be applied to any machine learning method to efficiently fill in any missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing not at random (MNAR), and missing at random (MAR). IBFI utilizes the feature importance and iteratively imputes missing values using any base learning algorithm. For this work, IBFI is tested on soil radon gas concentration (SRGC) data. XGBoost is used as the learning algorithm and missing data are simulated using R for different missingness scenarios. IBFI is based on the physically meaningful assumption that SRGC depends upon environmental parameters such as temperature and relative humidity. This assumption leads to a model obtained from the complete multivariate series where the controls are available by taking the attribute of interest as a response variable. IBFI is tested against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE) statistics. The imputation process requires more attention when multiple variables are missing in different samples, resulting in challenges to machine learning methods because some controls are missing. IBFI appears to have an advantage in such circumstances. For testing IBFI, Radon Time Series Data (RTS) has been used and data was collected from 1st March 2017 to the 11th of May 2018, including 4 seismic activities that have taken place during the data collection time.
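The five error statistics named in the abstract can be illustrated with a minimal sketch. The definitions below are the common textbook forms and are assumptions, not taken from the paper: in particular, the log(1 + x) form of MSLE and the sign convention of percent bias (imputed minus true, relative to the sum of true values) may differ from what the authors used. The sample values and the mean-imputation baseline are hypothetical and serve only to show how imputed entries are scored against held-out true values.

```python
import math

def mse(true, imputed):
    # Mean squared error over the imputed positions.
    return sum((t - p) ** 2 for t, p in zip(true, imputed)) / len(true)

def rmse(true, imputed):
    # Root mean squared error.
    return math.sqrt(mse(true, imputed))

def msle(true, imputed):
    # Mean squared log error, assuming the log(1 + x) form (values must be > -1).
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(true, imputed)) / len(true)

def mape(true, imputed):
    # Mean absolute percentage error, in percent (true values must be nonzero).
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(true, imputed)) / len(true)

def percent_bias(true, imputed):
    # Percent bias, assuming the convention 100 * sum(imputed - true) / sum(true).
    return 100.0 * sum(p - t for t, p in zip(true, imputed)) / sum(true)

# Hypothetical series: None marks a value removed to simulate missingness.
true_values = [10.0, 20.0, 30.0, 40.0]
observed = [10.0, None, 30.0, None]

# Mean imputation baseline: fill every gap with the mean of the observed values.
observed_only = [v for v in observed if v is not None]
mean_fill = sum(observed_only) / len(observed_only)

# Score only the positions that were actually imputed.
held_out_true = [t for t, o in zip(true_values, observed) if o is None]
held_out_imputed = [mean_fill] * len(held_out_true)

scores = {
    "RMSE": rmse(held_out_true, held_out_imputed),
    "MSLE": msle(held_out_true, held_out_imputed),
    "MAPE": mape(held_out_true, held_out_imputed),
    "PB": percent_bias(held_out_true, held_out_imputed),
    "MSE": mse(held_out_true, held_out_imputed),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Scoring only the masked positions keeps the comparison between imputation methods from being diluted by values that were never missing.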

Funder

Higher Education Commission, Pakistan

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 18 articles.

1. Efficient use of binned data for imputing univariate time series data;Frontiers in Big Data;2024-08-21

2. Time series data recovery in SHM of large-scale bridges: Leveraging GAN and Bi-LSTM networks;Structures;2024-05

3. A Study on TVAE-Based Data Augmentation and Verification to Predict Physiologically Active Ingredients of Medicine Plants According to Climate Change;2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC);2024-02-19

4. Time Series Reconstruction With Feature-Driven Imputation: A Comparison of Base Learning Algorithms;IEEE Access;2024

5. A Comprehensive Bibliometric Analysis of Missing Value Imputation;IEEE Access;2024