A data-driven missing value imputation approach for longitudinal datasets

Published: 2021-03-06; Issue: 8; Volume: 54; Pages: 6277-6307
ISSN: 0269-2821
Journal: Artificial Intelligence Review (Artif Intell Rev)
Language: English

Authors:

Caio Ribeiro, Alex A. Freitas

Abstract

Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicability and error rate of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers trained on datasets prepared with each imputation method against a baseline of no imputation (letting the classification algorithm handle the missing values internally). Based on the results of both sets of experiments, we concluded that, in longitudinal datasets of human ageing, the proposed data-driven approach generally produced more accurate estimations of missing data and better-performing classifiers. We also observed that imputation methods devised specifically for longitudinal data produced very accurate estimations. This reinforces the idea that exploiting the temporal information intrinsic to longitudinal data is worthwhile for machine learning applications, and that this can be achieved through the proposed data-driven approach.
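The feature-wise selection idea can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the five methods the paper ranks, so the code uses four hypothetical stand-ins (a per-wave mean, last observation carried forward, next observation carried backward, and linear interpolation between waves). It also assumes, purely for illustration, that one variable is stored as a table with one row per individual and one column per wave (each wave column treated as a feature), that "known information" means hiding a random 30% of the observed values in a wave and re-estimating them, and that mean absolute error is the ranking criterion. A method that cannot estimate a given cell returns `None`, which is one way to reflect the differences in applicability the abstract mentions.

```python
import random
from statistics import mean
from typing import Callable, Optional

Cell = Optional[float]          # a value, or None when missing
Table = list[list[Cell]]        # rows = individuals, columns = waves (assumed layout)
Method = Callable[[Table, int, int], Cell]  # estimate cell (i, j), or None if not applicable

# Hypothetical candidate methods; the paper's five methods are not listed in the abstract.
def wave_mean(t: Table, i: int, j: int) -> Cell:
    """Cross-sectional mean of the observed values in wave j."""
    seen = [row[j] for row in t if row[j] is not None]
    return mean(seen) if seen else None

def locf(t: Table, i: int, j: int) -> Cell:
    """Last observation carried forward from an earlier wave of the same individual."""
    for k in range(j - 1, -1, -1):
        if t[i][k] is not None:
            return t[i][k]
    return None

def nocb(t: Table, i: int, j: int) -> Cell:
    """Next observation carried backward from a later wave."""
    for k in range(j + 1, len(t[i])):
        if t[i][k] is not None:
            return t[i][k]
    return None

def interpolate(t: Table, i: int, j: int) -> Cell:
    """Linear interpolation between the nearest observed earlier and later waves."""
    prev = next((k for k in range(j - 1, -1, -1) if t[i][k] is not None), None)
    nxt = next((k for k in range(j + 1, len(t[i])) if t[i][k] is not None), None)
    if prev is None or nxt is None:
        return None
    w = (j - prev) / (nxt - prev)
    return t[i][prev] + w * (t[i][nxt] - t[i][prev])

METHODS: dict[str, Method] = {
    "wave_mean": wave_mean, "locf": locf, "nocb": nocb, "interpolate": interpolate,
}

def rank_methods(t: Table, j: int, frac: float = 0.3, seed: int = 0) -> list[str]:
    """Hide a fraction (assumed 30%) of the known values in wave j, re-estimate them
    with every method, and rank methods by mean absolute error (inapplicable last)."""
    rng = random.Random(seed)
    known = [i for i, row in enumerate(t) if row[j] is not None]
    hidden = rng.sample(known, max(1, int(len(known) * frac))) if known else []
    masked = [list(row) for row in t]
    for i in hidden:
        masked[i][j] = None
    scores: dict[str, float] = {}
    for name, m in METHODS.items():
        errs = [abs(est - t[i][j]) for i in hidden if (est := m(masked, i, j)) is not None]
        scores[name] = mean(errs) if errs else float("inf")
    return sorted(scores, key=scores.get)

def impute(t: Table) -> Table:
    """Fill each missing cell with the best-ranked method for its wave that applies."""
    out = [list(row) for row in t]
    for j in range(len(t[0])):
        if all(row[j] is not None for row in t):
            continue  # nothing to fill in this wave
        ranking = rank_methods(t, j)
        for i, row in enumerate(t):
            if row[j] is None:
                for name in ranking:
                    est = METHODS[name](t, i, j)
                    if est is not None:
                        out[i][j] = est
                        break
    return out
```

Falling back to the next-ranked applicable method, rather than using only the top-ranked one, is a design choice of this sketch: longitudinal methods such as LOCF cannot estimate the first wave, so a cross-sectional fallback keeps the output as complete as possible.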

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics

Link

https://link.springer.com/content/pdf/10.1007/s10462-021-09963-5.pdf

Cited by 13 articles.

1. Deep learning to predict rapid progression of Alzheimer’s disease from pooled clinical trials: A retrospective study;PLOS Digital Health;2024-04-10

2. Conditional Generative Adversarial Network for Early Classification of Longitudinal Datasets Using an Imputation Approach;ACM Transactions on Knowledge Discovery from Data;2024-03-26

3. A lexicographic optimisation approach to promote more recent features on longitudinal decision-tree-based classifiers: applications to the English Longitudinal Study of Ageing;Artificial Intelligence Review;2024-03-09

4. Machine Learning Based Missing Data Imputation in Categorical Datasets;IEEE Access;2024

5. Fusion Learning of Regression Models for Missing Data Imputation in Breast Cancer Dataset;2023 International Conference on Artificial Intelligence for Innovations in Healthcare Industries (ICAIIHI);2023-12-29