Author:
Teixeira Raquel, Rodrigues Carina, Moreira Carla, Barros Henrique, Camacho Rui
Abstract
The timely identification of cohort participants at higher risk of attrition is important for early intervention and efficient use of research resources. Machine learning may have advantages over conventional approaches, improving discrimination by analysing complex interactions among predictors. We developed predictive models of attrition using a conventional regression model and several machine learning methods. A total of 542 very preterm (< 32 gestational weeks) infants born in Portugal as part of the European Effective Perinatal Intensive Care in Europe (EPICE) cohort were included. We tested one model with a fixed number of predictors (Baseline) and a second with a dynamic number of variables added from each follow-up (Incremental). Eight classification methods were applied: AdaBoost, Artificial Neural Networks, Functional Trees, J48, J48Consolidated, K-Nearest Neighbours, Random Forest and Logistic Regression. Performance was compared using AUC-PR (area under the precision-recall curve), Accuracy, Sensitivity and F-measure. Attrition rates at the four follow-ups were, respectively, 16%, 25%, 13% and 17%. Both models demonstrated good predictive performance, with AUC-PR ranging from 69 to 94.1 in the Baseline model and from 72.5 to 97.1 in the Incremental model. Of all the methods, Random Forest presented the best performance at all follow-ups [AUC-PR1: 94.1 (2.0); AUC-PR2: 91.2 (1.2); AUC-PR3: 97.1 (1.0); AUC-PR4: 96.5 (1.7)]. Logistic Regression performed well below Random Forest. The top-ranked predictors were common to both models at all follow-ups: birthweight, gestational age, maternal age, and length of hospital stay. Random Forest presented the highest predictive capacity and provided interpretable predictors. Researchers involved in cohort studies can benefit from our robust models to prepare for and prevent loss to follow-up by directing efforts toward individuals at higher risk.
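The four performance measures named in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the function names, the 0.5 decision threshold and the toy labels and scores below are all assumptions made for illustration, and AUC-PR is approximated here by step-wise average precision (assuming no tied scores). The abstract reports AUC-PR on a 0 to 100 scale, whereas this sketch returns values between 0 and 1.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating 1 (lost to follow-up) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of participants classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true dropouts that the classifier flags (recall)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and sensitivity (F1)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank participants from highest to lowest predicted risk of attrition.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at the rank of each true dropout
    return total / n_pos


# Hypothetical toy data: 1 = lost to follow-up, 0 = retained.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn risk scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

if __name__ == "__main__":
    print("AUC-PR (average precision):", round(average_precision(y_true, scores), 4))
    print("Accuracy:", round(accuracy(y_true, y_pred), 4))
    print("Sensitivity:", round(sensitivity(y_true, y_pred), 4))
    print("F-measure:", round(f_measure(y_true, y_pred), 4))
```

A precision-recall summary is a reasonable choice when the positive class is the minority, as with the attrition rates of 13% to 25% reported here, because it does not reward a classifier for correctly labelling the many retained participants.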
Funder
Horizon 2020 Framework Programme
Fundação para a Ciência e a Tecnologia
Publisher
Springer Science and Business Media LLC
Cited by
2 articles.