Author:
Ribeiro Caio, Freitas Alex A.
Abstract
Supervised machine learning algorithms rarely cope directly with the temporal information inherent to longitudinal datasets, which have multiple measurements of the same feature across several time points and are often generated by large health studies. In this paper we report on experiments which adapt the feature-selection function of decision tree-based classifiers to consider the temporal information in longitudinal datasets, using a lexicographic optimisation approach. This approach gives higher priority to the usual objective of maximising the information gain ratio, and it favours the selection of features more recently measured as a lower priority objective. Hence, when selecting between features with equivalent information gain ratio, priority is given to more recent measurements of biomedical features in our datasets. To evaluate the proposed approach, we performed experiments with 20 longitudinal datasets created from a human ageing study. The results of these experiments show that, in addition to an improvement in predictive accuracy for random forests, the changed feature-selection function promotes models based on more recent information that is more directly related to the subject’s current biomedical situation and, thus, intuitively more interpretable and actionable.
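The lexicographic selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the `select_feature` function, the `tolerance` threshold used to decide when two gain ratios count as "equivalent", and the feature names are all hypothetical, introduced here only for illustration. The sketch assumes each candidate feature already has a computed information gain ratio and an integer wave index, where a larger index means a more recent measurement.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str          # feature name (hypothetical, for illustration only)
    gain_ratio: float  # information gain ratio of the best split on this feature
    wave: int          # time index of the measurement; larger means more recent


def select_feature(candidates: Sequence[Candidate], tolerance: float = 0.01) -> Candidate:
    """Lexicographic choice: gain ratio first, recency as the tie-breaker.

    `tolerance` is an assumed threshold, not a value taken from the paper.
    """
    if not candidates:
        raise ValueError("no candidate features")
    best_ratio = max(c.gain_ratio for c in candidates)
    # Primary objective: keep only features whose gain ratio is within
    # `tolerance` of the best one; these are treated as equivalent.
    equivalent = [c for c in candidates if best_ratio - c.gain_ratio <= tolerance]
    # Secondary objective: prefer the most recent wave. Gain ratio breaks
    # any remaining ties between features measured in the same wave.
    return max(equivalent, key=lambda c: (c.wave, c.gain_ratio))


candidates = [
    Candidate("cholesterol_w2", 0.300, 2),
    Candidate("cholesterol_w7", 0.295, 7),
    Candidate("glucose_w8", 0.200, 8),
]
print(select_feature(candidates).name)  # prints "cholesterol_w7"
```

In this example the wave-7 feature wins because its gain ratio is within the assumed tolerance of the best one and it was measured more recently. The wave-8 feature is never considered, since it fails the higher-priority gain ratio objective.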
Publisher
Springer Science and Business Media LLC