Authors:
Noronha Marta D.M., Zárate Luis E.
Abstract
Characterizing longevity profiles from longitudinal studies poses many challenges. First, longitudinal databases usually have high dimensionality. Second, the similarity between long-lived and non-long-lived records makes profile characterization difficult. To address these issues, we use data from the English Longitudinal Study of Ageing (ELSA-UK) to characterize longevity profiles through data mining. We propose a feature engineering method that reduces data dimensionality through merging techniques, factor analysis, and biclustering. We apply biclustering to select the relevant features that discriminate between the two profiles. Two classification models, one based on a decision tree and the other on a random forest, are built from the preprocessed dataset. Experiments show that our methodology can successfully discriminate between longevity profiles. We also gain insights into the features that contribute to individuals being long-lived or non-long-lived. According to both models, the main factor affecting longevity relates to the correlation between the economic situation and the mobility of the elderly. We suggest that this methodology can be applied to identify longevity profiles in other longitudinal studies, since that factor is deemed relevant for profile classification.
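As a rough illustration of the modeling stage described in the abstract, the following minimal sketch reduces a high-dimensional table with factor analysis and then fits a decision tree and a random forest on the resulting factors. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn, not taken from ELSA-UK), the merging and biclustering steps are omitted, and the number of factors, the tree depth, and all variable names are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a high-dimensional survey table (assumption: NOT ELSA-UK data).
# y plays the role of a binary long-lived / non-long-lived label.
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=8, random_state=0
)

# Hold out a test set before fitting anything, so the factors are learned
# from the training records only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Dimensionality reduction: 40 raw features -> 8 latent factors
# (8 is an arbitrary choice for this sketch).
fa = FactorAnalysis(n_components=8, random_state=0)
F_train = fa.fit_transform(X_train)
F_test = fa.transform(X_test)

# The two model families named in the abstract, with illustrative settings.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the factor scores and record held-out accuracy.
scores = {
    name: model.fit(F_train, y_train).score(F_test, y_test)
    for name, model in models.items()
}

for name, acc in scores.items():
    print(f"{name}: held-out accuracy = {acc:.2f}")
```

On real data, the factor loadings in `fa.components_` and the forest's `feature_importances_` are natural places to look when interpreting which factors drive the classification.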