Alzheimer stage diagnosis from genomic and clinical data modalities using ‘Deep Learning’


Sarma Manash, Chatterjee Subarna





INTRODUCTION: This study focuses on diagnosing the stages of Alzheimer's disease (AD), including mild cognitive impairment (MCI), from two data modalities, gene expression data and clinical data of Alzheimer's Disease Neuroimaging Initiative (ADNI) participants, using multiclass classification. The gene expression dataset is highly imbalanced and has high-dimensional, low-sample-size (HDLSS) characteristics. This is the only study in which multiclass classification is used to identify multiple stages of Alzheimer's disease. We achieve the best multiclass classification result in both modalities and identify new genetic biomarkers.

METHODS: A combination of XGBoost and Sequential Floating Backward Selection (SFBS) is used to select features. From 49,386 gene probe sets, we select the 95 most effective. For the clinical study data, the 8 most effective biomarkers are selected using SFBS. For both genomic and clinical data, a deep learning (DL) classifier is used to identify the stages: cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's disease/dementia (AD). Because the genomic data are highly imbalanced, borderline oversampling is used for model training, and the original data are used for validation.

RESULT & DISCUSSION: With clinical data, we achieve ROC AUC scores of 0.97, 0.95, and 0.94 for the CN, MCI, and dementia stages, respectively. With gene expression data, we achieve ROC AUC scores of 0.75, 0.74, and 0.70 for the CN, MCI, and dementia stages, respectively, and 0.67 for both the micro-average and micro-weighted F1 scores. This is the best result so far for AD stage diagnosis from gene expression profile data through multiclass classification with ADNI data. The results indicate that our multiclass model can efficiently handle imbalanced HDLSS data and identify samples of the minority class. MAPK14, ZNF835, MID1, HLA-DQA1, and TEP1 are among the new genes found to be associated with AD risk. DRAXIN, HSPA12B, USP47, and others are found to be preventive or suppressive for AD.
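The per-stage ROC AUC and F1 figures reported above can be made concrete with a small sketch. It is not the authors' evaluation code: the function names (`roc_auc_one_vs_rest`, `f1_summary`) and the six toy samples are invented for illustration, and the abstract does not state how the AUC was computed, so a one-vs-rest scheme is assumed here. The sketch computes one ROC AUC per stage from predicted class probabilities, plus micro and support-weighted F1 scores from the predicted labels.

```python
from collections import Counter

STAGES = ["CN", "MCI", "AD"]  # stage labels named in the abstract


def roc_auc_one_vs_rest(y_true, scores, positive):
    """ROC AUC for `positive` against all other classes (rank formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_summary(y_true, y_pred, labels):
    """Return (micro_f1, weighted_f1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp[c] for c in labels),
               sum(fp[c] for c in labels),
               sum(fn[c] for c in labels))
    # Weighted: per-class F1 averaged with class support as the weights.
    support = Counter(y_true)
    weighted = sum(support[c] * f1(tp[c], fp[c], fn[c]) for c in labels) / len(y_true)
    return micro, weighted


# Invented toy data (assumption): true stages and predicted class probabilities.
y_true = ["CN", "CN", "MCI", "MCI", "MCI", "AD"]
probs = [
    {"CN": 0.7, "MCI": 0.2, "AD": 0.1},
    {"CN": 0.4, "MCI": 0.5, "AD": 0.1},
    {"CN": 0.2, "MCI": 0.6, "AD": 0.2},
    {"CN": 0.3, "MCI": 0.5, "AD": 0.2},
    {"CN": 0.5, "MCI": 0.3, "AD": 0.2},
    {"CN": 0.1, "MCI": 0.3, "AD": 0.6},
]

# Predicted label is the stage with the highest probability.
y_pred = [max(STAGES, key=p.get) for p in probs]
auc = {c: roc_auc_one_vs_rest(y_true, [p[c] for p in probs], c) for c in STAGES}
micro_f1, weighted_f1 = f1_summary(y_true, y_pred, STAGES)
print(auc, micro_f1, weighted_f1)
```

In a single-label multiclass setting the micro-averaged F1 equals plain accuracy, which is worth keeping in mind when reading it next to the per-stage AUC values on imbalanced data. In practice, library routines such as scikit-learn's `roc_auc_score` and `f1_score` would typically be used instead of hand-written functions.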


Research Square Platform LLC







