Alzheimer stage diagnosis from genomic and clinical data modalities using ‘Deep Learning’

Author:

Sarma Manash¹, Chatterjee Subarna¹

Affiliation:

1. MSRUAS

Abstract

INTRODUCTION: This study focuses on diagnosing the stages of Alzheimer’s disease (AD), including mild cognitive impairment (MCI), from two data modalities, gene expression data and clinical data, of ADNI (Alzheimer’s Disease Neuroimaging Initiative) participants using multiclass classification. The gene expression dataset is highly imbalanced and has HDLSS (high-dimensional, low-sample-size) characteristics. This is the only study in which multiclass classification is used to identify multiple stages of Alzheimer’s disease. We achieve the best multiclass classification result in both modalities and identify new genetic biomarkers.

METHODS: A combination of XGBoost and SFBS (Sequential Floating Backward Selection) is used to select features. We select the 95 most effective gene probe sets out of 49,386. For the clinical data, the 8 most effective biomarkers are selected using SFBS. For both genomic and clinical data, a deep learning (DL) classifier is used to identify the stages: CN (Cognitively Normal), MCI, and AD (Alzheimer’s disease / dementia). Because of the high class imbalance in the genomic data, borderline oversampling is applied to the data used for model training, and the original data is used for validation.

RESULT & DISCUSSION: With clinical data, we achieve ROC AUC scores of 0.97, 0.95, and 0.94 for the CN, MCI, and dementia stages, respectively. With gene expression data, we achieve ROC AUC scores of 0.75, 0.74, and 0.70 for the CN, MCI, and dementia stages, respectively, and 0.67 for both the micro-average and micro-weighted F1 scores. This is the best result so far for AD stage diagnosis from gene expression profile data through multiclass classification with ADNI data. The results indicate that our multiclass model can efficiently handle imbalanced HDLSS data and identify samples of the minority class. MAPK14, ZNF835, MID1, HLA-DQA1, and TEP1 are among the new genes found to be associated with AD risk. DRAXIN, HSPA12B, USP47, and others are found to be AD-preventive or AD-suppressing.
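The per-stage ROC AUC and the F1 figures reported above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the six samples and their class probabilities are invented toy values, the function names are hypothetical, ROC AUC is assumed to be computed one-vs-rest per stage, and the "weighted" F1 is assumed to mean the support-weighted average of per-class F1 scores.

```python
from collections import Counter

CLASSES = ["CN", "MCI", "AD"]  # stage labels named in the abstract

def one_vs_rest_auc(y_true, scores, positive):
    """ROC AUC of one class against all others, in its rank (Mann-Whitney) form."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    # Fraction of (positive, negative) pairs ranked correctly; ties count one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_scores(y_true, y_pred, classes):
    """Return (micro F1, support-weighted F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    support = Counter(y_true)
    weighted = sum(support[c] * f1(tp[c], fp[c], fn[c]) for c in classes) / len(y_true)
    return micro, weighted

# Toy data (assumption): true stages and classifier probabilities per stage.
y_true = ["CN", "CN", "MCI", "MCI", "MCI", "AD"]
probs = [
    {"CN": 0.7, "MCI": 0.2, "AD": 0.1},
    {"CN": 0.5, "MCI": 0.4, "AD": 0.1},
    {"CN": 0.2, "MCI": 0.6, "AD": 0.2},
    {"CN": 0.3, "MCI": 0.4, "AD": 0.3},
    {"CN": 0.1, "MCI": 0.3, "AD": 0.6},
    {"CN": 0.1, "MCI": 0.2, "AD": 0.7},
]

# Predicted stage = class with the highest probability.
y_pred = [max(p, key=p.get) for p in probs]

# One ROC AUC per stage, each treating that stage as the positive class.
aucs = {c: one_vs_rest_auc(y_true, [p[c] for p in probs], c) for c in CLASSES}
micro_f1, weighted_f1 = f1_scores(y_true, y_pred, CLASSES)

for c in CLASSES:
    print(f"ROC AUC ({c} vs rest): {aucs[c]:.3f}")
print(f"micro F1: {micro_f1:.3f}, weighted F1: {weighted_f1:.3f}")
```

In a single-label multiclass setting the micro-averaged F1 equals plain accuracy, whereas the weighted F1 depends on per-class performance, so the two diverge when errors are concentrated in one class, as in the toy data here.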

Publisher

Research Square Platform LLC

