Multi-Modality Machine Learning Predicting Parkinson’s Disease

Author:

Makarious Mary B., Leonard Hampton L., Vitale Dan, Iwaki Hirotaka, Sargent Lana, Dadu Anant, Violich Ivo, Hutchins Elizabeth, Saffo David, Bandres-Ciga Sara, Kim Jonggeol Jeff, Song Yeajin, Bookman Matt, Nojopranoto Willy, Campbell Roy H., Hashemi Sayed Hadi, Botia Juan A., Carter John F., Maleknia Melina, Craig David W., Keuren-Jensen Kendall Van, Morris Huw R., Hardy John A., Blauwendraat Cornelis, Singleton Andrew B., Faghri Faraz, Nalls Mike A.

Abstract

Summary

Background

Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multi-modal data is key moving forward. We build upon previous work to deliver multi-modal predictions of Parkinson’s Disease (PD).

Methods

We performed automated ML on multi-modal data from the Parkinson’s Progression Marker Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarker Program (PDBP) dataset. Finally, networks were built to identify gene communities specific to PD.

Findings

Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then validated on external data (PDBP, AUC 85.03%). Optimizing the classification thresholds increased the diagnosis prediction accuracy (balanced accuracy) and other metrics. Combining data modalities outperforms the single biomarker paradigm. UPSIT was the largest contributing predictor for the classification of PD. The transcriptomic data was used to construct a network of disease-relevant transcripts.

Interpretation

We have built a model using an automated ML pipeline to make improved multi-omic predictions of PD. The model improves disease risk prediction, a critical step for better assessment of PD risk. We constructed gene expression networks for the next generation of genomics-derived interventions. Our automated ML approach allows complex predictive models to be reproducible and accessible to the community.

Funding

National Institute on Aging, National Institute of Neurological Disorders and Stroke, the Michael J. Fox Foundation, and the Global Parkinson’s Genetics Program.

Research in Context

Evidence before this study

Prior research into predictors of Parkinson’s disease (PD) has either used basic statistical methods to make predictions across data modalities or focused on a single data type or biomarker model. We make multi-modal predictions using an open-source automated machine learning (ML) framework on extensive multi-modal data, which we believe yields robust and reproducible results. We consider this the first true multi-modality ML study of PD risk classification.

Added value of this study

We used a variety of linear, non-linear, kernel, neural network, and ensemble ML algorithms to generate an accurate classification of both cases and controls in independent datasets, using data that is not involved in PD diagnosis itself at study recruitment. The model built in this paper significantly improves upon the models in our previous work [1], which used the entire training dataset. Building on this earlier work, we showed that the PD diagnosis can be refined using improved algorithmic classification tools that may yield potential biological insights. We took care to develop and validate this model using public controlled-access datasets and an open-source ML framework to allow for reproducible and transparent results.

Implications of all available evidence

Training, validating, and tuning a diagnostic algorithm for PD will allow us to augment clinical diagnoses or risk assessments with less need for complex and expensive exams. Going forward, these models can be built on remote or asynchronously collected data, which may be important in a growing telemedicine paradigm. More refined diagnostics will also increase clinical trial efficiency by potentially refining phenotyping and predicting onset, allowing providers to identify potential cases earlier. Early detection could lead to improved treatment response and higher efficacy. Finally, as part of our workflow, we built new networks representing communities of genes correlated in PD cases in a hypothesis-free manner, showing how new and existing genes may be connected and highlighting therapeutic opportunities.
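
The abstract reports that optimizing the classification threshold increased balanced accuracy. As an illustration of that general idea only, the following is a minimal sketch in plain Python. It is not the authors' pipeline or the GenoML API: the function names `balanced_accuracy` and `best_threshold`, the candidate grid of cutoffs, and the toy labels and scores are all assumptions made for the example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(y_true, scores, candidates=None):
    """Sweep candidate cutoffs and keep the first one with the highest balanced accuracy."""
    if candidates is None:
        # Hypothetical grid: 0.01, 0.02, ..., 0.99
        candidates = [i / 100 for i in range(1, 100)]
    best_t, best_ba = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        ba = balanced_accuracy(y_true, preds)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba


# Toy data (invented for illustration): predicted case probabilities
# that separate the classes but sit mostly below the default 0.5 cutoff.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.90]

default_preds = [1 if s >= 0.5 else 0 for s in scores]
print("balanced accuracy at 0.5:", balanced_accuracy(y_true, default_preds))
print("best (threshold, balanced accuracy):", best_threshold(y_true, scores))
```

On this toy data the default 0.5 cutoff misses three of the four cases, while a lower cutoff separates the two classes. In practice the cutoff would be chosen on training or withheld tuning data and then applied unchanged to an external validation set, so that the reported metric is not inflated by tuning on the test data.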

Publisher

Cold Spring Harbor Laboratory

References (49 articles)

1. Diagnosis of Parkinson's disease on the basis of clinical and genetic classification: a population-based modelling study

2. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age

3. GenoML · Automated Machine Learning (AutoML) for Genomics. https://genoml.github.io/index.html (accessed Nov 11, 2020).

4. Home. https://amp-pd.org/ (accessed Nov 11, 2020).

5. Makarious MB, Leonard HL, Vitale D, et al. GenoML: automated machine learning for genomics. arXiv:2103.03221 [cs, q-bio] 2021; published online March 4. (accessed March 5, 2021).

Cited by 5 articles.