Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score
-
Published:2023-02-07
Issue:1
Volume:21
-
ISSN:1479-5876
-
Container-title:Journal of Translational Medicine
-
language:en
-
Short-container-title:J Transl Med
Author:
Lim Ashley J. W., Tyniana C. Tera, Lim Lee Jin, Tan Justina Wei Lynn, Koh Ee Tzun, Ang Andrea Ee Ling, Chan Grace Yin Lai, Chan Madelynn Tsu-Li, Chia Faith Li-Ann, Chng Hiok Hee, Chua Choon Guan, Howe Hwee Siew, Koh Li Wearn, Kong Kok Ooi, Law Weng Giap, Lee Samuel Shang Ming, Lian Tsui Yee, Lim Xin Rong, Loh Jess Mung Ee, Manghani Mona, Tan Sze-Chin, Teo Claire Min-Li, Thong Bernard Yu-Hor, Tjokrosaputro Paula Permatasari, Xu Chuanhui, Chong Samuel S., Khor Chiea Chuen, Leong Khai Pang, Lee Caroline G.
Abstract
Background
The popular statistics-based genome-wide association studies (GWAS) have provided deep insights into the genetics of complex disorders. However, their clinical applicability for predicting disease/trait outcomes remains unclear, as statistical models are not designed to make predictions. This study employs a statistics-free, machine-learning (ML)-optimized polygenic risk score (PRS) to complement existing GWAS and bring the prediction of disease/trait outcomes closer to clinical application. Rheumatoid arthritis (RA) was selected as a model disease to demonstrate the robustness of ML in disease prediction because RA is a prevalent chronic inflammatory joint disease with high mortality rates that affects adults in their economic prime. Early identification of at-risk individuals may facilitate measures to mitigate the effects of the disease.
Methods
This study employs a robust ML feature selection algorithm to identify single nucleotide polymorphisms (SNPs) that can predict RA from a training dataset comprising RA patients and population control samples. The selected SNPs were then evaluated for their predictive performance across 3 independent, unseen test datasets. They were subsequently used to generate a PRS, which was also evaluated for its predictive capacity as a sole feature.
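As a rough illustration of the two evaluation steps described above (scoring each individual with a PRS built from the selected SNPs, then measuring how well that single score separates patients from controls on an unseen test set), the sketch below assumes a conventional additive PRS: the sum of per-SNP weights times allele dosages (0, 1 or 2). The abstract does not say how the ML-optimized PRS weights its 9 SNPs, so the weighting scheme, the function names and the three-SNP example are all hypothetical.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Hypothetical additive PRS: sum of per-SNP weight times allele dosage (0, 1 or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * d for w, d in zip(weights, dosages))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case scores higher than a random control.

    labels: 1 for a case (RA patient), 0 for a control; ties count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical 3-SNP example (NOT the published 9-SNP model).
# Negative weights stand in for protective alleles.
weights = [0.8, -0.5, -0.6]
genotypes = [[2, 0, 0], [1, 0, 1], [2, 1, 0], [0, 1, 0],  # cases
             [0, 2, 1], [1, 1, 1], [0, 0, 0]]             # controls
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [polygenic_risk_score(g, weights) for g in genotypes]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")  # prints AUC = 0.833
```

The pairwise AUC is quadratic in sample size but makes the definition explicit; for real cohorts a rank-based implementation (for example `sklearn.metrics.roc_auc_score`) gives the same value far faster.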
Results
Through robust ML feature selection, 9 SNPs were found to be the minimum number of features needed for excellent predictive performance (AUC > 0.9) in 3 independent, unseen test datasets. A PRS based on these 9 SNPs was significantly associated with RA (P < 1 × 10⁻¹⁶) and predictive of it (AUC > 0.9) in all 3 unseen datasets. An RA ML-PRS calculator for these 9 SNPs was developed (https://xistance.shinyapps.io/prs-ra/) to facilitate individualized clinical application. Most of the predictive SNPs are protective, reside in non-coding regions, and are either predicted to be potentially functional SNPs (pfSNPs) or in high linkage disequilibrium (r² > 0.8) with uninterrogated pfSNPs.
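The r² > 0.8 criterion refers to the standard squared correlation between alleles at two biallelic loci: with D = p_AB − p_A·p_B, r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)]. The minimal sketch below assumes phased haplotype and allele frequencies are already available; the abstract does not say how LD was computed, and the frequencies used here are invented for illustration.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation r^2 between two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Invented frequencies: alleles A and B are almost always inherited together.
r2 = ld_r2(p_ab=0.28, p_a=0.3, p_b=0.3)
print(round(r2, 3))  # 0.819, which would pass an r^2 > 0.8 threshold
```

With p_AB = p_A·p_B (independent loci) the function returns 0, and with p_AB = p_A = p_B (the two alleles always co-occur) it returns 1.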
Conclusions
These findings highlight the promise of this ML strategy for identifying useful genetic features that can robustly predict disease and are amenable to translation for clinical application.
Funder
Duke-NUS Medical School; National Medical Research Council; National Cancer Centre of Singapore
Publisher
Springer Science and Business Media LLC
Subject
General Biochemistry, Genetics and Molecular Biology; General Medicine
References
67 articles.
Cited by
11 articles.