Validating and automating learning of cardiometabolic polygenic risk scores from direct-to-consumer genetic and phenotypic data: implications for scaling precision health research

Published:2022-09-08 Issue:1 Volume:16
ISSN:1479-7364
Container-title:Human Genomics
language:en
Short-container-title:Hum Genomics

Author:

Lopez-Pineda Arturo,Vernekar Manvi,Moreno-Grau Sonia,Rojas-Muñoz Agustin,Moatamed Babak,Lee Ming Ta Michael,Nava-Aguilar Marco A.,Gonzalez-Arroyo Gilberto,Numakura Kensuke,Matsuda Yuta,Ioannidis Alexander,Katsanis Nicholas,Takano Tomohiro,Bustamante Carlos D.

Abstract

Introduction: A major challenge to enabling precision health at a global scale is the bias between those who enroll in state-sponsored genomic research and those suffering from chronic disease. More than 30 million people have been genotyped by direct-to-consumer (DTC) companies such as 23andMe, Ancestry DNA, and MyHeritage, providing a potential mechanism for democratizing access to medical interventions and thus catalyzing improvements in patient outcomes as the cost of data acquisition drops. However, many of these data are sequestered in the initial provider network, where the scientific community can neither access nor validate them. Here, we present a novel geno-pheno platform that integrates heterogeneous data sources and applies learnings to common chronic disease conditions including Type 2 diabetes (T2D) and hypertension.

Methods: We collected genotype data from a novel DTC platform where participants uploaded their genotype data files and were invited to answer general health questionnaires regarding cardiometabolic traits over a period of 6 months. Quality control, imputation, and genome-wide association studies (GWAS) were performed on this dataset, and polygenic risk scores (PRS) were built in a case–control setting using the BASIL algorithm.

Results: We collected data on N = 4,550 participants (389 cases / 4,161 controls) for T2D, with cases being those who reported being affected or previously affected, and N = 4,528 (1,027 cases / 3,501 controls) for hypertension. We identified 164 out of 272 variants showing identical effect direction to previously reported genome-wide significant findings in Europeans. The PRS models reached AUC = 0.68, which is comparable to previously published PRS models obtained with larger datasets including clinical biomarkers.

Discussion: DTC platforms have the potential to invert research models of genome sequencing and phenotypic data acquisition. Quality control (QC) mechanisms successfully enabled traditional GWAS and PRS analyses. The direct participation of individuals has shown the potential to generate rich datasets enabling the creation of PRS cardiometabolic models. More importantly, federated learning of PRS from reuse of DTC data provides a mechanism for scaling precision health care delivery beyond the small number of countries that can afford to finance these efforts directly.

Conclusions: The genetics of T2D and hypertension have been studied extensively in controlled datasets, and various PRS have been developed. We developed predictive tools for both phenotypes trained with heterogeneous genotypic and phenotypic data generated outside of the clinical environment and show that our methods can recapitulate prior findings with fidelity. From these observations, we conclude that it is possible to leverage DTC genetic repositories to identify individuals at risk of debilitating diseases based on their unique genetic landscape so that informed, timely clinical interventions can be incorporated.
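The two headline quantities in the abstract, the effect-direction concordance (164 of 272 variants) and the AUC of a polygenic score, can be illustrated with a short sketch. This is not the authors' pipeline: the study fitted its scores with the BASIL algorithm, whereas the code below uses a plain weighted sum of allele dosages, an exact one-sided sign test against a 50% chance rate, and a pairwise AUC. The function names, weights, and genotypes are hypothetical and chosen only for illustration; only the counts 164 and 272 come from the abstract.

```python
from math import comb


def sign_test_pvalue(concordant: int, total: int) -> float:
    """One-sided exact binomial p-value P(X >= concordant) for X ~ Binomial(total, 0.5)."""
    tail = sum(comb(total, k) for k in range(concordant, total + 1))
    return tail / 2 ** total


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Counts reported in the abstract: 164 of 272 variants matched the published effect direction.
concordance = 164 / 272
p_value = sign_test_pvalue(164, 272)

# Hypothetical toy data (NOT from the study): three variants, four individuals.
weights = [0.30, -0.10, 0.20]
cases = [[2, 0, 1], [1, 0, 2]]
controls = [[0, 2, 0], [1, 1, 1]]
case_scores = [polygenic_score(g, weights) for g in cases]
control_scores = [polygenic_score(g, weights) for g in controls]
toy_auc = auc(case_scores, control_scores)

print(f"concordance = {concordance:.3f}, one-sided sign-test p = {p_value:.2e}")
print(f"toy AUC = {toy_auc:.2f}")
```

The pairwise AUC is quadratic in sample size and is written this way only for readability; at the scale of the cohort described above, a rank-based computation would be the more practical choice.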

Funder

New Energy and Industrial Technology Development Organization

Publisher

Springer Science and Business Media LLC

Subject

Drug Discovery,Genetics,Molecular Biology,Molecular Medicine

Link

https://link.springer.com/content/pdf/10.1186/s40246-022-00406-y.pdf


Cited by 2 articles.

1. Polygenic risk score portability for common diseases across genetically diverse populations;Human Genomics;2024-09-02

2. Integrating Common Risk Factors with Polygenic Scores Improves the Prediction of Type 2 Diabetes;International Journal of Molecular Sciences;2023-01-04