Abstract
The low portability of polygenic scores (PGS) across global populations is a major concern that must be addressed before PGS can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and in their data collection and assaying, which conflates multiple factors. In this study, we examine the extent to which PGS are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding that arises from using different cohorts. We define the nine ancestry groups at a high-resolution, country-specific level, based on a simple, robust and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes, and show a systematic and dramatic reduction in portability of PGS trained in the inferred ancestral UK population and applied to the inferred ancestral Polish, Italian, Iranian, Indian, Chinese, Caribbean, Nigerian and Ashkenazi populations. These analyses, performed at a finer scale than the usual continental scale, demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to PC distance, even when all individuals reside in the same country and are genotyped and phenotyped as part of the same cohort. Our study provides high-resolution and robust insights into the PGS portability problem.
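As a rough illustration of the quantities described above, the following Python sketch computes a polygenic score as a weighted sum of allele dosages, the distance of each ancestry group from a reference group in principal component (PC) space, and per-group prediction accuracy relative to the reference group. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names are hypothetical, the group centre is taken to be the mean of the PCs, and accuracy is taken to be a plain squared correlation, neither of which is specified in the abstract.

```python
import numpy as np


def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (individuals x variants) per individual."""
    return np.asarray(dosages, dtype=float) @ np.asarray(effect_sizes, dtype=float)


def pc_distance_to_reference(pcs, groups, reference):
    """Euclidean distance between each group's PC centroid and the reference centroid.

    Assumption: the centroid is the mean of the PC coordinates in each group.
    """
    pcs = np.asarray(pcs, dtype=float)
    groups = np.asarray(groups)
    ref_center = pcs[groups == reference].mean(axis=0)
    return {
        str(g): float(np.linalg.norm(pcs[groups == g].mean(axis=0) - ref_center))
        for g in np.unique(groups)
    }


def relative_accuracy(scores, phenotype, groups, reference):
    """Squared correlation of score and phenotype per group, divided by the reference value.

    Assumption: plain squared Pearson correlation stands in for the accuracy metric.
    """
    scores = np.asarray(scores, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    groups = np.asarray(groups)
    r2 = {}
    for g in np.unique(groups):
        mask = groups == g
        r2[str(g)] = float(np.corrcoef(scores[mask], phenotype[mask])[0, 1] ** 2)
    return {g: value / r2[reference] for g, value in r2.items()}
```

Plotting the output of `relative_accuracy` against the output of `pc_distance_to_reference` across groups is one way to visualise the reported decay of prediction with PC distance.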
Publisher
Cold Spring Harbor Laboratory