Abstract
Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed the effects of covariate stratification and interaction on body mass index (BMI) PGS (PGSBMI) across four cohorts of European (N=491,111) and African (N=21,612) ancestry. Stratifying on binary covariates and on quintiles of continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between the best- and worst-performing quintiles for certain covariates. Twenty-eight covariates had significant PGSBMI-covariate interaction effects, modifying PGSBMI effects by nearly 20% per standard deviation change. Covariates with significant R² differences among strata overlapped with those with significant interaction effects. Across all covariates, main effects on BMI were correlated with maximum R² differences and with interaction effects (0.56 and 0.58, respectively), suggesting that individuals with high PGS have the highest R² and the largest increases in PGS effect. Given significant and replicable evidence for context-specific PGSBMI performance and effects, we investigated ways to increase model performance by taking non-linear effects into account. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGSBMI directly from GxAge GWAS effects increased relative R² by 7.8%. These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGSBMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
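The quintile stratification described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, and the function names (`simple_ols`, `quintile_labels`, `stratified_fit`), the sample size, and the effect sizes (including the 0.2 interaction term) are all assumptions chosen for illustration. Within each quintile of a continuous covariate, BMI is regressed on the PGS alone, and the slope and R² are reported per stratum.

```python
import random
from statistics import mean


def simple_ols(x, y):
    """Return (slope, r_squared) for the one-predictor regression y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def quintile_labels(values):
    """Assign each value a quintile index 0..4 by rank (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n
    return labels


def stratified_fit(pgs, covariate, bmi):
    """Fit BMI ~ PGS separately within each covariate quintile."""
    labels = quintile_labels(covariate)
    results = []
    for q in range(5):
        idx = [i for i, lab in enumerate(labels) if lab == q]
        results.append(simple_ols([pgs[i] for i in idx], [bmi[i] for i in idx]))
    return results


# Simulated, hypothetical data: the PGS effect grows by 0.2 per SD of the
# covariate (an assumed PGS-by-covariate interaction), and the covariate
# also has a main effect on BMI.
rng = random.Random(0)
n = 20000
pgs = [rng.gauss(0, 1) for _ in range(n)]
cov = [rng.gauss(0, 1) for _ in range(n)]
bmi = [27 + 1.0 * c + (1.0 + 0.2 * c) * g + rng.gauss(0, 3)
       for g, c in zip(pgs, cov)]

fits = stratified_fit(pgs, cov, bmi)
for q, (slope, r2) in enumerate(fits, start=1):
    print(f"quintile {q}: PGS slope = {slope:.3f}, R^2 = {r2:.3f}")
```

Under these simulated assumptions, both the PGS slope and R² rise from the lowest to the highest covariate quintile, which is the pattern a stratified analysis is designed to detect. A real analysis would also adjust for covariates such as age, sex, and genetic principal components, which this sketch omits.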
Publisher
Cold Spring Harbor Laboratory