Author:
Zhang Tianyu,Klei Lambertus,Liu Peng,Chouldechova Alexandra,Roeder Kathryn,G’Sell Max,Devlin Bernie
Abstract
AbstractPolygenic scores (PGS) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWAS, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum. In the simulation settings we explore, Joint-Lassosum provides more accurate PGS compared with other methods, especially when measured in terms of fairness. Like all PGS methods, Joint-Lassosum requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how Joint-Lassosum can help mitigate fairness-related harms that might result from the use of PGS scores in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWAS for different ancestries, Joint-Lassosum is an effective approach for enhancing portability and reducing predictive bias.
Publisher
Cold Spring Harbor Laboratory