Summix: A method for detecting and adjusting for population structure in genetic summary data

Authors:

Arriaga-MacKenzie IS, Matesi G, Chen S, Ronco A, Marker KM, Hall JR, Scherenberg R, Khajeh-Sharafabadi M, Wu Y, Gignoux CR, Null M, Hendricks AE

Abstract

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations, where additional research and resources are most needed. While several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry groups, African (AFR), Non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), and South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. With an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
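
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea. It assumes that the observed allele frequencies of a summary-level group can be modeled as a weighted mixture of reference-ancestry allele frequencies, and that the weights (ancestry proportions) are fit by least squares subject to being non-negative and summing to one. The function names `project_to_simplex`, `estimate_proportions`, and `adjust_af` are hypothetical and are not the API of the Summix R package. The data are simulated, and the mixture weights are arbitrary illustrative values rather than results from the paper.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum(p) = 1}."""
    theta, running = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        running += u
        t = (running - 1.0) / i
        if u - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, references, iters=2000):
    """Fit mixture weights by projected gradient descent (assumed model).

    observed:   allele frequencies of the summary-level group, one per SNP
    references: one list of allele frequencies per reference ancestry
    """
    k, n = len(references), len(observed)
    # K x K Gram matrix and K-vector of cross products, computed once.
    gram = [[sum(a * b for a, b in zip(ri, rj)) / n for rj in references]
            for ri in references]
    cross = [sum(a * o for a, o in zip(ri, observed)) / n for ri in references]
    # Safe step size: the trace bounds the largest eigenvalue of the Gram matrix.
    step = 1.0 / (2.0 * sum(gram[i][i] for i in range(k)))
    pi = [1.0 / k] * k  # start from equal proportions
    for _ in range(iters):
        grad = [2.0 * (sum(gram[i][j] * pi[j] for j in range(k)) - cross[i])
                for i in range(k)]
        pi = project_to_simplex([p - step * g for p, g in zip(pi, grad)])
    return pi


def adjust_af(observed_af, reference_afs, pi, target):
    """Subtract non-target ancestry contributions from one SNP's AF and
    rescale by the target proportion (an assumed form of the adjustment)."""
    other = sum(p * r for i, (p, r) in enumerate(zip(pi, reference_afs))
                if i != target)
    return min(1.0, max(0.0, (observed_af - other) / pi[target]))


# Simulated, noise-free example with three hypothetical reference groups.
rng = random.Random(0)
n_snps = 300
labels = ["AFR", "EUR", "IAM"]
references = [[rng.random() for _ in range(n_snps)] for _ in labels]
true_pi = [0.6, 0.3, 0.1]  # arbitrary illustrative proportions
observed = [sum(p * ref[j] for p, ref in zip(true_pi, references))
            for j in range(n_snps)]

pi_hat = estimate_proportions(observed, references)
first_snp_refs = [ref[0] for ref in references]
adjusted = adjust_af(observed[0], first_snp_refs, pi_hat, target=0)
print(dict(zip(labels, (round(p, 3) for p in pi_hat))))
```

Because the simulated data contain no noise, the fitted proportions should match the simulated ones closely. With real summary data, sampling error and imperfect reference panels would make the fit approximate, which is where resampling contiguous blocks of SNPs (the block bootstrap mentioned above) would give confidence intervals.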

Publisher

Cold Spring Harbor Laboratory

