The effects of reference panel perturbations on the accuracy of genotype imputation

Author:

Li Jeremiah H.ORCID,Liu Andrew,Buerkle C. Alex,Palmer William,Belbin Gillian M.,Ahangari Mohammad,Gibson Matthew J.S.,Flagel Lex

Abstract

AbstractReference-based genotype imputation is a standard technique that has become increasingly popular in large-scale studies involving genomic data. The two key elements involved in the process of genotype imputation are (1) the haplotype reference panel to which a target individual is being imputed, and (2) the imputation algorithm used to infer missing genotypes in the target individual. The imputation literature has historically focused mainly on (2), with a typical comparative study investigating the relative performance of various imputation algorithms while holding the reference panel constant. However, the role of the reference panel itself (1) on overall imputation performance is equally, if not more, important than the choice among many high-performing algorithms. Even though it is intuitive that the quality of a reference panel should play a role in the accuracy of imputation, it is nonetheless unclear to what extent common errors during panel creation (e.g., genotyping and phase error) lead to suboptimal imputation performance. In this study, we investigate the effects of applying three distinct modes of perturbations to a widely used haplotype reference panel in human genetics on the resulting imputation accuracy. Specifically, we perturb the reference panel by (1) randomly introducing phase errors, (2) randomly introducing genotype errors, and (3) randomly pruning variants from the panel (all at varying magnitudes). We then impute a set of diverse individuals at various sequencing coverages (0.5x, 1.0x, and 2.0x) to these various perturbed panels and evaluate imputation accuracy using ther2metric for the entire cohort as well as ancestry-stratified subsets. We observe that both phase- and genotype-perturbations can dramatically affect imputation accuracy, particularly at very low allele frequencies, while pruning variants has a far smaller effect. We then empirically verified that our simulations reliably predict the impact of potential filtering techniques in a real-world dataset. In the context of haplotype reference panels, these results suggest that phasing and genotyping accuracy are far more important than the density of a reference panel used for imputation.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3