Genotype imputation using the Positional Burrows Wheeler Transform

Published:2020-11-16 Issue:11 Volume:16 Page:e1009049
ISSN:1553-7404
Container-title:PLOS Genetics
language:en
Short-container-title:PLoS Genet

Author:

Rubinacci Simone, Delaneau Olivier, Marchini Jonathan

Abstract

Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both these methods. Using simulated reference panels, we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size, IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
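
The PBWT-based selection described in the abstract can be illustrated with a minimal Python sketch. This is not the IMPUTE5 implementation: the function name `pbwt_neighbours`, the `n_neighbours` parameter, the toy panel, and the choice to place the target haplotype in the same array as the reference haplotypes are all assumptions made for illustration. The sketch keeps only the standard PBWT idea that, after processing a site, haplotypes are ordered by their reversed prefixes, so the haplotypes adjacent to the target in that ordering share the longest matches ending at that site.

```python
from typing import List, Set


def pbwt_neighbours(haps: List[List[int]], target: int,
                    n_neighbours: int = 1) -> Set[int]:
    """Collect haplotypes adjacent to `target` in the PBWT order at each site.

    haps   : binary haplotypes (rows) over genotyped sites (columns).
    target : row index of the haplotype to find matches for (hypothetical
             convention; the target is stored alongside the references).
    """
    n_haps = len(haps)
    n_sites = len(haps[0])
    order = list(range(n_haps))  # positional prefix array before any site
    selected: Set[int] = set()

    for site in range(n_sites):
        # Stable partition on the allele at this site: carriers of 0 first,
        # then carriers of 1, each group keeping its previous relative order.
        zeros = [h for h in order if haps[h][site] == 0]
        ones = [h for h in order if haps[h][site] == 1]
        order = zeros + ones

        # `order` is now sorted by reversed prefixes ending at `site`, so the
        # target's neighbours are its locally best matching haplotypes.
        pos = order.index(target)
        lo = max(0, pos - n_neighbours)
        hi = min(n_haps, pos + n_neighbours + 1)
        for h in order[lo:hi]:
            # Keep a neighbour only if it agrees with the target at this site.
            if h != target and haps[h][site] == haps[target][site]:
                selected.add(h)

    return selected


if __name__ == "__main__":
    # Toy panel: rows 0-2 are reference haplotypes, row 3 is the target.
    panel = [
        [0, 0, 1, 1],  # identical to the target
        [1, 1, 0, 0],  # complement of the target
        [0, 0, 1, 0],  # differs from the target at the last site only
        [0, 0, 1, 1],  # target
    ]
    print(sorted(pbwt_neighbours(panel, target=3)))
```

On this toy panel the complementary haplotype is never selected, while the two closely matching ones are. A production method would also track match lengths (a divergence array) rather than rebuilding lists at every site, which this sketch omits for brevity.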

Funder

European Research Council

Engineering and Physical Sciences Research Council

Publisher

Public Library of Science (PLoS)

Subject

Cancer Research; Genetics (clinical); Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics

Reference: 25 articles.

1. Genotype imputation for genome-wide association studies;J Marchini;Nature Reviews Genetics,2010

2. The UK Biobank resource with deep phenotyping and genomic data;C Bycroft;Nature,2018

3. Meta-analysis in genome-wide association studies;E Zeggini;Pharmacogenomics,2009

4. Improved whole-chromosome phasing for disease and population genetic studies;O Delaneau;Nature Methods,2013

Cited by 110 articles.

1. Systematic comparison of genotype imputation strategies in aquaculture: A case study in Nile tilapia (Oreochromis niloticus) populations;Aquaculture;2024-11

2. Comprehensive genome-wide analysis of genetic loci and candidate genes associated with litter traits in purebred Berkshire pigs of Korea;Animal Bioscience;2024-10-01

3. A scalable approach for genome-wide inference of ancestral recombination graphs;2024-09-02

4. Genomic analyses identify 15 susceptibility loci and reveal HDAC2, SOX2-OT, and IGF2BP2 in a naturally-occurring canine model of gastric cancer;2024-08-16

5. A Genomics England haplotype reference panel and imputation of UK Biobank;Nature Genetics;2024-08-12