Affiliation:
1. Institute of Clinical Molecular Biology, Kiel University, 24105 Kiel, Germany
2. Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
Abstract
Motivation
Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used.
Results
We developed EagleImp, a software tool based on the methods used in the existing tools Eagle2 and PBWT, which enables accurate and accelerated phasing and imputation in a single tool through algorithmic and technical improvements and new features. We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with 1 million reference genomes. EagleImp was 2–30 times faster (depending on the single or multiprocessor configuration selected and the size of the reference panel) than Eagle2 combined with PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical genome-wide association studies, EagleImp provided the same or higher imputation accuracy than the Sanger Imputation Service, the Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. Additional features include automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Owing to these technical optimizations, EagleImp can perform fast and accurate reference-based phasing and imputation and is ready for future large reference panels on the order of 1 million genomes.
Availability and implementation
EagleImp is implemented in C++ and freely available for download at https://github.com/ikmb/eagleimp.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
DFG
Deutsche Forschungsgemeinschaft
German Federal Ministry of Education and Research
DFG Cluster of Excellence 2167
Precision Medicine in Chronic Inflammation
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
1 article.