Affiliation:
1. Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
Abstract
Abstract
Summary
The ability to generate high-quality genome sequences is cornerstone to modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies are still not achieving reference-grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of different applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 m, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and using less memory.
Availability and implementation
ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
Genome BC and Genome Canada
National Institutes of Health
National Institutes of Health or other funding organizations
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
25 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献