Linear time complexity de novo long read genome assembly with GoldRush

Published:2023-05-22 Issue:1 Volume:14
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Wong Johnathan, Coombe Lauren, Nikolić Vladimir, Zhang Emily, Nip Ka Ming, Sidhu Puneet, Warren René L., Birol Inanç

Abstract

Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap – its most costly step – was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.
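
The NGA50 figures quoted in the abstract are contiguity statistics. As a rough illustration only, the sketch below computes an NG50-style value from a list of block lengths and an assumed genome size. NGA50 is commonly obtained by applying the same calculation to aligned blocks after scaffolds are broken at misassemblies against a reference; that alignment step is not shown here. The function name `ng50` and the toy numbers are hypothetical and do not come from the paper or from GoldRush.

```python
def ng50(block_lengths, genome_size):
    """Return the largest length L such that blocks of length >= L
    together cover at least half of genome_size (0 if they never do)."""
    covered = 0
    # Walk blocks from longest to shortest, accumulating covered bases.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the blocks cover less than half of the genome


# Toy numbers invented for illustration (not results from the paper).
toy_blocks = [50, 40, 30, 20, 10]
print(ng50(toy_blocks, genome_size=200))
```

Because the threshold is half of the genome size rather than half of the assembly size, a fragmented or incomplete assembly cannot raise its score simply by omitting sequence.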

Publisher

Springer Science and Business Media LLC

Subject

General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary

Link

https://www.nature.com/articles/s41467-023-38716-x.pdf

Cited by 12 articles.

1. Sexual dimorphism in the tardigrade Paramacrobiotus metropolitanus transcriptome;Zoological Letters;2024-06-20

2. A High-quality Oxford Nanopore Assembly of the Hourglass Dolphin (Lagenorhynchus cruciger) Genome;2024-06-02

3. Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline;F1000Research;2024-05-31

4. Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies;Microbial Genomics;2024-05-08

5. Sexual dimorphism in the tardigrade Paramacrobiotus metropolitanus transcriptome;2024-04-23