Benchmarking of long-read correction methods

Published:2020-05-25 Issue:2 Volume:2 Page:
ISSN:2631-9268
Container-title:NAR Genomics and Bioinformatics
language:en

Author:

Dohm Juliane C¹, Peters Philipp¹, Stralis-Pavese Nancy¹, Himmelbauer Heinz¹

Affiliation:

1. Institute of Computational Biology, Department of Biotechnology, University of Life Sciences and Natural Resources, Vienna (BOKU), Muthgasse 18, 1190 Vienna, Austria

Abstract

Third-generation sequencing technologies provided by Pacific Biosciences and Oxford Nanopore Technologies generate read lengths in the scale of kilobasepairs. However, these reads display high error rates, and correction steps are necessary to realize their great potential in genomics and transcriptomics. Here, we compare properties of PacBio and Nanopore data and assess correction methods by Canu, MARVEL and proovread in various combinations. We found total error rates of around 13% in the raw datasets. PacBio reads showed a high rate of insertions (around 8%) whereas Nanopore reads showed similar rates for substitutions, insertions and deletions of around 4% each. In data from both technologies the errors were uniformly distributed along reads apart from noisy 5′ ends, and homopolymers appeared among the most over-represented kmers relative to a reference. Consensus correction using read overlaps reduced error rates to about 1% when using Canu or MARVEL after patching. The lowest error rate in Nanopore data (0.45%) was achieved by applying proovread on MARVEL-patched data including Illumina short-reads, and the lowest error rate in PacBio data (0.42%) was the result of Canu correction with minimap2 alignment after patching. Our study provides valuable insights and benchmarks regarding long-read data and correction methods.

Publisher

Oxford University Press (OUP)

Subject

General Medicine

Link

http://academic.oup.com/nargab/article-pdf/2/2/lqaa037/34054389/lqaa037.pdf

Reference 34 articles.

1. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology;English;PLoS One,2012

2. Reconstructing complex regions of genomes using long-read sequencing technology;Huddleston;Genome Res.,2014

3. Single haplotype assembly of the human genome from a hydatidiform mole;Steinberg;Genome Res.,2014

4. A single-molecule long-read survey of the human transcriptome;Sharon;Nat. Biotechnol.,2013

5. Exploiting single-molecule transcript sequencing for eukaryotic gene prediction;Minoche;Genome Biol.,2015

Cited by 85 articles.

1. A survey of k-mer methods and applications in bioinformatics;Computational and Structural Biotechnology Journal;2024-12

2. Diversity of ribosomes at the level of rRNA variation associated with human health and disease;Cell Genomics;2024-09

3. Resolving a neonatal intensive care unit outbreak of methicillin-resistant Staphylococcus aureus to the SNV level using Oxford Nanopore simplex reads and HERRO error correction;2024-07-12

4. Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates;BMC Genomics;2024-07-08

5. SWIGH-SCORE: A translational light-weight approach in computational detection of rearranged immunoglobulin heavy chain to be used in monoclonal lymphoproliferative disorders;MethodsX;2024-06