CONSENT: Scalable long read self-correction and assembly polishing with multiple sequence alignment

Published: 2019-02-11

Author:

Morisse Pierre, Marchet Camille, Limasset Antoine, Lecroq Thierry, Lefebvre Arnaud

Abstract

Motivation: Third-generation sequencing technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) produce long reads of tens of kbp, which are expected to help solve various problems, such as contig and haplotype assembly, scaffolding, and structural variant calling. However, these reads also display high error rates, which can reach 10 to 30% for basic ONT and non-CCS PacBio reads. As a result, error correction is often the first step of projects dealing with long reads. Because the first long-read sequencing experiments produced reads with average error rates above 15%, most methods relied on complementary short-read data to perform correction, in a hybrid approach. However, these sequencing technologies evolve fast, and the error rate of long reads now reaches 10 to 12%. Self-correction is therefore now frequently used as the first step of third-generation sequencing data analysis projects. Efficient tools for the self-correction of long reads are now available, and recent observations suggest that avoiding second-generation sequencing reads could bypass their inherent bias.

Results: We introduce CONSENT, a new method for the self-correction of long reads that combines different strategies from the state of the art. More precisely, we combine a multiple sequence alignment strategy with the use of local de Bruijn graphs. Moreover, the multiple sequence alignment benefits from an efficient segmentation strategy based on k-mer chaining, which allows a considerable speed improvement. Our experiments show that CONSENT compares well to the latest state-of-the-art self-correction methods, and even outperforms them on real Oxford Nanopore datasets. In particular, they show that CONSENT is the only method able to efficiently scale to the correction of Oxford Nanopore ultra-long reads, and that it can process a full human dataset, containing reads reaching lengths of up to 1.5 Mbp, in 15 days.

Additionally, CONSENT implements an assembly polishing feature, and is thus able to correct errors directly from raw long-read assemblies. Our experiments show that CONSENT outperforms state-of-the-art polishing tools in terms of resource consumption, and provides comparable results. Moreover, we show that, for a full human dataset, assembling the raw data and polishing the assembly afterwards is less time-consuming than assembling the corrected reads, while providing better quality results.

Availability and implementation: CONSENT is implemented in C++, supported on Linux platforms, and freely available at https://github.com/morispi/CONSENT.

Contact: pierre.morisse2@univ-rouen.fr
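
Illustrative sketch (not part of the original abstract): the Results mention that the multiple sequence alignment is segmented using k-mer chaining. The Python below is a minimal sketch of that general idea, not CONSENT's actual implementation. It assumes that anchors are k-mers occurring exactly once in each of two sequences and that the chain is the longest set of anchors increasing in both coordinates; the function names `shared_kmer_anchors`, `longest_chain` and `windows_between` are hypothetical.

```python
from bisect import bisect_left


def shared_kmer_anchors(a, b, k):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k] and that
    k-mer occurs exactly once in a and exactly once in b (an assumption)."""
    def unique_positions(s):
        positions = {}
        for i in range(len(s) - k + 1):
            positions.setdefault(s[i:i + k], []).append(i)
        return {kmer: p[0] for kmer, p in positions.items() if len(p) == 1}

    pos_a, pos_b = unique_positions(a), unique_positions(b)
    return sorted((i, pos_b[kmer]) for kmer, i in pos_a.items() if kmer in pos_b)


def longest_chain(anchors):
    """Longest subset of anchors increasing in both coordinates.
    Anchors must be sorted by i; this is a longest increasing subsequence on j."""
    tails = []       # tails[n]: smallest last j over chains of length n + 1
    tails_idx = []   # index in `anchors` of the anchor ending that chain
    prev = [-1] * len(anchors)
    for idx, (_, j) in enumerate(anchors):
        n = bisect_left(tails, j)
        if n == len(tails):
            tails.append(j)
            tails_idx.append(idx)
        else:
            tails[n] = j
            tails_idx[n] = idx
        prev[idx] = tails_idx[n - 1] if n > 0 else -1
    chain = []
    idx = tails_idx[-1] if tails_idx else -1
    while idx != -1:          # walk back through predecessors
        chain.append(anchors[idx])
        idx = prev[idx]
    return chain[::-1]


def windows_between(a, b, chain, k):
    """Return the non-empty pairs of substrings lying between consecutive,
    non-overlapping anchors: the short windows left to align."""
    windows = []
    for (i0, j0), (i1, j1) in zip(chain, chain[1:]):
        if i1 >= i0 + k and j1 >= j0 + k:
            wa, wb = a[i0 + k:i1], b[j0 + k:j1]
            if wa or wb:
                windows.append((wa, wb))
    return windows


# Two toy sequences differing by one substitution at position 8.
read_a = "ACGTTGCAAGGCTTAC"
read_b = "ACGTTGCATGGCTTAC"
chain = longest_chain(shared_kmer_anchors(read_a, read_b, 4))
print(windows_between(read_a, read_b, chain, 4))  # [('A', 'T')]
```

In this toy case the chained anchors cover everything except the single mismatching base, so only a one-character window would remain to be aligned. Restricting alignment to such short windows, rather than whole reads, is the kind of saving a segmentation strategy aims for.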

Publisher

Cold Spring Harbor Laboratory

Cited by 9 articles.

1. Long-read PacBio genome sequencing of four environmental saprophytic Sporothrix species spanning the pathogenic clade;BMC Genomics;2022-07-12

2. Prospects for multi-omics in the microbial ecology of water engineering;Water Research;2021-10

3. phasebook: haplotype-aware de novo assembly of diploid genomes from long reads;2021-07-04

4. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis;Nature Communications;2021-01-04

5. Optical map guided genome assembly;BMC Bioinformatics;2020-07-06