CARE: context-aware sequencing read error correction

Published:2020-08-20 Issue:7 Volume:37 Page:889-895
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Kallenborn Felix¹, Hildebrandt Andreas¹, Schmidt Bertil¹

Affiliation:

1. Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany

Abstract

Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates, since they break reads into independent k-mers, or do not scale efficiently to large amounts of sequencing reads and complex genomes.

Results: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors are corrected by detailed inspection of the corresponding alignments. Our performance evaluation shows that CARE generates significantly fewer false-positive corrections than state-of-the-art tools (Musket, SGA, BFC, Lighter, Bcool, Karect) while maintaining a competitive number of true positives. When used prior to assembly, it can achieve superior de novo assembly results for a number of real datasets. CARE is also the first multiple sequence alignment-based error corrector that is able to process a human genome Illumina NGS dataset in only 4 h on a single workstation using GPU acceleration.

Availability and implementation: CARE is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CARE.

Supplementary information: Supplementary data are available at Bioinformatics online.
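The abstract names minhashing as the mechanism for finding similar reads before alignment. The sketch below illustrates the general idea only and is not taken from the CARE source code: each read is reduced to a short signature made of the minimum hash value of its k-mers under several hash functions, and reads that share a signature entry become alignment candidates. All function names, the choice of BLAKE2b as the hash, and the parameters `k=8` and `num_hashes=16` are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(read, k=8, num_hashes=16):
    """Signature = minimum k-mer hash under each of num_hashes hash functions."""
    ks = kmers(read, k)
    if not ks:
        raise ValueError("read is shorter than k")
    return tuple(min(seeded_hash(km, seed) for km in ks)
                 for seed in range(num_hashes))


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching entries; estimates Jaccard similarity of the k-mer sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def build_index(reads, k=8, num_hashes=16):
    """Map (hash function index, min-hash value) to the ids of reads having it."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for table, value in enumerate(minhash_signature(read, k, num_hashes)):
            index[(table, value)].add(read_id)
    return index


def candidates(index, read, k=8, num_hashes=16):
    """Ids of indexed reads sharing at least one signature entry with the query."""
    found = set()
    for table, value in enumerate(minhash_signature(read, k, num_hashes)):
        found |= index.get((table, value), set())
    return found
```

In a pipeline of the kind the abstract describes, the candidate set returned for a read would then be aligned against that read, and only the resulting alignments would be inspected for errors. The alignment and correction steps are outside the scope of this sketch.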

Funder

Deutsche Forschungsgemeinschaft

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa738/34388391/btaa738.pdf

References: 31 articles.

1. Athena: automated tuning of k-mer based genomic error correction algorithms using language models;Abdallah;Sci. Rep.,2019

2. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data;Allam;Bioinformatics,2015

3. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing;Bankevich;J. Comput. Biol.,2012

4. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing;Berlin;Nat. Biotechnol.,2015

Cited by 13 articles.

1. How Error Correction Affects PCR Deduplication: A Survey Based on UMI Datasets of Short Reads;2024-06-03

2. From GPUs to AI and quantum: three waves of acceleration in bioinformatics;Drug Discovery Today;2024-06

3. CAREx: context-aware read extension of paired-end sequencing data;BMC Bioinformatics;2024-05-10

4. BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction;Future Generation Computer Systems;2024-05

5. Turn ‘noise’ to signal: accurately rectify millions of erroneous short reads through graph learning on edit distances;2024-04-09