ECHO: A reference-free short-read error correction algorithm

Published: 2011-04-11 Issue: 7 Volume: 21 Page: 1181-1192
ISSN: 1088-9051
Container-title: Genome Research
Language: en
Short-container-title: Genome Res.

Author:

Kao Wei-Chun, Chan Andrew H., Song Yun S.

Abstract

Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short reads without the need for a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters whose optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO improves on the accuracy of previous error-correction methods by severalfold to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. It is also shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth.

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 38 articles.

1. High-Throughput Gene Mapping in Caenorhabditis elegans

2. De novo transcriptome assembly with ABySS

3. Quality scores and SNP detection in sequencing-by-synthesis systems

4. ALLPATHS: De novo assembly of whole-genome shotgun microreads

5. Fragment assembly with short reads

Cited by 80 articles.

1. How Error Correction Affects PCR Deduplication: A Survey Based on UMI Datasets of Short Reads; 2024-06-03

2. MAC-ErrorReads: machine learning-assisted classifier for filtering erroneous NGS reads; BMC Bioinformatics; 2024-02-07

3. Methods to improve the accuracy of next-generation sequencing; Frontiers in Bioengineering and Biotechnology; 2023-01-20

4. Genome sequence assembly algorithms and misassembly identification methods; Molecular Biology Reports; 2022-09-23

5. Lightweight Pattern Matching Method for DNA Sequencing in Internet of Medical Things; Computational Intelligence and Neuroscience; 2022-09-08