Abstract
Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (that is, to correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
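As an illustration of what "valid" means for per-base error probabilities, the following is a minimal Python sketch, not phred's own code. It assumes the standard phred quality scale, Q = -10 log10(p), and a hypothetical input of (quality, is_correct) pairs obtained by comparing base-calls against a trusted reference sequence. The function names are illustrative only.

```python
import math


def error_prob_to_quality(p):
    """Convert an error probability p to a phred-scaled quality, Q = -10 * log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def quality_to_error_prob(q):
    """Convert a phred-scaled quality Q back to an error probability, p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def observed_error_rates(calls):
    """Group base-calls by quality and compare observed to predicted error rates.

    calls: iterable of (quality, is_correct) pairs (hypothetical input format).
    Returns {quality: (n_calls, observed_rate, predicted_rate)}.
    """
    counts = {}
    for q, is_correct in calls:
        n, errors = counts.get(q, (0, 0))
        counts[q] = (n + 1, errors + (0 if is_correct else 1))
    return {
        q: (n, errors / n, quality_to_error_prob(q))
        for q, (n, errors) in sorted(counts.items())
    }


if __name__ == "__main__":
    # Toy data (made up for illustration): 100 calls at Q20, one of them wrong.
    toy_calls = [(20, True)] * 99 + [(20, False)]
    for q, (n, obs, pred) in observed_error_rates(toy_calls).items():
        print(f"Q{q}: n={n} observed={obs:.4f} predicted={pred:.4f}")
```

If the predicted probabilities are valid, the observed and predicted rates in each quality bin should agree closely once the bin holds enough base-calls; large gaps would indicate miscalibration.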
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical),Genetics
Cited by
5090 articles.