A SNP discovery method to assess variant allele probability from next-generation resequencing data

Published:2009-12-17
Issue:2
Volume:20
Page:273-280
ISSN:1088-9051
Container-title:Genome Research
Language:en
Short-container-title:Genome Res.

Author:

Shen Yufeng, Wan Zhengzheng, Coarfa Cristian, Drabek Rafal, Chen Lei, Ostrowski Elizabeth A., Liu Yue, Weinstock George M., Wheeler David A., Gibbs Richard A., Yu Fuli

Abstract

Accurate identification of genetic variants from next-generation sequencing (NGS) data is essential for immediate large-scale genomic endeavors such as the 1000 Genomes Project, and is crucial for further genetic analysis based on the discoveries. The key challenge in single nucleotide polymorphism (SNP) discovery is to distinguish true individual variants (occurring at a low frequency) from sequencing errors (often occurring at frequencies orders of magnitude higher). Therefore, knowledge of the error probabilities of base calls is essential. We have developed Atlas-SNP2, a computational tool that detects and accounts for systematic sequencing errors caused by context-related variables in a logistic regression model learned from training data sets. Subsequently, it estimates the posterior error probability for each substitution through a Bayesian formula that integrates prior knowledge of the overall sequencing error probability and the estimated SNP rate with the results from the logistic regression model for the given substitutions. The estimated posterior SNP probability can be used to distinguish true SNPs from sequencing errors. Validation results show that Atlas-SNP2 achieves a false-positive rate of lower than 10%, with an ∼5% or lower false-negative rate.
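
The Bayesian step described in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so what follows is only a generic two-hypothesis Bayes update and should not be read as Atlas-SNP2's actual implementation. The function name `posterior_snp_probability`, its parameters, and the example numbers are all hypothetical. It assumes that an upstream model (such as the logistic regression mentioned above) supplies a likelihood for the observed substitution under a "true SNP" hypothesis and under a "sequencing error" hypothesis, and that a prior SNP rate is available.

```python
def posterior_snp_probability(lik_snp: float, lik_error: float, prior_snp: float) -> float:
    """Generic Bayes update for P(SNP | observed substitution).

    Illustrative only: the names and inputs are assumptions, not taken
    from the Atlas-SNP2 paper or software.

    lik_snp   -- P(observation | site is a true SNP)
    lik_error -- P(observation | substitution is a sequencing error)
    prior_snp -- prior probability that a site carries a SNP
    """
    for name, value in (("lik_snp", lik_snp), ("lik_error", lik_error), ("prior_snp", prior_snp)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    prior_error = 1.0 - prior_snp
    numerator = lik_snp * prior_snp
    evidence = numerator + lik_error * prior_error
    if evidence == 0.0:
        raise ValueError("observation has zero probability under both hypotheses")
    return numerator / evidence


if __name__ == "__main__":
    # Hypothetical inputs: a rare-SNP prior and an observation that is far
    # more likely under the SNP hypothesis than under the error hypothesis.
    p = posterior_snp_probability(lik_snp=0.9, lik_error=0.001, prior_snp=0.001)
    print(f"posterior SNP probability: {p:.3f}")
```

The sketch shows why both the prior and the error model matter: with a low prior SNP rate, even an observation that strongly favors the SNP hypothesis yields only a moderate posterior, so a candidate would be accepted or rejected by comparing this value against a chosen cutoff.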

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical),Genetics

References: 20 articles.

1. An SNP map of the human genome generated by reduced representation shotgun sequencing

2. Quality scores and SNP detection in sequencing-by-synthesis systems

3. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing

4. The Complete Genome Sequence of Escherichia coli DH10B: Insights into the Biology of a Laboratory Workhorse

5. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment

Cited by 131 articles.

1. Elucidating the process of SNPs identification in non-reference genome crops;Journal of Biomolecular Structure and Dynamics;2023-04-05

2. Methods to improve the accuracy of next-generation sequencing;Frontiers in Bioengineering and Biotechnology;2023-01-20

3. Computational Genomics Approaches for Livestock Improvement and Management;Livestock Diseases and Management;2023

4. Human Retrotransposons and Effective Computational Detection Methods for Next-Generation Sequencing Data;Life;2022-10-12

5. Mutational Analysis of Triple-Negative Breast Cancer Using Targeted Kinome Sequencing;Journal of Breast Cancer;2022