QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species

Published:2006-10-09 Issue:1 Volume:7 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Tang Jifeng, Vosman Ben, Voorrips Roeland E, van der Linden C Gerard, Leunissen Jack AM

Abstract

Background: Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only.

Results: We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters for the identification of reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter that uses a haplotype-based strategy to detect reliable SNPs. Clusters with potential paralogs as well as false SNPs caused by sequencing errors are identified. Filter 3 screens SNPs by calculating a confidence score, based upon sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames of consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans.

Conclusion: QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It is available for running on Linux or UNIX systems. The program, test data, and user manual are available at http://www.bioinformatics.nl/tools/snpweb/ and as Additional files.
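
The abstract does not spell out how Filter 1 is implemented, so the following is only a minimal sketch of the general idea of screening aligned EST reads for potential SNPs, not QualitySNP's actual code. It assumes the reads are already aligned to equal length, ignores gaps and ambiguous bases (so indels are not handled), and uses a hypothetical `min_minor_count` threshold to discard alleles seen only once, which are more likely to be sequencing errors. The function name and the example reads are illustrative assumptions.

```python
from collections import Counter


def potential_snps(aligned_reads, min_minor_count=2):
    """Return (column, allele_counts) for alignment columns that show
    at least two different bases, each seen min_minor_count times or more.

    Assumption: this is a generic column-wise screen, not the published
    QualitySNP Filter 1; the threshold is hypothetical.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")

    hits = []
    for col in range(length):
        counts = Counter(read[col].upper() for read in aligned_reads)
        # Keep only unambiguous bases that are supported by enough reads.
        alleles = {
            base: n
            for base, n in counts.items()
            if base in "ACGT" and n >= min_minor_count
        }
        if len(alleles) >= 2:
            hits.append((col, alleles))
    return hits


# Made-up toy alignment: column 2 has G in three reads and A in two reads;
# column 7 has a single A, which is treated as a likely sequencing error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACGT",
    "ACGTACGA",
]
candidates = potential_snps(reads)
print(candidates)
```

A screen like this only nominates candidate positions. Separating true allelic variation from paralogs and residual sequencing errors is the job the abstract assigns to the haplotype-based Filter 2 and the confidence score of Filter 3, neither of which is modeled here.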

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-7-438.pdf

Reference: 39 articles.

1. Brookes AJ: The essence of SNPs. Gene 1999, 234: 177–186. 10.1016/S0378-1119(99)00219-X

2. Useche FJ, Gao G, Hanafey M, Rafalski A: High-throughput identification, database storage and analysis of SNPs in EST sequences. Genome Inform Ser Workshop Genome Inform 2001, 12: 194–203.

3. Picoult-Newberg L, Ideker TE, Pohl MG, Taylor SL, Donaldson MA, Nickerson DA, Boyce-Jacino M: Mining SNPs from EST databases. Genome Res 1999, 9: 167–174.

4. Syvanen AC: Accessing genetic variation: genotyping single nucleotide polymorphisms. Nat Rev Genet 2001, 2: 930–942. 10.1038/35103535

5. Rickert AM, Kim JH, Meyer S, Nagel A, Ballvora A, Oefner P, Gebhardt C: First-generation SNP/InDel markers tagging loci for pathogen resistance in the potato genome. Plant Biotech J 2003, 1: 399–410. 10.1046/j.1467-7652.2003.00036.x

Cited by 111 articles.

1. Elucidating the process of SNPs identification in non-reference genome crops;Journal of Biomolecular Structure and Dynamics;2023-04-05

2. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture;International Journal of Biological Macromolecules;2023-04

3. In Silico Mining and Characterization of High-Quality SNP/Indels in Some Agro-Economically Important Species Belonging to the Family Euphorbiaceae;Genes;2023-01-27

4. Identification and validation of quantitative trait loci for chlorophyll content of flag leaf in wheat under different phosphorus treatments;Frontiers in Plant Science;2022-11-17

5. Predicting the predisposition to colorectal cancer based on SNP profiles of immune phenotypes using supervised learning models;Medical & Biological Engineering & Computing;2022-11-11