CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data

Published:2009-01 Issue:S1 Volume:10
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Wang Jun, Guo Mao-zu, Wang Chun-yu

Abstract

Background: Recent studies have shown that genetic variation is the basis of genome-wide disease association research. However, because genotyping a large number of single nucleotide polymorphisms (SNPs) is costly, it is essential to choose a small subset of informative SNPs (tagSNPs) that capture most of the variation in a population and can represent the remaining SNPs. Several methods have been proposed to find the minimum set of tagSNPs, but most of them still have drawbacks such as information loss and the limitations imposed by block partitioning.

Results: This paper proposes a new hybrid method, CGTS, which combines ideas from clustering and graph algorithms to select tagSNPs from genotype data. The method aims to maximize the number of non-tagSNPs that can be discarded from the given set. CGTS integrates LD association and genotype diversity information in site graphs and discards redundant SNPs using an algorithm based on these graph structures. The clustering algorithm is used to reduce the running time of CGTS. The efficiency of the algorithm and the quality of its solutions are evaluated on biological data, and the results are compared with those of three popular tagSNP selection methods.

Conclusion: Our theoretical analysis and experimental results show that CGTS is not only more efficient than the other methods but also achieves higher accuracy in tagSNP selection.
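The abstract does not give enough detail to reproduce CGTS itself, so the block below is not the authors' method. As a rough point of reference, it sketches a generic LD-graph approach to tagSNP selection on genotype data: sites are connected when their squared genotype correlation (r²) reaches a threshold, and tags are picked greedily so that every site is either a tag or linked to one. The helper names (`build_site_graph`, `greedy_tag_snps`), the 0/1/2 genotype coding, and the 0.8 threshold are illustrative assumptions; the sketch ignores the genotype-diversity and clustering components the abstract attributes to CGTS.

```python
from itertools import combinations
from typing import Dict, List, Set

Genotypes = List[List[int]]  # rows = individuals, columns = SNP sites, coded 0/1/2 (assumed)


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two genotype columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no LD information here
    return sxy * sxy / (sxx * syy)


def build_site_graph(genotypes: Genotypes, threshold: float) -> Dict[int, Set[int]]:
    """Hypothetical helper: connect two sites when their r^2 reaches the threshold."""
    n_sites = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_sites)]
    graph: Dict[int, Set[int]] = {j: set() for j in range(n_sites)}
    for i, j in combinations(range(n_sites), 2):
        if r_squared(columns[i], columns[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_tag_snps(graph: Dict[int, Set[int]]) -> List[int]:
    """Hypothetical helper: greedy dominating set; every site ends up a tag or adjacent to one."""
    untagged = set(graph)
    tags: List[int] = []
    while untagged:
        # Pick the site covering the most still-untagged sites (itself plus neighbours);
        # ties go to the lowest index because max() keeps the first maximum it sees.
        best = max(sorted(graph), key=lambda s: len(({s} | graph[s]) & untagged))
        tags.append(best)
        untagged -= {best} | graph[best]
    return tags


if __name__ == "__main__":
    genotypes = [
        [0, 0, 2, 1],
        [1, 1, 0, 1],
        [2, 2, 1, 0],
        [1, 1, 2, 2],
    ]
    graph = build_site_graph(genotypes, threshold=0.8)
    print(greedy_tag_snps(graph))  # [0, 2, 3]: site 1 duplicates site 0 and is dropped
```

In this toy input, sites 0 and 1 have identical genotype columns (r² = 1), so only one of them is kept as a tag; the remaining sites are weakly correlated and each becomes its own tag. Greedy covering is a common heuristic here because finding a minimum dominating set exactly is NP-hard.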

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-10-S1-S71.pdf

Cited by 5 articles.

1. Sequence Analysis Based Adaptive Hierarchical Clustering Approach for Admixture Population Structure Inference. Lecture Notes in Electrical Engineering, 2012.

2. A Refined and Heuristic Algorithm for LD tagSNPs Selection. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 2011-11.

3. Highly Sensitive and Selective Bifunctional Oligonucleotide Probe for Homogeneous Parallel Fluorescence Detection of Protein and Nucleotide Sequence. Analytical Chemistry, 2011-03-29.

4. Vertical Arrays of Anisotropic Particles by Gravity-Driven Self-Assembly. Small, 2011-02-18.

5. Virus-PEDOT Nanowires for Biosensing. Nano Letters, 2010-11-01.