SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS

Published: 2013-01
Volume: 14
Issue: S1
ISSN: 1471-2105
Journal: BMC Bioinformatics
Language: en

Authors

Merelli Ivan, Calabria Andrea, Cozzi Paolo, Viti Federica, Mosca Ettore, Milanesi Luciano

Abstract

Background

The ability to correlate specific genotypes with human diseases remains a complex problem, despite the advantages offered by high-throughput technologies such as Genome-Wide Association Studies (GWAS). New tools for interpreting genetic variants and for prioritizing Single Nucleotide Polymorphisms (SNPs) are currently needed. Given a list of the most relevant SNPs statistically associated with a specific pathology as the result of a genotyping study, a critical issue is to identify the genes that are actually related to the disease by re-scoring the importance of the identified genetic variants. Conversely, given a list of genes, it can be very useful to predict which SNPs may be involved in the onset of a particular disease, in order to focus research on their effects.

Results

We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated with pathologies. The system can be used to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The method relies on (1) the integration of public data resources through a gene-centric database design, (2) the evaluation of a set of static biomolecular annotations, defined as features, and (3) a SNP scoring function that computes SNP scores using parameters and weights set by the user. We used a machine learning classifier to set the default feature weights, and an ontological annotation layer to enable enrichment of the input gene set. We implemented the method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), which improves on the first published release of this system. A user-friendly interface lets users enter a list of genes or SNPs, or a biological process, and customize the feature set and the relative weights. As a result, SNPranker 2.0 returns a list of SNPs located within the input genes and the ontologically enriched genes, together with their prioritization scores.

Conclusions

Several databases and resources are already available for SNP annotation, but they do not prioritize or re-score SNPs on the basis of a priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly, integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new data mining and interpretation tool that supports SNP analysis. Possible scenarios include re-scoring GWAS data, selecting SNPs for custom genotyping arrays, and SNP/disease association studies.
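The Results section describes a SNP scoring function that combines biomolecular features with user-defined weights. The abstract does not give the exact formula, so the following Python (3.9+) sketch assumes one plausible form. Each feature value lies in [0, 1], and a SNP's score is the weighted average of its feature values. SNPs are kept only if they fall within the input genes. The `Snp` class, the `score_snp` and `rank_snps` functions, and the feature names (`missense`, `conserved_site`, `promoter`) are hypothetical. They are not SNPranker 2.0's actual code or feature set.

```python
from dataclasses import dataclass, field


@dataclass
class Snp:
    """A SNP, the gene it lies in, and its feature values (each in [0, 1])."""
    snp_id: str
    gene: str
    features: dict[str, float] = field(default_factory=dict)


def score_snp(snp: Snp, weights: dict[str, float]) -> float:
    """Weighted average of the SNP's feature values.

    A feature missing from the SNP counts as 0. Returns 0.0 when the
    weights sum to zero, so an empty weight set does not divide by zero.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    raw = sum(w * snp.features.get(name, 0.0) for name, w in weights.items())
    return raw / total_weight


def rank_snps(snps: list[Snp], genes: set[str],
              weights: dict[str, float]) -> list[tuple[str, float]]:
    """Keep SNPs located in the given genes, sorted by descending score."""
    scored = [(s.snp_id, score_snp(s, weights)) for s in snps if s.gene in genes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical features, weights and SNPs, for illustration only.
weights = {"missense": 3.0, "conserved_site": 2.0, "promoter": 1.0}
snps = [
    Snp("snp1", "GENE_A", {"missense": 1.0, "conserved_site": 1.0}),
    Snp("snp2", "GENE_A", {"promoter": 1.0}),
    Snp("snp3", "GENE_B", {"missense": 1.0}),
]
ranked = rank_snps(snps, {"GENE_A"}, weights)
print(ranked)  # snp1 scores 5/6, snp2 scores 1/6; snp3 is outside GENE_A
```

With non-negative weights, dividing by the total weight keeps every score in [0, 1] whatever scale the user picks for the weights. This makes rankings easier to compare when the weights are changed.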
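The abstract also mentions an ontological annotation layer that enriches the input gene set, for example when the user supplies a biological process instead of a gene list. The sketch below assumes a simple mechanism, which is not necessarily the one SNPranker 2.0 uses. The ontology is a DAG stored as a parent-to-children map. A gene is added when it is annotated to the chosen process term or to any term below it. All term and gene names, and the functions `descendants` and `enrich_gene_set`, are hypothetical.

```python
from collections import deque


def descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """Return `term` and every term below it in the ontology DAG (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def enrich_gene_set(input_genes: set[str], process_term: str,
                    children: dict[str, list[str]],
                    annotations: dict[str, set[str]]) -> set[str]:
    """Add every gene annotated to `process_term` or to any of its descendants.

    Annotation to a more specific term is treated as annotation to its
    ancestors, so genes under child terms are included too.
    """
    terms = descendants(process_term, children)
    extra = {gene for gene, gene_terms in annotations.items() if gene_terms & terms}
    return set(input_genes) | extra


# Toy ontology and annotations with hypothetical term and gene names.
children = {
    "immune_response": ["inflammatory_response", "antigen_processing"],
    "inflammatory_response": ["acute_inflammation"],
}
annotations = {
    "GENE_A": {"acute_inflammation"},
    "GENE_B": {"lipid_metabolism"},
    "GENE_C": {"antigen_processing"},
}
enriched = enrich_gene_set({"GENE_B"}, "immune_response", children, annotations)
print(sorted(enriched))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

The breadth-first traversal uses a `seen` set, so a term reachable through several parents in the DAG is visited only once.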

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

http://link.springer.com/content/pdf/10.1186/1471-2105-14-S1-S9.pdf

References: 53 articles

1. de Bakker PIW, Yelensky R, Peter I, Gabriel SB, Daly MJ, Altshuler D: Efficiency and power in genetic association studies. Nature Genet. 2005, 37 (11): 1217-1223. 10.1038/ng1669.

2. Goldstein DB, Cavalleri GL: Genomics: understanding human diversity. Nature. 2005, 437 (7063): 1241-1242. 10.1038/4371241a.

3. Botstein D, Risch N: Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genet. 2003, 33 (Suppl): 228-237.

4. Kruglyak L, Nickerson DA: Variation is the spice of life. Nature Genet. 2001, 27: 234-236. 10.1038/85776.

5. Zhang H, Liu L, Wang X, Gruen JR: Guideline for data analysis of genome-wide association studies. Cancer Genomics Proteomics. 2007, 4 (1): 27-34.

Cited by: 19 articles

1. Semantic and Population Analysis of the Genetic Targets Related to COVID-19 and Its Association with Genes and Diseases. Advances in Experimental Medicine and Biology. 2023.

2. Exploring Machine Learning Algorithms to Unveil Genomic Regions Associated With Resistance to Southern Root-Knot Nematode in Soybeans. Frontiers in Plant Science. 2022-05-03.

3. SNP characteristics and validation success in genome wide association studies. Human Genetics. 2022-01-04.

4. Revisiting genome-wide association studies from statistical modelling to machine learning. Briefings in Bioinformatics. 2020-10-30.

5. Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci. Frontiers in Genetics. 2020-04-15.