Predicting deleterious nsSNPs: an analysis of sequence and structural attributes

Published:2006-04-21 Issue:1 Volume:7 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Dobson Richard J,Munroe Patricia B,Caulfield Mark J,Saqi Mansoor AS

Abstract

Background: There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl.

Results: The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value.

Conclusion: The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at http://www.brightstudy.ac.uk/das_help.html
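The two ideas the abstract relies on, undersampling the majority class and scoring with the Matthews correlation coefficient, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `undersample` and `mcc` are hypothetical, and the sketch assumes that "100% undersampling" means randomly discarding majority-class examples until both classes are the same size.

```python
import math
import random


def undersample(records, label_of, seed=0):
    """Randomly drop majority-class records until both classes are equal in size.

    Assumption: labels are binary, 1 (e.g. disease-associated) or 0 (e.g. neutral).
    """
    rng = random.Random(seed)
    positives = [r for r in records if label_of(r) == 1]
    negatives = [r for r in records if label_of(r) == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    # Keep every minority record and an equally sized random subset of the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

A classifier that always predicts the majority class on a 90/10 dataset reaches 90% accuracy, yet `mcc(90, 0, 10, 0)` evaluates to 0. That gap is one reason MCC is a more informative measure than accuracy when the classes are imbalanced.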

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-7-217.pdf

References: 33 articles.

1. Sherry S, Ward M, Kholodov M, Baker J, Phan L, Smigielski E, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001, 29: 308–11. 10.1093/nar/29.1.308

2. Fredman D, Munns G, Rios D, Sjoholm F, Siegfried M, Lenhard B, Lehvaslaiho H, Brookes A: HGVbase: a curated resource describing human DNA variation and phenotype relationships. Nucleic Acids Res 2004, (32 Database):D516–9. 10.1093/nar/gkh111

3. Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A, Gasteiger E, Martin M, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365–70. 10.1093/nar/gkg095

4. Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A: The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat 2004, 23(5):464–470. 10.1002/humu.20021

5. Wang Z, Moult J: SNPs, protein structure, and disease. Hum Mutat 2001, 17(4):263–270. 10.1002/humu.22

Cited by 77 articles.

1. Unraveling the potential effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on the Protein structure and function of the human SLC30A8 gene on type 2 diabetes and colorectal cancer: An In silico approach;Heliyon;2024-09

2. Rapid discrimination between deleterious and benign missense mutations in the CAGI 6 experiment;Human Genomics;2024-08-27

3. Prediction of the most deleterious non-synonymous SNPs in the human IL1B gene: evidence from bioinformatics analyses;BMC Genomic Data;2024-06-10

4. DLm6Am: A Deep-Learning-Based Tool for Identifying N6,2′-O-Dimethyladenosine Sites in RNA Sequences;International Journal of Molecular Sciences;2022-09-20

5. Structural Consequences of IRS-2 nsSNPs and Implication for Insulin Receptor Substrate-2 Protein Stability;Biochemical Genetics;2022-06-21