Predicting DNA-binding sites of proteins from amino acid sequence

Published: 2006-05-19 Issue: 1 Volume: 7 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

Yan Changhui, Terribilini Michael, Wu Feihong, Jernigan Robert L, Dobbs Drena, Honavar Vasant

Abstract

Background: Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.

Results: We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues.

Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions.

Conclusion: Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.
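The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the padding symbol, the add-one smoothing, the label encoding ('1' for binding, '0' for non-binding), and all function and class names are assumptions made for illustration. The sketch builds 9-residue windows (the target plus 4 neighbors on each side), fits a Naive Bayes model over window positions, computes the Shannon entropy of an alignment column, and evaluates predictions. The abstract does not say how the entropy value is fed to the classifier, so it is shown only as a standalone function. The "specificity" entry is computed as TP/(TP+FP): this is an inference, since a 78% overall accuracy could not coexist with 41% sensitivity and 44% specificity if specificity meant TN/(TN+FP).

```python
import math
from collections import Counter, defaultdict

PAD = "-"  # assumed padding symbol for window positions beyond the sequence ends


def windows(seq, k=4):
    """Yield one (2k+1)-residue window per position: target plus k neighbors per side."""
    padded = PAD * k + seq + PAD * k
    for i in range(len(seq)):
        yield padded[i:i + 2 * k + 1]


class WindowNaiveBayes:
    """Naive Bayes over window positions; add-one smoothing is an assumption."""

    def __init__(self, k=4, alphabet_size=21):
        self.k = k
        self.alphabet_size = alphabet_size  # 20 amino acids + padding symbol
        self.class_counts = Counter()
        self.residue_counts = defaultdict(Counter)  # (label, position) -> residue counts

    def fit(self, sequences, labels):
        """labels[i] is a string of '1' (binding) / '0' (non-binding), one per residue."""
        for seq, lab in zip(sequences, labels):
            for win, y in zip(windows(seq, self.k), lab):
                self.class_counts[y] += 1
                for pos, aa in enumerate(win):
                    self.residue_counts[(y, pos)][aa] += 1
        return self

    def log_score(self, win, y):
        """Smoothed log prior plus the sum of per-position log likelihoods for class y."""
        total = sum(self.class_counts.values())
        score = math.log((self.class_counts[y] + 1) / (total + 2))
        for pos, aa in enumerate(win):
            num = self.residue_counts[(y, pos)][aa] + 1
            den = self.class_counts[y] + self.alphabet_size
            score += math.log(num / den)
        return score

    def predict(self, seq):
        """Return one predicted label per residue as a string of '0'/'1'."""
        return "".join(
            max("01", key=lambda y: self.log_score(win, y))
            for win in windows(seq, self.k)
        )


def column_entropy(column, gap="-"):
    """Shannon entropy in bits of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != gap]
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def evaluate(true, pred):
    """Accuracy, sensitivity, TP/(TP+FP) 'specificity', and correlation coefficient."""
    pairs = Counter(zip(true, pred))
    tp, fp = pairs[("1", "1")], pairs[("0", "1")]
    tn, fn = pairs[("0", "0")], pairs[("1", "0")]
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / max(len(true), 1),
        "sensitivity": tp / max(tp + fn, 1),
        "specificity": tp / max(tp + fp, 1),  # assumed definition, see text above
        "cc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy usage with made-up sequences and labels (not data from the paper).
model = WindowNaiveBayes(k=4).fit(["AKAKA", "GKGKG"], ["01010", "01010"])
print(model.predict("GAKAG"))
print(evaluate("01010", model.predict("AKAKA")))
```

For the leave-one-out protocol mentioned in the abstract, one would refit the model 171 times, each time holding out one protein and predicting its residues with a model trained on the remaining 170.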

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-7-262.pdf

References: 38 articles.

1. Ghosh D, Papavassiliou AG: Transcription factor therapeutics: long-shot or lodestone. Curr Med Chem 2005, 12: 691–701.

2. Blancafort P, Segal DJ, Barbas CF III: Designing transcription factor architectures for drug discovery. Mol Pharmacol 2004, 66: 1361–1371. 10.1124/mol.104.002758

3. Pabo CO, Sauer RT: Transcription factors: structural families and principles of DNA recognition. Annu Rev Biochem 1992, 61: 1053–1095. 10.1146/annurev.bi.61.070192.005201

4. Laity JH, Lee BM, Wright PE: Zinc finger proteins: new insights into structural and functional diversity. Current Opinion in Structural Biology 2001, 11: 39–46. 10.1016/S0959-440X(00)00167-6

5. Lawson CL, Swigon D, Murakami KS, Darst SA, Berman HM, Ebright RH: Catabolite activator protein: DNA binding and transcription activation. Current Opinion in Structural Biology 2004, 14: 10–20. 10.1016/j.sbi.2004.01.012

Cited by 123 articles.

1. An Efficient Deep Learning Approach for DNA-Binding Proteins Classification from Primary Sequences;International Journal of Computational Intelligence Systems;2024-04-11

2. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond;Briefings in Bioinformatics;2024-03-27

3. HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins;Nucleic Acids Research;2023-12-04

4. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features;Frontiers in Genetics;2023-04-06

5. Exploration of protein sequence embeddings for protein-ligand binding site detection;2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2022-12-06