Affiliation:
1. Universidad de Talca, Centro de Bioinformática, Simulación y Modelado (CBSM), Talca, Chile
2. Universidad de Talca, Facultad de Ingeniería, Talca, Chile
Abstract
Introduction:
Transcription factors are of great interest in biotechnology due to their key
role in the regulation of gene expression. One of the most important transcription factors in Gram-negative
bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial
agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant
features.
Methods:
In this study, we evaluated several machine learning algorithms, including Support Vector
Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB), for the prediction
of DNA-binding sites based on proteins from the Fur superfamily and other helix-turn-helix
transcription factors. We also tested the efficacy of several molecular descriptors derived
from the amino acid sequence and the structure of the protein fragments that bind DNA. A feature
selection procedure was used to reduce the number of descriptors in each case while maintaining
good classification performance.
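As an illustration of what a sequence-derived descriptor can look like, the following minimal Python sketch computes a few composition-based values for a protein fragment. The abstract does not list the descriptors actually used, so the residue groups, the descriptor names, and the function `sequence_descriptors` are assumptions chosen only for illustration.

```python
from collections import Counter
from typing import Dict

# Illustrative residue groups (an assumption; not taken from the study).
POSITIVE = set("KR")           # lysine, arginine
NEGATIVE = set("DE")           # aspartate, glutamate
HYDROPHOBIC = set("AVILMFWC")  # one common, simplified grouping


def sequence_descriptors(fragment: str) -> Dict[str, float]:
    """Return simple composition-based descriptors for a protein fragment.

    The fragment is a string of one-letter amino acid codes.
    """
    seq = fragment.upper()
    if not seq:
        raise ValueError("fragment must not be empty")

    counts = Counter(seq)
    n = len(seq)

    def fraction(group) -> float:
        # Share of residues in the fragment that belong to the given group.
        return sum(counts[aa] for aa in group) / n

    pos = fraction(POSITIVE)
    neg = fraction(NEGATIVE)
    return {
        "length": float(n),
        "frac_positive": pos,
        "frac_negative": neg,
        "frac_hydrophobic": fraction(HYDROPHOBIC),
        # Crude charge proxy: positive fraction minus negative fraction.
        "net_charge_per_residue": pos - neg,
    }
```

Each fragment then becomes one row of numeric features that a classifier such as SVM or DT could consume, and a feature selection step would keep only the most informative columns.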
Results:
The best results were obtained with the SVM model using twelve sequence-derived attributes
and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively.
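For reference, accuracy is the fraction of fragments whose predicted class matches the true class. The sketch below uses a hypothetical helper, `accuracy`, and toy labels that are not data from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example (hypothetical labels): 1 = binding, 0 = non-binding.
toy_true = [1, 0, 1, 1]
toy_pred = [1, 0, 0, 1]
toy_accuracy = accuracy(toy_true, toy_pred)  # 3 of 4 correct
```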
Conclusion:
The performance obtained indicates that the descriptors we used are relevant for predicting
DNA-binding sites, since they can discriminate between binding and non-binding regions of a
protein.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry