A hybrid approach for predicting transcription factors

Published:2024-07-25 Volume:4
ISSN:2673-7647
Container-title:Frontiers in Bioinformatics
Short-container-title:Front. Bioinform.

Author:

Patiyal Sumeet,Tiwari Palak,Ghai Mohit,Dhapola Aman,Dhall Anjali,Raghava Gajendra P. S.

Abstract

Transcription factors are essential DNA-binding proteins that regulate the transcription rate of several genes and control the expression of genes inside a cell. The prediction of transcription factors with high precision is important for understanding biological processes such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods for predicting transcription factors with higher accuracy. All models have been trained, tested, and evaluated on a large dataset that contains 19,406 transcription factors and 523,560 non-transcription factor protein sequences. To avoid biases in evaluation, the datasets were divided into training and validation/independent datasets, where 80% of the data was used for training, and the remaining 20% was used for external validation. In the case of alignment-free methods, models were developed using machine learning techniques and the composition-based features of a protein. Our best alignment-free model obtained an AUC of 0.97 on an independent dataset. In the case of the alignment-based method, we used BLAST at different cut-offs to predict the transcription factors. Although the alignment-based method demonstrated excellent performance, it was unable to cover all transcription factors due to instances of no hits. To combine the strengths of both methods, we developed a hybrid method that combines alignment-free and alignment-based methods. In the hybrid method, we added the scores of the alignment-free and alignment-based methods and achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver/Python Package Index/standalone package of “TransFacPred” (https://webs.iiitd.edu.in/raghava/transfacpred).
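The score combination described in the abstract can be illustrated with a minimal sketch. The abstract states only that composition-based features feed a machine learning model and that the alignment-free and alignment-based scores are added; it does not give the scoring scheme. The function names, the use of plain amino acid composition, and the BLAST contribution of +0.5 for a transcription factor hit, -0.5 for a non-transcription factor hit, and 0 for no hit are assumptions made here for illustration, not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percent composition of the 20 standard residues.

    One simple composition-based feature; the study may use others.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def blast_score(top_hit_label, hit_weight=0.5):
    """Hypothetical alignment-based score.

    top_hit_label is "TF", "non-TF", or None when BLAST returns no hit
    at the chosen cut-off. The weight 0.5 is an assumed value.
    """
    if top_hit_label is None:
        return 0.0
    return hit_weight if top_hit_label == "TF" else -hit_weight


def hybrid_score(ml_score, top_hit_label, hit_weight=0.5):
    """Add the alignment-free (ML) score and the alignment-based score."""
    return ml_score + blast_score(top_hit_label, hit_weight)


if __name__ == "__main__":
    features = amino_acid_composition("MKRAAC")
    print(len(features))              # 20 features, one per residue
    print(hybrid_score(0.4, "TF"))    # ML score raised by a TF hit
    print(hybrid_score(0.4, None))    # no hit: the ML score alone decides
```

With this scheme, a sequence that has no BLAST hit still receives a prediction from the alignment-free model, which is the coverage gap the hybrid method is meant to close.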

Publisher

Frontiers Media SA
