Affiliation:
1. Department of Information Technology (Bioinformatics), Indian Institute of Information Technology-Allahabad, Jhalwa, Prayagraj, India
2. Department of Bioinformatics and Applied Science, Indian Institute of Information Technology-Allahabad, Jhalwa, Prayagraj, India
Abstract
Aims:
To develop a tool that can annotate the subcellular localization of human proteins.
Background:
With the progress of high-throughput human proteomics projects, an enormous
amount of protein sequence data has been generated in recent years. These raw sequences
require precise mapping and annotation of their biological roles and functional attributes.
The functional characteristics of a protein depend strongly on its subcellular localization
(compartment). A fully automated and reliable protein subcellular localization prediction
system would therefore be very useful for current proteomic research.
Objective:
To develop a machine learning-based predictive model that can annotate the subcellular localization
of human proteins with high accuracy and precision.
Methods:
In this study, we used the PSI-CD-HIT homology criterion and sequence-based
features of protein sequences to develop a subcellular localization predictive model. The dataset
used to train the HumDLoc model was extracted from a reliable data source, the UniProt
Knowledgebase, which helps the model generalize to unseen data.
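The abstract does not state which sequence-based features HumDLoc uses. As an illustration only, the sketch below computes amino acid composition, one common sequence-derived feature; the function name `amino_acid_composition` and the choice of feature are assumptions, not the authors' method.

```python
# Illustrative only: amino acid composition as an example of a
# sequence-based feature. The abstract does not say HumDLoc uses it.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions, ordered as AMINO_ACIDS.

    Non-standard characters (e.g. X, B, Z) are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]


if __name__ == "__main__":
    features = amino_acid_composition("MKTAYIAKQR")
    for aa, fraction in zip(AMINO_ACIDS, features):
        if fraction > 0:
            print(aa, round(fraction, 3))
```

A fixed-length vector like this can be passed to any standard classifier regardless of protein length, which is one reason composition-style features are widely used.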
Results:
The proposed model, HumDLoc, was compared with two of the most widely used tools,
CELLO and DeepLoc, and with other machine learning-based tools. HumDLoc showed promising
predictive performance across several evaluation metrics: accuracy (≥97.00%),
precision (≥0.86), recall (≥0.89), MCC (≥0.86), area under the ROC curve (0.98),
and area under the precision-recall curve (0.93).
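For readers unfamiliar with the metrics listed above, the following sketch shows how accuracy, precision, recall, and the Matthews correlation coefficient (MCC) follow from the confusion counts for a single localization class treated one-vs-rest. The function name `binary_metrics` and the example counts are hypothetical and do not come from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and MCC from confusion counts.

    tp, fp, tn, fn: true/false positives and negatives for one class
    (one-vs-rest). Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(binary_metrics(tp=90, fp=10, tn=880, fn=20))
```

MCC uses all four cells of the confusion matrix, so it remains informative when one compartment has far fewer proteins than the others, a situation in which accuracy alone can look high.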
Conclusion:
HumDLoc outperformed several alternative tools in correctly predicting the
subcellular localization of human proteins. HumDLoc is hosted as a web-based
tool at https://bioserver.iiita.ac.in/HumDLoc/.
Publisher
Bentham Science Publishers Ltd.
Subject
Genetics (clinical), Genetics
Cited by
4 articles.