Affiliation:
1. Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB, BP 1505, El M’Naouer, 31000, Oran, Algeria
Abstract
To date, many proteins generated by large-scale genome sequencing projects remain uncharacterized and are the subject of intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for elucidating protein function. However, predicting it remains a challenging task, especially for multi-site proteins, which are known to shuttle between cell compartments to perform their biological functions, and for proteins that have no significant homology to proteins of known subcellular location. Owing to their low cost and reasonable accuracy, machine learning-based methods have gained much attention in this context, helped by the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem using different protein sequence features related to subcellular localization; however, the overwhelming majority of them focus on single localization and cover a very limited number of cellular locations. Prediction was initially based on sorting signals, amino acid composition, and homology. To improve prediction quality, the focus is currently on knowledge extracted from annotation databases, such as protein–protein interactions and Gene Ontology (GO) functional domain annotation, which has recently become a widely adopted and essential source of information for learning systems. To address this problem, in the present study we treated SCL prediction as a multi-label learning problem and sought to label both single-site and multi-site unannotated bacterial protein sequences by mining protein homology relationships, using both the GO terms of protein homologs and PSI-BLAST profiles.
Experiments using 5-fold cross-validation on the benchmark datasets showed a significant improvement in the results obtained by the proposed consensus multi-label prediction model, which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
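As a rough illustration of what a consensus multi-label prediction can look like, the following minimal Python sketch combines per-compartment scores from two sources (standing in for a GO-term-based predictor and a PSI-BLAST-profile-based predictor) and returns one or more localization sites per protein. Everything in it is an assumption made for illustration: the abstract does not specify the combination rule, the threshold, or the compartment names, so the score averaging, the `threshold` parameter, the fallback to the best-scoring site, the function names, and the placeholder labels `site_1` to `site_6` are all hypothetical.

```python
from typing import Dict, List

# Placeholder labels: the abstract mentions six Gram-negative compartments
# but does not name them, so these identifiers are hypothetical.
LABELS: List[str] = ["site_1", "site_2", "site_3", "site_4", "site_5", "site_6"]


def consensus_labels(
    go_scores: Dict[str, float],
    profile_scores: Dict[str, float],
    threshold: float = 0.5,
) -> List[str]:
    """Combine two per-label score sets into a multi-label prediction.

    Assumed rule (not taken from the paper): average the two scores for each
    label and keep every label whose average reaches the threshold. If no
    label qualifies, fall back to the single best-scoring label so that each
    protein receives at least one site.
    """
    averaged = {
        label: (go_scores.get(label, 0.0) + profile_scores.get(label, 0.0)) / 2.0
        for label in LABELS
    }
    chosen = [label for label in LABELS if averaged[label] >= threshold]
    if not chosen:
        chosen = [max(LABELS, key=lambda label: averaged[label])]
    return chosen


def to_indicator(labels: List[str]) -> List[int]:
    """Encode a label set as a 0/1 vector, the usual multi-label target format."""
    return [1 if label in labels else 0 for label in LABELS]


# A protein scoring well for two sites in both sources gets both labels.
multi_site = consensus_labels(
    {"site_1": 0.9, "site_3": 0.6},
    {"site_1": 0.7, "site_3": 0.5},
)
print(multi_site, to_indicator(multi_site))
```

Representing each protein's sites as a 0/1 vector is what lets a single model handle single-site and multi-site proteins uniformly; a single-site protein is simply a vector with one nonzero entry.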