Affiliation:
1. Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
Abstract
Transmembrane transport proteins (transporters) play a crucial role in fundamental cellular processes in all organisms by facilitating the transport of hydrophilic substrates across hydrophobic membranes. Although numerous membrane protein sequences are available, their structures and functions remain largely uncharacterized. Recently, natural language processing (NLP) techniques have shown promise in the analysis of protein sequences. Bidirectional Encoder Representations from Transformers (BERT) is an NLP technique that has been adapted to proteins to learn contextual embeddings of individual amino acids within a protein sequence. Our previous method, TooT-BERT-T, differentiated transporters from non-transporters by applying a logistic regression classifier to fine-tuned representations from ProtBERT-BFD. In this study, we extend that approach by combining representations from ProtBERT, ProtBERT-BFD, and MembraneBERT with classical classifiers. Additionally, we introduce TooT-BERT-CNN-T, a novel method that fine-tunes ProtBERT-BFD and discriminates transporters using a Convolutional Neural Network (CNN). Our experimental results show that the CNN outperforms the classical classifiers in discriminating transporters from non-transporters, achieving a Matthews correlation coefficient (MCC) of 0.89 and an accuracy of 95.1% on the independent test set. Compared with TooT-BERT-T, this is an improvement of 0.03 in MCC and 1.11 percentage points in accuracy.
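The reported MCC and accuracy are both computed from the binary confusion matrix (true/false positives and negatives) on the test set. As a minimal sketch, assuming transporters are treated as the positive class, the two metrics can be computed as follows. The function names and the counts in the usage note are hypothetical illustrations, not the authors' evaluation code or data.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any row or column sum of the
    confusion matrix is zero, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, with the made-up counts `tp=90, tn=95, fp=5, fn=10`, `mcc` gives roughly 0.85 while `accuracy` gives 0.925. Because MCC uses all four cells of the confusion matrix, it is often preferred over accuracy when the positive and negative classes are imbalanced.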