Affiliation:
1. KASTAMONU ÜNİVERSİTESİ
Abstract
The study of the structures of proteins and the relationships of amino acids remains a challenging problem in biology. Although some bioinformatics-based studies provide partial solutions, some major problems remain. At the beginning of these problems are the logic of the sequence of amino acids and the diversity of proteins. Although these variations are biologically detectable, these experiments are costly and time-consuming. Considering that there are many unclassified sequences in the world, it is inevitable that a faster solution must be found. For this reason, we propose a deep learning model to classify transcription factor proteins of primates. Our model has a hybrid structure that uses Recurrent Neural Network (RNN) based Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks with Word2Vec preprocessing step. Our model has 97.96% test accuracy, 97.55% precision, 95.26% recall, 96.22% f1-score. Our model was also tested with 5-fold cross-validation and reached 97.42% result. In the prepared model, LSTM was used in layers with fewer units, and GRU was used in layers with more units, and it was aimed to make the model a model that can be trained and run as quickly as possible. With the added dropout layers, the overfitting problem of the model is prevented.
Publisher
Balkan Journal of Electrical & Computer Engineering (BAJECE)
Reference57 articles.
1. J. J. Shu, “A new integrated symmetrical table for genetic codes,” Biosystems, vol. 151, pp. 21–26, Jan. 2017, doi: 10.1016/J.BIOSYSTEMS.2016.11.004.
2. J. D. WATSON and F. H. C. CRICK, “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid,” Nature, vol. 171, no. 4356, pp. 737–738, Apr. 1953, doi: 10.1038/171737a0.
3. D. R. Ferrier, “Protein Yapısı ve İşlevi,” in Lippincott Biyokimya: Görsel Anlatımlı Çalışma Kitapları, B. A. Jameson, Ed. İstanbul: Nobel Tıp Kitapevleri, 2019, pp. 1–68.
4. Pfam, “Family: HLH (PF00010).” http://pfam.xfam.org/family/pf00010 (accessed Feb. 02, 2019).
5. T. Kaplan and M. D. Biggin, “Quantitative Models of the Mechanisms that Control Genome-Wide Patterns of Animal Transcription Factor Binding,” Methods Cell Biol, vol. 110, pp. 263–283, Jan. 2012, doi: 10.1016/B978-0-12-388403-9.00011-4.
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献