Classification of Chromosomal DNA Sequences Using Hybrid Deep Learning Architectures

Published: 2021-02-10
Issue: 10
Volume: 15
Pages: 1130-1136
ISSN: 1574-8936
Container title: Current Bioinformatics
Language: en
Short container title: CBIO

Authors:

Du Zhihua¹, Xiao Xiangdong¹, Uversky Vladimir N.²

Affiliation:

1. Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, China

2. Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, Florida, (V.N.U.), United States

Abstract

Background: Chromosomal DNA contains most of the genetic information of eukaryotes and plays an important role in the growth, development, and reproduction of living organisms. Most chromosomal DNA sequences are known to wrap around histones, and distinguishing these sequences from ordinary DNA sequences is important for understanding the genetic code of life. The main difficulty lies in feature selection: DNA sequences have no explicit features, and common representations such as one-hot encoding suffer from high dimensionality. Recently, deep learning models have been shown to extract useful features automatically from input patterns.

Objective: We aim to investigate which deep learning networks can achieve notable improvements in DNA sequence classification using only sequence information.

Methods: We present four deep learning architectures built from convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for chromosomal DNA sequence classification. The natural language model Word2Vec was used to generate word embeddings of the sequences, from which the deep learning models learn features.

Results: The four architectures were compared on 10 chromosomal DNA datasets. The architecture combining convolutional neural networks with long short-term memory networks outperformed the other methods in the accuracy of chromosomal DNA prediction.

Conclusion: In this study, four deep learning models were compared for automatic classification of chromosomal DNA sequences without any sequence preprocessing steps. In particular, we treated DNA sequences as natural language and used Word2Vec to extract word embeddings that represent them. The CNN+LSTM model was superior in the ten classification tasks. The reason for this success is that the CNN module captures regulatory motifs, while the subsequent LSTM layer captures the long-term dependencies between them.
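The abstract does not say how a DNA sequence is split into "words" before Word2Vec is applied. A minimal sketch, assuming overlapping k-mers with k = 3 (an assumption, not the authors' stated choice), shows the idea and the dimensionality problem of one-hot encoding. The function names `kmer_tokens`, `kmer_vocabulary`, and `one_hot` are hypothetical and chosen for illustration only.

```python
from itertools import product


def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer 'words'.

    Assumption: k-mer tokenization with k=3; the abstract does not
    specify the tokenization actually used.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_vocabulary(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def one_hot(tokens, vocab):
    """One-hot encode tokens: one row per token, one column per k-mer.

    Raises KeyError for tokens outside the vocabulary (e.g. containing 'N').
    """
    rows = []
    for token in tokens:
        row = [0] * len(vocab)
        row[vocab[token]] = 1
        rows.append(row)
    return rows


tokens = kmer_tokens("ACGTACGT", k=3)
vocab = kmer_vocabulary(k=3)
encoded = one_hot(tokens, vocab)

print(tokens)           # the "sentence" of k-mer words
print(len(vocab))       # one-hot width grows as 4**k
print(len(encoded[0]))  # each token becomes a vector of that width
```

Each token list can be treated as a sentence and passed to a Word2Vec implementation (for example, the `Word2Vec` class in gensim), which replaces the sparse 4^k-wide one-hot vectors with dense embeddings of a fixed, much smaller size.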

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics, Genetics, Molecular Biology, Biochemistry

Cited by 11 articles.

1. Improved Hybrid Approach for Enhancing Protein Coding Regions Identification in DNA Sequences;Current Bioinformatics;2024-01-30

2. DNA sequence classification using artificial intelligence;Applications of Artificial Intelligence in Healthcare and Biomedicine;2024

3. SDBA: Score Domain-Based Attention for DNA N4-Methylcytosine Site Prediction from Multiperspectives;Journal of Chemical Information and Modeling;2023-08-30

4. A Survey on Gene Classification Based on DNA Sequence;Intelligent Sustainable Systems;2023

5. GC6mA-Pred: A deep learning approach to identify DNA N6-methyladenine sites in the rice genome;Methods;2022-08