Novel transformer networks for improved sequence labeling in genomics

Published: 2019-11-13

Author:

Clauwaert Jim, Waegeman Willem

Abstract

In genomics, a wide range of machine learning methodologies have been investigated to annotate biological sequences for positions of interest such as transcription start sites, translation initiation sites, methylation sites, splice sites and promoter start sites. In recent years, this area has been dominated by convolutional neural networks, which typically outperform previously designed methods as a result of automated scanning for influential sequence motifs. However, those architectures do not allow for the efficient processing of the full genomic sequence. As an improvement, we introduce transformer architectures for whole genome sequence labeling tasks. We show that these architectures, recently introduced for natural language processing, are better suited for processing and annotating long DNA sequences. We apply existing networks and introduce an optimized method for the calculation of attention from input nucleotides. To demonstrate this, we evaluate our architecture on several sequence labeling tasks, and find it to achieve state-of-the-art performance when compared to specialized models for the annotation of transcription start sites, translation initiation sites and 4mC methylation in E. coli.

Publisher

Cold Spring Harbor Laboratory

References: 42 articles.

1. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

2. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning

3. Promoter recognition by Escherichia coli RNA polymerase

4. J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer Normalization. arXiv:1607.06450 [cs, stat], July 2016.

5. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition

Cited by 5 articles.

1. Transformers and large language models in healthcare: A review;Artificial Intelligence in Medicine;2024-08

2. Artificial intelligence and the future of life sciences;Drug Discovery Today;2021-11

3. CpG Transformer for imputation of single-cell methylomes;2021-06-09

4. DeCban: Prediction of circRNA-RBP Interaction Sites by Using Double Embeddings and Cross-Branch Attention Networks;Frontiers in Genetics;2021-01-22

5. Explainable Transformer Models for Functional Genomics in Prokaryotes;2020-03-18