BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction-Reference-Cited by-同舟云学术

BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction

Published:2022-08-10 Issue:4 Volume:14 Page:841-862
ISSN:1913-2751
Container-title:Interdisciplinary Sciences: Computational Life Sciences
language:en
Short-container-title:Interdiscip Sci Comput Life Sci

Author:

Asim Muhammad Nabeel^ORCID,Ibrahim Muhammad Ali,Zehe Christoph,Trygg Johan,Dengel Andreas,Ahmed Sheraz

Abstract

Abstract Background and objective: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation, cellular metabolic, and pathological processes. Existing purely sequence based computational approaches lack robustness and efficiency mainly due to the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs between highly flexible length lncRNA sequences. Method The paper at hand performs in-depth exploration of diverse copy padding, sequence truncation approaches, and presents a novel idea of utilizing only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach “Bot-Net” which leverages a single layer long-short-term memory network regularized through DropConnect to capture higher order residue dependencies, pooling to retain most salient features, normalization to prevent exploding and vanishing gradient issues, learning rate decay, and dropout to regularize precise neural network for lncRNA–miRNA interaction prediction. Results BoT-Net outperforms the state-of-the-art lncRNA–miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and matthews correlation coefficient. Furthermore, a case study analysis indicates that BoT-Net also outperforms state-of-the-art lncRNA–protein interaction predictor on a benchmark dataset by accuracy of 10%, sensitivity of 19%, specificity of 6%, precision of 14%, and matthews correlation coefficient of 26%. Conclusion In the benchmark lncRNA–miRNA interaction prediction dataset, the length of the lncRNA sequence varies from 213 residues to 22,743 residues and in the benchmark lncRNA–protein interaction prediction dataset, lncRNA sequences vary from 15 residues to 1504 residues. For such highly flexible length sequences, fixed length generation using copy padding introduces a significant level of bias which makes a large number of lncRNA sequences very much identical to each other and eventually derail classifier generalizeability. Empirical evaluation reveals that within 50 residues of only the starting region of long lncRNA sequences, a highly informative distribution for lncRNA–miRNA interaction prediction is contained, a crucial finding exploited by the proposed BoT-Net approach to optimize the lncRNA fixed length generation process. Availability: BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/. Graphic Abstract

Funder

Sartorius Artificial Intelligence Lab

Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI)

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Computer Science Applications,General Biochemistry, Genetics and Molecular Biology

Link

https://link.springer.com/content/pdf/10.1007/s12539-022-00535-x.pdf

Reference108 articles.

1. Le NQK, Yapp EKY, Ho Q-T, Nagasundaram N, Ou Y-Y, Yeh H-Y (2019) iEnhancer-5Step: identifying enhancers using hidden information of DNA sequences via Chou’s 5-step rule and word embedding analytical biochemistry. Elsevier, vol 571, pp 53–61. https://doi.org/10.1016/j.ab.2019.02.017

2. Asim MN, Ibrahim MA, Malik MI, Dengel A, Ahmed S (2020) Enhancer-dsnet: a supervisedly prepared enriched sequence representation for the identification of enhancers and their strength, pp 38–48. https://doi.org/10.1007/978-3-030-63836-8_4