Abstract
The number of available biological sequences has increased significantly in recent years due to various genomic sequencing projects, creating a huge volume of data. Consequently, new computational methods are needed to analyze and extract information from these sequences. Machine learning methods have shown broad applicability in computational biology and bioinformatics, and have helped to extract relevant information from various biological datasets. However, several obstacles still motivate new algorithms and pipeline proposals. These mainly involve feature extraction, where obtaining significant discriminatory information from a biological dataset is challenging. Considering this, our work studies and analyzes a feature extraction pipeline based on mathematical models (Numerical Mapping, Fourier, Entropy, and Complex Networks). As a case study, we analyze long non-coding RNA (lncRNA) sequences. We divided this work into two studies: (I) we assessed our proposal on the problem most frequently addressed in our review, lncRNA vs. mRNA; (II) we tested its generalization on different classification problems, e.g., circRNA vs. lncRNA. The experimental results demonstrated three main contributions: (1) an in-depth study of several mathematical models; (2) a new feature extraction pipeline; and (3) its generalization and robustness for distinct biological sequence classification.
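To make the kind of features named above more concrete, the following is a minimal sketch of two of the model families (numerical mapping followed by a Fourier transform, and entropy). It is an illustration under stated assumptions, not the pipeline evaluated in this work: the binary indicator mapping, the k-mer Shannon entropy, the choice of summary statistics, and all function names (`binary_mapping`, `power_spectrum`, `kmer_shannon_entropy`, `extract_features`) are hypothetical choices made for this example. The complex-network features are not covered.

```python
import cmath
import math
from collections import Counter


def binary_mapping(seq, base):
    """Assumed numerical mapping: 1.0 where `base` occurs, 0.0 elsewhere."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def power_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]|^2 for each k."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def kmer_shannon_entropy(seq, k):
    """Shannon entropy (in bits) of the overlapping k-mer distribution."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(seq, k=2):
    """Illustrative feature vector: per-base spectral mean and peak, plus entropy.

    Assumes len(seq) >= 2 so that the spectrum has at least one non-DC bin.
    """
    features = []
    for base in "ACGT":
        # Drop the DC component (k = 0), which only reflects base counts.
        spec = power_spectrum(binary_mapping(seq, base))[1:]
        features.append(sum(spec) / len(spec))
        features.append(max(spec))
    features.append(kmer_shannon_entropy(seq, k))
    return features
```

Calling `extract_features` on a nucleotide string yields a fixed-length vector (nine values in this sketch) that could be passed to any standard classifier. For real sequence lengths, an FFT routine would replace the quadratic-time transform used here for self-containment.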
Publisher
Cold Spring Harbor Laboratory
Cited by
4 articles.