Affiliation:
1. Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
2. TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
Abstract
Motivation
Non-coding RNAs (ncRNAs) are known to play crucial roles in various biological processes, and there is a pressing need for accurate computational detection methods that can efficiently scan genomes for novel ncRNAs. However, unlike coding genes, ncRNAs often lack distinctive sequence features that could be used to recognize them. Although many ncRNAs are known to have a well-conserved secondary structure, which provides useful cues for computational prediction, it has also been shown that a structure-based approach alone may not be sufficient for detecting ncRNAs in a single sequence. Currently, the most effective ncRNA detection methods combine structure-based techniques with comparative genome analysis to improve prediction performance.
Results
In this paper, we propose RNAdetect, a computational method that incorporates novel features for accurate detection of ncRNAs in combination with comparative genome analysis. Given a sequence alignment, RNAdetect can accurately detect the presence of functional ncRNAs by using predictive features based on the concept of generalized ensemble defect (GED), which assesses both the degree of structure conservation across multiple related sequences and the conformance of the individual folding structures to a common consensus structure. Furthermore, n-gram models (NGMs) are used to extract features that can effectively capture sequence homology to known ncRNA families. Utilization of NGMs can enhance the detection of ncRNAs that have sparse folding structures with many unpaired bases. Extensive performance evaluation based on the Rfam database and bacterial genomes demonstrates that RNAdetect can accurately and reliably detect novel ncRNAs, outperforming the current state-of-the-art methods.
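To illustrate the general idea of scoring sequence homology with an n-gram model, the following is a minimal sketch and not RNAdetect's actual feature definition, which is not given in this abstract. It assumes a trigram model with additive smoothing trained on sequences from one known family, and uses the mean per-n-gram log-likelihood of a query as a hypothetical feature; the function names `train_ngram_model` and `mean_log_likelihood`, the smoothing scheme, and the choice of n = 3 are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def train_ngram_model(sequences, n=3, pseudocount=1.0):
    """Estimate P(last base | preceding n-1 bases) with additive smoothing.

    Hypothetical helper: returns a dict mapping each n-gram to its
    conditional probability given the (n-1)-base context.
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            # Skip windows containing gaps or ambiguity codes.
            if all(base in ALPHABET for base in gram):
                ngram_counts[gram] += 1
                context_counts[gram[:-1]] += 1

    model = {}
    for context in map("".join, product(ALPHABET, repeat=n - 1)):
        total = context_counts[context] + pseudocount * len(ALPHABET)
        for base in ALPHABET:
            gram = context + base
            model[gram] = (ngram_counts[gram] + pseudocount) / total
    return model


def mean_log_likelihood(seq, model, n=3):
    """Average log-probability of the n-grams in seq under the model."""
    seq = seq.upper().replace("T", "U")
    logs = [
        math.log(model[seq[i:i + n]])
        for i in range(len(seq) - n + 1)
        if seq[i:i + n] in model
    ]
    return sum(logs) / len(logs) if logs else float("-inf")
```

Under these assumptions, a query that shares local sequence composition with the training family receives a higher score than an unrelated one, which is the kind of signal that could complement structure-based features for ncRNAs with few base pairs.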
Availability and implementation
The source code for RNAdetect and the benchmark data used in this paper can be downloaded at https://github.com/bjyoontamu/RNAdetect.
Funder
National Science Foundation
United States Department of Agriculture National Institute of Food and Agriculture Award
Bioinformatics and Genomic Systems Engineering
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
7 articles.