Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations

Authors:

Li Ying1, Zhang Qi2, Liu Zhaoqian3, Wang Cankun4, Han Siyu5, Ma Qin6, Du Wei2

Affiliations:

1. College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China

2. College of Computer Science and Technology, Jilin University, Changchun, China

3. School of Mathematics, Shandong University, China (currently a visiting scholar at Ohio State University)

4. Ohio State University

5. Department of Computer Science, Faculty of Engineering, University of Bristol

6. Department of Biomedical Informatics, Ohio State University

Abstract

Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, the functions of only a few ncRNAs have been well studied. Given the significance of ncRNA classification for understanding ncRNA functions, a growing number of computational methods have been introduced to classify ncRNAs automatically and accurately. In this paper, based on a convolutional neural network and a deep forest algorithm, the multi-grained cascade forest (GcForest), we propose a novel deep fusion learning framework, the GcForest fusion method (GCFM), to classify alignments of ncRNA sequences for accurate clustering of ncRNAs. GCFM integrates a multi-view structure feature representation, comprising sequence-structure alignment encoding, structure image representation and shape alignment encoding of structural subunits, which enables it to capture potential specificity between ncRNAs. For the classification of pairwise alignments of two ncRNA sequences, GCFM improves the F-value by 6% over an existing alignment-based method. Furthermore, ncRNA families are clustered based on the classification matrix generated by GCFM. The results suggest better performance (a 20% improvement in accuracy) than existing ncRNA clustering methods (RNAclust, Ensembleclust and CNNclust). Additionally, we apply GCFM to construct a phylogenetic tree of ncRNAs and to predict the probability of interactions between RNAs. Most ncRNAs are placed correctly in the phylogenetic tree, and the prediction accuracy for RNA interactions is 90.63%. A web server (http://bmbl.sdstate.edu/gcfm/) has been developed to maximize availability, and the source code and related data are available at the same URL.
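The abstract says that ncRNA families are clustered from the classification matrix produced by GCFM and that pairwise classification is scored by F-value, but it does not spell out either procedure. The following is a minimal, hypothetical Python sketch under stated assumptions, not the authors' implementation. It assumes `probs[i][j]` is a classifier's estimated probability that sequences i and j belong to the same family, groups sequences by average-linkage agglomeration that stops once the best remaining merge falls below a threshold, and scores the result with the standard pairwise F-measure (harmonic mean of precision and recall over sequence pairs). The function names `cluster_from_probs` and `pairwise_f_value`, the 0.5 threshold, and the toy matrix `EXAMPLE_PROBS` are all illustrative inventions.

```python
from typing import List

# Hypothetical toy matrix: entry [i][j] is an assumed probability that
# sequences i and j belong to the same ncRNA family (not real GCFM output).
EXAMPLE_PROBS = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.15, 0.10],
    [0.80, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.15, 0.20, 1.00, 0.95],
    [0.20, 0.10, 0.10, 0.95, 1.00],
]


def cluster_from_probs(probs: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Average-linkage agglomerative clustering on same-family probabilities.

    Illustrative choice only; GCFM's actual clustering rule may differ.
    """
    # Start with every sequence in its own cluster.
    clusters = [[i] for i in range(len(probs))]

    def avg_prob(a: List[int], b: List[int]) -> float:
        # Mean probability over all cross-cluster pairs (average linkage).
        return sum(probs[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters most likely to share a family.
        best_score, best_pair = -1.0, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = avg_prob(clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        # Stop once even the best merge is below the threshold.
        if best_score < threshold:
            break
        x, y = best_pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)


def pairwise_f_value(pred: List[List[int]], labels: List[str]) -> float:
    """Pairwise F-measure of predicted clusters against true family labels."""
    # Map each sequence index to the id of its predicted cluster.
    cluster_of = {i: c for c, members in enumerate(pred) for i in members}
    tp = fp = fn = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            same_pred = cluster_of[i] == cluster_of[j]
            same_true = labels[i] == labels[j]
            tp += int(same_pred and same_true)
            fp += int(same_pred and not same_true)
            fn += int(same_true and not same_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

On the toy matrix, `cluster_from_probs(EXAMPLE_PROBS)` merges sequences 0, 1 and 2 into one group and 3 and 4 into another, because the average cross-group probability between those two groups stays well below 0.5. Average linkage is used here only because it is a common, simple way to turn a pairwise similarity matrix into clusters without fixing the number of families in advance.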

Funder

National Natural Science Foundation of China

Natural Science Foundation of Jilin Province

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
