Abstract
ABSTRACTRNA plays essential regulatory roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available.Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 provides an integrative solution by incorporating diverse types of experimental and genomic data in an accessible fashion via the Galaxy framework. We demonstrate that the tasks of clustering and annotation of structured RNAs can be considerably improved, through a scalable methodology that also supports structure probing data. Based on this, we further introduce an off-the-shelf procedure to identify locally conserved structure candidates in long RNAs. In this way, we suggest the presence and the sparsity of phylogenetically conserved local structures in some long non-coding RNAs. Furthermore, we demonstrate the advantage of a scalable clustering for discovering structured motifs under inherent and experimental biases and uncover prominent targets of the double-stranded RNA binding protein Roquin-1 that are evolutionary conserved.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献