Abstract
Background
Animal genomes contain thousands of long noncoding RNA (lncRNA) genes, a growing subset of which are thought to be functionally important. This functionality is often mediated by short sequence elements scattered throughout the RNA sequence that correspond to binding sites for small RNAs and RNA-binding proteins. Throughout vertebrate evolution, the sequences of lncRNA genes have changed extensively, so that it is often impossible to obtain significant alignments between lncRNA sequences from evolutionarily distant species, even when synteny is evident. This often precludes identifying conserved lncRNAs that are likely to be functional, or prioritizing constrained regions for experimental interrogation.
Results
Here we introduce LncLOOM, a novel algorithmic framework for the discovery and evaluation of syntenic combinations of short motifs. LncLOOM is based on a graph representation of the input sequences and uses integer linear programming to efficiently compare dozens of sequences of thousands of bases each and to evaluate the significance of the recovered motifs. We show that LncLOOM can identify specific, biologically relevant motifs that are conserved throughout vertebrates and beyond in lncRNAs and 3′UTRs, including novel functional RNA elements in the CHASERR lncRNA that are required for regulation of CHD2 expression.
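The idea of short motifs conserved in the same order across otherwise unalignable sequences can be illustrated with a toy sketch. This is not LncLOOM's implementation: it substitutes a simple dynamic program for the graph and integer linear programming formulation, and it assumes exact k-mers that occur exactly once in every input sequence. The function names `shared_kmers` and `ordered_motif_chain` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def shared_kmers(seqs: List[str], k: int) -> Dict[str, List[int]]:
    """Map each k-mer that occurs exactly once in every sequence to its positions.

    Restricting to single-occurrence k-mers is a simplifying assumption of this sketch.
    """
    tables = []
    for s in seqs:
        pos: Dict[str, int] = {}
        dup = set()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if kmer in pos:
                dup.add(kmer)
            pos[kmer] = i
        tables.append({m: p for m, p in pos.items() if m not in dup})
    common = set(tables[0])
    for t in tables[1:]:
        common &= set(t)
    return {m: [t[m] for t in tables] for m in common}


def ordered_motif_chain(seqs: List[str], k: int) -> List[str]:
    """Return a longest list of shared k-mers that appear in the same order,
    without overlapping, in every sequence."""
    items = sorted(shared_kmers(seqs, k).items(), key=lambda kv: kv[1][0])
    n = len(items)
    if n == 0:
        return []
    best = [1] * n   # best[j]: length of the longest chain ending at motif j
    prev = [-1] * n  # back-pointer for reconstructing the chain
    for j in range(n):
        for i in range(j):
            # Motif i may precede motif j only if it ends before j starts everywhere.
            if all(pi + k <= pj for pi, pj in zip(items[i][1], items[j][1])):
                if best[i] + 1 > best[j]:
                    best[j] = best[i] + 1
                    prev[j] = i
    j = max(range(n), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(items[j][0])
        j = prev[j]
    return chain[::-1]


# Three short sequences that share little beyond two ordered 3-mers.
print(ordered_motif_chain(["TTCGTAAGCATT", "CGTCCCGCAGG", "GGCGTTGCAC"], 3))
```

A chain recovered this way carries no measure of statistical significance; evaluating the significance of recovered motifs is a separate step that this sketch does not attempt.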
Conclusions
We expect that LncLOOM will become a broadly used approach for the discovery of functionally relevant elements in the noncoding genome.
Funder
H2020 European Research Council
Publisher
Springer Science and Business Media LLC
Cited by
31 articles.