Author:
Niazi Farshad, Valadkhan Saba
Abstract
Recent transcriptome analyses have indicated that a large part of mammalian genomes are transcribed into long non-protein-coding RNAs (lncRNAs). However, only a very small fraction of them have been individually studied, and whether the majority of lncRNAs found in large-scale studies have a cellular role is debated. To gain insight into the sequence features and genomic architecture of the subset of lncRNAs that have been proven to be functional, we created a database containing studied lncRNAs manually culled from the literature along with a parallel database containing all annotated protein-coding human RNAs. The Functional lncRNA Database, which contains 204 lncRNAs and their splicing variants, is available at valadkhanlab.org/database. Analysis of the lncRNAs and their comparison to protein-coding transcripts revealed sequence features including paucity of introns and low GC content in lncRNAs, which could explain several biological characteristics of these transcripts, such as their nuclear localization and low expression level. The predicted ORFs in lncRNAs have poor start codon and ORF contexts, which would lead to activation of the nonsense-mediated decay pathways and thus make it unlikely for most lncRNAs to code for even short peptides. Interestingly, our analyses revealed significant similarities between the lncRNAs and the 3′ untranslated regions (3′ UTRs) in protein-coding RNAs in structural features and sequence composition. The presence of these intriguing parallels between the lncRNAs and 3′ UTRs, which constitute the two main components of the RNA-mediated cellular regulatory system, indicates that highly similar evolutionary constraints govern the function of regulatory RNA sequences in the cell.
Publisher
Cold Spring Harbor Laboratory
Cited by
148 articles.