Abstract
Analyzing the immense diversity of RNA isoforms in large RNA-seq repositories requires laborious data processing using specialized tools. Indexing techniques based on k-mers have previously been effective at searching for RNA sequences across thousands of RNA-seq libraries but have fallen short of enabling direct RNA quantification. We show here that RNAs queried in the form of k-mer sets can be quantified in seconds, with a precision akin to that of conventional RNA quantification methods. We showcase several applications by exploring an index of the Cancer Cell Line Encyclopedia (CCLE) collection consisting of 1019 RNA-seq samples. Non-reference RNA sequences, such as RNAs harboring driver mutations and fusions, splicing isoforms, or RNAs derived from repetitive elements, can be retrieved with high accuracy. Moreover, we show that k-mer indexing offers a powerful means to reveal variant RNAs induced by specific gene alterations, for instance in splicing factors. A web server allows public queries in CCLE and other indexes: https://transipedia.fr. Code is provided to allow users to set up their own server from any RNA-seq dataset.
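The core idea of quantifying an RNA "queried in the form of a k-mer set" can be illustrated with a minimal sketch. It assumes a hypothetical in-memory index mapping each k-mer to its count in every sample, and summarizes a query's abundance per sample as the mean count of its k-mers. Both the index layout and the mean aggregation are illustrative assumptions, not the authors' implementation.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def quantify(query: str, index: dict[str, list[int]], n_samples: int, k: int) -> list[float]:
    """Per-sample abundance of a query sequence, as the mean count of its k-mers.

    `index` is a hypothetical structure: k-mer -> list of counts, one per sample.
    k-mers absent from the index are counted as 0 in every sample.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: no k-mers to look up.
        return [0.0] * n_samples
    zeros = [0] * n_samples
    return [
        sum(index.get(km, zeros)[s] for km in query_kmers) / len(query_kmers)
        for s in range(n_samples)
    ]


# Toy index with k = 3 and two samples (illustrative data only).
toy_index = {"ACG": [4, 0], "CGT": [2, 0], "GTA": [6, 3]}
print(quantify("ACGTA", toy_index, n_samples=2, k=3))  # [4.0, 1.0]
```

Real k-mer indexes typically use much larger k, store canonical k-mers (a k-mer merged with its reverse complement) and compress counts across thousands of samples; this sketch ignores all of that to show only the query-to-k-mers-to-per-sample-summary step.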
Publisher
Cold Spring Harbor Laboratory