Improved detection of gene fusions by applying statistical methods reveals new oncogenic RNA cancer drivers

Author:

Dehghannasiri Roozbeh, Freeman Donald Eric, Jordanski Milos, Hsieh Gillian L., Damljanovic Ana, Lehnert Erik, Salzman Julia

Abstract

Short Abstract

The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce a new algorithm, DEEPEST, that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling ten-fold fewer false-positive fusions in non-transformed human tissues. We leverage the increased precision of DEEPEST to discover new cancer biology. For example, 888 new candidate oncogenes are identified based on over-representation in DEEPEST-Fusion calls, along with 1,078 previously unreported fusions involving long intergenic noncoding RNA partners, demonstrating a previously unappreciated prevalence and potential for function. Specific protein domains are enriched in DEEPEST calls, demonstrating a global selection for fusion functionality: kinase domains are nearly 2-fold more frequent in DEEPEST calls than expected by chance, as are domains involved in (anaerobic) metabolism and DNA binding. DEEPEST also reveals a high enrichment for fusions involving known and novel oncogenes in diseases including ovarian cancer, which has had minimal treatment advances in recent decades, finding that more than 50% of tumors harbor gene fusions predicted to be oncogenic. The statistical algorithms, population-level analytic framework, and biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy.

Significance

Gene fusions are tumor-specific genomic aberrations and are among the most powerful biomarkers and drug targets in translational cancer biology. The advent of RNA-Seq technologies over the past decade has provided a unique opportunity to detect novel fusions by deploying computational algorithms on public sequencing databases. Yet precise fusion detection algorithms are still out of reach. We develop DEEPEST, a highly specific and efficient statistical pipeline specially designed for mining massive sequencing databases, and apply it to all 33 tumor types and 10,500 samples in The Cancer Genome Atlas database. We systematically profile the landscape of detected fusions by employing classic statistical models and identify several signatures of selection for fusions in tumors.

Software availability

The DEEPEST-Fusion workflow, with a detailed readme file, is available as a GitHub repository: https://github.com/salzmanlab/DEEPEST-Fusion. In addition to the main workflow, which is based on CWL, the repository provides example input and batch scripts (for job submission on local clusters) and code for building and querying the SBT files. All custom scripts used for systematic analysis of fusions are also available in the same repository.
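
The abstract's claims about over-representation and domain enrichment rest on comparing an observed count with the count expected by chance. The abstract does not say which test DEEPEST uses, so the following is only a generic sketch of one common choice, a one-sided hypergeometric test, and should not be read as the authors' method. The function names and all counts in the usage lines are hypothetical and chosen purely for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: population size (e.g. all annotated genes)
    K: population items carrying the feature (e.g. genes with a kinase domain)
    n: sample size (e.g. genes appearing in fusion calls)
    k: sampled items carrying the feature
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the exact probability mass from k up to the largest possible count.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def fold_enrichment(k, K, n, N):
    """Observed count divided by the count expected under random sampling."""
    expected = n * K / N
    return k / expected


# Hypothetical toy counts, not taken from the paper.
N_GENES, N_KINASE, N_FUSED, N_FUSED_KINASE = 20000, 500, 1000, 50
fold = fold_enrichment(N_FUSED_KINASE, N_KINASE, N_FUSED, N_GENES)
p_value = hypergeom_upper_tail(N_FUSED_KINASE, N_KINASE, N_FUSED, N_GENES)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.3g}")
```

With these made-up counts the expected number of kinase-domain genes among the fused genes is 25, so observing 50 corresponds to a 2-fold enrichment. In a real analysis the background set and any multiple-testing correction would need to match the study design.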

Publisher

Cold Spring Harbor Laboratory
