Identifying cancer cells from calling single-nucleotide variants in scRNA-seq data

Author:

Marot-Lassauzaie Valérie (1, 2), Beneyto-Calabuig Sergi (3, 4), Obermayer Benedikt (5), Velten Lars (3, 4), Beule Dieter (5, 6), Haghverdi Laleh (1)

Affiliation:

1. Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Str. 28, 10115 Berlin, Germany

2. Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany

3. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain

4. Universitat Pompeu Fabra (UPF), Barcelona, Spain

5. Core Unit Bioinformatics, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, 10117 Berlin, Germany

6. Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells, and it can be difficult to fully separate these two populations based on transcriptomics alone. If available, somatic single-nucleotide variants (SNVs) observed in the scRNA-seq data could be used to identify the cancer population and match that information with the single cells’ expression profiles. However, calling somatic SNVs in scRNA-seq data is challenging, as most variants seen in the short-read data are not somatic but are instead germline variants, RNA edits, or transcription, sequencing, or processing errors. In addition, only variants located in regions that are actively transcribed in a given cell will be seen in that cell's data.

Results: To address these challenges, we develop CCLONE (Cancer Cell Labelling On Noisy Expression), an interpretable tool adapted to handle the uncertainty and sparsity of SNVs called from scRNA-seq data. CCLONE jointly identifies cancer clonal populations and their associated variants. We apply CCLONE to two acute myeloid leukaemia datasets and one lung adenocarcinoma dataset and show that CCLONE captures both genetic clones and somatic events for multiple patients. These results show how CCLONE can be used to gain insight into the course of the disease and the origin of cancer cells in scRNA-seq data.

Availability and implementation: Source code is available at github.com/HaghverdiLab/CCLONE.

Funder

Bundesministerium für Bildung und Forschung

Publisher

Oxford University Press (OUP)
