Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads

Author:

He Dongze, Gao Yuan, Chan Spencer Skylar, Quintana-Parrilla Natalia, Patro Rob

Abstract

Motivation: Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty in even the seemingly straightforward task of elucidating the splicing status of the molecules from which sequenced fragments are drawn. This difficulty arises, in part, from the limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many reads in scRNA-seq is ambiguous because of a lack of definitive evidence. We are therefore in need of methods that can recover the splicing status of ambiguous reads which, in turn, can lead to more accuracy and confidence in downstream analyses.

Results: We develop Forseti, a predictive model to probabilistically assign a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets deriving from different species and tissue types. Forseti combines these two trained models to predict the splicing status of the molecule of origin of reads by scoring putative fragments that associate each alignment of sequenced reads with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of reads and identify the true gene origin of multi-gene mapped reads.

Availability: Forseti and the code used for producing the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
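The scoring idea described in the Results can be illustrated with a minimal sketch. This is not the Forseti implementation or its API: every function name, the Gaussian stand-in for the fitted fragment length distribution, its parameters, the use of the single best-scoring fragment per splicing state, and the uniform prior are assumptions made only for illustration. The sketch pairs a read alignment with nearby candidate priming sites, scores each putative fragment as (binding probability) x (fragment length probability), and normalizes the scores of the spliced and unspliced explanations.

```python
from math import exp, pi, sqrt


def fragment_length_density(length, mean=300.0, sd=80.0):
    """Hypothetical stand-in for a fitted fragment length model (Gaussian density)."""
    if length <= 0:
        return 0.0
    z = (length - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def best_fragment_score(read_start, priming_sites):
    """Score the best putative fragment for one alignment.

    priming_sites: list of (site_position, binding_probability) pairs, where
    binding_probability would come from a trained binding affinity model
    (here it is simply supplied by the caller).
    """
    best = 0.0
    for site_pos, bind_prob in priming_sites:
        frag_len = site_pos - read_start  # implied fragment length
        best = max(best, bind_prob * fragment_length_density(frag_len))
    return best


def spliced_probability(start_spliced, sites_spliced,
                        start_unspliced, sites_unspliced,
                        prior_spliced=0.5):
    """Return P(spliced | read), or None if neither explanation has support."""
    s = prior_spliced * best_fragment_score(start_spliced, sites_spliced)
    u = (1.0 - prior_spliced) * best_fragment_score(start_unspliced, sites_unspliced)
    total = s + u
    if total == 0.0:
        return None  # the read stays ambiguous
    return s / total


# Toy example: on the spliced transcript a strong priming site implies a
# 300 nt fragment; on the unspliced transcript the nearest site implies 2000 nt.
p = spliced_probability(100, [(400, 0.9)], 100, [(2100, 0.9)])
print(p)
```

In this toy case the spliced explanation dominates because a 2000 nt fragment is very unlikely under the assumed length model. Summing over candidate priming sites instead of taking the maximum would be an equally plausible design under the same assumptions.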

Publisher

Cold Spring Harbor Laboratory

