Abstract
Boolean networks are widely used to model the qualitative dynamics of cell fate processes by describing how the binary activation states of genes and transcription factors change over time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-Seq, is a cornerstone of data-driven model construction and validation. On one hand, scRNA-Seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-Seq data from baseline Boolean models provides an important asset for benchmarking inference methods. However, linking characteristics of scRNA-Seq datasets, including dropout events, with Boolean states is a challenging task.
We present scBoolSeq, a method for the bidirectional linking of scRNA-Seq data and the Boolean activation states of genes. Given a reference scRNA-Seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-Seq datasets and generate synthetic scRNA-Seq datasets from Boolean trajectories, such as those produced by Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-Seq data generated by scBoolSeq with data generated by BoolODE from the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-Seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in a two-dimensional projection of the data.
Author summary
The qualitative and logical modeling of cell dynamics has yielded valuable insight into the gene regulatory mechanisms that drive cellular differentiation and fate decisions, by predicting cellular trajectories and mutations for their control. However, the design and validation of these models are impeded by the quantitative nature of experimental measurements of cellular states. In this paper, we provide and assess a new methodology, scBoolSeq, for bridging single-cell level pseudocounts of RNA transcripts with a Boolean classification of gene activity levels. Our method, implemented as a Python package, makes it possible both to binarise scRNA-Seq data, in order to match quantitative measurements with states of logical models, and to generate synthetic data from Boolean trajectories, in order to benchmark inference methods. We show that scBoolSeq accurately captures the main statistical features of scRNA-Seq data, including measurement dropouts, significantly improving on the state of the art. Overall, scBoolSeq provides a statistically grounded method for enabling the inference and validation of qualitative models from scRNA-Seq data.
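The two directions described above, binarising pseudocounts and sampling pseudocounts from Boolean states with simulated dropouts, can be illustrated with a toy sketch for a single bimodal gene. This is not the scBoolSeq API or its actual statistical model: the function names, the midpoint threshold, the Gaussian sampling around each mode, and the logistic dropout curve are all assumptions chosen for illustration only.

```python
import math
import random


def binarise(values, low_mean, high_mean):
    """Label each pseudocount as active (True) or inactive (False).

    Assumption: the gene is bimodal with known mode means, and the
    threshold is simply the midpoint between the two modes.
    """
    threshold = (low_mean + high_mean) / 2.0
    return [v > threshold for v in values]


def dropout_probability(expression, midpoint, steepness):
    """Hypothetical logistic dropout model.

    Lower expression values drop out more often; `midpoint` and
    `steepness` stand in for gene-dependent parameters.
    """
    return 1.0 / (1.0 + math.exp(steepness * (expression - midpoint)))


def sample_pseudocounts(states, low_mean, high_mean, spread,
                        midpoint, steepness, rng):
    """Draw one pseudocount per Boolean state, then simulate dropouts."""
    samples = []
    for active in states:
        # Biased sampling: draw around the mode matching the Boolean state.
        mean = high_mean if active else low_mean
        value = max(0.0, rng.gauss(mean, spread))
        # Dropout simulation: replace the value with zero at random.
        if rng.random() < dropout_probability(value, midpoint, steepness):
            value = 0.0
        samples.append(value)
    return samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
trajectory = [False, False, True, True, True]
synthetic = sample_pseudocounts(
    trajectory, low_mean=2.0, high_mean=8.0, spread=0.5,
    midpoint=1.0, steepness=2.0, rng=rng,
)
recovered = binarise(synthetic, low_mean=2.0, high_mean=8.0)
```

In this toy setting a dropout on an active gene produces a zero that the midpoint threshold then labels as inactive, which illustrates why dropout events make the link between pseudocounts and Boolean states difficult.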
Publisher
Cold Spring Harbor Laboratory