Abstract
Inference of gene regulatory networks from single-cell expression data, such as single-cell RNA sequencing, is a widely studied problem in computational biology. Despite diverse methods spanning information theory, machine learning, and statistics, the problem remains unsolved. This shortcoming has been attributed to measurement errors, lack of perturbation data, or the difficulty of causal inference. Yet it is not known whether the kinetic properties of gene expression also contribute. We show how the relative stability of mRNA and protein hampers inference. Available inference methods are benchmarked on synthetic data that lack protein species, which is biologically unrealistic. We use a simple model of gene expression, incorporating both mRNA and protein, to show that a protein more stable than its mRNA can cause a loss of correlation between the mRNA of a transcription factor and that of its target gene. This loss can also occur when mRNA and protein are on the same timescale. The relative difference in timescales affects true interactions more strongly than false positives, which may therefore not be suppressed. Beyond correlation, we find that nonlinear information-theoretic measures are also prone to this problem. Finally, we demonstrate these principles in real single-cell RNA sequencing data for over 1700 yeast genes.
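The central claim, that a long-lived protein can decorrelate the mRNA of a transcription factor from the mRNA of its target, can be illustrated with a minimal sketch. The code below is not the model used in the paper: it assumes a hypothetical linear Langevin chain (TF mRNA, then TF protein, then target mRNA) with unit production gains, fluctuations measured around the mean, and noise entering only through the TF mRNA. The function name `tf_target_mrna_correlation` and all parameter values are illustrative assumptions.

```python
import numpy as np


def tf_target_mrna_correlation(protein_decay, mrna_decay=1.0, dt=0.02,
                               n_steps=200_000, seed=0):
    """Pearson correlation between TF mRNA and target mRNA fluctuations.

    Assumed toy model (NOT the paper's model), integrated with Euler-Maruyama:
        d(m1) = -mrna_decay * m1 dt + dW      # TF mRNA, noisy
        d(p)  = (m1 - protein_decay * p) dt   # TF protein, translated from m1
        d(m2) = (p - mrna_decay * m2) dt      # target mRNA, activated by p
    All variables are deviations from their steady-state means.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n_steps) * np.sqrt(dt)  # Wiener increments

    tf_mrna = np.empty(n_steps)
    target_mrna = np.empty(n_steps)
    m1 = p = m2 = 0.0
    for i in range(n_steps):
        # Compute all updates from the current state before overwriting it.
        m1_next = m1 - mrna_decay * m1 * dt + noise[i]
        p_next = p + (m1 - protein_decay * p) * dt
        m2_next = m2 + (p - mrna_decay * m2) * dt
        m1, p, m2 = m1_next, p_next, m2_next
        tf_mrna[i] = m1
        target_mrna[i] = m2

    burn_in = n_steps // 10  # discard the initial transient
    return float(np.corrcoef(tf_mrna[burn_in:], target_mrna[burn_in:])[0, 1])


if __name__ == "__main__":
    # Protein decay rates relative to an mRNA decay rate of 1 (assumed values).
    for label, rate in [("unstable protein", 10.0),
                        ("same timescale", 1.0),
                        ("stable protein", 0.05)]:
        corr = tf_target_mrna_correlation(protein_decay=rate)
        print(f"{label:>16s} (decay={rate}): corr = {corr:.2f}")
```

In this sketch the only quantity varied is the protein decay rate. A slowly decaying protein averages the TF mRNA signal over a long history, so the target mRNA follows that average rather than the instantaneous TF mRNA level, and the same-time correlation between the two mRNAs drops.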
Publisher
Cold Spring Harbor Laboratory