EVALUATING THE COMPLEXITY OF GENE COEXPRESSION ESTIMATION FOR SINGLE-CELL DATA
Published: 2023
Issue: 3
Volume: 4
Pages: 37-82
ISSN: 2689-3967
Container-title: Journal of Machine Learning for Modeling and Computing
Language: en
Short-container-title: J Mach Learn Model Comput
Authors: Zhang Jiaqi, Singh Ritambhara
Abstract
With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, it is becoming possible to understand biological processes at a more refined, single-cell level. Gene coexpression estimation is an essential step in this direction: it can annotate the functions of unknown genes or provide the basis for gene regulatory network inference. This study thoroughly tests existing gene coexpression estimation methods on simulated datasets with known ground-truth coexpression networks. We generate these novel datasets using two simulation processes, NORmal-To-Anything (NORTA) and Single-cell ExpRession of Genes In silicO (SERGIO), that use parameters learned from experimental data. We demonstrate that these simulations better capture the underlying properties of real-world single-cell datasets than the simulations previously used for this task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate, potentially caused by high sparsity levels in the data. Finally, we find that commonly used preprocessing approaches, such as normalization and imputation, do not improve coexpression estimation. Overall, our benchmark setup contributes to the development of coexpression estimators, and our study provides valuable insights for the community on single-cell data analysis.
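To make the evaluation idea concrete, the following is a minimal sketch of how a false discovery rate could be computed when the ground-truth coexpression network is known. It is not the benchmark code of the study: the thresholded Pearson estimator, the function names (`pearson`, `estimate_edges`, `false_discovery_rate`), the toy counts, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_edges(expression, threshold):
    """Return gene pairs whose absolute correlation meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def false_discovery_rate(predicted, truth):
    """Fraction of predicted edges that are absent from the ground truth."""
    if not predicted:
        return 0.0
    return len(predicted - truth) / len(predicted)


# Hypothetical toy counts: keys are genes, values are counts across six cells.
toy_expression = {
    "geneA": [0, 2, 4, 6, 8, 10],
    "geneB": [1, 3, 5, 7, 9, 11],  # true partner of geneA
    "geneC": [0, 0, 5, 0, 9, 10],  # sparse; not a true partner in toy_truth
    "geneD": [5, 0, 5, 0, 5, 0],   # alternating; only weakly correlated with the others
}
# Hypothetical ground-truth network with a single coexpressed pair.
toy_truth = {("geneA", "geneB")}

predicted = estimate_edges(toy_expression, threshold=0.8)
fdr = false_discovery_rate(predicted, toy_truth)
print(sorted(predicted), round(fdr, 3))
```

In this toy case the estimator recovers the true pair but also reports two pairs involving the sparse gene, so two of the three predicted edges are false discoveries. This only illustrates the metric and makes no claim about how any benchmarked method behaves.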