Abstract
The accumulation of single-cell omics datasets in the public domain has opened new opportunities for reusing and leveraging the vast amount of information they contain. Such uses, however, are complicated by the complex and resource-consuming procedures for data transfer, normalization, and integration that must be completed before any analysis. Here we present scvi-hub: a platform for efficiently sharing and accessing single-cell omics datasets using pre-trained probabilistic models. We demonstrate that scvi-hub allows immediate access to a range of fundamental tasks, such as visualization, imputation, annotation, outlier detection, and deconvolution of new (query) datasets, using state-of-the-art algorithms and requiring much less storage and compute than standard approaches. We also show that the pre-trained models enable efficient analysis and new discoveries with existing references, including large atlases such as the CZ CELLxGENE Discover Census. Scvi-hub is built within the scvi-tools open-source environment and integrated into scverse. It provides powerful and readily available tools for utilizing a large collection of already-loaded datasets while also enabling easy inclusion of new datasets, thus putting the power of atlas-level analysis at the fingertips of a broad community of users.
Publisher
Cold Spring Harbor Laboratory