Integrated analysis of multimodal single-cell data with structural similarity

Author:

Cao Yingxin (1, 2, 3), Fu Laiyi (1, 4), Wu Jie (5), Peng Qinke (4), Nie Qing (6, 2, 3), Zhang Jing (1), Xie Xiaohui (1)

Affiliation:

1. Department of Computer Science, University of California, Irvine, CA 92697, USA

2. Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA

3. NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697, USA

4. Systems Engineering Institute, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China

5. Department of Biological Chemistry, University of California, Irvine, CA 92697, USA

6. Department of Mathematics, University of California, Irvine, CA 92697, USA

Abstract

Multimodal single-cell sequencing technologies provide unprecedented information on cellular heterogeneity from multiple layers of genomic readouts. However, joint analysis of two modalities without proper handling of noise often leads to overfitting of one modality by the other and to worse clustering results than vanilla single-modality analysis. How to efficiently use the extra information from single-cell multi-omics to delineate cell states and identify meaningful signals remains a significant computational challenge. In this work, we propose a deep learning framework, named SAILERX, for efficient, robust, and flexible analysis of multimodal single-cell data. SAILERX consists of a variational autoencoder with invariant representation learning, which corrects technical noise from the sequencing process, and a multimodal data alignment mechanism, which integrates information from different modalities. Instead of performing hard alignment by projecting both modalities to a shared latent space, SAILERX encourages the local structures of the two modalities, as measured by pairwise similarities, to be similar. This strategy is more robust against overfitting to noise, which facilitates downstream analyses such as clustering, imputation, and marker gene detection. Furthermore, the invariant representation learning component enables SAILERX to perform integrative analysis on both multimodal and single-modal datasets, making it an applicable and scalable tool for more general scenarios.
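The contrast between hard alignment and structural alignment can be illustrated with a small sketch. This is not the SAILERX implementation: the function names, the choice of cosine similarity, and the mean-squared-difference penalty are assumptions made only for illustration, and the embeddings are random toy data. The sketch shows that a penalty defined on pairwise similarities is near zero when two embeddings share the same cell-to-cell structure, even if their coordinates differ.

```python
import numpy as np

def pairwise_cosine(z):
    """Return the n x n cosine-similarity matrix for the rows of z."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    unit = z / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T

def structural_similarity_loss(z_a, z_b):
    """Mean squared difference between the two pairwise-similarity matrices.

    Hypothetical penalty for illustration; the loss used by SAILERX may differ.
    """
    s_a = pairwise_cosine(z_a)
    s_b = pairwise_cosine(z_b)
    return float(np.mean((s_a - s_b) ** 2))

rng = np.random.default_rng(0)

# Toy embeddings: 6 cells, 4 latent dimensions (assumed sizes).
z_mod1 = rng.normal(size=(6, 4))

# A rotated copy of the first embedding: coordinates differ,
# but every pairwise cosine similarity is preserved.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
z_mod2_rotated = z_mod1 @ q

# An unrelated embedding with different cell-to-cell structure.
z_mod2_random = rng.normal(size=(6, 4))

loss_rotated = structural_similarity_loss(z_mod1, z_mod2_rotated)
loss_random = structural_similarity_loss(z_mod1, z_mod2_random)

print(f"same structure, different coordinates: {loss_rotated:.3e}")
print(f"unrelated structure: {loss_random:.3e}")
```

A hard-alignment penalty such as the squared distance between `z_mod1` and `z_mod2_rotated` would be large here, whereas the structural penalty ignores the rotation and only reacts to changes in which cells are similar to which.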

Funder

National Science Foundation

National Institutes of Health

National Institute of Mental Health

Simons Foundation

Publisher

Oxford University Press (OUP)

Subject

Genetics
