Affiliation:
1. School of Mathematical Sciences, Peking University, Beijing 100871, China
2. Center for Quantitative Biology, Peking University, Beijing 100871, China
3. Center for Statistical Science, Peking University, Beijing 100871, China
Abstract
Abstract
Motivation
Single-cell RNA-seq (scRNA-seq) has been widely used to resolve cellular heterogeneity. After collecting scRNA-seq data, the natural next step is to integrate the accumulated data to achieve a common ontology of cell types and states. Thus, an effective and efficient cell-type identification method is urgently needed. Meanwhile, high-quality reference data remain a necessity for precise annotation. However, such tailored reference data are always lacking in practice. To address this, we aggregated multiple datasets into a meta-dataset on which annotation is conducted. Existing supervised or semi-supervised annotation methods suffer from batch effects caused by different sequencing platforms, the effect of which increases in severity with multiple reference datasets.
Results
Herein, a robust deep learning-based single-cell Multiple Reference Annotator (scMRA) is introduced. In scMRA, a knowledge graph is constructed to represent the characteristics of cell types in different datasets, and a graphic convolutional network serves as a discriminator based on this graph. scMRA keeps intra-cell-type closeness and the relative position of cell types across datasets. scMRA is remarkably powerful at transferring knowledge from multiple reference datasets, to the unlabeled target domain, thereby gaining an advantage over other state-of-the-art annotation methods in multi-reference data experiments. Furthermore, scMRA can remove batch effects. To the best of our knowledge, this is the first attempt to use multiple insufficient reference datasets to annotate target data, and it is, comparatively, the best annotation method for multiple scRNA-seq datasets.
Availability and implementation
An implementation of scMRA is available from https://github.com/ddb-qiwang/scMRA-torch.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
National Key Research and Development Program of China
National Key Basic Research Project of China
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
21 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献