LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection

Published:2019-04-30 Issue:22 Volume:35 Page:4696-4706
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Johnson Travis S¹², Wang Tongxin²³, Huang Zhi²⁴, Yu Christina Y¹², Wu Yi², Han Yatong⁵, Zhang Yan¹⁶, Huang Kun²⁷, Zhang Jie⁸

Affiliation:

1. Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, USA

2. Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA

3. Department of Computer Science, Indiana University, Bloomington, IN, USA

4. School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

5. Harbin Engineering University, Harbin, China

6. The Ohio State University Comprehensive Cancer Center (OSUCCC – James), Columbus, OH, USA

7. Regenstrief Institute, Indiana University School of Medicine, Indianapolis, IN, USA

8. Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA

Abstract

Motivation

Rapid advances in single cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to (i) remove systematic biases, (ii) model multiple datasets with ambiguous labels and (iii) classify cells and map cell type labels. However, most methods address only one of these problems, on broad cell types or simulated data, using a single model type. It is also important to address higher-resolution cellular subtypes, subtype labels from multiple datasets, models trained on multiple datasets simultaneously and generalizability beyond a single model type.

Results

We developed a species- and dataset-independent transfer learning framework (LAmbDA) to train models on multiple datasets (even from different species) and applied our framework to simulated, pancreas and brain scRNA-seq experiments. These models mapped corresponding cell types between datasets with inconsistent cell subtype labels while simultaneously reducing batch effects. We achieved high accuracy in labeling cellular subtypes (weighted accuracy simulated 1 datasets: 90%; simulated 2 datasets: 94%; pancreas datasets: 88% and brain datasets: 66%) using LAmbDA Feedforward 1 Layer Neural Network with bagging. This method achieved higher weighted accuracy in labeling cellular subtypes than two other state-of-the-art methods, scmap and CaSTLe, on the brain datasets (66% versus 60% and 32%). Furthermore, it achieved better performance in correctly predicting ambiguous cellular subtype labels across datasets in 88% of test cases compared with CaSTLe (63%), scmap (50%) and MetaNeighbor (50%). LAmbDA is model- and dataset-independent and generalizable to diverse data types, representing an advance in biocomputing.

Availability and implementation

github.com/tsteelejohnson91/LAmbDA

Supplementary information

Supplementary data are available at Bioinformatics online.
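The abstract reports "weighted accuracy" without defining it. As a rough illustration only, the sketch below assumes it means the mean of per-class accuracies (often called balanced accuracy), so that rare subtypes count as much as abundant ones. That definition, the function name `weighted_accuracy` and the toy labels are assumptions for illustration and are not taken from the paper or the LAmbDA code.

```python
from collections import defaultdict


def weighted_accuracy(true_labels, predicted_labels):
    """Mean of per-class accuracy (assumed definition, not from the paper).

    Each class contributes equally, regardless of how many cells it has.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    totals = defaultdict(int)   # number of cells per true class
    correct = defaultdict(int)  # correctly predicted cells per true class
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not totals:
        raise ValueError("no labels given")
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: 8 cells of an abundant subtype, 2 of a rare one.
truth = ["alpha"] * 8 + ["delta"] * 2
pred = ["alpha"] * 8 + ["alpha", "delta"]

# Plain accuracy ignores class sizes: 9 of 10 cells are correct.
plain = sum(t == p for t, p in zip(truth, pred)) / len(truth)
# Class-weighted accuracy averages 8/8 and 1/2.
balanced = weighted_accuracy(truth, pred)
print(plain, balanced)
```

On this toy input the plain accuracy is 0.9 while the class-weighted value is 0.75, which shows why a class-weighted metric is the more conservative choice when subtype sizes are imbalanced.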

Funder

National Institutes of Health

NLM-MIDAS

NLM-NRSA

The Ohio State University

Indiana University School of Medicine

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz295/28768993/btz295.pdf

Reference 55 articles.

1. A web server for comparative analysis of single-cell RNA-seq data;Alavi;Nat Commun,2018

2. Identifying cell populations with scRNASeq;Andrews;Mol. Aspects Med,2018

3. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure;Baron;Cell Syst,2016

Cited by 36 articles.

1. scPLAN: a hierarchical computational framework for single transcriptomics data annotation, integration and cell-type label refinement;Briefings in Bioinformatics;2024-05-23

2. Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data;npj Digital Medicine;2024-05-14

3. SCIPAC: quantitative estimation of cell-phenotype associations;Genome Biology;2024-05-13

4. Single-cell type annotation with deep learning in 265 cell types for humans;Bioinformatics Advances;2024-01-01

5. CellSTAR: a comprehensive resource for single-cell transcriptomic annotation;Nucleic Acids Research;2023-10-19