Identification of Kidney Cell Types in scRNA-seq and snRNA-seq Data Using Machine Learning Algorithms

Published: 2024-01-31

Authors:

Tisch Adam¹, Madapoosi Siddharth², Blough Stephen¹, Rosa Jan¹, Eddy Sean³, Mariani Laura³, Naik Abhijit³, Limonte Christine⁴, McCown Philip³, Menon Rajasree⁵, Rosas Sylvia⁶, Parikh Chirag⁷, Kretzler Matthias³, Mahfouz Ahmed⁸, Alakwaa Fadhl³

Affiliations:

1. University of Michigan

2. University of Michigan Medical School

3. Michigan Medicine, University of Michigan

4. University of Washington

5. Department of Computational Medicine and Bioinformatics, University of Michigan

6. Joslin Diabetes Center

7. Johns Hopkins University School of Medicine

8. Leiden University Medical Center

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) provide valuable insights into the cellular states of kidney cells. However, annotating cell types often requires extensive domain expertise and time-consuming manual curation, which limits scalability and generalizability. To facilitate this process, we tested the performance of five supervised classification methods for automatic cell type annotation.

Results: We analyzed publicly available sc/snRNA-seq datasets from five expert-annotated studies, comprising 62,120 cells from 79 kidney biopsy samples. The datasets were integrated by harmonizing cell type annotations across studies. Five supervised machine learning algorithms (support vector machines, random forests, multilayer perceptrons, k-nearest neighbors, and extreme gradient boosting) were trained on four of the datasets and tested on the fifth to annotate cell types automatically. Performance was evaluated by F1 score and rejection rate. All five algorithms achieved high accuracy, with a median F1 score of 0.94 and a median rejection rate of 1.8%. The algorithms performed comparably well across datasets and successfully rejected cell types that were not present in the training data. However, F1 scores were lower when models trained primarily on scRNA-seq data were tested on snRNA-seq data.

Conclusions: Our findings demonstrate that machine learning algorithms can accurately annotate a wide range of adult kidney cell types in scRNA-seq and snRNA-seq data. This approach has the potential to standardize cell type annotation and to facilitate further research on the cellular mechanisms underlying kidney disease.
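The abstract does not describe how rejection or the median F1 score is computed. As a hedged illustration of the general idea only, the sketch below implements one of the five named algorithms, k-nearest neighbors, in plain Python with a reject option, then reports a median per-cell-type F1 score and a rejection rate. The function names, the `UNASSIGNED` label, both rejection thresholds (`min_agreement`, `max_distance`), the toy 2-D data, and the choice to count rejected cells as misses are all assumptions for illustration; the study's own features, preprocessing, and rejection rule may differ.

```python
import math
from collections import Counter
from statistics import median

UNASSIGNED = "unassigned"  # hypothetical label for rejected cells


def knn_annotate(train_X, train_y, query, k=3, min_agreement=0.6, max_distance=3.0):
    """Label one cell by majority vote of its k nearest training cells.

    The cell is rejected (UNASSIGNED) if the winning label holds less than
    `min_agreement` of the votes, or if the k neighbors are on average farther
    than `max_distance` (a crude guard against cell types absent from training).
    Both thresholds are illustrative assumptions, not values from the study.
    """
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    label, count = Counter(y for _, y in neighbors).most_common(1)[0]
    mean_dist = sum(d for d, _ in neighbors) / len(neighbors)
    if count / len(neighbors) < min_agreement or mean_dist > max_distance:
        return UNASSIGNED
    return label


def per_class_f1(y_true, y_pred, classes):
    """F1 for each known cell type; rejected cells count as false negatives."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when there are no true positives
        scores[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return scores


def rejection_rate(y_pred):
    """Fraction of cells the classifier declined to label."""
    return sum(p == UNASSIGNED for p in y_pred) / len(y_pred)


# Toy 2-D "expression" vectors; real inputs would be normalized gene counts.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["PT", "PT", "PT", "DCT", "DCT", "DCT"]
test_X = [(0.5, 0.5), (5.5, 5.5), (20, 20)]
test_y = ["PT", "DCT", "Podocyte"]  # podocytes never appear in training

preds = [knn_annotate(train_X, train_y, q) for q in test_X]
# Score only cell types seen in training; novel types show up in the rejection rate.
f1 = per_class_f1(test_y, preds, classes=sorted(set(train_y)))
print(preds)                            # ['PT', 'DCT', 'unassigned']
print(median(f1.values()))              # 1.0
print(round(rejection_rate(preds), 3))  # 0.333
```

In this toy run, the podocyte lies far from every training cell, so it is rejected rather than forced into the PT or DCT class, which mirrors the behavior the abstract reports for cell types missing from the training data. Because F1 is computed only over cell types present in training, a correct rejection of the novel type does not lower the median score.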

Publisher:

Research Square Platform LLC
