Artificial Neural Networks for classification of single cell gene expression

Published: 2021-07-30

Author:

Zhong Jiahui, Lyu Minjie, Jin Huan, Cao Zhiwei, Chitkushev Lou T., Zhang Guanglan, Keskin Derin B., Brusic Vladimir

Abstract

Background: Single-cell transcriptome (SCT) sequencing has become a high-throughput technology in which gene expression can be measured concurrently in large numbers of cells. The results of gene expression studies are highly reproducible when strict protocols and standard operating procedures (SOP) are followed. However, differences in sample processing conditions result in significant changes in gene expression profiles, making direct comparison of different studies difficult. Unsupervised machine learning (ML) methods use clustering algorithms combined with semi-automated cell labeling and manual annotation of individual cells. These methods do not scale up well, and a workflow used on a specific dataset will not perform well with other studies. Supervised ML classification shows superior classification accuracy and generalization properties compared to unsupervised ML methods. We describe a supervised ML method that deploys artificial neural networks (ANN) for 5-class classification of healthy peripheral blood mononuclear cells (PBMC) from multiple diverse studies.

Results: We used 58 data sets to train the ANN incrementally over ten cycles of training and testing. The sample processing involved four protocols: separation of PBMC, separation of PBMC + enrichment (by negative selection), separation of PBMC + FACS, and separation of PBMC + MACS. The training data set included between 85 and 110 thousand cells, and the test set had approximately 13 thousand cells. Training and testing were done with various combinations of data sets from four principal data sources. The overall 5-class classification accuracy on independent data sets reached 94%. Classification accuracy for B cells, monocytes, and T cells exceeded 95%. Classification accuracy for natural killer (NK) cells was 75% because of the similarity between NK cells and T cell subsets. Classification accuracy for dendritic cells (DC) was low due to the very low numbers of DC in the training sets.

Conclusions: The incremental learning ANN model can accurately classify the main types of PBMC. With the inclusion of more DC and the resolution of ambiguities between T cell and NK cell gene expression profiles, we will enable high-accuracy supervised ML classification of PBMC. We assembled a reference data set for healthy PBMC and demonstrated a proof of concept for a supervised ANN method in the classification of previously unseen SCT data. The classification shows high accuracy that is consistent across different studies and sample processing methods.
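The abstract reports a high overall accuracy alongside much lower accuracy for NK cells and DC. The minimal sketch below illustrates how both can hold at once when the rare classes contribute few cells. It assumes that per-class accuracy means the fraction of cells of a class that receive the correct label; the paper may define it differently. The function name `per_class_accuracy` and the toy counts are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return (overall accuracy, {class: fraction of that class labeled correctly})."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(correct.values()) / len(true_labels)
    by_class = {cell_type: correct[cell_type] / totals[cell_type] for cell_type in totals}
    return overall, by_class


# Toy, made-up counts (NOT results from the paper):
# (true class, predicted class, number of cells)
toy_counts = [
    ("T", "T", 580), ("T", "NK", 20),
    ("B", "B", 145), ("B", "T", 5),
    ("Monocyte", "Monocyte", 194), ("Monocyte", "DC", 6),
    ("NK", "NK", 30), ("NK", "T", 10),
    ("DC", "DC", 4), ("DC", "Monocyte", 6),
]

# Expand the counts into one label per cell.
true_labels, predicted_labels = [], []
for true_class, predicted_class, n_cells in toy_counts:
    true_labels += [true_class] * n_cells
    predicted_labels += [predicted_class] * n_cells

overall, by_class = per_class_accuracy(true_labels, predicted_labels)
print(f"overall: {overall:.3f}")
for cell_type, accuracy in sorted(by_class.items()):
    print(f"{cell_type}: {accuracy:.3f}")
```

In this toy example the abundant classes dominate the overall figure, so low accuracy on the 40 NK cells and 10 DC barely lowers it. This is why per-class figures are worth reporting next to the overall accuracy.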

Publisher

Cold Spring Harbor Laboratory
