Abstract
T cell heterogeneity presents a challenge for accurately identifying T cells, understanding their inherent plasticity, and characterizing their critical role in adaptive immunity. Immunologists have traditionally employed techniques such as flow cytometry to identify T cell subtypes based on a well-established set of surface protein markers. With the advent of single-cell RNA sequencing (scRNA-seq), researchers can now investigate the expression of the genes encoding these surface proteins at the single-cell level. These expression profiles offer valuable clues to cell identity. However, CD45RA, the CD45 isoform that distinguishes naïve from central memory T cells and effector memory T cells from effector memory cells re-expressing CD45RA, cannot be profiled well by scRNA-seq because of the difficulty of mapping short reads to genes. To facilitate cell type annotation in T cell scRNA-seq analysis, we used machine learning to train a CD45RA+/- classifier on single-cell mRNA count data annotated with known CD45RA antibody levels from cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) data. Among all algorithms we tested, a support vector machine (SVM) with a radial basis function (RBF) kernel and optimized hyperparameters achieved 99.96% accuracy on an unseen dataset. The multilayer perceptron (MLP) classifier, the second most predictive method overall, achieved an accuracy of 99.74%. Our simple yet robust machine learning approach provides a valid inference of the CD45RA level, assisting cell identity annotation and further exploration of the heterogeneity within human T cells.
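To make two quantities named in the abstract concrete, the following minimal sketch computes the RBF kernel, which an RBF-kernel SVM uses to score the similarity of two cells' expression vectors, and accuracy, the fraction of cells whose predicted CD45RA label matches the reference label. It is an illustration only and not the authors' code: the function names, the `gamma` value, and the toy vectors and labels are all assumptions made for this example.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy expression vectors (three genes per cell).
cell_a = [1.0, 0.0, 2.0]
cell_b = [1.0, 0.0, 2.0]   # identical to cell_a
cell_c = [0.0, 3.0, 0.0]   # squared distance to cell_a is 1 + 9 + 4 = 14

same = rbf_kernel(cell_a, cell_b, gamma=0.1)       # identical cells give 1.0
different = rbf_kernel(cell_a, cell_c, gamma=0.1)  # exp(-1.4), about 0.247

# Hypothetical reference labels (e.g. derived from antibody levels) and predictions.
labels = ["CD45RA+", "CD45RA-", "CD45RA+", "CD45RA-"]
preds = ["CD45RA+", "CD45RA-", "CD45RA-", "CD45RA-"]
acc = accuracy(labels, preds)                      # 3 of 4 correct

print(same, different, acc)
```

The kernel value approaches 1 for cells with near-identical profiles and decays toward 0 as they diverge; `gamma` controls how quickly, and it is one of the hyperparameters typically tuned for an RBF-kernel SVM.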
Publisher
Cold Spring Harbor Laboratory