Embedding to reference t-SNE space addresses batch effects in single-cell classification

Published: 2021-08-24
ISSN:0885-6125
Container-title:Machine Learning
language:en
Short-container-title:Mach Learn

Author:

Poličar Pavlin G., Stražar Martin, Zupan Blaž

Abstract

Dimensionality reduction techniques, such as t-SNE, can construct informative visualizations of high-dimensional data. When jointly visualizing multiple data sets, a straightforward application of these methods often fails; instead of revealing underlying classes, the resulting visualizations expose dataset-specific clusters. To circumvent these batch effects, we propose an embedding procedure that uses a t-SNE visualization constructed on a reference data set as a scaffold for embedding new data points. Each data instance from a new, unseen secondary data set is embedded independently and does not change the reference embedding. This prevents any interactions between instances in the secondary data and implicitly mitigates batch effects. We demonstrate the utility of this approach by analyzing six recently published single-cell gene expression data sets with up to tens of thousands of cells and thousands of genes. The batch effects in our studies are particularly strong as the data comes from different institutions using different experimental protocols. The visualizations constructed by our proposed approach are clear of batch effects, and the cells from secondary data sets correctly co-cluster with cells of the same type from the primary data. We also show that the predictive power of our simple, visual classification approach in t-SNE space matches the accuracy of specialized machine learning techniques that consider the entire compendium of features that profile single cells.
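The core idea in the abstract (a fixed reference embedding, with each new point placed independently of every other new point) can be illustrated with a minimal sketch. This is not the authors' procedure: as a loudly stated assumption, the placement step below swaps in an inverse-distance weighted mean of the nearest reference points instead of any t-SNE optimization, and the classification step uses a plain k-nearest-neighbour vote in the embedding. The function names, the toy data, and the parameter `k` are all hypothetical.

```python
import math
from collections import Counter


def embed_new_point(x_new, ref_features, ref_embedding, k=3):
    """Place one new point into a fixed reference embedding.

    Assumption (simplification): the point lands at the inverse-distance
    weighted mean of the embedding positions of its k nearest reference
    points in feature space. The reference embedding is only read, never
    modified, and no other new point is consulted.
    """
    dists = [math.dist(x_new, r) for r in ref_features]
    nearest = sorted(range(len(ref_features)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + 1e-12) for i in nearest]
    total = sum(weights)
    dim = len(ref_embedding[0])
    return tuple(
        sum(w * ref_embedding[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    )


def classify_in_embedding(y_new, ref_embedding, ref_labels, k=3):
    """Majority vote among the k nearest reference points in the embedding."""
    dists = [math.dist(y_new, e) for e in ref_embedding]
    nearest = sorted(range(len(ref_embedding)), key=lambda i: dists[i])[:k]
    return Counter(ref_labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy reference set: two well-separated groups in feature
# space, with a made-up 2-D embedding standing in for a t-SNE layout.
ref_features = [(0, 0, 0), (0, 1, 0), (1, 0, 0),
                (10, 10, 10), (10, 11, 10), (11, 10, 10)]
ref_embedding = [(-5.0, 0.0), (-5.0, 1.0), (-4.0, 0.0),
                 (5.0, 0.0), (5.0, 1.0), (6.0, 0.0)]
ref_labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]

# Secondary points carry a constant offset, imitating a batch shift.
secondary = [(1.5, 1.5, 1.0), (11.5, 11.5, 11.0)]

# Each secondary point is embedded and classified on its own.
placed = [embed_new_point(x, ref_features, ref_embedding) for x in secondary]
predicted = [classify_in_embedding(y, ref_embedding, ref_labels) for y in placed]
```

Because every secondary point is handled in its own call, secondary points cannot attract or repel each other, which is the property the abstract credits with mitigating batch effects.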

Funder

Slovenian Research Agency Program Grant

BioPharm.SI

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s10994-021-06043-1.pdf

Reference: 42 articles.

1. Bard, J., Rhee, S. Y., & Ashburner, M. (2005). An ontology for cell types. Genome Biology, 6, 2.

2. Baron, M., Veres, A., Wolock, S. L., Faust, A. L., Gaujoux, R., Vetere, A., et al. (2016). A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Systems, 3(4), 346–360.

3. Becht, E., McInnes, L., Healy, J., Dutertre, C. A., Kwok, I. W. H., Ng, L. G., et al. (2019). Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 37(1), 38–47.

4. Belkina, A. C., Ciccolella, C. O., Anno, R., Halpert, R., Spidlen, J., & Snyder-Cappione, J. E. (2019). Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nature Communications, 10(1), 1–12.

5. Bickel, S., Brückner, M., & Scheffer, T. (2009). Discriminative learning under covariate shift. Journal of Machine Learning Research, 10, 2137–2155.

Cited by 16 articles.

1. Automatic grid topology detection method based on Lasso algorithm and t-SNE algorithm;Energy Informatics;2024-05-29

2. Characterization of CD34+ Cells from Patients with Acute Myeloid Leukemia (AML) and Myelodysplastic Syndromes (MDS) Using a t-Distributed Stochastic Neighbor Embedding (t-SNE) Protocol;Cancers;2024-03-28

3. Use of t‐distributed stochastic neighbour embedding in vibrational spectroscopy;Journal of Chemometrics;2024-03-23

4. Technical Understanding from Interactive Machine Learning Experience: a Study Through a Public Event for Science Museum Visitors;Interacting with Computers;2024-03-12

5. BERMAD: batch effect removal for single-cell RNA-seq data using a multi-layer adaptation autoencoder with dual-channel framework;Bioinformatics;2024-03-01