Confidence estimation for t-SNE embeddings using random forest

Published:2022-09-10 Issue:12 Volume:13 Page:3981-3992
ISSN:1868-8071
Container-title:International Journal of Machine Learning and Cybernetics
language:en
Short-container-title:Int. J. Mach. Learn. & Cyber.

Author:

Ozgode Yigin Busra, Saygili Gorkem

Abstract

Dimensionality reduction algorithms are commonly used to reduce multi-dimensional data to a form that can be visualized on a standard display. Although many dimensionality reduction algorithms, such as t-distributed Stochastic Neighbor Embedding (t-SNE), aim to preserve close neighborhoods in the low-dimensional space, they may not accomplish that for every sample and can therefore produce erroneous representations. In this study, we developed a supervised confidence estimation algorithm for detecting erroneous samples in embeddings. Our algorithm generates a confidence score for each sample in an embedding based on a distance-oriented score and a random forest regressor. We evaluate its performance on both intra- and inter-domain data and compare it with the neighborhood preservation ratio as our baseline. Our results showed that the resulting confidence score provides distinctive information about the correctness of any sample in an embedding compared to the baseline. The source code is available at https://github.com/gsaygili/dimred.
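
The baseline named in the abstract, the neighborhood preservation ratio, can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition, so the version below assumes one common formulation: for each sample, the fraction of its k nearest neighbors in the original space that are also among its k nearest neighbors in the embedding. The function names `knn_indices` and `neighborhood_preservation_ratio` are hypothetical and are not taken from the linked repository.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Uses brute-force Euclidean distances; the point itself is excluded.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def neighborhood_preservation_ratio(high, low, k):
    """Per-sample share of high-dimensional k-NN kept in the embedding.

    Assumed definition (not confirmed by the abstract):
    |kNN_high(i) & kNN_low(i)| / k for every sample i.
    """
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    return [len(a & b) / k for a, b in zip(nn_high, nn_low)]


# Toy data: the third sample sits near the second one in the original
# space but is placed next to the fourth one in the embedding.
original = [(0.0,), (1.0,), (3.0,), (10.0,)]
embedding = [(0.0,), (1.0,), (9.0,), (10.0,)]
scores = neighborhood_preservation_ratio(original, embedding, k=1)
print(scores)
```

A low ratio flags a sample whose neighborhood changed in the embedding. According to the abstract, the proposed confidence score is meant to be more informative than this baseline; the sketch covers only the baseline and not the random forest stage.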

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Software

Link

https://link.springer.com/content/pdf/10.1007/s13042-022-01635-2.pdf

References: 45 articles.

1. Mahfouz A, van de Giessen M, van der Maaten L, Huisman S, Reinders M, Hawrylycz MJ, Lelieveldt BP (2015) Visualizing the spatial gene expression organization in the brain through non-linear similarity embeddings. Methods 73:79–89. https://doi.org/10.1016/j.ymeth.2014.10.004

2. Townes FW, Hicks SC, Aryee MJ, Irizarry RA (2019) Feature selection and dimension reduction for single-cell RNA-seq based on a multinomial model. Genome Biol 20(1):116. https://doi.org/10.1186/s13059-019-1861-6

3. Kobak D, Berens P (2019) The art of using t-SNE for single-cell transcriptomics. Nat Commun 10:5416. https://doi.org/10.1038/s41467-019-13056-x

4. Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC (2016) Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 17(4):628–641. https://doi.org/10.1093/bib/bbv108

5. Warmerdam VD, Kober T, Tatman R (2020) Going beyond t-SNE: Exposing what lies in text embeddings. In: Proceedings of second workshop for NLP open source software (NLP-OSS), pp 52–60. https://doi.org/10.18653/v1/2020.nlposs-1.8

Cited by 4 articles.

1. Identification of synthetic cathinone positional isomers using electron activated dissociation mass spectrometry;Analytica Chimica Acta;2024-08

2. Weighted t-Distributed Stochastic Neighbor Embedding for Projection-Based Clustering;Progress in Artificial Intelligence and Pattern Recognition;2023-12-20

3. Comparison of Machine Learning Techniques for Heart Disease Diagnosis and Prediction;2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation (ICAMIMIA);2023-11-14

4. Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data;Scientific Reports;2023-04-21