Abstract
The increasing throughput of single-cell technologies and the pace of data generation are enhancing the resolution at which we observe cell state transitions. The characterization and visualization of these transitions rely on the construction of a low-dimensional embedding, which is usually done via non-parametric methods such as t-SNE or UMAP. However, existing approaches become increasingly inefficient as datasets grow. Here, we test the viability of parametric methods, which can be trained on a small subset of the data and applied to future data when needed. We observe that the recently developed parametric version of UMAP is generalizable and robust to dropout. Additionally, to certify the robustness of the model, we use the theoretical upper and lower bounds of the mapped coordinates in the UMAP space to regularize the training process.
Publisher
Cold Spring Harbor Laboratory