Authors:
Simpson Claire, Tabatsky Evgeniy, Rahil Zainab, Eddins Devon J., Tkachev Sasha, Georgescauld Florian, Papalegis Derek, Culka Martin, Levy Tyler, Gregoretti Ivan, Chernyshev Andrei, Koeppen Hartmut, Walther Guenther, Ghosn Eliver E. B., Orlova Darya
Abstract
Unsupervised clustering is a powerful machine-learning technique widely used to analyze high-dimensional biological data. It plays a crucial role in uncovering patterns, structure, and inherent relationships within complex datasets without relying on predefined labels. In biology, high-dimensional data may include transcriptomics, proteomics, and a variety of single-cell omics data. Most existing clustering algorithms operate directly in the high-dimensional space, and their performance may suffer from the phenomenon known as the curse of dimensionality. Here, we present an alternative clustering approach that alleviates the curse by sequentially projecting high-dimensional data into a low-dimensional representation. We validated the effectiveness of our approach, named APP, across various biological data modalities, including flow and mass cytometry data, scRNA-seq, multiplex imaging data, and T-cell receptor repertoire data. APP efficiently recapitulated experimentally validated cell-type definitions and revealed new biologically meaningful patterns.
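The abstract does not specify APP's projection index, candidate directions, or stopping rule, so the following is only a hypothetical sketch of the general idea of clustering by sequential low-dimensional projection, in the spirit of projection pursuit. It is not the authors' APP implementation. At each step it scores candidate 1-D projections (the coordinate axes plus random unit directions) with an Otsu-style ratio of between-group to total sum of squares, splits the data along the best projection, and recurses until no split passes a threshold. The function names, the scoring rule, and the parameters `min_score`, `min_size`, `n_random`, and `seed` are all assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _random_unit(dim: int, rng: random.Random) -> List[float]:
    # Isotropic random direction: normalized vector of Gaussian draws.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def best_split_1d(values: Sequence[float]) -> Tuple[float, float]:
    """Otsu-style two-way split of 1-D values (hypothetical projection index).

    Returns (score, threshold), where score = between-group sum of squares
    divided by total sum of squares (0..1); higher means a cleaner gap.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    mean = total / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    best_score, best_thr = 0.0, xs[0]
    if total_ss == 0.0:
        return best_score, best_thr
    left_sum = 0.0
    for i in range(1, n):  # left group = xs[:i], right group = xs[i:]
        left_sum += xs[i - 1]
        if xs[i - 1] == xs[i]:
            continue  # no threshold can separate tied values
        m_left = left_sum / i
        m_right = (total - left_sum) / (n - i)
        between = i * (m_left - mean) ** 2 + (n - i) * (m_right - mean) ** 2
        if between / total_ss > best_score:
            best_score, best_thr = between / total_ss, (xs[i - 1] + xs[i]) / 2
    return best_score, best_thr


def sequential_projection_clustering(
    points: Sequence[Sequence[float]],
    min_score: float = 0.8,  # assumed cut-off; a unimodal Gaussian scores about 2/pi ~ 0.64
    min_size: int = 5,       # assumed smallest allowed cluster
    n_random: int = 25,      # assumed number of random directions per step
    seed: int = 0,
) -> List[int]:
    """Recursively split data along the best-scoring 1-D projection."""
    rng = random.Random(seed)
    dim = len(points[0])
    labels = [-1] * len(points)
    next_label = 0

    def split(idx: List[int]) -> None:
        nonlocal next_label
        best_score, best_thr = 0.0, 0.0
        best_proj: Optional[List[float]] = None
        if len(idx) >= 2 * min_size:
            axes = [[1.0 if j == k else 0.0 for j in range(dim)] for k in range(dim)]
            randoms = [_random_unit(dim, rng) for _ in range(n_random)]
            for d in axes + randoms:
                proj = [_dot(points[i], d) for i in idx]
                score, thr = best_split_1d(proj)
                if score > best_score:
                    best_score, best_proj, best_thr = score, proj, thr
        if best_proj is not None and best_score >= min_score:
            left = [i for i, p in zip(idx, best_proj) if p <= best_thr]
            right = [i for i, p in zip(idx, best_proj) if p > best_thr]
            if len(left) >= min_size and len(right) >= min_size:
                split(left)
                split(right)
                return
        for i in idx:  # no convincing split: this subset becomes one cluster
            labels[i] = next_label
        next_label += 1

    split(list(range(len(points))))
    return labels


if __name__ == "__main__":
    # Toy data: three well-separated Gaussian blobs in 5 dimensions.
    demo_rng = random.Random(1)
    centers = [[0.0] * 5, [10.0] + [0.0] * 4, [0.0, 10.0] + [0.0] * 3]
    data = [[c + demo_rng.gauss(0.0, 1.0) for c in center] for center in centers for _ in range(50)]
    print("clusters found:", len(set(sequential_projection_clustering(data))))
```

Because every decision is made on a one-dimensional projection, distances are never computed in the full high-dimensional space, which is the property the abstract credits with easing the curse of dimensionality. Real projection-pursuit methods usually optimize the projection direction and often use two-dimensional views; the fixed random directions here only keep the sketch short.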
Publisher
Cold Spring Harbor Laboratory