Abstract
AbstractHigh-throughput multi-omic molecular profiling allows probing biological systems at unprecedented resolution. However, the integration and interpretation of high-dimensional, sparse, and noisy multimodal datasets remains challenging. Deriving new biology using current methods is particularly difficult because they are not based on biological principles, but instead focus exclusively on a dimensionality reduction task. Here we introduce MIDAA (Multiomic Integration with Deep Archetypal Analysis), a framework that combines archetypal analysis, an approach grounded in biological principles, with deep learning. Using the concept of archetypes that are based on evolutionary trade-offs and Pareto optimality – MIDAA finds extreme data points that define the geometry of the latent space, preserving the complexity of biological interactions while retaining an interpretable output. We demonstrate that indeed these extreme points represent cellular programmes reflecting the underlying biology. We show on real and simulated multi-omics data how MIDAA outperforms state-of-the-art methods in identifying parsimonious, interpretable, and biologically relevant patterns.
Publisher
Cold Spring Harbor Laboratory