Abstract
Mass spectrometry imaging (MSI) is an emerging technology that holds potential for improving clinical diagnosis, biomarker discovery, metabolomics research and pharmaceutical applications. The large size and high dimensionality of MSI data pose computational and memory challenges that hinder accurate identification of biologically relevant molecular patterns. We propose msiPL, a robust and generic probabilistic generative model based on a fully-connected variational autoencoder for unsupervised analysis and peak learning of MSI data. The method can efficiently learn and visualize the underlying non-linear spectral manifold, reveal biologically relevant clusters of tumor heterogeneity and identify underlying informative m/z peaks. The method provides a probabilistic parametric mapping that allows a trained model to analyze a new, unseen MSI dataset in a few seconds. The computational model features a memory-efficient implementation using a minibatch processing strategy to enable the analysis of big MSI data (encompassing more than 1 million high-dimensional datapoints) with significantly less memory. We demonstrate the robustness and generic applicability of the method on large MSI datasets from different biological systems, acquired using different mass spectrometers at different centers, namely: 2D Matrix-Assisted Laser Desorption Ionization (MALDI) Fourier Transform Ion Cyclotron Resonance (FT ICR) MSI data of human prostate cancer, 3D MALDI Time-of-Flight (TOF) MSI data of human oral squamous cell carcinoma, 3D Desorption Electrospray Ionization (DESI) Orbitrap MSI data of human colorectal adenocarcinoma, 3D MALDI TOF MSI data of mouse kidney, and 3D MALDI FT ICR MSI data of a patient-derived xenograft (PDX) mouse brain model of glioblastoma.
Significance
Mass spectrometry imaging (MSI) provides detailed molecular characterization of a tissue specimen while preserving spatial distributions.
However, the complex nature of MSI data slows processing and poses computational and memory challenges that hinder the analysis of the multiple specimens required to extract biologically relevant patterns. Moreover, subjective parameter selection in conventional pre-processing approaches can introduce bias. Here, we present a generative probabilistic deep-learning model that can analyze and non-linearly visualize MSI data independent of the nature of the specimen and of the MSI platform. We demonstrate the robustness of the method by applying it to different tissue types, and envision it as a new generation of rapid and robust analysis for mass spectrometry data.
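The minibatch strategy and the probabilistic parametric mapping described in the abstract can be illustrated with a minimal NumPy sketch. This is not the msiPL implementation: the single hidden layer, the layer sizes, the random untrained weights, and all function names (`init_params`, `encode`, `sample_latent`, `kl_to_standard_normal`, `encode_in_minibatches`) are assumptions chosen only to show how a fully-connected variational encoder could map spectra to a low-dimensional latent space one minibatch at a time, so that only a small slice of the data is processed at once.

```python
import numpy as np


def init_params(n_features, n_hidden, n_latent, seed=0):
    """Random (untrained) encoder weights; sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    scale = 0.01
    return {
        "W_h": rng.normal(0.0, scale, (n_features, n_hidden)),
        "b_h": np.zeros(n_hidden),
        "W_mu": rng.normal(0.0, scale, (n_hidden, n_latent)),
        "b_mu": np.zeros(n_latent),
        "W_lv": rng.normal(0.0, scale, (n_hidden, n_latent)),
        "b_lv": np.zeros(n_latent),
    }


def encode(params, x):
    """Fully-connected encoder: spectra -> mean and log-variance of q(z|x)."""
    h = np.maximum(0.0, x @ params["W_h"] + params["b_h"])  # ReLU hidden layer
    mu = h @ params["W_mu"] + params["b_mu"]
    log_var = h @ params["W_lv"] + params["b_lv"]
    return mu, log_var


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Per-spectrum KL divergence between q(z|x) and the N(0, I) prior."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)


def encode_in_minibatches(params, spectra, batch_size=128):
    """Encode spectra batch by batch, keeping only one batch of activations."""
    n_spectra = spectra.shape[0]
    n_latent = params["b_mu"].shape[0]
    latent = np.empty((n_spectra, n_latent))
    for start in range(0, n_spectra, batch_size):
        stop = min(start + batch_size, n_spectra)
        mu, _ = encode(params, spectra[start:stop])
        latent[start:stop] = mu  # use the posterior mean as the embedding
    return latent


# Synthetic stand-in for MSI data: 300 spectra with 50 m/z bins each.
rng = np.random.default_rng(1)
spectra = rng.random((300, 50))
params = init_params(n_features=50, n_hidden=16, n_latent=5)
latent = encode_in_minibatches(params, spectra, batch_size=128)
```

In practice `spectra` could be a memory-mapped array or an on-disk dataset sliced in the same way, which is what would keep peak memory tied to the batch size and not to the full dataset; that storage choice is likewise an assumption of this sketch.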
Publisher
Cold Spring Harbor Laboratory