Abstract
A grand challenge of analytical chemistry is the identification of unknown molecules from tandem mass spectrometry (MS/MS) spectra. Current metabolite annotation approaches are often manual or only partially automated, and commonly rely on a spectral database, either to search against or to train machine learning classifiers. Unfortunately, spectral databases are often instrument-specific and incomplete because of the limited availability of compound standards or of a molecular database, which limits the ability of methods that rely on them to predict novel molecule structures. We describe a generative modeling approach that leverages the vast amount of unpaired and/or unlabeled molecule structures and MS/MS spectra to learn general rules for synthesizing molecule structures and MS/MS spectra. The approach is based on recent work using semi-supervised deep variational autoencoders to learn joint latent representations of multiple complex modalities. We show that adding molecule structures with no spectra to the training set improves prediction quality on spectra from a structure-disjoint dataset of new molecules, which is not possible with bi-modal supervised approaches. The described methodology provides a demonstration of, and a framework for, how recent advances in semi-supervised machine learning can be applied to overcome the bottlenecks of missing annotations and noisy data, and so to tackle unaddressed problems in the life sciences where large volumes of data are available.
Publisher
Cold Spring Harbor Laboratory