Abstract
Elucidating the structure of a chemical compound is a fundamental task in chemistry, with applications in multiple domains including drug discovery, precision medicine, and biomarker discovery. The common practice is to obtain a mass spectrum of the compound and then retrieve its structure from spectral databases. However, this approach fails for novel molecules that are not present in the reference database. We propose Spec2Mol, a deep learning architecture for molecular structure recommendation given mass spectra alone. Spec2Mol is inspired by the Speech2Text deep learning architectures for translating audio signals into text. Our approach is based on an encoder-decoder architecture. The encoder learns the spectra embeddings, while the decoder, pre-trained on a massive dataset of chemical structures for translating between different molecular representations, reconstructs SMILES sequences of the recommended chemical structures. We evaluated Spec2Mol by assessing the molecular similarity between the recommended structures and the original structure. Our analysis showed that Spec2Mol is able to identify the presence of key molecular substructures from a compound's mass spectrum, and that it performs on par with existing fragmentation tree methods, particularly when the test structure is neither available during training nor present in the reference database.
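As an illustration of the evaluation idea, the sketch below ranks recommended structures by their similarity to the original structure. It assumes Tanimoto similarity over binary fingerprints, which is a common choice for molecular similarity; the abstract does not state which metric or fingerprint was used. The bit sets, the SMILES strings attached to them, and the helper names `tanimoto` and `rank_candidates` are hypothetical and exist only for this example.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        # Two empty fingerprints are treated as identical by convention.
        return 1.0
    return len(a & b) / len(a | b)


def rank_candidates(
    reference: Set[int], candidates: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Sort candidate structures by similarity to the reference, best first."""
    scored = [(name, tanimoto(reference, bits)) for name, bits in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy fingerprints (assumed, not computed from real molecules).
reference_bits = {1, 2, 3, 4}
recommended = {
    "CCO": {1, 2, 3, 4},      # identical bit pattern to the reference
    "CCN": {1, 2, 3, 9},      # shares three of five distinct bits
    "c1ccccc1": {7, 8},       # shares no bits
}

ranking = rank_candidates(reference_bits, recommended)
for smiles, score in ranking:
    print(f"{smiles}: {score:.2f}")
```

In practice the bit sets would come from a cheminformatics toolkit rather than being written by hand; the ranking logic stays the same.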
Publisher
Springer Science and Business Media LLC
Subject
Materials Chemistry, Biochemistry, Environmental Chemistry, General Chemistry
Cited by
7 articles.