Abstract
AbstractUntargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. As biological interpretation relies on accurate structure annotations, the ability to assign confidence to such annotations is a key outstanding problem. We introduce the COSMIC workflow that combines structure database generation, in silico annotation, and a confidence score consisting of kernel density p-value estimation and a Support Vector Machine with enforced directionality of features. In evaluation, COSMIC annotates a substantial number of hits at small false discovery rates, and outperforms spectral library search for this purpose. To demonstrate that COSMIC can annotate structures never reported before, we annotated twelve novel bile acid conjugates; nine structures were confirmed by manual evaluation and two structures using synthetic standards. Second, we annotated and manually evaluated 315 molecular structures in human samples currently absent from the Human Metabolome Database. Third, we applied COSMIC to 17,400 experimental runs and annotated 1,715 structures with high confidence that were absent from spectral libraries.
Publisher
Cold Spring Harbor Laboratory
Reference85 articles.
1. Indexing the Pseudomonas specialized metabolome enabled the discovery of poaeamide B and the bananamides;Nat Microbiol,2016
2. Feature-based molecular networking in the GNPS analysis environment
3. MetaboLights: a resource evolving in response to the needs of its scientific community;Nucleic Acids Res,2019
4. Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools;Nucleic Acids Res,2015