Abstract
Large-scale metabolomics research faces challenges in accurate metabolite annotation and false discovery rate (FDR) estimation. Recent progress in addressing these challenges has leveraged experience from proteomics and inspiration from other sciences. Although the target-decoy strategy has been applied to metabolomics, generating reliable decoy libraries is difficult due to the complexity of metabolites. Additionally, continuous bioinformatic efforts are necessary to increase the utilization of growing spectra resources while reducing false identifications. Here we introduce the concept of ion entropy and present two entropy-based decoy generation methods. The assessment of public spectral databases using ion entropy validated it as a good metric for ion information content in massive metabolomics data. The decoy generation method developed based on this concept outperformed current representative decoy strategies in metabolomics and achieved the best FDR estimation performance. We analyzed 47 public metabolomics datasets using the constructed workflow to provide instructive suggestions. Finally, we present MetaPhoenix, a tool equipped with a well-constructed FDR estimation workflow that facilitates the development of accurate FDR-controlled analysis in the metabolomics field.
Publisher
Cold Spring Harbor Laboratory