MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets

Published:2023-06-16 Issue:6 Volume:19 Page:e1011163
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol

Author:

Potgieter Matthys G., Nel Andrew J. M., Fortuin Suereta, Garnett Shaun, Wendoh Jerome M., Tabb David L., Mulder Nicola J., Blackburn Jonathan M.

Abstract

Background: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.

Results: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.

Conclusions: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.

Funder

National Research Foundation

South African Tuberculosis Bioinformatics Initiative

South African Medical Research Council

Department of Science and Technology, South Africa

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

Reference: 37 articles.

1. Global Change and the Soil Microbiome: A Human-Health Perspective;R. Ochoa-Hueso;Front Ecol Evol,2017

2. Toward a Predictive Understanding of Earth’s Microbiomes to Address 21st Century Challenges;MJ Blaser;mBio,2016

3. On the Dependency of Cellular Protein Levels on mRNA Abundance;Y Liu,2016

4. Navigating through metaproteomics data: A logbook of database searching;T Muth;Proteomics,2015

5. The impact of sequence database choice on metaproteomic results in gut microbiota studies;A Tanca;Microbiome,2016

Cited by 9 articles.

1. MOSCA 2.0: A bioinformatics framework for metagenomics, metatranscriptomics and metaproteomics data analysis and visualization;Molecular Ecology Resources;2024-08-04

2. An integrated metaproteomics workflow for studying host-microbe dynamics in bronchoalveolar lavage samples applied to cystic fibrosis disease;mSystems;2024-07-23

3. Microbial metaproteomics: From sample processing to data acquisition and analysis;Chinese Journal of Chromatography;2024-07-01

4. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease;mSphere;2024-06-25

5. NovoLign: metaproteomics by sequence alignment;2024-04-06