Microbiome-based classification models for fresh produce safety and quality evaluation

Published: 2024-04-02 Issue: 4 Volume: 12
ISSN: 2165-0497
Container-title: Microbiology Spectrum
Language: en
Short-container-title: Microbiol Spectr

Author:

Liao Chao¹, Wang Luxin¹, Quon Gerald²

Affiliation:

1. Department of Food Science and Technology, University of California Davis, Davis, California, USA

2. Department of Molecular and Cellular Biology, University of California Davis, Davis, California, USA

Abstract

Small sample sizes and loss of sequencing reads during microbiome data preprocessing can limit the statistical power of differentiating fresh produce phenotypes and prevent the detection of important bacterial species associated with produce contamination or quality reduction. Here, we explored a machine learning-based k-mer hash analysis strategy to identify DNA signatures predictive of produce safety (PS) and produce quality (PQ) and compared it against the amplicon sequence variant (ASV) strategy that uses a typical denoising step and ASV-based taxonomy strategy. Random forest-based classifiers for PS and PQ using 7-mer hash data sets had significantly higher classification accuracy than those using the ASV data sets. We also demonstrated that the proposed combination of integrating multiple data sets and leveraging a 7-mer hash strategy leads to better classification performance for PS and PQ compared to the ASV method but presents lower PS classification accuracy compared to the feature-selected ASV-based taxonomy strategy. Due to the current limitation of generating taxonomy using the 7-mer hash strategy, the ASV-based taxonomy strategy, with remarkably less computing time and memory usage, is more efficient for PS and PQ classification and applicable for important taxa identification. Results generated from this study lay the foundation for future studies that need to incorporate or compare different microbiome sequencing data sets for the application of machine learning in the area of microbial safety and quality of food.

IMPORTANCE Identification of generalizable indicators for produce safety (PS) and produce quality (PQ) improves the detection of produce contamination and quality decline. However, effective sequencing read loss during microbiome data preprocessing and the limited sample size of individual studies restrain statistical power to identify important features contributing to differentiating PS and PQ phenotypes. We applied machine learning-based models using individual and integrated k-mer hash and amplicon sequence variant (ASV) data sets for PS and PQ classification and evaluated their classification performance. Random forest (RF)-based models using integrated 7-mer hash data sets achieved significantly higher PS and PQ classification accuracy. Due to the limitation of taxonomic analysis for the 7-mer hash, we also developed RF-based models using feature-selected ASV-based taxonomic data sets, which achieved better PS classification than those using the integrated 7-mer hash data set. The RF feature selection method identified 480 PS indicators and 263 PQ indicators with a positive contribution to the PS and PQ classification.
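To make the idea of a 7-mer hash feature table concrete, the following is a minimal sketch of how reads from one sample could be turned into a fixed-length 7-mer profile of the kind a classifier such as a random forest could take as input. It is not the authors' pipeline: the abstract does not specify the hashing scheme, so the function names (`kmer_counts`, `hashed_profile`), the CRC32 hash, the bucket count of 4096, and the toy reads are all assumptions made for illustration.

```python
import zlib
from collections import Counter

K = 7  # k-mer length named in the abstract


def kmer_counts(read: str, k: int = K) -> Counter:
    """Count every overlapping k-mer in one read, skipping ambiguous bases."""
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):  # drop k-mers containing N or other codes
            counts[kmer] += 1
    return counts


def hashed_profile(reads, k=K, n_buckets=4096):
    """Hash the k-mer counts of one sample into a normalized fixed-length vector.

    n_buckets is a hypothetical choice; with fewer buckets than the 4**7
    possible 7-mers, some distinct k-mers will share a bucket.
    """
    vec = [0.0] * n_buckets
    for read in reads:
        for kmer, n in kmer_counts(read, k).items():
            # crc32 is stable across runs, unlike the built-in hash() for str
            vec[zlib.crc32(kmer.encode("ascii")) % n_buckets] += n
    total = sum(vec)
    return [v / total for v in vec] if total else vec


# Toy reads (invented for illustration, not data from the study)
sample = ["ACGTACGTAC", "TTTTTTTNTT"]
profile = hashed_profile(sample)
```

Each sample yields a vector of the same length regardless of read count, so profiles from different studies can be stacked into a single feature matrix without a denoising step, which is the property that makes integrating multiple data sets straightforward in this kind of approach.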

Funder

National Science Foundation

UC Davis Innovative Data Science Seed Funding Program Grant

Publisher

American Society for Microbiology

Link

https://journals.asm.org/doi/pdf/10.1128/spectrum.03448-23
