Abstract
Microbiome management research and applications rely on temporally resolved measurements of community composition. Current technologies to assess community composition make use of either cultivation or sequencing of genomic material, which can become time-consuming and/or laborious when high-throughput measurements are required. Here, using data from a shrimp hatchery as an economically relevant case study, we combined 16S rRNA gene amplicon sequencing and flow cytometry data to develop a computational workflow that allows the prediction of taxon abundances based on flow cytometry measurements. The first stage of our pipeline consists of a classifier to predict the presence or absence of the taxon of interest, which yields an average accuracy of 88.13 ± 4.78% across the top 50 OTUs of our dataset. In the second stage, this classifier was combined with a regression model to predict the relative abundances of the taxon of interest, which yields an average R² of 0.35 ± 0.24 across the top 50 OTUs of our dataset. Application of the models to flow cytometry time series data showed that the generated models can predict the temporal dynamics of a large fraction of the investigated taxa. Using cell sorting, we validated that the model correctly associates taxa with regions in the cytometric fingerprint where they are detected using 16S rRNA gene amplicon sequencing. Finally, we applied the approach of our pipeline to two other datasets of microbial ecosystems. This pipeline represents an addition to the expanding toolbox for flow cytometry-based monitoring of bacterial communities and complements the current plating- and marker gene-based methods.
Importance
Monitoring of microbial community composition is crucial for both microbiome management research and applications. Existing technologies, such as plating and amplicon sequencing, can become laborious and expensive when high-throughput measurements are required.
In recent years, flow cytometry-based measurements of community diversity have been shown to correlate well with those derived from 16S rRNA gene amplicon sequencing in several aquatic ecosystems, suggesting there is a link between the taxonomic community composition and phenotypic properties as derived through flow cytometry. Here, we further integrated 16S rRNA gene amplicon sequencing and flow cytometry survey data in order to construct models that enable the prediction of both the presence and the abundance of individual bacterial taxa in mixed communities using flow cytometric fingerprinting. The developed pipeline holds great potential to be integrated into routine monitoring schemes and early warning systems for biotechnological applications.
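The two-stage design described in the abstract, in which a presence/absence classifier gates a regression on relative abundance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the classifier or regression model, so a nearest-centroid classifier and a one-feature least-squares line stand in for them, and the fingerprint vectors, abundances, and function names are hypothetical toy values rather than data or code from the study.

```python
from statistics import mean


def fit_centroids(X, present):
    # Stage 1 stand-in: mean fingerprint of absent vs. present samples.
    return {
        label: [mean(col) for col in zip(*(x for x, p in zip(X, present) if p == label))]
        for label in (False, True)
    }


def is_present(centroids, x):
    # Assign the sample to the class with the closer centroid.
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return sqdist(centroids[True]) < sqdist(centroids[False])


def fit_line(xs, ys):
    # Stage 2 stand-in: ordinary least squares on a single feature.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_two_stage(X, abundance, feature=0):
    # The regression is trained only on samples where the taxon was detected.
    present = [a > 0 for a in abundance]
    centroids = fit_centroids(X, present)
    xs = [x[feature] for x, p in zip(X, present) if p]
    ys = [a for a in abundance if a > 0]
    return centroids, fit_line(xs, ys), feature


def predict_abundance(model, x):
    centroids, (slope, intercept), feature = model
    if not is_present(centroids, x):
        return 0.0  # classifier says absent: skip the regression
    return max(0.0, slope * x[feature] + intercept)


# Hypothetical toy data: two-bin cytometric fingerprints and the relative
# abundance of one taxon as it would be measured by amplicon sequencing.
X = [[0.1, 0.9], [0.2, 0.8], [0.6, 0.4], [0.8, 0.2]]
abundance = [0.0, 0.0, 0.3, 0.4]
model = fit_two_stage(X, abundance)
```

Calling `predict_abundance(model, fingerprint)` on a new fingerprint returns zero when the classifier predicts absence and the clipped regression estimate otherwise. Gating the regression this way keeps the many zero-abundance samples from pulling the fitted line toward zero.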
Publisher
Cold Spring Harbor Laboratory