Author:
Zhang Boyang, Lin Shili, Moraes Luis, Firkins Jeffrey, Hristov Alexander N., Kebreab Ermias, Janssen Peter H., Bannink André, Bayat Alireza R., Crompton Les A., Dijkstra Jan, Eugène Maguy A., Kreuzer Michael, McGee Mark, Reynolds Christopher K., Schwarm Angela, Yáñez-Ruiz David R., Yu Zhongtang
Abstract
Methane (CH4) emissions from ruminants are a significant environmental concern, necessitating accurate prediction for emission inventories. Existing models rely solely on dietary and host animal-related data, ignoring the predictive power of the rumen microbiota, the source of CH4. To address this limitation, we developed novel CH4 prediction models incorporating rumen microbes as predictors, alongside animal- and feed-related predictors, using four statistical/machine learning (ML) methods: random forest combined with boosting (RF-B), least absolute shrinkage and selection operator (LASSO), generalized linear mixed model with LASSO (glmmLasso), and smoothly clipped absolute deviation (SCAD) implemented on linear mixed models. With a sheep dataset (218 observations) containing both animal data and rumen microbiota data (relative sequence abundance of 330 genera of rumen bacteria, archaea, protozoa, and fungi), we developed linear mixed models to predict CH4 production (g CH4/animal·d, ANIM-B models) and CH4 yield (g CH4/kg of dry matter intake, DMI-B models). We also developed models based solely on animal-related data. Prediction performance was evaluated 200 times with random data splits, while fitting performance was assessed without data splitting. The inclusion of microbial predictors improved the models, as indicated by decreased root mean square prediction error (RMSPE) and mean absolute error (MAE), and increased Lin’s concordance correlation coefficient (CCC). Both glmmLasso and SCAD reduced the Akaike information criterion (AIC) and Bayesian information criterion (BIC) for both the ANIM-B and the DMI-B models, while the other two ML methods had mixed outcomes. By balancing prediction performance and fitting performance, we obtained one ANIM-B model (containing 10 genera of bacteria and 3 animal-related variables) fitted using glmmLasso and one DMI-B model (5 genera of bacteria and 1 animal-related variable) fitted using SCAD.
This study highlights the importance of incorporating rumen microbiota data in CH4 prediction models to enhance accuracy and robustness. Additionally, ML methods facilitate the selection of microbial predictors from high-dimensional metataxonomic data of the rumen microbiota without overfitting. Moreover, the identified microbial predictors can serve as biomarkers of CH4 emissions from sheep, providing valuable insights for future research and mitigation strategies.
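The evaluation scheme named in the abstract (RMSPE, MAE, and Lin's CCC averaged over repeated random data splits) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 20% test fraction, the fixed seed, and the one-predictor least-squares model standing in for the mixed models are all assumptions made for illustration, and the abstract does not state how the splits were drawn.

```python
import math
import random


def rmspe(obs, pred):
    """Root mean square prediction error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def lin_ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def random_split(n, test_fraction, rng):
    """Return (train_idx, test_idx) for one random split of n observations."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def fit_line(x, y):
    """Ordinary least squares with one predictor (stand-in model only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def repeated_split_eval(x, y, n_repeats=200, test_fraction=0.2, seed=0):
    """Average RMSPE, MAE, and CCC over repeated random splits.

    test_fraction and seed are hypothetical choices, not from the source.
    """
    rng = random.Random(seed)
    totals = {"rmspe": 0.0, "mae": 0.0, "ccc": 0.0}
    for _ in range(n_repeats):
        train, test = random_split(len(x), test_fraction, rng)
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in test]
        pred = [slope * x[i] + intercept for i in test]
        totals["rmspe"] += rmspe(obs, pred)
        totals["mae"] += mae(obs, pred)
        totals["ccc"] += lin_ccc(obs, pred)
    return {name: total / n_repeats for name, total in totals.items()}
```

Lower RMSPE and MAE and a CCC closer to 1 indicate better agreement between observed and predicted values, which is the direction of improvement the abstract reports when microbial predictors are added.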
Funder
National Institute of Food and Agriculture
Publisher
Springer Science and Business Media LLC