KGML-ag: a modeling framework of knowledge-guided machine learning to simulate agroecosystems: a case study of estimating N2O emission using data from mesocosm experiments

Author:

Liu Licheng, Xu Shaoming, Tang Jinyun, Guan Kaiyu, Griffis Timothy J., Erickson Matthew D., Frie Alexander L., Jia Xiaowei, Kim Taegon, Miller Lee T., Peng Bin, Wu Shaowei, Yang Yufeng, Zhou Wang, Kumar Vipin, Jin Zhenong

Abstract

Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of the global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains challenging because the underlying microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plants and human activities. Existing approaches such as process-based (PB) models have well-known limitations due to insufficient representation of the processes or uncertainties in model parameters. To leverage recent advances in machine learning (ML), a new method is needed that unlocks the “black box” and overcomes the limitations of ML, such as low interpretability, out-of-sample failure and massive data demand. In this study, we developed a first-of-its-kind knowledge-guided machine learning model for agroecosystems (KGML-ag) by incorporating biogeophysical and chemical domain knowledge from an advanced PB model, ecosys, and tested it by comparing simulated daily N2O fluxes with observations from mesocosm experiments. The gated recurrent unit (GRU) was used as the basis of the model structure. To optimize model performance, we investigated a range of ideas, including (1) using initial values of intermediate variables (IMVs) instead of time series as model input to reduce data demand; (2) building hierarchical structures to explicitly estimate IMVs for further N2O prediction; (3) using multi-task learning to balance the simultaneous training on multiple variables; and (4) pre-training with millions of synthetic data points generated from ecosys and fine-tuning with mesocosm observations. Six other pure ML models were developed using the same mesocosm data to serve as benchmarks for the KGML-ag model. Results show that KGML-ag did an excellent job in reproducing the mesocosm N2O fluxes (overall r² = 0.81 and RMSE = 3.6 mg N m⁻² d⁻¹ from cross-validation). Importantly, KGML-ag always outperforms the PB model and the ML models in predicting N2O fluxes, especially for complex temporal dynamics and emission peaks. In addition, KGML-ag goes beyond the pure ML models by providing more interpretable predictions and by pinpointing the new knowledge and data needed to further improve the current KGML-ag. We believe the KGML-ag development in this study will stimulate a new body of research on interpretable ML for biogeochemistry and other related geoscience processes.
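
As a reading aid for the two scores quoted in the abstract, the following minimal sketch shows how an r² and an RMSE could be computed from paired observed and predicted daily fluxes. It assumes that r² denotes the squared Pearson correlation, which the abstract does not state. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(observed)
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed meaning of r2 here)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical daily N2O fluxes in mg N m-2 d-1; NOT data from the study.
observed = [1.0, 2.0, 8.0, 15.0, 6.0, 2.0]
predicted = [1.5, 2.5, 7.0, 13.0, 7.0, 2.5]

print(f"RMSE = {rmse(observed, predicted):.2f} mg N m-2 d-1")
print(f"r2   = {r_squared(observed, predicted):.2f}")
```

Under cross-validation, these scores would typically be computed on held-out predictions only, so that data used for fitting does not inflate them.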

Funder

Advanced Research Projects Agency - Energy

National Science Foundation

Publisher

Copernicus GmbH

Subject

General Medicine

Cited by 19 articles.
