Tracking Major Sources of Water Contamination Using Machine Learning

Published:2021-01-20 Volume:11
ISSN:1664-302X
Container-title:Frontiers in Microbiology
Short-container-title:Front. Microbiol.

Author:

Wu Jianyong, Song Conghe, Dubinsky Eric A., Stewart Jill R.

Abstract

Current microbial source tracking techniques that rely on grab samples analyzed by individual endpoint assays are inadequate to explain microbial sources across space and time. Modeling and predicting host sources of microbial contamination could add a useful tool for watershed management. In this study, we tested and evaluated machine learning models to predict the major sources of microbial contamination in a watershed. We examined the relationship between microbial sources, land cover, weather, and hydrologic variables in a watershed in Northern California, United States. Six models, including K-nearest neighbors (KNN), Naïve Bayes, Support vector machine (SVM), simple neural network (NN), Random Forest, and XGBoost, were built to predict major microbial sources using land cover, weather and hydrologic variables. The results showed that these models successfully predicted microbial sources classified into two categories (human and non-human), with the average accuracy ranging from 69% (Naïve Bayes) to 88% (XGBoost). The area under curve (AUC) of the receiver operating characteristic (ROC) illustrated XGBoost had the best performance (average AUC = 0.88), followed by Random Forest (average AUC = 0.84), and KNN (average AUC = 0.74). The importance index obtained from Random Forest indicated that precipitation and temperature were the two most important factors to predict the dominant microbial source. These results suggest that machine learning models, particularly XGBoost, can predict the dominant sources of microbial contamination based on the relationship of microbial contaminants with daily weather and land cover, providing a powerful tool to understand microbial sources in water.
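
The abstract compares models by accuracy and by the area under the ROC curve (AUC) for a two-class problem (human vs. non-human). As a minimal sketch of what those two metrics measure, and not the authors' evaluation code, the following computes both with the Python standard library. The labels, scores, and the 0.5 decision threshold are hypothetical values chosen for illustration; they are not data from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive score and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = human, 0 = non-human) and model scores.
# These are illustrative values only, not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"AUC      = {roc_auc(y_true, scores):.2f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason a model comparison may report both.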

Publisher

Frontiers Media SA

Subject

Microbiology (medical), Microbiology

Cited by 14 articles.

1. Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models;Ecological Informatics;2024-09

2. Development of a logic regression-based approach for the discovery of host- and niche-informative biomarkers in Escherichia coli and their application for microbial source tracking;Applied and Environmental Microbiology;2024-07-24

3. Prediction of microbiological non-compliances using a Boosted Regression Trees model: application on the drinking water distribution system of a whole country;Water Supply;2024-03-20

4. Applications of XGBoost in water resources engineering: A systematic literature review (Dec 2018–May 2023);Environmental Modelling & Software;2024-03

5. Predicting the concentration range of trace organic contaminants in recycled water using supervised classification;Journal of Water Process Engineering;2024-02