A Classification-Based Machine Learning Approach to the Prediction of Cyanobacterial Blooms in Chilgok Weir, South Korea

Published:2022-02-11 Issue:4 Volume:14 Page:542
ISSN:2073-4441
Container-title:Water
language:en
Short-container-title:Water

Author:

Kim Jongchan, Jonoski Andreja, Solomatine Dimitri P.

Abstract

Cyanobacterial blooms arise from complex causes, including water quality, climate, and hydrological factors. This study presents machine learning models that predict the occurrence of these blooms efficiently and effectively. The dataset was divided into two, three, or four classes based on the cyanobacterial cell density one week later, which served as the target variable. We developed 96 machine learning models for Chilgok weir using four classification algorithms: k-Nearest Neighbor, Decision Tree, Logistic Regression, and Support Vector Machine. In the modeling methodology, we first selected input features by applying ANOVA (Analysis of Variance) and resolving multicollinearity; this feature selection step removes features that are irrelevant to the target variable. Next, we applied an oversampling method to address the imbalance in the dataset. The best performance was achieved by models using datasets divided into two classes, with an accuracy of 80% or more. By comparison, models using datasets divided into three classes reached a lower accuracy of approximately 60%. Models that used logCyano (logarithm of cyanobacterial cell density) as a feature showed high accuracy overall, but several two-class models that combined air temperature and NO3-N (nitrate nitrogen) also exceeded 80% accuracy. We conclude that accurate classification-based machine learning models can be developed with two features related to cyanobacterial blooms. This shows that efficient and effective models can be built with a small number of inputs.
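As a rough illustration of the workflow the abstract describes, the following Python sketch builds a two-class target from the cell density one week ahead, balances the classes by random oversampling, and classifies with a simple k-Nearest Neighbor vote. It is only a sketch under stated assumptions, not the authors' implementation: the threshold `BLOOM_THRESHOLD`, the weekly records, the function names, and the choice of two features (logCyano and air temperature) are all hypothetical, the ANOVA and multicollinearity steps are omitted, and k-Nearest Neighbor stands in for the four algorithms named in the abstract.

```python
import math
import random
from collections import Counter

# Hypothetical two-class cut-off in cells/mL; the abstract does not state the thresholds.
BLOOM_THRESHOLD = 1000.0

def make_samples(records, lag=1):
    """Pair features at week t with the class of the cell density at week t + lag."""
    samples = []
    for now, later in zip(records, records[lag:]):
        features = (math.log10(now["cyano"] + 1.0), now["air_temp"])  # logCyano, air temperature
        label = int(later["cyano"] >= BLOOM_THRESHOLD)
        samples.append((features, label))
    return samples

def oversample(samples, seed=0):
    """Random oversampling: duplicate minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Synthetic weekly observations, invented for illustration only.
weekly = [
    {"cyano": 50.0, "air_temp": 12.0},
    {"cyano": 80.0, "air_temp": 14.0},
    {"cyano": 120.0, "air_temp": 17.0},
    {"cyano": 300.0, "air_temp": 21.0},
    {"cyano": 900.0, "air_temp": 25.0},
    {"cyano": 5000.0, "air_temp": 29.0},
    {"cyano": 12000.0, "air_temp": 30.0},
    {"cyano": 700.0, "air_temp": 22.0},
]

samples = make_samples(weekly)   # 7 samples: 5 "no bloom", 2 "bloom"
balanced = oversample(samples)   # 5 samples per class after oversampling
prediction = knn_predict(balanced, (math.log10(4001.0), 28.0))
```

In a real setting the features would likely need scaling before a distance-based classifier is applied, and oversampling should be restricted to the training split so that duplicated samples do not leak into the evaluation data.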

Publisher

MDPI AG

Subject

Water Science and Technology; Aquatic Science; Geography, Planning and Development; Biochemistry

Link

https://www.mdpi.com/2073-4441/14/4/542/pdf

References: 51 articles.

1. Lake warming intensifies the seasonal pattern of internal nutrient cycling in the eutrophic lake and potential impacts on algal blooms;Tong;Water Res.,2021

2. Deciphering the key factors determining spatio-temporal heterogeneity of cyanobacterial bloom dynamics in the Nakdong River with consecutive large weirs

3. Warmer climates boost cyanobacterial dominance in shallow lakes

4. Climate change: a catalyst for global expansion of harmful cyanobacterial blooms

5. Throwing Fuel on the Fire: Synergistic Effects of Excessive Nitrogen Inputs and Global Warming on Harmful Algal Blooms

Cited by 9 articles.

1. Cyanobacteria Harmful Algae Blooms: Causes, Impacts, and Risk Management;Water, Air, & Soil Pollution;2024-01

2. Machine Learning Approaches Reveal Future Harmful Algae Blooms in Jeju, Korea;2023-11-13

3. Machine learning-based design and monitoring of algae blooms: Recent trends and future perspectives – A short review;Critical Reviews in Environmental Science and Technology;2023-09-07

4. Decision Support Framework for Optimal Reservoir Operation to Mitigate Cyanobacterial Blooms in Rivers;Sustainability;2023-08-24

5. Optimal reaction pathways of carbon dioxide hydrogenation using P-graph attainable region technique (PART);Discover Chemical Engineering;2023-08-14