A unified Foot and Mouth Disease dataset for Uganda: evaluating machine learning predictive performance degradation under varying distributions

Published: 2024-07-31
Volume: 7
ISSN: 2624-8212
Container-title: Frontiers in Artificial Intelligence
Short-container-title: Front. Artif. Intell.

Author:

Kapalaga Geofrey, Kivunike Florence N., Kerfua Susan, Jjingo Daudi, Biryomumaisho Savino, Rutaisire Justus, Ssajjakambwe Paul, Mugerwa Swidiq, Kiwala Yusuf

Abstract

In Uganda, the absence of a unified dataset for constructing machine learning models to predict Foot and Mouth Disease outbreaks hinders preparedness. Although machine learning models exhibit excellent predictive performance for Foot and Mouth Disease outbreaks under stationary conditions, they are susceptible to performance degradation in non-stationary environments. Rainfall and temperature are key factors influencing these outbreaks, and their variability due to climate change can significantly impact predictive performance. This study created a unified Foot and Mouth Disease dataset by integrating disparate sources and pre-processing the data using mean imputation, duplicate removal, visualization, and merging techniques. To evaluate performance degradation, seven machine learning models were trained and assessed using accuracy, area under the receiver operating characteristic curve, recall, precision, and F1-score. The dataset showed a significant class imbalance, with more non-outbreaks than outbreaks, which required data augmentation methods. Variability in rainfall and temperature caused notable degradation in predictive performance. Random Forest with borderline SMOTE was the top-performing model in a stationary environment, achieving 92% accuracy, 0.97 area under the receiver operating characteristic curve, 0.94 recall, 0.90 precision, and 0.92 F1-score. However, under varying distributions, all models exhibited significant performance degradation, with Random Forest accuracy dropping to 46%, area under the receiver operating characteristic curve to 0.58, recall to 0.03, precision to 0.24, and F1-score to 0.06. This study contributes a unified Foot and Mouth Disease dataset for Uganda and shows that all seven machine learning models degrade significantly under varying distributions. These findings highlight the need for new methods to address the impact of distribution variability on predictive performance.
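
The abstract reports accuracy, precision, recall, and F1-score side by side, and the gap between them under varying distributions is easier to see with a small worked example. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration. It shows how a classifier that misses almost every outbreak can still score an accuracy near 50% while recall and F1-score collapse.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp: outbreaks correctly predicted     fp: non-outbreaks flagged as outbreaks
    fn: outbreaks that were missed        tn: non-outbreaks correctly predicted
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts (NOT taken from the study): 100 outbreaks and
# 100 non-outbreaks, with the model detecting only 3 of the outbreaks.
shifted = classification_metrics(tp=3, fp=9, fn=97, tn=91)

for name, value in shifted.items():
    print(f"{name}: {value:.2f}")
```

Because F1-score is the harmonic mean of precision and recall, it stays close to the smaller of the two, which is why a near-zero recall pulls it down even when accuracy looks moderate.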

Publisher

Frontiers Media SA
