Affiliation:
1. Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
2. Department of History and Archaeology, University of Patras, 26504 Patras, Greece
Abstract
In this work, we introduce an innovative Markov Chain Monte Carlo (MCMC) classifier, a synergistic combination of Bayesian machine learning and Apache Spark, highlighting the novel use of this methodology in the spectrum of big data management and environmental analysis. By employing a large dataset of air pollutant concentrations in Madrid from 2001 to 2018, we developed a Bayesian Logistic Regression model, capable of accurately classifying the Air Quality Index (AQI) as safe or hazardous. This mathematical formulation adeptly synthesizes prior beliefs and observed data into robust posterior distributions, enabling superior management of overfitting, enhancing the predictive accuracy, and demonstrating a scalable approach for large-scale data processing. Notably, the proposed model achieved a maximum accuracy of 87.91% and an exceptional recall value of 99.58% at a decision threshold of 0.505, reflecting its proficiency in accurately identifying true negatives and mitigating misclassification, even though it slightly underperformed in comparison to the traditional Frequentist Logistic Regression in terms of accuracy and the AUC score. Ultimately, this research underscores the efficacy of Bayesian machine learning for big data management and environmental analysis, while signifying the pivotal role of the first-ever MCMC Classifier and Apache Spark in dealing with the challenges posed by large datasets and high-dimensional data with broader implications not only in sectors such as statistics, mathematics, physics but also in practical, real-world applications.
Reference68 articles.
1. Sampling and analysis techniques for inorganic air pollutants in indoor air;Villanueva;Appl. Spectrosc. Rev.,2021
2. Martínez Torres, J., Pastor Pérez, J., Sancho Val, J., McNabola, A., Martínez Comesaña, M., and Gallagher, J. (2020). A Functional Data Analysis Approach for the Detection of Air Pollution Episodes and Outliers: A Case Study in Dublin, Ireland. Mathematics, 8.
3. Maglogiannis, I., Iliadis, L., Macintyre, J., and Cortez, P. (2022, January 17–20). Maximum Likelihood Estimators on MCMC Sampling Algorithms for Decision Making. Proceedings of the Artificial Intelligence Applications and Innovations, AIAI 2022 IFIP WG 12.5 International Workshops, Creta, Greece.
4. Wang, G., and Wang, T. (2022). Unbiased Multilevel Monte Carlo methods for intractable distributions: MLMC meets MCMC. arXiv.
5. Analysis of a non-Markovian queueing model: Bayesian statistics and MCMC methods;Braham;Monte Carlo Methods Appl.,2019
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献