A Bayesian latent class extension of naive Bayesian classifier and its application to the classification of gastric cancer patients

Author:

Gohari Kimiya,Kazemnejad Anoshirvan,Mohammadi Marjan,Eskandari Farzad,Saberi Samaneh,Esmaieli Maryam,Sheidaei Ali

Abstract

Abstract Background The Naive Bayes (NB) classifier is a powerful supervised algorithm widely used in Machine Learning (ML). However, its effectiveness relies on a strict assumption of conditional independence, which is often violated in real-world scenarios. To address this limitation, various studies have explored extensions of NB that tackle the issue of non-conditional independence in the data. These approaches can be broadly categorized into two main categories: feature selection and structure expansion. In this particular study, we propose a novel approach to enhancing NB by introducing a latent variable as the parent of the attributes. We define this latent variable using a flexible technique called Bayesian Latent Class Analysis (BLCA). As a result, our final model combines the strengths of NB and BLCA, giving rise to what we refer to as NB-BLCA. By incorporating the latent variable, we aim to capture complex dependencies among the attributes and improve the overall performance of the classifier. Methods Both Expectation-Maximization (EM) algorithm and the Gibbs sampling approach were offered for parameter learning. A simulation study was conducted to evaluate the classification of the model in comparison with the ordinary NB model. In addition, real-world data related to 976 Gastric Cancer (GC) and 1189 Non-ulcer dyspepsia (NUD) patients was used to show the model's performance in an actual application. The validity of models was evaluated using the 10-fold cross-validation. Results The presented model was superior to ordinary NB in all the simulation scenarios according to higher classification sensitivity and specificity in test data. The NB-BLCA model using Gibbs sampling accuracy was 87.77 (95% CI: 84.87-90.29). This index was estimated at 77.22 (95% CI: 73.64-80.53) and 74.71 (95% CI: 71.02-78.15) for the NB-BLCA model using the EM algorithm and ordinary NB classifier, respectively. Conclusions When considering the modification of the NB classifier, incorporating a latent component into the model offers numerous advantages, particularly within medical and health-related contexts. By doing so, the researchers can bypass the extensive search algorithm and structure learning required in the local learning and structure extension approach. The inclusion of latent class variables allows for the integration of all attributes during model construction. Consequently, the NB-BLCA model serves as a suitable alternative to conventional NB classifiers when the assumption of independence is violated, especially in domains pertaining to health and medicine.

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Epidemiology

Reference55 articles.

1. Langarizadeh M, Moghbeli F. Applying naive bayesian networks to disease prediction: a systematic review. Acta Informatica Medica. 2016;24(5):364.

2. Salma A, Silfianti W. Sentiment analysis of user reviews on covid-19 information applications using naive bayes classifier, Support Vector Machine, and K-Nearest Neighbor. Int Res J Adv Eng Sci. 2021;6(4):158–62

3. Bishop CM, Nasrabadi NM. Pattern recognition and machine learning. Springer; 2006;4(4):738–838.

4. Kelly A, Johnson MA: Investigating the statistical assumptions of Naïve Bayes classifiers. In: 2021 55th annual conference on information sciences and systems (CISS): 2021: IEEE; 2021: 1-6.

5. Rabe-Hesketh S, Skrondal A. Classical latent variable models for medical research. Stat Methods Med Res. 2008;17(1):5–32.

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Processing of computer algorithms for traceability identification in scientific research;Revista de Gestão e Secretariado;2024-07-10

2. Evaluation of Machine Learning Models for Breast Cancer Detection in Microarray Gene Expression Profiles;Lecture Notes on Data Engineering and Communications Technologies;2024

3. Sentiment Analysis of Electric Vehicles in Indonesia Using Support Vector Machine and Naïve Bayes;2023 3rd International Conference on Smart Cities, Automation & Intelligent Computing Systems (ICON-SONICS);2023-12-06

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3