Author:
Tran Van, Saad Tazmilur, Tesfaye Mehret, Walelign Sosina, Wordofa Moges, Abera Dessie, Desta Kassu, Tsegaye Aster, Ay Ahmet, Taye Bineyam
Abstract
Background: Although previous epidemiological studies have examined potential risk factors for acquiring Helicobacter pylori infection, most of these analyses have used conventional statistical models, such as logistic regression, and have not benefited from advanced machine learning techniques.
Objective: We examined H. pylori infection risk factors among school children using machine learning algorithms, both to identify important risk factors and to determine whether machine learning can be used to predict H. pylori infection status.
Methods: We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data and combined the results of these runs for each combination of feature selection (e.g., Information Gain) and classification (e.g., Support Vector Machines) algorithms.
Results: The XGBoost classifier had the highest accuracy in predicting H. pylori infection status, at 77%, an improvement of 13 percentage points over the baseline accuracy of guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors showed the worst performance of all classifiers. Similar performance was observed using the F1-score and area under the receiver operating characteristic curve (AUROC) as classifier evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most commonly identified risk factor for H. pylori infection, regardless of the feature selection method. Additionally, our machine learning algorithms identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, when a lower robustness threshold was used, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression.
Conclusion: This study provides evidence that machine learning approaches are positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to logistic regression; thus, they could be used as an alternative method.
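The evaluation scheme described in the Methods (repeated tenfold cross-validation over a feature selection step and a classifier, compared against a majority-class baseline) can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The data are synthetic, the classifier is a small Bernoulli naive Bayes standing in for the paper's classifiers, and every name and constant (`TOP_K`, `information_gain`, `fit_naive_bayes`, the 300-sample data set) is hypothetical. The abstract does not define its robustness measure, so the sketch assumes it means the fraction of cross-validation folds in which a feature is selected.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Entropy reduction in `labels` after splitting on a categorical column."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def fit_naive_bayes(X, y, features):
    """Bernoulli naive Bayes with Laplace smoothing, restricted to `features`."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(y) + 2)
        probs = {f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2) for f in features}
        model[c] = (prior, probs)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior score for sample x."""
    scores = {}
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for f, p in probs.items():
            score += math.log(p if x[f] == 1 else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get)

# Synthetic data (assumption): 6 binary features; the label follows feature 0
# with roughly 10% label noise, and the other features are uninformative.
rng = random.Random(0)
N_SAMPLES, N_FEATURES, TOP_K = 300, 6, 2
X = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]
y = [row[0] if rng.random() > 0.1 else 1 - row[0] for row in X]

N_RUNS, N_FOLDS = 5, 10  # five runs of tenfold cross-validation
accuracies, selection_counts = [], Counter()
for run in range(N_RUNS):
    order = list(range(N_SAMPLES))
    random.Random(run).shuffle(order)  # a different partition for each run
    folds = [order[i::N_FOLDS] for i in range(N_FOLDS)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Feature selection uses the training folds only, to avoid leakage.
        gains = {f: information_gain([r[f] for r in X_train], y_train)
                 for f in range(N_FEATURES)}
        selected = sorted(gains, key=gains.get, reverse=True)[:TOP_K]
        selection_counts.update(selected)
        model = fit_naive_bayes(X_train, y_train, selected)
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))

total_folds = N_RUNS * N_FOLDS
# Assumed robustness rule: keep features selected in at least 75% of folds.
robust = sorted(f for f, c in selection_counts.items() if c / total_folds >= 0.75)
baseline = max(Counter(y).values()) / N_SAMPLES  # always guess the majority class
mean_accuracy = sum(accuracies) / len(accuracies)
print(f"baseline={baseline:.2f} mean accuracy={mean_accuracy:.2f} robust={robust}")
```

Selecting features inside each training fold, rather than once on the full data set, keeps the held-out fold from influencing which features are chosen. Lowering the 0.75 threshold in this sketch admits more features, which mirrors the trade-off the Results describe for lower robustness thresholds.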
Publisher
Springer Science and Business Media LLC
Cited by
8 articles.