Abstract
The development of gallstones is closely related to diet. As the prevalence of gallstones increases, identifying the risk factors that predict their development is crucial. Data from the 2017–2020 U.S. National Health and Nutrition Examination Survey (NHANES) were analyzed, and 5,150 participants were randomly divided into a training set and a validation set in a 7:3 ratio. Variables were screened using least absolute shrinkage and selection operator (LASSO) regression. Seven models were then constructed: multilayer perceptron (MLP), support vector machine (SVM), K-nearest neighbor (KNN), eXtreme Gradient Boosting (XGBoost), decision tree (DT), logistic regression (LR), and random forest (RF). Model performance was evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC), calibration curves, and decision curve analysis (DCA). The random forest model was selected as the best model, and its variables were ranked by importance. A machine learning model based on dietary intake predicts the risk of gallstones well and can help guide participants in developing healthy eating patterns.
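The sketch below illustrates two of the steps named above: the 7:3 random split and the AUC used to compare models. It assumes a binary gallstone label (1 = gallstones, 0 = none) and one predicted probability per participant. The function names, the seed, and the toy scores are hypothetical illustrations, not the authors' code. The study's actual software and split details (for example, whether the split was stratified) are not stated in this abstract.

```python
import random

# Hypothetical helpers for illustration only; not the study's code.

def split_train_validation(records, train_fraction=0.7, seed=42):
    """Randomly partition records into a training set and a validation set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a correct ranking (Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # 5,150 participants split 7:3, as in the abstract (IDs are placeholders).
    train, validation = split_train_validation(range(5150))
    print(len(train), len(validation))  # 3605 1545

    # Toy labels and predicted probabilities, not NHANES data.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))  # 0.75
```

The pairwise AUC costs one comparison per positive-negative pair, which is fine at this sample size. A rank-based computation gives the same value more efficiently for larger validation sets.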