BACKGROUND
Small clinics play an important role in providing health care to local communities. Accurately predicting their closure would help manage the allocation of health care resources. Few studies have used machine learning techniques to predict clinic closure.
OBJECTIVE
This study aims to test the feasibility of predicting the closure of medical and dental clinics (MCs and DCs, respectively) using machine learning techniques and to investigate the important factors associated with their closure.
METHODS
The units of analysis were MCs and DCs, and this study used health insurance administrative data. The study participants were clinics that were operating or that closed between January 1, 2020, and December 31, 2021. All closed clinics were included, and closed and operating clinics were matched at a ratio of 1:2 on the locality of the study participants using propensity score matching, with scores estimated by logistic regression. This study used 23 and 19 candidate variables to predict the closure of MCs and DCs, respectively. Key variables were extracted using permutation importance and sequential feature selection, and 5 and 6 variables were finally used for model learning for MCs and DCs, respectively. Four machine learning techniques were used: (1) logistic regression, (2) support vector machine, (3) random forest (RF), and (4) Extreme Gradient Boosting (XGBoost). Model accuracy was evaluated using the area under the curve (AUC), and the factors most strongly affecting closure were identified. Analyses were performed using SAS (version 9.4; SAS Institute Inc) and Python (version 3.7.9; Python Software Foundation).
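The 1:2 matching of closed to operating clinics can be sketched as greedy nearest-neighbour matching on the propensity score. This is a minimal sketch under assumptions the abstract does not state: the propensity scores (estimated from locality characteristics by logistic regression) are taken as already computed, matching is greedy and without replacement, and no caliper is applied. The `Clinic` class, the `match_one_to_k` function, and the toy scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clinic:
    clinic_id: str
    score: float  # propensity score; assumed precomputed by logistic regression
    closed: bool  # True for a closed clinic (case), False for an operating one


def match_one_to_k(clinics, k=2):
    """Greedy nearest-neighbour propensity score matching without replacement.

    Each closed clinic is paired with the k operating clinics whose scores are
    closest to its own. Assumption: no caliper, and a control is never reused.
    """
    controls = [c for c in clinics if not c.closed]
    # Process cases in a fixed order so the result is reproducible.
    cases = sorted((c for c in clinics if c.closed), key=lambda c: c.clinic_id)
    matches = {}
    for case in cases:
        # Rank the remaining controls by distance in propensity score.
        controls.sort(key=lambda c: abs(c.score - case.score))
        chosen, controls = controls[:k], controls[k:]
        matches[case.clinic_id] = [c.clinic_id for c in chosen]
    return matches


# Hypothetical toy data: two closed clinics and five operating ones.
clinics = [
    Clinic("A", 0.60, True),
    Clinic("E", 0.30, True),
    Clinic("B", 0.55, False),
    Clinic("C", 0.62, False),
    Clinic("D", 0.20, False),
    Clinic("F", 0.33, False),
    Clinic("G", 0.90, False),
]
matched = match_one_to_k(clinics, k=2)
print(matched)  # {'A': ['C', 'B'], 'E': ['F', 'D']}
```

Greedy matching depends on the order in which cases are processed; optimal matching avoids that dependence at higher computational cost, and the abstract does not say which variant was used.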
RESULTS
The best-fit model for the closure of MCs under cross-validation was the support vector machine (AUC 0.762, 95% CI 0.746-0.777; <i>P</i><.001), followed by RF (AUC 0.736, 95% CI 0.720-0.752; <i>P</i><.001). The best-fit model for DCs was Extreme Gradient Boosting (AUC 0.700, 95% CI 0.675-0.725; <i>P</i><.001), followed by RF (AUC 0.687, 95% CI 0.661-0.712; <i>P</i><.001). The most important factor associated with the closure of MCs was years of operation, followed by population growth, population, and percentage of medical specialties. In contrast, the main factor affecting the closure of DCs was the number of patients, followed by annual variation in the number of patients, years of operation, and percentage of dental specialists.
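The AUC used to compare the models, and the permutation importance used to rank variables, can be illustrated with a short standard-library sketch. It computes AUC in its rank (Mann-Whitney) form and defines a feature's permutation importance as the mean drop in AUC after shuffling that feature's column. The function names, the toy predictor, and the data are hypothetical, and the sketch does not reproduce how the study computed its 95% CIs or P values.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in AUC when each feature column is shuffled in turn.

    `predict` maps a list of feature rows (lists) to a list of scores.
    """
    rng = random.Random(seed)
    baseline = auc(labels, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Replace only column j, leaving the other features intact.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - auc(labels, predict(shuffled)))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data: the label depends only on feature 0.
rows = [[i, (i * 7) % 10] for i in range(10)]
labels = [1 if i >= 5 else 0 for i in range(10)]


def predict(rs):
    # Toy predictor that scores each row by feature 0 and ignores feature 1.
    return [r[0] for r in rs]


baseline, importances = permutation_importance(predict, rows, labels)
print(baseline, importances)
```

Because the toy predictor ignores feature 1, shuffling that column leaves the AUC unchanged and its importance is exactly 0, while shuffling feature 0 lowers the AUC from its baseline of 1.0.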
CONCLUSIONS
This study showed that machine learning methods are useful tools for predicting the closure of small medical facilities with a moderate level of accuracy. The essential factors affecting closure also differed between MCs and DCs. Developing better models could help prevent unnecessary medical facility closures at the national level.