Affiliation:
1. Department of Hematology/Oncology, Fox Chase Cancer Center, Philadelphia, Pennsylvania
2. Cancer Prevention and Control Research Program, Fox Chase Cancer Center, Philadelphia, Pennsylvania
3. Department of Biostatistics, Fox Chase Cancer Center, Philadelphia, Pennsylvania
4. Department of Surgical Oncology, Fox Chase Cancer Center, Philadelphia, Pennsylvania
Abstract
Importance: Delays in starting cancer treatment disproportionately affect vulnerable populations and can influence patients’ experience and outcomes. Machine learning algorithms incorporating electronic health record (EHR) data and neighborhood-level social determinants of health (SDOH) measures may identify at-risk patients.
Objective: To develop and validate a machine learning model for estimating the probability of a treatment delay using multilevel data sources.
Design, Setting, and Participants: This cohort study evaluated 4 different machine learning approaches for estimating the likelihood of a treatment delay greater than 60 days (group least absolute shrinkage and selection operator [LASSO], bayesian additive regression tree, gradient boosting, and random forest). Criteria for selecting between approaches were discrimination, calibration, and interpretability/simplicity. The multilevel data set included clinical, demographic, and neighborhood-level census data derived from the EHR, cancer registry, and American Community Survey. Patients with invasive breast, lung, colorectal, bladder, or kidney cancer diagnosed from 2013 to 2019 and treated at a comprehensive cancer center were included. Data analysis was performed from January 2022 to June 2023.
Exposures: Variables included demographics, cancer characteristics, comorbidities, laboratory values, imaging orders, and neighborhood variables.
Main Outcomes and Measures: The outcome estimated by machine learning models was likelihood of a delay greater than 60 days between cancer diagnosis and treatment initiation. The primary metric used to evaluate model performance was area under the receiver operating characteristic curve (AUC-ROC).
Results: A total of 6409 patients were included (mean [SD] age, 62.8 [12.5] years; 4321 [67.4%] female; 2576 [40.2%] with breast cancer, 1738 [27.1%] with lung cancer, and 1059 [16.5%] with kidney cancer). A total of 1621 (25.3%) experienced a delay greater than 60 days. The selected group LASSO model had an AUC-ROC of 0.713 (95% CI, 0.679-0.745). Lower likelihood of delay was seen with diagnosis at the treating institution; first malignant neoplasm; Asian or Pacific Islander or White race; private insurance; and lacking comorbidities. Greater likelihood of delay was seen at the extremes of neighborhood deprivation. Model performance (AUC-ROC) was lower in Black patients, patients with race and ethnicity other than non-Hispanic White, and those living in the most disadvantaged neighborhoods. Though the model selected neighborhood SDOH variables as contributing variables, performance was similar when fit with and without these variables.
Conclusions and Relevance: In this cohort study, a machine learning model incorporating EHR and SDOH data was able to estimate the likelihood of delays in starting cancer therapy. Future work should focus on additional ways to incorporate SDOH data to improve model performance, particularly in vulnerable populations.
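As a rough illustration of the outcome definition and primary metric described in the abstract, the sketch below labels a patient as delayed when more than 60 days pass between diagnosis and treatment initiation, then computes AUC-ROC as the probability that a randomly chosen delayed patient is scored higher than a randomly chosen non-delayed patient. The function names, the toy labels, and the toy scores are hypothetical and are not taken from the study; this is not the authors' code, and it does not reproduce their group LASSO model or confidence intervals.

```python
from datetime import date


def delayed_over_60_days(diagnosis, treatment_start, threshold_days=60):
    """Return 1 if treatment began more than threshold_days after diagnosis, else 0.

    Hypothetical helper mirroring the abstract's outcome definition.
    """
    return int((treatment_start - diagnosis).days > threshold_days)


def auc_roc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of AUC-ROC.

    Counts the fraction of (delayed, non-delayed) pairs in which the delayed
    patient has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one patient in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only): 1 = delay greater than 60 days.
toy_labels = [1, 0, 0, 1, 0]
toy_scores = [0.8, 0.3, 0.5, 0.4, 0.1]  # hypothetical model probabilities

if __name__ == "__main__":
    print(delayed_over_60_days(date(2019, 1, 1), date(2019, 3, 3)))
    print(round(auc_roc(toy_labels, toy_scores), 3))
```

An AUC-ROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives some context for the reported 0.713. The pairwise loop is quadratic in the number of patients; for a cohort of several thousand, a rank-based implementation from an established library would be the more practical choice.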
Publisher
American Medical Association (AMA)
Cited by
4 articles.