BACKGROUND
Patient navigation (PN) programs have demonstrated efficacy in improving health outcomes for marginalized populations across a range of clinical contexts by addressing barriers to health care, including social determinants of health (SDoHs). However, it can be challenging for navigators to identify SDoHs by asking patients directly because of many factors, including patients’ reluctance to disclose information, communication barriers, and the variable resources and experience levels of patient navigators. Navigators could benefit from strategies that augment their ability to gather SDoH data. Machine learning can be leveraged as one of these strategies to identify SDoH-related barriers. This could further improve health outcomes, particularly in underserved populations.
OBJECTIVE
In this formative study, we explored novel machine learning–based approaches to predict SDoHs in 2 Chicago area PN studies. In the first approach, we applied machine learning to data that include comments and interaction details between patients and navigators, whereas the second approach augmented patients’ demographic information. This paper presents the results of these experiments and provides recommendations for data collection and the application of machine learning techniques more generally to the problem of predicting SDoHs.
METHODS
We conducted 2 experiments to explore the feasibility of using machine learning to predict patients’ SDoHs using data collected from PN research. The machine learning algorithms were trained on data collected from 2 Chicago area PN studies. In the first experiment, we compared several machine learning algorithms (logistic regression, random forest, support vector machine, artificial neural network, and Gaussian naive Bayes) to predict SDoHs from both patient demographics and navigator’s encounter data over time. In the second experiment, we used multiclass classification with augmented information, such as transportation time to a hospital, to predict multiple SDoHs for each patient.
RESULTS
In the first experiment, the random forest classifier achieved the highest accuracy among the classifiers tested. The overall accuracy to predict SDoHs was 71.3%. In the second experiment, multiclass classification effectively predicted a few patients’ SDoHs based purely on demographic and augmented data. The best accuracy of these predictions overall was 73%. However, both experiments yielded high variability in individual SDoH predictions and correlations that become salient among SDoHs.
CONCLUSIONS
To our knowledge, this study is the first approach to applying PN encounter data and multiclass learning algorithms to predict SDoHs. The experiments discussed yielded valuable lessons, including the awareness of model limitations and bias, planning for standardization of data sources and measurement, and the need to identify and anticipate the intersectionality and clustering of SDoHs. Although our focus was on predicting patients’ SDoHs, machine learning can have a broad range of applications in the field of PN, from tailoring intervention delivery (eg, supporting PN decision-making) to informing resource allocation for measurement, and PN supervision.