Affiliations:
1. Division of Population Health Sciences, University of Alaska, Anchorage, AK, USA
2. College of Nursing & Public Health, Adelphi University, Garden City, NY, USA
3. School of Nursing, Duke University, Durham, NC, USA
Abstract
Background: Electronic health systems contain large amounts of unstructured data (UD), which often go unanalyzed because of the time and costs involved. Unanalyzed data create missed opportunities to improve health outcomes. Natural language processing (NLP) is the foundation of generative artificial intelligence (GAI), which is the basis for large language models, such as those behind ChatGPT. NLP and GAI are machine learning methods that analyze large amounts of data in a short time at minimal cost. The ability of NLP to conduct qualitative analyses is increasing, yet its findings can lack context and nuance, requiring human intervention.
Methods: Our study compared the outcomes, time, and costs of a previously published qualitative study with those of an approach that partnered an NLP model with a qualitative researcher (NLP+). UD from behavioral health patients were analyzed using NLP and latent Dirichlet allocation (LDA) to identify topics using the probability of word coherence scores. The topics were then analyzed by a qualitative researcher, translated into themes, and compared with the original findings.
Results: The NLP+ results aligned with the original, qualitatively derived themes. Our model also identified two additional themes that were not originally detected. The NLP+ method required 6 hours of labor, 3 minutes for transcription, and a transcription cost of $1.17. The original, researcher-only qualitative method required more than 36 hours of labor ($2,250) and $1,100 for transcription.
Conclusions: While NLP analyzes voluminous amounts of data in seconds, it regularly misses context and nuance in human language. The NLP+ approach, which combines a qualitative researcher with NLP, could be deployed in many settings, reducing time and costs while improving context. Until large language models are more prevalent, human interaction can help translate the patient experience by contextualizing data rich in social determinant indicators that may otherwise go unanalyzed.
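The time and cost figures reported in the Results can be set side by side with a short calculation. The sketch below is illustrative only. It takes the reported values as given and makes one assumption that the abstract does not state: that NLP+ labor is billed at the hourly rate implied by the original study ($2,250 for 36 hours). Because the original labor is reported as "more than 36 hours", the time reduction is a lower bound. All variable and function names are hypothetical.

```python
# Figures taken from the abstract.
ORIGINAL_LABOR_HOURS = 36.0            # reported as "more than 36 hours" (lower bound)
ORIGINAL_LABOR_COST = 2250.00          # USD
ORIGINAL_TRANSCRIPTION_COST = 1100.00  # USD
NLP_PLUS_LABOR_HOURS = 6.0
NLP_PLUS_TRANSCRIPTION_COST = 1.17     # USD

# ASSUMPTION (not stated in the abstract): NLP+ labor is billed at the
# hourly rate implied by the original study's labor figures.
implied_hourly_rate = ORIGINAL_LABOR_COST / ORIGINAL_LABOR_HOURS


def total_cost(labor_hours: float, hourly_rate: float, transcription_cost: float) -> float:
    """Return labor cost plus transcription cost in USD."""
    return labor_hours * hourly_rate + transcription_cost


original_total = total_cost(
    ORIGINAL_LABOR_HOURS, implied_hourly_rate, ORIGINAL_TRANSCRIPTION_COST
)
nlp_plus_total = total_cost(
    NLP_PLUS_LABOR_HOURS, implied_hourly_rate, NLP_PLUS_TRANSCRIPTION_COST
)

# Fractional reductions relative to the original, researcher-only method.
hours_reduction = 1 - NLP_PLUS_LABOR_HOURS / ORIGINAL_LABOR_HOURS
cost_reduction = 1 - nlp_plus_total / original_total

print(f"Implied hourly rate: ${implied_hourly_rate:.2f}")
print(f"Original total:      ${original_total:.2f}")
print(f"NLP+ total:          ${nlp_plus_total:.2f}")
print(f"Labor-hour reduction: {hours_reduction:.1%}")
print(f"Total-cost reduction: {cost_reduction:.1%}")
```

Under the stated rate assumption, the NLP+ total is dominated by researcher labor rather than transcription, so a different hourly rate for NLP+ labor would change the cost comparison.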