BACKGROUND
Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence. Online communities have become vital for communication among IBD patients, especially throughout the COVID-19 pandemic. However, these interactions remain largely underexplored.
OBJECTIVE
This study aims to analyze community posts from three of the largest IBD support groups on Reddit between March 1, 2020, and December 31, 2022, using a pre-trained transformer model, and to validate the classification system's results via comparison to human scoring.
METHODS
We collected 53,333 posts and classified them using OpenAI's GPT-3.5 Turbo model to determine sentiment, categorize topics, and identify demographic information and COVID-19 mentions. Manual validation was performed on a subset of 397 posts to measure inter-rater agreement between human raters and the GPT-3.5 model.
RESULTS
Fleiss’ kappa and Gwet’s AC1 coefficients indicated moderate to high agreement between raters, with values ranging from 0.53 to 0.91. Medication (n = 14,909) and Symptoms (n = 14,939) emerged as the most discussed topics. Most posts conveyed a neutral sentiment. While most users did not disclose their age, those who did primarily fell into the 20-29 (n = 2,392) and 30-39 (n = 859) age ranges. After an initial spike in posts within the first month, most posts did not reference the COVID-19 pandemic.
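As an illustration of the two agreement statistics named above, the following is a minimal sketch of Fleiss' kappa and Gwet's AC1 for several raters assigning one nominal label per post. It assumes every post is labeled by the same number of raters; the function names and the toy labels are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def category_counts(ratings, categories):
    """Turn per-item lists of rater labels into per-item category counts."""
    return [[Counter(item)[c] for c in categories] for item in ratings]


def observed_agreement(counts):
    """Mean proportion of agreeing rater pairs per item."""
    n = sum(counts[0])  # raters per item (assumed constant)
    per_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    return sum(per_item) / len(counts)


def category_shares(counts):
    """Overall share of all ratings that fall in each category."""
    n = sum(counts[0])
    total = len(counts) * n
    return [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]


def fleiss_kappa(counts):
    """Fleiss' kappa: chance agreement is the sum of squared category shares."""
    p_a = observed_agreement(counts)
    p_e = sum(p * p for p in category_shares(counts))
    return (p_a - p_e) / (1 - p_e)


def gwet_ac1(counts):
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (q - 1)."""
    q = len(counts[0])  # number of categories (must be at least 2)
    p_a = observed_agreement(counts)
    p_e = sum(p * (1 - p) for p in category_shares(counts)) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical toy data: 10 posts, 2 raters, a rare "covid" label.
toy_ratings = [["none", "none"]] * 9 + [["none", "covid"]]
toy_counts = category_counts(toy_ratings, ["none", "covid"])

print(round(fleiss_kappa(toy_counts), 3))
print(round(gwet_ac1(toy_counts), 3))
```

In this toy example the raters agree on 9 of 10 posts, yet Fleiss' kappa is close to zero because one label dominates, while AC1 stays high. This sensitivity of kappa to skewed label prevalence is one commonly cited reason for reporting both coefficients side by side.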
CONCLUSIONS
Our study showcases the potential of generative pre-trained transformer models in processing and extracting insights from medical social media data. Future research could conduct further sub-analyses of our validated dataset or use OpenAI’s model to analyze social media data for other conditions, particularly those where patient experiences are challenging to collect.