Abstract
BackgroundPatients use social media as an alternative information source, where they share information and provide social support. Although large amounts of health-related data are posted on Twitter and other social networking platforms each day, research using social media data to understand chronic conditions and patients’ lifestyles is limited.ObjectiveIn this study, we contributed to closing this gap by providing a framework for identifying patients with inflammatory bowel disease (IBD) on Twitter and learning from their personal experiences. We enabled the analysis of patients’ tweets by building a classifier of Twitter users that distinguishes patients from other entities. This study aimed to uncover the potential of using Twitter data to promote the well-being of patients with IBD by relying on the wisdom of the crowd to identify healthy lifestyles. We sought to leverage posts describing patients’ daily activities and their influence on their well-being to characterize lifestyle-related treatments.MethodsIn the first stage of the study, a machine learning method combining social network analysis and natural language processing was used to automatically classify users as patients or not. We considered 3 types of features: the user’s behavior on Twitter, the content of the user’s tweets, and the social structure of the user’s network. We compared the performances of several classification algorithms within 2 classification approaches. One classified each tweet and deduced the user’s class from their tweet-level classification. The other aggregated tweet-level features to user-level features and classified the users themselves. Different classification algorithms were examined and compared using 4 measures: precision, recall, F1 score, and the area under the receiver operating characteristic curve. In the second stage, a classifier from the first stage was used to collect patients' tweets describing the different lifestyles patients adopt to deal with their disease. Using IBM Watson Service for entity sentiment analysis, we calculated the average sentiment of 420 lifestyle-related words that patients with IBD use when describing their daily routine.ResultsBoth classification approaches showed promising results. Although the precision rates were slightly higher for the tweet-level approach, the recall and area under the receiver operating characteristic curve of the user-level approach were significantly better. Sentiment analysis of tweets written by patients with IBD identified frequently mentioned lifestyles and their influence on patients’ well-being. The findings reinforced what is known about suitable nutrition for IBD as several foods known to cause inflammation were pointed out in negative sentiment, whereas relaxing activities and anti-inflammatory foods surfaced in a positive context.ConclusionsThis study suggests a pipeline for identifying patients with IBD on Twitter and collecting their tweets to analyze the experimental knowledge they share. These methods can be adapted to other diseases and enhance medical research on chronic conditions.