BACKGROUND
Social media platforms offer valuable insights into patients' experiences, revealing organic conversations that reflect their immediate concerns and needs. By actively listening to these lived experiences, we can identify unmet needs and discover the real-world challenges that patients and caregivers face.
OBJECTIVE
This study aimed to develop a reusable framework to collect and analyze evolving social media data, capturing insights into the experiences of individuals with myelodysplastic syndromes (MDS) and higher-risk MDS (HR-MDS) and their caregivers. The findings can inform the development of appropriate patient support interventions.
METHODS
We conducted an extensive Google search to identify social media posts of interest using validated URLs and keywords on English-language websites relevant to MDS. The search covered the period from January 1, 2008, to December 31, 2022. We used scraping algorithms to collect, clean, and standardize pertinent information. To classify the perspective of each experience as that of a patient or a caregiver, we employed classification algorithms. This involved contextualizing and summarizing all user posts, followed by decision tree tagging to assign them to the patient or caregiver category. Advanced algorithms were employed to analyze the semantic and temporal structure of the data. Patients or caregivers were categorized as HR-MDS based on contextual mentions of high risk in their posts or on specific factors aligned with National Comprehensive Cancer Network (NCCN) guidelines (e.g., blast percentage, transplantation, use of high-intensity chemotherapy or hypomethylating agents, or disease progression). Each post was assigned major themes and sentiments using a supervised machine learning classification model. Additionally, we used a semi-supervised machine learning approach to identify latent themes in the data corpus.
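As an illustration of the HR-MDS categorization step described above, the following is a minimal sketch of keyword-based flagging. It is not the study's algorithm: the pattern list, the drug names (azacitidine and decitabine, two hypomethylating agents), and the function name `is_hr_mds` are assumptions chosen only to mirror the factors listed in the text.

```python
import re

# Hypothetical cue patterns, loosely mirroring the factors named in the text
# (high-risk mentions, blasts, transplantation, hypomethylating agents,
# high-intensity chemotherapy, disease progression). Illustrative only.
HR_PATTERNS = [
    r"\bhigh(?:er)?[-\s]risk\b",
    r"\bblasts?\b",
    r"\btransplant(?:ation)?\b",
    r"\bhypomethylating\b",
    r"\bazacitidine\b",
    r"\bdecitabine\b",
    r"\b(?:intensive|induction|high[-\s]intensity)\s+chemo(?:therapy)?\b",
    r"\bprogress(?:ed|ion|ing)\b",
]

# One combined, case-insensitive pattern for all cues.
HR_REGEX = re.compile("|".join(HR_PATTERNS), re.IGNORECASE)


def is_hr_mds(posts):
    """Return True if any of a user's posts contains an illustrative higher-risk cue."""
    return any(HR_REGEX.search(post) for post in posts)


if __name__ == "__main__":
    print(is_hr_mds(["My husband's blasts went up to 12% last month."]))
    print(is_hr_mds(["Just diagnosed, counts are stable so far."]))
```

A simple keyword match like this would over-flag posts that mention a cue in passing (for example, a question about transplantation in general), which is why the contextual step described in the methods matters.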
RESULTS
The data collected comprised approximately 5.5 million words from 42,000 posts across 5,500 threads, involving about 4,000 users predominantly from the US, UK, and Canada. Of the 1,249 users classified as HR-MDS, 588 (47%) were patients and 661 (53%) were caregivers. Among the HR-MDS users, the predominant sentiments included concern (78%), anxiety (60%), frustration (58%), fear (58%), and confusion (49%). Concern was the predominant sentiment expressed by caregivers (n=971, 59%), and anxiety by patients (n=752, 55%). Common concerns related specifically to blood counts (n=677, 54%), burden of the disease (43%), quality of life (QoL; 36%), available treatment options and their effectiveness (31%), and disease progression and prognosis (31%). Anxiety related to health and disease (48%), treatment (26%), and the diagnostic process (20%) was also common. The most common fears concerned the potential development of health complications and the manifestation of symptoms (19%) and the progression and exacerbation of MDS (19%). Additionally, confusion was pervasive among participants, with 295 (24%) individuals finding it challenging to comprehend the nuances of MDS and its diagnosis. A systematic analysis of the principal domains in which users sought information about HR-MDS showed that they frequently mentioned seeking information on therapeutic interventions (19%) and expressed interest in ongoing research associated with the disease (17%).
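The percentages reported with explicit counts for the HR-MDS cohort can be reproduced from the 1,249-user denominator. The short sketch below assumes simple rounding to the nearest whole percent; the helper name `pct` is hypothetical.

```python
def pct(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


HR_TOTAL = 1249  # users classified as HR-MDS

# Counts taken from the results text above.
reported = {
    "patients": 588,
    "caregivers": 661,
    "concern about blood counts": 677,
    "confusion about MDS and diagnosis": 295,
}

for label, count in reported.items():
    print(f"{label}: {count}/{HR_TOTAL} = {pct(count, HR_TOTAL)}%")
```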
CONCLUSIONS
The application of natural language processing (NLP) techniques shows promise for identifying the emerging, complex themes and sentiments expressed by HR-MDS users, thereby highlighting the unmet needs, barriers, and facilitators associated with the disease.
CLINICALTRIAL
NA