Using Social Media to Help Understand Patient-Reported Health Outcomes of Post–COVID-19 Condition: Natural Language Processing Approach (Preprint)

Author:

Dolatabadi ElhamORCID,Moyano DianaORCID,Bales MichaelORCID,Spasojevic SofijaORCID,Bhambhoria RohanORCID,Bhatti JunaidORCID,Debnath ShyamolimaORCID,Hoell NicholasORCID,Li XinORCID,Leng CelineORCID,Nanda SashaORCID,Saab JadORCID,Sahak EsmatORCID,Sie FannyORCID,Uppal SaraORCID,Vadlamudi Nirma KhatriORCID,Vladimirova AntoanetaORCID,Yakimovich ArturORCID,Yang XiaoxueORCID,Kocak Sedef AkinliORCID,Cheung Angela MORCID

Abstract

BACKGROUND

While scientific knowledge of post–COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians.

OBJECTIVE

In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and locations to explore the pipeline’s potential as a surveillance tool.

METHODS

We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries.

RESULTS

UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. Based on our findings, the top 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as <i>fatigue</i>), neuropsychiatric (such as <i>anxiety</i> and <i>brain fog</i>), and respiratory (such as <i>shortness of breath</i>). In addition, we also found novel symptom and condition terms that had not been categorized in previous studies, such as <i>infection</i> and <i>pain</i>. Regarding the co-occurring symptoms, the pair of <i>fatigue</i> and <i>headaches</i> was among the most co-occurring term pairs across both platforms. Based on the temporal analysis, the neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis concluded that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada.

CONCLUSIONS

The outcome of our social media–derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient’s journey that can help health care providers anticipate future needs.

INTERNATIONAL REGISTERED REPORT

RR2-10.1101/2022.12.14.22283419

Publisher

JMIR Publications Inc.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3