Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid

Published: 2021-05-03; Volume: 23; Issue: 5; Page: e26616
ISSN: 1438-8871
Journal: Journal of Medical Internet Research
Language: en
Journal abbreviation: J Med Internet Res

Authors:

Yang Yuan-Chi, Al-Garadi Mohammed Ali, Bremer Whitney, Zhu Jane M, Grande David, Sarker Abeed

Abstract

Background: The wide adoption of social media in daily life renders it a rich and effective resource for conducting near real-time assessments of consumers’ perceptions of health services. However, its use in these assessments can be challenging because of the vast amount of data and the diversity of content in social media chatter.

Objective: This study aims to develop and evaluate an automatic system involving natural language processing and machine learning to characterize user-posted Twitter data about health services, using Medicaid, the single largest source of health coverage in the United States, as an example.

Methods: We collected data from Twitter in two ways: via the public streaming application programming interface using Medicaid-related keywords (Corpus 1) and via the website’s search option for tweets mentioning agency-specific handles (Corpus 2). We manually labeled a sample of tweets into 5 predetermined categories or "other" and artificially increased the number of training posts from specific low-frequency categories. Using the manually labeled data, we trained and evaluated several supervised learning algorithms, including support vector machine, random forest (RF), naïve Bayes, shallow neural network (NN), k-nearest neighbor, bidirectional long short-term memory, and bidirectional encoder representations from transformers (BERT). We then applied the best-performing classifier to the collected tweets for postclassification analyses to assess the utility of our methods.

Results: We manually annotated 11,379 tweets (Corpus 1: 9179; Corpus 2: 2200) and used 7930 (69.7%) for training, 1449 (12.7%) for validation, and 2000 (17.6%) for testing. A BERT-based classifier obtained the highest accuracies (81.7%, Corpus 1; 80.7%, Corpus 2) and F1 scores on consumer feedback (0.58, Corpus 1; 0.90, Corpus 2), outperforming the second-best classifiers in terms of accuracy (74.6%, RF on Corpus 1; 69.4%, RF on Corpus 2) and F1 score on consumer feedback (0.44, NN on Corpus 1; 0.82, RF on Corpus 2). Postclassification analyses revealed different distributions of tweet categories across the two corpora, with political (400,778/628,411, 63.78%) and consumer feedback (15,073/27,337, 55.14%) tweets being the most frequent for Corpus 1 and Corpus 2, respectively.

Conclusions: The broad and variable content of Medicaid-related tweets necessitates automatic categorization to identify topic-relevant posts. Our proposed system presents a feasible solution for automatic categorization and can be deployed and generalized to health service programs other than Medicaid. Annotated data and methods are available for future studies.
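The abstract mentions two mechanical steps without detailing them: boosting low-frequency categories in the training data, and scoring classifiers by accuracy and by per-category F1 (for example, F1 on consumer feedback). The sketch below shows one common way to do each in plain Python. It is an assumption, not the authors' method: the abstract does not say whether posts were duplicated, resampled, or newly collected, and the helper names `oversample`, `accuracy`, and `class_f1`, the duplication factor, and the toy tweets and labels are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration only; the abstract does not specify
# how low-frequency categories were boosted or which tooling computed metrics.

def oversample(examples, boost_labels, factor, seed=0):
    """Duplicate every (text, label) pair whose label is in boost_labels so
    that it appears `factor` times in total; other pairs are kept once.
    Apply this to the training split only, never to validation or test."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    boosted = []
    for text, label in examples:
        copies = factor if label in boost_labels else 1
        boosted.extend([(text, label)] * copies)
    random.Random(seed).shuffle(boosted)  # avoid long runs of identical posts
    return boosted

def accuracy(gold, pred):
    """Fraction of tweets whose predicted category equals the gold category."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def class_f1(gold, pred, label):
    """One-vs-rest F1 for a single category (e.g. consumer feedback)."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined): report 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Toy training posts; labels reuse category names from the abstract.
    train = [("t1", "political"), ("t2", "consumer_feedback"), ("t3", "political")]
    print(Counter(label for _, label in oversample(train, {"consumer_feedback"}, 3)))

    gold = ["consumer_feedback", "political", "political", "other", "consumer_feedback"]
    pred = ["consumer_feedback", "political", "consumer_feedback", "other", "political"]
    print(accuracy(gold, pred), class_f1(gold, pred, "consumer_feedback"))
```

Boosting only the training split leaves the validation (1449) and test (2000) tweets untouched, so reported scores still reflect the original category mix. Per-category F1 matters here because overall accuracy can stay high when one category is far more frequent than the others, even if a smaller category such as consumer feedback is recognized poorly.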

Publisher

JMIR Publications Inc.

Subject

Health Informatics

Cited by 4 articles.

1. Breaking New Ground "Medical Education" on Twitter: A Social Network Analysis Retrospective (Preprint); 2023-10-21

2. Examining Natural Language Processing Techniques in the Education and Healthcare Fields; International Journal of Engineering and Advanced Technology; 2022-12-30

3. AI-Based Interactive Agent for Health Care Using NLP and Deep Learning; Information and Communication Technology for Competitive Strategies (ICTCS 2021); 2022-06-23

4. Automatic gender detection in Twitter profiles for health-related cohort studies; JAMIA Open; 2021-04-01