Classification of Twitter Vaping Discourse Using BERTweet: Comparative Deep Learning Study

Published: 2022-07-21
Volume: 10
Issue: 7
Page: e33678
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Baker William, Colditz Jason B, Dobbs Page D, Mai Huy, Visweswaran Shyam, Zhan Justin, Primack Brian A

Abstract

Background: Twitter provides a valuable platform for the surveillance and monitoring of public health topics; however, manually categorizing large quantities of Twitter data is labor intensive and presents barriers to identifying major trends and sentiments. Additionally, while machine and deep learning approaches have been proposed with high accuracy, they require large, annotated data sets. Public pretrained deep learning classification models, such as BERTweet, produce higher-quality models while using smaller annotated training sets.

Objective: This study aims to derive and evaluate a pretrained deep learning model based on BERTweet that can identify tweets relevant to vaping, tweets (related to vaping) of commercial nature, and tweets with provape sentiment. Additionally, the performance of the BERTweet classifier will be compared against a long short-term memory (LSTM) model to show the improvements a pretrained model has over traditional deep learning approaches.

Methods: Twitter data were collected from August to October 2019 using vaping-related search terms. From this set, a random subsample of 2401 English tweets was manually annotated for relevance (vaping related or not), commercial nature (commercial or not), and sentiment (positive, negative, or neutral). Using the annotated data, 3 separate classifiers were built using BERTweet with the default parameters defined by the Simple Transformer application programming interface (API). Each model was trained for 20 iterations and evaluated with a random split of the annotated tweets, reserving 10% (n=165) of tweets for evaluations.

Results: The relevance, commercial, and sentiment classifiers achieved an area under the receiver operating characteristic curve (AUROC) of 94.5%, 99.3%, and 81.7%, respectively. Additionally, the weighted F1 scores of each were 97.6%, 99.0%, and 86.1%, respectively. We found that BERTweet outperformed the LSTM model in the classification of all categories.

Conclusions: Large, open-source deep learning classifiers, such as BERTweet, can provide researchers the ability to reliably determine if tweets are relevant to vaping; include commercial content; and include positive, negative, or neutral content about vaping with a higher accuracy than traditional natural language processing deep learning models. Such enhancement to the utilization of Twitter data can allow for faster exploration and dissemination of time-sensitive data than traditional methodologies (eg, surveys, polling research).
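The evaluation described in the abstract (a random holdout of 10% of annotated tweets, scored with weighted F1 and AUROC) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the seed, and the toy labels and scores are all hypothetical, and only the binary form of AUROC is shown. The abstract does not say how AUROC was computed for the 3-class sentiment task.

```python
import random
from collections import Counter


def holdout_split(items, test_fraction=0.1, seed=0):
    """Shuffle items and reserve test_fraction of them for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count
    return total / len(y_true)


def binary_auroc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

train, test = holdout_split(range(100), test_fraction=0.1)
print(len(train), len(test))                   # 90 10
print(round(weighted_f1(y_true, y_pred), 3))   # 0.75
print(round(binary_auroc(y_true, scores), 3))  # 0.933
```

Weighted F1 depends on the thresholded predictions, while AUROC depends only on how the scores rank positives against negatives, so the two metrics can diverge on imbalanced classes.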

Publisher

JMIR Publications Inc.

Subject

Health Information Management,Health Informatics

References: 23 articles.

1. Ying L. 10 Twitter Statistics Every Marketer Should Know in 2021 [Infographic]. 2021-04-16. https://www.oberlo.com/blog/twitter-statistics

2. Baker W. Using Large Pre-Trained Language Models to Track Emotions of Cancer Patients on Twitter. Computer Science and Computer Engineering Undergraduate Honors Theses. 2022-05-24. https://scholarworks.uark.edu/csceuht/92/

3. Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study

4. Deep learning

5. Learning to Monitor Machine Health with Convolutional Bi-Directional LSTM Networks

Cited by 8 articles.

1. Identifying E-cigarette Content on TikTok: Using a BERTopic Modeling Approach;Nicotine and Tobacco Research;2024-07-13

2. Predicting Themes of Tweets on Earthquakes in Turkey & Syria for Real-Time Classification;2023 16th International Conference on Developments in eSystems Engineering (DeSE);2023-12-18

3. Unravelling the Impact of Generative Artificial Intelligence (GAI) in Industrial Applications: A Review of Scientific and Grey Literature;Global Journal of Flexible Systems Management;2023-09-28

4. A Comprehensive Review on Transformers Models For Text Classification;2023 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC);2023-09-27

5. Twitter Sentiment About the US Federal Tobacco 21 Law: Mixed Methods Analysis;JMIR Formative Research;2023-08-31