Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study

Published: 2020-08-12
Issue: 8
Volume: 22
Page: e17478
ISSN: 1438-8871
Container-title: Journal of Medical Internet Research
Language: en
Short-container-title: J Med Internet Res

Author:

Visweswaran Shyam, Colditz Jason B, O’Halloran Patrick, Han Na-Rae, Taneja Sanya B, Welling Joel, Chu Kar-Hai, Sidani Jaime E, Primack Brian A

Abstract

Background: Twitter is a valuable and relevant social media platform for studying the prevalence of information and sentiment on vaping, which may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize the sentiments in them can underpin a Twitter-based vaping surveillance system. Traditional machine learning classifiers rely on annotations that are expensive to obtain; deep learning classifiers offer the advantage of requiring fewer annotated tweets by leveraging the large numbers of readily available unannotated tweets.

Objective: This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments.

Methods: We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance.

Results: For relevance, LSTM-CNN performed the best, with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98). For distinguishing commercial from noncommercial tweets, all deep learning classifiers, including LSTM-CNN, performed better than the traditional classifiers, with an AUC of 0.99 (95% CI 0.98-0.99). For provape sentiment, BiLSTM performed the best, with an AUC of 0.83 (95% CI 0.78-0.89). Overall, LSTM-CNN performed the best across all 3 classification tasks.

Conclusions: We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.
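The results above are reported as an AUC with a 95% CI. As a rough illustration of what such a figure means, the sketch below computes an AUC from classifier scores and a percentile bootstrap interval around it. The abstract does not say how the intervals were obtained, so the bootstrap is an assumption here, and the function names (`auc`, `bootstrap_ci`) and the toy labels and scores are hypothetical, not taken from the study.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            # The resample contains only one class, so the AUC is undefined.
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = relevant tweet, 0 = not relevant.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI {low:.2f}-{high:.2f})")
```

With only 8 toy examples the interval is very wide; on a held-out portion of a 4000-tweet annotated set it would be much tighter.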

Publisher

JMIR Publications Inc.

Subject

Health Informatics

References (44 articles)

1. Balancing the Benefits and Harms of E-Cigarettes: A National Academies of Science, Engineering, and Medicine Report

2. Secondhand Exposure to Vapors From Electronic Cigarettes

3. Pulmonary Illness Related to E-Cigarette Use in Illinois and Wisconsin — Final Report

4. Vaping-associated Acute Lung Injury: A Case Series

5. The Adolescent Vaping Epidemic in the United States—How It Happened and Where We Go From Here

Cited by 24 articles.

1. Identification of high-risk population of pneumoconiosis using deep learning segmentation of lung 3D images and radiomics texture analysis;Computer Methods and Programs in Biomedicine;2024-02

2. Automating Detection of Drug-Related Harms on Social Media: Machine Learning Framework;Journal of Medical Internet Research;2023-09-19

3. Users’ Reactions to Announced Vaccines Against COVID-19 Before Marketing in France: Analysis of Twitter Posts;Journal of Medical Internet Research;2023-04-24

4. Discerning conversational context in online health communities for personalized digital behavior change solutions using Pragmatics to Reveal Intent in Social Media (PRISM) framework;Journal of Biomedical Informatics;2023-04

5. A Review of Deep Learning Models for Twitter Sentiment Analysis: Challenges and Opportunities;IEEE Transactions on Computational Social Systems;2023