Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter

Published: 2023-10-16; Issue: 1; Volume: 23
ISSN: 1472-6947
Container-title: BMC Medical Informatics and Decision Making
Language: en
Short-container-title: BMC Med Inform Decis Mak

Authors:

Boligarla Srikanth, Laison Elda Kokoè Elolo, Li Jiaxin, Mahadevan Raja, Ng Austen, Lin Yangming, Thioub Mamadou Yamar, Huang Bruce, Ibrahim Mohamed Hamza, Nasri Bouchra

Abstract

Background: Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than 90% of all vector-borne diseases in North America.

Objective: In this paper, self-reported tweets were analyzed to predict potential Lyme disease cases and to assess incidence rates in the US.

Methods: The study was conducted in three stages. (1) Approximately 1.3 million tweets were collected and pre-processed to extract the most relevant geolocated Lyme disease tweets. A subset of tweets was semi-automatically labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remainder was labelled manually, yielding a curated dataset of 77,500 labelled tweets. (2) This labelled dataset was used to train, validate, and test various combinations of natural language processing (NLP) text representation methods and prominent machine learning (ML) classification models, such as TF-IDF with logistic regression, Word2vec with XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Lastly, spatio-temporal patterns across the US over a 10-year period were studied.

Results: Preliminary results showed that BERTweet outperformed all other tested NLP classifiers at identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score of 90%. There was also a consistent pattern of higher tweet rates in the West and Northeast regions of the US over time.

Conclusions: We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Four main findings emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease case counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter played an essential role in raising public awareness of Lyme disease. Third, counties with high incidence rates did not necessarily have high tweet rates, and vice versa. Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets.
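Methods stage (1) labels part of the data with precise keywords and routes the rest to human annotators. A minimal sketch of that split follows, assuming regular-expression keyword rules. The patterns, the function names `semi_auto_label` and `split_for_annotation`, and the example tweets are all hypothetical, since the study's actual keyword set is not given in the abstract.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical keyword rules: the study's actual "precise keywords" are not listed in the abstract.
RELEVANT_PATTERNS = [
    r"\bdiagnosed with lyme\b",
    r"\bi have lyme\b",
    r"\btested positive for lyme\b",
]
# Hypothetical exclusions: "Lyme" also appears in unrelated place names.
IRRELEVANT_PATTERNS = [
    r"\blyme regis\b",
    r"\blyme park\b",
]


def semi_auto_label(tweet: str) -> Optional[str]:
    """Return 'relevant' or 'irrelevant' when a rule fires, else None (needs a human)."""
    text = tweet.lower()
    # Exclusions are checked first so a place-name mention is never marked relevant.
    if any(re.search(p, text) for p in IRRELEVANT_PATTERNS):
        return "irrelevant"
    if any(re.search(p, text) for p in RELEVANT_PATTERNS):
        return "relevant"
    return None


def split_for_annotation(tweets: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split tweets into rule-labelled (tweet, label) pairs and a manual-labelling queue."""
    labelled: List[Tuple[str, str]] = []
    manual_queue: List[str] = []
    for tweet in tweets:
        label = semi_auto_label(tweet)
        if label is None:
            manual_queue.append(tweet)
        else:
            labelled.append((tweet, label))
    return labelled, manual_queue


labelled, manual_queue = split_for_annotation([
    "Just got diagnosed with Lyme disease",
    "Sunset at Lyme Regis today",
    "thinking about lyme",
])
print(labelled)      # [('Just got diagnosed with Lyme disease', 'relevant'), ('Sunset at Lyme Regis today', 'irrelevant')]
print(manual_queue)  # ['thinking about lyme']
```

Keeping the rules narrow trades coverage for precision: anything the rules cannot decide confidently falls through to the manual queue rather than receiving a possibly wrong automatic label.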
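The abstract lists TF-IDF with logistic regression as one of the baseline pipelines. The sketch below builds both pieces from scratch in plain Python so the mechanics are visible; it is an illustration, not the authors' code. The smoothed IDF formula matches the default smoothing in scikit-learn's `TfidfVectorizer`, and a real experiment would more likely use `TfidfVectorizer` and `LogisticRegression` from scikit-learn with separate train, validation, and test splits. The helper names, the hyperparameters `epochs` and `lr`, and the toy tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def fit_idf(docs: List[str]) -> Dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc: str, idf: Dict[str, float]) -> Vector:
    """Raw term count times IDF, L2-normalised; terms unseen in training are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x: Vector, weights: Vector, bias: float) -> float:
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_logreg(xs: List[Vector], ys: List[int], epochs: int = 200, lr: float = 0.5) -> Tuple[Vector, float]:
    """Binary logistic regression fitted by stochastic gradient descent on the log loss."""
    weights: Vector = {}
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(score(x, weights, bias)) - y  # d(log loss)/d(logit)
            bias -= lr * err
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


def predict(x: Vector, weights: Vector, bias: float) -> int:
    return int(sigmoid(score(x, weights, bias)) >= 0.5)


# Invented toy tweets: 1 = relevant to Lyme disease, 0 = irrelevant.
train_docs = [
    "diagnosed with lyme disease today",
    "tick bite turned into lyme",
    "lyme regis beach sunset",
    "visiting lyme park this weekend",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
weights, bias = train_logreg(train_vecs, train_labels)
print([predict(x, weights, bias) for x in train_vecs])  # [1, 1, 0, 0]
```

Note that "lyme" appears in every toy tweet, so its IDF is the minimum value of 1.0 and it carries little discriminative weight; the classifier has to rely on context words, which is exactly the situation that motivates contextual models such as BERTweet.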
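The Results report classification accuracy and F1-score. With precision P = TP / (TP + FP) and recall R = TP / (TP + FN), the F1-score is the harmonic mean F1 = 2PR / (P + R). The abstract does not say whether a binary, macro, or weighted F1 was used; the sketch below assumes a binary F1 for the "relevant" class, and the function name and labels are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Accuracy over all tweets and binary F1 for the positive (relevant) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Invented labels: 3 true positives, 1 false negative, 1 false positive, 5 true negatives.
acc, f1 = accuracy_and_f1([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                          [1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
print(acc, f1)  # 0.8 0.75
```

When relevant tweets are a minority class, accuracy alone can look high even for a weak classifier, which is why reporting F1 alongside it is informative.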
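The Results and Conclusions compare tweet rates across regions and counties and report a fairly strong correlation between classified tweet counts and Lyme disease case counts. The abstract defines neither the tweet rate nor the correlation measure, so the sketch below assumes a rate per 100,000 residents (the scale conventionally used for incidence rates) and Pearson's r over paired yearly series. The function names and all numbers are illustrative, not study data.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def rate_per_100k(count: float, population: float) -> float:
    return count / population * 100_000


def regional_rates(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """rows: (region, classified_tweets, population) per county -> pooled rate per region."""
    tweets: Dict[str, int] = defaultdict(int)
    population: Dict[str, int] = defaultdict(int)
    for region, n_tweets, pop in rows:
        tweets[region] += n_tweets
        population[region] += pop
    # Pool counts and populations before dividing, so populous counties weigh
    # more than they would if county rates were simply averaged.
    return {r: rate_per_100k(tweets[r], population[r]) for r in tweets}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative numbers only.
rates = regional_rates([("West", 30, 100_000), ("West", 10, 100_000), ("South", 5, 500_000)])
print({r: round(v, 2) for r, v in rates.items()})  # {'West': 20.0, 'South': 1.0}

yearly_tweets = [120, 150, 310, 280, 200]
yearly_cases = [900, 1000, 1500, 1400, 1200]
print(round(pearson(yearly_tweets, yearly_cases), 3))
```

Pearson's r measures linear co-movement of the aggregate series; a strong national correlation is compatible with the third finding that individual counties with high incidence need not have high tweet rates. A rank-based Spearman correlation is a common alternative when the relationship is monotonic but not linear.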

Funder

Natural Sciences and Engineering Research Council of Canada

Fonds de recherche du Québec – Santé

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics, Health Policy, Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1186/s12911-023-02315-z.pdf

References: 76 articles

1. Murphree Bacon R, Kugeler KJ, Mead PS. Surveillance for Lyme disease--United States, 1992-2006. 2008.

2. Kugeler KJ, Schwartz AM, Delorey MJ, Mead PS, Hinckley AF. Estimating the Frequency of Lyme Disease Diagnoses, United States, 2010–2018. Emerg Infect Dis. 2021;27(2):616–9. https://doi.org/10.3201/eid2702.202731. Accessed 17 Sep 2022.

3. Kumar D, Downs LP, Adegoke A, Machtinger E, Oggenfuss K, Ostfeld RS, et al. An Exploratory Study on the Microbiome of Northern and Southern Populations of Ixodes scapularis Ticks Predicts Changes and Unique Bacterial Interactions. Pathogens. 2022;11(2):130. https://doi.org/10.3390/pathogens11020130. Accessed 17 Sep 2022.

4. Marques AR, Strle F, Wormser GP. Comparison of Lyme Disease in the United States and Europe. Emerg Infect Dis. 2021;27(8):2017–2024. https://doi.org/10.3201/eid2708.204763. Accessed 17 Sep 2022.

5. Davidsson M. The Financial Implications of a Well-Hidden and Ignored Chronic Lyme Disease Pandemic. Healthcare. 2018;6(1):16. https://doi.org/10.3390/healthcare6010016. Accessed 17 Sep 2022.

Cited by 3 articles

1. Utilizing natural language processing and large language models in the diagnosis and prediction of infectious diseases: A systematic review. American Journal of Infection Control; 2024-09.

2. Utilizing Natural Language Processing and Large Language Models in the Diagnosis and Prediction of Infectious Diseases: A Systematic Review. 2024-01-17.

3. Clinical Text Classification in Healthcare: Leveraging BERT for NLP. 2023 International Conference on Artificial Intelligence for Innovations in Healthcare Industries (ICAIIHI); 2023-12-29.