
Q8VaxStance: Dataset Labeling System for Stance Detection towards Vaccines in Kuwaiti Dialect

Published:2023-09-15 Issue:3 Volume:7 Page:151
ISSN:2504-2289
Container-title:Big Data and Cognitive Computing
language:en
Short-container-title:BDCC

Author:

Alostad Hana¹, Dawiek Shoug¹, Davulcu Hasan²

Affiliation:

1. Computer Science Department, Gulf University for Science and Technology, Mubarak Al-Abdullah 32093, Kuwait

2. Computer Science Department, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA

Abstract

The Kuwaiti dialect is a dialect of Arabic spoken in Kuwait; it differs significantly from standard Arabic and from the dialects of neighboring countries in the same region. Few NLP research papers have focused on the Kuwaiti dialect. In this study, we created Kuwaiti dialect language resources using Q8VaxStance, a vaccine stance labeling system applied to a large dataset of tweets. The resulting dataset fills this gap and provides a valuable resource for researchers studying vaccine hesitancy in Kuwait. It also contributes to Arabic natural language processing by providing a dataset for developing and evaluating machine learning models for stance detection in the Kuwaiti dialect. The proposed vaccine stance labeling system combines the benefits of weakly supervised learning and zero-shot learning; to evaluate it, we conducted 52 experiments on 42,815 unlabeled tweets collected between December 2020 and July 2022. The results show that combining keyword detection labeling functions with zero-shot model labeling functions performs significantly better than using either keyword detection labeling functions or zero-shot model labeling functions alone. Furthermore, in terms of the total number of generated labels, using Arabic in both the labels and the prompt, or Arabic labels with an English prompt, generates a statistically significantly larger number of labels than using English in both the labels and the prompt. The best Macro-F1 values in our experiments were obtained by combining keyword and hashtag detection labeling functions with zero-shot model labeling functions, specifically in experiments KHZSLF-EE4 and KHZSLF-EA1, both with a Macro-F1 of 0.83. Experiment KHZSLF-EE4 labeled 42,270 tweets, while experiment KHZSLF-EA1 labeled 42,764 tweets.
Finally, the average annotation agreement between the generated labels and human labels ranges from 0.61 to 0.64, which is considered a good level of agreement.
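The abstract describes the labeling pipeline only at a high level. The Python below is a minimal sketch, not the authors' code, of how keyword/hashtag labeling functions and a zero-shot labeling function might each vote on a tweet, and how Macro-F1 and agreement could then be computed. It rests on several assumptions that the abstract does not confirm: a binary pro/anti label set, placeholder English cue lists, an arbitrary 0.7 confidence threshold, majority-vote aggregation (a learned label model such as Snorkel's `LabelModel` is a common alternative), and Cohen's kappa as the agreement measure. `toy_scorer` and every other name here are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical binary label set; the abstract does not list Q8VaxStance's label names.
PRO, ANTI = "pro-vaccine", "anti-vaccine"
LABELS = (PRO, ANTI)
Label = Optional[str]  # None = the labeling function abstains

# Placeholder cue lists for illustration; real ones would be Kuwaiti-dialect keywords/hashtags.
PRO_CUES = {"#getvaccinated", "vaccine saves lives"}
ANTI_CUES = {"#novaccine", "vaccine side effects"}


def keyword_lf(text: str) -> Label:
    """Keyword/hashtag labeling function: votes only when cues from exactly one side appear."""
    t = text.lower()
    pro = any(cue in t for cue in PRO_CUES)
    anti = any(cue in t for cue in ANTI_CUES)
    if pro != anti:
        return PRO if pro else ANTI
    return None


def make_zero_shot_lf(score_fn: Callable[[str, Sequence[str]], Dict[str, float]],
                      threshold: float = 0.7) -> Callable[[str], Label]:
    """Wrap a zero-shot scorer as a labeling function that abstains below `threshold`."""
    def lf(text: str) -> Label:
        scores = score_fn(text, LABELS)
        best = max(scores, key=scores.get)
        return best if scores[best] >= threshold else None
    return lf


def majority_vote(text: str, lfs: List[Callable[[str], Label]]) -> Label:
    """Combine non-abstaining votes; no votes or a tie leaves the tweet unlabeled."""
    votes = Counter(v for v in (lf(text) for lf in lfs) if v is not None)
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over LABELS."""
    f1s = []
    for c in LABELS:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two label sequences (e.g. generated vs. human)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


def toy_scorer(text: str, labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a real zero-shot model (scores only PRO/ANTI here)."""
    t = text.lower()
    p = 0.9 if "safe" in t else 0.1 if "danger" in t else 0.5
    return {PRO: p, ANTI: 1.0 - p}


TWEETS = [
    "Vaccine saves lives, it is safe",
    "#NoVaccine it is dangerous",
    "Booked my appointment today",
    "#GetVaccinated even if it seems dangerous",
]
LFS = [keyword_lf, make_zero_shot_lf(toy_scorer)]
predicted = [majority_vote(t, LFS) for t in TWEETS]
coverage = sum(p is not None for p in predicted) / len(predicted)
print(predicted, coverage)

# Evaluation on a toy gold/predicted pair (both fully labeled).
gold, pred = [PRO, PRO, ANTI, ANTI], [PRO, ANTI, ANTI, ANTI]
print(round(macro_f1(gold, pred), 3), round(cohen_kappa(gold, pred), 3))
```

In this toy run, the third tweet gets no confident vote and the fourth receives conflicting votes, so both stay unlabeled. That is why coverage, the share of tweets that receive a label, is reported separately from Macro-F1. For example, 42,270 of 42,815 tweets is about 98.7% coverage for KHZSLF-EE4. A real scorer could wrap a Hugging Face `transformers` zero-shot-classification pipeline, which returns parallel `labels` and `scores` lists. Its `hypothesis_template` argument is presumably where an Arabic or English prompt would be supplied.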

Publisher

MDPI AG

Subject

Artificial Intelligence, Computer Science Applications, Information Systems, Management Information Systems

Link

https://www.mdpi.com/2504-2289/7/3/151/pdf

References (36 articles)

1. Alibrahim (2021). COVID-19 vaccine hesitancy among the public in Kuwait: A cross-sectional survey. Int. J. Environ. Res. Public Health.

2. Sallam, M., Dababseh, D., Eid, H., Al-Mahzoum, K., Al-Haidar, A., Taim, D., Yaseen, A., Ababneh, N.A., Bakri, F.G., and Mahafzah, A. (2021). High Rates of COVID-19 Vaccine Hesitancy and Its Association with Conspiracy Beliefs: A Study in Jordan and Kuwait among Other Arab Countries. Vaccines, 9.

3. Ramadan (2021). Determinants of hesitancy towards COVID-19 vaccines in State of Kuwait: An exploratory internet-based survey. Risk Manag. Healthc. Policy.

4. Cascini (2022). Social media and attitudes towards a COVID-19 vaccination: A systematic review of the literature. eClinicalMedicine.

5. Greyling, T., and Rossouw, S. (2022). Positive attitudes towards COVID-19 vaccines: A cross-country analysis. PLoS ONE, 17.

Cited by 2 articles.

1. Quantifying Variations in Controversial Discussions within Kuwaiti Social Networks. Big Data and Cognitive Computing, 2024-06-04.

2. Bridging the Kuwaiti Dialect Gap in Natural Language Processing. IEEE Access, 2024.