Abstract
This paper presents the models we developed for the Social Media Mining for Health (SMM4H) 2023 shared task. Our team addressed Task 1: classifying tweets that self-report a COVID-19 diagnosis. Our approach trains a classification model with diverse textual augmentations, which expand the training data, and with R-drop regularization, which mitigates overfitting. Our best model combines R-drop with augmentations such as synonym substitution, reserved words, and back-translation. It exceeds the mean and median scores for the task, achieving an F1 score of 0.877 on the test set.
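The abstract names two ingredients: text augmentation and R-drop regularization. Below is a minimal pure-Python sketch of each. It is not the authors' implementation. The synonym table, the function names, and the weight `alpha` are illustrative assumptions.

For R-drop, the same input is passed through the network twice with dropout active, which yields two predictive distributions. The loss adds the negative log-likelihood of both passes to `alpha` times their symmetric KL divergence. In this sketch, the two logit vectors stand in for those two forward passes.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical synonym table for illustration only; a real system might
# draw candidates from a thesaurus or embedding neighbours instead.
SYNONYMS: Dict[str, List[str]] = {
    "covid": ["covid-19"],
    "test": ["swab"],
}


def synonym_substitution(text: str, table: Dict[str, List[str]],
                         p: float, rng: random.Random) -> str:
    """Replace each word found in `table` by a random synonym with probability p."""
    out = []
    for word in text.split():
        key = word.lower()
        if key in table and rng.random() < p:
            out.append(rng.choice(table[key]))
        else:
            out.append(word)
    return " ".join(out)


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) for discrete distributions; q must be strictly positive
    # wherever p is (softmax outputs satisfy this unless they underflow).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(logits1: Sequence[float], logits2: Sequence[float],
                label: int, alpha: float = 1.0) -> float:
    """R-drop style loss for one example, given logits from two dropout passes."""
    p1, p2 = softmax(logits1), softmax(logits2)
    # Negative log-likelihood of the gold label under both passes.
    nll = -math.log(p1[label]) - math.log(p2[label])
    # Symmetric KL term pushes the two dropout sub-models to agree.
    sym_kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    return nll + alpha * sym_kl


print(synonym_substitution("my covid test came back positive",
                           SYNONYMS, 1.0, random.Random(0)))
print(r_drop_loss([2.0, -1.0], [1.5, -0.5], label=0, alpha=4.0))
```

With `alpha = 0`, the loss reduces to plain cross-entropy summed over the two passes. Larger values penalize disagreement between the dropout-perturbed predictions more strongly. That disagreement penalty is the overfitting control the abstract attributes to R-drop.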
Publisher
Cold Spring Harbor Laboratory