Authors:
Athukoralage Dasun, Atapattu Thushari, Thilakaratne Menasha, Falkner Katrina
Abstract
This paper presents our approaches for the SMM4H'24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders. Our first approach fine-tunes a single RoBERTa-large model, while the second ensembles the results of three fine-tuned BERTweet-large models. We demonstrate that although both approaches perform identically on the validation data, the BERTweet-large ensemble excels on the test data. Our best-performing system achieves an F1-score of 0.938 on the test data, outperforming the benchmark classifier by 1.18%.
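The ensembling step described above can be illustrated concretely. Below is a minimal sketch, assuming the HuggingFace transformers library; the checkpoint paths are hypothetical placeholders rather than the authors' released models, and since the abstract does not specify the combination rule, majority voting over the three models' predicted labels is an assumption.

```python
# Minimal sketch: majority-vote ensemble of three fine-tuned BERTweet-large
# checkpoints for binary tweet classification. The checkpoint paths below are
# hypothetical placeholders, not the paper's released models, and majority
# voting is an assumed combination rule.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical local paths to three independently fine-tuned runs
CHECKPOINTS = [
    "bertweet-large-run1",
    "bertweet-large-run2",
    "bertweet-large-run3",
]

# All three checkpoints share the base BERTweet-large tokenizer
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-large")


def predict(model, texts):
    """Return 0/1 label predictions for a batch of tweets from one model."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return logits.argmax(dim=-1)


def ensemble_predict(texts):
    """Majority vote over the three fine-tuned checkpoints."""
    votes = []
    for path in CHECKPOINTS:
        model = AutoModelForSequenceClassification.from_pretrained(path)
        model.eval()
        votes.append(predict(model, texts))
    stacked = torch.stack(votes)              # shape: (3, batch_size)
    return (stacked.sum(dim=0) >= 2).long()   # label 1 wins with >= 2 votes


if __name__ == "__main__":
    tweets = ["My son was just diagnosed with ADHD."]
    print(ensemble_predict(tweets))
```

Loading each checkpoint in turn keeps peak memory at one model; if memory permits, the three models could instead be held resident and the per-model logits averaged rather than hard-voted.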
Publisher
Cold Spring Harbor Laboratory