Abstract
In recent years, social media has become a key source of information for gauging public sentiment on a wide range of topics, and the need for automated extraction of opinions from these platforms has grown accordingly. Stance detection, a subtask of natural language processing, addresses this need by automatically determining users' positions toward specific subjects, events, or individuals. In this study, we developed a labeled Turkish dataset for determining users' stances on the Russia-Ukraine War from social media content. The dataset comprises 8215 tweets, which were cleaned and annotated for two targets: Russia and Ukraine. We evaluated several machine learning methods, including Support Vector Machines, Random Forest, k-Nearest Neighbors, XGBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), with word embeddings from GloVe and FastText. We also evaluated a transformer-based (BERT-based) approach to stance detection. Because the dataset is imbalanced between the two targets, we applied undersampling and oversampling techniques alongside these algorithms. Our experimental results indicate that the BERT-based models outperformed all other methods, with LSTM and GRU also producing similarly strong results. The new Turkish corpus is a resource for further stance detection research, including future work with transformer-based approaches. Overall, this study contributes to stance detection research on Turkish text.
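The abstract does not specify how the undersampling and oversampling were carried out, so the following is only a minimal sketch of the general idea, assuming simple random resampling. The function name `resample`, the `mode` parameter, and the toy texts and labels are hypothetical and are not taken from the study or its dataset.

```python
import random
from collections import Counter, defaultdict


def resample(samples, labels, mode="over", seed=0):
    """Randomly balance classes (illustrative sketch, not the study's method).

    mode="over":  duplicate randomly chosen items of smaller classes until
                  every class matches the largest class.
    mode="under": randomly drop items of larger classes until every class
                  matches the smallest class.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")

    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    sizes = [len(items) for items in by_label.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_samples, out_labels = [], []
    for label, items in by_label.items():
        if len(items) >= target:
            # Keep `target` distinct items (drops the surplus when undersampling).
            chosen = rng.sample(items, target)
        else:
            # Keep every item and add random duplicates to reach `target`.
            chosen = items + rng.choices(items, k=target - len(items))
        out_samples.extend(chosen)
        out_labels.extend([label] * target)
    return out_samples, out_labels


if __name__ == "__main__":
    # Toy, made-up data: 7 items of one class and 3 of another.
    texts = [f"text_{i}" for i in range(10)]
    targets = ["class_a"] * 7 + ["class_b"] * 3

    for mode in ("over", "under"):
        _, balanced = resample(texts, targets, mode=mode)
        print(mode, dict(Counter(balanced)))
```

In a sketch like this, resampling would normally be applied to the training split only, so that duplicated or discarded items do not distort the evaluation data.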
Publisher
Afyon Kocatepe Üniversitesi Fen ve Mühendislik Bilimleri Dergisi