Performance Analysis of Embedding Methods for Deep Learning-Based Turkish Sentiment Analysis Models

Published:2024-08-01
ISSN:2193-567X
Container-title:Arabian Journal for Science and Engineering
language:en
Short-container-title:Arab J Sci Eng

Author:

Ba Alawi Abdulfattah, Bozkurt Ferhat

Abstract

The complex syntactic structure of Turkish text makes sentiment analysis a challenging natural language processing (NLP) task. Conventional sentiment analysis methods often fail to identify attitudes in Turkish texts effectively, creating a need for more efficient approaches. To address this need, our study investigates the effectiveness of embedding techniques, including pre-trained Turkish models such as Word2Vec, GloVe, and FastText, in addition to two character-level embedding methods, namely character-integer embedding (CIE) and character one-hot encoding embedding (COE), in conjunction with deep learning (DL) models, specifically long short-term memory (LSTM), convolutional neural networks (CNNs), bidirectional LSTM (Bi-LSTM), and hybrid models, for sentiment analysis of Turkish short texts. The DL-based models were evaluated on two datasets: an original Twitter (X) dataset and a publicly accessible hotel reviews dataset. In addition to providing an intensive performance analysis of different embedding strategies and assessing their efficacy in dealing with the linguistic intricacies of Turkish, this study proposes a previously unexplored method of Turkish text representation that relies on character-level one-hot encoding. The findings indicate positive progress from a novel dual-pathway architecture that processes text at both the character level and the word level, which constitutes a substantial contribution to NLP, specifically for morphologically complex languages. By employing this hybrid strategy, which combines character and word levels, on the Twitter (X) data, the LSTM model obtained an F1 score of 0.835 ± 0.005 under cross-validation, while CNN-BiLSTM attained the highest F1 score (0.8392) under holdout validation. On the second, public dataset (hotel reviews), the strategy consistently produced modest improvements and ranked second among the embedding techniques, surpassed only by FastText. By presenting an extensive performance analysis of embedding techniques and deep learning models for sentiment analysis of Turkish texts, the findings offer practitioners practical recommendations on how to use sentiment analysis effectively to make informed decisions.
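The two character-level representations named in the abstract, CIE and COE, can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the alphabet (lowercase Turkish letters plus space), the reserved index 0 for padding and unknown characters, the fixed maximum length, and the function names are all hypothetical and are not taken from the paper.

```python
# Hypothetical alphabet: the 29 lowercase Turkish letters plus a space.
# The character set actually used in the paper is not stated in the abstract.
ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz "

# Index 0 is reserved (an assumption) for padding and unknown characters.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def turkish_lower(text):
    """Lowercase with Turkish casing rules: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def char_integer_encode(text, max_len):
    """Character-integer encoding: one integer ID per character,
    truncated or zero-padded to max_len."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in turkish_lower(text)[:max_len]]
    return ids + [0] * (max_len - len(ids))


def char_one_hot_encode(text, max_len):
    """Character one-hot encoding: one binary vector per character.
    Padding and unknown characters are left as all-zero rows."""
    size = len(ALPHABET) + 1
    rows = []
    for idx in char_integer_encode(text, max_len):
        row = [0] * size
        if idx:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = "Çok iyi"
    print(char_integer_encode(sample, 10))
    print(len(char_one_hot_encode(sample, 10)), "rows")
```

In a setup like the one the abstract describes, integer sequences of this kind would typically feed a trainable embedding layer, whereas the one-hot matrix can be passed to a CNN or LSTM directly; how the paper wires these into its dual-pathway model is not specified here.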

Funder

Ataturk University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s13369-024-09360-4.pdf
