Pipelining Semantic Expansion and Noise Filtering for Sentiment Analysis of Short Documents – CluSent Method
Published: 2024-06-11
Issue: 1
Volume: 15
Page: 561-575
ISSN: 2763-7719
Container-title: Journal on Interactive Systems
Short-container-title: JIS
Author:
Viegas Felipe, Canuto Sergio, Cunha Washington, França Celso, Valiense Claudio, Fonseca Guilherme, Machado Ana, Rocha Leonardo, Gonçalves Marcos André
Abstract
Building effective sentiment models is especially hard for short texts, which carry little information. Enriching short texts with semantic relationships is therefore crucial for capturing affective nuances and improving model efficacy, although it can also introduce noise. This article introduces a novel approach, CluSent, designed for customized dataset-oriented sentiment analysis. CluSent builds on the CluWords concept, a powerful representation of semantically related words. It tackles information scarcity and noise by (i) leveraging the semantic neighborhood of pre-trained word embedding representations to enrich the document representation and (ii) introducing dataset-specific filtering and weighting mechanisms to manage noise. These mechanisms use part-of-speech and polarity/intensity information from lexicons. In an extensive experimental evaluation spanning 19 datasets and five state-of-the-art baselines, including modern transformer architectures, CluSent was the best method in the majority of scenarios (28 out of 38), with performance gains of up to 14% over the strongest baselines.
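The two ideas named in the abstract, neighborhood-based expansion followed by lexicon-based filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the toy lexicon, the function name `expand`, and both thresholds are hypothetical values chosen for illustration, and the part-of-speech filter is omitted.

```python
import math

# Toy 2-d "pre-trained" embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "movie": (0.1, 0.9),
    "film": (0.15, 0.88),
    "bad": (-0.9, 0.1),
}

# Toy polarity lexicon (hypothetical); the absolute value acts as intensity.
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.7}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(tokens, threshold=0.95, min_intensity=0.5):
    """Return {term: weight} for a short document enriched with neighbors.

    Original tokens keep weight 1.0. A neighbor is added only if it is
    similar enough (expansion) and carries enough lexicon intensity
    (noise filtering); its weight is its cosine similarity.
    """
    weights = {}
    for tok in tokens:
        weights[tok] = 1.0
        if tok not in EMBEDDINGS:
            continue  # out-of-vocabulary tokens cannot be expanded
        for cand, vec in EMBEDDINGS.items():
            if cand == tok:
                continue
            sim = cosine(EMBEDDINGS[tok], vec)
            if sim < threshold:
                continue  # outside the semantic neighborhood
            if abs(LEXICON.get(cand, 0.0)) < min_intensity:
                continue  # filtered out: no strong sentiment signal
            weights[cand] = max(weights.get(cand, 0.0), sim)
    return weights


if __name__ == "__main__":
    print(expand(["good", "movie"]))
```

With these toy values, "great" is added as a neighbor of "good", while "film" is close to "movie" but is filtered out because it has no lexicon entry. That is the trade-off the abstract describes: expansion adds information, and filtering keeps only the part that is likely to help sentiment classification.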
Publisher
Sociedade Brasileira de Computacao - SB