Topic Sentiment Analysis for Twitter Data in Indian Languages Using Composite Kernel SVM and Deep Learning

Published: 2022-08-25
Volume: 21, Issue: 5, Pages: 1-35
ISSN: 2375-4699
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: English
Journal (abbreviated): ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Authors:

Shuverthi Maity¹, Kamal Sarkar¹

Affiliation:

1. Jadavpur University, Kolkata, West Bengal, India

Abstract

Sentiment analysis of public opinion on social networks such as Twitter or Facebook can provide valuable information with a wide range of applications. However, the efficiency and accuracy of automated methods for Twitter sentiment analysis are hindered by the special characteristics of Twitter data, which is generally noisy and high-dimensional and has complex syntactic and semantic structures. Sentiment analysis of Twitter data in Indian languages is even more challenging because the data is multilingual and code-mixed. In this article, we propose several composite kernel functions, each of which is used with Support Vector Machines (SVM) to build a model for topic sentiment analysis of Twitter data in Indian languages. Each composite kernel function is constructed as a weighted sum of multiple single kernel functions that we define. In addition to the proposed composite kernel SVM method, we use several state-of-the-art deep learning classifiers for topic sentiment classification. Since no suitable Twitter dataset in Indian languages was available for our experiments, we developed our own datasets by collecting tweets related to five different trending Twitter topics in India. To demonstrate the robustness and generalization capability of the proposed models, we also evaluate them on the US airline Twitter dataset, a publicly available English benchmark. The empirical study shows that the proposed composite kernel SVM method is effective for the sentiment classification task. On the Indian language datasets, the proposed composite kernel SVM method achieves the highest average accuracy of 74% and the highest average F-score of 0.73, whereas the deep learning-based method achieves an average accuracy of 71.31% and an average F-score of 0.70. On the US airline Twitter dataset, the proposed composite kernel SVM method achieves an average accuracy of 83% and an average F-score of 0.82, both higher than those of the deep learning-based method.
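The abstract builds each composite kernel as a weighted sum of single kernels. The sketch below illustrates only that general construction, under stated assumptions: the paper's own single kernels are defined by the authors over their tweet representations and are not reproduced here, so the linear, polynomial, and RBF kernels, the weights 0.5/0.3/0.2, and the three toy feature vectors are all hypothetical stand-ins. A non-negative weighted sum of valid (positive semi-definite) kernels is itself a valid kernel, which is why the sketch rejects negative weights.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Stand-in base kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, degree: int = 2, c: float = 1.0) -> float:
    """Stand-in base kernel: (x . y + c) ** degree."""
    return (linear_kernel(x, y) + c) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Stand-in base kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def composite_kernel(weighted: List[Tuple[float, Kernel]]) -> Kernel:
    """Hypothetical helper: K(x, y) = sum_i w_i * K_i(x, y) with w_i >= 0."""
    if any(w < 0 for w, _ in weighted):
        raise ValueError("weights must be non-negative to keep the kernel valid")

    def k(x: Vector, y: Vector) -> float:
        return sum(w * base(x, y) for w, base in weighted)

    return k


def gram_matrix(kernel: Kernel, rows_a: List[Vector], rows_b: List[Vector]) -> List[List[float]]:
    """Kernel values between every row of rows_a and every row of rows_b."""
    return [[kernel(a, b) for b in rows_b] for a in rows_a]


# Usage with toy (made-up) tweet feature vectors.
k = composite_kernel([(0.5, linear_kernel), (0.3, polynomial_kernel), (0.2, rbf_kernel)])
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
K = gram_matrix(k, X, X)
```

A matrix built this way can be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, which is fitted on the train-by-train Gram matrix and predicts from the test-by-train Gram matrix. How the weights are chosen (fixed, tuned on validation data, or learned) is a separate design decision that this sketch does not address.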

Funder

Department of Science and Technology, Government of India, under the SERB scheme

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3519297

References: 59 articles

1. Immunocomputing-Based Approach for Optimizing the Topologies of LSTM Networks

2. A multimodal feature learning approach for sentiment analysis of social network multimedia

3. Code Mixing: A Challenge for Language Identification in the Language of Social Media

4. Combined Syntactic and Semantic Kernels for Text Classification

Cited by 1 article.

1. Exploring Multilingual Indian Twitter Sentiment Analysis: A Comparative Study. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023-07-06.