Multi-label emotion classification of Urdu tweets

Published: 2022-04-22 Volume: 8 Page: e896
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en

Author:

Ashraf Noman¹, Khan Lal², Butt Sabur¹, Chang Hsien-Tsung²³⁴, Sidorov Grigori¹, Gelbukh Alexander¹

Affiliation:

1. CIC, Instituto Politécnico Nacional, Mexico City, Mexico

2. Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan

3. Artificial Intelligence Research Center, Chang Gung University, Taoyuan, Taiwan

4. Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan, Taiwan

Abstract

Urdu is a widely used language in South Asia and worldwide. While similar datasets exist for English, we created the first multi-label emotion dataset for Urdu, consisting of 6,043 tweets in the Nastalíq script annotated with six basic emotions. A multi-label (ML) classification approach was adopted to detect emotions in Urdu text. The morphological and syntactic structure of Urdu makes multi-label emotion detection a challenging problem. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random forest (RF), Decision tree (J48), Sequential minimal optimization (SMO), AdaBoostM1, and Bagging), deep-learning algorithms (Convolutional Neural Networks (1D-CNN), Long short-term memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric-based features, pre-trained word embeddings, word-based n-grams, and character-based n-grams. The paper highlights the annotation guidelines, dataset characteristics, and insights into the different methodologies used for Urdu-based emotion classification. We present our best results using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM) for all tested methods.
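For readers unfamiliar with the multi-label metrics named in the abstract, the following is a minimal sketch of how micro-averaged F1, macro-averaged F1, Hamming loss, and exact match are commonly computed from binary label matrices. It is not the authors' evaluation code. The function names, the three-label toy data, and the convention of scoring a label with no positives as F1 = 0 are assumptions made for illustration. Accuracy is omitted because its multi-label definition varies and the abstract does not specify which one is used.

```python
from typing import List

Matrix = List[List[int]]  # rows = tweets, columns = emotion labels (0 or 1)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for t_row, p_row in zip(y_true, y_pred)
                for t, p in zip(t_row, p_row))
    return wrong / total


def exact_match(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of tweets whose full label set is predicted exactly."""
    hits = sum(t_row == p_row for t_row, p_row in zip(y_true, y_pred))
    return hits / len(y_true)


def _counts(y_true: Matrix, y_pred: Matrix, j: int):
    """True positives, false positives, false negatives for label column j."""
    tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    # Assumed convention: F1 is 0.0 when a label has no positives at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1 scores."""
    n_labels = len(y_true[0])
    return sum(_f1(*_counts(y_true, y_pred, j)) for j in range(n_labels)) / n_labels


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed from counts pooled over all labels."""
    n_labels = len(y_true[0])
    pooled = [_counts(y_true, y_pred, j) for j in range(n_labels)]
    tp, fp, fn = (sum(c[i] for c in pooled) for i in range(3))
    return _f1(tp, fp, fn)


# Toy data (hypothetical): 4 tweets, 3 labels instead of the paper's six.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Exact match: ", exact_match(y_true, y_pred))
print("Micro F1:    ", micro_f1(y_true, y_pred))
print("Macro F1:    ", macro_f1(y_true, y_pred))
```

Micro-averaging pools counts over all labels, so frequent emotions dominate the score, whereas macro-averaging weights every emotion equally. Hamming loss gives credit for partially correct label sets, while exact match does not.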

Funder

CONACYT

Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-896.pdf
