Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)

Author:

Kumar S., Kumar M. Anand, Soman K.P.

Abstract

This paper addresses the problem of part-of-speech (POS) tagging for Malayalam tweets. The conversational style of social media posts makes it difficult to apply a general-purpose POS tagset to such text. For this work, a tagset of 17 coarse tags was designed, and 9915 tweets were tagged manually for experimentation and evaluation. Sequential deep learning models, namely the recurrent neural network (RNN), gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM), were trained on the tagged tweets at both the word level and the character level. The models were evaluated using precision, recall, f1-measure, and accuracy. At the word level, the GRU-based model gave the highest f1-measure, 0.9254; at the character level, the BLSTM-based model gave the highest f1-measure, 0.8739. To choose a suitable number of hidden states, the value was varied over 4, 16, 32, and 64, with a separate training run for each. Increasing the number of hidden states was observed to improve the tagger model. This is an initial work on POS tagging of Malayalam Twitter data using deep learning sequential models.
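The evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-tag scores were averaged, so the macro-average below is an assumption, and the function names and the tags in the usage example are hypothetical rather than taken from the paper's 17-tag tagset.

```python
from collections import Counter


def tagging_scores(gold, predicted):
    """Return (accuracy, per_tag) for two equal-length tag sequences.

    per_tag maps each tag to a (precision, recall, f1) tuple.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct tag
        else:
            fp[p] += 1      # predicted tag was wrongly assigned
            fn[g] += 1      # gold tag was missed

    per_tag = {}
    for tag in sorted(set(gold) | set(predicted)):
        p_den = tp[tag] + fp[tag]
        r_den = tp[tag] + fn[tag]
        precision = tp[tag] / p_den if p_den else 0.0
        recall = tp[tag] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        per_tag[tag] = (precision, recall, f1)

    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    return accuracy, per_tag


def macro_f1(per_tag):
    """Unweighted mean of per-tag f1 (assumed averaging scheme)."""
    if not per_tag:
        return 0.0
    return sum(f1 for _, _, f1 in per_tag.values()) / len(per_tag)


if __name__ == "__main__":
    # Hypothetical tags for illustration only.
    gold = ["NOUN", "VERB", "NOUN", "ADJ"]
    predicted = ["NOUN", "NOUN", "NOUN", "ADJ"]
    accuracy, per_tag = tagging_scores(gold, predicted)
    print("accuracy:", accuracy)
    for tag, (p, r, f) in per_tag.items():
        print(tag, round(p, 4), round(r, 4), round(f, 4))
    print("macro f1:", round(macro_f1(per_tag), 4))
```

In a setup like the one described, the same function would be applied to the flattened gold and predicted tag sequences of the test tweets, once per model and once per hidden-state size, so that the configurations can be compared on equal terms.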

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence, Information Systems, Software


