Abstractive Summarization of Text Document in Malayalam Language: Enhancing Attention Model Using POS Tagging Feature

Author:

K. Nambiar Sindhya1ORCID,Peter S. David2ORCID,Mary Idicula Sumam3ORCID

Affiliation:

1. Department of Computer Science, Cochin University of Science and Technology, Kerala, India

2. School of Engineering, Cochin University of Science and Technology, Kerala, India

3. Department of Computer Science and Engineering, Muthoot Institute of Technology and Science, Ernakulam, Kerala, India

Abstract

Over the past few years, researchers are showing huge interest in sentiment analysis and summarization of documents. The primary reason being that huge volumes of information are available in textual format, and this data has proven helpful for real-world applications and challenges. The sentiment analysis of a document will help the user comprehend the content’s emotional intent. Abstractive summarization algorithms generate a condensed version of the text, which can then be used to determine the emotion represented in the text using sentiment analysis. Recent research in abstractive summarization concentrates on neural network-based models, rather than conjunctions-based approaches, which might improve the overall efficiency. Neural network models like attention mechanism are tried out to handle complex works with promising results. The proposed work aims to present a novel framework that incorporates the part of speech tagging feature to the word embedding layer, which is then used as the input to the attention mechanism. With POS feature being part of the input layer, this framework is capable of dealing with words containing contextual and morphological information. The relevance of POS tagging here is due to its strong reliance on the language’s syntactic, contextual, and morphological information. The three main elements in the work are pre-processing, POS tagging feature in the embedding phase, and the incorporation of it into the attention mechanism. The word embedding provides the semantic concept about the word, while the POS tags give an idea about how significant the words are in the context of the content, which corresponds to the syntactic information. The proposed work was carried out in Malayalam, one of the prominent Indian languages. A widely used and accepted dataset from the English language was translated to Malayalam for conducting the experiments. The proposed framework gives a ROUGE score of 28, which outperformed the baseline models.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Reference28 articles.

1. A. P. Ajees and Sumam Mary Idicula. 2018. A POS tagger for Malayalam using conditional random fields. Int. J. Appl. Eng. Res. 13 3 (2018).

2. Leveraging Linguistic Structure For Open Domain Information Extraction

3. Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

4. Lidong Bing Piji Li Yi Liao Wai Lam Weiwei Guo and Rebecca J. Passonneau. 2015. Abstractive multi-document summarization via phrase selection and merging. arXiv preprint arXiv:1506.01597 (2015).

5. Deep communicating agents for abstractive summarization;Celikyilmaz Asli;arXiv preprint arXiv:1803.10357,2018

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3