A SYNTACTIC-BASED SENTENCE VALIDATION TECHNIQUE FOR MALAY TEXT SUMMARIZER

Author:

Alias Suraya1,Sainin Mohd Shamrie1,Mohammad Siti Khaotijah2

Affiliation:

1. Faculty of Computing and Informatics, Universiti Malaysia Sabah, Malaysia

2. School of Computer Sciences, Universiti Sains Malaysia, Malaysia

Abstract

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary ones without sacrificing the sentence's grammar. The existing development of Malay Natural Language Processing (NLP) tools is still under study with limited open access. The issue is the lack of a benchmark dataset in the Malay language to evaluate the quality of the summaries and to validate the compressed sentence produced by the summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences by referring to the Malay Grammar Pattern. In this work, we propose a new derivation set of Syntactic Rules based on the Malay main Word Class to validate a Malay sentence that undergoes the SC procedure. We experimented using the Malay dataset of 100 new articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced a result with an average F-measure of 0.5826 and an average Recall value of 0.5925 with an optimum compression rate of 0.5 Confidence Conf value. Furthermore, a manual summary evaluation by a group of Malay experts on the grammaticality of the compressed summary sentence produced a good result of 4.11 and a readability score of 4.12 out of 5. This depicts the reliability of the proposed technique to validate the Malay sentence with promising summary content and readability results.

Publisher

UUM Press, Universiti Utara Malaysia

Subject

General Mathematics,General Computer Science

Reference76 articles.

1. Ab Aziz, M. J., Ahmad, F., Ghani, A. A. A., & Mahmod, R. (2006).

2. Pola grammar technique for grammatical relation extraction

3. in Malay language. Malaysian Journal of Computer Science,

4. 19(1), 59.

5. Alfred, R., Mujat, A., & Obit, J. H. (2013). A ruled-based part of speech

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3