A Hybrid Arabic Text Summarization Approach Based on Seq-to-Seq and Transformer

Authors:

Elsaid Asmaa (1), Mohamed Ammar (1), Fattouh Lamiaa (1), Sakre Mohamed (2)

Affiliation:

1. Cairo University, Faculty of Graduate Studies of Statistical Researches

2. Higher Institute for Computers & Information Technology, ElShorouk

Abstract

Text summarization is essential in natural language processing because the volume of data is growing quickly, and users need to condense that data into meaningful text in a short time. The two common methods of text summarization are extractive and abstractive. Many efforts have addressed summarizing texts in Latin-script languages; however, summarizing Arabic text is challenging for several reasons, including the language's complexity, structure, and morphology. There is also a lack of benchmark data sources, gold-standard Arabic summaries, and Arabic-specific evaluation metrics. The contributions of this paper are therefore fourfold. First, the paper proposes a hybrid approach consisting of a Modified Sequence-To-Sequence (MSTS) model and a transformer-based model. The seq-to-seq model is modified by adding multi-layer encoders and a one-layer decoder to its structure. The output of the MSTS model is an extractive summary, which a transformer-based model then processes to generate the abstractive summary. Second, the paper introduces a new Arabic benchmark dataset, called HASD, which includes 43k articles with their extractive and abstractive summaries. Third, it modifies the well-known extractive EASC benchmark by adding an abstractive summary for each text. Finally, it proposes a new measure, Arabic-rouge, for evaluating abstractive summaries based on word structure and the similarity between words. The proposed method is tested on the HASD and Modified EASC benchmarks and evaluated using ROUGE, BLEU, and Arabic-rouge. The experimental results are satisfactory compared with state-of-the-art methods.
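The abstract evaluates the summaries with ROUGE and BLEU and proposes an Arabic-specific variant whose exact definition is not given here. As a reference point, the following is a minimal sketch of standard ROUGE-N with clipped n-gram overlap and plain whitespace tokenization. It is not the paper's Arabic-rouge measure, and the function names `ngrams` and `rouge_n` are hypothetical, chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Sketch of ROUGE-N recall, precision and F1 between two texts.

    Assumption: texts are tokenized by whitespace only, with no Arabic
    normalization, stemming or clitic splitting.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count, so overlap is clipped.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Usage: a system summary compared against a one-sentence reference.
reference = "ذهب الولد إلى المدرسة"
candidate = "ذهب الولد إلى البيت"
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

Because matching here is exact at the token level, a definite form such as "المدرسة" and its bare form "مدرسة" count as different words. That limitation of surface matching for a morphologically rich language is the kind of gap an Arabic-aware measure based on word structure and similarity would aim to address.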

Publisher

Research Square Platform LLC

