Affiliation:
1. University of Manouba, Tunisia
Abstract
Topic Segmentation is one of the pillars of Natural Language Processing. Yet there is a remarkable research gap in this field, as far as the Arabic language is concerned. The purpose of this article is to improve Arabic Topic Segmentation (ATS) by inquiring into two segmenters: ArabC99 and ArabTextTiling. This study is carried out on two independent levels: the pre-processing level and the segmentation level. These levels represent the basic steps of topic segmentation. On the pre-processing level, we examine the effect of using different Arabic stemming algorithms on ATS. We find out that Light10 is more appropriate for the pre-processing step. Based on this conclusion, we proceed to the second level by proposing two Arabic segmenters called ArabC99-LS-LSA and ArabTextTiling-LS-LSA. These latter use external semantic knowledge related to the Latent Semantic Analysis (LSA). Based on the evaluation results, we notice that LSA provides improvements in this field. Hence, the main outcome of this article emphasizes the multilevel improvement of ATS based on Light10 and LSA.
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献