Exogenous approach to improve topic segmentation

Authors

Naili Marwa, Habacha Chaibi Anja, Hajjami Ben Ghezala Henda

Abstract

Purpose – Topic segmentation is an active research field in natural language processing (NLP), and many topic segmenters have been proposed. The current challenge for researchers is to improve these segmenters by using external resources. The purpose of this paper is therefore to integrate, study and evaluate a new external semantic resource in topic segmentation.

Design/methodology/approach – Two new topic segmenters, TSS-Onto and TSB-Onto, are proposed based on the two well-known segmenters C99 and TextTiling. The proposed segmenters integrate semantic knowledge into the segmentation process by using a domain ontology as an external resource. An evaluation then studies the effect of this resource on the quality of topic segmentation, along with a comparative study with related work.

Findings – The study shows that adding semantic knowledge extracted from a domain ontology improves the quality of topic segmentation. Moreover, TSS-Onto outperforms TSB-Onto in terms of segmentation quality.

Research limitations/implications – The main limitation of this study is that the test corpus used for the evaluation is not a benchmark. However, it is a collection of scientific papers from well-known digital libraries (ArXiv and ACM).

Practical implications – The proposed topic segmenters can be useful in different NLP applications such as information retrieval and text summarization.

Originality/value – The primary original contribution of this paper is the improvement of topic segmentation based on semantic knowledge extracted from an ontological external resource.
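The general idea of using an ontology as an external resource for segmentation can be illustrated with a minimal sketch. It is not the authors' TSS-Onto or TSB-Onto algorithm, and it is neither C99 nor TextTiling proper: it simply compares adjacent sentences and places a boundary where they share few ontology concepts. The toy ontology, the Jaccard measure, the threshold value and the function names (`concepts`, `concept_similarity`, `segment`) are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy "ontology": maps each term to the concepts it belongs to.
# Illustrative assumption only, not the domain ontology used in the paper.
TOY_ONTOLOGY: Dict[str, Set[str]] = {
    "segmentation": {"nlp"},
    "topic": {"nlp"},
    "corpus": {"nlp"},
    "protein": {"biology"},
    "cell": {"biology"},
    "enzyme": {"biology"},
}


def concepts(sentence: str) -> Set[str]:
    """Collect the ontology concepts evoked by the words of a sentence."""
    found: Set[str] = set()
    for word in sentence.lower().split():
        found |= TOY_ONTOLOGY.get(word.strip(".,;:"), set())
    return found


def concept_similarity(left: Set[str], right: Set[str]) -> float:
    """Jaccard overlap between two concept sets (0.0 when both are empty)."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def segment(sentences: List[str], threshold: float = 0.5) -> List[int]:
    """Return each index i where a topic boundary falls before sentence i."""
    boundaries = []
    for i in range(1, len(sentences)):
        sim = concept_similarity(concepts(sentences[i - 1]), concepts(sentences[i]))
        if sim < threshold:  # assumed cut-off; a real system would tune this
            boundaries.append(i)
    return boundaries


sentences = [
    "Topic segmentation splits a corpus.",
    "Each topic is a segment of the corpus.",
    "A protein is built inside the cell.",
    "An enzyme is a protein.",
]
print(segment(sentences))  # → [2]
```

In this toy example the first two sentences share the `nlp` concept and the last two share `biology`, so the only boundary falls before the third sentence. A real segmenter would compare larger blocks of text and use a richer ontology-based similarity.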

Publisher

Emerald

Subject

General Computer Science

Cited by 2 articles.

1. Idea plagiarism detection with recurrent neural networks and vector space model;International Journal of Intelligent Computing and Cybernetics;2021-03-26

2. The Contribution of Stemming and Semantics in Arabic Topic Segmentation;ACM Transactions on Asian and Low-Resource Language Information Processing;2018-02-05
