Affiliation:
1. Sfax University, MIRACL Laboratory, Sfax, Tunisia
Abstract
Document indexing plays a significant role in text mining applications such as text document classification. The common indexing paradigm, known as the Bag of Words (BOW) representation, is based on term frequencies in documents. However, this classical approach suffers from the ambiguity and disparity of words. In addition, traditional term weighting schemes, such as TF-IDF, exploit only the statistical information of terms in documents. To overcome these problems, we investigate biomedical semantic document indexing using concepts extracted from the MeSH knowledge resource. First, we focus on a disambiguation method that identifies the adequate senses of ambiguous MeSH concepts, and we consider four representation enrichment strategies for identifying the most appropriate representatives of the adequate sense in the representation of textual entities. Second, we propose a semantic weighting scheme that quantifies the importance of MeSH concepts in documents through their occurrence frequency and their semantic similarity to unambiguous MeSH concepts. Our contribution lies particularly in an in-depth experimental study of the performance of these methods and, more precisely, of the impact of the semantic weighting scheme on that performance. To this end, three benchmark datasets were used: TREC 2004 Genomics, BioCreative II and OHSUMED.
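The idea of weighting a concept by both its occurrence frequency and its semantic similarity to unambiguous concepts can be illustrated with a minimal sketch. The abstract does not give the actual formula, so the combination used here (frequency multiplied by one plus the mean similarity) is an assumption made purely for illustration, as are the function names `semantic_weights` and `toy_similarity`, the placeholder concept identifiers, and the toy similarity values. A real system would substitute a similarity measure computed over the MeSH hierarchy.

```python
from collections import Counter

# Hypothetical pairwise similarities between placeholder concept identifiers.
# These values are invented for illustration; they are not derived from MeSH.
TOY_SIM = {
    ("concept_a", "concept_b"): 0.5,
    ("concept_a", "concept_c"): 0.25,
    ("concept_b", "concept_c"): 0.75,
}


def toy_similarity(a, b):
    """Symmetric lookup in the toy table; unknown pairs get similarity 0."""
    return TOY_SIM.get(tuple(sorted((a, b))), 0.0)


def semantic_weights(doc_concepts, unambiguous, similarity):
    """Assumed scheme: weight = frequency * (1 + mean similarity to the
    unambiguous concepts of the document, excluding the concept itself)."""
    freq = Counter(doc_concepts)
    weights = {}
    for concept, count in freq.items():
        others = [u for u in unambiguous if u != concept]
        if others:
            mean_sim = sum(similarity(concept, u) for u in others) / len(others)
        else:
            mean_sim = 0.0
        weights[concept] = count * (1.0 + mean_sim)
    return weights


# concept_a occurs twice and is treated as the disambiguated concept;
# concept_b and concept_c are treated as unambiguous.
doc = ["concept_a", "concept_b", "concept_a", "concept_c"]
weights = semantic_weights(doc, ["concept_b", "concept_c"], toy_similarity)
print(weights)
```

With this assumed combination, a concept that is both frequent and semantically close to the unambiguous concepts of the document receives a higher weight than a purely frequency-based scheme would assign.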
Publisher
World Scientific Pub Co Pte Ltd
Cited by
1 article.