Affiliation:
1. University of Paris-Sorbonne, France
Abstract
A new model is proposed for information retrieval that automatically builds a semantic metatext structure for texts, allowing discourse and semantic information to be searched and extracted according to given linguistic categorizations. This paper presents approaches for searching and mining full text with semantic categories. The model is built from two engines. The first, called EXCOM (Djioua et al., 2006; Alrahabi, 2010), is an automatic text annotation system driven by discourse and semantic maps, which are specifications of general linguistic ontologies founded on the Applicative and Cognitive Grammar. The annotation layer uses a linguistic method called Contextual Exploration, which handles the polysemic values of a term in texts. Several ‘semantic maps’, each underlying a ‘point of view’ for text mining, guide this automatic annotation process. The second engine takes the semantically annotated texts produced by the first and creates a semantic inverted index, which can retrieve relevant documents for queries associated with discourse and semantic categories such as definition, quotation, causality, relations between concepts, etc. (Djioua & Desclés, 2007). This semantic indexation process builds a metatext layer over the textual contents. Some data and linguistic rule sets, as well as the general architecture, which extends third-party software, are provided as supplementary information.
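The two-engine pipeline described in the abstract can be illustrated with a minimal sketch. It is not EXCOM's implementation: the rule table `RULES`, the functions `annotate`, `build_index` and `search`, and the sample `DOCS` are hypothetical names and data introduced here for illustration only. The annotator is a toy stand-in for Contextual Exploration (an indicator pattern, optionally confirmed by a contextual clue in the same sentence), and the index simply maps a (category, term) pair to the sentences where both occur.

```python
import re
from collections import defaultdict

# Hypothetical toy rules, NOT the EXCOM rule sets: each category has an
# indicator pattern and an optional contextual clue that must also match.
RULES = {
    "definition": (re.compile(r"\bis defined as\b|\bis called\b"), None),
    "quotation": (re.compile(r"\b(said|declared|stated)\b"), re.compile(r'"[^"]+"')),
    "causality": (re.compile(r"\b(causes|leads to|because of)\b"), None),
}

# Hypothetical sample documents.
DOCS = {
    "d1": 'A metatext is defined as a layer of annotations. '
          'The minister said "taxes will fall".',
    "d2": "Smoking causes cancer. The minister visited Paris.",
}


def annotate(sentence):
    """Return the categories whose indicator (and clue, if any) match."""
    found = set()
    for category, (indicator, clue) in RULES.items():
        if indicator.search(sentence) and (clue is None or clue.search(sentence)):
            found.add(category)
    return found


def build_index(docs):
    """Build a semantic inverted index: (category, term) -> {(doc_id, sentence_no)}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive sentence splitting on terminal punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        for n, sentence in enumerate(sentences):
            terms = set(re.findall(r"\w+", sentence.lower()))
            for category in annotate(sentence):
                for term in terms:
                    index[(category, term)].add((doc_id, n))
    return index


def search(index, category, term):
    """Return sorted (doc_id, sentence_no) hits for a term within a category."""
    return sorted(index.get((category, term.lower()), set()))


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "quotation", "minister"))
    print(search(idx, "causality", "minister"))
```

With this sample data, a query for "minister" restricted to the quotation category matches only the sentence in which the minister is quoted, while the same term under causality matches nothing, even though "minister" also appears in the second document. That difference from a plain keyword index is the point of indexing on semantic annotations.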
References (19 articles)
1. Alrahabi, M. (2010). EXCOM2: Plate-forme d'annotation automatique de catégories sémantiques: Conception, modélisation et réalisation informatique. Applications à la catégorisation des citations en arabe et en français (Unpublished doctoral dissertation). University of Paris-Sorbonne, Paris, France.
2. Alrahabi, M., & Desclés, J.-P. (2008, August 25-27). Automatic annotation of direct reported speech in Arabic and French, according to semantic map of enunciative modalities. In Proceedings of the 6th International Conference on Natural Language Processing, Gothenburg, Sweden (pp. 41-51).
3. Text mining and its potential applications in systems biology
4. A survey of Web clustering engines