Affiliation:
1. IRIT, CNRS – University of Toulouse, 118 route de Narbonne, 31062 Toulouse Cedex 9, France
Abstract
The paper presents a preliminary investigation of potential methods for extracting semantic views of text content in the form of structured sets of words, going beyond standard statistical indexing. The aim is to build fuzzily weighted, structured images of semantic content. A preliminary step consists of identifying the different types of relations (is-a, part-of, related-to, synonymy, domain, glossary relations) that exist between the words of a text, using a general ontology such as WordNet. Taking advantage of these relations, different types of fuzzy clusters of words can then be built. Moreover, apart from its frequency of occurrence, the importance of a word may also be evaluated through an estimate of its specificity. A degree of "centrality" is also computed for each word in a cluster. The size of the clusters, together with the frequency, specificity, and centrality of their words, makes it possible to build a fuzzy set of sets of words that progressively "emerge" from a text as being representative of its content. The ideas advocated in the paper and their potential usefulness are illustrated with a running example and two experiments. A better representation of the semantic content of texts is expected to help, in particular, to indicate to a potential reader what a text is about.
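The pipeline outlined in the abstract (relations between words, then clusters, then graded word importance) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hand-written `RELATIONS` table stands in for an ontology lookup such as WordNet, clusters are taken to be connected components of the relation graph, centrality is approximated by normalized relation strength, and frequency and centrality are combined with `min` as a fuzzy conjunction. Specificity is left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy relation table standing in for an ontology such as WordNet.
# Pairs are treated as symmetric; the weights are illustrative assumptions.
RELATIONS = {
    ("car", "vehicle"): 1.0,     # is-a
    ("wheel", "car"): 0.8,       # part-of
    ("car", "automobile"): 1.0,  # synonymy
    ("bank", "money"): 0.6,      # related-to
}


def build_graph(words, relations):
    """Keep only the relations whose two words both occur in the text."""
    vocab = set(words)
    graph = defaultdict(dict)
    for (a, b), weight in relations.items():
        if a in vocab and b in vocab:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def find_clusters(graph):
    """Return the connected components of the relation graph as sets of words."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        result.append(component)
    return result


def fuzzy_cluster(component, graph, freq):
    """Assign each word a membership degree in [0, 1].

    Assumption: membership = min(normalized frequency, normalized centrality),
    where centrality is the summed weight of a word's relations.
    """
    max_freq = max(freq[w] for w in component)
    strength = {w: sum(graph[w].values()) for w in component}
    max_strength = max(strength.values())
    return {
        w: min(freq[w] / max_freq, strength[w] / max_strength)
        for w in component
    }


if __name__ == "__main__":
    words = "car wheel car vehicle automobile bank money car".split()
    freq = Counter(words)
    graph = build_graph(words, RELATIONS)
    for component in find_clusters(graph):
        print(fuzzy_cluster(component, graph, freq))
```

In this toy run, the most frequent and most connected word of a cluster receives the highest membership degree, which loosely mirrors the idea of words "emerging" as representative of the text.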
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence, Information Systems, Control and Systems Engineering, Software
Cited by
2 articles.