Affiliation:
1. Technological Educational Institute of Athens, Greece
Abstract
This work is part of a project that aims to define a methodology for building simple but robust stemmers from only primitive knowledge of the stemmer’s target language. The methodology starts with a very simple primary stemmer that removes the longest suffix matching the ending of the examined word, where the primitive knowledge is the list of available suffixes. Information retrieval (IR) experts then express their arguments against the results of the primary stemmer. These arguments are valuable knowledge that allows us to apply supervised learning to automatically produce better stemmers that conform to the experts’ arguments. We also evaluate our supervised learning-based methodology, which builds stemmers for languages that the experts do not have knowledge of.
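The primary stemmer described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `primary_stem`, the English suffix list, and the rule that a suffix is never stripped when it would consume the whole word are all assumptions made for illustration.

```python
def primary_stem(word, suffixes):
    """Strip the longest suffix in `suffixes` that matches the end of `word`."""
    # Try longer suffixes first so that the first match is the longest one.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Assumption: never strip the entire word (the abstract does not specify this).
        if suffix and len(suffix) < len(word) and word.endswith(suffix):
            return word[:-len(suffix)]
    # No suffix matched: return the word unchanged.
    return word


# Hypothetical suffix list standing in for the "primitive knowledge" of a language.
SUFFIXES = ["s", "es", "ing", "ed", "ness"]

print(primary_stem("kindness", SUFFIXES))
print(primary_stem("boxes", SUFFIXES))
```

A stemmer this simple over-strips and under-strips in many cases, which is the kind of output the IR experts' arguments are meant to correct.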
Subject
Library and Information Sciences, Information Systems
Cited by
4 articles.