Author:
BIRU TESFAYE, EL‐HAMDOUCHI ABDELMOULA, REES RODNEY S., WILLETT PETER
Abstract
The term discrimination value of an index term has been proposed as a quantitative measure of the extent to which that term can discriminate between documents in bibliographic databases. Previous work has suggested that the most discriminating terms are those with medium frequencies of occurrence. This paper discusses the effect of including relevance data on the calculation of term discrimination values. Two algorithms are described that calculate the ability of index terms to discriminate between relevant documents, between non‐relevant documents or between relevant and non‐relevant documents. The application of these algorithms to several standard document test collections demonstrates that the exact form of the relationship between term frequency and term discrimination depends upon the particular type of discrimination which is being measured; in particular, medium frequency terms are not necessarily the best discriminators when relevance data is available. These results are compared with the discriminatory ability of terms as measured by their relevance weights, where the most discriminating terms are those with low frequencies of occurrence.
Subject
Library and Information Sciences, Information Systems
Cited by
8 articles.