Affiliation:
1. University of Waikato, New Zealand
2. University of Illinois
Abstract
This article discusses a new approach to scholarly search and discovery in large-scale text corpora. While lexicographic search is at present the predominant means of accessing large document corpora, it cannot directly address the inherent ambiguity of natural language. As a pragmatic solution, many scholars manually build their own lists of suitable search terms to be used in repeated searches in digital libraries and other online resources; however, scholars then have to resolve, on a case-by-case basis, issues caused by synonyms, homonyms and OCR errors. Our approach differs by supporting scholars in developing and refining a set of relevant concepts, searching a large document collection using semantic concepts, and categorizing the potentially relevant documents from search results into worksets. The developed technique revisits the notion of semantic search and redesigns both the underlying data representation and interface support. This is achieved through an end-to-end design that relies centrally on a Concept-in-Context network sourced through the link structure of Wikipedia. We discuss here the principles of our approach, its implementation in the Capisco prototype, and the relationship between established search techniques and our approach.
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.