“Introducing Capisco: a semantically-enhanced search and discovery system for large-scale text corpora”

Published: 2015-11-03 Issue: Autumn 2015 Page: 1-14
ISSN:1931-1745
Container-title:ACM SIGWEB Newsletter
language:en
Short-container-title:SIGWEB Newsl.

Author:

Hinze Annika¹,Taube-Schock Craig¹,Bainbridge David¹,Cunningham Sally Jo¹,Downie J. Stephen²

Affiliation:

1. University of Waikato, New Zealand

2. University of Illinois

Abstract

This article discusses a new approach to scholarly search and discovery in large-scale text corpora. While lexicographic search is at present the predominant means to access large document corpora, it cannot directly address the inherent ambiguity of natural language. As a pragmatic solution, many scholars manually build their own list of suitable search terms to be used in repeated searches in digital libraries and other online resources; however, scholars then have to resolve, on a case-by-case basis, issues caused by synonyms, homonyms and OCR errors. Our approach differs from this by supporting scholars in developing and refining a set of relevant concepts, searching a large document collection using semantic concepts, and categorizing the potentially relevant documents from search results into worksets. The developed technique revisits the notion of semantic search and redesigns both the underlying data representation and interface support. This is achieved through an end-to-end design that relies centrally on a Concept-in-Context network sourced through the link structure of Wikipedia. We discuss here the principles of our approach, its implementation in the Capisco prototype, and the relationship between established search techniques and our approach.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2833219.2833223

References: 18 articles.

1. Basile V., Bos J., Evang K., and Venhuizen N. 2012. Developing a large semantically annotated corpus. In LREC. Vol. 12. 3196-3200.

2. Erling O. and Mikhailov I. 2009. RDF support in the Virtuoso DBMS. In Networked Knowledge-Networked Media. Springer, 7-24.

3. A Case Study in Pragmatism

Cited by 2 articles.

1. Capisco: low-cost concept-based access to digital libraries;International Journal on Digital Libraries;2018-03-14

2. Writers of the Lost Paper: A Case Study on Barriers to (Re-) Finding Publications;Digital Libraries: Data, Information, and Knowledge for Digital Lives;2017