“Introducing Capisco: a semantically-enhanced search and discovery system for large-scale text corpora”

Authors:

Hinze, Annika (1); Taube-Schock, Craig (1); Bainbridge, David (1); Cunningham, Sally Jo (1); Downie, J. Stephen (2)

Affiliation:

1. University of Waikato, New Zealand

2. University of Illinois

Abstract

This article discusses a new approach to scholarly search and discovery in large-scale text corpora. While lexicographic search is at present the predominant means of accessing large document corpora, it cannot directly address the inherent ambiguity of natural language. As a pragmatic solution, many scholars manually build their own lists of suitable search terms to use in repeated searches of digital libraries and other online resources; however, they then have to resolve, case by case, the issues caused by synonyms, homonyms and OCR errors. Our approach differs: it supports scholars in developing and refining a set of relevant concepts, searches a large document collection using semantic concepts, and categorizes the potentially relevant documents from the search results into worksets. The technique revisits the notion of semantic search and redesigns both the underlying data representation and the interface support. This is achieved through an end-to-end design that relies centrally on a Concept-in-Context network sourced through the link structure of Wikipedia. We discuss the principles of our approach, its implementation in the Capisco prototype, and the relationship between established search techniques and our approach.

Publisher

Association for Computing Machinery (ACM)


Cited by 2 articles.

1. Capisco: low-cost concept-based access to digital libraries; International Journal on Digital Libraries; 2018-03-14

2. Writers of the Lost Paper: A Case Study on Barriers to (Re-) Finding Publications; Digital Libraries: Data, Information, and Knowledge for Digital Lives; 2017
