XML and information retrieval

Author:

Carmel David¹, Maarek Yoelle¹, Soffer Aya¹

Affiliation:

1. IBM Research Lab in Haifa

Abstract

XML, the eXtensible Markup Language, has recently emerged as a new standard for data representation and exchange on the Internet. It is believed that it will become a universal format for data exchange on the Web and that in the near future we will find vast amounts of documents in XML format on the Web. As a result, it has become crucial to address the question of how large collections of XML documents can be stored and retrieved efficiently and effectively.

To date, most work on storing, indexing, querying, and searching documents in XML has stemmed from the database community's work on semi-structured data. An alternative approach that has received less attention to date is to view XML documents as a collection of text documents with additional tags and relations between these tags. IR techniques have traditionally been applied to search large sets of textual data and should thus be extended to encode the structure and semantics inherent in XML documents. Integrating IR and XML search techniques will enable more sophisticated search on the structure as well as the content of these documents, while leveraging the success of IR techniques in document similarity ranking and keyword search.

The SIGIR workshop on XML and information retrieval was held July 28th in Athens, Greece. The goal of the workshop was to bring together researchers and practitioners interested in XML and IR to discuss and define the most relevant topics in the relation between these two technologies, present recent results, and propose future directions for research. The topics for discussion included:

• How to extend IR technologies to search XML documents
• How to integrate XML structure in IR indexing structures
• How to query XML documents both on content and structure
• How to introduce the semantics inherent in XML into the search process
• How to adopt database indexing techniques in an IR framework

The opening session of the workshop consisted of a survey of search engines for XML documents. This was followed by three technical sessions: query languages, retrieval algorithms, and IR systems for XML documents. The final talk of the day, "Searching Annotated Language Resources in XML", by Nancy Ide, was given from the perspective of potential users of XML search systems and opened many topics for discussion. The workshop concluded with a panel discussion in which the panelists outlined their vision of the future of XML search.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Cited by 2 articles.

1. BM25t: a BM25 extension for focused information retrieval; Knowledge and Information Systems; 2011-06-14

2. A survey in indexing and searching XML documents; Journal of the American Society for Information Science and Technology; 2002
