Template-based wrappers in the TSIMMIS system

Author:

Hammer Joachim1,García-Molina Héctor1,Nestorov Svetlozar1,Yerneni Ramana1,Breunig Marcus1,Vassalos Vasilis1

Affiliation:

1. Department of Computer Science, Stanford University, Stanford, CA

Abstract

In order to access information from a variety of heterogeneous information sources, one has to be able to translate queries and data from one data model into another. This functionality is provided by so-called (source) wrappers [4,8] which convert queries into one or more commands/queries understandable by the underlying source and transform the native results into a format understood by the application. As part of the TSIMMIS project [1, 6] we have developed hard-coded wrappers for a variety of sources (e.g., Sybase DBMS, WWW pages, etc.) including legacy systems (Folio). However, anyone who has built a wrapper before can attest that a lot of effort goes into developing and writing such a wrapper. In situations where it is important or desirable to gain access to new sources quickly, this is a major drawback. Furthermore, we have also observed that only a relatively small part of the code deals with the specific access details of the source. The rest of the code is either common among wrappers or implements query and data transformation that could be expressed in a high level, declarative fashion. Based on these observations, we have developed a wrapper implementation toolkit [7] for quickly building wrappers. The toolkit contains a library for commonly used functions, such as for receiving queries from the application and packaging results. It also contains a facility for translating queries into source-specific commands, and for translating results into a model useful to the application. The philosophy behind our “template-based” translation methodology is as follows. The wrapper implementor specifies a set of templates (rules) written in a high level declarative language that describe the queries accepted by the wrapper as well as the objects that it returns. If an application query matches a template, an implementor-provided action associated with the template is executed to provide the native query for the underlying source 1 . When the source returns the result of the query, the wrapper transforms the answer which is represented in the data model of the source into a representation that is used by the application. Using this toolkit one can quickly design a simple wrapper with a few templates that cover some of the desired functionality, probably the one that is most urgently needed. However, templates can be added gradually as more functionality is required later on. Another important use of wrappers is in extending the query capabilities of a source. For instance, some sources may not be capable of answering queries that have multiple predicates. In such cases, it is necessary to pose a native query to such a source using only predicates that the source is capable of handling. The rest of the predicates are automatically separated from the user query and form a filter query . When the wrapper receives the results, a post-processing engine applies the filter query. This engine supports a set of built-in predicates based on the comparison operators =,≠,<,>, etc. In addition, the engine supports more complex predicates that can be specified as part of the filter query. The postprocessing engine is common to wrappers of all sources and is part of the wrapper toolkit. Note that because of postprocessing, the wrapper can handle a much larger class of queries than those that exactly match the templates it has been given. Figure 1 shows an overview of the wrapper architecture as it is currently implemented in our TSIMMIS testbed. Shaded components are provided by the toolkit, the white component is source-specific and must be generated by the implementor. The driver component controls the translation process and invokes the following services: the parser which parses the templates, the native schema, as well as the incoming queries into internal data structures, the matcher which matches a query against the set of templates and creates a filter query for postprocessing if necessary, the native component which submits the generated action string to the source, and extracts the data from the native result using the information given in the source schema, and the engine , which transforms and packages the result and applies a postprocessing filter if one has been created by the matcher. We now describe the sequence of events that occur at the wrapper during the translation of a query and its result using an example from our prototype system. The queries are formulated using a rule-based language called MSL that has been developed as a template specification and query language for the TSIMMIS project. Data is represented using our Object Exchange Model (OEM). We will briefly describe MSL and OEM in the next section. Details on MSL can be found in [5], a full introduction to OEM is given in [1].

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems,Software

Reference8 articles.

1. J. Hammer H. Garcia.Molina R. Aranha and Y. Cho. Extracting semistructured information from the Web. Technical report Computer Science Department Stanford University Stanford CA 94305-9040 February 1997. J. Hammer H. Garcia.Molina R. Aranha and Y. Cho. Extracting semistructured information from the Web. Technical report Computer Science Department Stanford University Stanford CA 94305-9040 February 1997.

Cited by 26 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Interactive Table Synthesis with Natural Language;IEEE Transactions on Visualization and Computer Graphics;2024

2. Leveraging spatial join for robust tuple extraction from web pages;Information Sciences;2014-03

3. The H $\imath$ L ε X System for Semantic Information Extraction;Transactions on Large-Scale Data- and Knowledge-Centered Systems V;2012

4. Automatically extracting user reviews from forum sites;Computers & Mathematics with Applications;2011-10

5. Challenges, approaches and architecture for distributed process integration in heterogeneous environments;Advanced Engineering Informatics;2008-01

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3