A declarative approach to optimize bulk loading into databases

Author:

Amer-Yahia Sihem¹, Cluet Sophie²

Affiliation:

1. AT&T Labs--Research, USA

2. INRIA, France

Abstract

Applications such as warehouse maintenance need to load large data volumes regularly. The efficiency of loading depends on the resources that are available at the source and target systems. Our work aims to understand the performance criteria involved in bulk loading data into a database and to devise tailored optimization strategies.

Unlike commercial systems and previous research on the same topic, our approach follows the fundamental database principle of physical-logical independence. A loading program is represented as a sequence of algebraic expressions. This abstraction enables the use of appropriate algebraic rewritings to optimize a loading program, and of a cost model that takes into consideration efficiency criteria such as the processing times at the source and target systems and the bandwidth between them. A slower loading program may be preferable if it does not slow down other applications by consuming too much memory. Thus, we view the problem of optimizing a loading program as finding a compromise between several efficiency criteria.

The ability to represent loading programs in an algebra and performance criteria in a cost model has two very desirable properties: reusability and efficiency. Database programmers do not have to write loading programs by hand. In addition, tuning loading programs becomes easier since programmers have better control over the performance criteria specified in the cost model. The algebra captures data transformations that would otherwise have been hardcoded in loading programs. Consequently, richer optimizations can be explored. Finally, our optimization techniques are not specific to one particular system. They can be used for loading data from and to any structured store (e.g., relational, structured files).

We implemented our ideas in a complete environment for migrating ODBC-compliant databases into the O2 object-oriented database system. This prototype provides a declarative view language to specify loading; an interface to specify directives, such as the desired physical organization of the database and constraints on several criteria, such as resource and bandwidth consumption; an algebraic optimizer; a code generator; and an execution environment to control failures and guarantee incremental loading. Our experiments show that a tailored optimization is necessary when loading large data volumes into a database.
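The idea of trading off several efficiency criteria can be illustrated with a minimal sketch. It is not the paper's cost model or algebra: the plan names, the figures, and the helpers `PlanEstimate`, `total_seconds`, and `choose_plan` are all hypothetical, and the cost formula (source time plus target time plus transfer time, under a memory cap) is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical cost estimates for one candidate loading plan."""
    name: str
    source_seconds: float   # estimated processing time at the source system
    target_seconds: float   # estimated processing time at the target system
    transferred_mb: float   # estimated data volume sent over the network
    peak_memory_mb: float   # estimated peak memory used at the target


def total_seconds(plan: PlanEstimate, bandwidth_mb_per_s: float) -> float:
    """Assumed cost: source time + target time + transfer time."""
    transfer_seconds = plan.transferred_mb / bandwidth_mb_per_s
    return plan.source_seconds + plan.target_seconds + transfer_seconds


def choose_plan(plans, bandwidth_mb_per_s: float, memory_limit_mb: float) -> PlanEstimate:
    """Pick the fastest plan among those that respect the memory constraint."""
    feasible = [p for p in plans if p.peak_memory_mb <= memory_limit_mb]
    if not feasible:
        raise ValueError("no candidate plan satisfies the memory constraint")
    return min(feasible, key=lambda p: total_seconds(p, bandwidth_mb_per_s))


# Two made-up candidates, standing in for equivalent rewritings of one loading program.
plans = [
    PlanEstimate("join_at_source", 40.0, 20.0, 100.0, 800.0),
    PlanEstimate("join_at_target", 10.0, 70.0, 300.0, 200.0),
]

if __name__ == "__main__":
    # With ample memory the faster plan is chosen; with a tight memory
    # limit the slower but lighter plan is chosen instead.
    print(choose_plan(plans, bandwidth_mb_per_s=10.0, memory_limit_mb=1000.0).name)
    print(choose_plan(plans, bandwidth_mb_per_s=10.0, memory_limit_mb=500.0).name)
```

Tightening the memory limit makes the sketch select the slower plan, which mirrors the abstract's remark that a slower loading program may be preferable when memory consumption would disturb other applications.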

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems


Cited by 7 articles.

1. DBMS Data Loading: An Analysis on Modern Hardware;Data Management on New Hardware;2017

2. Towards a User-Friendly Loading System for the Analysis of Big Data in the Internet of Things;2014 IEEE 38th International Computer Software and Applications Conference Workshops;2014-07

3. Simulation of Data Management in Group Purchasing System;Advanced Materials Research;2008-03

4. On Handling One-to-Many Transformations in Relational Systems;Enterprise Information Systems;2008

5. Bulk Loading a Linear Hash File;Data Warehousing and Knowledge Discovery;2006
