Smart Intra-query Fault Tolerance for Massive Parallel Processing Databases

Published:2019-12-19 Issue:1 Volume:5 Page:65-79
ISSN:2364-1185
Container-title:Data Science and Engineering
language:en
Short-container-title:Data Sci. Eng.

Author:

Ji Yunhong, Chai Yunpeng, Zhou Xuan, Ren Lipeng, Qin Yajie

Abstract

Intra-query fault tolerance has become an increasing concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance, so they may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can rely on the fault tolerance support of low-level frameworks such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by checkpointing, i.e., materializing the intermediate results of selected operators. Unlike existing approaches, SIFT aims to improve the query success rate within a given time. To achieve this goal, it needs to (1) minimize query rerunning time after a failure and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in a real-world MPP database system, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.
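
The checkpointing idea in the abstract can be illustrated with a toy sketch. This is not SIFT's algorithm or Greenplum code: the linear pipeline, the in-memory `store`, and every name here (`run_query`, `SimulatedFailure`, `checkpoint_after`, `fail_before`) are hypothetical, and the choice of which operators to checkpoint, which is the paper's actual contribution, is simply given as input. It only shows that a rerun after a failure can resume from a materialized intermediate result instead of starting over.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Rows = List[int]
Operator = Callable[[Rows], Rows]


class SimulatedFailure(Exception):
    """Stands in for a node failure in the middle of a query."""


def run_query(
    operators: List[Operator],
    source: Rows,
    checkpoint_after: Set[int],
    store: Dict[int, Rows],
    fail_before: Optional[int] = None,
) -> Tuple[Rows, List[int]]:
    """Run a linear operator pipeline, resuming from the latest checkpoint.

    `store` maps an operator index to its materialized output and survives
    across attempts. Returns the result and the indices actually executed.
    """
    # Resume after the last operator whose output was already materialized.
    start = max(store) + 1 if store else 0
    rows = store[start - 1] if store else source
    executed: List[int] = []
    for i in range(start, len(operators)):
        if i == fail_before:
            raise SimulatedFailure(f"failure before operator {i}")
        rows = operators[i](rows)
        executed.append(i)
        if i in checkpoint_after:
            store[i] = list(rows)  # materialize the intermediate result
    return rows, executed


# Toy plan: filter -> project -> aggregate, with a checkpoint after operator 1.
ops: List[Operator] = [
    lambda r: [x for x in r if x % 2 == 0],
    lambda r: [x * x for x in r],
    lambda r: [sum(r)],
]
store: Dict[int, Rows] = {}
try:
    run_query(ops, list(range(10)), {1}, store, fail_before=2)
except SimulatedFailure:
    pass  # the first attempt is lost, but the checkpoint in `store` survives

# The rerun skips operators 0 and 1 and executes only the aggregate.
result, executed = run_query(ops, list(range(10)), {1}, store)
```

In this sketch the trade-off named in the abstract is visible in miniature: each checkpoint adds a copy of the intermediate rows (overhead), while each failure after it saves re-executing the operators before it (rerun time).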

Publisher

Springer Science and Business Media LLC

Subject

Computer Science Applications, Computational Mechanics

Link

http://link.springer.com/content/pdf/10.1007/s41019-019-00114-z.pdf

Reference: 33 articles.

1. Teradata. https://www.teradata.com/

2. Greenplum. http://greenplum.org/

3. Vertica. https://www.vertica.com/

4. Apache Impala. https://impala.apache.org/

5. Apache HAWQ. http://hawq.incubator.apache.org/

Cited by 13 articles.

1. Survey on performance optimization for database systems;Science China Information Sciences;2023-01-11

2. Sustainable Facial Authentication and Expression Prediction using Deep Learning Techniques;E3S Web of Conferences;2023

3. Erasable Virtual HyperLogLog for Approximating Cumulative Distribution over Data Streams;IEEE Transactions on Knowledge and Data Engineering;2022-11-01

4. Scalable and quantitative contention generation for performance evaluation on OLTP databases;Frontiers of Computer Science;2022-08-09

5. Design and Implementation of Data Analysis System for Ship Arrival and Departure Report;Scientific Journal of Technology;2022-06-20