Forester: Approximate Processing of an Imperative Procedure for Query-Time Exploratory Data Analysis in a Relational Database

Author:

Rahman Md Arif12,Lee Young-Koo1

Affiliation:

1. Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin-si 17104, Republic of Korea

2. Department of Computer Science and Engineering, Jashore University of Science and Technology, Jessore 7408, Bangladesh

Abstract

Query-time Exploratory Data Analysis (qEDA) is an increasingly demanding aspect of the data analysis process that entails visually and quantitatively summarizing, comprehending, and interpreting the primary characteristics of a dataset. Nowadays, an imperative procedure is popular in relational databases for EDA because it enables us to write multiple dependent declarative queries with imperative logic. As online analytical processing (OLAP) systems contain extremely large datasets, data scientists often need quick visualizations of data, using approximate processing of imperative procedures, before analyzing them in their entirety. We identify gaps in the existing techniques, in that they are unable to sample both declarative-dependent statements and control logic at the same time and perform multi-dependent sampling-based approximate processing within the permitted time in qEDA. Traditional approximate query processing (AQP) involves tuple sampling for a single query approximation and enables queries to be executed over arbitrary random samples of tables. However, available AQP methods cannot produce a further representative sample of the data distribution for the dependent statements to estimate accurately and quickly for multiple dependent statements. On the other hand, sampling control structures, like loops and conditional statements, are discussed separately, without regard to the imperative structure of statements in a procedure. In this study, we propose Forester, a novel agile approximate processing method for imperative procedures that performs imperative program-aware sampling, which includes both statements with control regions (i.e., branch and loop) and processes them approximately within the permitted time in qEDA. Our method produces more targeted samples for each relation, while maintaining the data and control flow of dependent queries and imperative logic and determining all the conditions for a relation across all the statements in the sample that guarantee the existence of relevant data for dependent data distribution. Utilizing a workload of multi-statement imperative procedures from the Transaction Processing Performance Council Decision Support (TPC-DS) database, our experiment demonstrates that Forester outperforms the existing system in sampling, producing minimum error, and improving response time.

Publisher

MDPI AG

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3