A new framework based on features modeling and ensemble learning to predict query performance

Published:2021-10-18 Issue:10 Volume:16 Page:e0258439
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Zaghloul Mohamed, Salem Mofreh, Ali-Eldin Amr

Abstract

A query optimizer attempts to predict a performance metric based on the amount of time elapsed. In principle, this imposes significant overhead on the core engine, which must provide the necessary query optimization statistics. Machine learning is increasingly being used to improve query performance by incorporating regression models. To predict the response time for a query, most query performance approaches rely on DBMS optimization statistics and the cost estimation of each operator in the query execution plan, which also focuses on resource utilization (CPU, I/O). Modeling query features is thus a critical step in developing a robust query performance prediction model. In this paper, we propose a new framework based on query feature modeling and ensemble learning to predict query performance, and we use this framework as a query performance predictor simulator to optimize the query features that influence query performance. In query feature modeling, we propose five dimensions for modeling query features: syntax, hardware, software, data architecture, and historical performance logs. These features are the basis for building the training datasets for the performance prediction model, which employs ensemble learning. Applying ensemble learning to the query performance prediction problem makes it possible to deal with missing values and to handle overfitting via regularization. The experimental section describes how the proposed framework is used in the experiments. The training dataset in this paper is made up of performance data logs from various real-world environments. The outcomes are compared to show the difference between the actual and predicted performance of the proposed prediction model. Empirical work shows the effectiveness of the proposed approach compared to related work.
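The ideas in the abstract can be illustrated with a minimal sketch. It assumes one hypothetical feature per dimension (the field names below are not taken from the paper) and uses a deliberately simple stand-in ensemble: one ridge-regularized single-feature regressor per feature, averaged at prediction time. This is not the paper's model; it only shows how an ensemble can skip missing values and shrink its fits with an L2 penalty.

```python
from dataclasses import dataclass, fields
from statistics import mean
from typing import Optional


@dataclass
class QueryFeatures:
    # One hypothetical feature per dimension named in the abstract.
    join_count: Optional[float] = None         # syntax
    cpu_cores: Optional[float] = None          # hardware
    dbms_cache_mb: Optional[float] = None      # software
    table_rows: Optional[float] = None         # data architecture
    past_mean_seconds: Optional[float] = None  # historical performance logs


def fit_ridge_1d(xs, ys, lam):
    """Fit y ~ a + b*x, with an L2 penalty lam shrinking the slope b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)  # lam > 0 also guards against a constant feature
    return my - b * mx, b


def fit_ensemble(rows, targets, lam=1.0):
    """Train one weak learner per feature on the rows where it is present."""
    learners = {}
    for f in fields(QueryFeatures):
        pairs = [(getattr(r, f.name), t) for r, t in zip(rows, targets)
                 if getattr(r, f.name) is not None]
        if len(pairs) >= 2:  # need at least two points to fit a line
            xs, ys = zip(*pairs)
            learners[f.name] = fit_ridge_1d(xs, ys, lam)
    return learners, mean(targets)


def predict(model, row):
    """Average the learners whose feature is present; else use the mean target."""
    learners, fallback = model
    preds = [a + b * getattr(row, name)
             for name, (a, b) in learners.items()
             if getattr(row, name) is not None]
    return mean(preds) if preds else fallback
```

A call such as `predict(fit_ensemble(rows, seconds), QueryFeatures(join_count=2))` returns an estimated response time even though four of the five features are missing. Raising `lam` pulls every learner toward the mean response time, which is the regularization trade-off in its simplest form.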

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 3 articles.

1. Correction: A new framework based on features modeling and ensemble learning to predict query performance;PLOS ONE;2024-03-04

2. Assemble the shallow or integrate a deep? Toward a lightweight solution for glyph-aware Chinese text classification;PLOS ONE;2023-07-28

3. Data Analytics and Techniques;ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY;2022-10-08