Performance prediction of data streams on high-performance architecture

Published:2019-01-07 Issue:1 Volume:9 Page:
ISSN:2192-1962
Container-title:Human-centric Computing and Information Sciences
language:en
Short-container-title:Hum. Cent. Comput. Inf. Sci.

Author:

Gautam Bhaskar, Basava Annappa

Abstract

Worldwide sensor streams are expanding continuously, with unbounded velocity and volume. To keep pace, large-scale stream data processing systems are moving from homogeneous to rack-scale architectures, which raises serious concerns for workload optimization, scheduling, and resource management algorithms. Our proposed framework provides an architecture-independent performance prediction model to enable a resource-adaptive distributed stream data processing platform. It comprises seven predefined domains of dynamic data stream metrics and a self-driven model that fits these metrics using ridge-regularized regression. Another significant contribution is a fully automated performance prediction model for distributed stream processing systems, inherited from a state-of-the-art distributed data management system; it uses Gaussian process regression and clusters metrics with the help of a dimensionality reduction algorithm. We implemented the framework on Apache Heron and evaluated it with a proposed benchmark suite comprising five domain-specific topologies. To assess the proposed methodologies, we deliberately inject tuple skew into the benchmark topologies to establish ground truth for predictions. We found that the accuracy of predicting data stream performance increased up to 80.62% from 66.36%, while the error fell from 37.14% to 16.06%.
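The ridge-regularized fit mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the seven feature columns stand in loosely for the seven metric domains, the data is synthetic, and the names `fit_ridge`, `predict`, and `lam` are hypothetical. The sketch also omits an intercept term for brevity.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)


def predict(X, w):
    """Linear prediction with no intercept term."""
    return X @ w


# Synthetic stand-in for stream metrics (assumption: one feature per domain).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 3.0, 0.0, 1.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = fit_ridge(X, y, lam=1.0)
rmse = float(np.sqrt(np.mean((predict(X, w) - y) ** 2)))
print("fitted weights:", np.round(w, 2))
print("training RMSE:", round(rmse, 3))
```

The penalty `lam` shrinks the weights toward zero, which helps when metrics are strongly correlated; in practice it would be chosen by cross-validation rather than fixed as it is here.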

Publisher

Springer Science and Business Media LLC

Subject

General Computer Science

Link

https://link.springer.com/content/pdf/10.1186/s13673-018-0163-4.pdf

References: 32 articles.

1. Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J, Bhagat N, Mittal S, Ryaboy D (2014) Storm@twitter. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, SIGMOD ’14. pp 147–156

2. Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K (2015) Apache flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38(4):28–38

3. Akidau T, Balikov A, Bekiroğlu K, Chernyak S, Haberman J, Lax R, McVeety S, Mills D, Nordstrom P, Whittle S (2013) Millwheel: fault-tolerant stream processing at internet scale. Proc VLDB Endow 6(11):1033–1044

4. Apache heron git repository. https://github.com/apache/incubator-heron. Accessed 11 Apr 2018

5. Chun B-G, Condie T, Chen Y, Cho B, Chung A, Curino C, Douglas C, Interlandi M, Jeon B, Jeong JS, Lee G, Lee Y, Majestro T, Malkhi D, Matusevych S, Myers B, Mykhailova M, Narayanamurthy S, Noor J, Ramakrishnan R, Rao S, Sears R, Sezgin B, Um T, Wang J, Weimer M, Yang Y (2017) Apache reef: retainable evaluator execution framework. ACM Trans Comput Syst. 35(2):5

Cited by 8 articles.

1. Hybrid Algorithm for Resource Aware Predictive Scheduling: A case-study to Human Activity Recognition;2022 International Conference on Knowledge Engineering and Communication Systems (ICKES);2022-12-28

2. Simulation Studies of Elastic Optical Networks Nodes with Multicast Connections;HUM-CENT COMPUT INFO;2022

3. Performance Prediction Method for Stream Computing Platform Based on Time Series;IEEE Access;2021

4. Prediction Framework Integration into ERP Systems;2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS);2020-10-15

5. LPG-model: A novel model for throughput prediction in stream processing, using a light gradient boosting machine, incremental principal component analysis, and deep gated recurrent unit network;Information Sciences;2020-10