Abstract
This work presents a serverless architecture for an event-driven Extract, Transform, and Load (ETL) pipeline and evaluates its performance over a range of dataflow tasks of varying frequency, velocity, and payload size. We design an experiment using generated tabular data across varying data volumes, event frequencies, and processing power in order to measure: (i) the consistency of pipeline executions; (ii) the reliability of data delivery; (iii) the maximum payload size per pipeline; and (iv) economic scalability (the cost of chargeable tasks). We run 92 parameterised experiments on a simple AWS architecture, deliberately avoiding AWS-enhanced platform features, to allow an unbiased assessment of the model’s performance. Our results indicate that the reference architecture achieves time-consistent processing of event payloads of more than 100 MB, with a throughput of 750 KB/s across four event frequencies. We also observe that, although using an SQS queue for data transfer enables straightforward concurrency control and data slicing, the queue becomes a bottleneck for large event payloads. Finally, we develop and discuss a candidate pricing model for usage of the reference architecture.
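To make the queue-based data transfer described above concrete, the sketch below shows the kind of serverless consumer such an architecture typically relies on: a Lambda-style handler that receives sliced tabular records from an SQS event source, applies a transformation, and loads the result to object storage. This is a minimal illustration under assumptions, not the paper's implementation; the handler name, bucket, environment variable, and transformation step are hypothetical.

```python
import json
import os
import boto3

# Hypothetical output bucket; the paper does not specify one.
S3_BUCKET = os.environ.get("TARGET_BUCKET", "etl-output-bucket")

s3 = boto3.client("s3")


def handler(event, context):
    """Lambda-style entry point invoked by an SQS event source mapping.

    Each SQS record carries one slice of the tabular payload, so
    concurrency is governed by the queue batch size and the function's
    concurrency settings rather than by the producer.
    """
    records = event.get("Records", [])
    for record in records:
        rows = json.loads(record["body"])  # extract: one payload slice

        # transform: illustrative per-row type normalisation
        transformed = [
            {**row, "amount": float(row.get("amount", 0))}
            for row in rows
        ]

        # load: write the processed slice to S3, keyed by message id
        key = f"processed/{record['messageId']}.json"
        s3.put_object(
            Bucket=S3_BUCKET,
            Key=key,
            Body=json.dumps(transformed).encode("utf-8"),
        )

    return {"processed_records": len(records)}
```

Under this pattern, the SQS message size limit is what constrains per-event payloads, which is consistent with the bottleneck on large payloads noted in the abstract.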
Cited by 13 articles.