Affiliation:
1. Tsinghua University
2. 4Paradigm Inc.
3. National Univ. of Singapore
4. Tsinghua University, Zhongguancun Laboratory
Abstract
As the use of online AI inference services rapidly expands across applications (e.g., fraud detection in banking, product recommendation in e-commerce), real-time feature extraction (RTFE) systems have been developed to compute the requested features from incoming data tuples with ultra-low latency. As with relational database queries, these RTFE procedures can be expressed in SQL-like languages. However, there is a lack of research on the workload characteristics of RTFE and on specialized benchmarks for it, especially in comparison with existing database workloads and benchmarks (e.g., concurrent transactions in TPC-C). In this paper, we study RTFE workload characteristics using over one hundred real datasets from open repositories (e.g., Kaggle, Tianchi, UCI ML, KiltHub) and from 4Paradigm. The study highlights significant differences between RTFE workloads and existing database benchmarks in terms of application scenarios, operator distributions, and query structures. Based on these findings, we propose FEBench, a real-time feature extraction benchmark designed around the four important criteria for a domain-specific benchmark proposed by Jim Gray. FEBench consists of selected representative datasets, query templates, and an online request simulator. We use FEBench to evaluate feature extraction systems including OpenMLDB and Flink, and find that each system exhibits distinct advantages and limitations in terms of overall latency, tail latency, and concurrency performance.
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development