Accelerating raw data analysis with the ACCORDA software and hardware architecture

Published:2019-07 Issue:11 Volume:12 Page:1568-1582
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
Language:en
Short-container-title:Proc. VLDB Endow.

Author:

Fang Yuanwei¹, Zou Chen¹, Chien Andrew A.¹

Affiliation:

1. University of Chicago

Abstract

The data science revolution and the growing popularity of data lakes make efficient processing of raw data increasingly important. To address this, we propose the ACCelerated Operators for Raw Data Analysis (ACCORDA) architecture. By extending the operator interface (subtype with encoding) and employing a uniform runtime worker model, ACCORDA integrates data transformation acceleration seamlessly, enabling a new class of encoding optimizations and robust high-performance raw data processing. Together, these key features preserve the software system architecture, empowering state-of-the-art heuristic optimizations to drive flexible data encoding for performance. ACCORDA derives performance from its software architecture, but depends critically on the acceleration provided by the Unstructured Data Processor (UDP), which is integrated into the memory hierarchy and accelerates data transformation tasks by 16x-21x (parsing, decompression) and by as much as 160x (deserialization) compared to an x86 core. We evaluate ACCORDA using TPC-H queries on tabular data formats, exercising raw data properties such as parsing and data conversion. The ACCORDA system achieves 2.9x-13.2x speedups when compared to SparkSQL, reducing raw data processing overhead to a geomean of 1.2x (20%). In doing so, ACCORDA robustly matches or outperforms prior systems that depend on caching loaded data, while computing on raw, unloaded data. This performance benefit is robust across format complexity, query predicates, and selectivity (data statistics). ACCORDA's encoding-extended operator interface unlocks aggressive encoding-oriented optimizations that deliver an 80% average performance increase over the 7 affected TPC-H queries.

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3342263.3342634

Cited by 11 articles.

1. Design of intelligent manufacturing monitoring system for internet of things based on encryption technology and intrusion detection technology;Thermal Science and Engineering Progress;2024-09

2. CXL and the Return of Scale-Up Database Engines;Proceedings of the VLDB Endowment;2024-06

3. Data Flow Architectures for Data Processing on Modern Hardware;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

4. Data Processing with FPGAs on Modern Architectures;Companion of the 2023 International Conference on Management of Data;2023-06-04

5. Data Transformation Acceleration using Deterministic Finite-State Transducers;2022 IEEE International Conference on Big Data (Big Data);2022-12-17