Affiliation:
1. Microsoft Research
2. University of Pittsburgh and Microsoft Research
Abstract
The growing popularity of the JSON format has fueled increased interest in loading and processing JSON data within analytical data processing systems. However, in many applications, JSON parsing dominates performance and cost. In this paper, we present a new JSON parser called Mison that is particularly tailored to this class of applications, by pushing down both projection and filter operators of analytical queries into the parser. To achieve these features, we propose to deviate from the traditional approach of building parsers using finite state machines (FSMs). Instead, we follow a two-level approach that enables the parser to jump directly to the correct position of a queried field without having to perform expensive tokenizing steps to find the field. At the upper level, Mison speculatively predicts the logical locations of queried fields based on previously seen patterns in a dataset. At the lower level, Mison builds structural indices on JSON data to map logical locations to physical locations. Unlike all existing FSM-based parsers, building structural indices converts control flow into data flow, thereby largely eliminating inherently unpredictable branches in the program and exploiting the parallelism available in modern processors. We experimentally evaluate Mison using representative real-world JSON datasets and the TPC-H benchmark, and show that Mison produces significant performance benefits over the best existing JSON parsers; in some cases, the performance improvement is over one order of magnitude.
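As a rough illustration of the lower level described above, the sketch below builds a per-level index of colon positions for one JSON record and then uses it to locate a field, trying a guessed ordinal position first and falling back to a scan. It is a simplified scalar sketch, not Mison's implementation: the function names (`build_colon_index`, `field_at`, `find_field`) are hypothetical, the index is built character by character rather than with the branch-free, data-parallel techniques the abstract alludes to, both braces and brackets are assumed to count toward the nesting level, and field names are assumed to contain no escaped quotes.

```python
def build_colon_index(record: str, max_level: int = 1) -> list[list[int]]:
    """For each nesting level 1..max_level, collect the offsets of the
    structural colons (colons outside string literals) at that level."""
    index: list[list[int]] = [[] for _ in range(max_level)]
    level = 0
    in_string = False
    escaped = False
    for pos, ch in enumerate(record):
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False        # closing quote of the string
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            level += 1
        elif ch in "}]":
            level -= 1
        elif ch == ":" and 1 <= level <= max_level:
            index[level - 1].append(pos)
    return index


def field_at(record: str, colon_pos: int) -> str:
    """Return the field name immediately before a structural colon.
    Assumes the name contains no escaped quotes."""
    end = record.rindex('"', 0, colon_pos)   # closing quote of the name
    start = record.rindex('"', 0, end)       # opening quote of the name
    return record[start + 1:end]


def find_field(record: str, colons: list[int], name: str, guess=None):
    """Return the ordinal position of `name` among `colons`, or None.
    `guess` is a speculated position, e.g. where the field appeared in
    earlier records; it is checked first, then a full scan is the fallback."""
    if guess is not None and 0 <= guess < len(colons):
        if field_at(record, colons[guess]) == name:
            return guess
    for i, pos in enumerate(colons):
        if field_at(record, pos) == name:
            return i
    return None


record = '{"id": 7, "note": "a:b {x}", "user": {"name": "z"}, "lang": "en"}'
index = build_colon_index(record, max_level=2)
top_level_fields = [field_at(record, p) for p in index[0]]
nested_fields = [field_at(record, p) for p in index[1]]
```

The colon and brace inside the string value of `note` do not appear in the index, so a query for `lang` can go straight to the fourth level-1 colon once that position has been observed in earlier records.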
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
58 articles.
1. ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework;Proceedings of the VLDB Endowment;2024-07
2. SEREIA: document store exploration through keywords;Knowledge and Information Systems;2024-06-10
3. ABACUS: ASIP-Based Avro Schema-Customizable Parser Acceleration on FPGAs;2024 27th International Symposium on Design & Diagnostics of Electronic Circuits & Systems (DDECS);2024-04-03
4. On‐demand JSON: A better way to parse documents?;Software: Practice and Experience;2024-01-18
5. SPEAR-JSON: Selective Parsing of JSON to Enable Accelerated Stream Processing on FPGAs;2023 33rd International Conference on Field-Programmable Logic and Applications (FPL);2023-09-04