Abstract
As data collections grow ever larger, users face increasing bottlenecks in their data analysis. More data means more time to prepare the data, to load it into the database, and to execute the desired queries. Many applications, e.g., scientific data analysis and social networks, already avoid traditional database systems because of their complexity and the increased data-to-query time, i.e., the time between getting the data and retrieving the first useful results. For many applications data collections keep growing fast, even on a daily basis, and this data deluge will only increase in the future, when we expect to have far more data than we can move or store, let alone analyze.
In this demonstration, we will showcase a new philosophy for designing database systems called NoDB. NoDB aims at minimizing the data-to-query time, most prominently by removing the need to load data before launching queries. We will present our prototype implementation, PostgresRaw, built on top of PostgreSQL, which allows for efficient query execution over raw data files with zero initialization overhead. We will visually demonstrate how PostgresRaw incrementally and adaptively touches, parses, caches and indexes raw data files autonomously and exclusively as a side-effect of user queries.
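The idea of building auxiliary structures only as a side effect of queries can be illustrated with a minimal Python sketch. This is not PostgresRaw code: the `RawTable` class, its method names, and the headerless comma-separated file format are all assumptions made for illustration, and the structures here are far simpler than those of a real system. The sketch records row offsets and caches only the columns a query touches, so nothing is loaded before the first query and repeated queries do not re-parse the file.

```python
import os
import tempfile


class RawTable:
    """Toy in-situ table over a raw delimited file (hypothetical, not PostgresRaw)."""

    def __init__(self, path, delimiter=","):
        self.path = path
        self.delimiter = delimiter
        self.scanned = False       # True once row offsets have been recorded
        self.row_offsets = []      # byte offset of each row, built on first scan
        self.column_cache = {}     # column index -> list of raw field strings
        self.rows_tokenized = 0    # instrumentation only: rows split into fields

    def _scan(self, cols):
        """One pass over the file: parse only `cols`, record offsets if needed."""
        parsed = {c: [] for c in cols}
        with open(self.path, "rb") as f:
            offset = 0
            for raw in f:
                if not self.scanned:
                    self.row_offsets.append(offset)
                offset += len(raw)
                fields = raw.decode("utf-8").rstrip("\r\n").split(self.delimiter)
                for c in parsed:
                    parsed[c].append(fields[c])
                self.rows_tokenized += 1
        self.scanned = True
        self.column_cache.update(parsed)

    def columns(self, *cols):
        """Return the requested columns, touching the file only for uncached ones."""
        missing = sorted({c for c in cols if c not in self.column_cache})
        if missing:
            self._scan(missing)
        return [self.column_cache[c] for c in cols]

    def select_sum(self, value_col, filter_col, predicate):
        """SUM(value_col) over rows whose filter_col satisfies `predicate`."""
        keys, vals = self.columns(filter_col, value_col)
        return sum(float(v) for k, v in zip(keys, vals) if predicate(k))

    def fetch_row(self, row_id):
        """Seek straight to one row using the recorded offsets."""
        if not self.scanned:
            self._scan([])
        with open(self.path, "rb") as f:
            f.seek(self.row_offsets[row_id])
            line = f.readline().decode("utf-8").rstrip("\r\n")
        return line.split(self.delimiter)


if __name__ == "__main__":
    # Write a small raw file, then query it without any load step.
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        tmp.write(b"a,1.5\nb,2.5\na,4.0\n")
    table = RawTable(tmp.name)
    print(table.select_sum(1, 0, lambda k: k == "a"))  # first query parses the file
    print(table.select_sum(1, 0, lambda k: k == "b"))  # second query hits the cache
    print(table.rows_tokenized)
    os.remove(tmp.name)
```

In this sketch the first query pays the parsing cost for the columns it needs, and later queries over the same columns are answered from the cache. A query that touches a new column triggers another pass over the file, which is the incremental behaviour the sketch is meant to convey.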
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
9 articles.
1. On‐demand JSON: A better way to parse documents?;Software: Practice and Experience;2024-01-18
2. GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example;Proceedings of the ACM on Management of Data;2023-06-13
3. Fast JSON parser using metaprogramming on GPU;2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA);2022-10-13
4. Query Complexity Based Optimal Processing of Raw Data;2022 IEEE 10th Region 10 Humanitarian Technology Conference (R10-HTC);2022-09-16
5. Workload Aware Cost-Based Partial Loading of Raw Data for Limited Storage Resources;Futuristic Trends in Networks and Computing Technologies;2022